L597: Structural Data Mining and Modeling (Fall 2005)
Course Description | Grade | Course Outline | Resources
Lecture: Tue Tue
1p-2:15p & 2:30p-3:45p, LI 001
Office hours: Mon 3-4p,
LI 019 (Katy), Fri 2-3p, LI 020 (Weimao)
Prerequisites: L401 or equivalent
Class Listserver: katy_L597-L@listserv.indiana.edu
Class Web Page: http://ella.slis.indiana.edu/~katy/L597-Fall05
Project Handin Webpage: http://ella.slis.indiana.edu/~katy/handin/L597-F05/cgi/handinlogin.cgi
Web Resources: MS Excel Primer, Pajek tutorial, Perl Scripts
Recorded Talks: Information Processing Talks 2002-2003, Sackler Colloquium on Mapping Knowledge Domains Talks, Digital Biology Talks, Complexity Digest 2000.31,
Talk Series on Networks and Complex Systems, Spring 05, Fall 04.
Students will learn how to frame the research question, collect the data, run the analysis, and interpret the results. In addition, they will learn how to design and evaluate descriptive and predictive models of diverse complex networks to improve their understanding of the underlying principles.
Upon taking this course students will be able to analyze and describe real networks (power grids, WWW, social networks, etc.) as well as relevant phenomena such as disease propagation, search, organizational performance, social power, and the diffusion of innovations.
Course Format
A combination of lectures and 4-5
labs will be utilized to introduce material from such diverse domains as mathematics
(graph theory, matrix algebra, chaos theory), computer science (data mining,
knowledge discovery, structural pattern matching, complex network theory), information
science (bibliometrics, webometrics, social network analysis), physics (nonlinear
dynamics, statistical physics), biology (genomics, food webs), etc. MS Excel,
Matlab and Pajek will be employed for data mining and modeling. Students will
improve their collaborative skills by working in topic dependent teams.
Students will be required to do weekly readings of journal articles and participate in class discussions. Class participation is evaluated on the frequency of relevant, constructive contributions that reflect a close reading of assigned materials and thoughtful reflection on the topic during class as well as via the class majordomo list and handin web pages.
The 15 min presentation
will address a specific topic/question and will be based on readings
from the literature or internet. Sources will be provided. If you can
find
more that's great. See Preparation of
Presentations for more details.
You are expected to
use the office hours the week before you will give the presentation to
discuss your preparation with the instructor. Prepare your presentation
as well as any specific questions you may have in advance.
All students will be
expected to participate in class by reading the assigned
material plus asking and answering questions. Reading materials are
assigned for study in preparation for class discussion. Thus, class 2
material should be completed before attending the second class. Please
do surf the listed web pages so that we can discuss them
in class.
Diverse small projects are designed to create competence in using taught methods and the software effectively. A final research project (which may be done in groups and in collaboration with clients) is due at the end of the semester. Detailed descriptions of projects will be provided. Results have to be submitted via the handin web page.
The final grade breakdown will be as follows: class participation (20%), presentation of selected readings (10%), small projects (30%), and final project (40%). Grades are assigned according to the grading standards of SLIS.
Credits: 3 for L597
Introduction
Class 1 (08-30-2005)
[dm-class1.ppt]
Lecture: Course
Description & Outline, Class Format, Grades, Resources
Introduction to Structural Data
Mining and Modeling.
Structural Data Collection (the importance of high quality data) &
Data Analysis Basics
Project
1: Personal Web page that tells about
you, your expertise, and your expectations on this course. It will be used
to adjust course materials to the programming skills and (departmenal) background
of registered students and to setup the handin web pages. In addition, you will
be asked to phrase a scientific question that network science and/or complex
systems approaches can help answer; select an appropriate dataset; and discuss
how this question might be solved.
Due 09-05-2005
at 8pm (~1 week).
Class 2 (09-06-2005)
[dm-class2.ppt,
dm-lab1.ppt]
Lecture: Introduction to
Graph Theory
Readings: Strogatz
S. H. Exploring complex networks. Nature 410, 268-276
(2001).
Hayes (2000) Graph theory in practice: Part
I and Part II, American Scientist, Volume 88,
No. 2, 9-13 and 104-109.
Lab 1 in Shepherd Lab,
Main Library 030: Data Analysis Basics - Introduction to the IVC
Software Framework
Project
2: Analyze a structural data set.
Due 09-26-2005
at 8pm (~ 3 weeks).
Class 3 (09-13-2005)
[dm-class3.ppt,
dm-lab2.ppt]
Lecture: Matrix Algebra
Primer
Readings: Andrei Broder,
Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie
Stata, Andrew Tomkins & Janet Wiener (2002) Graph structure in the web: Experiments
and models. Computer Networks, 33, pp. 309-320.
Lab 2 in Shepherd Lab,
Main Library 030: Network Analysis
Toolkit & Pajek.
Structural Data Mining
Class 4 (9-20-2005)
[dm-class4.ppt,
dm-lab3.ppt]
Lecture: Data Mining vs.
Structural Data Mining
Reading:
Kristin P. Bennett &
Colin Campbell (2000) Support Vector Machines: Hype or
Hallelujah? SIGKDD Explorations, 2, 2, pp. 1-13.
J.
Kleinberg (1998) Authoritative sources
in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.
Extended version in Journal of the ACM 46 (1999). Also appeared as IBM Research
Report RJ 10076, May 1997.
Listening: Talks by Monika
Henzinger & Steve Lawrence
Presentation:
Pedro Domingos, Matt Richardson.
The Intelligent Surfer: Probabilistic Combination of
Link and Content Information in PageRank. Advances in Neural Information
Processing Systems 14, 2002.
F.
Menczer (2002) Growing and Navigating
the Small World Web by Local Content. Proc. Natl. Acad. Sci. USA 99(22):
14014-14019.
[John Burgoon]
Lab 3 in Shepherd Lab,
Main Library 030: UCINET
Class 5: (9-27-2005)
[dm-class5.ppt,
dm-lab4.ppt]
Lecture: Random Networks
vs. Small World Networks vs. Scale Free Networks, Power Laws in Graphs
Reading: Watts D. J. and
Strogatz S. H. Collective dynamics of
'small-world' networks. Nature 393, 440-442 (1998).
Presentations: Reka Albert,
Albert-Laszlo Barabasi (2002). Statistical mechanics of complex
networks. cond-mat/0106096:
Small World Networks [Todd
Holloway]
Scale Free Model [Muzaffer
Ozakca]
Lab 4 in Shepherd Lab,
Main Library 030: Chizu
Project
3: Analyze and visualize a scholarly data set.
Due 10-17-2005
at 8pm (~ 3 weeks).
10-03-2005 InfoVis Lab Open House, 4-6p, Lilly Library, IUB.
Class 6: (10-04-2005)
[dm-class6.ppt]
Lecture: Structural Similarity
Measures
Reading: Goldsmith, T. E.
& Davenport, D. M. (1990). Assessing structural
similarity of graphs. In R. W. Schvaneveldt (Ed.), Pathfinder associative
networks: Studies in knowledge organization (pp. 75-87), Norwood, NJ: Ablex.
Surf: References
for Graph Similarity
Presentation: Krumhansl
(1978) Concerning the applicability of geometric models
to similarity data: The interrelationship between similarity and spatial density.
Psychological Review, 85, 445-463.
Pruzansky, Tversky & Carroll
(1982) Spatial versus tree representations of proximity
data. Psychometrika, 47, 3-24. [Vsevolod
Kapatsinski]
Lab 5 in Shepherd Lab,
Main Library 030: VxOrd and VxInsight
and other tools compiled by John Burgoon.
Class 7: (10-11-2005)
[dm-class7.ppt]
Lecture: Dimensionality
Reduction Techniques & Clustering Techniques
Readings: R. N. Shepard.
Multidimensional scaling, tree fitting, and clustering.
Science, 210:390-398 (1980).
Presentation: C.T. Zahn.
Graph-Theoretical Methods for Detecting and Describing Gestalt
Clusters. IEEE Transactions on Computers, 20:68-86, 1971.
[Gavin
LaRowe]
Lab in 001: Hands-on
session: Coupling network analysis and visualization.
Descriptive and Predictive Data Modeling
Class 8: (10-18-2005)
[dm-class8.ppt]
Descriptive Modeling Techniques
vs. Predictive/Generative Modeling Techniques
Readings: Dorogovtsev, S. N. & Mendes, J. F.
F. (2002). Evolution of networks.
cond-mat/0106144
Reka Albert, Albert-Laszlo Barabasi
(2002). Statistical mechanics of complex
networks. cond-mat/0106096: Evolving Networks
Final
Project 4: Data analysis and modeling. The
project can (1) test hypotheses derived from grand theory, (2) investigate the
relationships among a set of variables, and tell a story (i.e., construct a
theory) based on the results, or (3) use structural data mining and modeling
to diagnose a problem and prescribe a solution based on the diagnosis.
Due 11-28-2005
at 8pm.
Class 9: (10-25-2005)
[dm-class9.ppt]
Guest Lecture by Larry Yaeger
Lecture: Complex Adaptive
Systems.
Reading:
Ball, P. (1999). The self-made tapestry. Oxford, England: Oxford University
Press, chapter on Communities.
Presentation:
Robert
Axelrod (1998) Disseminating Culture - The Dissemination
of Culture: A Model with Local Convergence and Global Polarization. In The
Complexity of Cooperation: Agent-Based Models of Competition and Collaboration.
Princeton University Press, chapter 7.
Robert Axelrod (1997) Replication of Agent Based Models. In The Complexity of
Cooperation. Princeton University Press. [Adity
Upoma]
Network Dynamics
Class 10: (11-01-2005)
[dm-class10.ppt]
Lecture:
Predictive/Generative Modeling Techniques. Network Structure vs. Global Behavior.
Reading: Katy Börner, Jeegar Maru, and Robert Goldstone (2004).
The Simultaneous
Evolution of Author and Paper Networks. PNAS, 101(Suppl_1) 5266-5273.
Presentation:
Presentation:
M. Granovetter. The strength of weak ties. American
Journal of Sociology, 78(6), 1360-1380 (1973). & Mark Granovetter (1983)
The strength of weak ties. A Network Theory Revisited.
Sociological Theory. 1, 201-233. [Josepth
Cottam]
Lab:
Present your Final Project goals together
with first results.
Class 11: (11-8-2005)
Lecture: Modeling
Diffusion in Evolving Network Ecologies. [dm-class11.ppt]
Reading:
Peter Sheridan Dodds, Duncan J. Watts, and Charles F. Sabel (2003). Information
exchange and the robustness of organizational networks. PNAS, vol. 100,
12516-12521.
Presentation
1: Torsten Hagerstrand. A Monte Carlo Approach to Diffusion. In Spatial Analysis
- A Reader in Statistical Geography. Brian J.L. Berry & Duane F. Marble
(eds.), Prentice Hall, pp.368-384. [Justin
Donaldson]
Presentation 2:
Romualdo Pastor-Satorras and Alessandro Vespignani (2003). Epidemics
and Immunization in Scale Free Networks. In Handbook of Graphs and Networks:
From the Genome to the Internet. Stefan Bornholdt and Hans Georg Schuster (Eds.),
Wiley-VCH, pp. 111-130. [Bryan
Hasty]
Selected Applications
Class 12: (11-15-2005)
[dm-class12.ppt,
p2psearch.ppt]
Lecture & Lab: Selected
Applications
Knowledge
Discovery, Content Filtering
& Search
Reading:
D. Gibson, J. Kleinberg & P. Raghavan
(1998) Inferring Web communities from link topology.
Proc. 9th ACM Conference on Hypertext and Hypermedia.
Gary Flake, Steve Lawrence, C.
Lee Giles & Frans Coetzee (2002) Self-Organization
and Identification of Web Communities. IEEE Computer, 35:3. See http://webselforganization.com/
S. Chakrabarti, B. Dom,
D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan & A. Tomkins,
Hypersearching the
Web. Scientific American, June 1999.
Presentation:
Search on Small World Networks: J. Kleinberg. The
small-world phenomenon: An algorithmic perspective. Proc. 32nd ACM Symposium
on Theory of Computing, 2000. Also appears as Cornell Computer Science Technical
Report 99-1776, October 1999.
Search on Scale-Free Networks:
Lada A. Adamic, Rajan M. Lukose, Amit R. Puniyani & Bernardo A. Huberman
(2001) Search in Power-Law
Networks. Phys. Rev. E, 64 46135.
Optional Reading:
Fletcher, George , Sheth, Hardik and Börner, Katy. (2004). Unstructured
Peer-to-Peer Networks: Topological Properties and Search Performance. Third
International Joint Conference on Autonomous Agents and Multi-Agent Systems.
W6: Agents and Peer-to-Peer Computing, Moro, Gianluca, Bergmanschi, Sonia and
Aberer, Karl, Eds., New York, July 19-23, pp. 2-13.
Thanksgiving Recess
Class 14: (11-29-2005)
Lecture & Lab: Selected
Applications
Social Network Analysis
Reading:
Wasserman, S. & Pattison, P. (2003). Network
analysis. In Lewis-Beck, M., Bryman, A., and Liao, T.F. (eds.) Encyclopedia
of Social Science Research Methods, to appear. Oregon, OH: Sage Publications.
Mark E. J. Newman (2001) Who
is the best connected scientist? A study of scientific coauthorship networks.
Phys.Rev. E64 (2001) 016131; Phys.Rev. E64 (2001) 016132.
Presentation 1: Duncan
J. Watts, Peter Sheridan Dodds & M. E. J. Newman. Identity and Search in Social Networks. Science 296, 1302-1305
(2002).
M. Girvan and M. E. J. Newman.
Community structure in social and biological networks.
Proc. Natl. Acad. Sci. USA 99, 8271-8276 (2002). [Jonathan
Klinginsmith]
Presentation 2: Valente
Tom W. (1996) Social network thresholds
in the diffusion of innovations. Social Networks, 18, 69-89 (1996). [Aaron
Curtis]
Bioinformatics
Guest Lecture by Luis Rocha on "Bibliome Informatics for
Protein Function Prediction"
Reading:
Maguitman,
A. G., Rechtsteiner, A., Verspoor, K., Strauss, C.E., Rocha, L.M. Large-Scale
Testing Of Bibliome Informatics Using Pfam Protein Families. In: Pacific
Symposium on Bioinformatics 2006. In Press.
Optional Reading:
Papers by Uri
Alon & Albert László Barabási.
Lada A.
Adamic, Dennis Wilkinson, Bernardo A. Huberman, Eytan Adar.
A Literature Based Method for Identifying Gene-Disease
Connections. IEEE Computer Society Bioinformatics Conference (CSB'02), 2002,
pp. 109-118.
Class 15: (12-06-2005)
Lecture: Final Project
Presentations
Software:
Additional Reading:
Excellent Review
articles:
Mark E. J. Newman (2001a).
Scientific collaboration networks. I. Network construction and
fundamental results. Physical Review E, 64, 016131.
Mark E. J. Newman (2001b).
Scientific collaboration networks. II. Shortest paths, weighted
networks, and centrality. Physical Review E, 64, 016132.
Graph Theory
Harary, F. (1969) Graph
Theory. Addison-Wesley.
Knoke and Kuklinski
(1982) Network Analysis: Sage Publications.
Data Mining
David J. Hand, Heikki
Mannila, Padhraic Smyth (2001) Principles of Data Mining: MIT Press.
(on reserve)
Jiawei Han, Micheline
Kamber. Data Mining (2000) Concepts and Techniques: Morgan Kaufmann
Publishers.
Ian H. Witten &
Eibe Frank. Data Mining (1999) Practical
Machine Learning Tools and Techniques with Java Implementations.
Morgan Kaufmann Publishers.
Social Network Analysis
Linton
C. Freeman (2004) The development of social network
analysis: a study in the sociology of science. Empirical Press.
A. Degenne and M. Forse (1999). Introducing Social Networks: London, Sage Publications.
Stanley Wasserman, Katherine Faust,
Dawn Iacobucci (1994). Social Network Analysis: Methods and Applications: Cambridge
Univ Press.
Wasserman and Faust (1994). Social
Network Analysis: Cambridge University Press.
Wasserman, S. & Galaskiewicz,
J. (1994). Advances in Social Network Analysis: Sage.
Everett M. Rogers (1995). The Diffusion
of Innovations: Free Press.
Bibliometrics
Braam, R. B. (1991).
Mapping of Science: Foci of Intellectual Interest in Scientific
Literature: DSWO Press, University of Leiden.
Craine, D. (1972).
Invisible Colleges: Diffusion of Knowledge in Scientific Communities:
The University of Chicago Press.
Donohue, J. C. (1973).
Understanding Scientific Literature: A Bibliometric Approach: MIT Press.
Garfield, E. (1979).
Citation Indexing - Its Theory and Application in Science, Technology
and Humanities: ISI Press, Philadelphia.
Irvine, J. & Martin,
B. (1989). Research Foresight: Pinter Publisher.
Valente, T.W. (1995).
Network Models of the Diffusion of Innovations: Hampton Press.
Egghe, L. & Rousseau,
R. (1990). Introduction to Informetrics. Elsevier.
Dynamic Systems
Buchanan, M. (2002).
Nexus: Small Worlds and the Groundbreaking Science of Networks: Norton
& Company.
Johnson, S. (2001).
Emergence: The Connected Lives of Ants, Brains, Cities, and Software:
Scribner.
Kennedy, J., Russell C.
Eberhart, R. C., Shi, Y. (2001) Butterfly Economics: A New General
Theory of Social and Economic Behavior: Basic Books.
Duncan J. & Watts, D.
J. (1999). Small Worlds: Princeton Univ Press.
Albert-László
Barabási (2002). Linked: The New Science of Networks: Perseus
Publishing.
Information
Diffusion
Valente, Tom W. (1995).
Network Models of the Diffusion of Innovations. Hampton Press.
Everett M. Rogers (1995).
The Diffusion of Innovations. Free Press.
Online Resources:
International Network for Social
Network Analysis
SocioSite
Networks, Groups and Social Interaction
Computing Services
& Software:
University Computing Services Help
Online
Searching is a trick you
can learn.
Related Courses at IU:
L597 Statistics
for Information Science and Usability by John C. Paolillo, IUB.
Data Mining and Analysis by F. Chris Allbright, Kelley School of Business,
IUB.
P747 Advanced Model Fitting
and Model Comparison: Computer techniques and theoretical issues by John
K. Kruschke, Psychology, IUB.
P747/Q700: Bayesian and Data Mining by Richard M. Shiffrin, PSY, IUB.
Complex Adaptive Systems by Robert Goldstone, PSY, IUB.
I400/I590 Biologically Inspired Computing by Luis M. Rocha, SoI, IUB (Fall 2005).
Network Analysis by Stanley
Wasserman for Sociology, IUB (Spring 2005).
Multivariate Methods
by Stanley Wasserman for Psychology, IUB (Spring 2005).
I400/590 Topics in
Informatics - Games and Gossip (3 cr) by Marco Janssen, Informatics, IUB
(Spring 2005).
I400/I590 Artificial
Life as an approach to Artificial Intelligence (3
cr) by Larry Yaeger, Informatics, IUB (Spring 2005).
B669: Managing &
Mining Scientific Data, Topics in Database and Information Systems by Ed
Robertson and Dirk Van Gucht, CS, IUB.
CSCI B659 Topics in
Artificial Intelligence: Web Mining by Filippo Menczer, Informatics, IUB
P 700 Fractals and Pattern Formation (1 cr) by
Y. Hayakawa, Physics, IUB.
INFO I512 Scientific Data Management and Analysis I590 High Performance Data
Management and Processing in Informatics, IUB by Zdzisaw Meglicki, UITS.
T610 Networked
Society by Harmeet Sawhney,
Telecommunications, IUB.
T603 Communication Networks by J. Alison Bryant, Telecommunications, IUB.
(Starts in Fall 23005)
CSCI590: Data Mining by Snehasis Mukhopadhyay, CS, IUPUI.
Related Courses off Campus:
Networks and Complexity
in Social Systems by D. J. Watts, Columbia University.
Models of Social Dynamics
by David Gibson, Harvard University.
Introduction
to Social Network Methods by Robert A. Hanneman, University of California.
Social Network Analysis
by Robert A. Hanneman, Department of Sociology, University of California.
Scaling, Power Laws, and the Small
World Phenomena in Networks by Don Towsley, University of Massachusetts
The Structure of Information
Networks by Jon Kleinberg, Cornell University
Biological Applications
of Physics by Paul Ginsparg, Cornell University
Complex
Systems by Mark Newman,
University of Michigan
Seminar on
Social Networks by James
Moody, The Ohio State University
Complex Human
Networks Reading Group
Study Group on Complex
Networks
Workshops &
Conferences
International Conference
on Computability and Complexity in Analysis to be held in Cincinnati, August
28-30, 2003.
http://ella.slis.indiana.edu/~katy/L597
Last modified: 09/16/2005