L597: Structural Data Mining and Modeling (Fall 2005)
Course Description | Grade | Course Outline | Resources
Lecture: Tue Tue
1p-2:15p & 2:30p-3:45p, LI 001
Office hours: Mon 3-4p, LI 019 (Katy), Fri 2-3p, LI 020 (Weimao)
Prerequisites: L401 or equivalent
Class Listserver: katy_L597-L@listserv.indiana.edu
Class Web Page: http://ella.slis.indiana.edu/~katy/L597-Fall05
Project Handin Webpage: http://ella.slis.indiana.edu/~katy/handin/L597-F05/cgi/handinlogin.cgi
Web Resources: MS Excel Primer, Pajek tutorial, Perl Scripts
Recorded Talks: Information Processing Talks 2002-2003, Sackler Colloquium on Mapping Knowledge Domains Talks, Digital Biology Talks, Complexity Digest 2000.31,
Talk Series on Networks and Complex Systems, Spring 05, Fall 04.
Students will learn how to frame the research question, collect the data, run the analysis, and interpret the results. In addition, they will learn how to design and evaluate descriptive and predictive models of diverse complex networks to improve their understanding of the underlying principles.
Upon taking this course students will be able to analyze and describe real networks (power grids, WWW, social networks, etc.) as well as relevant phenomena such as disease propagation, search, organizational performance, social power, and the diffusion of innovations.
A combination of lectures and 4-5 labs will be utilized to introduce material from such diverse domains as mathematics (graph theory, matrix algebra, chaos theory), computer science (data mining, knowledge discovery, structural pattern matching, complex network theory), information science (bibliometrics, webometrics, social network analysis), physics (nonlinear dynamics, statistical physics), biology (genomics, food webs), etc. MS Excel, Matlab and Pajek will be employed for data mining and modeling. Students will improve their collaborative skills by working in topic dependent teams.
Students will be required to do weekly readings of journal articles and participate in class discussions. Class participation is evaluated on the frequency of relevant, constructive contributions that reflect a close reading of assigned materials and thoughtful reflection on the topic during class as well as via the class majordomo list and handin web pages.
The 15 min presentation
will address a specific topic/question and will be based on readings
from the literature or internet. Sources will be provided. If you can
more that's great. See Preparation of
Presentations for more details.
You are expected to use the office hours the week before you will give the presentation to discuss your preparation with the instructor. Prepare your presentation as well as any specific questions you may have in advance.
All students will be expected to participate in class by reading the assigned material plus asking and answering questions. Reading materials are assigned for study in preparation for class discussion. Thus, class 2 material should be completed before attending the second class. Please do surf the listed web pages so that we can discuss them in class.
Diverse small projects are designed to create competence in using taught methods and the software effectively. A final research project (which may be done in groups and in collaboration with clients) is due at the end of the semester. Detailed descriptions of projects will be provided. Results have to be submitted via the handin web page.
The final grade breakdown will be as follows: class participation (20%), presentation of selected readings (10%), small projects (30%), and final project (40%). Grades are assigned according to the grading standards of SLIS.
Credits: 3 for L597
Class 1 (08-30-2005)
Lecture: Course Description & Outline, Class Format, Grades, Resources
Introduction to Structural Data Mining and Modeling.
Structural Data Collection (the importance of high quality data) & Data Analysis Basics
1: Personal Web page that tells about
you, your expertise, and your expectations on this course. It will be used
to adjust course materials to the programming skills and (departmenal) background
of registered students and to setup the handin web pages. In addition, you will
be asked to phrase a scientific question that network science and/or complex
systems approaches can help answer; select an appropriate dataset; and discuss
how this question might be solved.
Due 09-05-2005 at 8pm (~1 week).
Class 2 (09-06-2005)
Lecture: Introduction to Graph Theory
Readings: Strogatz S. H. Exploring complex networks. Nature 410, 268-276 (2001).
Hayes (2000) Graph theory in practice: Part I and Part II, American Scientist, Volume 88, No. 2, 9-13 and 104-109.
Lab 1 in Shepherd Lab, Main Library 030: Data Analysis Basics - Introduction to the IVC Software Framework
2: Analyze a structural data set.
Due 09-26-2005 at 8pm (~ 3 weeks).
Class 3 (09-13-2005)
Lecture: Matrix Algebra Primer
Readings: Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins & Janet Wiener (2002) Graph structure in the web: Experiments and models. Computer Networks, 33, pp. 309-320.
Lab 2 in Shepherd Lab, Main Library 030: Network Analysis Toolkit & Pajek.
Structural Data Mining
Class 4 (9-20-2005)
Lecture: Data Mining vs. Structural Data Mining
Kristin P. Bennett & Colin Campbell (2000) Support Vector Machines: Hype or Hallelujah? SIGKDD Explorations, 2, 2, pp. 1-13.
J. Kleinberg (1998) Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms. Extended version in Journal of the ACM 46 (1999). Also appeared as IBM Research Report RJ 10076, May 1997.
Listening: Talks by Monika Henzinger & Steve Lawrence
Presentation: Pedro Domingos, Matt Richardson. The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank. Advances in Neural Information Processing Systems 14, 2002.
F. Menczer (2002) Growing and Navigating the Small World Web by Local Content. Proc. Natl. Acad. Sci. USA 99(22): 14014-14019. [John Burgoon]
Lab 3 in Shepherd Lab, Main Library 030: UCINET
Class 5: (9-27-2005)
Lecture: Random Networks vs. Small World Networks vs. Scale Free Networks, Power Laws in Graphs
Reading: Watts D. J. and Strogatz S. H. Collective dynamics of 'small-world' networks. Nature 393, 440-442 (1998).
Presentations: Reka Albert, Albert-Laszlo Barabasi (2002). Statistical mechanics of complex networks. cond-mat/0106096:
Small World Networks [Todd Holloway]
Scale Free Model [Muzaffer Ozakca]
Lab 4 in Shepherd Lab, Main Library 030: Chizu
3: Analyze and visualize a scholarly data set.
Due 10-17-2005 at 8pm (~ 3 weeks).
10-03-2005 InfoVis Lab Open House, 4-6p, Lilly Library, IUB.
Class 6: (10-04-2005)
Lecture: Structural Similarity Measures
Reading: Goldsmith, T. E. & Davenport, D. M. (1990). Assessing structural similarity of graphs. In R. W. Schvaneveldt (Ed.), Pathfinder associative networks: Studies in knowledge organization (pp. 75-87), Norwood, NJ: Ablex. Surf: References for Graph Similarity
Presentation: Krumhansl (1978) Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density. Psychological Review, 85, 445-463.
Pruzansky, Tversky & Carroll (1982) Spatial versus tree representations of proximity data. Psychometrika, 47, 3-24. [Vsevolod Kapatsinski]
Lab 5 in Shepherd Lab, Main Library 030: VxOrd and VxInsight and other tools compiled by John Burgoon.
Class 7: (10-11-2005)
Lecture: Dimensionality Reduction Techniques & Clustering Techniques
Readings: R. N. Shepard. Multidimensional scaling, tree fitting, and clustering. Science, 210:390-398 (1980).
Presentation: C.T. Zahn. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions on Computers, 20:68-86, 1971. [Gavin LaRowe]
Lab in 001: Hands-on session: Coupling network analysis and visualization.
Descriptive and Predictive Data Modeling
Class 8: (10-18-2005)
Descriptive Modeling Techniques vs. Predictive/Generative Modeling Techniques
Readings: Dorogovtsev, S. N. & Mendes, J. F. F. (2002). Evolution of networks. cond-mat/0106144
Reka Albert, Albert-Laszlo Barabasi (2002). Statistical mechanics of complex networks. cond-mat/0106096: Evolving Networks
Project 4: Data analysis and modeling. The
project can (1) test hypotheses derived from grand theory, (2) investigate the
relationships among a set of variables, and tell a story (i.e., construct a
theory) based on the results, or (3) use structural data mining and modeling
to diagnose a problem and prescribe a solution based on the diagnosis.
Due 11-28-2005 at 8pm.
Class 9: (10-25-2005)
Guest Lecture by Larry Yaeger
Lecture: Complex Adaptive Systems.
Reading: Ball, P. (1999). The self-made tapestry. Oxford, England: Oxford University Press, chapter on Communities.
Presentation: Robert Axelrod (1998) Disseminating Culture - The Dissemination of Culture: A Model with Local Convergence and Global Polarization. In The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press, chapter 7.
Robert Axelrod (1997) Replication of Agent Based Models. In The Complexity of Cooperation. Princeton University Press. [Adity Upoma]
Class 10: (11-01-2005)
Lecture: Predictive/Generative Modeling Techniques. Network Structure vs. Global Behavior.
Reading: Katy Börner, Jeegar Maru, and Robert Goldstone (2004). The Simultaneous Evolution of Author and Paper Networks. PNAS, 101(Suppl_1) 5266-5273.
Presentation: Presentation: M. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6), 1360-1380 (1973). & Mark Granovetter (1983) The strength of weak ties. A Network Theory Revisited. Sociological Theory. 1, 201-233. [Josepth Cottam]
Lab: Present your Final Project goals together with first results.
Class 11: (11-8-2005)
Lecture: Modeling Diffusion in Evolving Network Ecologies. [dm-class11.ppt]
Reading: Peter Sheridan Dodds, Duncan J. Watts, and Charles F. Sabel (2003). Information exchange and the robustness of organizational networks. PNAS, vol. 100, 12516-12521.
Presentation 1: Torsten Hagerstrand. A Monte Carlo Approach to Diffusion. In Spatial Analysis - A Reader in Statistical Geography. Brian J.L. Berry & Duane F. Marble (eds.), Prentice Hall, pp.368-384. [Justin Donaldson]
Presentation 2: Romualdo Pastor-Satorras and Alessandro Vespignani (2003). Epidemics and Immunization in Scale Free Networks. In Handbook of Graphs and Networks: From the Genome to the Internet. Stefan Bornholdt and Hans Georg Schuster (Eds.), Wiley-VCH, pp. 111-130. [Bryan Hasty]
Class 12: (11-15-2005)
Lecture & Lab: Selected Applications
Knowledge Discovery, Content Filtering & Search
D. Gibson, J. Kleinberg & P. Raghavan (1998) Inferring Web communities from link topology. Proc. 9th ACM Conference on Hypertext and Hypermedia.
Gary Flake, Steve Lawrence, C. Lee Giles & Frans Coetzee (2002) Self-Organization and Identification of Web Communities. IEEE Computer, 35:3. See http://webselforganization.com/
S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan & A. Tomkins, Hypersearching the Web. Scientific American, June 1999.
Presentation: Search on Small World Networks: J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. 32nd ACM Symposium on Theory of Computing, 2000. Also appears as Cornell Computer Science Technical Report 99-1776, October 1999.
Search on Scale-Free Networks: Lada A. Adamic, Rajan M. Lukose, Amit R. Puniyani & Bernardo A. Huberman (2001) Search in Power-Law Networks. Phys. Rev. E, 64 46135.
Optional Reading: Fletcher, George , Sheth, Hardik and Börner, Katy. (2004). Unstructured Peer-to-Peer Networks: Topological Properties and Search Performance. Third International Joint Conference on Autonomous Agents and Multi-Agent Systems. W6: Agents and Peer-to-Peer Computing, Moro, Gianluca, Bergmanschi, Sonia and Aberer, Karl, Eds., New York, July 19-23, pp. 2-13.
Class 14: (11-29-2005)
Lecture & Lab: Selected Applications
Social Network Analysis
Wasserman, S. & Pattison, P. (2003). Network analysis. In Lewis-Beck, M., Bryman, A., and Liao, T.F. (eds.) Encyclopedia of Social Science Research Methods, to appear. Oregon, OH: Sage Publications.
Mark E. J. Newman (2001) Who is the best connected scientist? A study of scientific coauthorship networks. Phys.Rev. E64 (2001) 016131; Phys.Rev. E64 (2001) 016132.
Presentation 1: Duncan J. Watts, Peter Sheridan Dodds & M. E. J. Newman. Identity and Search in Social Networks. Science 296, 1302-1305 (2002).
M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 8271-8276 (2002). [Jonathan Klinginsmith]
Presentation 2: Valente Tom W. (1996) Social network thresholds in the diffusion of innovations. Social Networks, 18, 69-89 (1996). [Aaron Curtis]
Guest Lecture by Luis Rocha on "Bibliome Informatics for Protein Function Prediction"
Reading: Maguitman, A. G., Rechtsteiner, A., Verspoor, K., Strauss, C.E., Rocha, L.M. Large-Scale Testing Of Bibliome Informatics Using Pfam Protein Families. In: Pacific Symposium on Bioinformatics 2006. In Press.
Papers by Uri Alon & Albert László Barabási.
Lada A. Adamic, Dennis Wilkinson, Bernardo A. Huberman, Eytan Adar. A Literature Based Method for Identifying Gene-Disease Connections. IEEE Computer Society Bioinformatics Conference (CSB'02), 2002, pp. 109-118.
Class 15: (12-06-2005)
Lecture: Final Project Presentations
Excellent Review articles:
Mark E. J. Newman (2001a). Scientific collaboration networks. I. Network construction and fundamental results. Physical Review E, 64, 016131.
Mark E. J. Newman (2001b). Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64, 016132.
Harary, F. (1969) Graph Theory. Addison-Wesley.
Knoke and Kuklinski (1982) Network Analysis: Sage Publications.
David J. Hand, Heikki Mannila, Padhraic Smyth (2001) Principles of Data Mining: MIT Press. (on reserve)
Jiawei Han, Micheline Kamber. Data Mining (2000) Concepts and Techniques: Morgan Kaufmann Publishers.
Ian H. Witten & Eibe Frank. Data Mining (1999) Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers.
Social Network Analysis
Linton C. Freeman (2004) The development of social network analysis: a study in the sociology of science. Empirical Press.
A. Degenne and M. Forse (1999). Introducing Social Networks: London, Sage Publications.
Stanley Wasserman, Katherine Faust, Dawn Iacobucci (1994). Social Network Analysis: Methods and Applications: Cambridge Univ Press.
Wasserman and Faust (1994). Social Network Analysis: Cambridge University Press.
Wasserman, S. & Galaskiewicz, J. (1994). Advances in Social Network Analysis: Sage.
Everett M. Rogers (1995). The Diffusion of Innovations: Free Press.
Braam, R. B. (1991). Mapping of Science: Foci of Intellectual Interest in Scientific Literature: DSWO Press, University of Leiden.
Craine, D. (1972). Invisible Colleges: Diffusion of Knowledge in Scientific Communities: The University of Chicago Press.
Donohue, J. C. (1973). Understanding Scientific Literature: A Bibliometric Approach: MIT Press.
Garfield, E. (1979). Citation Indexing - Its Theory and Application in Science, Technology and Humanities: ISI Press, Philadelphia.
Irvine, J. & Martin, B. (1989). Research Foresight: Pinter Publisher.
Valente, T.W. (1995). Network Models of the Diffusion of Innovations: Hampton Press.
Egghe, L. & Rousseau, R. (1990). Introduction to Informetrics. Elsevier.
Buchanan, M. (2002). Nexus: Small Worlds and the Groundbreaking Science of Networks: Norton & Company.
Johnson, S. (2001). Emergence: The Connected Lives of Ants, Brains, Cities, and Software: Scribner.
Kennedy, J., Russell C. Eberhart, R. C., Shi, Y. (2001) Butterfly Economics: A New General Theory of Social and Economic Behavior: Basic Books.
Duncan J. & Watts, D. J. (1999). Small Worlds: Princeton Univ Press.
Albert-László Barabási (2002). Linked: The New Science of Networks: Perseus Publishing.
Valente, Tom W. (1995). Network Models of the Diffusion of Innovations. Hampton Press.
Everett M. Rogers (1995). The Diffusion of Innovations. Free Press.
International Network for Social Network Analysis
SocioSite Networks, Groups and Social Interaction
University Computing Services Help Online
Searching is a trick you can learn.
Related Courses at IU:
L597 Statistics for Information Science and Usability by John C. Paolillo, IUB.
Data Mining and Analysis by F. Chris Allbright, Kelley School of Business, IUB.
P747 Advanced Model Fitting and Model Comparison: Computer techniques and theoretical issues by John K. Kruschke, Psychology, IUB.
P747/Q700: Bayesian and Data Mining by Richard M. Shiffrin, PSY, IUB.
Complex Adaptive Systems by Robert Goldstone, PSY, IUB.
I400/I590 Biologically Inspired Computing by Luis M. Rocha, SoI, IUB (Fall 2005).
Network Analysis by Stanley Wasserman for Sociology, IUB (Spring 2005).
Multivariate Methods by Stanley Wasserman for Psychology, IUB (Spring 2005).
I400/590 Topics in Informatics - Games and Gossip (3 cr) by Marco Janssen, Informatics, IUB (Spring 2005).
I400/I590 Artificial Life as an approach to Artificial Intelligence (3 cr) by Larry Yaeger, Informatics, IUB (Spring 2005).
B669: Managing & Mining Scientific Data, Topics in Database and Information Systems by Ed Robertson and Dirk Van Gucht, CS, IUB.
CSCI B659 Topics in Artificial Intelligence: Web Mining by Filippo Menczer, Informatics, IUB
P 700 Fractals and Pattern Formation (1 cr) by Y. Hayakawa, Physics, IUB.
INFO I512 Scientific Data Management and Analysis I590 High Performance Data Management and Processing in Informatics, IUB by Zdzisaw Meglicki, UITS.
T610 Networked Society by Harmeet Sawhney, Telecommunications, IUB.
T603 Communication Networks by J. Alison Bryant, Telecommunications, IUB. (Starts in Fall 23005)
CSCI590: Data Mining by Snehasis Mukhopadhyay, CS, IUPUI.
Related Courses off Campus:
Networks and Complexity in Social Systems by D. J. Watts, Columbia University.
Models of Social Dynamics by David Gibson, Harvard University.
Introduction to Social Network Methods by Robert A. Hanneman, University of California.
Social Network Analysis by Robert A. Hanneman, Department of Sociology, University of California.
Scaling, Power Laws, and the Small World Phenomena in Networks by Don Towsley, University of Massachusetts
The Structure of Information Networks by Jon Kleinberg, Cornell University
Biological Applications of Physics by Paul Ginsparg, Cornell University
Complex Systems by Mark Newman, University of Michigan
Seminar on Social Networks by James Moody, The Ohio State University
Networks Reading Group
Study Group on Complex Networks
International Conference on Computability and Complexity in Analysis to be held in Cincinnati, August 28-30, 2003.
Home Faculty Katy Börner L597
Last modified: 09/16/2005