Community Detection & Mining in Social Media
Abstract
The past decade has witnessed the emergence of the participatory Web and social media, bringing people together in many creative ways. Millions of users are playing, tagging, working, and socializing online, demonstrating new forms of collaboration, communication, and intelligence that were hardly imaginable just a short time ago. Social media also helps reshape business models, sway opinions and emotions, and open up numerous possibilities to study human interaction and collective behavior at an unparalleled scale. This lecture, from a data mining perspective, introduces characteristics of social media, reviews representative tasks of computing with social media, and illustrates associated challenges. It introduces basic concepts, presents state-of-the-art algorithms with easy-to-understand examples, and recommends effective evaluation methods. In particular, we discuss graph-based community detection techniques and many important extensions that handle dynamic, heterogeneous networks in social media. We also demonstrate how discovered patterns of communities can be used for social media mining. The concepts, algorithms, and methods presented in this lecture can help harness the power of social media and support building socially intelligent systems. This book is an accessible introduction to the study of community detection and mining in social media. It is essential reading for students, researchers, and practitioners in disciplines and applications where social media is a key source of data that piques our curiosity to understand, manage, innovate, and excel.
Table of Contents
1. Social Media and Social Computing
1.1 Social Media
1.2 Concepts and Definitions
1.2.1 Networks and Representations
1.2.2 Properties of Large-Scale Networks
1.3 Challenges
1.4 Social Computing Tasks
1.4.1 Network Modeling
1.4.2 Centrality Analysis and Influence Modeling
1.4.3 Community Detection
1.4.4 Classification and Recommendation
1.4.5 Privacy, Spam and Security
1.5 Summary
2. Nodes, Ties, and Influence
2.1 Importance of Nodes
2.2 Strengths of Ties
2.2.1 Learning from Network Topology
2.2.2 Learning from Attributes and Interactions
2.2.3 Learning from Sequence of User Activities
2.3 Influence Modeling
2.3.1 Linear Threshold Model (LTM)
2.3.2 Independent Cascade Model (ICM)
2.3.3 Influence Maximization
2.3.4 Distinguish Influence and Correlation
3. Community Detection and Evaluation
3.1 Node-Centric Community Detection
3.1.1 Complete Mutuality
3.1.2 Reachability
3.2 Group-Centric Community Detection
3.3 Network-Centric Community Detection
3.3.1 Vertex Similarity
3.3.2 Latent Space Models
3.3.3 Block Model Approximation
3.3.4 Spectral Clustering
3.3.5 Modularity Maximization
3.3.6 A Unified Process
3.4 Hierarchy-Centric Community Detection
3.4.1 Divisive Hierarchical Clustering
3.4.2 Agglomerative Hierarchical Clustering
3.5 Community Evaluation
4. Communities in Heterogeneous Networks
4.1 Heterogeneous Networks
4.2 Multi-Dimensional Networks
4.2.1 Network Integration
4.2.2 Utility Integration
4.2.3 Feature Integration
4.2.4 Partition Integration
4.3 Multi-Mode Networks
4.3.1 Co-Clustering on Two-Mode Networks
4.3.2 Generalization to Multi-Mode Networks
5. Social Media Mining
5.1 Evolution Patterns in Social Media
5.1.1 A Naïve Approach to Studying Community Evolution
5.1.2 Community Evolution in Smoothly Evolving Networks
5.1.3 Segment-based Clustering with Evolving Networks
5.2 Classification with Network Data
5.2.1 Collective Classification
5.2.2 Community-based Learning
5.2.3 Summary
Appendix
Getting the Book
Digital copies can be downloaded from Morgan & Claypool.
Please feel free to contact the authors if you find anything good, bad, or ugly in the book.
Lecture Materials
Data Sets
Several toy data sets used in the book are provided here. The data sets are in txt format and can be loaded into commonly used software for network analysis.
the toy network (fig 1.1)
the multi-dimensional network (fig 4.4) or the network snapshots (fig 5.4)
the 2-mode network (fig 4.6)
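As a rough illustration of loading such a file, here is a minimal Python sketch. It assumes each line holds one edge as a pair of node ids separated by whitespace or a comma; the actual layout of the txt files is not described on this page, so check a file before relying on this. The function name `load_edge_list` and the sample edges are hypothetical and are not taken from the book's figures.

```python
import re
from collections import defaultdict


def load_edge_list(lines):
    """Build an undirected adjacency map from lines of 'u v' node pairs.

    Assumption: one edge per line, node ids separated by whitespace or a comma.
    Blank lines and lines starting with '#' are skipped.
    """
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = [p for p in re.split(r"[,\s]+", line) if p]
        if len(parts) < 2:
            continue  # not an edge under the assumed format
        u, v = parts[0], parts[1]
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical sample edges, used only to exercise the parser.
sample = ["1 2", "1 3", "2 3", "3 4"]
graph = load_edge_list(sample)

# Node degree = number of distinct neighbors.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(degrees)
```

To read from disk instead, pass an open file object to the same function, for example `load_edge_list(open(path))`, where `path` is whatever name the downloaded file was saved under.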
Related Tutorial/Lecture Slides
Below are related tutorial/lecture presentations based on the book materials:
An invited talk @UC Berkeley Social Computing Class: Community Detection for Social Computing, 2011
Some book materials were used for teaching at HKUST and Georgia Tech.
An invited talk with brief coverage of implementations in Hadoop, presented at SVForum Software Architecture and Platform SIG: Large Scale Community Detection for Social Computing, with Implementations in Hadoop, 2011.
Parts of the book's material were presented in the tutorial on Community Detection and Behavior Study for Social Computing, at the 1st IEEE International Conference on Social Computing (SocialCom’09), 2009.
Errata
P25, \delta(v) ==> \delta({v})
P34, in the equation below k-clique, d(v_i, v_j) ==> g(v_i, v_j)
P47, in "since all shortest paths from node 2 to any node in {4, 5, 6, 7, 8, 9} has either to pass e(1, 2) or e(1, 3)", e(1,3) should be e(2,3)
P65, in the last equation, the right-hand side should be \frac{1}{p} YY^T
P67, in "clearly, the second column of H2 encodes...", H2 ==> \bar{H}
Further Reading and Related Books
This file lists the references mentioned in the book.
David Kempe's lecture notes on "Structure and Dynamics of Information in Networks", focusing more on theory and approximation.
D. Easley, J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World, focusing more on network structure and game theory.