Community Detection & Mining in Social Media

Morgan & Claypool Publishers, 2010

by

Lei Tang, Director of Data Science at Lyft. [Website]
Huan Liu, Arizona State University [Website]

Visit the book's GitHub repo for more information.

Abstract

The past decade has witnessed the emergence of the participatory Web and social media, bringing people together in many creative ways. Millions of users are playing, tagging, working, and socializing online, demonstrating new forms of collaboration, communication, and intelligence that were hardly imaginable just a short time ago. Social media also helps reshape business models, sway opinions and emotions, and open up numerous possibilities to study human interaction and collective behavior on an unparalleled scale. This lecture, from a data mining perspective, introduces characteristics of social media, reviews representative tasks of computing with social media, and illustrates associated challenges. It introduces basic concepts, presents state-of-the-art algorithms with easy-to-understand examples, and recommends effective evaluation methods. In particular, we discuss graph-based community detection techniques and many important extensions that handle dynamic, heterogeneous networks in social media. We also demonstrate how discovered patterns of communities can be used for social media mining. The concepts, algorithms, and methods presented in this lecture can help harness the power of social media and support building socially-intelligent systems. This book is an accessible introduction to the study of community detection and mining in social media. It is essential reading for students, researchers, and practitioners in disciplines and applications where social media is a key source of data that piques our curiosity to understand, manage, innovate, and excel.

Table of Contents

1. Social Media and Social Computing

1.1 Social Media

1.2 Concepts and Definitions

1.2.1 Networks and Representations

1.2.2 Properties of Large-Scale Networks

1.3 Challenges

1.4 Social Computing Tasks

1.4.1 Network Modeling

1.4.2 Centrality Analysis and Influence Modeling

1.4.3 Community Detection

1.4.4 Classification and Recommendation

1.4.5 Privacy, Spam and Security

1.5 Summary

2. Nodes, Ties, and Influence

2.1 Importance of Nodes

2.2 Strengths of Ties

2.2.1 Learning from Network Topology

2.2.2 Learning from Attributes and Interactions

2.2.3 Learning from Sequence of User Activities

2.3 Influence Modeling

2.3.1 Linear Threshold Model (LTM)

2.3.2 Independent Cascade Model (ICM)

2.3.3 Influence Maximization

2.3.4 Distinguish Influence and Correlation

3. Community Detection and Evaluation

3.1 Node-Centric Community Detection

3.1.1 Complete Mutuality

3.1.2 Reachability

3.2 Group-Centric Community Detection

3.3 Network-Centric Community Detection

3.3.1 Vertex Similarity

3.3.2 Latent Space Models

3.3.3 Block Model Approximation

3.3.4 Spectral Clustering

3.3.5 Modularity Maximization

3.3.6 A Unified Process

3.4 Hierarchy-Centric Community Detection

3.4.1 Divisive Hierarchical Clustering

3.4.2 Agglomerative Hierarchical Clustering

3.5 Community Evaluation

4. Communities in Heterogeneous Networks

4.1 Heterogeneous Networks

4.2 Multi-Dimensional Networks

4.2.1 Network Integration

4.2.2 Utility Integration

4.2.3 Feature Integration

4.2.4 Partition Integration

4.3 Multi-Mode Networks

4.3.1 Co-Clustering on Two-Mode Networks

4.3.2 Generalization to Multi-Mode Networks

5. Social Media Mining

5.1 Evolution Patterns in Social Media

5.1.1 A Naïve Approach to Studying Community Evolution

5.1.2 Community Evolution in Smoothly Evolving Networks

5.1.3 Segment-based Clustering with Evolving Networks

5.2 Classification with Network Data

5.2.1 Collective Classification

5.2.2 Community-based Learning

5.2.3 Summary

Appendix

Getting the Book

Digital copies can be downloaded from Morgan & Claypool.

Print copies can be ordered through Amazon.
Chinese edition: Amazon.cn, china-pub, dangdang

Please feel free to contact the authors if you find anything good, bad, or ugly in the book.

Lecture Materials

If the links do not work, please visit the book's GitHub repo and let us know.

Data Sets

Several toy data sets used in the book are provided here. The data sets are in txt format and can be loaded into commonly used software for network analysis.

the toy network (fig 1.1)
the multi-dimensional network (fig 4.4) or the network snapshots (fig 5.4)
the 2-mode network (fig 4.6)
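
As a rough illustration of loading such a file, the sketch below reads a plain-text edge list into an adjacency map using only the Python standard library. The exact layout of the book's txt files is not described on this page, so the format here (one edge per line, two node ids separated by whitespace or a comma) is an assumption, and the function name read_edge_list and the sample data are hypothetical.

```python
from collections import defaultdict
from io import StringIO


def read_edge_list(lines):
    """Build an undirected adjacency map from lines of the form 'u v'.

    Assumption: one edge per line, node ids separated by whitespace
    or a comma. Blank lines and lines starting with '#' are skipped.
    """
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            continue  # not an edge; ignore malformed lines
        u, v = parts[0], parts[1]
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical stand-in for a downloaded txt file (not the book's data).
sample = StringIO("1 2\n1 3\n2 3\n3 4\n")
graph = read_edge_list(sample)

# Node degree = number of distinct neighbors.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
```

For a real file, an open file handle can be passed in place of the StringIO object, for example read_edge_list(open(path)) with path pointing at the downloaded txt file. If the files turn out to use a different layout, such as an adjacency matrix, the parsing step would need to change accordingly.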

Errata

P25, \delta(v) ==> \delta({v})
P34, in the equation below k-clique, d(v_i, v_j) ==> g(v_i, v_j)
P47, in "since all shortest paths from node 2 to any node in {4, 5, 6, 7, 8, 9} has either to pass e(1, 2) or e(1, 3)", e(1,3) should be e(2,3)
P65, in the last equation, the right-hand side should be \frac{1}{p} YY^T
P67, in "clearly, the second column of H2 encodes...", H2 ==> \bar{H}