Community Detection and Mining in Social Media

Morgan & Claypool Publishers, 2010

by

Lei Tang, Yahoo! Labs

Huan Liu, Arizona State University

Abstract

The past decade has witnessed the emergence of the participatory Web and social media, bringing people together in many creative ways. Millions of users are playing, tagging, working, and socializing online, demonstrating new forms of collaboration, communication, and intelligence that were hardly imaginable just a short time ago. Social media also helps reshape business models, sway opinions and emotions, and open up numerous possibilities to study human interaction and collective behavior at an unparalleled scale. This lecture, from a data mining perspective, introduces characteristics of social media, reviews representative tasks of computing with social media, and illustrates associated challenges. It introduces basic concepts, presents state-of-the-art algorithms with easy-to-understand examples, and recommends effective evaluation methods. In particular, we discuss graph-based community detection techniques and many important extensions that handle dynamic, heterogeneous networks in social media. We also demonstrate how discovered patterns of communities can be used for social media mining. The concepts, algorithms, and methods presented in this lecture can help harness the power of social media and support building socially-intelligent systems. This book is an accessible introduction to the study of community detection and mining in social media. It is essential reading for students, researchers, and practitioners in disciplines and applications where social media is a key source of data that piques our curiosity to understand, manage, innovate, and excel.


Getting the Book

Please feel free to contact the authors if you find anything good, bad, or ugly in the book.

Lecture Materials

Lecture materials are provided here, including lecture slides in both pptx and pdf format, all figures (except algorithms), and some toy data sets in both matlab and csv format.
Chapter                                    Lecture slides   Figures
1. Social Media & Social Computing         pptx pdf         fig-ch1
2. Nodes, Ties & Influence                 pptx pdf         fig-ch2
3. Community Detection & Evaluation        pptx pdf         fig-ch3
4. Communities in Heterogeneous Networks   pptx pdf         fig-ch4
5. Social Media Mining                     pptx pdf         fig-ch5
Appendix                                                    fig-appendix
All Chapters                               pptx pdf         fig-cdm

Data Sets

Several toy data sets used in the book are provided here. The data sets are specified in txt format, which can be loaded into commonly used software for network analysis.
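As a rough illustration of loading such a file without any dedicated network software, the sketch below reads an edge list into an adjacency map. The file layout is an assumption: this page does not describe the txt format, so the code assumes one edge per line, with two node labels separated by whitespace or a comma. The function name `load_edge_list` and the sample data are hypothetical.

```python
from collections import defaultdict
import io


def load_edge_list(lines):
    """Build an undirected adjacency map from lines like 'u v' or 'u,v'.

    Assumed layout (not confirmed by the book's data sets): one edge per
    line, two node labels per edge, '#' starting a comment line.
    """
    adjacency = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.replace(",", " ").split()
        u, v = parts[0], parts[1]
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical sample standing in for a downloaded data file.
sample = io.StringIO("# toy network\n1 2\n1 3\n2 3\n3,4\n")
graph = load_edge_list(sample)

# Node degree is the number of distinct neighbors.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(degrees)
```

To read a real file, pass an open file object instead of the `io.StringIO` sample, for example `load_edge_list(open(path))`, where `path` points to wherever the data set was saved.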

Related Tutorial/Lecture Slides

Below are related tutorial/lecture presentations based on the book materials:

Errata

  • P25, \delta(v) ==> \delta({v})
  • P34, in the equation below k-clique, d(v_i, v_j) ==> g(v_i, v_j)
  • P47, in "since all shortest paths from node 2 to any node in {4, 5, 6, 7, 8, 9} has either to pass e(1, 2) or e(1, 3)", e(1, 3) ==> e(2, 3)
  • P65, the last equation, the right-hand side should be \frac{1}{p} YY^T
  • P67, in "clearly, the second column of H2 encodes...", H2 ==> \bar{H}
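The P47 correction rests on a simple observation: any shortest path leaving node 2 must begin with an edge incident to node 2. The sketch below checks this on a small hypothetical graph (not the network from the book) in which node 2 is adjacent only to nodes 1 and 3. The edge list and function names are assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy graph, NOT the book's example: node 2 touches only 1 and 3.
EDGES = [(1, 2), (2, 3), (1, 3), (1, 4), (3, 4), (4, 5), (4, 6), (5, 6)]


def build_adjacency(edges):
    """Turn a list of undirected edges into a node -> neighbors map."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def all_shortest_paths(adjacency, source, target):
    """Enumerate every shortest path from source to target via BFS predecessors."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adjacency[node]):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                preds[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                preds[nbr].append(node)  # another equally short route
    if target not in dist:
        return []

    def expand(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in expand(p)]

    return expand(target)


adjacency = build_adjacency(EDGES)

# Collect the first edge of every shortest path from node 2 to nodes 4, 5, 6.
first_edges = set()
for target in (4, 5, 6):
    for path in all_shortest_paths(adjacency, 2, target):
        first_edges.add(tuple(sorted(path[:2])))
print(first_edges)
```

In this toy graph the only first edges that appear are e(1, 2) and e(2, 3), which mirrors the corrected statement; e(1, 3) cannot be the first edge because it does not touch node 2.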

Further Reading and Related Books


Table of Contents

1. Social Media and Social Computing
      1.1 Social Media
      1.2 Concepts and Definitions
            1.2.1 Networks and Representations
            1.2.2 Properties of Large-Scale Networks
      1.3 Challenges
      1.4 Social Computing Tasks
            1.4.1 Network Modeling
            1.4.2 Centrality Analysis and Influence Modeling
            1.4.3 Community Detection
            1.4.4 Classification and Recommendation
            1.4.5 Privacy, Spam and Security
      1.5 Summary
2. Nodes, Ties, and Influence
      2.1 Importance of Nodes
      2.2 Strengths of Ties
            2.2.1 Learning from Network Topology
            2.2.2 Learning from Attributes and Interactions
            2.2.3 Learning from Sequence of User Activities
      2.3 Influence Modeling
            2.3.1 Linear Threshold Model (LTM)
            2.3.2 Independent Cascade Model (ICM)
            2.3.3 Influence Maximization
            2.3.4 Distinguish Influence and Correlation
3. Community Detection and Evaluation
      3.1 Node-Centric Community Detection
            3.1.1 Complete Mutuality
            3.1.2 Reachability
      3.2 Group-Centric Community Detection
      3.3 Network-Centric Community Detection
            3.3.1 Vertex Similarity
            3.3.2 Latent Space Models
            3.3.3 Block Model Approximation
            3.3.4 Spectral Clustering
            3.3.5 Modularity Maximization
            3.3.6 A Unified Process
      3.4 Hierarchy-Centric Community Detection
            3.4.1 Divisive Hierarchical Clustering
            3.4.2 Agglomerative Hierarchical Clustering
      3.5 Community Evaluation
4. Communities in Heterogeneous Networks
      4.1 Heterogeneous Networks
      4.2 Multi-Dimensional Networks
            4.2.1 Network Integration
            4.2.2 Utility Integration
            4.2.3 Feature Integration
            4.2.4 Partition Integration
      4.3 Multi-Mode Networks
            4.3.1 Co-Clustering on Two-Mode Networks
            4.3.2 Generalization to Multi-Mode Networks
5. Social Media Mining
      5.1 Evolution Patterns in Social Media
            5.1.1 A Naïve Approach to Studying Community Evolution
            5.1.2 Community Evolution in Smoothly Evolving Networks
            5.1.3 Segment-based Clustering with Evolving Networks
      5.2 Classification with Network Data
            5.2.1 Collective Classification
            5.2.2 Community-based Learning
            5.2.3 Summary
Appendix