Guiding Questions
Develop your answers to the following guiding questions while watching the video lectures throughout the week.
- What is clustering? What are some applications of clustering in text mining and analysis?
- How can we use a mixture model to do document clustering? How many parameters are there in such a model?
- How is the mixture model for document clustering related to a topic model such as PLSA? In what way are they similar? Where are they different?
- How do we determine the cluster for each document after estimating all the parameters of a mixture model?
- How does hierarchical agglomerative clustering work? How do single-link, complete-link, and average-link work for computing group similarity? Which of these three ways of computing group similarity is least sensitive to outliers in the data?
- How do we evaluate clustering results?
- What is text categorization? What are some applications of text categorization?
- What does the training data for categorization look like?
- How does the Naïve Bayes classifier work?
- Why do we often use the logarithm in the scoring function for Naïve Bayes?
4.1 Text Clustering: Motivation
4.2 Text Clustering: Generative Probabilistic Models Part 1
Each document is assumed to cover exactly one topic; only under this assumption can we cluster documents.
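The one-topic-per-document assumption can be illustrated as a generative process: pick a cluster once, then draw every word of the document from that cluster's word distribution. The sketch below is a minimal illustration under assumed toy parameters; the function name `generate_document`, the cluster labels, and all probabilities are hypothetical.

```python
import random

def generate_document(cluster_probs, word_dists, length, rng):
    """Sample one document from a mixture-of-unigrams clustering model.

    The cluster is drawn ONCE per document; all `length` words are then
    drawn from that single cluster's word distribution.
    """
    clusters = list(cluster_probs)
    # Choose the cluster (topic) a single time for the whole document.
    cluster = rng.choices(clusters, weights=[cluster_probs[c] for c in clusters])[0]
    dist = word_dists[cluster]
    words = list(dist)
    # Every word comes from the same chosen distribution.
    doc = rng.choices(words, weights=[dist[w] for w in words], k=length)
    return cluster, doc

# Hypothetical toy parameters: two clusters with disjoint vocabularies.
cluster_probs = {"sports": 0.5, "travel": 0.5}
word_dists = {
    "sports": {"game": 0.5, "team": 0.3, "score": 0.2},
    "travel": {"hotel": 0.4, "flight": 0.4, "beach": 0.2},
}

rng = random.Random(0)
cluster, doc = generate_document(cluster_probs, word_dists, length=6, rng=rng)
print(cluster, doc)
```

Because the vocabularies are disjoint here, every sampled word belongs to the single cluster chosen for the document.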
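- How often a distribution is chosen: the cluster model chooses a word distribution only once per document, whereas the topic model chooses one for every word in the document.
- Cluster model: the chosen word distribution generates every word in the document. Topic model: a single word distribution does not necessarily generate all the words in the document; some words can be generated by other topics.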
L: the number of words in the document
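The difference above shows up directly in the document likelihood. With two clusters, the cluster model gives p(d) = p(θ1) ∏_{j=1}^{L} p(x_j|θ1) + p(θ2) ∏_{j=1}^{L} p(x_j|θ2), while a topic model mixes inside the product: p(d) = ∏_{j=1}^{L} [π1 p(x_j|θ1) + π2 p(x_j|θ2)]. The sketch below computes both in log space; as a simplifying assumption it reuses one fixed set of mixing weights for both models (real PLSA uses document-specific coverage), and all words and probabilities are hypothetical.

```python
import math

def mixture_log_likelihood(doc, priors, word_dists):
    """log p(d) when ONE cluster generates all L words:
    p(d) = sum_i p(theta_i) * prod_{j=1..L} p(x_j | theta_i)."""
    # Per-cluster log terms, combined with log-sum-exp to avoid underflow.
    log_terms = [
        math.log(prior) + sum(math.log(word_dists[c][w]) for w in doc)
        for c, prior in priors.items()
    ]
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def topic_model_log_likelihood(doc, coverage, word_dists):
    """log p(d) when EACH word picks its own topic (PLSA-style):
    p(d) = prod_{j=1..L} sum_i pi_i * p(x_j | theta_i)."""
    return sum(
        math.log(sum(pi * word_dists[c][w] for c, pi in coverage.items()))
        for w in doc
    )

# Hypothetical toy parameters shared by both models.
weights = {"theta1": 0.5, "theta2": 0.5}
word_dists = {
    "theta1": {"text": 0.5, "mining": 0.4, "food": 0.1},
    "theta2": {"text": 0.1, "mining": 0.1, "food": 0.8},
}
doc = ["text", "mining", "food"]  # L = 3

mix = mixture_log_likelihood(doc, weights, word_dists)
tm = topic_model_log_likelihood(doc, weights, word_dists)
print(round(math.exp(mix), 6))  # 0.014   = 0.5*0.02 + 0.5*0.008
print(round(math.exp(tm), 6))   # 0.03375 = 0.3 * 0.25 * 0.45
```

The mixed-topic document scores higher under the topic model, because each word may come from whichever distribution explains it best, while the cluster model must explain "text" and "food" with the same distribution.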
4.3 Text Clustering: Generative Probabilistic Models Part 2
How to extend the model from 2 clusters to N clusters
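With N clusters the likelihood becomes p(d) = Σ_{i=1}^{N} p(θ_i) ∏_{w∈V} p(w|θ_i)^{c(w,d)}, where c(w,d) is the count of word w in d. A common way to fit it is EM, after which each document can be assigned to the cluster with the highest posterior p(θ_i|d). The sketch below is one possible implementation under stated assumptions: random initialization, a small additive `smoothing` term on counts and priors to avoid log(0), a fixed iteration count, and hypothetical names (`cluster_documents`, `e_step`) that are not from the lecture.

```python
import math
import random
from collections import Counter

def e_step(counts, priors, word_dists):
    """Posterior p(Z_d = i | d) for every document, computed in log space."""
    posteriors = []
    for c in counts:
        logs = [
            math.log(priors[i])
            + sum(n * math.log(word_dists[i][w]) for w, n in c.items())
            for i in range(len(priors))
        ]
        m = max(logs)  # log-sum-exp: normalize without underflow
        exps = [math.exp(x - m) for x in logs]
        total = sum(exps)
        posteriors.append([e / total for e in exps])
    return posteriors

def cluster_documents(docs, num_clusters, iterations=50, seed=0, smoothing=1e-3):
    """EM for a mixture of unigram language models (N = num_clusters)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    # Uniform priors p(theta_i); random word distributions p(w | theta_i).
    priors = [1.0 / num_clusters] * num_clusters
    word_dists = []
    for _ in range(num_clusters):
        raw = {w: rng.random() + 0.1 for w in vocab}
        norm = sum(raw.values())
        word_dists.append({w: v / norm for w, v in raw.items()})

    for _ in range(iterations):
        posteriors = e_step(counts, priors, word_dists)
        # M-step: re-estimate priors and word distributions from expected counts.
        for i in range(num_clusters):
            weight = sum(p[i] for p in posteriors)
            priors[i] = (weight + smoothing) / (len(docs) + num_clusters * smoothing)
            expected = {w: smoothing for w in vocab}
            for c, p in zip(counts, posteriors):
                for w, n in c.items():
                    expected[w] += n * p[i]
            norm = sum(expected.values())
            word_dists[i] = {w: v / norm for w, v in expected.items()}

    # Assign each document to the cluster with the highest posterior.
    posteriors = e_step(counts, priors, word_dists)
    assignments = [max(range(num_clusters), key=lambda i: p[i]) for p in posteriors]
    return priors, word_dists, assignments

docs = [
    "text mining text clustering".split(),
    "clustering text mining text".split(),
    "soccer game team score".split(),
    "team score soccer game".split(),
]
priors, word_dists, assignments = cluster_documents(docs, num_clusters=2)
print(assignments)
```

On this toy corpus the first two documents are expected to share one cluster and the last two the other; which label (0 or 1) each group gets depends on the random initialization.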