Week 4 (Text Mining)


Author: woodwood2000 | Published 2018-01-02 14:27

Guiding Questions
Develop your answers to the following guiding questions while watching the video lectures throughout the week.

  1. What is clustering? What are some applications of clustering in text mining and analysis?
  2. How can we use a mixture model to do document clustering? How many parameters are there in such a model?
  3. How is the mixture model for document clustering related to a topic model such as PLSA? In what way are they similar? Where are they different?
  4. How do we determine the cluster for each document after estimating all the parameters of a mixture model?
  5. How does hierarchical agglomerative clustering work? How do single-link, complete-link, and average-link work for computing group similarity? Which of these three ways of computing group similarity is least sensitive to outliers in the data?
  6. How do we evaluate clustering results?
  7. What is text categorization? What are some applications of text categorization?
  8. What does the training data for categorization look like?
  9. How does the Naïve Bayes classifier work?
  10. Why do we often use logarithm in the scoring function for Naïve Bayes?

4.1 Text Clustering: Motivation


4.2 Text Clustering: Generative Probabilistic Models Part 1


A document can be clustered this way only if it covers a single topic.

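  1. For each word in a document: the cluster model chooses the generating distribution only once per document, whereas the topic model makes that choice again for every word.
  2. Cluster model: one word distribution generates every word in the document. Topic model: a single word distribution need not generate all of the document's words; some words can be generated by other topics.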
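The difference between the two generative processes can be sketched in a few lines of Python. This is only an illustration under assumed names: `priors`, `word_dists`, and both function names are hypothetical and do not come from the lectures.

```python
import random


def generate_cluster_doc(priors, word_dists, length, rng):
    """Cluster (mixture) model: pick ONE component, then draw every word from it."""
    k = rng.choices(range(len(priors)), weights=priors)[0]  # chosen once per document
    vocab = list(word_dists[k])
    weights = [word_dists[k][w] for w in vocab]
    words = rng.choices(vocab, weights=weights, k=length)
    return [k] * length, words


def generate_topic_doc(priors, word_dists, length, rng):
    """Topic model: pick a component again for EVERY word."""
    components, words = [], []
    for _ in range(length):
        k = rng.choices(range(len(priors)), weights=priors)[0]  # chosen once per word
        vocab = list(word_dists[k])
        weights = [word_dists[k][w] for w in vocab]
        words.append(rng.choices(vocab, weights=weights)[0])
        components.append(k)
    return components, words


if __name__ == "__main__":
    rng = random.Random(0)
    priors = [0.5, 0.5]  # assumed toy values
    word_dists = [
        {"text": 0.5, "mining": 0.5},
        {"food": 0.5, "health": 0.5},
    ]
    print(generate_cluster_doc(priors, word_dists, 6, rng))
    print(generate_topic_doc(priors, word_dists, 6, rng))
```

In the cluster version the returned component list always holds a single repeated value, which is exactly the "one topic per document" assumption; in the topic version the components may differ from word to word.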

L: the number of words in the document
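Since the figures are missing, the role of L can be made concrete by writing out the likelihood of a document d = x_1 ... x_L. The following is a sketch in commonly used notation; the symbols θ_1, θ_2 (the two word distributions) and x_i (the i-th word) are assumptions, not copied from the slides.

```latex
% Two-cluster mixture model: the component is chosen once,
% so the sum over components sits OUTSIDE the product over the L words.
p(d) = p(\theta_1) \prod_{i=1}^{L} p(x_i \mid \theta_1)
     + p(\theta_2) \prod_{i=1}^{L} p(x_i \mid \theta_2)

% Two-topic model, for contrast: the component is chosen per word,
% so the sum sits INSIDE the product.
p(d) = \prod_{i=1}^{L} \bigl[ p(\theta_1)\, p(x_i \mid \theta_1)
     + p(\theta_2)\, p(x_i \mid \theta_2) \bigr]
```

When L = 1 the two expressions coincide, which is one way to see that the models differ only in how often the component is chosen.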

4.3 Text Clustering: Generative Probabilistic Models Part 2


How to generalize from 2 clusters to N clusters
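With N clusters the two-term sum simply becomes a sum over N components. A minimal Python sketch follows, assuming the parameters are already estimated and every word has a nonzero (smoothed) probability in every distribution; all names here (`priors`, `word_dists`, the function names) are hypothetical. Working in log space avoids underflow when the document is long.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def component_log_scores(doc, priors, word_dists):
    """log p(theta_k) + sum_i log p(x_i | theta_k), one score per cluster."""
    return [
        math.log(prior) + sum(math.log(dist[w]) for w in doc)
        for prior, dist in zip(priors, word_dists)
    ]


def doc_log_likelihood(doc, priors, word_dists):
    """log p(d) under an N-component mixture: log of the sum over all clusters."""
    return log_sum_exp(component_log_scores(doc, priors, word_dists))


def most_likely_cluster(doc, priors, word_dists):
    """Index of the cluster with the highest posterior p(theta_k | d)."""
    scores = component_log_scores(doc, priors, word_dists)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    priors = [1 / 3, 1 / 3, 1 / 3]  # assumed toy values, N = 3
    word_dists = [
        {"a": 0.8, "b": 0.1, "c": 0.1},
        {"a": 0.1, "b": 0.8, "c": 0.1},
        {"a": 0.1, "b": 0.1, "c": 0.8},
    ]
    doc = ["b", "b", "a"]
    print(doc_log_likelihood(doc, priors, word_dists))
    print(most_likely_cluster(doc, priors, word_dists))
```

Because the posterior is proportional to the per-cluster score, picking the largest score is enough to assign a document to a cluster; the normalizing sum is only needed for the likelihood itself.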


