NLP Review (Lecture 9-15)

Author: zcyzcy | Published 2018-11-02 09:47

    Lecture 9 Language Model

    Language models fall into two categories: probabilistic language models and structural language models.

    1. N-Gram Models
    • Estimate probability of each word given prior context.
    • Number of parameters required grows exponentially with the number of words of prior context
    • An N-gram model uses only N-1 words of prior context
      — unigram: P(phone)
      — Bigram: P(phone | cell)
      — Trigram: P(phone | your cell)
    2. Smoothing/Back-off
    3. Linear Interpolation
    • Linearly combine estimates of N-gram models of increasing order (see the sketch below).
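
    The following is a minimal sketch of bigram estimation with linear interpolation; the toy corpus, the interpolation weight lam, and the helper names are illustrative assumptions, not taken from the lecture.

```python
from collections import Counter

# Toy corpus; purely illustrative.
corpus = "call my cell phone please call my phone".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def p_unigram(w):
    # Maximum-likelihood unigram estimate P(w).
    return unigrams[w] / total

def p_bigram(w, prev):
    # Maximum-likelihood bigram estimate P(w | prev).
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def p_interpolated(w, prev, lam=0.7):
    # Linear interpolation of bigram and unigram estimates:
    #   P(w | prev) = lam * P_bigram(w | prev) + (1 - lam) * P_unigram(w)
    # lam is a hypothetical weight; in practice it is tuned on held-out data.
    return lam * p_bigram(w, prev) + (1 - lam) * p_unigram(w)

print(p_interpolated("phone", "cell"))   # interpolated P(phone | cell)
```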


    Lecture 11 Part of Speech Tagging

    1. Hidden Markov Model
    • Sometimes it is not possible to know precisely which states the model passes through
    • We may observe some phenomenon that occurs in each state according to a probability distribution.
    • The state transition process is hidden.
    • The observation process is a stochastic function of the hidden state transition process.
    • HMM Example
      — Two observations: 'Rain' and 'Dry'
      — Two hidden states: 'Low' and 'High' (low pressure and high pressure)
      — Transition probabilities (between hidden states)



      — Observation probabilities



      — Initial probabilities: P('Low')=0.4 P('High')=0.6
    2. Three Basic HMM Problems
    • Evaluation: Given the observation sequence O and an HMM model (A, B, π), how do we compute the probability of O given the model? (Forward-Backward algorithm)


    • Decoding: Given the observation sequence and an HMM model, how do we find the state sequence that best explains the observations, i.e. the most likely hidden state sequence? (Viterbi algorithm; see the sketch after this list)
      — Find the globally optimal result by finding the optimal choice at each stage (dynamic programming).
      — If the best path ending in q_k = s_j goes through q_{k-1} = s_i, then it must coincide with the best path ending in q_{k-1} = s_i.
    • Learning: How do we adjust the model parameters to maximize P(O | model)? (Baum-Welch / EM algorithm)
    • The N-best algorithm is similar to Viterbi decoding, but keeps the N best paths instead of only the single best one.
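
    A minimal sketch of the Evaluation (forward algorithm) and Decoding (Viterbi) problems on the 'Low'/'High' example above. The initial probabilities are the ones given in the notes; the transition and observation probabilities below are placeholder assumptions chosen only for illustration.

```python
# Hidden states and observations from the 'Low'/'High' pressure example.
states = ["Low", "High"]

# Initial probabilities as given above; transition and observation
# probabilities are PLACEHOLDER values, assumed for illustration.
init = {"Low": 0.4, "High": 0.6}
trans = {"Low": {"Low": 0.3, "High": 0.7},     # P(next state | current state), assumed
         "High": {"Low": 0.2, "High": 0.8}}
emit = {"Low": {"Rain": 0.6, "Dry": 0.4},      # P(observation | state), assumed
        "High": {"Rain": 0.4, "Dry": 0.6}}

def forward(obs):
    # Evaluation problem: compute P(obs | model) with the forward algorithm.
    alpha = {s: init[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states}
    return sum(alpha.values())

def viterbi(obs):
    # Decoding problem: most likely hidden state sequence (Viterbi algorithm).
    delta = {s: init[s] * emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: delta[p] * trans[p][s])
            ptr[s] = best_prev
            new_delta[s] = delta[best_prev] * trans[best_prev][s] * emit[s][o]
        backpointers.append(ptr)
        delta = new_delta
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path)), delta[last]

observations = ["Dry", "Rain", "Rain"]
print(forward(observations))    # probability of the observation sequence
print(viterbi(observations))    # most likely state sequence and its probability
```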

    Lecture 12 Parsing

    1. Parsing Approaches for Context-Free Grammars (CFG)

    — Top-down Approach
    — Bottom-up Approach

    2. Regular Grammar
    • A regular grammar is denoted as G = (V, T, P, S)
      — V is a finite set of non-terminal symbols
      — T is a finite set of terminal symbols
      — S is the start symbol
      — P is a finite set of productions, i.e. rules of the form V -> B
    • Left-hand side: one non-terminal symbol
    • Right-hand side: the empty string / a terminal symbol / a terminal symbol followed by a non-terminal symbol
    3. Top-down Parsing
    • Begin with the start symbol S and expand it using the right-hand side of a rule.
    • Match the left-hand side of CFG rules to non-terminals in the string, replacing them with the right-hand side of the rule.
    • Continue until all non-terminals are replaced by terminals that correspond to the symbols in the sentence.
    • The parse succeeds when all words in the sentence are generated (a minimal sketch follows below).
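
    A minimal recursive-descent sketch of top-down parsing; the toy grammar, the example sentences, and the function name parse are illustrative assumptions, not taken from the lecture.

```python
# Tiny illustrative CFG: each non-terminal maps to a list of alternative
# right-hand sides; anything not in the dictionary is a terminal.
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N":   [["dog"], ["cat"]],
    "V":   [["saw"], ["sleeps"]],
}

def parse(symbols, words):
    # Top-down parsing: expand `symbols` left to right until only terminals
    # remain, and succeed only if they match the sentence exactly.
    if not symbols:
        return not words                      # success only if all words consumed
    head, rest = symbols[0], symbols[1:]
    if head in grammar:                       # non-terminal: try each right-hand side
        return any(parse(rhs + rest, words) for rhs in grammar[head])
    # terminal: must match the next word of the sentence
    return bool(words) and words[0] == head and parse(rest, words[1:])

print(parse(["S"], "the dog saw the cat".split()))   # True
print(parse(["S"], "the dog the".split()))           # False
```

    Note that this naive top-down search only terminates because the toy grammar has no left recursion.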

    Lecture 14 Text Categorization

    1. Term Selection
    • Examples
      — Chi Square
      — Mutual Information
      — Information Gain
      — Information Ratio
      — Odds Ratio
    2. Feature Generation
    • Latent Semantic Indexing (LSI)
    • Explicit Semantic Indexing (ESI)
    3. Nearest-Neighbor Learning Algorithm
    • Compute similarity between x and all examples in D.
    • Assign x the category of the most similar example in D.
    4. Category Scoring with a Weighted Sum
    • The score for a category is the sum of the similarity scores between the point to be classified and all of its k nearest neighbours that belong to that category (see the sketch below).
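
    A minimal sketch of nearest-neighbour text categorization with weighted-sum category scoring over cosine similarities; the toy training documents, the bag-of-words vectorizer, and the value of k are illustrative assumptions.

```python
import math
from collections import Counter

# Toy labelled documents (category, text); purely illustrative.
train = [
    ("sports", "the team won the football match"),
    ("sports", "a great goal in the second half"),
    ("tech",   "the new phone has a fast processor"),
    ("tech",   "software update improves battery life"),
]

def vectorize(text):
    # Bag-of-words term-frequency vector.
    return Counter(text.split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text, k=3):
    x = vectorize(text)
    # Compute the similarity between x and every training example.
    sims = [(cosine(x, vectorize(doc)), cat) for cat, doc in train]
    top_k = sorted(sims, reverse=True)[:k]
    # Weighted-sum scoring: a category's score is the sum of the
    # similarities of the k neighbours belonging to that category.
    scores = Counter()
    for sim, cat in top_k:
        scores[cat] += sim
    return scores.most_common(1)[0][0]

print(classify("the phone battery update"))   # expected: tech
```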

