2. NLP with Probabilistic Models

Author: Kevin不会创作 | Published 2020-12-03 08:33

Table of Contents

  • Autocorrect
  • Part of Speech Tagging and Hidden Markov Models
  • Autocomplete and Language Models

Autocorrect

  • What is autocorrect?

    Autocorrect is an application that changes misspelled words into the correct ones.

  • How does it work? (a minimal sketch of these steps appears after this list)

    1. Identify a misspelled word
    2. Find strings n edit distance away
    3. Filter candidates
    4. Calculate word probabilities
  • Identify a misspelled word

if word not in vocab:      # vocab: the set of known, correctly spelled words
    misspelled = True
  • Find strings n edit distance away

    • Edit: an operation performed on a string to change it
    • Edit distance counts the number of edit operations
    • n is usually one to three edits
    • Insert (add a letter)
      to: top, two
    • Delete (remove a letter)
      hat: ha, at, ht
    • Switch (swap 2 adjacent letters)
      eta: eat, tea
    • Replace (change 1 letter to another)
      jaw: jar, paw
  • Filter candidates

    Compare these strings to a known dictionary or vocabulary. If the string does not appear in the dictionary, remove it from the list of candidates.

  • Calculate word probabilities

    P(w)=\frac{C(w)}{V}

    • P(w) Probability of a word
    • C(w) Number of times the word appears
    • V Total size of the corpus
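
A minimal end-to-end sketch of the four steps above, using a toy corpus; the names (word_counts, edits_one, correct) and the restriction to single-edit candidates are illustrative choices, not part of the original notes:

# Minimal autocorrect sketch covering the four steps above (illustrative names).
from collections import Counter

corpus = "I am happy because I am learning".lower().split()
word_counts = Counter(corpus)          # C(w)
total = sum(word_counts.values())      # V, total size of the corpus
letters = "abcdefghijklmnopqrstuvwxyz"

def edits_one(word):
    """All strings one edit (insert, delete, switch, replace) away from word."""
    splits   = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts  = [l + c + r               for l, r in splits for c in letters]
    deletes  = [l + r[1:]               for l, r in splits if r]
    switches = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:]           for l, r in splits if r for c in letters]
    return set(inserts + deletes + switches + replaces)

def correct(word):
    if word in word_counts:                           # step 1: not misspelled
        return word
    candidates = edits_one(word) & set(word_counts)   # steps 2-3: generate, then filter
    if not candidates:
        return word
    # step 4: pick the candidate with the highest probability P(w) = C(w) / V
    return max(candidates, key=lambda w: word_counts[w] / total)

print(correct("hapy"))   # -> 'happy'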

Minimum edit distance

  • How to evaluate similarity between 2 strings?

  • Minimum number of edits needed to transform 1 string into the other

  • Applications

    • Spelling correction
    • Document similarity
    • Machine translation
    • DNA sequencing
  • Edits (Edit cost)

    • Insert (1)
    • Delete (1)
    • Replace (2)
  • Example

    Source: play
    Target: stay

    • p \rightarrow s: replace
    • l \rightarrow t: replace
    • Edit distance = 2 + 2 = 4 (a dynamic programming sketch of this computation follows this list)
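
A minimal dynamic programming sketch of this computation, with the edit costs listed above (insert 1, delete 1, replace 2); the function name and table layout are illustrative:

def min_edit_distance(source, target, ins_cost=1, del_cost=1, rep_cost=2):
    """Minimum edit distance with the costs listed above (tabular DP)."""
    n, m = len(source), len(target)
    # D[i][j] = minimum cost of turning source[:i] into target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            replace = 0 if source[i - 1] == target[j - 1] else rep_cost
            D[i][j] = min(D[i - 1][j] + del_cost,      # delete from source
                          D[i][j - 1] + ins_cost,      # insert into source
                          D[i - 1][j - 1] + replace)   # replace (free if letters match)
    return D[n][m]

print(min_edit_distance("play", "stay"))   # -> 4: two replacements at cost 2 each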

Part of Speech Tagging and Hidden Markov Models

  • What is part of speech tagging?

    Part of speech refers to the category of words or the lexical terms in the language. Examples of these lexical terms in the English language would be noun, verb, adjective, adverb, pronoun, preposition, etc. The process of assigning these tags to the words of a sentence or your corpus is referred to as parts of speech tagging, or POS tagging for short.

Markov Chains

A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

You can use a table called a transition matrix to store the states and transition probabilities. Each row in the matrix holds the transition probabilities from one state to all other states, and all the outgoing transition probabilities from a given state must sum to one. To assign a part of speech tag to the first word in a sentence, you can introduce an initial state and include its transition probabilities in the table as well.

Transition Matrix
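
A small made-up transition matrix for the tags NN, VB, and O, with the initial state π as an extra row; the numbers are purely illustrative and only show that every row of outgoing probabilities sums to one:

import numpy as np

# Row 0 is the initial state pi; the other rows are the tags NN, VB, O.
states = ["pi", "NN", "VB", "O"]
A = np.array([
    [0.4, 0.1, 0.5],   # pi -> NN, VB, O
    [0.2, 0.2, 0.6],   # NN -> NN, VB, O
    [0.4, 0.3, 0.3],   # VB -> NN, VB, O
    [0.2, 0.3, 0.5],   # O  -> NN, VB, O
])
for state, row in zip(states, A):
    print(state, row, "sum =", row.sum())   # each sum is 1.0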

Hidden Markov Model

The name hidden Markov model implies that states are hidden or not directly observable. Going back to the Markov model that has the states for the parts of speech, such as noun, verb, or other, you can now think of these as hidden states because these are not directly observable from the text data.

The hidden Markov model also has additional probabilities known as emission probabilities. These describe the transition from the hidden states of your hidden Markov model, which are parts of speech seen here as circles for noun, verb, and other, to the observables or the words of your corpus.

Emission Matrix

Calculation

  • Calculate Transition Probabilities

    1. Count occurrences of tag pairs C(t_{i-1},t_i)

    2. Calculate probabilities using the counts
      P(t_i|t_{i-1})=\frac{C(t_{i-1},t_i)}{\sum_{j=1}^{N}C(t_{i-1},t_{j})}

  • Populating the Transition Matrix

    Transition Matrix

    To avoid a division by zero, you can change your formula slightly
    P(t_i|t_{i-1})=\frac{C(t_{i-1},t_i)+\epsilon}{\sum_{j=1}^{N}C(t_{i-1},t_{j})+N*{\epsilon}}

  • Populating the Emission Matrix

    Emission Matrix

    P(w_i|t_i)=\frac{C(t_{i},w_i)+\epsilon}{\sum_{j=1}^{V}C(t_{i},w_{j})+V*{\epsilon}}=\frac{C(t_{i},w_i)+\epsilon}{C(t_i)+V*{\epsilon}}

    where V is the size of the vocabulary. A counting sketch for both matrices follows this list.
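
A minimal counting sketch for both matrices, assuming a toy list of (word, tag) pairs; the tiny tagged corpus, the start pseudo-tag, and the variable names are illustrative:

from collections import defaultdict

# Tiny tagged corpus of (word, tag) pairs -- purely illustrative.
tagged = [("I", "O"), ("love", "VB"), ("to", "O"), ("learn", "VB"), ("NLP", "NN")]
epsilon = 0.001
tags = sorted({t for _, t in tagged})
vocab = sorted({w for w, _ in tagged})

transition_counts = defaultdict(int)   # C(t_{i-1}, t_i)
emission_counts = defaultdict(int)     # C(t_i, w_i)
tag_counts = defaultdict(int)          # C(t_i)
prev_tag = "--s--"                     # start-of-sentence pseudo-tag
for word, tag in tagged:
    transition_counts[(prev_tag, tag)] += 1
    emission_counts[(tag, word)] += 1
    tag_counts[tag] += 1
    prev_tag = tag

N, V = len(tags), len(vocab)

def transition_prob(prev_t, t):
    """P(t | prev_t) with add-epsilon smoothing over the N tags."""
    denom = sum(transition_counts[(prev_t, t2)] for t2 in tags) + N * epsilon
    return (transition_counts[(prev_t, t)] + epsilon) / denom

def emission_prob(t, w):
    """P(w | t) with add-epsilon smoothing over the V vocabulary words."""
    return (emission_counts[(t, w)] + epsilon) / (tag_counts[t] + V * epsilon)

print(transition_prob("O", "VB"), emission_prob("VB", "learn"))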

Viterbi Algorithm

If you're given a sentence like "Why not learn something?", what is the most likely sequence of parts of speech tags given the sentence in your model? The sequence can be computed using the Viterbi algorithm.

The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM).

Viterbi Algorithm
  • Initialization

    Given your transition and emission probabilities, you first populate and then use two auxiliary matrices, C and D. The matrix C holds the intermediate optimal probabilities, and matrix D holds the indices of the visited states.

    In the initialization step, the first column of each of the matrices C and D is populated. The first column of C holds, for each tag t_i, the probability of transitioning from the start state \pi in the graph to t_i, multiplied by the emission probability of the first word w_1 from that tag.

    Initialization

    In the D matrix, you store the labels that represent the different states you're traversing when finding the most likely sequence of parts of speech tags for the given sequence of words, W_1 all the way to W_k.

    Initialization
  • Forward Pass

    Forward

    Each remaining column is filled in with the recurrence c_{i,j}=\max_{k} c_{k,j-1}\,a_{k,i}\,b_{i,cindex(w_j)} (and d_{i,j} stores the maximizing k), where:
    • b_{i,cindex(w_j)} is the emission probability from tag t_i to word w_j
    • a_{k,i} is the transition probability from the parts of speech tag t_k to the current tag t_i
    • c_{k,j-1} is the probability of the best path you've traversed up to the preceding word
  • Backward Pass

    You'll retrieve the most likely sequence of parts of speech tags for your given sequence of words using the backward pass: starting from the highest-probability entry in the last column of C, you follow the indices stored in D back to the first word. A compact sketch of the whole algorithm follows this list.

    Backward
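
A compact sketch of the three steps (initialization, forward pass, backward pass), assuming the transition matrix A (with the initial state in row 0), the emission matrix B, and integer word indices are already available; all names and the example matrices are illustrative:

import numpy as np

def viterbi(A, B, observations):
    """A: (N+1) x N transition matrix, row 0 is the initial state pi.
    B: N x V emission matrix. observations: list of word (column) indices.
    Returns the most likely sequence of tag indices."""
    N = B.shape[0]                       # number of tags
    K = len(observations)
    C = np.zeros((N, K))                 # best probabilities
    D = np.zeros((N, K), dtype=int)      # back-pointers to the best previous tag

    # Initialization: transition from the start state times emission of w_1.
    C[:, 0] = A[0, :] * B[:, observations[0]]

    # Forward pass: c_{i,j} = max_k c_{k,j-1} * a_{k,i} * b_{i,cindex(w_j)}
    for j in range(1, K):
        for i in range(N):
            scores = C[:, j - 1] * A[1:, i] * B[i, observations[j]]
            D[i, j] = np.argmax(scores)
            C[i, j] = scores[D[i, j]]

    # Backward pass: follow the back-pointers from the best final state.
    path = [int(np.argmax(C[:, K - 1]))]
    for j in range(K - 1, 0, -1):
        path.append(int(D[path[-1], j]))
    return path[::-1]

# Made-up matrices for 2 tags and a 3-word vocabulary:
A = np.array([[0.6, 0.4],
              [0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.2, 0.7]])
print(viterbi(A, B, [0, 2, 1]))   # indices of the observed words

In practice the products are usually computed as sums of log probabilities to avoid numerical underflow on long sentences.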

Autocomplete and Language Models

  • What can a language model do?

    • Autocomplete a sentence

      • Estimate probability of word sequences
      • Estimate probability of a word following a sequence of words
    • Speech recognition
      Example: P(I saw a van) > P(eyes awe of an)

    • Spelling correction
      Example: P(entered the shop to buy) > P(entered the ship to buy)

    • Augmentative communication
      Predict the most likely word from a menu, for people who are unable to physically talk or sign

N-grams Language Model

  • What is N-gram?

    An N-gram is a sequence of N words.

    • Example

      Corpus: I am happy because I am learning
      Unigrams: {I, am, happy, because, learning}
      Bigrams: {I am, am happy, happy because, ...}
      Trigrams: {I am happy, am happy because, ...}

  • Probability

    • Unigram probability
      P(w)=\frac{C(w)}{m}

    • Bigram probability
      P(y|x)=\frac{C(xy)}{\sum_{w}C(xw)}=\frac{C(xy)}{C(x)}

    • Trigram probability
      P(w_3|w_1^2)=\frac{C(w_1^2{w_3})}{C(w_1^2)}

    • N-gram probability
      P(w_N|w_1^{N-1})=\frac{C(w_1^{N-1}w_N)}{C(w_1^{N-1})}

  • Sequence Probability

    The probability of a whole sequence can be approximated by chaining N-gram probabilities, e.g. P(w_1^n)\approx\prod_{i=2}^{n}P(w_i|w_{i-1}) for a bigram model; a counting sketch for these formulas follows.
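
A minimal sketch that extracts n-grams from the example corpus above, computes bigram probabilities with the count formula, and approximates a sequence probability as a product of bigram probabilities; the function names are illustrative:

from collections import Counter

corpus = "I am happy because I am learning".split()

def ngrams(words, n):
    """All n-grams (as tuples) in the word list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

unigram_counts = Counter(ngrams(corpus, 1))
bigram_counts = Counter(ngrams(corpus, 2))

def bigram_prob(x, y):
    """P(y | x) = C(x y) / C(x)."""
    return bigram_counts[(x, y)] / unigram_counts[(x,)]

# Sequence probability, approximated as a product of bigram probabilities:
# P(w_1 ... w_n) ~= P(w_2|w_1) * P(w_3|w_2) * ... * P(w_n|w_{n-1})
sentence = "I am learning".split()
p = 1.0
for x, y in zip(sentence, sentence[1:]):
    p *= bigram_prob(x, y)
print(p)   # P(am|I) * P(learning|am) = 1.0 * 0.5 = 0.5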
