Table of Contents
- Autocorrect
- Part of Speech Tagging and Hidden Markov Models
- Autocomplete and Language Models
Autocorrect
- What is autocorrect?
Autocorrect is an application that changes misspelled words into the correct ones.
- How does it work?
- Identify a misspelled word
- Find strings n edit distance away
- Filter candidates
- Calculate word probabilities
- Identify a misspelled word
if word not in vocab:
    misspelled = True
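A minimal sketch of this check in Python; the `vocab` set here is a stand-in for a real vocabulary built from a large corpus:

```python
# Illustrative vocabulary; in practice, build this from your corpus.
vocab = {"the", "cat", "sat", "on", "a", "mat"}

def is_misspelled(word, vocab):
    """A word is flagged as misspelled if it is not in the vocabulary."""
    return word.lower() not in vocab

print(is_misspelled("mat", vocab))   # False
print(is_misspelled("matt", vocab))  # True
```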
- Find strings n edit distance away
- Edit: an operation performed on a string to change it
- Edit distance counts the number of edit operations
- n is usually one to three edits
- Insert (add a letter): to → top, two
- Delete (remove a letter): hat → ha, at, ht
- Switch (swap 2 adjacent letters): eta → eat, tea
- Replace (change 1 letter to another): jaw → jar, paw

A sketch of generating these candidates in code follows.
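A minimal sketch of generating every string one edit away, assuming a lowercase English alphabet; `edits_one` is an illustrative helper, not a library function:

```python
def edits_one(word):
    """All strings one edit away: inserts, deletes, switches, and replaces."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts  = [l + c + r for l, r in splits for c in letters]
    deletes  = [l + r[1:] for l, r in splits if r]
    switches = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    return set(inserts + deletes + switches + replaces)

print(len(edits_one("to")))  # dozens of candidate strings
```

Strings two edits away can be generated by applying `edits_one` to each of these results.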
- Filter candidates
Compare these strings to a known dictionary or vocabulary. If the string does not appear in the dictionary, remove it from the list of candidates.
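Continuing the sketch, keep only the candidates that exist in the vocabulary (reusing `edits_one` from above; the vocabulary is again illustrative):

```python
vocab = {"to", "top", "two", "tap"}  # illustrative vocabulary

def filter_candidates(candidates, vocab):
    """Discard generated strings that are not real words."""
    return candidates & vocab

print(filter_candidates(edits_one("tp"), vocab))  # {'to', 'top', 'tap'}
```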
- Calculate word probabilities
The probability of a word is its relative frequency in the corpus:

P(w) = C(w) / V

where C(w) is the number of times the word appears and V is the total size of the corpus in words.
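A quick sketch of computing these probabilities with a word counter; the corpus is a toy example:

```python
from collections import Counter

corpus = "I am happy because I am learning".lower().split()
counts = Counter(corpus)   # C(w) for each word
total = len(corpus)        # V, the corpus size

probs = {w: c / total for w, c in counts.items()}
print(probs["am"])  # 2 / 7 ≈ 0.286
```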
- Minimum edit distance
- How do you evaluate the similarity between 2 strings?
- Use the minimum number of edits needed to transform 1 string into the other.
- Applications
- Spelling correction
- Document similarity
- Machine translation
- DNA sequencing
- Edits (edit cost)
- Insert (1)
- Delete (1)
- Replace (2)
- Example
Source: play → Target: stay
- p → s: replace (cost 2)
- l → t: replace (cost 2)
- Edit distance = 2 + 2 = 4

A dynamic programming sketch of this computation is shown below.
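A sketch of the standard dynamic programming solution with the edit costs listed above (insert 1, delete 1, replace 2):

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, rep_cost=2):
    """Dynamic programming minimum edit distance (replace costs 2)."""
    m, n = len(source), len(target)
    # D[i][j] = cheapest way to turn source[:i] into target[:j]
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + del_cost
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            r = 0 if source[i - 1] == target[j - 1] else rep_cost
            D[i][j] = min(D[i - 1][j] + del_cost,   # delete
                          D[i][j - 1] + ins_cost,   # insert
                          D[i - 1][j - 1] + r)      # replace / match
    return D[m][n]

print(min_edit_distance("play", "stay"))  # 4, matching the worked example
```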
Part of Speech Tagging and Hidden Markov Models
- What is part of speech tagging?
Part of speech refers to the category of words or the lexical terms in the language. Examples of these lexical terms in the English language would be noun, verb, adjective, adverb, pronoun, preposition, etc. The process of assigning these tags to the words of a sentence or your corpus is referred to as parts of speech tagging, or POS tagging for short.
Markov Chains
A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
You can use a table called a transition matrix to store the states and transition probabilities. Each row in the matrix represents the transition probabilities from one state to all other states. Notice that the sum of all outgoing transition probabilities from a given state should always be one. To assign a part of speech tag to the first word in the sentence, you can introduce what is known as an initial state and include its probabilities in the table as well.
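A small sketch of a transition matrix for three states (NN = noun, VB = verb, O = other); the probabilities are made up for illustration:

```python
import numpy as np

states = ["NN", "VB", "O"]
# transition[i][j] = P(next state is states[j] | current state is states[i])
transition = np.array([
    [0.2, 0.2, 0.6],   # from NN
    [0.4, 0.3, 0.3],   # from VB
    [0.2, 0.3, 0.5],   # from O
])
assert np.allclose(transition.sum(axis=1), 1.0)  # each row is a distribution
```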

Hidden Markov Model
The name hidden Markov model implies that states are hidden or not directly observable. Going back to the Markov model that has the states for the parts of speech, such as noun, verb, or other, you can now think of these as hidden states because these are not directly observable from the text data.
The hidden Markov model also has additional probabilities known as emission probabilities. These describe the transitions from the hidden states of your hidden Markov model (the parts of speech, such as noun, verb, and other) to the observables, the words of your corpus.

Calculation
- Calculate Transition Probabilities
  - Count occurrences of tag pairs: C(t_{i-1}, t_i)
  - Calculate probabilities using the counts:
    P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / Σ_j C(t_{i-1}, t_j)
- Populating the Transition Matrix
To avoid a division by zero when a tag pair never occurs, you can change the formula slightly by adding a small smoothing value ε:
    P(t_i | t_{i-1}) = (C(t_{i-1}, t_i) + ε) / (Σ_j C(t_{i-1}, t_j) + N·ε)
where N is the total number of tags. A sketch of this computation follows.
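A sketch of estimating the smoothed transition probabilities from a tag sequence; the tags and ε are illustrative:

```python
from collections import Counter

tags = ["NN", "VB", "O", "NN", "NN", "VB"]  # toy tag sequence
tagset = sorted(set(tags))
epsilon = 0.001

pair_counts = Counter(zip(tags, tags[1:]))
tag_counts = Counter(tags[:-1])  # counts of tags in the "previous" position

def transition_prob(prev_tag, tag):
    """P(tag | prev_tag) with add-epsilon smoothing to avoid division by zero."""
    return ((pair_counts[(prev_tag, tag)] + epsilon)
            / (tag_counts[prev_tag] + len(tagset) * epsilon))

print(transition_prob("NN", "VB"))  # ≈ 2/3
```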
- Populating the Emission Matrix
The emission matrix is populated the same way, from counts of how often each word co-occurs with each tag:
    P(w_i | t_i) = (C(t_i, w_i) + ε) / (Σ_j C(t_i, w_j) + V·ε)
where V is the size of the vocabulary.
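The emission counterpart of the same idea, counting (tag, word) co-occurrences; the tagged corpus is illustrative:

```python
from collections import Counter

tagged = [("NN", "jet"), ("VB", "fly"), ("NN", "plane"), ("NN", "jet")]  # toy tagged corpus
vocab_size = len({w for _, w in tagged})
epsilon = 0.001

emit_counts = Counter(tagged)
tag_totals = Counter(t for t, _ in tagged)

def emission_prob(tag, word):
    """P(word | tag) with add-epsilon smoothing."""
    return ((emit_counts[(tag, word)] + epsilon)
            / (tag_totals[tag] + vocab_size * epsilon))

print(emission_prob("NN", "jet"))  # ≈ 2/3
```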
Viterbi Algorithm
If you're given a sentence like "Why not learn something?", what is the most likely sequence of parts of speech tags given the sentence in your model? The sequence can be computed using the Viterbi algorithm.
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM).

- Initialization
Given your transition and emission probabilities, you first populate and then use the auxiliary matrices C and D. The matrix C holds the intermediate optimal probabilities, and matrix D the indices of the visited states.
In the initialization step, the first column of each matrix, C and D, is populated. The first column of C represents the probability of the transition from the start state π in the graph to each first tag t_i, emitting the first word w_1:
    C[i, 1] = π_i · b_i(w_1)
In the D matrix, you store the labels that represent the different states you're traversing when finding the most likely sequence of parts of speech tags for the given sequence of words w_1 all the way to w_K. Since there are no preceding states in the first column, D[i, 1] = 0.
- Forward Pass
In the forward pass, the remaining entries of C and D are filled column by column:
    C[i, j] = max_k C[k, j-1] · a_{k,i} · b_i(w_j)
    D[i, j] = argmax_k C[k, j-1] · a_{k,i} · b_i(w_j)
where:
- b_i(w_j) is the emission probability from tag t_i to word w_j
- a_{k,i} is the transition probability from the parts of speech tag t_k to the current tag t_i
- C[k, j-1] is the probability for the preceding path you've traversed
- Backward Pass
You'll retrieve the most likely sequence of parts of speech tags for your given sequence of words using the backward pass: first find the state with the highest probability in the last column of C, then follow the indices stored in D backward, column by column, to recover the full path.
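A compact sketch of the full algorithm (initialization, forward pass, backward pass); the tag set, probability tables, and sentence are all made up for illustration:

```python
import numpy as np

tags = ["NN", "VB", "O"]
words = ["why", "not", "learn", "something"]
word_index = {w: i for i, w in enumerate(words)}

pi = np.array([0.4, 0.1, 0.5])        # initial state probabilities
A = np.array([[0.2, 0.2, 0.6],        # A[k, i] = P(tag i | tag k)
              [0.4, 0.3, 0.3],
              [0.2, 0.3, 0.5]])
B = np.array([[0.1, 0.1, 0.3, 0.5],   # B[i, j] = P(word j | tag i)
              [0.2, 0.2, 0.4, 0.2],
              [0.5, 0.3, 0.1, 0.1]])

def viterbi(sentence):
    n_tags, n_words = len(tags), len(sentence)
    C = np.zeros((n_tags, n_words))        # best path probabilities
    D = np.zeros((n_tags, n_words), int)   # backpointers
    # Initialization: transition from the start state, emitting the first word.
    C[:, 0] = pi * B[:, word_index[sentence[0]]]
    # Forward pass: fill each column from its predecessor.
    for j in range(1, n_words):
        for i in range(n_tags):
            scores = C[:, j - 1] * A[:, i] * B[i, word_index[sentence[j]]]
            C[i, j] = scores.max()
            D[i, j] = scores.argmax()
    # Backward pass: start from the best final state, follow backpointers.
    path = [int(C[:, -1].argmax())]
    for j in range(n_words - 1, 0, -1):
        path.append(int(D[path[-1], j]))
    return [tags[i] for i in reversed(path)]

print(viterbi(["why", "not", "learn", "something"]))
```

In practice you would work with log probabilities to avoid numerical underflow on long sentences.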
Autocomplete and Language Models
- What can a language model do?
- Autocomplete a sentence
  - Estimate probability of word sequences
  - Estimate probability of a word following a sequence of words
- Speech recognition
  Example: P(I saw a van) > P(eyes awe of an)
- Spelling correction
  Example: P(entered the shop to buy) > P(entered the ship to buy)
- Augmentative communication
  Predict the most likely word from a menu for people unable to physically talk or sign
- N-grams Language Model
- What is an N-gram?
An N-gram is a sequence of N words.
- Example
Corpus: I am happy because I am learning
Unigrams: {I, am, happy, because, learning}
Bigrams: {I am, am happy, happy because, ...}
Trigrams: {I am happy, am happy because, ...}
A sketch of extracting these in code is shown below.
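A minimal sketch of extracting these N-grams:

```python
def ngrams(tokens, n):
    """All sequences of n consecutive words."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

corpus = "I am happy because I am learning".split()
print(ngrams(corpus, 1))  # unigrams
print(ngrams(corpus, 2))  # bigrams, e.g. ('I', 'am'), ('am', 'happy'), ...
print(ngrams(corpus, 3))  # trigrams
```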
- Probability
  - Unigram probability: P(w) = C(w) / m, where C(w) is the count of w and m is the corpus size
  - Bigram probability: P(y | x) = C(x y) / C(x)
  - Trigram probability: P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2)
  - N-gram probability: P(wN | w1 … wN-1) = C(w1 … wN) / C(w1 … wN-1)
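A sketch of estimating a bigram probability by counting, using the same toy corpus:

```python
from collections import Counter

corpus = "I am happy because I am learning".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(x, y):
    """P(y | x) = C(x y) / C(x)."""
    return bigram_counts[(x, y)] / unigram_counts[x]

print(bigram_prob("am", "happy"))  # C(am happy) / C(am) = 1/2
```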
- Sequence Probability