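【1】MDP (S【set of states】, A【set of actions】, {P_sa}【state transition distributions】, γ【discount factor】, R【reward function】)
【Process】
Start in state s0 and choose an action a0,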
![](https://img.haomeiwen.com/i11797539/ced9cd62f85e79c0.png)
then choose an action a1,
![](https://img.haomeiwen.com/i11797539/e9b778b4951c8730.png)
Total payoff:
![](https://img.haomeiwen.com/i11797539/3ea6eedf22c964c2.png)
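The total payoff sums the rewards along the trajectory, with the reward at step t discounted by γ^t: R(s0) + γR(s1) + γ²R(s2) + .... The sketch below computes this for a finite list of rewards. It assumes state-based rewards and truncates the infinite sum at the end of the list. The function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], i.e. R(s0) + gamma*R(s1) + gamma^2*R(s2) + ...

    Assumption: the infinite-horizon sum is truncated to the rewards given.
    """
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


payoff = discounted_return([1, 1, 1], 0.5)  # 1 + 0.5 + 0.25
```

With γ < 1 and bounded rewards, later terms shrink geometrically. This is why the infinite sum is finite.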
Choose actions so as to maximize the expected total payoff:
![](https://img.haomeiwen.com/i11797539/3d6873165adf83da.png)
Policy:
![](https://img.haomeiwen.com/i11797539/844bc16e339924dd.png)
Define the value function:
![](https://img.haomeiwen.com/i11797539/eb9fd313ff96a44b.png)
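For a fixed policy π, the value function satisfies the Bellman equation V^π(s) = R(s) + γ Σ_{s'} P_{sπ(s)}(s') V^π(s'). This assumes the state-reward R(s) convention used with {P_sa} above. The sketch below evaluates a policy by repeatedly applying this update until the values stop changing. The two-state MDP with actions 'go' and 'stay' is a hypothetical toy example, and so is the function name `evaluate_policy`.

```python
def evaluate_policy(states, R, P, policy, gamma, tol=1e-10):
    """Iterative policy evaluation for a policy given as a dict state -> action.

    P maps (state, action) to a dict {next_state: probability}, i.e. P_sa.
    """
    V = {s: 0.0 for s in states}
    while True:
        # Bellman update: V(s) <- R(s) + gamma * sum_{s'} P_{s,pi(s)}(s') * V(s')
        new_V = {
            s: R[s] + gamma * sum(p * V[s2] for s2, p in P[(s, policy[s])].items())
            for s in states
        }
        change = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if change < tol:
            return V


# Hypothetical toy MDP: state 0 pays reward 1, state 1 pays 0.
states = [0, 1]
R = {0: 1.0, 1: 0.0}
P = {
    (0, "go"): {1: 1.0}, (0, "stay"): {0: 1.0},
    (1, "go"): {0: 1.0}, (1, "stay"): {1: 1.0},
}
policy = {0: "go", 1: "go"}  # pi: S -> A, always switch states
V = evaluate_policy(states, R, P, policy, gamma=0.9)
```

For this policy the equations are V(0) = 1 + 0.9·V(1) and V(1) = 0.9·V(0). That gives V(0) = 1/0.19 ≈ 5.263 and V(1) = 0.9/0.19 ≈ 4.737, which the iteration approaches.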
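【2】Hidden Markov model
Three elements: λ = (A, B, π). A is the state transition probability matrix, B is the observation probability matrix, and π is the initial state probability vector.
Two basic assumptions:
(1) Homogeneous Markov assumption: the state of the hidden Markov chain at time t depends only on its state at time t-1.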
![](https://img.haomeiwen.com/i11797539/d3028f936fe2dbe2.png)
(2) Observation independence assumption: the observation at any time depends only on the state at that same time.
![](https://img.haomeiwen.com/i11797539/8966d2be0e84850c.png)
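Together, the two assumptions let the joint probability of a state sequence I = (i_1, ..., i_T) and an observation sequence O = (o_1, ..., o_T) factor into one initial term plus one transition term and one emission term per step. The symbols a_{ij}, b_j(o_t) and π_i are assumed to denote the entries of A, B and π:

$$
P(O, I \mid \lambda) = \pi_{i_1}\, b_{i_1}(o_1) \prod_{t=2}^{T} a_{i_{t-1} i_t}\, b_{i_t}(o_t),
\qquad
P(O \mid \lambda) = \sum_{I} P(O, I \mid \lambda)
$$

With N states, summing directly over all N^T state sequences is exponential in T. The forward and backward algorithms reduce this cost to O(N²T).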
Generating an observation sequence:
Input: HMM λ = (A, B, π) and observation sequence length T
![](https://img.haomeiwen.com/i11797539/166a185f6fe0ad1e.png)
![](https://img.haomeiwen.com/i11797539/888fb597078c2248.png)
(2) Let t = 1.
![](https://img.haomeiwen.com/i11797539/9caa7628682f7d6a.png)
![](https://img.haomeiwen.com/i11797539/2a655b18bbd5b601.png)
(5) Let t = t + 1. If t ≤ T, go to step (3); otherwise stop.
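The generation procedure can be sketched as below. This assumes steps (1), (3) and (4) in the images are the usual ones: draw the first state from π, emit o_t from row b_{i_t} of B, then draw the next state from row a_{i_t} of A. States and symbols are 0-based indices. The function name `generate` and the toy parameters are hypothetical. The sampler is `random.Random.choices` from the standard library.

```python
import random


def generate(A, B, pi, T, rng):
    """Sample (states, observations), each of length T, from an HMM (A, B, pi)."""
    n_states, n_symbols = len(pi), len(B[0])
    states, observations = [], []
    # Step (1): draw the first state i_1 from the initial distribution pi.
    state = rng.choices(range(n_states), weights=pi)[0]
    for t in range(T):  # t = 0..T-1 here stands for t = 1..T in the text
        states.append(state)
        # Step (3): emit o_t according to the observation distribution B[state].
        observations.append(rng.choices(range(n_symbols), weights=B[state])[0])
        # Step (4): move to i_{t+1} according to the transition row A[state];
        # skipped after the last observation, since no further state is needed.
        if t < T - 1:
            state = rng.choices(range(n_states), weights=A[state])[0]
    return states, observations


# Hypothetical 3-state, 2-symbol toy model.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
states, obs = generate(A, B, pi, 5, random.Random(42))
```

Passing a seeded `random.Random` instance makes the sample reproducible without touching the global random state.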
The three basic problems of HMMs:
(1) Probability computation
【Forward algorithm】
![](https://img.haomeiwen.com/i11797539/16f7ba717715483c.png)
Input: HMM λ, observation sequence O
Output: observation sequence probability P(O|λ)
Initialization:
![](https://img.haomeiwen.com/i11797539/c422bc76630d00ee.png)
Recursion:
![](https://img.haomeiwen.com/i11797539/e5daf7dbb9c18611.png)
Termination:
![](https://img.haomeiwen.com/i11797539/e7bc73fd5eb87580.png)
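The three steps can be sketched as follows, using the standard forward recursion:

- α_1(i) = π_i b_i(o_1)
- α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(o_{t+1})
- P(O|λ) = Σ_i α_T(i)

States and symbols are 0-based. The toy parameters and the function name `forward` are hypothetical. The code does not scale the values, so long sequences will underflow. Practical implementations rescale α at each step or work in log space.

```python
def forward(A, B, pi, obs):
    """Return P(O | lambda) via the (unscaled) forward algorithm."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


# Hypothetical 3-state, 2-symbol toy model.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
p = forward(A, B, pi, [0, 1, 0])  # approximately 0.130218
```

Each recursion step costs O(N²), so the whole computation is O(N²T) rather than enumerating N^T paths.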
【Backward algorithm】
![](https://img.haomeiwen.com/i11797539/f4d4024852ad310c.png)
Input: λ, O
Output: P(O|λ)
![](https://img.haomeiwen.com/i11797539/d6e132f481034e94.png)
(ii) For t = T-1, T-2, ..., 1:
![](https://img.haomeiwen.com/i11797539/5662fcb4cc31aadd.png)
(iii)
![](https://img.haomeiwen.com/i11797539/2b65bd6571ab1cce.png)
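The backward pass can be sketched the same way, using the standard recursion:

- β_T(i) = 1
- β_t(i) = Σ_j a_ij b_j(o_{t+1}) β_{t+1}(j)
- P(O|λ) = Σ_i π_i b_i(o_1) β_1(i)

This is a minimal unscaled sketch with 0-based indices. The toy model and the function name `backward` are hypothetical.

```python
def backward(A, B, pi, obs):
    """Return P(O | lambda) via the (unscaled) backward algorithm."""
    n = len(pi)
    # (i) Initialization: beta_T(i) = 1
    beta = [1.0] * n
    # (ii) For t = T-1, ..., 1: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    # reversed(obs[1:]) yields o_T, o_{T-1}, ..., o_2, one per backward step.
    for o in reversed(obs[1:]):
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(n)) for i in range(n)]
    # (iii) Termination: P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n))


# Hypothetical 3-state, 2-symbol toy model.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
p = backward(A, B, pi, [0, 1, 0])  # approximately 0.130218
```

The forward and backward passes must give the same P(O|λ). Comparing them is a handy sanity check when implementing either one.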
(2) Learning algorithms
【Supervised learning】
【Baum-Welch algorithm (unsupervised learning)】
![](https://img.haomeiwen.com/i11797539/d3d38e7ccf20c008.png)
![](https://img.haomeiwen.com/i11797539/ad1c6eaa378d10da.png)
![](https://img.haomeiwen.com/i11797539/8ae1ab283f7f2d5f.png)
![](https://img.haomeiwen.com/i11797539/f4902c02583851e6.png)
![](https://img.haomeiwen.com/i11797539/048c371b14484a44.png)
![](https://img.haomeiwen.com/i11797539/20d7baae45c51d00.png)
![](https://img.haomeiwen.com/i11797539/6a25c7df2c20a448.png)
![](https://img.haomeiwen.com/i11797539/e1ee5d93e85b3dfd.png)
![](https://img.haomeiwen.com/i11797539/35592c5b69236c10.png)
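One Baum-Welch iteration can be sketched as an EM step for a single observation sequence.

- E-step: compute γ_t(i) = P(i_t = i | O, λ) and ξ_t(i, j) = P(i_t = i, i_{t+1} = j | O, λ) from the forward and backward tables.
- M-step: re-estimate the parameters from these expected counts:
  - π_i = γ_1(i)
  - a_ij = Σ_t ξ_t(i, j) / Σ_t γ_t(i), summing over t = 1..T-1
  - b_j(k) = Σ_{t: o_t = k} γ_t(j) / Σ_t γ_t(j)

These are the standard re-estimation formulas, assumed here to match the images above. The function names, the toy model and the observation sequence are hypothetical. The code is unscaled, so it is only suitable for short sequences.

```python
def forward_backward(A, B, pi, obs):
    """Return full alpha and beta tables (T rows, N columns), unscaled."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]  # last row stays beta_T(i) = 1
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta


def baum_welch_step(A, B, pi, obs):
    """One E-step plus M-step; returns re-estimated (A, B, pi)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward_backward(A, B, pi, obs)
    p_obs = sum(alpha[T - 1])  # P(O | lambda)
    # E-step: gamma_t(i) = P(i_t = i | O, lambda)
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)] for t in range(T)]
    # E-step: xi_t(i, j) = P(i_t = i, i_{t+1} = j | O, lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: expected counts normalized into probabilities
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) / sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_A, new_B, new_pi


# Hypothetical toy model and observation sequence.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0, 0, 1]

A1, B1, pi1 = baum_welch_step(A, B, pi, obs)
p_before = sum(forward_backward(A, B, pi, obs)[0][-1])
p_after = sum(forward_backward(A1, B1, pi1, obs)[0][-1])
```

As with any EM algorithm, each iteration does not decrease P(O|λ). In practice the step is repeated until the likelihood stops improving, which may be at a local optimum.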
(3) Prediction (decoding)
【Approximation algorithm】
![](https://img.haomeiwen.com/i11797539/59869624f7ef27c7.png)
![](https://img.haomeiwen.com/i11797539/192145189eae3c81.png)
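The approximation algorithm picks, at each time t, the individually most likely state i_t* = argmax_i γ_t(i), where γ_t(i) = α_t(i)β_t(i) / P(O|λ). The sketch below assumes this is what the images describe. It uses 0-based indices, and the toy model and the function name `approximate_decode` are hypothetical.

```python
def approximate_decode(A, B, pi, obs):
    """Choose argmax_i gamma_t(i) independently at every time step."""
    n, T = len(pi), len(obs)
    # Forward pass: alpha[t][i]
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]] for j in range(n)])
    # Backward pass: beta[t][i], built from the last time step backwards
    beta = [[1.0] * n]
    for t in range(T - 1, 0, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t]] * nxt[j] for j in range(n)) for i in range(n)])
    # gamma_t(i) is proportional to alpha_t(i) * beta_t(i); the normalizer
    # P(O | lambda) is the same for every i, so the argmax can skip it.
    return [max(range(n), key=lambda i: alpha[t][i] * beta[t][i]) for t in range(T)]


# Hypothetical 3-state, 2-symbol toy model.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
path = approximate_decode(A, B, pi, [0, 1, 0])  # [2, 1, 2]
```

The method is simple, but it does not consider the sequence as a whole. If some a_ij = 0, it can return a state sequence that the model cannot actually produce.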
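【Viterbi algorithm】
Uses dynamic programming to find the maximum-probability path, where each path corresponds to one state sequence.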
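The dynamic program can be sketched as follows, using the conventional notation δ_t(j) for the highest probability of any path ending in state j at time t and ψ_t(j) for the best predecessor of that path. The recursion is δ_{t+1}(j) = max_i [δ_t(i) a_ij] b_j(o_{t+1}). The best path is then recovered by backtracking through ψ. This is a minimal unscaled sketch with 0-based indices; the toy model and the function name `viterbi` are hypothetical.

```python
def viterbi(A, B, pi, obs):
    """Return (most probable state path, its joint probability with O)."""
    n, T = len(pi), len(obs)
    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = []  # psi[t-1][j]: best predecessor of state j at time t
    for t in range(1, T):
        back, new_delta = [], []
        for j in range(n):
            # Recursion: best previous state for ending in j at time t
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            back.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][obs[t]])
        psi.append(back)
        delta = new_delta
    # Termination: best final state and its probability
    last = max(range(n), key=lambda i: delta[i])
    best_prob = delta[last]
    # Backtracking: follow psi from the final state to the first one
    path = [last]
    for back in reversed(psi):
        path.append(back[path[-1]])
    path.reverse()
    return path, best_prob


# Hypothetical 3-state, 2-symbol toy model.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
path, prob = viterbi(A, B, pi, [0, 1, 0])  # [2, 2, 2], probability about 0.0147
```

On this toy model the Viterbi path is [2, 2, 2]. Picking the individually most likely state at each step instead gives [2, 1, 2], which shows that the best whole path need not be made of the locally best states. For long sequences, replace the products with sums of log-probabilities to avoid underflow.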