HMM R包学习之 mHMMbayes

作者: 小贝学生信 | 来源:发表于2020-11-10 00:45 被阅读0次

HMM R包学习之 mHMMbayes
学习小组笔记Day6-Jocelyn
《学习小组Day6笔记--寒鹤》
R包学习之magrittr
R包学习之broom
学习小组Day5笔记--jam
学习小组Day6笔记——学生小幸
学习小组DAY6--帅炸
《学习小组day6笔记--（解杰）》
学习小组Day6笔记-田熙

参考官方教程：https://cran.r-project.org/web/packages/mHMMbayes/vignettes/tutorial-mhmm.html

mHMMbayes包特点

1、Multiple individuals simultaneously

Using a multilevel framework, we allow for heterogeneity in the model parameters (transition probability matrix and conditional distribution), while estimating one overall HMM.
即可同时对多组的观测数据进行预测。不仅会预测每组的参数，而且也能得出总体的HMM参数

2、Multivariate data with a categorical distribution

即允许有多类隐状态预测值，进行综合分析；
例如天气hidden stata对应天气潮湿情况、雨伞销量，骑车人数等多类观测结果。
但是至少对于mHMMbayes包而言，这些观测结果必须为分类资料

3、Bayesian estimation to parameter

Parameters are estimated using Bayesian estimation utilizing the forward-backward recursion within a hybrid Metropolis within Gibbs sampler
In the package mHMMbayes, authors chose to use Bayesian estimation because of its flexibility, which we will require in the multilevel framework of the model.

4、Viterbi algorithm

The package also includes a function to simulate data and a function to obtain the most likely hidden state sequence for each individual using the Viterbi algorithm.

R包分析实战

0、了解示例数据

#install.packages("mHMMbayes")
library(mHMMbayes)
data(nonverbal)
head(nonverbal)
table(nonverbal[,1])

如下图，示例数据是15分钟内，10对医患非言语的交流情况观测。一秒钟记录一次，共900次观测，共有四类观测。

p_verbalizing: verbalizing behavior of the patient, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
p_looking: looking behavior of the patient, consisting of 1 = not looking at therapist, 2 = looking at therapist.
t_verbalizing: verbalizing behavior of the therapist, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
t_looking: looking behavior of the therapist, consisting of 1 = not looking at patient, 2 = looking at patient. The top 6 rows of the dataset are provided below.

data

2、初始值设置

虽然设置的初始值在后续会使用Bayesian estimation调整优化；
最好还是根据对数据分布的理解以及预期结果，设置合适的初始值；
因为Using sensible starting values increases convergence speed, and often prevents a problem called ‘label switching’.

Label Switching：one should check if the algorithm reaches the same solution when a set of different (but often conceptually similar) starting values are used

#the number of states used
m <- 2
#the number of dependent variables in the dataset used to infer the hidden states
n_dep <- 4
#the number of categorical outcomes for each of the dependent variables
q_emiss <- c(3, 2, 3, 2)
#the transition probability matrix
start_TM <- diag(.8, m)
start_TM[lower.tri(start_TM) | upper.tri(start_TM)] <- .2
#the emission distribution(s)
start_EM <- list(matrix(c(0.05, 0.90, 0.05, 
                          0.90, 0.05, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[1]), # vocalizing patient
                 matrix(c(0.1, 0.9, 
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[2]), # looking patient
                 matrix(c(0.90, 0.05, 0.05, 
                          0.05, 0.90, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[3]), # vocalizing therapist
                 matrix(c(0.1, 0.9, 
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[4])) # looking therapist

如上start_EM为一个list，体现了Multivariate的特点

3、Fitting the model

# Run a model without covariate(s) and default priors:
set.seed(14532)
out_2st <- mHMM(s_data = nonverbal, 
                gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss), 
                start_val = c(list(start_TM), start_EM),
                mcmc = list(J = 1000, burn_in = 200))
#01:10:10 耗时比较久

The arguments needed for the MCMC algorithm are given in mcmc: J specifies the number of iterations used by the hybrid metropolis within Gibbs algorithm and burn_in specifies the number of iterations to discard when obtaining the model parameter summary statistics.

out_2st

out_2st

#总体参数
summary(out_2st)

可以根据下图的结果，视图解释两个stata的意义

summary(out_2st)

#想看看对每个subject的预测参数情况的话
obtain_gamma(out_2st, level = "subject")
obtain_gamma(out_2st, level = "subject")

4、Visualization

可分别对TM、EM矩阵进行可视化

EM

library(RColorBrewer)
Voc_col <- c(brewer.pal(3,"PuBuGn")[c(1,3,2)])
Voc_lab <- c("Not Speaking", "Speaking", "Back channeling")

plot(out_2st, component = "emiss", dep = 1, col = Voc_col, 
     parameter = "emiss", dep_lab = c("Patient vocalizing"), cat_lab = Voc_lab)

如下图，对第一类观测结果p_vocalization的结果可视化。虚线代表10个subject；实线代表总体值

TM

TM

# Transition probabilities at the group level and for subject number 1, respectively:
gamma_pop <- obtain_gamma(out_2st)
gamma_subj <- obtain_gamma(out_2st, level = "subject")
plot(gamma_pop, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
plot(gamma_subj, subj_nr = 1, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))

TM group level

TM subject1

5、Viterbi algorithm decoding stata seq

根据模型结果，并提供观测结果，利用维特比算法，推测每次观测最有可能的隐状态序列

state_seq <- vit_mHMM(out_2st, s_data = nonverbal)
head(state_seq)
head(nonverbal)

result

网友评论

收入即学习

本文标题：HMM R包学习之 mHMMbayes

本文链接：https://www.haomeiwen.com/subject/mrqdbktx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

HMM R包学习之 mHMMbayes

mHMMbayes包特点

1、Multiple individuals simultaneously

2、Multivariate data with a categorical distribution

3、Bayesian estimation to parameter

4、Viterbi algorithm

R包分析实战

0、了解示例数据

2、初始值设置

3、Fitting the model

4、Visualization

5、Viterbi algorithm decoding stata seq

相关文章

HMM R包学习之 mHMMbayes

学习小组笔记Day6-Jocelyn

《学习小组Day6笔记--寒鹤》

R包学习之magrittr

R包学习之broom

学习小组Day5笔记--jam

学习小组Day6笔记——学生小幸

学习小组DAY6--帅炸

《学习小组day6笔记--（解杰）》

学习小组Day6笔记-田熙

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

收入即学习