参考官方教程:https://cran.r-project.org/web/packages/mHMMbayes/vignettes/tutorial-mhmm.html
mHMMbayes包特点
1、Multiple individuals simultaneously
- Using a multilevel framework, we allow for heterogeneity in the model parameters (transition probability matrix and conditional distribution), while estimating one overall HMM.
- 即可同时对多组的观测数据进行预测。不仅会预测每组的参数,而且也能得出总体的HMM参数
2、Multivariate data with a categorical distribution
- 即允许有多类隐状态预测值,进行综合分析;
- 例如天气hidden stata对应 天气潮湿情况、雨伞销量,骑车人数等多类观测结果。
- 但是至少对于mHMMbayes包而言,这些观测结果必须为分类资料
3、Bayesian estimation to parameter
- Parameters are estimated using Bayesian estimation utilizing the forward-backward recursion within a hybrid Metropolis within Gibbs sampler
- In the package mHMMbayes, authors chose to use Bayesian estimation because of its flexibility, which we will require in the multilevel framework of the model.
4、Viterbi algorithm
- The package also includes a function to simulate data and a function to obtain the most likely hidden state sequence for each individual using the Viterbi algorithm.
R包分析实战
0、了解示例数据
#install.packages("mHMMbayes")
library(mHMMbayes)
data(nonverbal)
head(nonverbal)
table(nonverbal[,1])
如下图,示例数据是15分钟内,10对医患非言语的交流情况观测。一秒钟记录一次,共900次观测,共有四类观测。
- p_verbalizing: verbalizing behavior of the patient, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
- p_looking: looking behavior of the patient, consisting of 1 = not looking at therapist, 2 = looking at therapist.
- t_verbalizing: verbalizing behavior of the therapist, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
- t_looking: looking behavior of the therapist, consisting of 1 = not looking at patient, 2 = looking at patient. The top 6 rows of the dataset are provided below.
2、初始值设置
- 虽然设置的初始值在后续会使用Bayesian estimation调整优化;
- 最好还是根据对数据分布的理解以及预期结果,设置合适的初始值;
- 因为Using sensible starting values increases convergence speed, and often prevents a problem called ‘label switching’.
Label Switching:one should check if the algorithm reaches the same solution when a set of different (but often conceptually similar) starting values are used
#the number of states used
m <- 2
#the number of dependent variables in the dataset used to infer the hidden states
n_dep <- 4
#the number of categorical outcomes for each of the dependent variables
q_emiss <- c(3, 2, 3, 2)
#the transition probability matrix
start_TM <- diag(.8, m)
start_TM[lower.tri(start_TM) | upper.tri(start_TM)] <- .2
#the emission distribution(s)
start_EM <- list(matrix(c(0.05, 0.90, 0.05,
0.90, 0.05, 0.05), byrow = TRUE,
nrow = m, ncol = q_emiss[1]), # vocalizing patient
matrix(c(0.1, 0.9,
0.1, 0.9), byrow = TRUE, nrow = m,
ncol = q_emiss[2]), # looking patient
matrix(c(0.90, 0.05, 0.05,
0.05, 0.90, 0.05), byrow = TRUE,
nrow = m, ncol = q_emiss[3]), # vocalizing therapist
matrix(c(0.1, 0.9,
0.1, 0.9), byrow = TRUE, nrow = m,
ncol = q_emiss[4])) # looking therapist
- 如上start_EM为一个list,体现了Multivariate的特点
3、Fitting the model
# Run a model without covariate(s) and default priors:
set.seed(14532)
out_2st <- mHMM(s_data = nonverbal,
gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss),
start_val = c(list(start_TM), start_EM),
mcmc = list(J = 1000, burn_in = 200))
#01:10:10 耗时比较久
- The arguments needed for the MCMC algorithm are given in mcmc: J specifies the number of iterations used by the hybrid metropolis within Gibbs algorithm and burn_in specifies the number of iterations to discard when obtaining the model parameter summary statistics.
out_2st
out_2st
#总体参数
summary(out_2st)
-
可以根据下图的结果,视图解释两个stata的意义
summary(out_2st)
#想看看对每个subject的预测参数情况的话
obtain_gamma(out_2st, level = "subject")
obtain_gamma(out_2st, level = "subject")
4、Visualization
可分别对TM、EM矩阵进行可视化
- EM
library(RColorBrewer)
Voc_col <- c(brewer.pal(3,"PuBuGn")[c(1,3,2)])
Voc_lab <- c("Not Speaking", "Speaking", "Back channeling")
plot(out_2st, component = "emiss", dep = 1, col = Voc_col,
parameter = "emiss", dep_lab = c("Patient vocalizing"), cat_lab = Voc_lab)
如下图,对第一类观测结果p_vocalization的结果可视化。虚线代表10个subject;实线代表总体值
TM
- TM
# Transition probabilities at the group level and for subject number 1, respectively:
gamma_pop <- obtain_gamma(out_2st)
gamma_subj <- obtain_gamma(out_2st, level = "subject")
plot(gamma_pop, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
plot(gamma_subj, subj_nr = 1, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
TM group level
TM subject1
5、Viterbi algorithm decoding stata seq
- 根据模型结果,并提供观测结果,利用维特比算法,推测每次观测最有可能的隐状态序列
state_seq <- vit_mHMM(out_2st, s_data = nonverbal)
head(state_seq)
head(nonverbal)
result
网友评论