美文网首页收入即学习
HMM R包学习之 mHMMbayes

HMM R包学习之 mHMMbayes

作者: 小贝学生信 | 来源:发表于2020-11-10 00:45 被阅读0次

参考官方教程:https://cran.r-project.org/web/packages/mHMMbayes/vignettes/tutorial-mhmm.html

mHMMbayes包特点

1、Multiple individuals simultaneously

  • Using a multilevel framework, we allow for heterogeneity in the model parameters (transition probability matrix and conditional distribution), while estimating one overall HMM.
  • 即可同时对多组的观测数据进行预测。不仅会预测每组的参数,而且也能得出总体的HMM参数

2、Multivariate data with a categorical distribution

  • 即允许有多类隐状态预测值,进行综合分析;
  • 例如天气hidden stata对应 天气潮湿情况、雨伞销量,骑车人数等多类观测结果。
  • 但是至少对于mHMMbayes包而言,这些观测结果必须为分类资料

3、Bayesian estimation to parameter

  • Parameters are estimated using Bayesian estimation utilizing the forward-backward recursion within a hybrid Metropolis within Gibbs sampler
  • In the package mHMMbayes, authors chose to use Bayesian estimation because of its flexibility, which we will require in the multilevel framework of the model.

4、Viterbi algorithm

  • The package also includes a function to simulate data and a function to obtain the most likely hidden state sequence for each individual using the Viterbi algorithm.

R包分析实战

0、了解示例数据

#install.packages("mHMMbayes")
library(mHMMbayes)
data(nonverbal)
head(nonverbal)
table(nonverbal[,1])

如下图,示例数据是15分钟内,10对医患非言语的交流情况观测。一秒钟记录一次,共900次观测,共有四类观测。

  • p_verbalizing: verbalizing behavior of the patient, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
  • p_looking: looking behavior of the patient, consisting of 1 = not looking at therapist, 2 = looking at therapist.
  • t_verbalizing: verbalizing behavior of the therapist, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
  • t_looking: looking behavior of the therapist, consisting of 1 = not looking at patient, 2 = looking at patient. The top 6 rows of the dataset are provided below.
data

2、初始值设置

  • 虽然设置的初始值在后续会使用Bayesian estimation调整优化;
  • 最好还是根据对数据分布的理解以及预期结果,设置合适的初始值;
  • 因为Using sensible starting values increases convergence speed, and often prevents a problem called ‘label switching’.

Label Switching:one should check if the algorithm reaches the same solution when a set of different (but often conceptually similar) starting values are used

#the number of states used
m <- 2
#the number of dependent variables in the dataset used to infer the hidden states
n_dep <- 4
#the number of categorical outcomes for each of the dependent variables
q_emiss <- c(3, 2, 3, 2)
#the transition probability matrix
start_TM <- diag(.8, m)
start_TM[lower.tri(start_TM) | upper.tri(start_TM)] <- .2
#the emission distribution(s)
start_EM <- list(matrix(c(0.05, 0.90, 0.05, 
                          0.90, 0.05, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[1]), # vocalizing patient
                 matrix(c(0.1, 0.9, 
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[2]), # looking patient
                 matrix(c(0.90, 0.05, 0.05, 
                          0.05, 0.90, 0.05), byrow = TRUE,
                        nrow = m, ncol = q_emiss[3]), # vocalizing therapist
                 matrix(c(0.1, 0.9, 
                          0.1, 0.9), byrow = TRUE, nrow = m,
                        ncol = q_emiss[4])) # looking therapist
  • 如上start_EM为一个list,体现了Multivariate的特点

3、Fitting the model

# Run a model without covariate(s) and default priors:
set.seed(14532)
out_2st <- mHMM(s_data = nonverbal, 
                gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss), 
                start_val = c(list(start_TM), start_EM),
                mcmc = list(J = 1000, burn_in = 200))
#01:10:10 耗时比较久
  • The arguments needed for the MCMC algorithm are given in mcmc: J specifies the number of iterations used by the hybrid metropolis within Gibbs algorithm and burn_in specifies the number of iterations to discard when obtaining the model parameter summary statistics.
out_2st
out_2st
#总体参数
summary(out_2st)
  • 可以根据下图的结果,视图解释两个stata的意义


    summary(out_2st)
#想看看对每个subject的预测参数情况的话
obtain_gamma(out_2st, level = "subject")
obtain_gamma(out_2st, level = "subject")

4、Visualization

可分别对TM、EM矩阵进行可视化

  • EM
library(RColorBrewer)
Voc_col <- c(brewer.pal(3,"PuBuGn")[c(1,3,2)])
Voc_lab <- c("Not Speaking", "Speaking", "Back channeling")

plot(out_2st, component = "emiss", dep = 1, col = Voc_col, 
     parameter = "emiss", dep_lab = c("Patient vocalizing"), cat_lab = Voc_lab)

如下图,对第一类观测结果p_vocalization的结果可视化。虚线代表10个subject;实线代表总体值


TM
  • TM
# Transition probabilities at the group level and for subject number 1, respectively:
gamma_pop <- obtain_gamma(out_2st)
gamma_subj <- obtain_gamma(out_2st, level = "subject")
plot(gamma_pop, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
plot(gamma_subj, subj_nr = 1, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
TM group level
TM subject1

5、Viterbi algorithm decoding stata seq

  • 根据模型结果,并提供观测结果,利用维特比算法,推测每次观测最有可能的隐状态序列
state_seq <- vit_mHMM(out_2st, s_data = nonverbal)
head(state_seq)
head(nonverbal)
result

相关文章

网友评论

    本文标题:HMM R包学习之 mHMMbayes

    本文链接:https://www.haomeiwen.com/subject/mrqdbktx.html