美文网首页收入即学习
HMM R包学习之 mHMMbayes

HMM R包学习之 mHMMbayes

作者: 小贝学生信 | 来源:发表于2020-11-10 00:45 被阅读0次

    参考官方教程:https://cran.r-project.org/web/packages/mHMMbayes/vignettes/tutorial-mhmm.html

    mHMMbayes包特点

    1、Multiple individuals simultaneously

    • Using a multilevel framework, we allow for heterogeneity in the model parameters (transition probability matrix and conditional distribution), while estimating one overall HMM.
    • 即可同时对多组的观测数据进行预测。不仅会预测每组的参数,而且也能得出总体的HMM参数

    2、Multivariate data with a categorical distribution

    • 即允许有多类隐状态预测值,进行综合分析;
    • 例如天气hidden stata对应 天气潮湿情况、雨伞销量,骑车人数等多类观测结果。
    • 但是至少对于mHMMbayes包而言,这些观测结果必须为分类资料

    3、Bayesian estimation to parameter

    • Parameters are estimated using Bayesian estimation utilizing the forward-backward recursion within a hybrid Metropolis within Gibbs sampler
    • In the package mHMMbayes, authors chose to use Bayesian estimation because of its flexibility, which we will require in the multilevel framework of the model.

    4、Viterbi algorithm

    • The package also includes a function to simulate data and a function to obtain the most likely hidden state sequence for each individual using the Viterbi algorithm.

    R包分析实战

    0、了解示例数据

    #install.packages("mHMMbayes")
    library(mHMMbayes)
    data(nonverbal)
    head(nonverbal)
    table(nonverbal[,1])
    

    如下图,示例数据是15分钟内,10对医患非言语的交流情况观测。一秒钟记录一次,共900次观测,共有四类观测。

    • p_verbalizing: verbalizing behavior of the patient, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
    • p_looking: looking behavior of the patient, consisting of 1 = not looking at therapist, 2 = looking at therapist.
    • t_verbalizing: verbalizing behavior of the therapist, consisting of 1 = not verbalizing, 2 = verbalizing, 3 = back channeling.
    • t_looking: looking behavior of the therapist, consisting of 1 = not looking at patient, 2 = looking at patient. The top 6 rows of the dataset are provided below.
    data

    2、初始值设置

    • 虽然设置的初始值在后续会使用Bayesian estimation调整优化;
    • 最好还是根据对数据分布的理解以及预期结果,设置合适的初始值;
    • 因为Using sensible starting values increases convergence speed, and often prevents a problem called ‘label switching’.

    Label Switching:one should check if the algorithm reaches the same solution when a set of different (but often conceptually similar) starting values are used

    #the number of states used
    m <- 2
    #the number of dependent variables in the dataset used to infer the hidden states
    n_dep <- 4
    #the number of categorical outcomes for each of the dependent variables
    q_emiss <- c(3, 2, 3, 2)
    #the transition probability matrix
    start_TM <- diag(.8, m)
    start_TM[lower.tri(start_TM) | upper.tri(start_TM)] <- .2
    #the emission distribution(s)
    start_EM <- list(matrix(c(0.05, 0.90, 0.05, 
                              0.90, 0.05, 0.05), byrow = TRUE,
                            nrow = m, ncol = q_emiss[1]), # vocalizing patient
                     matrix(c(0.1, 0.9, 
                              0.1, 0.9), byrow = TRUE, nrow = m,
                            ncol = q_emiss[2]), # looking patient
                     matrix(c(0.90, 0.05, 0.05, 
                              0.05, 0.90, 0.05), byrow = TRUE,
                            nrow = m, ncol = q_emiss[3]), # vocalizing therapist
                     matrix(c(0.1, 0.9, 
                              0.1, 0.9), byrow = TRUE, nrow = m,
                            ncol = q_emiss[4])) # looking therapist
    
    • 如上start_EM为一个list,体现了Multivariate的特点

    3、Fitting the model

    # Run a model without covariate(s) and default priors:
    set.seed(14532)
    out_2st <- mHMM(s_data = nonverbal, 
                    gen = list(m = m, n_dep = n_dep, q_emiss = q_emiss), 
                    start_val = c(list(start_TM), start_EM),
                    mcmc = list(J = 1000, burn_in = 200))
    #01:10:10 耗时比较久
    
    • The arguments needed for the MCMC algorithm are given in mcmc: J specifies the number of iterations used by the hybrid metropolis within Gibbs algorithm and burn_in specifies the number of iterations to discard when obtaining the model parameter summary statistics.
    out_2st
    
    out_2st
    #总体参数
    summary(out_2st)
    
    • 可以根据下图的结果,视图解释两个stata的意义


      summary(out_2st)
    #想看看对每个subject的预测参数情况的话
    obtain_gamma(out_2st, level = "subject")
    obtain_gamma(out_2st, level = "subject")
    

    4、Visualization

    可分别对TM、EM矩阵进行可视化

    • EM
    library(RColorBrewer)
    Voc_col <- c(brewer.pal(3,"PuBuGn")[c(1,3,2)])
    Voc_lab <- c("Not Speaking", "Speaking", "Back channeling")
    
    plot(out_2st, component = "emiss", dep = 1, col = Voc_col, 
         parameter = "emiss", dep_lab = c("Patient vocalizing"), cat_lab = Voc_lab)
    

    如下图,对第一类观测结果p_vocalization的结果可视化。虚线代表10个subject;实线代表总体值


    TM
    • TM
    # Transition probabilities at the group level and for subject number 1, respectively:
    gamma_pop <- obtain_gamma(out_2st)
    gamma_subj <- obtain_gamma(out_2st, level = "subject")
    plot(gamma_pop, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
    plot(gamma_subj, subj_nr = 1, col = rep(rev(brewer.pal(3,"PiYG"))[-2], each = m))
    
    TM group level
    TM subject1

    5、Viterbi algorithm decoding stata seq

    • 根据模型结果,并提供观测结果,利用维特比算法,推测每次观测最有可能的隐状态序列
    state_seq <- vit_mHMM(out_2st, s_data = nonverbal)
    head(state_seq)
    head(nonverbal)
    
    result

    相关文章

      网友评论

        本文标题:HMM R包学习之 mHMMbayes

        本文链接:https://www.haomeiwen.com/subject/mrqdbktx.html