Lasko et al. (2013)
1. Procedure:
Input (sparse, noisy, irregularly sampled observations of serum uric acid concentration) -> Gaussian process regression (transforms the raw data into a continuous longitudinal probability density) -> Autoencoder (feature learning on 30-day contiguous segments of the input vector) -> Output (learned features/phenotypes of the first and second layers)
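The first two stages of this pipeline can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes a plain squared-exponential kernel with hand-picked hyperparameters (`length`, `var`, `noise` are all hypothetical values), computes only the posterior mean rather than the full density, and uses made-up observation times and values. The helper names `rbf`, `solve`, `gp_posterior_mean`, and `windows` are introduced here for illustration only.

```python
import math

def rbf(a, b, length=15.0, var=1.0):
    """Squared-exponential covariance between two time points (days).
    Kernel choice and hyperparameters are assumptions for this sketch."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def gp_posterior_mean(t_obs, y_obs, t_new, noise=0.25):
    """GP posterior mean at t_new, using the sample mean as a constant prior mean."""
    mu = sum(y_obs) / len(y_obs)
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(t_obs)]
         for i, a in enumerate(t_obs)]
    alpha = solve(K, [y - mu for y in y_obs])
    return [mu + sum(rbf(t, a) * w for a, w in zip(t_obs, alpha))
            for t in t_new]

def windows(series, width=30):
    """All contiguous segments of `width` days from a daily series."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]

# Hypothetical irregular observations: (day, uric acid value)
t_obs = [0, 4, 19, 33, 58]
y_obs = [6.1, 7.4, 8.0, 6.9, 5.5]

daily = gp_posterior_mean(t_obs, y_obs, range(60))  # one value per day
patches = windows(daily)                            # autoencoder inputs
print(len(daily), len(patches), len(patches[0]))
```

Each element of `patches` is a 30-day segment of the kind the autoencoder would be trained on; the autoencoder itself is left out of this sketch.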
2. Phenotypes (uric acid concentration over a 30-day span):
a) Phenotypes of the first layer: W_i, each row of the weight matrix.
b) Types of features in the first layer: uphill/downhill, single/multiple-spot, short-/long-edge, mixed.
c) Phenotypes of the second layer: nonlinear combinations of first-layer features.

3. Evaluation:
a) Face validity: The learned features are continuous even though continuity was not explicitly enforced. However, the regularization and sparsity constraints appear to be required for this continuity (this reading may be wrong).
b) Population subtypes: Besides separating the gout and leukemia phenotypes, the learned feature sets (first-layer and second-layer features, in contrast to expert-engineered features) show additional cluster structure when embedded into two-dimensional space with t-SNE (visualized as clusters).
c) Generalized discrimination performance for distinguishing gout from leukemia: logistic regression classifiers trained on four different feature sets: 1) first-layer; 2) second-layer; 3) expert-engineered; 4) sequence mean (baseline).
Harutyunyan et al. (2017)
1. Procedure:
Input (time-series clinical observations (e.g., capillary refill rate, blood pressure, etc.) from the ICU stays of 40,000 critical care patients; neonatal and pediatric patients and patients with multiple ICU stays are excluded) -> Output (predicted vector of binary phenotype labels)
2. Phenotypes:
25 common diseases, each classified as chronic, acute, or mixed.

x_t: clinical observations at hour t
p_1:k: vector of k binary phenotype labels. The phenotype vector is predicted only at the last timestep T.
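As a rough illustration of what "a vector of k binary labels predicted at the last timestep" can look like, the sketch below maps a final hidden state to k independent probabilities and thresholds them. The per-label sigmoid head, the 0.5 threshold, and all numbers (`h_T`, `W`, `b`) are assumptions made for this example; the recurrent network that would produce `h_T` is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def phenotype_probs(h_T, W, b):
    """One independent probability per phenotype, computed from the
    hidden state at the last timestep T (hypothetical linear head)."""
    return [sigmoid(sum(w * h for w, h in zip(row, h_T)) + b_j)
            for row, b_j in zip(W, b)]

# Hypothetical values: hidden size 3, k = 2 phenotypes
h_T = [0.5, -1.0, 0.25]
W = [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.0]]
b = [0.0, 0.0]

probs = phenotype_probs(h_T, W, b)
labels = [int(p >= 0.5) for p in probs]  # binary phenotype vector
print(probs, labels)
```

Because each label gets its own sigmoid, a patient can be assigned several phenotypes at once, which is what distinguishes this multi-label setup from a single softmax over diseases.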
3. Evaluation:
Multitask LSTM vs. single-task models (logistic regression with hand-engineered features, and a single-task LSTM)
Ho et al. (2014)
1. Procedure:
Input (counts of co-occurrences of clinical measurements across modes (patients × procedures × diagnoses)) -> Marble (non-negative Poisson tensor decomposition of the data) -> Output (tensor V, which is used to define R candidate phenotypes; M = [C, V])
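The input to this step is a count tensor. A minimal sketch of how such a tensor could be assembled from raw co-occurrence records is shown below, stored sparsely as a dictionary of non-zero entries. The record format, the identifiers, and the helper name `build_count_tensor` are hypothetical; the decomposition itself is not implemented here.

```python
from collections import Counter

def build_count_tensor(records):
    """Turn (patient, procedure, diagnosis) records into a sparse
    3-mode count tensor {(i, j, k): count} plus the index maps."""
    modes = [sorted({r[m] for r in records}) for m in range(3)]
    index = [{name: i for i, name in enumerate(mode)} for mode in modes]
    tensor = Counter(
        (index[0][p], index[1][proc], index[2][dx])
        for p, proc, dx in records
    )
    return dict(tensor), index

# Hypothetical co-occurrence records
records = [
    ("patient_1", "proc_a", "dx_x"),
    ("patient_1", "proc_a", "dx_x"),
    ("patient_1", "proc_b", "dx_y"),
    ("patient_2", "proc_a", "dx_y"),
]

tensor, index = build_count_tensor(records)
print(tensor)
```

Only non-zero cells are stored, which suits EHR data where most patient/procedure/diagnosis combinations never occur. Counts of this kind are the usual motivation for a Poisson rather than a Gaussian model.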
2. Evaluation:
Similarity of non-zeros between the computed solution and the actual solution
a) Simulated dataset
b) Realistic EHR dataset
Glicksberg et al. (2018)
1. Procedure:
Input (diseases (ICD-9 codes), procedures, lab tests, and medications) -> word2vec (treating the sequence of medical concepts within a time interval as a sentence) -> clinical embeddings -> extract disease cohorts for each patient and compute the distance for each disease (queried by medical concepts) -> average of all of the distances
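The last two steps (distance to a queried concept, then averaging per patient) might look like the sketch below. It assumes cosine distance, which the notes above do not specify, and uses tiny hand-made 2-dimensional vectors in place of trained word2vec embeddings. All concept names, patient IDs, and the helper `patient_score` are hypothetical.

```python
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def patient_score(concepts, query, embeddings):
    """Average distance from a patient's concepts to the query concept."""
    q = embeddings[query]
    dists = [cosine_distance(embeddings[c], q) for c in concepts]
    return sum(dists) / len(dists)

# Toy stand-ins for learned clinical embeddings
embeddings = {
    "dx_gout": [1.0, 0.0],
    "lab_uric_acid": [2.0, 0.0],  # same direction as the query
    "proc_xray": [0.0, 3.0],      # orthogonal to the query
}
patients = {
    "patient_a": ["lab_uric_acid", "lab_uric_acid"],
    "patient_b": ["lab_uric_acid", "proc_xray"],
    "patient_c": ["proc_xray"],
}

scores = {pid: patient_score(c, "dx_gout", embeddings)
          for pid, c in patients.items()}
ranked = sorted(scores, key=scores.get)  # closest patients first
print(scores, ranked)
```

A smaller average distance means the patient's record sits closer to the queried disease in embedding space, so a cohort could be selected by thresholding or ranking these scores.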
