TCRGP is a novel Gaussian process method that can predict whether a TCR recognizes a given epitope. It can use different CDR sequences from both the TCRα and TCRβ chains in single-cell data and learn which CDRs are important for recognizing each epitope.
- It is well known that the CDR3β of a TCR is important in recognizing peptides presented to the T cell.
- We propose a method called TCRGP, which builds on non-parametric modelling using Gaussian process (GP) classification. The probabilistic formulation of GPs allows robust model inference even from small data sets, which is a great benefit because curated databases currently contain only a limited number of reported TCR-epitope interactions.
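To make GP classification on TCR sequences a little more concrete, here is a minimal sketch of a kernel (covariance function) between CDR3 sequences. Each sequence is one-hot encoded, padded at the end to a fixed length `lmax`, and compared with a squared-exponential (RBF) kernel. This encoding and kernel are illustrative assumptions, not TCRGP's actual implementation, and the names `one_hot`, `rbf_kernel` and `gram_matrix` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq, lmax):
    """One-hot encode seq, padding with all-zero rows at the end up to lmax residues."""
    if len(seq) > lmax:
        raise ValueError("sequence %r is longer than lmax=%d" % (seq, lmax))
    vec = []
    for i in range(lmax):
        row = [0.0] * len(AMINO_ACIDS)
        if i < len(seq):
            row[AMINO_ACIDS.index(seq[i])] = 1.0  # raises ValueError for non-standard letters
        vec.extend(row)
    return vec


def rbf_kernel(seq_a, seq_b, lmax, lengthscale=2.0, variance=1.0):
    """Squared-exponential kernel on the one-hot encodings of two sequences."""
    a, b = one_hot(seq_a, lmax), one_hot(seq_b, lmax)
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)


def gram_matrix(seqs, lmax, **kernel_args):
    """Covariance matrix of a GP prior over the given sequences."""
    return [[rbf_kernel(s, t, lmax, **kernel_args) for t in seqs] for s in seqs]
```

Sequences that differ at more positions get a smaller kernel value, so a GP classifier with this prior treats them as less likely to share a label; the classifier then maps the latent GP through a link function to obtain recognition probabilities.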
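1. Download
Install TensorFlow and GPflow.
If your network connection is slow, you can install them from a mirror.
Download the whole repository from GitHub as a zip and extract it.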
ATTENTION!
You must match the versions below. To use TCRGP, you will need to have
- TensorFlow (We have used version 1.8.0), which goes with Python 3.6 (preferably installed in a dedicated virtual environment)
- GPflow (We have used version 1.1.1)
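Because TCRGP is sensitive to these versions, a quick check inside the environment can save debugging time. The sketch below compares the running Python and the installed `tensorflow` and `gpflow` versions with the ones listed above; the helper name `check_environment` is hypothetical, and it assumes both packages expose a `__version__` attribute.

```python
import importlib
import sys

# Versions reported by the TCRGP authors in the requirement list above.
EXPECTED = {"tensorflow": "1.8.0", "gpflow": "1.1.1"}


def check_environment(expected=EXPECTED, python=(3, 6)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != python:
        problems.append(
            "Python %d.%d found, %d.%d expected" % (sys.version_info[:2] + python)
        )
    for module_name, wanted in expected.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            problems.append("%s is not installed" % module_name)
            continue
        found = getattr(module, "__version__", "unknown")
        if found != wanted:
            problems.append("%s %s found, %s expected" % (module_name, found, wanted))
    return problems
```

Running `print(check_environment())` in a notebook cell lists anything that does not match.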
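To use a conda virtual environment in Jupyter Notebook on Linux, I used method two.
After installing, run jupyter notebook in the corresponding conda environment, then open the URL it prints.
Watch out for the error ImportError: cannot import name 'secure_write'.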
2. Import
Set Jupyter Notebook's working directory to the folder where TCRGP was extracted:
%pwd  # show the current path
%cd  # change the path to the TCRGP folder
import tcrgp
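The `%pwd`/`%cd` magics only work in IPython. A plain-Python equivalent is sketched below: it changes into the extracted folder and also puts it on `sys.path`, so `import tcrgp` keeps working even if the working directory changes later. The helper `use_tcrgp_dir` is hypothetical, and it assumes the repository root contains either `tcrgp.py` or a `tcrgp/` package.

```python
import os
import sys


def use_tcrgp_dir(tcrgp_dir):
    """Make `import tcrgp` work from the extracted TCRGP folder (hypothetical helper)."""
    tcrgp_dir = os.path.abspath(os.path.expanduser(tcrgp_dir))
    has_module = os.path.isfile(os.path.join(tcrgp_dir, "tcrgp.py")) or os.path.isfile(
        os.path.join(tcrgp_dir, "tcrgp", "__init__.py")
    )
    if not has_module:
        raise FileNotFoundError("no tcrgp module found in %s" % tcrgp_dir)
    os.chdir(tcrgp_dir)  # same effect as %cd in the notebook
    if tcrgp_dir not in sys.path:
        sys.path.insert(0, tcrgp_dir)  # import no longer depends on the current directory
    return tcrgp_dir
```

Call it with your own extraction path before `import tcrgp`; failing early with `FileNotFoundError` is clearer than a later `ModuleNotFoundError`.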
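Note: when installing packages for the virtual environment, point the install location to that environment's path, e.g. pip3 install matplotlib -i https://pypi.tuna.tsinghua.edu.cn/simple/ -t /home/user/test/miniconda3/envs/tfpy3/lib/python3.6/site-packages
To check where a package is installed: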
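Instead of typing the site-packages path by hand, it can be derived from the interpreter that is currently running (for example the notebook kernel). The sketch below builds the same kind of `pip install -i … -t …` command using `sysconfig.get_paths()["purelib"]`; the helper name `pip_install_command` is hypothetical.

```python
import sys
import sysconfig

MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple/"  # mirror used in the note above


def pip_install_command(package, index_url=MIRROR):
    """Build a pip command that installs into the environment running this Python."""
    site_packages = sysconfig.get_paths()["purelib"]  # this env's site-packages
    return [sys.executable, "-m", "pip", "install", package,
            "-i", index_url, "-t", site_packages]
```

Running `subprocess.check_call(pip_install_command("matplotlib"))` from the kernel would install into that kernel's environment. Invoking pip as `sys.executable -m pip` already ties it to the right interpreter, so `-t` is kept mainly to mirror the original command.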
import tensorflow
print(tensorflow.__path__)
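Importing TensorFlow just to find its location is slow and fails if the import itself is broken. The sketch below uses `importlib.util.find_spec`, which locates a top-level package without executing it; the helper name `package_location` is hypothetical.

```python
import importlib.util
import os


def package_location(name):
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not installed in this environment
    if spec.submodule_search_locations:  # a package: use its first directory
        return list(spec.submodule_search_locations)[0]
    if spec.origin and os.path.isfile(spec.origin):  # a single-file module
        return os.path.dirname(spec.origin)
    return None  # built-in or frozen module with no file on disk
```

For example, `package_location("tensorflow")` should point into the `site-packages` of the active environment if the installation went where you intended.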
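3. Obtaining the training set
As shown in the figure, export a TSV file and filter it further:
- a confidence score of at least 1
- the species you need, e.g. mouse
- the HLA subtype you need: for human, e.g. HLA-A*02; for mouse, filter for the strain you need
- only epitopes that are recognized by at least 50 TCRβ sequences
Export the result as a TSV file.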
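The filters above can also be applied in code after exporting. The sketch below assumes a VDJdb-style TSV with columns named `Gene`, `Species`, `MHC A`, `Epitope` and `Score`, and species values such as `MusMusculus`; these column names and values are assumptions, so check them against the header of your own file. The function name `filter_training_set` is hypothetical, and it counts rows per epitope rather than unique CDR3s.

```python
import csv
from collections import Counter


def filter_training_set(tsv_lines, species="MusMusculus", mhc_prefix=None,
                        min_score=1, min_tcrb_per_epitope=50):
    """Keep TRB rows passing the filters; return (rows, rows-per-epitope counts)."""
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    rows = [
        r for r in reader
        if r["Gene"] == "TRB"                      # TCRβ chains only
        and r["Species"] == species
        and int(r["Score"]) >= min_score           # confidence score filter
        and (mhc_prefix is None or r["MHC A"].startswith(mhc_prefix))
    ]
    counts = Counter(r["Epitope"] for r in rows)
    keep = {ep for ep, n in counts.items() if n >= min_tcrb_per_epitope}
    rows = [r for r in rows if r["Epitope"] in keep]  # drop rare epitopes
    return rows, Counter(r["Epitope"] for r in rows)
```

A human HLA-A*02 set could be obtained with `filter_training_set(f, species="HomoSapiens", mhc_prefix="HLA-A*02")`, where `f` is the exported file opened with `newline=""`.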
4. Model building and validation
Follow the reference.
Validating the model after building it:
Leave-one-subject-out cross-validation
Leave-one-subject-out (LOSO) cross-validation can be used to evaluate the performance of the model.
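The idea of LOSO is that all TCRs from one subject (donor) are held out together, so the model is never tested on a subject it has seen during training. The generic split below is a minimal sketch; the function name is hypothetical, and it does not cover how TCRGP itself handles control TCRs that have no subject label.

```python
from collections import defaultdict


def leave_one_subject_out(subjects):
    """Yield (held_out_subject, train_idx, test_idx) for each distinct subject.

    subjects[i] is the subject that sample i came from.
    """
    by_subject = defaultdict(list)
    for i, s in enumerate(subjects):
        by_subject[s].append(i)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        test_set = set(test_idx)
        train_idx = [i for i in range(len(subjects)) if i not in test_set]
        yield held_out, train_idx, test_idx
```

Each fold trains on `train_idx` and scores `test_idx`; averaging the per-fold metric gives the LOSO estimate.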
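When predicting, note the following:
- adjust lmax3, i.e. the maximum CDR3 length in amino acids
- in V-gene entries such as TRBV12-2+TRBV13-2, change the + to ;
- use a threshold of >0.85 or 0.9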
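The three notes above can be handled with small helpers before and after calling the model. In this sketch, `lmax_from`, `normalize_v_gene` and `call_binders` are hypothetical names; treating the threshold as a cutoff on predicted recognition probabilities is my assumption about what the 0.85/0.9 value refers to.

```python
def lmax_from(cdr3_seqs):
    """lmax3: the longest CDR3 (in amino acids) the model must accommodate."""
    return max(len(s) for s in cdr3_seqs)


def normalize_v_gene(v):
    """Rewrite multi-gene V calls such as 'TRBV12-2+TRBV13-2' to use ';' as separator."""
    return v.replace("+", ";")


def call_binders(probabilities, threshold=0.85):
    """Return indices whose predicted probability is strictly above the threshold."""
    return [i for i, p in enumerate(probabilities) if p > threshold]
```

Compute `lmax3` over both the training and the prediction CDR3s, so that no sequence to be scored is longer than the model expects.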
I will look into the details further and update this post with anything that needs adding.