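Background:
genometools version: 0.4.1
Version 0.4.1 of genometools does not include the file genome.py, so `from genometools.expression import ExpGenome` fails with an error saying that ExpGenome cannot be found.
In addition, the loop in analysis.py shown below looks up g, which is a gene name (a string), while the keys of self._gene_indices are ExpGene objects. This raises a KeyError, so the gene name has to be extracted from each ExpGene object before the comparison.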
for j, gs in enumerate(self._gene_set_coll.gene_sets):
    for g in gs.genes:
        try:
            idx = self._gene_indices[g]
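The mismatch can be reproduced in isolation. The sketch below uses a hypothetical stand-in class, FakeGene, rather than the real ExpGene class from genometools; it only assumes that the object carries a name attribute and is hashed by identity.

```python
class FakeGene:
    """Hypothetical stand-in for an ExpGene-like object (not the genometools class)."""

    def __init__(self, name):
        self.name = name


valid_genes = [FakeGene("TP53"), FakeGene("BRCA1")]

# Keys are the objects themselves, as in the unpatched constructor.
indices_by_object = {gene: i for i, gene in enumerate(valid_genes)}

# Keys are the gene names, which is what the lookup loop expects.
indices_by_name = {gene.name: i for i, gene in enumerate(valid_genes)}


def lookup(indices, gene_name):
    """Return the index for gene_name, or None if the key is missing."""
    try:
        return indices[gene_name]
    except KeyError:
        return None


missing = lookup(indices_by_object, "TP53")  # string key, object keys: no match
found = lookup(indices_by_name, "TP53")      # string key, string keys: match
```

With object keys the string lookup finds nothing, while with name keys it returns the position of the gene.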
Solution:
1. Copy genome.py into your Python installation, under Lib/site-packages/genometools/expression/
2. Add the following line to genometools/expression/__init__.py so that the expression module exposes the contents of genome.py.
from .genome import ExpGenome
3. Edit Lib\site-packages\genometools\enrichment\analysis.py to fix the gene lookup error.
Change the following code:
def __init__(self,
             valid_genes: Iterable[str],
             gene_set_coll: GeneSetCollection):
    self._valid_genes = tuple(copy.deepcopy(valid_genes))
    self._gene_set_coll = copy.deepcopy(gene_set_coll)
    self._gene_indices = \
        dict([gene, i]
             for i, gene in enumerate(valid_genes))
    #for key,va in self._gene_indices.items():
to:
def __init__(self,
             valid_genes: Iterable[str],
             gene_set_coll: GeneSetCollection):
    self._valid_genes = tuple(copy.deepcopy(valid_genes))
    self._gene_set_coll = copy.deepcopy(gene_set_coll)
    self._gene_indices = \
        dict([gene.name, i]
             for i, gene in enumerate(valid_genes))
    #for key,va in self._gene_indices.items():
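If valid_genes may contain either plain strings or gene objects, a slightly more defensive way to build the index is possible. The helper below is only a sketch and is not part of genometools: build_gene_indices and the Gene stand-in are hypothetical names, and the only assumption is that gene objects expose a name attribute.

```python
from typing import Dict, Iterable


def build_gene_indices(valid_genes: Iterable[object]) -> Dict[str, int]:
    """Map gene names to positions, accepting strings or objects with .name."""
    indices = {}
    for i, gene in enumerate(valid_genes):
        # Fall back to the item itself when it has no .name (already a string).
        name = getattr(gene, "name", gene)
        indices[name] = i
    return indices


class Gene:
    """Hypothetical stand-in for an ExpGene-like object."""

    def __init__(self, name):
        self.name = name


from_objects = build_gene_indices([Gene("TP53"), Gene("BRCA1")])
from_strings = build_gene_indices(["TP53", "BRCA1"])
```

Both calls produce the same name-to-index mapping, so the later lookup by gene name works regardless of which form was passed in.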
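Update:
In the Scripts directory, ensembl_extract_protein_coding_genes.py has been replaced by
ensembl_extract_protein_coding_genes.exe
When running the code, the command can be changed as follows to resolve the Homo_sapiens.GRCh38.83.gtf.gz file download problem: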
!ensembl_extract_protein_coding_genes.exe -a "$ensembl_annotation_file" -o "$genome_file"
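Whether the entry point is installed as .py or .exe depends on the platform and on how the package was installed. As a sketch, the lookup can be done from Python with the standard library. find_script is a hypothetical helper, and the -a and -o options are the ones used in the command above.

```python
import shutil


def find_script(candidates):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


script = find_script([
    "ensembl_extract_protein_coding_genes.exe",
    "ensembl_extract_protein_coding_genes.py",
])

if script is None:
    print("script not found on PATH; check the Scripts directory")
else:
    # Placeholders: substitute your own annotation and output file paths.
    print(f'{script} -a "<annotation file>" -o "<genome file>"')
```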
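About the file protein_coding_genes_human_ensembl83.tsv: the generated file lacks the ensembl_id column, and the program fails at run time with an error about this missing attribute. To fix this, download protein_coding_genes_human_ensembl83.tsv from
https://github.com/flo-compbio/2016-python-gene-expression-workshop/tree/master/data
and merge the contents of the two files.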
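One possible way to do the merge is to copy the ensembl_id column from the downloaded file into the local one, matching rows on a shared column. The sketch below assumes that both files are tab-separated with a header row and share a gene-name column. The column name "name", the helper add_column, and the toy rows are all assumptions for illustration, so check the real headers before adapting it.

```python
import csv
import io


def add_column(local_tsv, reference_tsv, key, column):
    """Copy `column` from reference rows into local rows that share `key`."""
    reference = {
        row[key]: row[column]
        for row in csv.DictReader(io.StringIO(reference_tsv), delimiter="\t")
    }
    merged = []
    for row in csv.DictReader(io.StringIO(local_tsv), delimiter="\t"):
        # Rows without a match get an empty value rather than being dropped.
        row[column] = reference.get(row[key], "")
        merged.append(row)
    return merged


# Toy, made-up contents; the real files have different columns and values.
local = "name\tchromosome\nGENE_A\t1\nGENE_B\t2\n"
reference = "name\tensembl_id\nGENE_A\tID_A\n"

rows = add_column(local, reference, key="name", column="ensembl_id")
```

For the real files, read the text with open(...).read() and write the merged rows back out with csv.DictWriter using delimiter="\t".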