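GetOrganelle is a plastid genome assembly tool developed by Jian-Jun Jin (金建军) and Wen-Bin Yu (郁文彬) at the Kunming Institute of Botany, Chinese Academy of Sciences. It is mainly used to assemble complete organelle genomes from genome sequencing data and is particularly good at plant plastid genomes.
It calls external software including SPAdes, Bowtie2, BLAST+, and Bandage. See the software's official site for more detail.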
Installation
I'm not used to installing through conda, so I followed the non-conda route:
## Download the GetOrganelle package
curl -L https://github.com/Kinggerm/GetOrganelle/archive/1.7.4.1.tar.gz | tar zx
## Download the bundled dependencies
curl -L https://github.com/Kinggerm/GetOrganelleDep/releases/download/v1.7.0/v1.7.0-linux.tar.gz | tar zx
The bundled dependencies are SPAdes, Bowtie2, and BLAST.
## Try the installation:
cd GetOrganelle-1.7.4.1
python setup.py install
This produced the following error:
The following error occurred while trying to add or remove files in the installation directory:
[Errno 13] Permission denied: '/build/Cellar/anaconda2/lib/python2.7/site-packages/test-easy-install-367240.write-test'
The installation directory you specified (via --install-dir, --prefix, or the distutils default setting) was: /build/Cellar/anaconda2/lib/python2.7/site-packages/
## No write permission in the default directory, so install under my own folder instead:
python setup.py install --prefix=/my/file
This produced another error:
error: bad install directory or PYTHONPATH
* You can choose a different installation directory, i.e., one that is on PYTHONPATH or supports .pth files
* You can add the installation directory to the PYTHONPATH environment variable. (It must then also be on PYTHONPATH whenever you run Python and want to use the package(s) you are installing.)
* You can set up the installation directory to support ".pth" files by using one of the approaches described here:
https://setuptools.readthedocs.io/en/latest/easy_install.html#custom-installation-locations
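Add the installation directory to the PYTHONPATH environment variable:
export PYTHONPATH="$PYTHONPATH:/my/file/"
Then install again:
python setup.py install --prefix=/my/file
This time it completed without errors. Afterwards, remember to add the bin directories of the dependencies and of GetOrganelle itself to PATH in your .bashrc.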
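The .bashrc step could look something like the sketch below. The variables `GETORG_PREFIX` and `GETORG_DEP` and the helper `prepend_path` are hypothetical names, and the paths are placeholders. Only the `SPAdes/bin` subfolder under `GetOrganelleDep/linux` is confirmed by the logs in this post, so check the Bowtie2 and BLAST folder names with `ls` before adding them the same way.

```shell
# Hypothetical locations: adjust to wherever you installed or unpacked things.
GETORG_PREFIX="/my/file"                    # the --prefix used for setup.py
GETORG_DEP="$HOME/GetOrganelleDep/linux"    # unpacked dependency bundle (assumed path)

prepend_path() {
    # Put directory $1 at the front of PATH, but only if it exists
    # and is not already listed.
    [ -d "$1" ] || return 1
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Report directories that do not exist instead of silently adding them.
for d in "$GETORG_PREFIX/bin" "$GETORG_DEP/SPAdes/bin"; do
    prepend_path "$d" || echo "skipped (not found): $d" >&2
done
```

Checking for existence first keeps a mistyped path from ending up in PATH, which is an easy mistake to overlook when editing .bashrc by hand.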
Test run
# Download the example files:
## Download the reference sequence databases:
get_organelle_config.py --add embplant_pt,embplant_mt
## Download the resequencing reads (fq files):
wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.1.fq.gz
wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.2.fq.gz
## Assemble the chloroplast genome
get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10
Parameters explained:
# -1 Arabidopsis_simulated.1.fq.gz Input file with the forward paired-end reads (*.fq/.gz/.tar.gz)
# -2 Arabidopsis_simulated.2.fq.gz Input file with the reverse paired-end reads (*.fq/.gz/.tar.gz)
# -t 1 Maximum threads to use. Default: 1
# -o Arabidopsis_simulated.plastome Output directory
# -F embplant_pt Target organelle genome type(s)
# -R 10 Maximum extension rounds
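Before launching the run, it may be worth confirming that the two read files passed to `-1` and `-2` downloaded intact, since a broken download would only surface later as a confusing assembly error. The helper name `check_reads` is hypothetical; it relies only on the standard `gzip -t` integrity test.

```shell
check_reads() {
    # Return 0 only if every file given exists, is non-empty,
    # and passes gzip's integrity test.
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
        if ! gzip -t "$f" 2>/dev/null; then
            echo "not a valid gzip file: $f" >&2
            return 1
        fi
    done
}

# Example use with the files from this post:
# check_reads Arabidopsis_simulated.1.fq.gz Arabidopsis_simulated.2.fq.gz
```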
The assembly failed with these errors:
......
2024-06-08 19:04:25,434 - ERROR: sympy/scipy not available! Disentangling disabled!!
......
2024-06-08 17:47:03,893 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-core', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq.spades/K45/configs/config.info']" finished abnormally, OS return value: 1
2024-06-08 17:47:03,894 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.
......
2024-06-08 17:47:17,892 - WARNING: Compression after read correction will be skipped for lack of 'pigz'
2024-06-08 17:47:17,893 - INFO: spades.py -t 1 --disable-gzip-output --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades
2024-06-08 17:47:18,805 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-hammer', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/extended_spades/corrected/configs/config.info']" finished abnormally, OS return value: 1
2024-06-08 17:47:18,806 - ERROR: Assembling failed.
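The log is long, and the three separate problems above (sympy/scipy, pigz, SPAdes) are easy to miss among the INFO lines. A small filter like the one below pulls out just the ERROR and WARNING entries. The helper name `log_problems` is hypothetical, and the log file name in the usage comment (`get_org.log.txt` inside the output directory) is an assumption; check what your output directory actually contains.

```shell
log_problems() {
    # Print every ERROR or WARNING line from the log file $1,
    # each prefixed with its line number in the log.
    grep -n -E ' - (ERROR|WARNING): ' "$1"
}

# Example use (log file name is an assumption):
# log_problems Arabidopsis_simulated.plastome/get_org.log.txt
```

The pattern matches the `- LEVEL: ` layout seen in the excerpts above, so it would need adjusting if the log format differs.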
## Install sympy and scipy
pip install sympy scipy --prefix=/my/folder2
Requirement already satisfied: sympy in /build/Cellar/anaconda2/lib/python2.7/site-packages (1.3)
Requirement already satisfied: scipy in /build/Cellar/anaconda2/lib/python2.7/site-packages (1.2.1)
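pip reports that both libraries are already installed, yet the run still prints: 2024-06-08 19:04:25,434 - ERROR: sympy/scipy not available! Disentangling disabled!!
The likely cause is the earlier change to PYTHONPATH. However, if I undo the earlier export PYTHONPATH, a new error appears: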
Traceback (most recent call last):
File "/mnt/ge-jbod/zhanghongxiang/software/GetOrganelle/GetOrganelle-1.7.4.1/bin/get_organelle_from_reads.py", line 12, in <module>
import GetOrganelleLib
ImportError: No module named GetOrganelleLib
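The fix is to put both directories on PYTHONPATH at the same time, i.e. the GetOrganelle install directory and the directory holding sympy and scipy: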
export PYTHONPATH="/path/to/folder1:/path/to/folder2"
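To confirm that the combined PYTHONPATH really exposes all three modules before starting another long run, a quick import check can help. This is a sketch: `add_pythonpath` and `check_imports` are hypothetical helper names, and the folder paths in the usage comments are the same placeholders as above.

```shell
add_pythonpath() {
    # Prepend directory $1 to PYTHONPATH without leaving a stray colon
    # when PYTHONPATH starts out empty or unset.
    PYTHONPATH="$1${PYTHONPATH:+:$PYTHONPATH}"
    export PYTHONPATH
}

check_imports() {
    # $1 is the Python interpreter; the remaining arguments are module names.
    # Prints one status line per module; returns non-zero if any import fails.
    py="$1"; shift
    rc=0
    for m in "$@"; do
        if "$py" -c "import $m" 2>/dev/null; then
            echo "ok: $m"
        else
            echo "MISSING: $m"
            rc=1
        fi
    done
    return "$rc"
}

# Example use with the placeholder folders from this post:
# add_pythonpath /path/to/folder1
# add_pythonpath /path/to/folder2
# check_imports python GetOrganelleLib sympy scipy
```

Pass the same interpreter that runs `get_organelle_from_reads.py`, since a different Python may see a different set of packages.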
## Install pigz
wget https://github.com/madler/pigz/archive/refs/heads/master.zip
unzip master.zip
cd pigz-master
make
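As far as I can tell, `make` only builds the pigz binary inside `pigz-master` and does not install it anywhere, so that directory still has to be on PATH for the warning to go away. The sketch below uses a hypothetical helper `missing_tools` to list which of the external commands are still not reachable; the pigz path in the usage comment is a placeholder.

```shell
missing_tools() {
    # Print the name of each command in the argument list that
    # cannot be found on PATH; print nothing if all are found.
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || echo "$t"
    done
}

# Example use (the pigz path is a placeholder):
# export PATH="/path/to/pigz-master:$PATH"
# missing_tools pigz spades.py bowtie2 blastn
```

An empty result means every listed command was found.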
Running it again still fails:
......
2024-06-08 18:44:42,484 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-core', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq.spades/K45/configs/config.info']" finished abnormally, OS return value: 1
2024-06-08 18:44:42,485 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.
......
2024-06-08 18:44:57,031 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-hammer', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/extended_spades/corrected/configs/config.info']" finished abnormally, OS return value: 1
2024-06-08 18:44:57,032 - ERROR: Assembling failed.
Some searching suggested the problem lies with SPAdes. Someone on GitHub reported that changing a single command was enough:
I asked server administrator and showed him my scripts, then it run successfully by removimg "srun" out from my code. 
That seemed like too much trouble, so I replaced SPAdes 3.15.4 with 3.15.3 instead, and the next run no longer reported errors.
2024-06-08 19:05:28,351 - INFO: Slimming Arabidopsis_simulated.plastome/extended_spades/K115/assembly_graph.fastg finished!
2024-06-08 19:05:28,352 - INFO: Slimming assembly graphs finished.
These are informal notes from my own learning process, and I hope they help. If they did, a like or a follow would be much appreciated as encouragement.