Assembling a Chloroplast Genome with GetOrganelle


Author: 风知秋 | Published: 2024-06-07 23:32

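GetOrganelle is a plastome assembly tool developed jointly by Jian-Jun Jin (金建军) and Wen-Bin Yu (郁文彬) of the Kunming Institute of Botany, Chinese Academy of Sciences. It assembles complete organelle genomes from genome sequencing data and is particularly good at plant plastid genomes.

It calls several external programs, including SPAdes, Bowtie2, BLAST+, and Bandage. See the official GetOrganelle site for more detail.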

Installation

I am not in the habit of installing through conda, so I followed the non-conda route:

## Download the GetOrganelle release tarball

curl -L https://github.com/Kinggerm/GetOrganelle/archive/1.7.4.1.tar.gz | tar zx

## Download the dependency bundle

curl -L https://github.com/Kinggerm/GetOrganelleDep/releases/download/v1.7.0/v1.7.0-linux.tar.gz | tar zx

The dependency bundle contains SPAdes, Bowtie2, and BLAST.

## Try installing:

cd GetOrganelle-1.7.4.1

python setup.py install

This produced the following error:

The following error occurred while trying to add or remove files in the installation directory:

[Errno 13] Permission denied: '/build/Cellar/anaconda2/lib/python2.7/site-packages/test-easy-install-367240.write-test'

The installation directory you specified (via --install-dir, --prefix, or the distutils default setting) was: /build/Cellar/anaconda2/lib/python2.7/site-packages/

## No write permission in the default directory, so install into my own folder instead:

python setup.py install --prefix=/my/file

This produced a different error:

error: bad install directory or PYTHONPATH

* You can choose a different installation directory, i.e., one that is on PYTHONPATH or supports .pth files

* You can add the installation directory to the PYTHONPATH environment variable.  (It must then also be on PYTHONPATH whenever you run Python and want to use the package(s) you are installing.)

* You can set up the installation directory to support ".pth" files by using one of the approaches described here:

https://setuptools.readthedocs.io/en/latest/easy_install.html#custom-installation-locations

Add the installation directory to the PYTHONPATH environment variable:

export PYTHONPATH="$PYTHONPATH:/my/file/"

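Then install again:

python setup.py install --prefix=/my/file

This time the installation completed. Afterwards, remember to add the bin directories of both the dependency bundle and GetOrganelle itself to PATH in your .bashrc file.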
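The `.bashrc` step could look something like the sketch below. The `path_prepend` helper and the `GO_DEP` location are hypothetical. Only the `SPAdes/bin` subdirectory is confirmed by the log paths later in these notes; the `bowtie2` and `ncbi-blast` folder names are assumptions to check against the extracted bundle.

```shell
# Prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# Hypothetical location of the extracted dependency bundle: adjust as needed.
GO_DEP="$HOME/software/GetOrganelleDep/linux"

path_prepend "/my/file/bin"          # scripts installed by setup.py --prefix=/my/file
path_prepend "$GO_DEP/SPAdes/bin"    # subdirectory seen in the log paths
path_prepend "$GO_DEP/bowtie2"       # assumed folder name
path_prepend "$GO_DEP/ncbi-blast"    # assumed folder name
export PATH
```

After reloading the shell, `command -v spades.py` should print a path inside the bundle if the directories are set correctly.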

Test run

# Download the example files:

## Download the reference sequence databases:

get_organelle_config.py --add embplant_pt,embplant_mt

## Download the resequencing data (fq files):

wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.1.fq.gz

wget https://github.com/Kinggerm/GetOrganelleGallery/raw/master/Test/reads/Arabidopsis_simulated.2.fq.gz
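Before assembling, it can be worth confirming that both downloads are intact and hold the same number of reads. The sketch below is one way to do that; `count_reads` is a hypothetical helper, and it assumes standard four-line FASTQ records.

```shell
# Count reads in a gzipped FASTQ file, assuming four lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

for fq in Arabidopsis_simulated.1.fq.gz Arabidopsis_simulated.2.fq.gz; do
    if [ -f "$fq" ]; then
        # gzip -t verifies the archive before the reads are counted
        gzip -t "$fq" && echo "$fq: $(count_reads "$fq") reads"
    else
        echo "$fq: not downloaded yet"
    fi
done
```

For paired-end data the two counts should match; a mismatch usually points to a truncated download.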

## Assemble the chloroplast genome

get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10

Parameter details:

# -1 Arabidopsis_simulated.1.fq.gz Input file with the forward paired-end reads (*.fq/.gz/.tar.gz)

# -2 Arabidopsis_simulated.2.fq.gz Input file with the reverse paired-end reads (*.fq/.gz/.tar.gz)

# -t 1 Maximum threads to use. Default: 1

# -o Arabidopsis_simulated.plastome Output directory

# -F embplant_pt Target organelle genome type(s)

# -R 10 Maximum extension rounds
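To reuse the same options for other samples, the command can be generated from a sample prefix. This is only a sketch: `plastome_cmd` is a hypothetical helper, and it assumes the read files follow the `<prefix>.1.fq.gz` / `<prefix>.2.fq.gz` naming used here.

```shell
# Print the get_organelle_from_reads.py command for one sample.
# Arguments: sample prefix, thread count (default 1), extension rounds (default 10).
plastome_cmd() {
    sample="$1"
    threads="${2:-1}"
    rounds="${3:-10}"
    printf 'get_organelle_from_reads.py -1 %s.1.fq.gz -2 %s.2.fq.gz -t %s -o %s.plastome -F embplant_pt -R %s\n' \
        "$sample" "$sample" "$threads" "$sample" "$rounds"
}

# Reproduces the command used in this test run
plastome_cmd Arabidopsis_simulated 1 10
```

Printing the command first makes it easy to inspect; piping the output to `sh` would then run it.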

The assembly failed with these errors:

...... 

2024-06-08 19:04:25,434 - ERROR: sympy/scipy not available! Disentangling disabled!!

......

2024-06-08 17:47:03,893 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-core', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq.spades/K45/configs/config.info']" finished abnormally, OS return value: 1

2024-06-08 17:47:03,894 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.

......

2024-06-08 17:47:17,892 - WARNING: Compression after read correction will be skipped for lack of 'pigz'

2024-06-08 17:47:17,893 - INFO: spades.py -t 1  --disable-gzip-output --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades

2024-06-08 17:47:18,805 - ERROR: Error with running SPAdes: == Error ==  system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-hammer', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/extended_spades/corrected/configs/config.info']" finished abnormally, OS return value: 1

2024-06-08 17:47:18,806 - ERROR: Assembling failed.
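Because the relevant lines are scattered through a long log, filtering for ERROR and WARNING entries makes them easier to triage. The sketch below assumes the log was saved to a file; `log_problems` is a hypothetical helper, and the `get_org.log.txt` name inside the output directory is an assumption to verify against your own run.

```shell
# Print only the ERROR and WARNING lines of a GetOrganelle-style log.
log_problems() {
    grep -E ' - (ERROR|WARNING): ' "$1" || true   # no match is not a failure
}

# Assumed log location inside the output directory
log="Arabidopsis_simulated.plastome/get_org.log.txt"
if [ -f "$log" ]; then
    log_problems "$log"
fi
```

In this run the filter would surface three separate issues: the sympy/scipy import problem, the missing pigz, and the SPAdes failures.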

## Install sympy and scipy

pip install sympy scipy --prefix=/my/folder2

Requirement already satisfied: sympy in /build/Cellar/anaconda2/lib/python2.7/site-packages (1.3)

Requirement already satisfied: scipy in /build/Cellar/anaconda2/lib/python2.7/site-packages (1.2.1)

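pip reports that both libraries are already installed, yet the run still logs: 2024-06-08 19:04:25,434 - ERROR: sympy/scipy not available! Disentangling disabled!!

The likely cause is the earlier change to PYTHONPATH. However, undoing the earlier export PYTHONPATH leads to a new error: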

Traceback (most recent call last):

    File "/mnt/ge-jbod/zhanghongxiang/software/GetOrganelle/GetOrganelle-1.7.4.1/bin/get_organelle_from_reads.py", line 12, in <module>

    import GetOrganelleLib

ImportError: No module named GetOrganelleLib

The fix is to put both directories on PYTHONPATH at the same time:

export PYTHONPATH="/path/to/folder1:/path/to/folder2"
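A slightly more defensive way to build that variable is sketched below. `pythonpath_add` is a hypothetical helper that skips duplicates and avoids a stray leading colon when PYTHONPATH starts out empty. The two folder paths are the placeholders from above; with `--prefix` installs, packages usually land under `<prefix>/lib/pythonX.Y/site-packages`, so that subdirectory may be the one that needs listing.

```shell
# Append a directory to PYTHONPATH, skipping duplicates and empty entries.
pythonpath_add() {
    case ":${PYTHONPATH:-}:" in
        *":$1:"*) ;;                                      # already listed
        *) PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$1" ;;
    esac
    export PYTHONPATH
}

pythonpath_add "/path/to/folder1"   # placeholder: where GetOrganelleLib was installed
pythonpath_add "/path/to/folder2"   # placeholder: where sympy and scipy are installed
echo "$PYTHONPATH"
```

Running `python -c "import GetOrganelleLib, sympy, scipy"` afterwards is a quick check that all three modules resolve in the same interpreter.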

## Install pigz

wget https://github.com/madler/pigz/archive/refs/heads/master.zip

unzip master.zip

cd  pigz-master

make

Running it again still fails:

......

2024-06-08 18:44:42,484 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-core', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq.spades/K45/configs/config.info']" finished abnormally, OS return value: 1

2024-06-08 18:44:42,485 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.

......

2024-06-08 18:44:57,031 - ERROR: Error with running SPAdes: == Error == system call for: "['/XX/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-hammer', '/XX/GetOrganelle/example/Arabidopsis_simulated.plastome/extended_spades/corrected/configs/config.info']" finished abnormally, OS return value: 1

2024-06-08 18:44:57,032 - ERROR: Assembling failed.

After some searching, this looks like a SPAdes problem. Someone on GitHub reported that changing a single command was enough:

I asked server administrator and showed him my scripts, then it run successfully by removimg "srun" out from my code.

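See the discussion on GitHub for details.

That looked like too much hassle to me, so I replaced the original SPAdes 3.15.4 with version 3.15.3 instead, and the next run no longer reported errors.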

2024-06-08 19:05:28,351 - INFO: Slimming Arabidopsis_simulated.plastome/extended_spades/K115/assembly_graph.fastg finished!

2024-06-08 19:05:28,352 - INFO: Slimming assembly graphs finished.
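After swapping SPAdes versions, it helps to confirm which copy the shell actually resolves. The sketch below assumes `spades.py --version` prints a banner containing a `vX.Y.Z` string; `extract_version` is a hypothetical helper.

```shell
# Extract the first x.y.z version number from a banner read on stdin.
extract_version() {
    sed -n 's/.*v\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

if command -v spades.py >/dev/null 2>&1; then
    echo "spades.py resolves to: $(command -v spades.py)"
    echo "version: $(spades.py --version 2>&1 | extract_version)"
else
    echo "spades.py is not on PATH"
fi
```

If the reported path or version is not the one just installed, an older copy is probably earlier on PATH.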



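These are informal notes from my own learning process, and I hope they are useful to others. If they helped, a like or a follow would be much appreciated as encouragement.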
