A handy tool for selective sweep analysis on large-scale data

Author: 生信阿拉丁 | Published 2020-03-29 20:55

Written by: 小龙
Reviewed by: 童蒙
Layout: amethyst

1 Introduction

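Selective sweep analysis looks for the genomic footprint of adaptation in a population, and many tools and underlying methods exist for it. This post introduces SweeD, a tool aimed at selection scans on a single population with a large sample size. SweeD scans the whole genome for selective sweeps using a composite likelihood ratio (CLR) test. It builds on the SweepFinder algorithm, and its authors report that it runs faster and scales to thousands of sequences.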

2 Installation

Download: https://cme.h-its.org/exelixis/resource/download/software/SweeD_v3.2.1_Linux.tar.gz

tar -xzvf SweeD_v3.2.1_Linux.tar.gz
cd SweeD_v3.2.1_Linux
make -f Makefile.gcc

Check the build: running ./SweeD -help prints the parameter descriptions.


3 Input files

Five input file formats are supported:

3.1 The SweepFinder format

The file has four columns:

  • location: the location of a SNP
  • x: the number of sequences that carry the derived allele at a SNP
  • n: the number of valid sequences at a SNP
  • folded: a binary flag that denotes whether the SNP is unfolded (0) or folded (1)
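As a rough illustration of this four-column layout, the sketch below formats per-SNP counts as SweepFinder-style rows. The function name `to_sf_lines`, the tab delimiter, the header line `position x n folded`, and the example counts are assumptions made for illustration; check the exact layout against the SweeD manual for your version.

```python
def to_sf_lines(snps):
    """Format (location, x, n, folded) tuples as SweepFinder-style lines.

    Assumption: a tab-delimited layout with a header row. Verify this
    against the SweeD manual before feeding the output to the program.
    """
    lines = ["position\tx\tn\tfolded"]
    for location, x, n, folded in sorted(snps):  # sort by location
        if not 0 <= x <= n:
            raise ValueError(f"x must lie in [0, n] at position {location}")
        if folded not in (0, 1):
            raise ValueError(f"folded must be 0 or 1 at position {location}")
        lines.append(f"{location}\t{x}\t{n}\t{folded}")
    return lines


# Hypothetical counts: at position 1200, 3 of 20 valid sequences carry
# the derived allele and the SNP is unfolded.
example = [(3400, 12, 20, 0), (1200, 3, 20, 0), (5100, 7, 18, 1)]
print("\n".join(to_sf_lines(example)))
```

The range check on x reflects the column definitions: the derived-allele count cannot exceed the number of valid sequences at that SNP.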

3.2 FASTA format

Most readers know this format well, so it is not covered in detail here.

3.3 ms-like format

Hudson’s ms outputs binary data (0 and 1) instead of DNA data (A, C, G, or T). Usually, state 1 is called ‘derived’ and state 0 is called ‘ancestral’.
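To make the 0/1 coding concrete, here is a minimal sketch that reads one ms-style replicate and counts, for each segregating site, how many sampled sequences carry state 1 (derived). The function name `derived_counts` and the tiny sample text are hypothetical; the `//`, `segsites:` and `positions:` lines follow the usual ms output layout.

```python
def derived_counts(ms_text):
    """Return (relative_position, derived_count, sample_size) per site
    for the first replicate found in ms-style text."""
    lines = [ln.strip() for ln in ms_text.splitlines()]
    start = lines.index("//")  # "//" opens each replicate
    positions, haplotypes = [], []
    for ln in lines[start + 1:]:
        if ln == "//":  # next replicate begins, stop here
            break
        if ln.startswith("segsites:"):
            continue
        if ln.startswith("positions:"):
            positions = [float(p) for p in ln.split()[1:]]
        elif ln and set(ln) <= {"0", "1"}:
            haplotypes.append(ln)  # one row of 0/1 states per sequence
    n = len(haplotypes)
    return [
        (pos, sum(h[i] == "1" for h in haplotypes), n)
        for i, pos in enumerate(positions)
    ]


# Made-up replicate: 4 sequences, 3 segregating sites.
sample = """ms 4 1 -s 3
1 2 3

//
segsites: 3
positions: 0.1 0.5 0.9
010
110
001
010
"""
print(derived_counts(sample))
```

Each row of 0s and 1s is one sampled sequence, so the column sums give the derived-allele count x and the number of rows gives the sample size n.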

3.4 MaCS-like format

MaCS [Chen et al., 2009] is a Markovian coalescent simulator. This format is uncommon, so it is not covered in detail here.

3.5 VCF format

VCF is a format most of us know well, and using it as input is quick and simple.

4 Running SweeD

SweeD -name test -input input.file -grid 10000 

The parameters are:
-name: Specifies a name for the run and the output files.
-input: Specifies the name of the input alignment file, in one of the formats listed in section 3.
-grid: Specifies the number of positions in the alignment where the CLR will be computed.
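For scripted runs over many input files, the same command line can be assembled in Python. This is only a sketch: the helper names `sweed_command` and `run_sweed` are hypothetical, the binary is assumed to be reachable as `SweeD`, and only the three flags described above are used.

```python
import shlex
import subprocess


def sweed_command(run_name, input_path, grid, binary="SweeD"):
    """Build the argument list for a SweeD run with -name, -input, -grid."""
    if grid <= 0:
        raise ValueError("grid must be a positive number of positions")
    return [binary, "-name", run_name, "-input", str(input_path),
            "-grid", str(grid)]


def run_sweed(run_name, input_path, grid, binary="SweeD"):
    """Run SweeD; assumes the binary is installed and the input exists."""
    cmd = sweed_command(run_name, input_path, grid, binary)
    return subprocess.run(cmd, check=True)


# Print the command line without executing it.
print(shlex.join(sweed_command("test", "input.file", 10000)))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.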

5 Output

Two files are produced:
1) The information file (SweeD_Info.runName) contains information about the run of the program, such as the command line used.

2) The report file (SweeD_Report.runName) is the main output file of the program and holds the score of the statistic at each position. This is the result file we are after.

It has three main columns:

Column 1: the alignment positions where the SweeD score is calculated
Column 2: the corresponding likelihood value
Column 3: the corresponding α value, which is a function of the selection coefficient, the recombination rate, and the effective population size
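A small sketch of how the three columns might be read back and ranked by likelihood. The function names `parse_report` and `top_peaks`, the whitespace-delimited layout, the header lines, and the sample numbers are all assumptions for illustration; inspect a real SweeD_Report file before relying on this.

```python
def parse_report(text):
    """Parse report text into (position, likelihood, alpha) tuples.

    Assumption: whitespace-delimited columns. Lines whose first three
    fields are not numeric (headers, separators) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            position, likelihood, alpha = (float(f) for f in fields[:3])
        except ValueError:
            continue  # e.g. a header line naming the columns
        rows.append((position, likelihood, alpha))
    return rows


def top_peaks(rows, k=3):
    """Return the k rows with the highest likelihood value."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:k]


# Made-up report content, not real SweeD output.
sample_report = """//1
Position\tLikelihood\tAlpha
1000.0\t0.52\t1.2e-03
2000.0\t7.81\t4.5e-04
3000.0\t2.10\t9.9e-04
"""
rows = parse_report(sample_report)
print(top_peaks(rows, k=1))
```

Positions with the highest likelihood values are the candidate sweep regions worth a closer look.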

References

Gary K Chen, Paul Marjoram, and Jeffrey D Wall. Fast and flexible simulation of DNA sequence data. Genome Res, 19(1):136-142, Jan 2009. doi: 10.1101/gr.083634.108. URL http://dx.doi.org/10.1101/gr.083634.108.
Richard R Hudson. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18(2):337-338, Feb 2002.
Pavlos Pavlidis, Daniel Živković, Alexandros Stamatakis, and Nikolaos Alachiotis. SweeD: Likelihood-based detection of selective sweeps in thousands of genomes. Molecular Biology and Evolution, 30(9):2224-2234, 2013.
