Step 1: Align the sequences with a sequence alignment tool. The author uses mafft here (official documentation:
mafft prints the following to the terminal:

nthread = 0
nthreadpair = 0
nthreadtb = 0
ppenalty_ex = 0
stacksize: 8192 kb
generating a scoring matrix for nucleotide (dist=200) ... done
Gap Penalty = -1.53, +0.00, +0.00

Making a distance matrix ..

There are 1 ambiguous characters.
  201 / 229
done.

Constructing a UPGMA tree (efffree=0) ...
  220 / 229
done.

Progressive alignment 1/2...
STEP   129 / 228  f
Reallocating..done. *alloclen = 23649
STEP   176 / 228  f
Reallocating..done. *alloclen = 26535
STEP   226 / 228  f
Reallocating..done. *alloclen = 27829
STEP   228 / 228  f
done.

Making a distance matrix from msa..
  200 / 229
done.

Constructing a UPGMA tree (efffree=1) ...
  220 / 229
done.

Progressive alignment 2/2...
STEP   209 / 228  f
Reallocating..done. *alloclen = 27476
STEP   228 / 228  f
done.

disttbfast (nuc) Version 7.471
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
0 thread(s)

Strategy:
 FFT-NS-2 (Fast but rough)
 Progressive method (guide trees were built 2 times.)

If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --leavegappyregion option.
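# With --auto, mafft chooses the alignment strategy automatically; by default the alignment is written in FASTA format.
# The command below passes --clustalout, which switches the output to Clustal format.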
mafft --clustalout input.fasta > input.out
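FastTree reads alignments in FASTA or interleaved PHYLIP format, so for tree building it may be more convenient to leave out --clustalout and keep mafft's default FASTA output. The following is a minimal sketch under that assumption: the output name input.aln.fasta and the helper aln_summary are hypothetical and not part of mafft. The helper only checks that every aligned sequence has the same length, which is a quick sanity test before moving on.

```shell
# Hypothetical helper: report the number of sequences and the number of
# distinct sequence lengths in a FASTA file. A valid alignment has exactly
# one distinct length.
aln_summary() {
    awk '
        /^>/ { if (name != "") lens[len]++; name = $0; len = 0; n++; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END {
            if (name != "") lens[len]++
            k = 0
            for (l in lens) k++
            printf "sequences=%d distinct_lengths=%d\n", n, k
        }
    ' "$1"
}

# Run the alignment only if mafft and the input file are available.
# input.aln.fasta is an assumed output name.
if command -v mafft >/dev/null 2>&1 && [ -f input.fasta ]; then
    mafft --auto input.fasta > input.aln.fasta
    aln_summary input.aln.fasta
fi
```

If distinct_lengths is anything other than 1, the file is not a usable alignment and the mafft run should be checked before building a tree.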
Step 2: Next, build a maximum-likelihood (ML) phylogenetic tree from the alignment file with FastTree. (FastTree website:
FastTree prints the following to the terminal:

FastTree Version 2.1.11 SSE3
Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories
Ignored unknown character n (seen 1 times)
Initial topology in 2.07 seconds
Refining topology: 31 rounds ME-NNIs, 2 rounds ME-SPRs, 16 rounds ML-NNIs
Total branch-length 98.807 after 18.26 sec
ML-NNI round 1: LogLk = -578498.850 NNIs 45 max delta 21.75 Time 32.02
GTR Frequencies: 0.3022 0.2199 0.2241 0.2538
GTR rates(ac ag at cg ct gt) 1.0483 2.5389 1.0248 0.9926 2.7404 1.0000
Switched to using 20 rate categories (CAT approximation)
Rate categories were divided by 0.800 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
ML-NNI round 2: LogLk = -558919.789 NNIs 17 max delta 7.53 Time 58.68
ML-NNI round 3: LogLk = -558887.713 NNIs 8 max delta 0.77 Time 64.15
ML-NNI round 4: LogLk = -558870.798 NNIs 1 max delta 0.11 Time 67.04
ML-NNI round 5: LogLk = -558870.004 NNIs 1 max delta 0.51 Time 68.13
ML-NNI round 6: LogLk = -558869.763 NNIs 0 max delta 0.00 Time 68.71
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 7: LogLk = -558646.178 NNIs 0 max delta 0.00 Time 81.92 (final)
Optimize all lengths: LogLk = -558636.448 Time 85.18
Gamma(20) LogLk = -566145.706 alpha = 9.988 rescaling lengths by 1.223
Total time: 103.97 seconds Unique: 227/229 Bad splits: 0/224
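The exact FastTree command is not shown above. The log reports a Generalized Time-Reversible model and a Gamma(20) likelihood, which suggests a nucleotide run with the -gtr and -gamma options, so the sketch below assumes that. The output name input.tree, the helper count_tips, and the binary name FastTree (some packages install it as fasttree) are assumptions. FastTree writes the Newick tree to standard output and its progress log to standard error, so the tree is captured with a redirect.

```shell
# Hypothetical helper: count the tips of a Newick tree as (commas + 1).
# Assumes sequence names contain no commas.
count_tips() {
    commas=$(tr -cd ',' < "$1" | wc -c)
    echo $((commas + 1))
}

aln="input.out"     # alignment from Step 1; must be FASTA or interleaved PHYLIP
tree="input.tree"   # assumed output name

# Build the tree only if FastTree and the alignment are available.
if command -v FastTree >/dev/null 2>&1 && [ -f "$aln" ]; then
    # -nt: nucleotide alignment; -gtr: GTR model; -gamma: Gamma20 likelihood
    FastTree -nt -gtr -gamma "$aln" > "$tree"
    count_tips "$tree"
fi
```

Comparing the tip count with the number of sequences in the alignment is a quick way to confirm that the tree file is complete.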