Notung

Author: 多啦A梦的时光机_648d | Published: 2023-06-28 17:12

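    Prepare the species tree and the gene tree. Note that the species names used in the two trees must match.

    1. Species tree (can be obtained with OrthoFinder)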

    ((Osat.34TPS:0.3238497464,(Ppec.7TPS:0.4624992055,(Pped.17TPS:0.0840652260,Akon.4TPS:0.0695437415):0.1663737443):0.0425678634):0.05197193935,(Atha.31TPS:0.4047715374,(Bbun.9TPS:0.1548556458,(Cchi.34TPS:0.1348538592,Acoe.68TPS:0.1026688238):0.0216591009):0.1179306530):0.05197193935);
    

    2. Gene tree

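    # Species tags = FASTA file names without the .fa extension
    ls 0-raw/*.fa | cut -d '/' -f2 | sed 's/\.fa$//' > species.list
    mkdir -p 1-proteins
    # Append "_<species>" to every sequence ID so Notung can read the species from the label
    for species in $(cat species.list); do
        seqkit seq -n 0-raw/$species.fa | awk '{print $1}' | sed "s/\$/_${species}/" > t1
        seqkit seq -s -w 0 0-raw/$species.fa > t2
        paste t1 t2 | seqkit tab2fx | seqkit seq -w 0 > 1-proteins/$species.fa
        rm t1 t2
    done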
    
    cat 1-proteins/*.fa > all.TPS.fa
    linsi --thread 8 all.TPS.fa >all.mafft.fa
    trimal -in all.mafft.fa -out all.trim.fa -automated1
    iqtree -s all.trim.fa  -m MFP  -mset LG,JTT --msub nuclear --rclusterf 10 -B 1000 --alrt 1000 -T AUTO
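Before aligning, it can help to confirm that every sequence ID ends in a species tag. The following is a minimal sketch using only standard tools; the function name `count_per_species` and the demo sequences are made up for illustration, and it assumes the tag is whatever follows the last "_" in the ID.

```shell
# Count sequences per species tag, where the tag is the text after the
# last "_" of each FASTA ID (the first word of the header line).
count_per_species() {
    grep '^>' "$1" | awk '{print $1}' | sed 's/.*_//' | LC_ALL=C sort | uniq -c |
        awk '{print $2 "\t" $1}'
}

# Tiny made-up FASTA, only to show the output format.
demo_fa=$(mktemp)
printf '%s\n' '>g1_Osat.34TPS desc' 'MKV' '>g2_Osat.34TPS' 'MLL' '>g3_Atha.31TPS' 'MAA' > "$demo_fa"
count_per_species "$demo_fa"
rm -f "$demo_fa"
```

Run on the combined file (for example `count_per_species all.TPS.fa`), the counts should match the number of genes expected for each species.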
    

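    The species tree must contain every species that appears in the gene tree; any extra species in the species tree are ignored by Notung.
    Every gene in the input gene tree must carry its species in the label (format: gene_species). Notung uses "_" as the delimiter and takes the text after the last "_" in the label as the species name. For example, Notung reads XP_020599319.1_Phalaenopsis_equestris as gene XP_020599319.1_Phalaenopsis and species equestris. If a species name contains "_", replace it with "-" or another character.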
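The splitting rule can be reproduced with shell parameter expansion. This is only a sketch of the rule as described here, not Notung's own code; the helper names `species_of`, `gene_of`, and `sanitize_species` are hypothetical.

```shell
# Text after the last "_": what would be read as the species name.
species_of() {
    printf '%s\n' "${1##*_}"
}

# Text before the last "_": what would be read as the gene name.
gene_of() {
    printf '%s\n' "${1%_*}"
}

# Replace "_" inside a species name with "-" so the name survives the split.
sanitize_species() {
    printf '%s\n' "$1" | tr '_' '-'
}

species_of "XP_020599319.1_Phalaenopsis_equestris"   # equestris
gene_of "XP_020599319.1_Phalaenopsis_equestris"      # XP_020599319.1_Phalaenopsis
species_of "XP_020599319.1_$(sanitize_species Phalaenopsis_equestris)"   # Phalaenopsis-equestris
```

Underscores inside the gene part are harmless, because only the last "_" in the label is used as the split point.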

    Here, Acoe.68TPS in the species tree corresponds to Acoe-Aqcoe2G342500.1_Acoe.68TPS in the gene tree:

    (Acoe-Aqcoe2G342500.1_Acoe.68TPS:0.0061827419,((Acoe-Aqcoe2G342100.1_Acoe.68TPS:0.1762034096,(((((((((((((((((Acoe-Aqcoe6G133500.1_Acoe.68TPS:0.3084170080,(((((Acoe-Aqcoe4G264800.1_Acoe.68TPS:0.0000024138,(Acoe-Aqcoe4G265300.1_Acoe.68TPS:0.0725334447,Acoe-Aqcoe4G265200.1_Acoe.68TPS:0.0067389571):0.0000868471):0.0755904260,(Acoe-Aqcoe7G313900.1_Acoe.68TPS:0.0189632222,Acoe-Aqcoe0905s0001.1_Acoe.68TPS:0.1172788685):0.0110174313):0.0498014005,(Acoe-Aqcoe4G227700.1_Acoe.68TPS:0.0530108701,Acoe-Aqcoe4G227600.1_Acoe.68TPS:0.0535902543):0.0533783434):0.1415852773,Acoe-Aqcoe7G313700.1_Acoe.68TPS:0.2047042426):0.0178030527,((Acoe-Aqcoe4G268500.1_Acoe.68TPS:0.2715544631,((Bbun-DN5309-c0-g1_Bbun.9TPS:0.4534004521,Bbun-DN958-c0-g1_Bbun.9TPS:0.2943359389):0.0473083401,(((Cchi-Cch00032765-RA_Cchi.34TPS:0.0219343518,Cchi-Cch00007515-RA_Cchi.34TPS:0.0333885575):0.0301898870,(Cchi-Cch00026076-RA_Cchi.34TPS:0.0161559557,Cchi-Cch00034696-RA_Cchi.34TPS:0.0156426620):0.0258928795):0.0706490268,(Cchi-Cch00030515-RA_Cchi.34TPS:0.0116303536,Cchi-Cch00030516-RA_Cchi.34TPS:0.0195937959):0.0819663991):0.1591943873):0.0595766850):0.0462785386,Acoe-Aqcoe7G314500.1_Acoe.68TPS:0.2658873688):0.0093710632):0.0280338107):0.0443197782,Acoe-Aqcoe4G241900.1_Acoe.68TPS:0.1573486191):0.2102369125,(((Akon-evm.model.CTG_13723.8_Akon_Akon.4TPS:0.1741878516,((Akon-evm.model.CTG_2037.3.1_Akon_Akon.4TPS:0.0000026512,Akon-evm.model.CTG_5017.2_Akon_Akon.4TPS:0.0033813551):0.2629656133,((Pped-PIPE01557.t1_Pped.17TPS:0.0000022750,Pped-PIPE01560.t1_Pped.17TPS:0.0000023427):0.1506488700,Pped-PIPE18515.t1_Pped.17TPS:0.0609314501):0.1247172210):0.0650826313):0.1828040558,Pped-PIPE16890.t1_Pped.17TPS:0.5948404525):0.0820123307,((Aath-4G16740.1_Atha.31TPS:0.2035149267,Aath-4G16730.1_Atha.31TPS:0.2578488105):0.1219712896,(Aath-2G24210.1_Atha.31TPS:0.2452229023,((Aath-3G25820.1_Atha.31TPS:0.0000026376,Aath-3G25830.1_Atha.31TPS:0.0000025082):0.0986932354,Aath-3G25810.1_Atha.31TPS:0.1102096682):0.0867002300):0.100878151
5):0.3118890707):0.0599483765):0.0228102087,(((Acoe-Aqcoe4G253400.1_Acoe.68TPS:0.0033462303,Acoe-Aqcoe4G253900.1_Acoe.68TPS:0.0000021949):0.0810521617,(((Cchi-Cch00030263-RA_Cchi.34TPS:0.0148719732,Cchi-Cch00030276-RA_Cchi.34TPS:0.1779117363):0.0374231071,Cchi-Cch00026455-RA_Cchi.34TPS:0.2513191479):0.1134726951,Cchi-Cch00026460-RA_Cchi.34TPS:0.0547015201):0.1217809983):0.0874099809,Bbun-DN67630-c1-g1_Bbun.9TPS:0.2699979321):0.0907563247):0.0339944377,((Acoe-Aqcoe1G371200.1_Acoe.68TPS:0.0818290906,(Acoe-Aqcoe1G371100.1_Acoe.68TPS:0.0223055980,Cchi-Cch00038020-RA_Cchi.34TPS:0.1047133648):0.0265773657):0.2295388277,((Acoe-Aqcoe1G370000.1_Acoe.68TPS:0.2045773937,(Acoe-Aqcoe1G370200.1_Acoe.68TPS:0.1388217486,Cchi-Cch00013007-RA_Cchi.34TPS:0.1465163374):0.0378464220):0.0257240244,(Cchi-Cch00016515-RA_Cchi.34TPS:0.0000026646,(Cchi-Cch00038027-RA_Cchi.34TPS:0.0080675536,Cchi-Cch00013005-RA_Cchi.34TPS:0.0000023575):0.0102851212):0.0991229295):0.1954432210):0.1210489710):0.0304810182,(((((Acoe-Aqcoe4G129600.1_Acoe.68TPS:0.0000027604,Acoe-Aqcoe4G129400.1_Acoe.68TPS:0.0000026599):0.0625207511,(Acoe-Aqcoe7G261500.1_Acoe.68TPS:0.0182595582,Acoe-Aqcoe7G261300.1_Acoe.68TPS:0.0000022793):0.0372378108):0.1835177401,((Acoe-Aqcoe4G254000.1_Acoe.68TPS:0.0033579833,Acoe-Aqcoe4G253500.1_Acoe.68TPS:0.0000024810):0.1162656611,Cchi-Cch00012617-RA_Cchi.34TPS:0.1969870128):0.0443678639):0.0382331367,Acoe-Aqcoe7G440300.1_Acoe.68TPS:0.3267165074):0.1429182422,(((Acoe-Aqcoe4G254300.1_Acoe.68TPS:0.1191021219,Acoe-Aqcoe5G018100.1_Acoe.68TPS:0.1712042931):0.1525429356,(Cchi-Cch00025003-RA_Cchi.34TPS:0.1596796345,Cchi-Cch00028566-RA_Cchi.34TPS:0.1370424832):0.1091154988):0.0340381652,Acoe-Aqcoe1G139700.1_Acoe.68TPS:0.3037280544):0.0819891359):0.1633907762):0.0740850521,(((Cchi-Cch00028070-RA_Cchi.34TPS:0.0034716072,Cchi-Cch00022287-RA_Cchi.34TPS:0.0033393723):0.0067886213,Cchi-Cch00015543-RA_Cchi.34TPS:0.0037375535):0.3241113078,Pped-PIPE16912.t1_Pped.17TPS:0.5420867321):0.1234559388):0.0506329055,P
pec-DN56113-c0-g1_Ppec.7TPS:0.6096143037):0.0848407162,((((((Acoe-Aqcoe1G371800.1_Acoe.68TPS:0.0410426365,Aath-2G18700.1_Atha.31TPS:2.7873412231):0.0253798933,Acoe-Aqcoe1G371700.1_Acoe.68TPS:0.2399370629):0.1084865344,(((Acoe-Aqcoe1G370100.1_Acoe.68TPS:0.0000029077,Acoe-Aqcoe1G313700.1_Acoe.68TPS:0.0066108801):0.1327585108,Acoe-Aqcoe1G371400.1_Acoe.68TPS:0.0769751578):0.1777154421,(Acoe-Aqcoe1G372000.1_Acoe.68TPS:0.1580780386,Cchi-Cch00030468-RA_Cchi.34TPS:0.1985733638):0.0371564363):0.0739158831):0.2034992235,Aath-1G61680.1_Atha.31TPS:0.7158907598):0.0253146355,((Osat-02t0121700-01_Osat.34TPS:0.0709841911,Osat-10t0489500-00_Osat.34TPS:0.0805439043):0.6886477696,(Pped-PIPE06129.t1_Pped.17TPS:0.0000023561,Pped-PIPE24147.t1_Pped.17TPS:0.0000025889):0.3829235925):0.1105689656):0.2465262531,Ppec-DN17159-c0-g1_Ppec.7TPS:0.5087429987):0.1280205151):0.0945653776,(((((((((Acoe-Aqcoe4G127600.1_Acoe.68TPS:0.1735873708,(Acoe-Aqcoe4G067000.1_Acoe.68TPS:0.1196502980,Acoe-Aqcoe5G393700.1_Acoe.68TPS:0.1061301726):0.0200801941):0.0130471533,(Acoe-Aqcoe4G128100.1_Acoe.68TPS:0.1834724819,Acoe-Aqcoe4G125200.1_Acoe.68TPS:0.1714255424):0.0255092737):0.0061962799,Acoe-Aqcoe4G248400.1_Acoe.68TPS:0.1528766078):0.0377117802,Acoe-Aqcoe4G273200.1_Acoe.68TPS:0.2190928903):0.1454044545,(Bbun-DN46845-c0-g2_Bbun.9TPS:0.3341681429,Bbun-DN49983-c0-g1_Bbun.9TPS:0.3712939893):0.0534727009):0.0509739620,(Bbun-DN65203-c0-g1_Bbun.9TPS:0.2760913567,Cchi-Cch00025113-RA_Cchi.34TPS:0.6293336097):0.0580335533):0.0677165748,((Acoe-Aqcoe4G237300.1_Acoe.68TPS:0.0955715054,Acoe-Aqcoe5G394600.1_Acoe.68TPS:0.1170910927):0.1744700177,Acoe-Aqcoe4G298300.1_Acoe.68TPS:0.2710434708):0.0893057793):0.1114810926,Acoe-Aqcoe4G236500.1_Acoe.68TPS:0.4188825548):0.2265191428,((((((((Aath-4G13300.1_Atha.31TPS:0.0206974542,Aath-4G13280.1_Atha.31TPS:0.0605547376):0.3855962649,Aath-4G15870.1_Atha.31TPS:0.3253177730):0.0814349405,(((Aath-3G29190.1_Atha.31TPS:0.1977441815,((Aath-4G20200.1_Atha.31TPS:0.1673747364,Aath-1G66020.1_Atha.
31TPS:0.2043455280):0.0438828787,Aath-4G20230.1_Atha.31TPS:0.2079841025):0.0650467919):0.0624469480,(Aath-4G20210.1_Atha.31TPS:0.2726053662,Aath-3G29110.1_Atha.31TPS:0.3277545049):0.0577538019):0.0573081888,Aath-1G48800.1_Atha.31TPS:0.3562780195):0.0841767143):0.1726134093,(Aath-5G48110.1_Atha.31TPS:0.4569514801,Aath-1G70080.1_Atha.31TPS:0.3944180194):0.0655142400):0.0727637162,(((Aath-1G31950.1_Atha.31TPS:0.0935801512,(Aath-3G14540.1_Atha.31TPS:0.0630868243,Aath-3G14520.1_Atha.31TPS:0.0396345570):0.0685513770):0.2077321268,Aath-3G32030.1_Atha.31TPS:0.2446220311):0.0313387222,((Aath-3G14490.1_Atha.31TPS:0.1902938262,Aath-3G29410.1_Atha.31TPS:0.3302092324):0.0309666256,Aath-1G33750.1_Atha.31TPS:0.2928741619):0.0255885180):0.1902288556):0.3293366692,Aath-5G23960.1_Atha.31TPS:0.6528767168):0.0750284892,((((Osat-01t0337100-00_Osat.34TPS:0.2813188417,Osat-08t0168400-00_Osat.34TPS:0.1339434029):0.0453360523,(Osat-08t0167800-01_Osat.34TPS:0.1022473751,Osat-08t0168000-01_Osat.34TPS:0.0586193393):0.1142493038):0.3605638483,Osat-08t0139700-01_Osat.34TPS:0.3731025154):0.1520742550,(Osat-02t0458100-01_Osat.34TPS:0.5645878566,((Osat-03t0347900-01_Osat.34TPS:0.3059971935,((Osat-03t0348200-00_Osat.34TPS:0.1317522410,Osat-07t0218200-01_Osat.34TPS:0.3568890878):0.1392853110,(((Osat-04t0340300-01_Osat.34TPS:0.0067666033,Osat-04t0341500-02_Osat.34TPS:0.0166195233):0.1734641231,Osat-04t0345400-01_Osat.34TPS:0.2420894980):0.0540598317,(Osat-04t0342100-01_Osat.34TPS:0.2450250147,(Osat-04t0344100-01_Osat.34TPS:0.0786725481,Osat-04t0344400-00_Osat.34TPS:0.0777944324):0.1561246027):0.0453734338):0.2498808987):0.0509603605):0.1889135625,(((Osat-03t0361100-00_Osat.34TPS:0.2189770555,Osat-03t0362500-00_Osat.34TPS:0.1338477522):0.0281881045,Osat-03t0361700-00_Osat.34TPS:0.2312163907):0.0574341311,Osat-03t0361600-00_Osat.34TPS:0.1908511594):0.2342731699):0.2408315263):0.0483487670):0.1506364653):0.0708488937,(((((Pped-PIPE16625.t1_Pped.17TPS:0.0381049081,Pped-PIPE16646.t1_Pped.17TPS:0.0402769644
):0.0247742377,Pped-PIPE16636.t1_Pped.17TPS:0.0695148191):0.0334440767,Pped-PIPE16645.t1_Pped.17TPS:0.0943019753):0.0404115670,Pped-PIPE16637.t1_Pped.17TPS:0.1347688478):0.0581858747,Pped-PIPE16638.t1_Pped.17TPS:0.0644579932):0.4456597751):0.0641486613):0.1836484041):0.4951511575,((((((((Acoe-Aqcoe6G034800.1_Acoe.68TPS:0.0567787539,Acoe-Aqcoe4G250100.1_Acoe.68TPS:0.2402453448):0.0146369991,((Acoe-Aqcoe4G299300.1_Acoe.68TPS:0.0100560776,Acoe-Aqcoe4G299400.1_Acoe.68TPS:0.0032028420):0.1284799970,Acoe-Aqcoe4G242800.1_Acoe.68TPS:0.1433833506):0.1175290358):0.0251715607,Bbun-DN12710-c1-g1_Bbun.9TPS:0.1144980921):0.0365602816,Cchi-Cch00036133-RA_Cchi.34TPS:0.1444304613):0.1472878879,Aath-1G79460.1_Atha.31TPS:0.3764416331):0.0272373627,(((Akon-evm.model.HIC_ASM_5.9208_evm.model.HIC_ASM_5.9209_Akon_Akon.4TPS:0.0789968756,Pped-PIPE16990.t5_Pped.17TPS:0.1125449594):0.2023414141,((Ppec-DN19898-c1-g1_Ppec.7TPS:0.2483628298,(Ppec-DN4354-c0-g1_Ppec.7TPS:0.1774971416,Ppec-DN19898-c0-g1_Ppec.7TPS:0.1050650766):0.0571354435):0.1790874634,Ppec-DN10447-c0-g1_Ppec.7TPS:0.6591443504):0.0974405271):0.0530603194,((Osat-02t0568700-01_Osat.34TPS:0.3348566827,(((Osat-02t0571300-01_Osat.34TPS:0.0357300704,Osat-02t0571800-00_Osat.34TPS:0.0564357080):0.1690557827,Osat-11t0474800-01_Osat.34TPS:0.2380584009):0.0575958360,Osat-12t0491800-01_Osat.34TPS:0.3068172184):0.1014305108):0.1720048334,(((Osat-02t0570400-01_Osat.34TPS:0.3020243021,Osat-04t0611800-00_Osat.34TPS:0.1112518921):0.0591872719,Osat-04t0179700-01_Osat.34TPS:0.3128159166):0.0121028785,(Osat-04t0611700-00_Osat.34TPS:0.2274517405,Osat-04t0612000-01_Osat.34TPS:0.1457280491):0.0533567008):0.0820075265):0.1559573717):0.0782497181):0.0237965279,((((Acoe-Aqcoe4G265900.1_Acoe.68TPS:0.0433626525,Acoe-Aqcoe4G265600.1_Acoe.68TPS:0.1114238056):0.3319693409,Bbun-DN52646-c0-g2_Bbun.9TPS:0.3101274434):0.0551152320,Acoe-Aqcoe4G061300.1_Acoe.68TPS:0.2491938034):0.0521007691,((Acoe-Aqcoe3G041600.1_Acoe.68TPS:0.1571538454,Bbun-DN112413-c0-g1_Bbun.9TPS:
0.2572983574):0.0443873641,Cchi-Cch00032035-RA_Cchi.34TPS:0.2139020860):0.0637841461):0.1648770542):0.4174510378,(Aath-1G61120.1_Atha.31TPS:0.7057315716,(Pped-PIPE14795.t1_Pped.17TPS:0.0000022815,Pped-PIPE20235.t1_Pped.17TPS:0.0039883171):0.3786329017):0.5495447940):0.1473844101):0.7998256133,Aath-4G02780.1_Atha.31TPS:0.5042075535):0.1077374125,((Acoe-Aqcoe4G243000.1_Acoe.68TPS:0.1592233523,Acoe-Aqcoe5G057100.1_Acoe.68TPS:0.0658378394):0.1065104077,((Cchi-Cch00002638-RA_Cchi.34TPS:0.0894285200,Cchi-Cch00028562-RA_Cchi.34TPS:0.1478907587):0.0372117182,Cchi-Cch00010326-RA_Cchi.34TPS:0.1342319716):0.0927357300):0.2141478645):0.0369542904,((Osat-02t0278700-01_Osat.34TPS:0.5363617486,Ppec-DN1805-c0-g1_Ppec.7TPS:0.3211902513):0.0844971303,(Osat-02t0571100-01_Osat.34TPS:0.5970822631,Pped-PIPE01889.t2_Pped.17TPS:0.3554174913):0.1157173947):0.1812273776):0.3685201152,(Cchi-Cch00021098-RA_Cchi.34TPS:0.0027371839,Cchi-Cch00013892-RA_Cchi.34TPS:0.0157104507):0.1451317097):0.0208848894,((((Acoe-Aqcoe6G196200.1_Acoe.68TPS:0.0415753904,Acoe-Aqcoe7G138000.1_Acoe.68TPS:0.0609094880):0.0514338987,Acoe-Aqcoe7G146600.1_Acoe.68TPS:0.0765178557):0.0241737324,((Acoe-Aqcoe4G266600.1_Acoe.68TPS:0.0000020557,Acoe-Aqcoe4G266100.1_Acoe.68TPS:0.0000025382):0.0123309720,Acoe-Aqcoe4G266300.1_Acoe.68TPS:0.0111171800):0.1459452661):0.0681642707,(Acoe-Aqcoe4G314800.1_Acoe.68TPS:0.1853688099,Acoe-Aqcoe4G314500.1_Acoe.68TPS:0.2133063137):0.0414580394):0.1208135207):0.0200346411,(Cchi-Cch00034001-RA_Cchi.34TPS:0.0549806601,(Cchi-Cch00032413-RA_Cchi.34TPS:0.0903111689,(Cchi-Cch00032038-RA_Cchi.34TPS:0.0032614602,Cchi-Cch00017142-RA_Cchi.34TPS:0.0041923870):0.0128131653):0.0170078874):0.1968174585):0.0546674800):0.1404311522,Acoe-Aqcoe2G342300.1_Acoe.68TPS:0.1150348617):0.1154081588,Acoe-Aqcoe2G342400.1_Acoe.68TPS:0.0184132164);
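Whether every species tag in the gene tree also appears as a leaf of the species tree can be checked before opening Notung. The sketch below is a rough check with standard tools, under the assumption that leaf labels contain no parentheses, commas, colons, or semicolons; `leaf_labels` and `missing_species` are hypothetical helper names, and the demo trees are made up.

```shell
# Print the leaf labels of a Newick file, one per line. Line breaks are
# removed first, so a wrapped Newick string is handled as well.
leaf_labels() {
    tr -d '\n\r' < "$1" | grep -oE '[(,][^(),:;]+' | sed 's/^[(,]//'
}

# Print species tags used in the gene tree that are absent from the species tree.
# usage: missing_species gene.nwk species.nwk
missing_species() {
    tmp=$(mktemp -d)
    leaf_labels "$1" | sed 's/.*_//' | LC_ALL=C sort -u > "$tmp/gene"
    leaf_labels "$2" | LC_ALL=C sort -u > "$tmp/species"
    LC_ALL=C comm -23 "$tmp/gene" "$tmp/species"
    rm -rf "$tmp"
}

# Made-up example: the tag Xxxx.1TPS has no leaf in the species tree.
demo_dir=$(mktemp -d)
printf '%s\n' '((a1_Osat.34TPS:0.1,a2_Atha.31TPS:0.2):0.1,a3_Xxxx.1TPS:0.3);' > "$demo_dir/gene.nwk"
printf '%s\n' '(Osat.34TPS:0.3,Atha.31TPS:0.4);' > "$demo_dir/species.nwk"
missing_species "$demo_dir/gene.nwk" "$demo_dir/species.nwk"   # Xxxx.1TPS
rm -rf "$demo_dir"
```

An empty output means every gene-tree species is covered by the species tree.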
    
    

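    3. Notung

    After loading the gene tree and species tree into Notung, use Rooting Mode to convert the unrooted gene tree into a rooted one. Rooting Mode uses the rooted species tree to compute a DTL score for each edge (the lower the score, the better the edge is as a root), and Notung highlights in red the edges with the minimum score and those within (max - min) × 5% of it. The user clicks an edge to choose it as the root. Export the rooted gene tree in NEWICK format.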
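The highlighting rule amounts to a cutoff of min + 0.05 × (max - min). The sketch below applies that cutoff to a list of "edge score" pairs; the function name `candidate_roots` and the scores are invented for illustration, and this is a reading of the rule as described here rather than Notung's actual implementation.

```shell
# Read "edge_id score" pairs on stdin and print the edges whose score is
# at most min + 5% of the score range (max - min).
candidate_roots() {
    awk '
        { id[NR] = $1; s[NR] = $2 }
        NR == 1 || $2 < min { min = $2 }
        NR == 1 || $2 > max { max = $2 }
        END {
            cutoff = min + 0.05 * (max - min)
            for (i = 1; i <= NR; i++)
                if (s[i] <= cutoff) print id[i], s[i]
        }'
}

# Hypothetical scores: min = 352.5, max = 500, so the cutoff is 359.875
# and only e1 and e2 would be highlighted.
printf '%s\n' 'e1 352.5' 'e2 355.0' 'e3 420.0' 'e4 500.0' | candidate_roots
```

When many edges fall under the cutoff, the score alone cannot single out one root, which is the situation an outgroup helps to resolve.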

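    Notung's Reconciliation Mode takes a rooted gene tree as input and infers gene duplication, transfer, and loss events by comparing the gene tree with the species tree.
    Notung's Rooting Mode takes an unrooted gene tree as input, infers the most likely root edges by comparing the gene tree with the species tree, and then infers gene duplication, transfer, and loss events for the root the user selects. This article uses Rooting Mode.
    After loading the rooted species tree and the unrooted gene tree, Notung reports many suitable root positions (red lines in the figure below). Without an outgroup, it is hard to decide which one to use.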

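    To resolve this, an outgroup (Amborella trichopoda) was added to the original dataset and the MUSCLE and IQ-TREE analyses were rerun. The root positions Notung then computes are shown in the figure below; with the outgroup, a suitable root is easy to find. The status bar at the bottom of the Notung window shows that Notung infers 171 gene duplication events and 96 gene loss events for TPS. Export the rooted gene tree in NEWICK format.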

    [Figures: Notung screenshots showing the candidate root edges in red, without and with the outgroup]
