一个提取序列的脚本

作者: 正踪大米饭儿 | 来源:发表于2016-12-09 19:32 被阅读341次

    由于过程比较复杂,先附上代码,希望有天能写出多个版本~~~

    #!/usr/bin/perl -w
    use strict;
    use Getopt::Long;
    
    my ($list,$indir,$outdir,$help);
    
    GetOptions(
        "list|l=s"   => \$list,
        "indir|i=s"  => \$indir,
        "outdir|o:s" => \$outdir,
        "help|?"     => \$help
    );
    
    die &usage if ( !defined $list || !defined $indir || defined $help);
    $outdir ||= ".";
    mkdir $outdir,0755 if (not -e $outdir);
    
    
    open LST,$list or die "Can't open the filtered gene list file $! \n";
    my %lst = map{chomp;(split /\t/,$_)[0],1}<LST>;
    close LST;
    
    my %genes;
    opendir DIR,$indir or die "Can't open the outDIR of RSEM expression\n";
    foreach my $file (sort grep(/^.*\.genes\.results$/,readdir(DIR))) {
        open IN,"$indir/$file" or die $!;
        while (<IN>) {
            chomp;
            next if /^gene/;
    
            my @a = split /\t/,$_;
            if (exists $lst{$a[0]}) {
                my @b = split /,/,$a[1];
                for my $b (@b){
                    $genes{$b} = $a[0];
                }
            }
        }
        close IN;
    }
    closedir DIR;
    
    open OUT,">$outdir/transcript.list";
    foreach my $k (sort keys %genes){
        print OUT "$k\t$genes{$k}\n";
    }
    close OUT;
    
    
    sub usage {
        print <<USAGE;
    Name:
        $0
    Description:
        merge the out put file of the rsem < \$sample.genes.result >
    Update:
        2016-12-02  genebang
    Uasge:
        perl $0 [options] 
    Options:
        -l, --list    <string>*    list of filtered gene 
        -i, --indir   <string>*    input files contain the output of the rsem-exprssion-calular result
        -o, --output  <string>     output file
        --help                     print this help information
    e.g
        perl $0 -l filter.gene.FPKM.xls -i Test/output -o ./ 
        perl $0 -l filter.gene.FPKM.xls -i Test/output -o ./ 
        perl $0 --list filter.gene.FPKM.xls --indir Test/output --outdir ./  
    USAGE
        exit 0;
    }
    
    __END__
    

    该脚本可以实现输入一个列表文件,提取某一目录下所有某一类型文件中的所有子集对应的名称。
    示例文件如下:

    列表文件:

    ENSG00000000003.14   
    ENSG00000000005.5      
    ENSG00000000419.12    
    ENSG00000000457.13    
    ENSG00000000460.16    
    

    文件夹目录列表:

    $ ls ./
    BC486AA.genes.results
    BC486BA.genes.results
    BC499AA.genes.results
    BC499BA.genes.results
    BC505AA.genes.results
    BC505BA.genes.results
    

    文件内容:

    ENSG00000000003.14      ENST00000373020.8,ENST00000494424.1,ENST00000496771.5,ENST00000612152.4
    ENSG00000000005.5       ENST00000373031.4,ENST00000485971.1 
    ENSG00000000419.12      ENST00000371582.8,ENST00000371584.8,ENST00000371588.9,ENST00000413082.1
    ENSG00000000457.13      ENST00000367770.5,ENST00000367771.10,ENST00000367772.8,ENST00000423670.
    ENSG00000000460.16      ENST00000286031.10,ENST00000359326.8,ENST00000413811.3,ENST00000459772.
    

    结果输出:

    ENST00000000233.9       ENSG00000004059.10
    ENST00000000412.7       ENSG00000003056.7
    ENST00000000442.10      ENSG00000173153.13
    ENST00000001008.5       ENSG00000004478.7
    ENST00000001146.6       ENSG00000003137.8
    

    该脚本使用了perl 程序中好多实用的功能。

    相关文章

      网友评论

        本文标题:一个提取序列的脚本

        本文链接:https://www.haomeiwen.com/subject/qzcymttx.html