Bioinformatics Quick Notes 2020-02-10: String Operations in awk

Author: 猫叽先森 | Published: 2020-02-10 14:54

    While reproducing a lncRNA paper, I found that the authors used a database called Animal QTLdb.

    The Animal Quantitative Trait Loci (QTL) Database (Animal QTLdb) strives to collect all publicly available trait mapping data, i.e. QTL (phenotype/expression, eQTL), candidate gene and association data (GWAS), and copy number variations (CNV) mapped to livestock animal genomes, in order to facilitate locating and comparing discoveries within and between species. New data and database tools are continually developed to align various trait mapping data to map-based genome features such as annotated genes.

    After downloading the file, I inspected its contents:

    wc -l qdwnld82711OVKG.txt 
    30195 qdwnld82711OVKG.txt
    grep -v '^#' qdwnld82711OVKG.txt |less -SN
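
    Before paging through the file, it can help to know how many lines are comments and how many are records. A minimal sketch follows; the sample file it builds is a hypothetical stand-in (tab-separated, '#' comment lines, one record with empty coordinate fields), and the real download may be laid out differently.

```shell
# Hypothetical stand-in for the downloaded file; the real layout may differ.
sample=$(mktemp)
printf '# comment line 1\n# comment line 2\n' > "$sample"
printf 'Chr.1\t1000\t2000\tQTL_A\n' >> "$sample"
printf 'Chr.2\t\t\tQTL_B\n' >> "$sample"
printf 'Chr.3\t500\t900\tQTL_C\n' >> "$sample"

total=$(wc -l < "$sample" | tr -d ' ')   # every line, comments included
comments=$(grep -c '^#' "$sample")       # lines starting with '#'
records=$(grep -vc '^#' "$sample")       # everything else
echo "total=$total comments=$comments records=$records"
rm -f "$sample"
```

    On the sample this prints total=5 comments=2 records=3. On the real file, the record count is the number to compare against after filtering.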
    

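    A few rows (circled in the original screenshot) have no coordinates and need to be removed.
    Use awk to look at the second column: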
    grep -v '^#' qdwnld82711OVKG.txt |awk '{print($2)}' |less -SN
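
    What awk reports as the second column depends on the field separator. By default awk splits on runs of blanks and tabs, so empty fields disappear and later fields shift left. The sketch below contrasts that with an explicit tab separator. The two sample rows are hypothetical, and it is an assumption that the real file is tab-separated with empty coordinate fields.

```shell
# Hypothetical rows: the second one has empty coordinate fields.
sample=$(mktemp)
printf 'Chr.1\t1000\t2000\tQTL_A\n' > "$sample"
printf 'Chr.2\t\t\tQTL_B\n' >> "$sample"

# Default splitting: consecutive tabs count as one separator,
# so the name field slides into position 2.
col2_default=$(awk '{print "[" $2 "]"}' "$sample")

# Explicit tab separator: each tab delimits a field, so position 2 stays empty.
col2_tab=$(awk -F'\t' '{print "[" $2 "]"}' "$sample")

printf 'default FS:\n%s\n' "$col2_default"
printf 'tab FS:\n%s\n' "$col2_tab"
rm -f "$sample"
```

    With default splitting the second row shows [QTL_B], and with -F'\t' it shows []. Either way that row fails a "starts with a digit" test on the second field.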
    

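    As you can see, rows without coordinates print a text string in that column instead of a number.
    Look at the first character of the second column: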

    grep -v "^#" qdwnld82711OVKG.txt |awk '{print(substr($2,1,1))}' |less -SN
    

    Tally how often each first character occurs:

    grep -v "^#" qdwnld82711OVKG.txt |awk '{print(substr($2,1,1))}' |sort | uniq -c
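
    The same tally can also be kept inside awk with an associative array, which avoids sorting the whole column first. This is only a sketch on hypothetical sample rows, not the command used above. The final sort is there because awk does not guarantee the order of a for-in loop.

```shell
# Hypothetical sample: three rows with coordinates, one without.
sample=$(mktemp)
printf '# comment line\n' > "$sample"
printf 'Chr.1\t1000\t2000\tQTL_A\n' >> "$sample"
printf 'Chr.1\t1500\t1800\tQTL_D\n' >> "$sample"
printf 'Chr.2\t\t\tQTL_B\n' >> "$sample"
printf 'Chr.3\t500\t900\tQTL_C\n' >> "$sample"

# n[c] counts the records whose second field starts with character c.
tally=$(grep -v '^#' "$sample" \
  | awk '{ n[substr($2, 1, 1)]++ } END { for (c in n) print n[c], c }' \
  | LC_ALL=C sort -k2,2)
echo "$tally"
rm -f "$sample"
```

    Each output line is a count followed by the character, the same layout that sort | uniq -c produces, minus the leading padding.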
    

    We only need the rows whose second column starts with a digit (0-9).

    grep -v "^#" qdwnld82711OVKG.txt |awk '(substr($2,1,1) ~ /[0-9]/){print($0)}' |less -SN
    
    grep -v "^#" qdwnld82711OVKG.txt |awk '(substr($2,1,1) ~ /[0-9]/){print($0)}' |wc -l
    28594
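
    The first-character test can also be written as an anchored regular expression on the field itself, and the comment filter can be folded into the same awk call. The sketch below compares the two forms on hypothetical rows and adds a stricter whole-field test. The row whose name starts with a digit (5S_QTL) is an invented edge case, included only to show where a first-character test would be too permissive.

```shell
# Hypothetical sample; the last row has no coordinates but a name starting with a digit.
sample=$(mktemp)
printf '# comment line\n' > "$sample"
printf 'Chr.1\t1000\t2000\tQTL_A\n' >> "$sample"
printf 'Chr.2\t\t\tQTL_B\n' >> "$sample"
printf 'Chr.3\t500\t900\tQTL_C\n' >> "$sample"
printf 'Chr.4\t\t\t5S_QTL\n' >> "$sample"

# The form used above: grep drops comments, substr() picks the first character.
original=$(grep -v '^#' "$sample" | awk '(substr($2,1,1) ~ /[0-9]/){print($0)}' | wc -l | tr -d ' ')

# Same test in one awk call: skip comments, require a leading digit in field 2.
loose=$(awk '!/^#/ && $2 ~ /^[0-9]/' "$sample" | wc -l | tr -d ' ')

# Stricter variant: field 2 must consist of digits only.
strict=$(awk '!/^#/ && $2 ~ /^[0-9]+$/' "$sample" | wc -l | tr -d ' ')

echo "original=$original loose=$loose strict=$strict"
rm -f "$sample"
```

    On this sample the first two counts agree (3), while the strict form keeps only the two rows with real coordinates. Whether the strict form is needed depends on what the real file contains.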
    

    What I learned

    1. The awk string function substr()
    2. The awk regular-expression match operator: ~ /regex/
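
    As a quick reference for both points, here is a small stand-alone sketch. The strings "Chr.12", "1000" and "QTL_B" are made-up examples. substr(s, start, length) counts characters from 1, and leaving out the length returns everything up to the end of the string. The ~ operator yields 1 on a match and 0 otherwise, and !~ is its negation.

```shell
prefix=$(awk 'BEGIN { print substr("Chr.12", 1, 3) }')      # characters 1-3
suffix=$(awk 'BEGIN { print substr("Chr.12", 5) }')         # character 5 to the end
is_digit=$(awk 'BEGIN { print ("1000" ~ /^[0-9]/) }')       # 1: starts with a digit
not_digit=$(awk 'BEGIN { print ("QTL_B" ~ /^[0-9]/) }')     # 0: does not
negated=$(awk 'BEGIN { print ("QTL_B" !~ /^[0-9]/) }')      # 1: !~ inverts the test
echo "$prefix $suffix $is_digit $not_digit $negated"
```

    This prints: Chr 12 1 0 1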

    Article link: https://www.haomeiwen.com/subject/yqsaxhtx.html