Learning R Functions - grep()

Author: Thinkando | Published 2020-05-18 21:48

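    grep() searches a character vector for the elements that match a pattern and, by default, returns their indices. grepl() has almost the same syntax as grep(), but it returns a logical vector with one TRUE/FALSE per element.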
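
    As a quick illustration of the difference between the two return types, here is a minimal sketch. The vector genes and the variable names are made up for this example and do not come from the original post.

```r
# Hypothetical sample vector, used only for illustration
genes <- c("TP53", "CAD", "CTPS1")

idx  <- grep("^C", genes)    # integer indices of the matching elements
hits <- grepl("^C", genes)   # one TRUE/FALSE per element of genes

idx          # 2 3
hits         # FALSE TRUE TRUE
genes[hits]  # "CAD" "CTPS1"
```

    Both results can be used to subset the original vector; the logical form is often handier when combining several conditions with & or |.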

    Reference: https://www.jianshu.com/p/11bbfa8e98c5

    grep()

    Usage:

    grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
         fixed = FALSE, useBytes = FALSE, invert = FALSE)
    
    

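    grep() arguments:

    Argument     Description
    pattern      character string containing a regular expression (or a literal string when fixed = TRUE)
    x            character vector to search, or an object that can be coerced to one; long vectors are supported
    ignore.case  if FALSE, matching is case-sensitive; if TRUE, case is ignored during matching
    perl         if TRUE, use Perl-compatible regular expressions
    value        if FALSE, return the indices of the matching elements; if TRUE, return the matching elements themselves
    fixed        if TRUE, pattern is matched as a literal string
    useBytes     if TRUE, matching is done byte by byte rather than character by character
    invert       if TRUE, return the indices or values of the elements that do not match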
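
    The effect of the most commonly used arguments can be sketched as follows. The vector x is hypothetical sample data chosen so that each argument changes the result.

```r
# Hypothetical sample data, used only for illustration
x <- c("mcm2", "MCM3", "a.b", "axb")

i_default <- grep("mcm", x)                      # 1: case-sensitive by default
i_nocase  <- grep("mcm", x, ignore.case = TRUE)  # 1 2
v_nocase  <- grep("mcm", x, ignore.case = TRUE,
                  value = TRUE)                  # "mcm2" "MCM3"

v_regex   <- grep("a.b", x, value = TRUE)        # "a.b" "axb": . is a wildcard
v_fixed   <- grep("a.b", x, value = TRUE,
                  fixed = TRUE)                  # "a.b": pattern taken literally
v_invert  <- grep("a.b", x, value = TRUE,
                  invert = TRUE)                 # "mcm2" "MCM3": non-matching elements
```

    fixed = TRUE is worth remembering whenever the search string contains characters such as . or * that should not be treated as regular expression syntax.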

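    Regular expressions in R

    Symbol  Meaning
    ^       matches the start of a string
    $       matches the end of a string
    .       matches any single character (with perl = TRUE, any character except a newline)
    *       matches the preceding item zero or more times
    ?       makes the preceding item optional (zero or one time)
    +       matches the preceding item one or more times
    .*      matches any sequence of characters, including the empty one
    |       alternation: matches the pattern on either side
    [^]     negated character class: matches any one character that is not listed
    []      character class: matches any one of the listed characters; list them without separators
    [-]     a range inside a character class, for example [1-3]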
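
    The quantifiers *, ? and + always apply to the item immediately before them (zero or more, zero or one, one or more repetitions), and a bracketed class matches exactly one character. A small sketch on a hypothetical vector x makes this easier to see:

```r
# Hypothetical sample data, used only for illustration
x <- c("ac", "abc", "abbc", "adc")

star  <- grep("ab*c", x, value = TRUE)        # "ac" "abc" "abbc": zero or more b
opt   <- grep("ab?c", x, value = TRUE)        # "ac" "abc":        zero or one b
plus  <- grep("ab+c", x, value = TRUE)        # "abc" "abbc":      one or more b
any   <- grep("a.*c", x, value = TRUE)        # all four elements
cls   <- grep("a[bd]c", x, value = TRUE)      # "abc" "adc": b or d between a and c
ncls  <- grep("a[^b]c", x, value = TRUE)      # "adc": any one character except b
alt   <- grep("^ac$|^adc$", x, value = TRUE)  # "ac" "adc": either alternative
```

    Note that a[^b]c does not match "ac": a negated class still has to consume one character.
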

    Greedy and lazy matching

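    By default, quantifiers are greedy: they match as many characters as possible. For example, sub("a.*b", "", c("aabab", "eabbe")) removes the longest substring that starts with a and ends with b, which for "aabab" is the whole string. To match lazily, that is, to take the shortest possible substring, append ? to the quantifier: sub("a.*?b", "", c("aabab", "eabbe")) removes only the first, shortest substring that starts with a and ends with b.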
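
    The difference between a greedy quantifier such as .* and its lazy form .*? can be checked directly. This is a minimal sketch; perl = TRUE is passed here only as an assumption to make the lazy-quantifier semantics explicit, and the variable names are illustrative.

```r
x <- c("aabab", "eabbe")

# Greedy: .* extends as far as it can, up to the last b
greedy <- sub("a.*b", "", x, perl = TRUE)
greedy   # ""  "ee"

# Lazy: .*? stops at the first b that completes a match
lazy <- sub("a.*?b", "", x, perl = TRUE)
lazy     # "ab" "ebe"
```

    In "aabab" the greedy pattern consumes the whole string, while the lazy pattern removes only the leading "aab".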

    grep() examples:

    1. Using ^:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1')
    # Elements that start with "C"
    Results <- grep('^C', Protein, value = TRUE)
    Results
    
    
    2. Using $:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1')
    # Elements that end with "2"
    Results <- grep('2$', Protein, value = TRUE)
    Results
    
    
    3. Using .:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1')
    # Elements that contain "MCM" followed by any one character
    Results <- grep('MCM.', Protein, value = TRUE)
    Results
    
    
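    4. Using *:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
                 'DDO','DCLRE1C','DLC1')
    # * needs an item in front of it; a pattern that starts with * is not portable.
    # "D", then zero or more "A", then "O": matches 'DAO' and 'DDO'
    Results <- grep('DA*O', Protein, value = TRUE)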
    Results
    
    
    5. Using ?:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
                 'DDO','DCLRE1C','DLC1','USP11')
    # ? needs an item in front of it.
    # "GLS" with an optional "2" at the end of the string: matches 'GLS' and 'GLS2'
    Results <- grep('GLS2?$', Protein, value = TRUE)
    Results
    
    
    6. Using +:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
                 'DDO','DCLRE1C','DLC1','USP11')
    # + needs an item in front of it.
    # One or more "D" followed by "B": matches 'DDB1' and 'DDB2'
    Results <- grep('D+B', Protein, value = TRUE)
    Results
    
    
    7. Using .*:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
                 'DDO','DCLRE1C','DLC1','USP11')
    # "T", then any sequence of characters, then "3": matches 'TP53' and 'TGM3'
    Results <- grep('T.*3', Protein, value = TRUE)
    Results
    
    
    8. Using |:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
                 'DDO','DCLRE1C','DLC1','USP11')
    # Elements that start with "T" or end with "3"
    Results <- grep('^T|3$', Protein, value = TRUE)
    Results
    
    
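    9. Using [^]:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
                 'DDO','DCLRE1C','DLC1','USP11')
    # [^TP53] matches any one character other than T, P, 5 or 3, so every
    # element containing such a character is returned (here, all except 'TP53')
    Results <- grep('[^TP53]', Protein, value = TRUE)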
    Results
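
    A negated class works on single characters, not on whole strings, so it is not a reliable way to exclude one specific element. If the goal is to drop an exact value, one option is to anchor the pattern and use invert = TRUE. The sketch below uses a short hypothetical vector (proteins, including the made-up entry "P53") to show the difference.

```r
# Hypothetical short vector, used only for illustration
proteins <- c("TP53", "P53", "MCM2", "TGM3")

# Negated class: "P53" is also dropped, because it contains only P, 5 and 3
by_class <- grep("[^TP53]", proteins, value = TRUE)
by_class    # "MCM2" "TGM3"

# Anchored pattern plus invert = TRUE: only the exact string "TP53" is dropped
by_invert <- grep("^TP53$", proteins, value = TRUE, invert = TRUE)
by_invert   # "P53" "MCM2" "TGM3"
```

    For an exact comparison like this, plain subsetting with proteins[proteins != "TP53"] gives the same result without any regular expression.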
    
    
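    10. Using []:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
                 'DDO','DCLRE1C','DLC1','USP11')
    # Elements that contain 4, 3, 9 or 6. No separators are needed inside [];
    # a comma there would itself become one of the characters to match.
    Results <- grep('[4396]', Protein, value = TRUE)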
    Results
    
    
    11. Using [-]:
    Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
                 'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
                 'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
                 'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
                 'DDO','DCLRE1C','DLC1','USP11')
    # Elements that contain any digit in the range 1 to 3
    Results <- grep('[1-3]', Protein, value = TRUE)
    Results
    
    
