R String Processing Functions

Author: 上校的猫 | Published: 2019-10-15 17:13

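    The grep {base} family

    The grep family matches a regular expression against each element of a character vector. grep() returns the indices of the matching elements (or the elements themselves when value = TRUE), and grepl() returns a logical vector.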

    grep("a",c("a1","a2","b1","b2"))
    ## [1] 1 2
    
    grep("a",c("a1","a2","b1","b2"),value = TRUE)
    ## [1] "a1" "a2"
    
    grepl("a",c("a1","a2","b1","b2"))
    ## [1]  TRUE  TRUE FALSE FALSE
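
    The sketch below illustrates three options of the grep family that often matter in practice: anchoring with ^, ignore.case, and fixed. The vector `files` is made-up sample data and does not come from the examples above.

```r
# Hypothetical sample data for illustration only
files <- c("a.txt", "A1.csv", "b.a", "ba.txt")

# "^a" anchors the match to the start; ignore.case = TRUE also accepts "A"
starts_with_a <- grep("^a", files, ignore.case = TRUE, value = TRUE)
print(starts_with_a)
# [1] "a.txt"  "A1.csv"

# fixed = TRUE treats ".a" as a literal dot followed by "a"
has_dot_a <- grepl(".a", files, fixed = TRUE)
print(has_dot_a)
# [1] FALSE FALSE  TRUE FALSE

# Without fixed = TRUE, "." is a metacharacter: any character followed by "a"
any_then_a <- grepl(".a", files)
print(any_then_a)
# [1] FALSE FALSE  TRUE  TRUE
```

    The last two calls differ only in how "." is interpreted, which is a common source of surprises when matching file names.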
    

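    sub() replaces only the first match within each element; gsub() replaces every match. This is global replacement, which is a different thing from greedy matching.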

    sub("a",replacement = "A",x=c("a1a","a2","b1","b2"))
    ## [1] "A1a" "A2"  "b1"  "b2" 
    
    gsub("a",replacement = "A",x=c("a1a","a2","b1","b2"))
    ## [1] "A1A" "A2"  "b1"  "b2" 
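
    To make the difference between global replacement and greedy matching concrete, here is a minimal sketch. The input strings are invented for illustration; perl = TRUE is used for the lazy quantifier so the behaviour does not depend on the default regex engine.

```r
# Hypothetical input, not from the original examples
d <- "2019-10-15"

# gsub() replaces every match: both hyphens become slashes
slashes <- gsub("-", "/", d)
print(slashes)
# [1] "2019/10/15"

# Backreferences \\1, \\2, \\3 refer to the parenthesised groups
dmy <- sub("^(\\d{4})-(\\d{2})-(\\d{2})$", "\\3/\\2/\\1", d)
print(dmy)
# [1] "15/10/2019"

# Greediness describes how much a single match consumes:
# "a.*a" takes as much as possible, "a.*?a" as little as possible
greedy <- sub("a.*a", "X", "a1a2a")
lazy <- sub("a.*?a", "X", "a1a2a", perl = TRUE)
print(c(greedy, lazy))
# [1] "X"   "X2a"
```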
    

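    regexpr() and gregexpr() do not return positions in the vector; they return the position of the match inside each element. regexpr() reports the first match per element as an integer vector, while gregexpr() reports all matches as a list with one entry per element. A value of -1 means no match.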

    regexpr("a",c("1aa","12aba","123abca"))
    ## [1] 2 3 4
    ## attr(,"match.length")
    ## [1] 1 1 1
    ## attr(,"index.type")
    ## [1] "chars"
    ## attr(,"useBytes")
    ## [1] TRUE
    
    gregexpr("a",c("1aa","12aba","123abca"))
    # [[1]]
    # [1] 2 3
    # attr(,"match.length")
    # [1] 1 1
    # attr(,"index.type")
    # [1] "chars"
    # attr(,"useBytes")
    # [1] TRUE
    # 
    # [[2]]
    # [1] 3 5
    # attr(,"match.length")
    # [1] 1 1
    # attr(,"index.type")
    # [1] "chars"
    # attr(,"useBytes")
    # [1] TRUE
    # 
    # [[3]]
    # [1] 4 7
    # attr(,"match.length")
    # [1] 1 1
    # attr(,"index.type")
    # [1] "chars"
    # attr(,"useBytes")
    # [1] TRUE
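
    The match positions are rarely the end goal. A common follow-up, sketched here, is to pass them to regmatches() from base R to pull out the matched text itself. The patterns are chosen only for illustration.

```r
x <- c("1aa", "12aba", "123abca")

# First run of digits in each element.
# Note: with regexpr(), elements without a match are dropped from the result.
first_digits <- regmatches(x, regexpr("[0-9]+", x))
print(first_digits)
# [1] "1"   "12"  "123"

# Every "a" optionally followed by "b"; one list entry per element
all_ab <- regmatches(x, gregexpr("ab?", x))
print(all_ab)
```

    Here `all_ab[[2]]` is `c("ab", "a")`, matching the positions 3 and 5 that gregexpr() reports for "12aba".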
    

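    The substr {base} family

    Extract or replace the part of each element that lies between a start and a stop position (both inclusive, 1-based).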

    x <- c("abc123","abc456","abc789")
    substr(x, start=2, stop=4)
    # [1] "bc1" "bc4" "bc7"
    
    substring(x, first=2) # last defaults to 1000000L
    # [1] "bc123" "bc456" "bc789"
    
    substr(x, start=2, stop=4) <- "***"
    x
    # [1] "a***23" "a***56" "a***89"
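
    Two further patterns that build on the same functions, shown as a small sketch: taking the last n characters with the help of nchar(), and cutting one string into several pieces by giving substring() vectors of positions.

```r
x <- c("abc123", "abc456", "abc789")

# Last three characters of each element
n <- nchar(x)
tail3 <- substr(x, n - 2, n)
print(tail3)
# [1] "123" "456" "789"

# substring() recycles first/last, so one string yields several pieces
pieces <- substring("abc123", first = c(1, 4), last = c(3, 6))
print(pieces)
# [1] "abc" "123"
```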
    

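    paste() and strsplit()

    Concatenate and split strings. paste() joins its arguments with sep; strsplit() splits each element and returns a list with one character vector per element.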

    paste("a","b",sep="-")
    # [1] "a-b"
    
    strsplit(c("a-b","c-d"),split="-")
    # [[1]]
    # [1] "a" "b"
    # 
    # [[2]]
    # [1] "c" "d"
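
    A short sketch of details that the one-line examples do not show: the difference between sep and collapse, the paste0() shortcut, and the fact that split is a regular expression unless fixed = TRUE is given. The inputs are made up for illustration.

```r
# sep joins arguments element-wise; collapse then joins the result into one string
joined <- paste(c("a", "b"), 1:2, sep = "-")
single <- paste(c("a", "b"), collapse = "+")
no_sep <- paste0("x", 1:3)
print(joined)   # [1] "a-1" "b-2"
print(single)   # [1] "a+b"
print(no_sep)   # [1] "x1" "x2" "x3"

# split is a regular expression by default, so a literal "." needs fixed = TRUE
parts <- strsplit(c("a.b", "c.d"), split = ".", fixed = TRUE)

# Take the first piece of each element
first_piece <- sapply(parts, `[`, 1)
print(first_piece)
# [1] "a" "c"
```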
    

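    After working through these, extracting the text that matches a specific pattern still feels cumbersome with base functions alone, so the stringr package is worth learning next.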

        Article link: https://www.haomeiwen.com/subject/wupsmctx.html