stringr: functions for working with strings

Author: 萍智医信 | Published 2022-01-30 18:19

    Preparation: install the R package and load the data

    rm(list = ls())
    if(!require(stringr))install.packages('stringr')
    library(stringr)
    x <- "The birch canoe slid on the smooth planks."
    

    1. Checking string length

    length(x)
    str_length(x)
    str_length(" ")
    
    The last line shows that a space also counts as one character.
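
The difference between the two calls is easy to check directly. A minimal sketch, assuming the same `x` as above: `length()` counts the elements of a vector, while `str_length()` counts the characters inside each element. The variable names are only illustrative.

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."

# length() counts vector elements: x holds a single string
n_elements <- length(x)

# str_length() counts characters, including spaces and the final period
n_chars <- str_length(x)

# str_length() is vectorised: one result per element
lens <- str_length(c("a", "ab", " "))
```

Here `n_elements` is 1 and `n_chars` is 42, which is why `length()` is not the right tool for measuring a string.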

    2. Splitting and combining strings

    str_split(x," ")
    class(str_split(x," "))
    

    After splitting, the result is a list rather than a vector. The character vector can be recovered by subsetting the list.

    x2 = str_split(x," ")[[1]]
    class(x2)
    x2
    

    With simplify = TRUE, the split returns a matrix instead:

    str_split(x, " ", simplify = TRUE)
    class(str_split(x, " ", simplify = TRUE))
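
To make the two return shapes concrete, here is a small sketch, assuming the same `x` as above. It inspects the list form and the matrix form side by side; the variable names are only illustrative.

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."

# Default: a list with one element per input string
as_list <- str_split(x, " ")

# [[1]] pulls out the character vector of words
words <- as_list[[1]]

# simplify = TRUE: a character matrix, one row per input string
as_matrix <- str_split(x, " ", simplify = TRUE)
matrix_dim <- dim(as_matrix)
```

The list form is convenient when the input strings split into different numbers of pieces; the matrix form suits inputs that are expected to split into the same number of columns.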
    

    Now let's join the split pieces back together.

    x2
    str_c(x2,collapse = " ")
    str_c(x2,1234,sep = "+")
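
The two arguments used above do different jobs. A minimal sketch, assuming a shortened three-word version of `x2`: `sep` is placed between the arguments element by element, while `collapse` merges the resulting vector into a single string.

```r
library(stringr)

x2 <- c("The", "birch", "canoe")

# sep joins the arguments pairwise; the result still has 3 elements
pairs <- str_c(x2, 1234, sep = "+")

# collapse joins the elements into one string
sentence <- str_c(x2, collapse = " ")

# Both together: pair up first, then collapse the pairs
both <- str_c(x2, 1234, sep = "+", collapse = ";")
```
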
    

    3. Extracting part of a string

    x
    str_sub(x,5,9)
    

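    Positions 5 to 9 return "birch", because the space after "The" occupies position 4. A space clearly counts as one character.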
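
As a quick check of how positions are counted, the sketch below (assuming the same `x`) shifts the start by one to pick up the space, and uses negative positions, which `str_sub()` counts from the end of the string.

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."

# Characters 5 to 9 are the second word
word2 <- str_sub(x, 5, 9)

# Starting at 4 includes the space that precedes it
with_space <- str_sub(x, 4, 9)

# Negative positions count backwards from the end
last7 <- str_sub(x, -7, -1)
```
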

    4. Changing case

    # Convert everything to upper case
    str_to_upper(x2)
    # Convert everything to lower case
    str_to_lower(x2)
    # Capitalise the first letter of each word
    str_to_title(x2)
    

    5. Sorting strings

    x2
    str_sort(x2)
    

    The elements are sorted alphabetically, following the order of the 26 English letters.

    6. Detecting patterns

    str_detect(x2,"h")
    str_starts(x2,"T")
    str_ends(x2,"e")
    
    Combined with sum and mean, str_detect gives the number and the proportion of matches:
    str_detect(x2,"h")
    sum(str_detect(x2,"h"))
    mean(str_detect(x2,"h"))
    

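    Why does mean(str_detect(x2,"h")) return 0.5? The logical vector returned by str_detect(x2,"h") is first converted to a numeric vector, with TRUE as 1 and FALSE as 0. Four of the 8 values are 1, and 4/8 = 0.5, so TRUE makes up 50% of the results: half of the elements of x2 contain h.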
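
The coercion described above can be made visible step by step. A minimal sketch, assuming `x2` holds the eight words obtained from splitting `x`:

```r
library(stringr)

x2 <- c("The", "birch", "canoe", "slid", "on", "the", "smooth", "planks.")

# Logical vector: TRUE where the element contains "h"
has_h <- str_detect(x2, "h")

# The same vector as numbers: TRUE becomes 1, FALSE becomes 0
as_num <- as.numeric(has_h)

# sum() counts the matches, mean() gives their share
n_match <- sum(has_h)
prop_match <- mean(has_h)
```

`sum()` and `mean()` perform this conversion implicitly, so the explicit `as.numeric()` step is only there for illustration.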

    7. Extracting the strings that match

    x2
    # Method 1
    str_subset(x2,"h")
    # Method 2
    x2[str_detect(x2,"h")]
    

    8. Counting characters

    x
    str_count(x," ")
    

    This counts the spaces in x: there are 7.

    x2
    str_count(x2,"o")
    

    The number of times o appears in each element of x2.

    str_count(x)
    length(x)
    x
    str_count(x2)
    length(x2)
    x2
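
The original gives no commentary for this last block, so the following is one reading of it, offered as a sketch. When `str_count()` is called without a pattern, the default empty pattern matches every character, so it returns the number of characters in each element, whereas `length()` returns the number of elements.

```r
library(stringr)

x <- "The birch canoe slid on the smooth planks."
x2 <- str_split(x, " ")[[1]]

# No pattern: characters per element
chars_in_x <- str_count(x)
chars_in_x2 <- str_count(x2)

# length(): number of elements in the vector
n_x <- length(x)
n_x2 <- length(x2)
```
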
    

    9. Replacing strings

    x2
    str_replace(x2,"o","A")
    str_replace_all(x2,"o","A")
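
The two functions only differ on elements with more than one match. A short sketch, using a three-word subset of `x2` chosen for illustration:

```r
library(stringr)

words <- c("canoe", "smooth", "slid")

# str_replace() changes only the first match in each element
first_only <- str_replace(words, "o", "A")

# str_replace_all() changes every match in each element
every_match <- str_replace_all(words, "o", "A")
```

"smooth" is the only element where the results differ; elements without a match, such as "slid", are returned unchanged by both.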
    

    ------------------------------------------Exercise----------------------------------------

    #Bioinformatics is a new subject of genetic data collection,analysis and dissemination to the research community.
    #1. Assign the sentence above to tmp as one long string
    tmp = "Bioinformatics is a new subject of genetic data collection,analysis and dissemination to the research community."
    #2. Split it into a vector of words and assign it to tmp2 (mind the punctuation)
    library(stringr)
    tmp2 = tmp %>% 
      str_replace(","," ") %>%
      str_remove("[.]") %>% 
      str_split(" ")
    tmp2 = tmp2[[1]]
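
An alternative to cleaning the punctuation before splitting is to extract the words directly. This is not the approach used in the answer above, only a sketch of a different technique: `str_extract_all()` with the pattern `[A-Za-z]+` pulls out every run of letters, so the comma and the period never need to be handled explicitly. The step-by-step version is repeated without the pipe so the two results can be compared.

```r
library(stringr)

tmp <- "Bioinformatics is a new subject of genetic data collection,analysis and dissemination to the research community."

# Step-by-step version: replace the comma, drop the period, split on spaces
cleaned <- str_remove(str_replace(tmp, ",", " "), "[.]")
tmp2 <- str_split(cleaned, " ")[[1]]

# Alternative: extract every run of letters in one call
words <- str_extract_all(tmp, "[A-Za-z]+")[[1]]

same_result <- identical(tmp2, words)
```

For this sentence both approaches give the same 16 words. The letter-only pattern would drop digits and split hyphenated words, so it is not a general replacement.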
    

    Reference: 生信技能树, course taught by 小洁老师
