1002 chapter 10 stringr (Part 1)


Author: 森尼啊 | Published 2018-10-19 20:20

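    I meant to read all of Chapter 10 in one go and post my notes, but after a long time I still hadn't finished it or understood it! To keep my motivation up and stay on schedule, I'll come back to the rest of Chapter 10 later... awkward.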

    Regular expressions
    library(tidyverse)
    library(stringr)

    Basics

    • Create strings with single or double quotes
    • Use \ to "escape" a literal single or double quote inside a string
    • Newline: \n; tab: \t
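A quick check of the escaping rules above, as a minimal sketch in base R (the example strings are made up): the printed form of a string shows the escapes, writeLines() shows the raw characters, and nchar() confirms that an escape such as \n counts as one character.

```r
with_single  <- 'He said "hi"'      # double quotes inside single quotes need no escaping
with_escape  <- "He said \"hi\""    # the same string, escaping the inner double quotes
identical(with_single, with_escape)
#> [1] TRUE

x <- c("\"", "\\", "a\tb")
x              # print() shows the escape sequences
writeLines(x)  # writeLines() shows the raw characters
nchar("\n")    # a newline escape is a single character
#> [1] 1
```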

    stringr functions

    All stringr functions start with str_, which is handy in RStudio: typing str_ triggers autocomplete and lists them all.
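For example, str_length() (another member of the str_ family) returns the number of characters in each string and keeps NA as NA. The input vector is just an illustration:

```r
library(stringr)

str_length(c("a", "R for data science", NA))
#> [1]  1 18 NA
```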

    Combining strings

    Join strings with a separator:
    str_c("x", "y", sep = ",")
    [1] "x,y"
    Collapse a character vector into a single string:
    str_c(c("x", "y", "z"), collapse = ",")
    [1] "x,y,z"
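str_c() is vectorised: shorter arguments are recycled against longer ones, and sep and collapse can be used together. A small sketch with made-up inputs:

```r
library(stringr)

# recycling: the single strings are reused for every element of the middle vector
str_c("prefix-", c("a", "b", "c"), "-suffix")
#> [1] "prefix-a-suffix" "prefix-b-suffix" "prefix-c-suffix"

# sep joins the arguments element-wise, then collapse joins the results into one string
str_c(c("x", "y"), c("1", "2"), sep = "_", collapse = "|")
#> [1] "x_1|y_2"
```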

    Subsetting strings

    x <- c("apple","banana","pear") 
    str_sub(x,1,3) 
    [1] "app" "ban" "pea"
    str_sub(x,-3,-1) # negative positions count backwards from the end
    [1] "ple" "ana" "ear"
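str_sub() does not fail when the positions run past the end of a string; it returns as much as it can. A sketch with made-up strings:

```r
library(stringr)

str_sub("a", 1, 5)        # end is past the last character
#> [1] "a"
str_sub("apple", 10, 12)  # start is past the end: empty string
#> [1] ""
```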
    
    • str_to_lower() converts text to lowercase
    • str_to_upper() converts text to uppercase; str_to_title() converts it to title case (each word capitalized)
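The case functions combine well with the assignment form of str_sub(). A sketch (the fruit vector is just an example) that capitalizes only the first letter of each string, followed by the two whole-string conversions:

```r
library(stringr)

x <- c("apple", "banana", "pear")
# replace the first character of each string with its uppercase version
str_sub(x, 1, 1) <- str_to_upper(str_sub(x, 1, 1))
x
#> [1] "Apple"  "Banana" "Pear"

str_to_upper("r for data science")
#> [1] "R FOR DATA SCIENCE"
str_to_title("r for data science")
#> [1] "R For Data Science"
```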

    Locales

    Locale codes are ISO 639 language codes; see the Wikipedia page "List of ISO 639-1 codes".
    str_sort(x, locale = "en")  # English
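The locale affects both sorting and case conversion. A sketch with made-up words: with locale = "en" the sort is the familiar alphabetical order, another locale such as Hawaiian ("haw") may order the same words differently, and in Turkish ("tr") a lowercase i uppercases to a dotted capital I rather than I.

```r
library(stringr)

x <- c("apple", "eggplant", "banana")
str_sort(x, locale = "en")    # English collation
#> [1] "apple"    "banana"   "eggplant"
str_sort(x, locale = "haw")   # Hawaiian collation; the order can differ from English

str_to_upper("i")                 # default locale is English
#> [1] "I"
str_to_upper("i", locale = "tr")  # Turkish: dotted capital I (U+0130)
```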

    Exercises, p. 134

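    1. My answer: the difference between paste() and paste0() is that the former lets you set a custom separator.
      ❌ || Answer: paste() separates its inputs with a space by default (sep = " "), while paste0() puts nothing between them.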
    paste("foo", "bar")
     [1] "foo bar"
    paste0("foo", "bar")
     [1] "foobar"
    str_c("foo", "bar")
     [1] "foobar"
    

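    str_c() works like paste0(), except with missing values: in str_c(), NA is contagious (any NA input gives NA), whereas paste() and paste0() turn NA into the string "NA".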

     x <- "apple"
     y <- "banana"
     paste(x,y)
    [1] "apple banana"
     paste0(x,y)
    [1] "applebanana"
     z <- NA
     paste0(x,z)
    [1] "appleNA"
     paste(x,z)
    [1] "apple NA"
    str_c(x,z)
    [1] NA
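If you want str_c() to keep a missing value as the text "NA" (the way paste0() does), convert it first with str_replace_na(). A minimal sketch reusing the x and z values above:

```r
library(stringr)

x <- "apple"
z <- NA
str_c(x, z)                   # NA is contagious
#> [1] NA
str_c(x, str_replace_na(z))   # NA becomes the string "NA" before joining
#> [1] "appleNA"
```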
    
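    2. sep inserts a separator between the strings being joined; collapse joins the elements of a character vector into a single string.
      3. At first I didn't know what ceiling() does (it rounds up to the nearest integer),
      so I was going to check whether the length was odd first...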
     x <- c("abc",'abcd')
     n <- str_length(x)
     m <- ceiling(n/2)
     str_sub(x,m,m)
    [1] "b" "b"
     m <- floor(n/2)
     str_sub(x,m,m)
    [1] "a" "b"
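For even lengths, one option is to return both middle characters. A sketch using a hypothetical helper, str_middle() (my own name, not a stringr function), that combines the ceiling() and floor() ideas above:

```r
library(stringr)

# hypothetical helper (not part of stringr):
# the middle character for odd lengths, the two middle characters for even lengths
str_middle <- function(x) {
  n <- str_length(x)
  start <- ceiling(n / 2)   # odd n: the middle; even n: left character of the middle pair
  end <- floor(n / 2) + 1   # odd n: the middle again; even n: right character of the pair
  str_sub(x, start, end)
}

str_middle(c("abc", "abcd", "a"))
#> [1] "b"  "bc" "a"
```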
    
    4. str_wrap() wraps text to a given width so that it prints nicely:
      str_wrap(string, width = 80, indent = 0, exdent = 0)
    5. str_trim() removes whitespace from the start and/or end of a string:
    str_trim(" abc ")
    #> [1] "abc"
    str_trim(" abc ", side = "left")
    #> [1] "abc "
    str_trim(" abc ", side = "right")
    #> [1] " abc"
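A small sketch of str_wrap() together with str_squish(), which, unlike str_trim(), also collapses runs of whitespace inside the string. The sample sentence is made up, and since the exact line breaks str_wrap() picks can vary, no output is shown for it:

```r
library(stringr)

txt <- "The quick brown fox jumps over the lazy dog"
wrapped <- str_wrap(txt, width = 10)
cat(wrapped, "\n")         # lines no wider than 10 characters, joined by "\n"

str_squish("  a   b  c ")  # trims both ends and squeezes internal runs of spaces
#> [1] "a b c"
```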
    

    The opposite of str_trim() is str_pad(), which adds whitespace:

    str_pad("abc", 5, side = "both")
    #> [1] " abc "
    str_pad("abc", 4, side = "right")
    #> [1] "abc "
    str_pad("abc", 4, side = "left")
    #> [1] " abc"
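str_pad() can also pad with a character other than a space, which is handy for zero-padding IDs; it never shortens a string that is already wider than width. A sketch with made-up values:

```r
library(stringr)

str_pad(c("7", "42", "123"), width = 3, pad = "0")  # side = "left" is the default
#> [1] "007" "042" "123"
str_pad("abcdef", 3)                                 # already wider: returned unchanged
#> [1] "abcdef"
```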
    

    Pattern matching

    str_view() and str_view_all() take a character vector and a regular expression and show how they match.

    x <- c("apple", "banana","pear")
     str_view(x,"an")
    
    x <- c("apple", "banana","pear")
     str_view(x,".a.") # .匹配任意字符
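str_view() only displays matches (and in stringr 1.5.0 and later, str_view_all() is deprecated because str_view() itself shows every match). To get results back as values you can test, str_detect() and str_extract() accept the same patterns. A sketch on the same vector:

```r
library(stringr)

x <- c("apple", "banana", "pear")
str_detect(x, "an")    # does each string contain "an"?
#> [1] FALSE  TRUE FALSE
str_extract(x, ".a.")  # first match of "any character, a, any character"
#> [1] NA    "ban" "ear"
```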
    

    To match a literal ., the regular expression is \. and the R string is "\\.".
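The two levels of escaping in practice, as a minimal sketch with made-up strings: the R string "\\." is the regex \., which matches only a literal dot; to match a literal backslash the regex is \\ and the R string is "\\\\".

```r
library(stringr)

str_extract(c("abc", "a.c", "bef"), "a\\.c")  # "\\." in R is the regex \.
#> [1] NA    "a.c" NA

x <- "a\\b"            # the three characters a \ b
writeLines(x)
str_detect(x, "\\\\")  # R string "\\\\" -> regex \\ -> one literal backslash
#> [1] TRUE
```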

    Exercises, p. 136

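    1. "\": This will escape the next character in the R string.
    "\\": This will resolve to \ in the regular expression, which will escape the next character in the regular expression.
    "\\\": The first two backslashes will resolve to a literal backslash in the regular expression, the third will escape the next character. So in the regular expression, this will escape some escaped character.

    2. ""'\" ❌ Answer: str_view("\"'\\", "\"'\\\\")
      I had also noted str_view("\"'\\\", "\"'\\\\\") as working, but as written the last backslash in each string escapes the closing quote, so neither is a complete R string.
    3. Any character after a dot matches?
      ??? I don't fully understand this one. (The pattern \..\..\.. matches a literal dot followed by any character, three times in a row, e.g. ".a.b.c"; as an R string it is "\\..\\..\\..".)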
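Both exercise answers can be checked with str_detect(). A sketch with made-up test strings: the R string "\"'\\" holds the three characters " ' \, and the pattern "\\..\\..\\.." (the regex \..\..\..) matches a literal dot followed by any character, three times in a row.

```r
library(stringr)

x <- "\"'\\"                 # the three characters  "  '  \
writeLines(x)
str_detect(x, "\"'\\\\")     # regex "'\\ : the final \\ is one literal backslash
#> [1] TRUE

# \..\..\.. : a literal dot plus any character, repeated three times
str_detect(c(".a.b.c", "a.b.c", ".a.b."), "\\..\\..\\..")
#> [1]  TRUE FALSE FALSE
```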

    Anchors

    • ^ matches the start of the string
    • $ matches the end of the string
    • \b matches a word boundary
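The three anchors on a few made-up strings, as a sketch: ^ and $ pin the match to the start or end, using both forces a whole-string match, and \b (written "\\b" in R) keeps "sum" from matching inside "summary".

```r
library(stringr)

x <- c("apple", "banana", "pear")
str_detect(x, "^a")   # starts with "a"
#> [1]  TRUE FALSE FALSE
str_detect(x, "a$")   # ends with "a"
#> [1] FALSE  TRUE FALSE

# ^...$ together: the pattern must cover the whole string
str_detect(c("apple pie", "apple", "apple cake"), "^apple$")
#> [1] FALSE  TRUE FALSE

# \b: "sum" as a whole word, not as part of "summary"
str_detect(c("summary(x)", "sum(x)"), "\\bsum\\b")
#> [1] FALSE  TRUE
```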

    Exercises, p. 137

    1. str_view("$^$", "^\\$\\^\\$$")
    2. str_view(stringr::words, "^y", match = TRUE)
    
    str_view(stringr::words, "x$", match = TRUE)
    
    str_view(stringr::words, "^...$", match = TRUE)
    
    str_view(stringr::words, ".......", match = TRUE)
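To get the matching words back as a character vector (for example, to count them) instead of a viewer, str_subset() takes the same patterns. A sketch; the variable names are my own:

```r
library(stringr)

starts_y   <- str_subset(words, "^y")        # start with "y"
ends_x     <- str_subset(words, "x$")        # end with "x"
three      <- str_subset(words, "^...$")     # exactly three characters
seven_plus <- str_subset(words, ".......")   # at least seven characters

length(three)  # how many three-letter words there are
```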
    

    Character classes

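    • \d matches any digit
    • \s matches any whitespace character (e.g. space, tab, newline)
    • [abc] matches a, b, or c
    • [^abc] matches any character except a, b, or c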
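A sketch of these classes on made-up strings, plus | for alternatives (used in the exercises that follow). Inside [ ], a . is a literal dot, so "a[.]c" avoids the double backslash:

```r
library(stringr)

x <- c("abc", "a.c", "a*c", "a c")
str_detect(x, "a[.]c")    # inside [ ], "." is a literal dot
#> [1] FALSE  TRUE FALSE FALSE
str_detect(x, "a\\sc")    # \s: a whitespace character
#> [1] FALSE FALSE FALSE  TRUE

str_extract("Room 101", "\\d\\d\\d")                 # \d: a digit
#> [1] "101"
str_detect(c("grey", "gray", "grid"), "gr(e|a)y")   # | separates alternatives
#> [1]  TRUE  TRUE FALSE
```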

    Exercises, p. 138

    1.

    str_view(stringr::words, "^a|e|i|o|u", match = TRUE)
    # Error in loadNamespace(name) : there is no package called ‘htmlwidgets’
    install.packages("htmlwidgets")  # str_view() needed htmlwidgets to display its output
    str_view(stringr::words, "^a|e|i|o|u", match = TRUE)
    

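    || Answer: str_view(stringr::words, "^[aeiou]", match = TRUE). In my version the ^ only applied to a, so "^a|e|i|o|u" matched words that start with a or contain e, i, o, or u anywhere.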

    str_view(stringr::words, "[^aeiou]", match =TRUE)
    able
    about
    absolute
    accept
    account
    achieve
    across
    ...
    

    || Answer:

    str_view(stringr::words, "^[^aeiou]+$", match=TRUE)
    

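    Without the +, the result is empty. + is explained in a later section: it means "one or more times", so ^[^aeiou]+$ requires every character from start to end to be a non-vowel.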

    str_view(stringr::words, "ed$| [^eed$]", match=TRUE)
    bed
    feed
    hundred
    indeed
    need
    proceed
    red
    speed
    ...
    

    ✗ || Answer:

    str_view(stringr::words, "^ed$|[^e]ed$", match = TRUE)
    

    str_view(stringr::words, "ing$|ize$", match = TRUE)
    bring
    during
    evening
    king
    meaning
    morning
    organize
    recognize
    ring
    thing
    

    || Answer: str_view(stringr::words, "i(ng|se)$", match = TRUE) (the exercise asks for "ise", not "ize")
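The four answers can be sanity-checked on a handful of hand-picked words; this test vector is made up and not taken from stringr::words:

```r
library(stringr)

x <- c("apple", "rhythm", "bed", "feed", "bring", "realise", "realize")
str_subset(x, "^[aeiou]")       # starts with a vowel
#> [1] "apple"
str_subset(x, "^[^aeiou]+$")    # only consonants (y counts as a consonant here)
#> [1] "rhythm"
str_subset(x, "^ed$|[^e]ed$")   # ends in "ed" but not "eed"
#> [1] "bed"
str_subset(x, "i(ng|se)$")      # ends in "ing" or "ise"
#> [1] "bring"   "realise"
```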

    2

    str_view(stringr::words, "(cei|[^c]ie)", match = TRUE)
    

    This finds words where a) c comes first and i comes after e, or b) i comes before e and the i is not preceded by c.

    str_view(stringr::words, "(cie|[^c]ei)", match = TRUE)
    
    sum(str_detect(stringr::words, "(cei|[^c]ie)"))
    sum(str_detect(stringr::words, "(cie|[^c]ei)"))
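The two patterns on a few hand-picked words (a made-up vector, not stringr::words): the first flags words that follow "i before e except after c", the second flags the rule-breakers.

```r
library(stringr)

x <- c("receive", "believe", "science", "weird")
str_detect(x, "(cei|[^c]ie)")   # follows the rule
#> [1]  TRUE  TRUE FALSE FALSE
str_detect(x, "(cie|[^c]ei)")   # breaks the rule
#> [1] FALSE FALSE  TRUE  TRUE
```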
    

    3

    str_view(stringr::words, "q[^u]", match = TRUE)
    

    Yes, at least in stringr::words: q[^u] finds no matches.
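One caveat worth checking: q[^u] needs a character after the q, so it would miss a word that ends in q. A sketch with a made-up vector, using a pattern that also covers that case:

```r
library(stringr)

x <- c("queen", "qatar", "iraq")
str_detect(x, "q[^u]")       # misses "iraq": nothing follows its q
#> [1] FALSE  TRUE FALSE
str_detect(x, "q($|[^u])")   # q at the end of the string, or q followed by a non-u
#> [1] FALSE  TRUE  TRUE
```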

    4

    “ou” instead of “o”
    use of “ae” and “oe” instead of “a” and “o”
    ends in ise instead of ize
    ends in yse instead of yze
    

    ou|ise$|ae|oe|yse$
    Totally baffled ???
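A sketch of what the answer pattern does on a few made-up words: it flags the British spellings, but "ou" also matches plenty of ordinary American words (like "house"), so it is only a rough heuristic.

```r
library(stringr)

british <- "ou|ise$|ae|oe|yse$"
x <- c("colour", "color", "organise", "organize", "analyse", "analyze", "house")
str_detect(x, british)
#> [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
```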

    5

    The answer uses the US format (xxx-xxx-xxxx):

    x <- c("123-456-7890", "1235-2351")
    str_view(x, "\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d")
    
    str_view(x, "[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")
    
    str_view(x, "\\d{3}-\\d{3}-\\d{4}")
    

    For China it would be xxxx-xxxxxxxx.
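A sketch covering both formats. The China pattern assumes the xxxx-xxxxxxxx layout exactly (real area codes vary in length, so treat it as an assumption), and ^ and $ make each pattern cover the whole string:

```r
library(stringr)

us    <- "^\\d{3}-\\d{3}-\\d{4}$"   # xxx-xxx-xxxx
china <- "^\\d{4}-\\d{8}$"          # xxxx-xxxxxxxx (assumed format)

x <- c("123-456-7890", "0755-12345678", "1235-2351")
str_detect(x, us)
#> [1]  TRUE FALSE FALSE
str_detect(x, china)
#> [1] FALSE  TRUE FALSE
```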

    Repetition

    Control how many times a pattern matches:

    • ?: 0 or 1 time
    • +: 1 or more times
    • *: 0 or more times
    • {n}: exactly n times
    • {n,}: n or more times
    • {,m}: at most m times
    • {n,m}: between n and m times
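The repetition operators on a single string, as a sketch (the Roman-numeral sentence is just a convenient test string with runs of C and X). Matching is greedy by default; a trailing ? makes a repetition lazy:

```r
library(stringr)

x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"
str_extract(x, "CC?")      # C, then an optional second C
#> [1] "CC"
str_extract(x, "CC+")      # C, then one or more Cs
#> [1] "CCC"
str_extract(x, "C[LX]+")   # C followed by one or more L or X
#> [1] "CLXXX"
str_extract(x, "C{2}")     # exactly two
#> [1] "CC"
str_extract(x, "C{2,3}")   # two to three; greedy, so it takes three
#> [1] "CCC"
str_extract(x, "C{2,3}?")  # lazy: as few as possible
#> [1] "CC"
```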

    Exercises, p. 139

    1

    ?: {0,1}
    +: {1,}
    *:{0,}

    2

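    ① ^.*$ matches any string, from start to end
    ② "\\{.+\\}" matches curly braces surrounding at least one character
    ③ \d{4}-\d{2}-\d{2} matches a date written as YYYY-MM-DD
    ④ "\\\\{4}" matches four backslashes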
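A quick check of the four patterns from this exercise (^.*$, "\\{.+\\}", \d{4}-\d{2}-\d{2} and "\\\\{4}") on made-up strings, as a sketch:

```r
library(stringr)

str_detect("any string at all", "^.*$")                           # whole string, any content
#> [1] TRUE
str_detect(c("{a}", "{}"), "\\{.+\\}")                            # braces around 1+ characters
#> [1]  TRUE FALSE
str_detect(c("2018-10-19", "18-10-19"), "\\d{4}-\\d{2}-\\d{2}")   # YYYY-MM-DD shape
#> [1]  TRUE FALSE

four <- "\\\\\\\\"             # R string for four backslashes
nchar(four)
#> [1] 4
str_detect(four, "\\\\{4}")    # regex \\{4}: four literal backslashes
#> [1] TRUE
```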

    3

    a

    str_view(stringr::words, "^[^aeiou]{3}", match=TRUE)
    

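    || Answer: str_view(words, "^[^aeiou]{3}"). Is the answer wrong? The pattern is the same as mine; the answer only leaves out match = TRUE, so str_view() also shows the words that don't match.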
    b

    str_view(words, "[aeiou]{3,}")
    

    ??? ({3,} means "three or more times", so this finds three or more vowels in a row.)
    c

    str_view(words, "([aeiou][^aeiou]){2,}")
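The three answers (a, b, c) on a few made-up words, using str_detect() so the results can be checked:

```r
library(stringr)

x <- c("strong", "beautiful", "banana", "apple")
str_detect(x, "^[^aeiou]{3}")            # a) starts with three consonants
#> [1]  TRUE FALSE FALSE FALSE
str_detect(x, "[aeiou]{3,}")             # b) three or more vowels in a row
#> [1] FALSE  TRUE FALSE FALSE
str_detect(x, "([aeiou][^aeiou]){2,}")   # c) two or more vowel-consonant pairs in a row
#> [1] FALSE  TRUE  TRUE FALSE
```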
    

    Grouping and backreferences

    Parentheses () define a "group"; a backreference such as \1 refers back to whatever the first group matched.
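A short sketch of a group plus a backreference on made-up words: (..) captures two characters, and \1 (written "\\1" in an R string) requires those same two characters to appear again immediately after.

```r
library(stringr)

x <- c("banana", "coconut", "apple")
str_subset(x, "(..)\\1")   # a pair of characters repeated back to back
#> [1] "banana"  "coconut"
str_detect(x, "(.)\\1")    # any single character doubled
#> [1] FALSE FALSE  TRUE
```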

    Exercises, p. 140

    I don't really understand these yet; I'll come back to them when I need them... guilty.


          Article link: https://www.haomeiwen.com/subject/wpxaoftx.html