53. Functions for Pattern Matching (Part 3)

Author: 心惊梦醒 | Published: 2021-09-05 07:41


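Functions for pattern matching and replacement cover the following tasks:
1. Determine which strings match a pattern
2. Find the positions of the matches
3. Extract the matched content
4. Replace matches with new values
5. Split a string based on a match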

[Figure: pattern matching diagram]

Base R provides the pattern matching functions grep, grepl, regexpr, gregexpr, regexec, and gregexec, and the pattern replacement functions sub and gsub.
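As a rough illustration of how the five tasks map onto base R, here is a minimal sketch. The vector `x` is made-up sample data, and `regmatches()` and `strsplit()` are base R helpers that are not in the list above but are commonly paired with these functions.

```r
# Made-up sample data for illustration only.
x <- c("apple pie", "banana split", "cherry tart")

# 1. Which strings match the pattern?
has_an <- grepl("an", x)   # logical vector, one value per element
hits   <- grep("an", x)    # indices of the matching elements

# 2. Where does the first match start in each string? (-1 means no match)
pos <- regexpr("an", x)

# 3. Extract the matched text (here: the first run of lowercase letters).
first_word <- regmatches(x, regexpr("[a-z]+", x))

# 4. Replace the first match (sub) or every match (gsub).
replaced_first <- sub("a", "A", x)
replaced_all   <- gsub("a", "A", x)

# 5. Split each string on a pattern; the result is a list.
parts <- strsplit(x, " ")
```

Note that `regmatches()` drops elements without a match, so its result can be shorter than the input.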

3. Extracting the matched content

The stringr functions str_extract() and str_extract_all() return the matched content. Their usage:

str_extract(string, pattern)
str_extract_all(string, pattern, simplify = FALSE)
********************
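str_extract() returns a character vector the same length as the input: the first match for each element, or NA where there is no match.
str_extract_all() returns, by default, a list the same length as the input whose elements are character vectors holding every match; setting simplify = TRUE returns a character matrix instead.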
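The difference between the return shapes is easier to see on a small input. The sketch below assumes the stringr package is installed; the vector `x` is made-up sample data.

```r
library(stringr)

# Made-up sample data for illustration only.
x <- c("apples x4", "bag of flour", "milk x2 x3")

# First digit in each string; NA where nothing matches.
first <- str_extract(x, "\\d")

# Every digit in each string, as a list with one character vector per input.
all_list <- str_extract_all(x, "\\d")

# Same matches as a character matrix; shorter rows are padded with "".
all_mat <- str_extract_all(x, "\\d", simplify = TRUE)
```

Here `first` has three elements, `all_list` has an empty character vector in its second slot, and `all_mat` has one row per input string.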

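str_match() and str_match_all() also return the matched content, but go one step further and extract the groups within each match.
The title of the str_extract help page is "Extract matching patterns from a string".
The title of the str_match help page is "Extract matched groups from a string".
A group is a pattern, or part of a pattern, enclosed in parentheses. As covered earlier, parentheses in a regular expression make precedence explicit, and they also create capture groups for backreferences. A capture group stores the part of the string matched by the sub-expression inside its parentheses, so each pair of parentheses holds one captured piece, and \n (n = 1, 2, 3, ...) retrieves the content of the n-th pair. In an R string literal the backslash must be doubled, for example "\\1".
str_extract() returns the complete match only, whereas str_match() returns the complete match plus the content of each group. The return value of str_match() is a character matrix: the first column holds the complete match and each following column holds one group.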
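To make the idea of capture groups and backreferences concrete, here is a minimal base R sketch. The date string and the word pair are made-up examples.

```r
# Made-up sample data for illustration only.
x <- c("2021-09-05", "no date here")

# Three capture groups: year, month, day.
# In the replacement, \\1, \\2 and \\3 refer back to those groups.
swapped <- sub("([0-9]{4})-([0-9]{2})-([0-9]{2})", "\\3/\\2/\\1", x)

# A backreference can also appear inside the pattern itself:
# ([a-z])\\1 matches any lowercase letter followed by the same letter.
doubled <- grepl("([a-z])\\1", c("letter", "later"))
```

Strings that do not match the pattern pass through `sub()` unchanged.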

str_match(string, pattern)
str_match_all(string, pattern)

str_match_all() returns a list the same length as the input; each element is a character matrix with one row per match.
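A small sketch of the two return shapes, assuming stringr is installed. The strings are made-up sample data, and the pattern is the article-plus-word pattern used elsewhere in this post.

```r
library(stringr)

# Made-up sample data for illustration only.
x <- c("the cat sat", "a dog barked", "birds fly")
pattern <- "(a|the) ([^ ]+)"

# One row per input string: complete match, group 1, group 2.
# Strings without a match produce a row of NA.
m <- str_match(x, pattern)

# One list element per input string; each is a matrix with one row per match.
all_m <- str_match_all("the cat saw a dog", pattern)
```

The first row of `m` holds "the cat", "the", "cat", and `all_m[[1]]` has two rows because the string contains two matches.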

If the input is a data frame, tidyr::extract() provides functionality similar to str_match(). Note that tidyr::extract() requires names for the groups, supplied through the into argument.

extract(
  data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)

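# tibble() is from tibble, %>% is re-exported by tidyr,
# and `sentences` is a dataset shipped with stringr
library(tibble)
library(tidyr)
library(stringr)

tibble(sentence = sentences) %>% 
  tidyr::extract(
    sentence, c("article", "noun"), "(a|the) ([^ ]+)", 
    remove = FALSE
  )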

# A tibble: 720 x 3
   sentence                                    article noun   
   <chr>                                       <chr>   <chr>  
 1 The birch canoe slid on the smooth planks.  the     smooth 
 2 Glue the sheet to the dark blue background. the     sheet  
 3 It's easy to tell the depth of a well.      the     depth  
 4 These days a chicken leg is a rare dish.    a       chicken
 5 Rice is often served in round bowls.        <NA>    <NA>   
 6 The juice of lemons makes fine punch.       <NA>    <NA>   
 7 The box was thrown beside the parked truck. the     parked 
 8 The hogs were fed chopped corn and garbage. <NA>    <NA>   
 9 Four hours of steady work faced us.         <NA>    <NA>   
10 Large size in stockings is hard to sell.    <NA>    <NA>
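For a version that is easy to check by hand, here is a minimal sketch of tidyr::extract() on a tiny data frame. It assumes tidyr is installed; the data frame `df` and the column names `letter` and `number` are made up for illustration.

```r
library(tidyr)

# Made-up sample data for illustration only.
df <- data.frame(id = c("a-1", "b-22", "c"), stringsAsFactors = FALSE)

# Two groups in the regex, so `into` needs two names.
# convert = TRUE turns the digit column into integers.
# Rows that do not match the regex get NA in every new column.
out <- tidyr::extract(
  df, id,
  into = c("letter", "number"),
  regex = "([a-z]+)-([0-9]+)",
  convert = TRUE
)
```

Because remove defaults to TRUE, the original `id` column is dropped from `out`; pass remove = FALSE to keep it, as in the example above.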


Article link: https://www.haomeiwen.com/subject/mzifwltx.html
