R: A Few Notes on split


Author: 日月其除 | Published: 2021-07-14 14:46

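I have been doing a lot of file manipulation lately and keep reaching for a split function. There are three of them, though, and I sometimes mix them up, so I am writing this down for the record.
1. split()
2. str_split()
3. strsplit()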


  • split()
Usage
split(x, f, drop = FALSE, ...)
## Default S3 method:
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)
split(x, f, drop = FALSE, ...) <- value
unsplit(value, f, drop = FALSE)


Arguments
x   vector or data frame containing values to be divided into groups.

f   a ‘factor’ in the sense that as.factor(f) defines the grouping, or a list of such factors in which case their interaction is used for the grouping. If x is a data frame, f can also be a formula of the form ~ g to split by the variable g, or more generally of the form ~ g1 + ... + gk to split by the interaction of the variables g1, ..., gk, where these variables are evaluated in the data frame x using the usual non-standard evaluation rules.

drop     logical indicating if levels that do not occur should be dropped (if f is a factor or a list).

value   a list of vectors or data frames compatible with a splitting of x. Recycling applies if the lengths do not match.

sep character string, passed to interaction in the case where f is a list.

lex.order   logical, passed to interaction when f is a list.

... further potential arguments passed to methods.

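Summary: the call has the form split(vector / list / data frame, factor / list of factors).
split() groups a data frame or a vector and returns a list.
The result can be passed straight to unsplit() to reverse the split.
split() divides a vector or data frame according to a factor. It does not split strings on a delimiter, so a call like the following does not do what you might expect (the '_' is treated as a single grouping level):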

> split(c('1_1', '2-2', '3_3'), '_')
$`_`
[1] "1_1" "2-2" "3_3"

Splitting a data frame:

> data = data.frame(v1 = c(1,1,2,2,3,3), v2 = c('a', 'b', 'c', 'd','e','f'))
> data
  v1 v2
1  1  a
2  1  b
3  2  c
4  2  d
5  3  e
6  3  f
> split(data, data$v1) # returns a list, grouped by v1
$`1`
  v1 v2
1  1  a
2  1  b

$`2`
  v1 v2
3  2  c
4  2  d

$`3`
  v1 v2
5  3  e
6  3  f
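
To make the "returns a list" and unsplit() points concrete, here is a minimal sketch that reuses the same toy data frame. The variable names groups and restored are just illustrative choices of mine.

```r
# Same toy data frame as in the example above.
data <- data.frame(v1 = c(1, 1, 2, 2, 3, 3),
                   v2 = c("a", "b", "c", "d", "e", "f"))

# split() returns a named list with one data frame per level of v1.
groups <- split(data, data$v1)
names(groups)          # the group labels come from the factor levels
sapply(groups, nrow)   # number of rows in each group

# unsplit() puts the pieces back in the original row order,
# as long as it is given the same grouping vector.
restored <- unsplit(groups, data$v1)
identical(restored$v2, data$v2)
```

Each element of groups is an ordinary data frame, so groups[["2"]] or groups$`2` can be used like any other.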

Splitting a vector:

> x = c(rep(1:10, 2))
> f = gl(10,1)
> x
 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9
[20] 10
> f
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 2 3 4 5 6 7 8 9 10
> split(x,f)
$`1`
[1] 1 1

$`2`
[1] 2 2

$`3`
[1] 3 3

$`4`
[1] 4 4

$`5`
[1] 5 5

$`6`
[1] 6 6

$`7`
[1] 7 7

$`8`
[1] 8 8

$`9`
[1] 9 9

$`10`
[1] 10 10
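
The drop and sep arguments from the Usage section only matter when f is a factor with unused levels or a list of factors. A small sketch with made-up data (scores, sex and grade are hypothetical names, not from the examples above):

```r
scores <- c(90, 85, 70, 60, 75)
sex    <- factor(c("F", "M", "F", "M", "F"))
# Level "c" is declared but never occurs in the data.
grade  <- factor(c("a", "a", "b", "b", "a"), levels = c("a", "b", "c"))

# With a list of factors, split() groups by their interaction.
# By default every combination of levels gets an entry, even empty ones.
by_both <- split(scores, list(sex, grade))
names(by_both)
by_both[["F.a"]]

# drop = TRUE removes the empty combinations; sep changes the label separator.
by_both_dropped <- split(scores, list(sex, grade), drop = TRUE, sep = "_")
names(by_both_dropped)
```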

  • str_split()
    Comes from the R package stringr.
    There are two forms: str_split() and str_split_fixed().
    str_split() with simplify = TRUE gives the same result as str_split_fixed().
Usage
str_split(string, pattern, n = Inf, simplify = FALSE)
str_split_fixed(string, pattern, n)

Arguments
string  Input vector. Either a character vector, or something coercible to one.

pattern Pattern to look for.

The default interpretation is a regular expression, as described in stringi::stringi-search-regex. Control options with regex().

Match a fixed string (i.e. by comparing only bytes), using fixed(). This is fast, but approximate. Generally, for matching human text, you'll want coll() which respects character matching rules for the specified locale.

Match character, word, line and sentence boundaries with boundary(). An empty pattern, "", is equivalent to boundary("character").

n   number of pieces to return. Default (Inf) uses all possible split positions.

For str_split_fixed, if n is greater than the number of pieces, the result will be padded with empty strings.

simplify    If FALSE, the default, returns a list of character vectors. If TRUE returns a character matrix.
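
Since pattern is a regular expression by default, a literal dot has to be wrapped in fixed() or escaped. The sketch below assumes the stringr package is installed. The input strings are made up for illustration.

```r
library(stringr)  # assumption: stringr is installed

# fixed() makes the pattern a literal string instead of a regex.
str_split("a.b.c", fixed("."))[[1]]

# Escaping the dot in the regex works as well.
str_split("a.b.c", "\\.")[[1]]

# An empty pattern splits into single characters.
str_split("abc", "")[[1]]

# n caps the number of pieces; the remainder stays in the last piece.
str_split("a_b_c", "_", n = 2)[[1]]
```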

str_split() is mainly used to split a vector of strings. It returns a list.
str_split_fixed() returns a character matrix instead.
An example:

> str_split(c('1_2','1_1','2_2','3'), '_')
[[1]]
[1] "1" "2"

[[2]]
[1] "1" "1"

[[3]]
[1] "2" "2"

[[4]]
[1] "3"
> str_split_fixed(c('1_2','1_1','2_2','3'), pattern = '_', n =2)
     [,1] [,2]
[1,] "1"  "2" 
[2,] "1"  "1" 
[3,] "2"  "2" 
[4,] "3"  ""

> str_split(c('1_2','1_1','2_2','3'), pattern = '_', n =2, simplify = T)
     [,1] [,2]
[1,] "1"  "2"
[2,] "1"  "1"
[3,] "2"  "2"
[4,] "3"  ""
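
A common follow-up is pulling one piece out of every element. The sketch below shows how that differs between the list form and the matrix form, again assuming stringr is installed. The names ids, as_list, as_matrix and fixed_matrix are only illustrative.

```r
library(stringr)  # assumption: stringr is installed

ids <- c("1_2", "1_1", "2_2", "3")

as_list      <- str_split(ids, "_")                        # list of character vectors
as_matrix    <- str_split(ids, "_", n = 2, simplify = TRUE) # character matrix
fixed_matrix <- str_split_fixed(ids, "_", n = 2)            # character matrix

# The two matrix forms hold the same values.
all(as_matrix == fixed_matrix)

# First piece of every element, from the list ...
vapply(as_list, function(p) p[1], character(1))

# ... and from the matrix, where it is simply the first column.
as_matrix[, 1]
```

The matrix form is handy when every string has the same number of fields; the list form keeps ragged results without padding.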


  • strsplit()
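    Splits the elements of a character vector. Returns a list.
    split is treated as a regular expression by default, so splitting on . needs fixed = TRUE (or an escaped pattern such as "[.]"). The same goes for other regex metacharacters such as | or +; ordinary separators such as _ need no extra argument.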
Description
Split the elements of a character vector x into substrings according to the matches to substring split within them.

Usage
strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)
Arguments
x   character vector, each element of which is to be split. Other inputs, including a factor, will give an error.

split   character vector (or object which can be coerced to such) containing regular expression(s) (unless fixed = TRUE) to use for splitting. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is re-cycled along x.

fixed   logical. If TRUE match split exactly, otherwise use regular expressions. Has priority over perl.

perl    logical. Should Perl-compatible regexps be used?

useBytes    logical. If TRUE the matching is done byte-by-byte rather than character-by-character, and inputs with marked encodings are not converted. This is forced (with a warning) if any input is found which is marked as "bytes" (see Encoding).
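
A few of the argument rules above are easier to see in a short sketch (base R only; the inputs are made up):

```r
# An empty split breaks the string into single characters.
strsplit("abc", "")[[1]]

# A split vector longer than 1 is recycled along x:
# "-" is used for the first element, "_" for the second.
strsplit(c("a-b", "c_d"), c("-", "_"))

# Non-character input such as a factor is an error,
# so convert with as.character() first.
f <- factor("a b")
msg <- tryCatch(strsplit(f, " "), error = function(e) conditionMessage(e))
msg
strsplit(as.character(f), " ")[[1]]
```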

Examples:

> strsplit(c('1.2','1.1','2.2','3'), split = '.')
[[1]]
[1] "" "" ""

[[2]]
[1] "" "" ""

[[3]]
[1] "" "" ""

[[4]]
[1] ""

> strsplit(c('1.2','1.1','2.2','3'), split = '.', fixed = T)
[[1]]
[1] "1" "2"

[[2]]
[1] "1" "1"

[[3]]
[1] "2" "2"

[[4]]
[1] "3"

> strsplit("a.b.c", "[.]")
[[1]]
[1] "a" "b" "c"

> strsplit(c('1_2','1_1','2_2','3'), split = '_')
[[1]]
[1] "1" "2"

[[2]]
[1] "1" "1"

[[3]]
[1] "2" "2"

[[4]]
[1] "3"
## Note that final empty strings are not produced:
strsplit(paste(c("", "a", ""), collapse="#"), split="#")[[1]]
# [1] ""  "a"
## and also an empty string is only produced before a definite match:
strsplit("", " ")[[1]]    # character(0)
strsplit(" ", " ")[[1]]   # [1] ""
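
The dot is not the only separator that needs care. As a sketch with a made-up input, here is the same problem with |, which is also a regex metacharacter, together with three ways around it:

```r
recs <- c("a|b|c", "d|e|f")

# Unescaped, "|" is read as a regex and does not split on the literal bar.
strsplit(recs, "|")

# Three equivalent ways to split on the literal character.
by_fixed   <- strsplit(recs, "|", fixed = TRUE)
by_escape  <- strsplit(recs, "\\|")
by_bracket <- strsplit(recs, "[|]")
identical(by_fixed, by_escape) && identical(by_fixed, by_bracket)

# When every element yields the same number of pieces,
# the list can be stacked into a character matrix.
m <- do.call(rbind, by_fixed)
m
```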
