Cite: http://adv-r.had.co.nz/Data-structures.html
R's base data structures can be organised by their dimensionality (1d, 2d, or nd) and whether they're homogeneous (all contents must be of the same type) or heterogeneous (the contents can be of different types). This gives rise to the five data types most often used in data analysis:
Dimensionality | Homogeneous | Heterogeneous |
---|---|---|
1d (vector) | Atomic vector | List |
2d | Matrix | Data frame |
nd | Array | - |
Note that R has no 0-dimensional, or scalar, types. Individual numbers or strings, which you might think would be scalars, are actually vectors of length one.
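Given an object, the best way to understand what data structures it's composed of is to use str(). str() is short for structure, and it gives a compact, human-readable description of any R data structure.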
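As a small illustration of what str() reports, here is a sketch using two made-up objects (the names `x` and `person` are hypothetical and not part of the original text):

```r
# str() prints a compact summary: type, length, and the first few values.
x <- c(1.5, 2, 3)
str(x)
#>  num [1:3] 1.5 2 3

# For a list, str() describes each element in turn.
person <- list(name = "Ada", scores = c(90L, 85L))
str(person)
#> List of 2
#>  $ name  : chr "Ada"
#>  $ scores: int [1:2] 90 85
```

Note that str() is called for its printed output; it returns NULL invisibly.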
A matrix is simply a two-dimensional array. A plain vector, however, is not quite a one-dimensional array: it has no dim attribute, even though the two print similarly.
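A quick check of how plain vectors, one-dimensional arrays, and matrices relate can be sketched as follows. The variable names are chosen only for illustration; the point is that a plain vector carries no dim attribute, so is.array() reports FALSE for it.

```r
v  <- 1:6                         # plain atomic vector: no dim attribute
a1 <- array(1:6, dim = 6)         # one-dimensional array
m  <- matrix(1:6, nrow = 2)       # matrix, i.e. a two-dimensional array

is.array(v)
#> [1] FALSE
is.array(a1)
#> [1] TRUE
dim(v)
#> NULL
dim(a1)
#> [1] 6

# A matrix is an array, and a 2d array is a matrix.
is.array(m)
#> [1] TRUE
is.matrix(array(1:6, dim = c(2, 3)))
#> [1] TRUE
```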
Vector
The basic data structure in R is the vector. Vectors come in two flavours: atomic vectors and lists. They have three common properties:
- Type, typeof(), what it is.
- Length, length(), how many elements it contains.
- Attributes, attributes(), additional arbitrary metadata.
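The three properties can be inspected on any vector. The following is a minimal sketch with a hypothetical named double vector `x` and an unnamed list `y`:

```r
# A named atomic vector: the names are stored as an attribute.
x <- c(a = 1.5, b = 2.5, c = 3.5)
typeof(x)
#> [1] "double"
length(x)
#> [1] 3
attributes(x)
#> $names
#> [1] "a" "b" "c"

# A list is also a vector, so the same three functions apply.
y <- list(1, "a")
typeof(y)
#> [1] "list"
length(y)
#> [1] 2
attributes(y)
#> NULL
```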
Atomic vector
There are four common types of atomic vectors: logical, integer, double (often called numeric), and character. There are two rare types that I will not discuss further: complex and raw. Atomic vectors are usually created with c(), short for combine.
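For instance, logical and character vectors can be built with c() in the same way as numeric ones. This is only a sketch; the names `lgl_var` and `chr_var` are illustrative:

```r
# Logical vector: TRUE / FALSE values
lgl_var <- c(TRUE, FALSE, TRUE)
typeof(lgl_var)
#> [1] "logical"

# Character vector: each element is a string
chr_var <- c("these are", "some strings")
typeof(chr_var)
#> [1] "character"

# The L suffix gives an integer rather than a double
typeof(1L)
#> [1] "integer"
typeof(1)
#> [1] "double"
```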
Atomic vectors are always flat, even if you nest c()’s:
c(1, c(2, c(3, 4)))
#> [1] 1 2 3 4
# the same as
c(1, 2, 3, 4)
#> [1] 1 2 3 4
Given a vector, you can determine its type with typeof(), or check if it's a specific type with an "is" function:
is.character()
is.double()
is.integer()
is.logical()
# or, more generally
is.atomic()
# examples
int_var <- c(1L, 6L, 10L)
typeof(int_var)
#> [1] "integer"
is.integer(int_var)
#> [1] TRUE
is.atomic(int_var)
#> [1] TRUE
dbl_var <- c(1, 2.5, 4.5)
typeof(dbl_var)
#> [1] "double"
is.double(dbl_var)
#> [1] TRUE
is.atomic(dbl_var)
#> [1] TRUE
For plain atomic vectors, is.numeric() behaves like is.integer() || is.double(): it returns TRUE for both integer and double vectors:
is.numeric(int_var)
#> [1] TRUE
is.numeric(dbl_var)
#> [1] TRUE
List
You construct lists by using list() instead of c():
x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
str(x)
#> List of 4
#> $ : int [1:3] 1 2 3
#> $ : chr "a"
#> $ : logi [1:3] TRUE FALSE TRUE
#> $ : num [1:2] 2.3 5.9
Lists are sometimes called recursive vectors, because a list can contain other lists:
x <- list(list(list(list())))
str(x)
#> List of 1
#> $ :List of 1
#> ..$ :List of 1
#> .. ..$ : list()
is.recursive(x)
#> [1] TRUE
c() will combine several lists into one. If given a combination of atomic vectors and lists, c() will coerce the vectors to lists before combining them. Compare the results of list() and c():
x <- list(list(1, 2), c(3, 4))
y <- c(list(1, 2), c(3, 4))
str(x)
#> List of 2
#> $ :List of 2
#> ..$ : num 1
#> ..$ : num 2
#> $ : num [1:2] 3 4
str(y)
#> List of 4
#> $ : num 1
#> $ : num 2
#> $ : num 3
#> $ : num 4
You can turn a list into an atomic vector with unlist(). If the elements of a list have different types, unlist() uses the same coercion rules as c().
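A brief sketch of unlist() in action, using two made-up lists (`x` and `y` are hypothetical names): elements of different types are coerced to the most flexible type present, just as with c().

```r
# Integer and double elements: the result is double
x <- list(1:3, c(2.5, 4))
unlist(x)
#> [1] 1.0 2.0 3.0 2.5 4.0
typeof(unlist(x))
#> [1] "double"

# Mixing in a string coerces everything to character
y <- list(1, "a", TRUE)
unlist(y)
#> [1] "1"    "a"    "TRUE"

# The same result as combining the elements with c()
identical(unlist(y), c(1, "a", TRUE))
#> [1] TRUE
```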
Lists are used to build up many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by lm()) are lists:
is.list(mtcars)
#> [1] TRUE
mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)
#> [1] TRUE
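Because these objects are lists underneath, the usual list tools work on them. The sketch below assumes the built-in mtcars dataset and refits the same model so that it runs on its own:

```r
# A data frame is a list of equal-length columns
typeof(mtcars)
#> [1] "list"
length(mtcars)            # number of columns
#> [1] 11
mtcars[["mpg"]][1:3]      # extract a column like any list element
#> [1] 21.0 21.0 22.8

# An lm object is a list of named components
mod <- lm(mpg ~ wt, data = mtcars)
typeof(mod)
#> [1] "list"
names(mod)[1]
#> [1] "coefficients"
names(mod[["coefficients"]])
#> [1] "(Intercept)" "wt"
```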