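Focus: data frames
1. Sources of data frames
(1) Created from scratch in R
(2) Converted or derived from existing data
(3) Read from a file
(4) Built-in datasets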
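Sources (2) and (4) are easy to overlook, so here is a minimal sketch of both. The object names `m`, `df_from_matrix` and `df_builtin` are illustrative and do not come from the original notes.

```r
# (2) Convert an existing matrix into a data frame.
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
df_from_matrix <- as.data.frame(m)

# (4) Built-in datasets such as iris are available without loading anything.
df_builtin <- iris

class(df_from_matrix)  # "data.frame"
dim(df_builtin)        # 150 rows, 5 columns
```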
2. Creating and reading data frames
> options(stringsAsFactors = FALSE)
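# With this option, strings encountered while reading data are kept as character vectors rather than converted to factors. Since R 4.0.0 this is already the default.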
> df <- data.frame(gene = c("gene1","gene2","gene3"),
+ sam = c("sample1","sample2","sample3"),
+ exp = c(32,34,45))
> df
gene sam exp
1 gene1 sample1 32
2 gene2 sample2 34
3 gene3 sample3 45
> df <- data.frame(gene = paste0("gene",1:3),
+ sam = paste0("sample",1:3),
+ exp = c(32,34,45))
> df
gene sam exp
1 gene1 sample1 32
2 gene2 sample2 34
3 gene3 sample3 45
> df2 <- read.csv("gene.csv")
> df2
gene sam exp
1 gene1 sample1 32
2 gene2 sample2 34
3 gene3 sample3 45
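The file `gene.csv` is not included with these notes, so the following is a sketch of a write/read round trip using a temporary file instead. The variable `csv_path` is an assumption made for illustration.

```r
df <- data.frame(gene = paste0("gene", 1:3),
                 sam  = paste0("sample", 1:3),
                 exp  = c(32, 34, 45))

# Write to a temporary file; row.names = FALSE avoids an extra index column.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Read it back; stringsAsFactors = FALSE keeps text columns as character
# regardless of the R version or global options.
df2 <- read.csv(csv_path, stringsAsFactors = FALSE)
str(df2)
```

Passing `stringsAsFactors` explicitly makes the script behave the same on old and new R versions without touching global options.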
3. Data frame attributes
> # 1. Dimensions: dim() returns (rows, columns); nrow() and ncol() return each one separately
> dim(df)
[1] 3 3
> nrow(df)
[1] 3
> ncol(df)
[1] 3
> # row names and column names
> rownames(df)
[1] "1" "2" "3"
> colnames(df)
[1] "gene" "sam" "exp"
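As a small sketch of how these attribute functions relate to each other, the example below rebuilds the same `df` and checks that `dim()` is just `nrow()` and `ncol()` combined. The variable `dn` is introduced here for illustration only.

```r
df <- data.frame(gene = paste0("gene", 1:3),
                 sam  = paste0("sample", 1:3),
                 exp  = c(32, 34, 45))

# dim() is simply c(nrow, ncol).
identical(dim(df), c(nrow(df), ncol(df)))   # TRUE

# dimnames() returns row names and column names together as a list.
dn <- dimnames(df)
dn[[1]]   # same as rownames(df)
dn[[2]]   # same as colnames(df)
```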
[Day1-R 01-get_start](https://www.jianshu.com/p/b188a9713b0e)
4. Subsetting a data frame; each dimension is indexed by a vector
> df[2,2]
[1] "sample2"
> df[2,]
gene sam exp
2 gene2 sample2 34
> df[,2]
[1] "sample1" "sample2" "sample3"
> df[c(1,3),1:2] # each dimension is indexed by a vector; rows 1 and 3 are not adjacent, so use c(1,3) rather than a range
gene sam
1 gene1 sample1
3 gene3 sample3
> df[,"gene"]
[1] "gene1" "gene2" "gene3"
> df[,c('gene','exp')]
gene exp
1 gene1 32
2 gene2 34
3 gene3 45
> df$exp # delete "exp" and press Tab to see the column-name completions
[1] 32 34 45
> mean(df$exp)
[1] 37
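Beyond positions and names, rows can also be selected with a logical vector. The sketch below shows that, plus two details that often surprise beginners: `drop = FALSE` and `[[ ]]`. The names `high`, `as_vector`, `as_frame` and `col` are illustrative.

```r
df <- data.frame(gene = paste0("gene", 1:3),
                 sam  = paste0("sample", 1:3),
                 exp  = c(32, 34, 45),
                 stringsAsFactors = FALSE)

# Logical subsetting: keep rows whose expression exceeds 33.
high <- df[df$exp > 33, ]

# A single column is simplified to a vector by default;
# drop = FALSE keeps the data frame structure.
as_vector <- df[, "gene"]
as_frame  <- df[, "gene", drop = FALSE]

# [[ ]] extracts one column like $, but also works with a variable.
col <- "exp"
mean(df[[col]])   # 37
```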
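5. Modifying a data frame
> # change a single cell
> df[3,3] <- 5
> # change an entire column
> df$exp <- c(12,23,50)
> # assigning to a column that does not exist yet adds it as a new column
> df$abc <- c(23,15,37)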
> df
gene sam exp abc
1 gene1 sample1 12 23
2 gene2 sample2 23 15
3 gene3 sample3 50 37
> # change row names and column names
> rownames(df) <- c("r1","r2","r3")
> # rename a single row or column
> rownames(df)[2] <- "x"
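The transcript renames a row; the same pattern works for columns. The sketch below also shows two related modifications: removing a column and changing only the cells that meet a condition. The new column name `expr` and the cap of 40 are arbitrary choices for illustration.

```r
df <- data.frame(gene = paste0("gene", 1:3),
                 sam  = paste0("sample", 1:3),
                 exp  = c(12, 23, 50),
                 abc  = c(23, 15, 37))

# Rename a single column by position.
colnames(df)[3] <- "expr"

# Remove a column by assigning NULL to it.
df$abc <- NULL

# Change only the cells that satisfy a condition.
df$expr[df$expr > 40] <- 40
df
```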
6. Advanced data frame operations
(1) Transpose
> t(df)
r1 x r3
gene "gene1" "gene2" "gene3"
sam "sample1" "sample2" "sample3"
exp "12" "23" "50"
abc "23" "15" "37"
> class(t(df))
[1] "matrix"
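Note that the transposed numbers above are quoted: `t()` first converts the data frame to a matrix, and a matrix holds a single type, so mixed columns become character. On R 4.0.0 and later, `class()` of a matrix also reports `"matrix" "array"`. A sketch of keeping the values numeric by transposing only the numeric columns (the names `t_all` and `t_num` are illustrative):

```r
df <- data.frame(gene = paste0("gene", 1:3),
                 sam  = paste0("sample", 1:3),
                 exp  = c(12, 23, 50),
                 abc  = c(23, 15, 37),
                 stringsAsFactors = FALSE)

t_all <- t(df)                      # character matrix: numbers become strings
t_num <- t(df[, c("exp", "abc")])   # numeric matrix
colnames(t_num) <- df$gene          # label columns with gene names

is.character(t_all)     # TRUE
is.numeric(t_num)       # TRUE
t_num["exp", "gene2"]   # 23
```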
(2) Creating a data frame that contains NA values
> df<-data.frame(X1 = LETTERS[1:5],X2 = 1:5)
> df[2,2] <- NA
> df[4,1] <- NA
> df
X1 X2
1 A 1
2 B NA
3 C 3
4 <NA> 4
5 E 5
> na.omit(df)
X1 X2
1 A 1
3 C 3
5 E 5
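`na.omit()` drops every row that has at least one NA. Before dropping rows it is often useful to see where the NAs are. The sketch below uses base R functions only.

```r
df <- data.frame(X1 = LETTERS[1:5], X2 = 1:5, stringsAsFactors = FALSE)
df[2, 2] <- NA
df[4, 1] <- NA

colSums(is.na(df))          # number of NAs per column: X1 = 1, X2 = 1
complete.cases(df)          # TRUE FALSE TRUE FALSE TRUE
df[complete.cases(df), ]    # same rows as na.omit(df)

# Many summary functions can skip NAs instead of requiring row removal.
mean(df$X2, na.rm = TRUE)   # 3.25
```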
(3) Joining two tables
> test1 <- data.frame(x = c('b','e','f'),
+ z = c("A","B","C"),
+ stringsAsFactors = FALSE)
> test1
x z
1 b A
2 e B
3 f C
> test2 <- data.frame(x = c('a','b','c','d','e','f'),
+ y = c(1,2,3,4,5,6),
+ stringsAsFactors = FALSE)
> test2
x y
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
> merge(test1,test2,by="x")
x z y
1 b A 2
2 e B 5
3 f C 6
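By default `merge()` keeps only keys present in both tables (an inner join). A sketch of two common variations follows: keeping all rows of one table, and joining on key columns with different names. The table `test3` and its columns `id` and `w` are hypothetical.

```r
test1 <- data.frame(x = c("b", "e", "f"), z = c("A", "B", "C"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"), y = 1:6,
                    stringsAsFactors = FALSE)

inner <- merge(test1, test2, by = "x")                 # 3 rows
left  <- merge(test2, test1, by = "x", all.x = TRUE)   # 6 rows, z is NA where unmatched

# When the key columns have different names, use by.x and by.y.
test3 <- data.frame(id = c("b", "f"), w = c(TRUE, FALSE),
                    stringsAsFactors = FALSE)
m3 <- merge(test1, test3, by.x = "x", by.y = "id")
m3
```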
(4) For a data frame with many rows, view only the first or last few rows
> iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
......
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> head(iris,3)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
> tail(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
(5) For a data frame with many rows and columns, view only the first few rows and columns
> iris[1:3,1:3]
Sepal.Length Sepal.Width Petal.Length
1 5.1 3.5 1.4
2 4.9 3.0 1.4
3 4.7 3.2 1.3
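Indexing with `1:3` fails if the table has fewer than three rows or columns. One possible way around that is a small helper; `peek()` below is a hypothetical function written for this note, not part of base R.

```r
# Return the top-left corner of a table, at most n rows and n columns.
peek <- function(x, n = 3) {
  rows <- seq_len(min(n, nrow(x)))
  cols <- seq_len(min(n, ncol(x)))
  x[rows, cols, drop = FALSE]   # drop = FALSE keeps a data frame even for one column
}

peek(iris)              # 3 rows, 3 columns
peek(iris[, 1:2], 5)    # 5 rows, but only 2 columns exist
tail(iris, 3)           # tail() also accepts a row count
```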
(6) Inspect the data type and contents of each column
> str(df)
'data.frame': 5 obs. of 2 variables:
$ X1: chr "A" "B" "C" NA ...
$ X2: int 1 NA 3 4 5
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
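`str()` prints its report for reading; when the column types are needed as data, `sapply()` can collect them. A brief sketch, with `col_types` and `numeric_cols` as illustrative names:

```r
# Class of every column as a named character vector.
col_types <- sapply(iris, class)
col_types

# Use the type information to keep only the numeric columns.
numeric_cols <- iris[, sapply(iris, is.numeric)]
summary(numeric_cols)

# A factor column stores its distinct values as levels.
levels(iris$Species)
```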
Matrices and lists
> m <- matrix(1:9, nrow = 3)
> colnames(m) <- c("a","b","c") # column names
> m
a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> # an entire row
> m[2,]
a b c
2 5 8
> # an entire column
> m[,1]
[1] 1 2 3
> # a single cell
> m[2,3]
c
8
> # several cells
> m[2:3,1:2]
a b
[1,] 2 5
[2,] 3 6
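Selecting a single row or column from a matrix returns a plain vector, which can break code that expects two dimensions. A sketch of keeping the matrix shape, with row names `r1` to `r3` added here purely for illustration:

```r
m <- matrix(1:9, nrow = 3)
colnames(m) <- c("a", "b", "c")
rownames(m) <- c("r1", "r2", "r3")   # hypothetical row names

row_vec <- m[2, ]                 # named vector, dimensions dropped
row_mat <- m[2, , drop = FALSE]   # still a 1 x 3 matrix

dim(row_vec)   # NULL
dim(row_mat)   # 1 3
m["r2", "c"]   # 8, indexing by name
```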
> # lists
> l <- list(m=matrix(1:9, nrow = 3),
+ df=data.frame(gene = paste0("gene",1:3),
+ sam = paste0("sample",1:3),
+ exp = c(32,34,45)),
+ x=c(1,3,5))
> l
$m
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
$df
gene sam exp
1 gene1 sample1 32
2 gene2 sample2 34
3 gene3 sample3 45
$x
[1] 1 3 5
> l[[2]]
gene sam exp
1 gene1 sample1 32
2 gene2 sample2 34
3 gene3 sample3 45
> l$df
gene sam exp
1 gene1 sample1 32
2 gene2 sample2 34
3 gene3 sample3 45
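Single brackets and double brackets behave differently on a list: `[ ]` returns a smaller list, while `[[ ]]` and `$` return the element itself. A sketch with a shortened list (the names `sub_list` and `element` are illustrative):

```r
l <- list(m = matrix(1:9, nrow = 3),
          x = c(1, 3, 5))

sub_list <- l["x"]     # still a list, of length 1
element  <- l[["x"]]   # the numeric vector itself

class(sub_list)   # "list"
class(element)    # "numeric"

# Extraction can be chained with further subsetting.
l$x[2]       # 3
l$m[2, 3]    # 8

# Assigning to a new name adds an element.
l$y <- letters[1:2]
names(l)     # "m" "x" "y"
```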
Supplement: element names
# (1) Vectors
> x=1:10
> names(x)=letters[1:10]
> x
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
> x["a"]
a
1
# (2) Data frames
> df
X1 X2
1 A 1
2 B NA
3 C 3
4 <NA> 4
5 E 5
> names(df)
[1] "X1" "X2"
> df[,"X1"]
[1] "A" "B" "C" NA "E"
# (3) Lists
> names(l)
[1] "m" "df" "x"
> l[["df"]]
gene sam exp
1 gene1 sample1 32
2 gene2 sample2 34
3 gene3 sample3 45
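Names also make it possible to select several elements at once or to recover which elements meet a condition. A short sketch, where the vector `y` and its gene labels are made up for illustration:

```r
x <- 1:10
names(x) <- letters[1:10]
x[c("a", "j")]        # elements named a and j: 1 and 10

# setNames() creates a named vector in one step.
y <- setNames(c(32, 34, 45), paste0("gene", 1:3))
y["gene2"]            # 34

# Names of the elements that satisfy a condition.
names(y)[y > 33]      # "gene2" "gene3"
```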
> # remove one object
> rm(l)
> # remove several objects
> rm(df,m)
> # remove every object in the workspace
> rm(list = ls())
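`rm(list = ls())` clears the whole workspace, which is hard to undo. To experiment safely, the same calls can be tried inside a separate environment. In the sketch below, `e` and the objects `a`, `b`, `c` are hypothetical.

```r
# A scratch environment, so nothing in the real workspace is touched.
e <- new.env()
assign("a", 1, envir = e)
assign("b", 2, envir = e)
assign("c", 3, envir = e)

# Remove one object and confirm it is gone.
rm("a", envir = e)
a_exists <- exists("a", envir = e, inherits = FALSE)   # FALSE

# Remove objects named in a character vector.
rm(list = "b", envir = e)
remaining <- ls(e)                                     # "c"

# Remove everything left in that environment.
rm(list = ls(e), envir = e)
length(ls(e))                                          # 0
```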