Day1-R 02-data_structure


Author: 养猪场小老板 | Published: 2020-01-16 23:47

Key topic: data frames

1. Where data frames come from

(1) Created directly in R
(2) Converted or derived from existing data
(3) Read from a file
(4) Built-in datasets

2. Creating and reading data frames

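> options(stringsAsFactors = FALSE)
> # Keep strings as character vectors instead of converting them to factors,
> # both in data.frame() and when reading files (already the default since R 4.0.0)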
> df <- data.frame(gene = c("gene1","gene2","gene3"),
+                  sam  = c("sample1","sample2","sample3"),
+                  exp  = c(32,34,45))
> df
   gene     sam exp
1 gene1 sample1  32
2 gene2 sample2  34
3 gene3 sample3  45
> df <- data.frame(gene  = paste0("gene",1:3),
+                  sam   = paste0("sample",1:3),
+                  exp   = c(32,34,45))
> df
   gene     sam exp
1 gene1 sample1  32
2 gene2 sample2  34
3 gene3 sample3  45
> df2 <- read.csv("gene.csv")
> df2
   gene     sam exp
1 gene1 sample1  32
2 gene2 sample2  34
3 gene3 sample3  45
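The read.csv() call above assumes a file named gene.csv already sits in the working directory. A minimal round-trip sketch, assuming you can write to a temporary file (tempfile() is used so nothing in the working directory is touched): write the data frame out with write.csv() and read it back.

```r
# Build the same example data frame as above
df <- data.frame(gene = paste0("gene", 1:3),
                 sam  = paste0("sample", 1:3),
                 exp  = c(32, 34, 45),
                 stringsAsFactors = FALSE)

# Write to a temporary CSV; row.names = FALSE avoids an extra unnamed index column
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Read it back, keeping strings as character vectors
df2 <- read.csv(path, stringsAsFactors = FALSE)
str(df2)
```

Note that read.csv() guesses column types from the text, so exp comes back as integer rather than double; the values themselves are unchanged.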

3. Data frame attributes

> # 1. Dimensions: dim() gives (rows, columns); nrow() and ncol() give each separately
> dim(df)
[1] 3 3
> nrow(df)
[1] 3
> ncol(df)
[1] 3
> # Row names and column names
> rownames(df)
[1] "1" "2" "3"
> colnames(df)
[1] "gene" "sam"  "exp" 
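dim() is just the pair (nrow, ncol), and the row and column names together form the data frame's dimnames. A small check, rebuilding the df from section 2:

```r
df <- data.frame(gene = paste0("gene", 1:3),
                 sam  = paste0("sample", 1:3),
                 exp  = c(32, 34, 45))
# dim() returns c(number of rows, number of columns)
d <- dim(df)
print(d)
# dimnames() bundles rownames() and colnames() into a list of two
dn <- dimnames(df)
print(dn)
```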

4. Subsetting data frames: each dimension is indexed with a vector

> df[2,2]
[1] "sample2"
> df[2,]
   gene     sam exp
2 gene2 sample2  34
> df[,2]
[1] "sample1" "sample2" "sample3"
> df[c(1,3),1:2]  # each dimension takes a vector; rows 1 and 3 are not adjacent, so use c(1,3) rather than a range
   gene     sam
1 gene1 sample1
3 gene3 sample3
> df[,"gene"]
[1] "gene1" "gene2" "gene3"
> df[,c('gene','exp')]
   gene exp
1 gene1  32
2 gene2  34
3 gene3  45
> df$exp  # try deleting "exp" after $ and pressing Tab to see column-name completion
[1] 32 34 45
> mean(df$exp)
[1] 37
> # Change a single cell
> df[3,3]<- 5
> # Replace a whole column
> df$exp<-c(12,23,50)
> # Assigning to a column name that does not exist yet adds a new column
> df$abc <-c(23,15,37) 
> df
   gene     sam exp abc
1 gene1 sample1  12  23
2 gene2 sample2  23  15
3 gene3 sample3  50  37
> # Change row names (colnames() works the same way for column names)
> rownames(df) <- c("r1","r2","r3")
> # Change the name of a single row/column only
> rownames(df)[2] <- "x"
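Column names can be changed the same way as row names. A sketch that renames one column by position and another by matching its current name; the new names "expr" and "score" are only illustrative:

```r
df <- data.frame(gene = paste0("gene", 1:3),
                 exp  = c(12, 23, 50),
                 abc  = c(23, 15, 37))
# By position: rename the 2nd column
colnames(df)[2] <- "expr"
# By current name: find the column called "abc" and rename it
colnames(df)[colnames(df) == "abc"] <- "score"
print(df)
```

Matching by name is more robust than a hard-coded index if the column order later changes.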

5. Advanced data frame operations

(1) Transposing

> t(df)
     r1        x         r3       
gene "gene1"   "gene2"   "gene3"  
sam  "sample1" "sample2" "sample3"
exp  "12"      "23"      "50"     
abc  "23"      "15"      "37"     
> class(t(df))  # t() returns a matrix; R >= 4.0.0 prints "matrix" "array"
[1] "matrix"
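Because a matrix holds a single type, transposing a data frame that contains any character column turns every value into a string (note the quoted "12" above). A sketch contrasting a mixed data frame with an all-numeric one:

```r
mixed        <- data.frame(gene = c("gene1", "gene2"), exp = c(12, 23))
numeric_only <- data.frame(exp = c(12, 23), abc = c(23, 15))

tm <- t(mixed)         # character matrix: numbers become strings
tn <- t(numeric_only)  # numeric matrix: values stay numbers
print(tm)
print(tn)
# To get a data frame back, wrap the result in as.data.frame()
back <- as.data.frame(tn)
print(back)
```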

(2) Creating a data frame with NA values

> df<-data.frame(X1 = LETTERS[1:5],X2 = 1:5)
> df[2,2] <- NA
> df[4,1] <- NA
> df
    X1 X2
1    A  1
2    B NA
3    C  3
4 <NA>  4
5    E  5
> na.omit(df)
  X1 X2
1  A  1
3  C  3
5  E  5
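na.omit() drops every row that contains at least one NA. Before dropping rows it can help to see where the NAs are; a sketch using base R's is.na() and complete.cases() on the same df:

```r
df <- data.frame(X1 = LETTERS[1:5], X2 = 1:5)
df[2, 2] <- NA
df[4, 1] <- NA

print(colSums(is.na(df)))   # number of NAs in each column
print(complete.cases(df))   # TRUE for rows without any NA
kept <- df[complete.cases(df), ]   # same rows as na.omit(df)
print(kept)
```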

(3) Joining two tables

> test1 <- data.frame(x = c('b','e','f'), 
+                     z = c("A","B","C"),
+                     stringsAsFactors = FALSE)
> test1
  x z
1 b A
2 e B
3 f C
> test2 <- data.frame(x = c('a','b','c','d','e','f'), 
+                     y = c(1,2,3,4,5,6),
+                     stringsAsFactors = FALSE)
> test2 
  x y
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
> merge(test1,test2,by="x")
  x z y
1 b A 2
2 e B 5
3 f C 6
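By default merge() keeps only keys present in both tables (an inner join), which is why only b, e and f appear. Setting all.x = TRUE keeps every row of the first table and fills unmatched columns with NA, i.e. a left join. A sketch with the same test1 and test2:

```r
test1 <- data.frame(x = c("b", "e", "f"), z = c("A", "B", "C"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"), y = 1:6,
                    stringsAsFactors = FALSE)
# Keep all six rows of test2; z is NA where test1 has no matching x
left <- merge(test2, test1, by = "x", all.x = TRUE)
print(left)
```

all.y = TRUE keeps every row of the second table instead, and all = TRUE keeps rows from both (an outer join).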

(4) For data frames with many rows, view only the first or last few rows

> iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
6            5.4         3.9          1.7         0.4     setosa
7            4.6         3.4          1.4         0.3     setosa
8            5.0         3.4          1.5         0.2     setosa
9            4.4         2.9          1.4         0.2     setosa
10           4.9         3.1          1.5         0.1     setosa
......
145          6.7         3.3          5.7         2.5  virginica
146          6.7         3.0          5.2         2.3  virginica
147          6.3         2.5          5.0         1.9  virginica
148          6.5         3.0          5.2         2.0  virginica
149          6.2         3.4          5.4         2.3  virginica
150          5.9         3.0          5.1         1.8  virginica
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> head(iris,3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
> tail(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
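head() and tail() both take an n argument; a negative n means "all except" that many rows from the other end. A sketch on iris (150 rows):

```r
first3  <- head(iris, 3)      # rows 1-3
last2   <- tail(iris, 2)      # rows 149-150, original row names kept
all_but <- head(iris, -145)   # everything except the last 145 rows, i.e. rows 1-5
print(nrow(all_but))
```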

(5) For data frames with many rows and columns, view the first few rows and columns

> iris[1:3,1:3]
  Sepal.Length Sepal.Width Petal.Length
1          5.1         3.5          1.4
2          4.9         3.0          1.4
3          4.7         3.2          1.3
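When the column index selects a single column, [ ] returns a plain vector instead of a data frame (as with df[,2] in section 4). Passing drop = FALSE keeps the data frame shape:

```r
v <- iris[1:3, 1]                 # numeric vector
d <- iris[1:3, 1, drop = FALSE]   # 3 x 1 data frame
str(v)
str(d)
```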

(6) Inspect the data type and contents of each column

> str(df)
'data.frame':   5 obs. of  2 variables:
 $ X1: chr  "A" "B" "C" NA ...
 $ X2: int  1 NA 3 4 5
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
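str() prints types and sample values together. To get only one type per column, a sketch using sapply() to apply class() to every column:

```r
col_types <- sapply(iris, class)
print(col_types)
```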
Matrices and lists

> m <- matrix(1:9, nrow = 3)
> colnames(m) <- c("a","b","c")  # column names
> m
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> # A whole row
> m[2,]
a b c 
2 5 8 
> # A whole column
> m[,1]
[1] 1 2 3
> # A single cell
> m[2,3]
c 
8 
> # Several cells
> m[2:3,1:2]
     a b
[1,] 2 5
[2,] 3 6
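Because m has column names, columns can also be selected by name instead of by position:

```r
m <- matrix(1:9, nrow = 3)
colnames(m) <- c("a", "b", "c")
col_b <- m[, "b"]               # same as m[, 2]
sub   <- m[2:3, c("a", "b")]    # same as m[2:3, 1:2]
print(col_b)
print(sub)
```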
> # Lists
> l <- list(m=matrix(1:9, nrow = 3),
+           df=data.frame(gene  = paste0("gene",1:3),
+                         sam   = paste0("sample",1:3),
+                         exp   = c(32,34,45)),
+           x=c(1,3,5))
> l
$m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

$df
   gene     sam exp
1 gene1 sample1  32
2 gene2 sample2  34
3 gene3 sample3  45

$x
[1] 1 3 5

> l[[2]]
   gene     sam exp
1 gene1 sample1  32
2 gene2 sample2  34
3 gene3 sample3  45
> l$df
   gene     sam exp
1 gene1 sample1  32
2 gene2 sample2  34
3 gene3 sample3  45
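l[[2]] and l$df both return the data frame itself. Single brackets behave differently: l[2] returns a list of length one that contains the data frame. A sketch:

```r
l <- list(m  = matrix(1:9, nrow = 3),
          df = data.frame(gene = paste0("gene", 1:3), exp = c(32, 34, 45)),
          x  = c(1, 3, 5))
inner <- l[[2]]   # the data frame itself
outer <- l[2]     # a list of length 1 holding the data frame
print(class(inner))
print(class(outer))
```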

Supplement: element names

> # (1) Vectors
> x <- 1:10
> names(x) <- letters[1:10]
> x
 a  b  c  d  e  f  g  h  i  j 
 1  2  3  4  5  6  7  8  9 10 
> x["a"]
a 
1 
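Names can also pick several elements at once, returned in the order the names are given:

```r
x <- 1:10
names(x) <- letters[1:10]
picked <- x[c("c", "a")]   # elements named "c" and "a", in that order
print(picked)
```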
> # (2) Data frames
> df
    X1 X2
1    A  1
2    B NA
3    C  3
4 <NA>  4
5    E  5
> names(df)
[1] "X1" "X2"
> df[,"X1"]
[1] "A" "B" "C" NA  "E"
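For a data frame, names() returns the same vector as colnames(). Besides df[,"X1"], a column can be extracted by name with [[ ]] or $; all three return the same vector. A sketch rebuilding the df above:

```r
df <- data.frame(X1 = c("A", "B", "C", NA, "E"), X2 = c(1L, NA, 3L, 4L, 5L),
                 stringsAsFactors = FALSE)
by_bracket <- df[, "X1"]   # matrix-style indexing, as above
by_double  <- df[["X1"]]   # list-style indexing by name
by_dollar  <- df$X1        # shorthand for df[["X1"]]
print(by_double)
```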
> # (3) Lists
> names(l)
[1] "m"  "df" "x" 
> l[["df"]]
   gene     sam exp
1 gene1 sample1  32
2 gene2 sample2  34
3 gene3 sample3  45
> # Remove one object
> rm(l)
> # Remove several objects
> rm(df,m)
> # Remove all objects in the workspace
> rm(list = ls()) 


Article link: https://www.haomeiwen.com/subject/opwqzctx.html