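Data frames
A data frame is a tabular data structure: a rectangular array of data in which rows represent observations and columns represent variables.
A data frame is actually a list whose elements are vectors, and those vectors form the columns of the data frame. Because every column has the same length, the structure is rectangular, but it is not a matrix: a matrix must hold a single data type, whereas in a data frame each column must be of a single type while different columns may have different types. The columns of a data frame must also be named.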
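The list-of-columns structure can be checked directly. The following is a minimal sketch; the data frame `df` and its column names and values are made up for illustration.

```r
# A small data frame with one character, one numeric, and one logical column.
# The column names and values are hypothetical.
df <- data.frame(
  name  = c("a", "b", "c"),
  score = c(90.5, 82, 77.25),
  pass  = c(TRUE, TRUE, FALSE)
)

is.list(df)        # a data frame is a list of columns
sapply(df, class)  # each column carries its own type
lengths(df)        # every column has the same length

# Coercing to a matrix forces a single common type for all cells.
m <- as.matrix(df)
typeof(m)          # "character"
```

The last two lines show the difference from a matrix: once the columns are forced into one matrix, the numeric and logical values are stored as strings.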
- 1. Creating a data frame (data.frame())
> state <- data.frame(state.name, state.abb, state.region, state.x77)
> state
state.name state.abb state.region Population Income Illiteracy
Alabama Alabama AL South 3615 3624 2.1
Alaska Alaska AK West 365 6315 1.5
Arizona Arizona AZ West 2212 4530 1.8
Arkansas Arkansas AR South 2110 3378 1.9
California California CA West 21198 5114 1.1
Colorado Colorado CO West 2541 4884 0.7
Connecticut Connecticut CT Northeast 3100 5348 1.1
Delaware Delaware DE South 579 4809 0.9
Florida Florida FL South 8277 4815 1.3
Georgia Georgia GA South 4931 4091 2.0
Hawaii Hawaii HI West 868 4963 1.9
Idaho Idaho ID West 813 4119 0.6
Illinois Illinois IL North Central 11197 5107 0.9
Indiana Indiana IN North Central 5313 4458 0.7
Iowa Iowa IA North Central 2861 4628 0.5
Kansas Kansas KS North Central 2280 4669 0.6
Kentucky Kentucky KY South 3387 3712 1.6
Louisiana Louisiana LA South 3806 3545 2.8
Maine Maine ME Northeast 1058 3694 0.7
Maryland Maryland MD South 4122 5299 0.9
Massachusetts Massachusetts MA Northeast 5814 4755 1.1
Michigan Michigan MI North Central 9111 4751 0.9
Minnesota Minnesota MN North Central 3921 4675 0.6
Mississippi Mississippi MS South 2341 3098 2.4
Missouri Missouri MO North Central 4767 4254 0.8
Montana Montana MT West 746 4347 0.6
Nebraska Nebraska NE North Central 1544 4508 0.6
Nevada Nevada NV West 590 5149 0.5
New Hampshire New Hampshire NH Northeast 812 4281 0.7
New Jersey New Jersey NJ Northeast 7333 5237 1.1
New Mexico New Mexico NM West 1144 3601 2.2
New York New York NY Northeast 18076 4903 1.4
North Carolina North Carolina NC South 5441 3875 1.8
North Dakota North Dakota ND North Central 637 5087 0.8
Ohio Ohio OH North Central 10735 4561 0.8
Oklahoma Oklahoma OK South 2715 3983 1.1
Oregon Oregon OR West 2284 4660 0.6
Pennsylvania Pennsylvania PA Northeast 11860 4449 1.0
Rhode Island Rhode Island RI Northeast 931 4558 1.3
South Carolina South Carolina SC South 2816 3635 2.3
South Dakota South Dakota SD North Central 681 4167 0.5
Tennessee Tennessee TN South 4173 3821 1.7
Texas Texas TX South 12237 4188 2.2
Utah Utah UT West 1203 4022 0.6
Vermont Vermont VT Northeast 472 3907 0.6
Virginia Virginia VA South 4981 4701 1.4
Washington Washington WA West 3559 4864 0.6
West Virginia West Virginia WV South 1799 3617 1.4
Wisconsin Wisconsin WI North Central 4589 4468 0.7
Wyoming Wyoming WY West 376 4566 0.6
Life.Exp Murder HS.Grad Frost Area
Alabama 69.05 15.1 41.3 20 50708
Alaska 69.31 11.3 66.7 152 566432
Arizona 70.55 7.8 58.1 15 113417
Arkansas 70.66 10.1 39.9 65 51945
California 71.71 10.3 62.6 20 156361
Colorado 72.06 6.8 63.9 166 103766
Connecticut 72.48 3.1 56.0 139 4862
Delaware 70.06 6.2 54.6 103 1982
Florida 70.66 10.7 52.6 11 54090
Georgia 68.54 13.9 40.6 60 58073
Hawaii 73.60 6.2 61.9 0 6425
Idaho 71.87 5.3 59.5 126 82677
Illinois 70.14 10.3 52.6 127 55748
Indiana 70.88 7.1 52.9 122 36097
Iowa 72.56 2.3 59.0 140 55941
Kansas 72.58 4.5 59.9 114 81787
Kentucky 70.10 10.6 38.5 95 39650
Louisiana 68.76 13.2 42.2 12 44930
Maine 70.39 2.7 54.7 161 30920
Maryland 70.22 8.5 52.3 101 9891
Massachusetts 71.83 3.3 58.5 103 7826
Michigan 70.63 11.1 52.8 125 56817
Minnesota 72.96 2.3 57.6 160 79289
Mississippi 68.09 12.5 41.0 50 47296
Missouri 70.69 9.3 48.8 108 68995
Montana 70.56 5.0 59.2 155 145587
Nebraska 72.60 2.9 59.3 139 76483
Nevada 69.03 11.5 65.2 188 109889
New Hampshire 71.23 3.3 57.6 174 9027
New Jersey 70.93 5.2 52.5 115 7521
New Mexico 70.32 9.7 55.2 120 121412
New York 70.55 10.9 52.7 82 47831
North Carolina 69.21 11.1 38.5 80 48798
North Dakota 72.78 1.4 50.3 186 69273
Ohio 70.82 7.4 53.2 124 40975
Oklahoma 71.42 6.4 51.6 82 68782
Oregon 72.13 4.2 60.0 44 96184
Pennsylvania 70.43 6.1 50.2 126 44966
Rhode Island 71.90 2.4 46.4 127 1049
South Carolina 67.96 11.6 37.8 65 30225
South Dakota 72.08 1.7 53.3 172 75955
Tennessee 70.11 11.0 41.8 70 41328
Texas 70.90 12.2 47.4 35 262134
Utah 72.90 4.5 67.3 137 82096
Vermont 71.64 5.5 57.1 168 9267
Virginia 70.08 9.5 47.8 85 39780
Washington 71.72 4.3 63.5 32 66570
West Virginia 69.48 6.7 41.6 100 24070
Wisconsin 72.48 3.0 54.5 149 54464
Wyoming 70.29 6.9 62.9 173 97203
To bring your own data into R for analysis, store each variable as a vector and then combine the vectors with data.frame().
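As a rough illustration of that workflow, the sketch below builds a data frame from three vectors. The vector names and the measurements are hypothetical.

```r
# Hypothetical measurements stored as separate vectors of equal length.
id     <- c("s1", "s2", "s3", "s4")
height <- c(158, 165, 172, 180)      # cm
weight <- c(52.0, 61.5, 68.2, 77.9)  # kg

# Combine the vectors column-wise; the argument names become column names.
# stringsAsFactors = FALSE keeps id as character on every R version.
people <- data.frame(id = id, height = height, weight = weight,
                     stringsAsFactors = FALSE)

str(people)  # compact summary of the column types
dim(people)  # number of rows and columns
```

All vectors passed to data.frame() should have the same length (or a length that recycles evenly), otherwise R raises an error.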
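- 2. Accessing a data frame
1. Access by index
> state[1] ## column 1 of the state data frame
> state[c(2,4)] ## columns 2 and 4
> state[-1] ## a negative index drops that column from the result (state itself is unchanged)
2. Access by row and column names
> state[,"state.abb"] ## select a column by name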
[1] AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT
[27] NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY
50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME ... WY
> state["Washington",] ## select a row by name
state.name state.abb state.region Population Income Illiteracy Life.Exp
Washington Washington WA West 3559 4864 0.6 71.72
Murder HS.Grad Frost Area
Washington 4.3 63.5 32 66570
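Row and column names can also be combined in one call. The sketch below uses the built-in mtcars data set (whose row names are car models) rather than state, purely to keep the example self-contained.

```r
# One row and two columns selected by name: the result is still a data frame.
one_car <- mtcars["Mazda RX4", c("mpg", "cyl")]

# A single column selected with [ , ] is simplified to a plain vector...
mpg_vec <- mtcars[, "mpg"]

# ...unless drop = FALSE is given, which keeps a one-column data frame.
mpg_df <- mtcars[, "mpg", drop = FALSE]

class(one_car)
class(mpg_vec)
class(mpg_df)
```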
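3. Access with $
> plot(women$height,women$weight)
Fitting a linear regression with lm(): when the data argument is supplied, the columns can be referred to by name directly.
> lm(weight ~ height, data = women)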
Call:
lm(formula = weight ~ height, data = women)
Coefficients:
(Intercept) height
-87.52 3.45
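Once the model is stored in a variable, its pieces can be pulled out by name as well. This is a small sketch; the variable name `fit` and the new height of 66 are arbitrary choices for illustration.

```r
# Fit the same model as above and keep the result.
fit <- lm(weight ~ height, data = women)

# Named vector of coefficients: "(Intercept)" and "height".
coef(fit)

# Predict weight for a new observation; newdata must be a data frame
# whose column name matches the predictor used in the formula.
predict(fit, newdata = data.frame(height = 66))
```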
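- R also provides the attach() and with() functions
1. attach() adds a data frame to R's search path.
When you are finished, call detach() to remove it from the search path.
> attach(mtcars)
> names(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
[11] "carb"
> mpg ## the column is now reachable without writing mtcars$mpg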
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
> detach(mtcars)
> mpg
Error: object 'mpg' not found
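One caveat with attach() is that the attached columns are copies: assigning to a column name creates a new variable in the workspace instead of changing the data frame. The sketch below illustrates this behaviour under the assumption of a session in which no other object named mpg exists.

```r
attach(mtcars)

# This creates a new variable mpg in the workspace, which masks the attached
# copy. The mtcars data frame itself is not modified.
mpg[1] <- 0

mtcars$mpg[1]  # still the original value, 21

# Clean up: remove the stray variable and take mtcars off the search path.
rm(mpg)
detach(mtcars)
```

This is one reason why with() or explicit mtcars$mpg access is often preferred in scripts.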
2. with(data frame, {expression}) evaluates the expression with the data frame's columns available by name.
> with(mtcars,{mpg})
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
> with(mtcars, sum(mpg))
[1] 642.9
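The braces become useful when the block contains several statements. In this sketch the names `ratio`, `mean_mpg`, and `max_ratio` are arbitrary; the value of the last expression in the block is what with() returns.

```r
result <- with(mtcars, {
  # hp, wt and mpg are looked up as columns of mtcars.
  ratio <- hp / wt  # temporary variable, local to this block
  c(mean_mpg = mean(mpg), max_ratio = max(ratio))
})

result
```

Variables created inside the block, such as ratio here, are not left behind in the workspace, which is the main practical difference from attach().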
- Double-bracket access (returns the column as a vector rather than as a list, i.e. a one-column data frame)
> mtcars[['mpg']]
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
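The difference between single brackets, double brackets, and $ can be checked side by side. A minimal sketch using mtcars; the variable names are arbitrary.

```r
single <- mtcars["mpg"]    # single bracket: a data frame with one column
double <- mtcars[["mpg"]]  # double bracket: the numeric vector itself
dollar <- mtcars$mpg       # $ also returns the vector

class(single)              # "data.frame"
class(double)              # "numeric"
identical(double, dollar)  # TRUE
```

Functions that expect a plain vector, such as mean() or sum(), should be given the double-bracket or $ form.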
![](https://img.haomeiwen.com/i14744215/1b6acb6e0be126a4.png)