Let's go straight to the reference code.
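# Install and load RCurl (bitops is one of its dependencies)
install.packages("bitops")
install.packages("RCurl")
library("bitops")
library("RCurl")
# Load the data
url = "https://raw.githubusercontent.com/chrisestevez/MSDA-Bridge/master/mushroom.csv"
Rdata = getURL(url)  # download the raw CSV as a single string
MyData = read.csv(text = Rdata,header = FALSE,sep=",")  # no header row, so columns are named V1, V2, ...
MyFinalData = data.frame(MyData)
samp = head(MyFinalData, n = 10)  # keep only the first ten rows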
The code above loads the complete data set into memory. To keep the display manageable, I take only the first ten rows as an example:
> samp
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
1 p x s n t p f c n k e e s s
2 e x s y t a f c b k e c s s
3 e b s w t l f c b n e c s s
4 p x y w t p f c n n e e s s
5 e x s g f n f w b k t e s s
6 e x y y t a f c b n e c s s
7 e b s w t a f c b g e c s s
8 e b y w t l f c b n e c s s
9 p x y w t p f c n p e e s s
10 e b s y t a f c b g e c s s
V15 V16 V17 V18 V19 V20 V21 V22 V23
1 w w p w o p k s u
2 w w p w o p n n g
3 w w p w o p n n m
4 w w p w o p k s u
5 w w p w o e n a g
6 w w p w o p k n g
7 w w p w o p k n m
8 w w p w o p n s m
9 w w p w o p k v g
10 w w p w o p k s m
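To make the `read.csv(text = ...)` step easier to follow without a network connection, here is a minimal sketch that parses a small inline string instead of the downloaded file. The names `csv_text` and `demo` are made up for illustration; the three lines are the first five fields of rows 1 to 3 shown above.

```r
# Illustrative only: the first five fields of rows 1-3 from the output above.
csv_text <- "p,x,s,n,t\ne,x,s,y,t\ne,b,s,w,t"

# header = FALSE makes R generate the column names V1, V2, ...
demo <- read.csv(text = csv_text, header = FALSE, sep = ",",
                 stringsAsFactors = FALSE)

print(dim(demo))    # number of rows and columns
print(names(demo))  # auto-generated column names
```

The `text` argument lets `read.csv` parse a character string as if it were the contents of a file, which is why the string returned by `getURL` can be passed to it directly.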
We take V1, V3, V5 and V9 as a subset and give each column a descriptive label:
samp = subset(samp, select = c(V1,V3,V5,V9))
colnames(samp) = c("MushroomType","CapSurface","Bruises","GillSize")
The output is:
> samp
MushroomType CapSurface Bruises GillSize
1 p s t n
2 e s t b
3 e s t b
4 p y t n
5 e s f b
6 e y t b
7 e s t b
8 e y t b
9 p y t n
10 e s t b
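The same selection can also be written with plain bracket indexing, which some readers may find easier to reuse inside functions. This is only a sketch on a two-row data frame built by hand from rows 1 and 2 above; `demo` and `picked` are illustrative names, not part of the original code.

```r
# Illustrative two-row data frame with the four columns of interest.
demo <- data.frame(V1 = c("p", "e"), V3 = c("s", "s"),
                   V5 = c("t", "t"), V9 = c("n", "b"),
                   stringsAsFactors = FALSE)

# Select columns by name, then relabel them.
picked <- demo[, c("V1", "V3", "V5", "V9")]
names(picked) <- c("MushroomType", "CapSurface", "Bruises", "GillSize")

print(picked)
```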
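Next, we spell out what the letter in each cell stands for. If you only plan to plot the data, you do not need to replace everything, but sometimes the table itself has to be presented.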
# Replace each code with its meaning by indexing a named lookup vector
samp$MushroomType = c('p'="poisonous",'e'="edible")[ as.character(samp$MushroomType)]
samp$CapSurface = c('f'="fibrous",'g'="grooves",'y'="scaly",'s'="smooth")[ as.character(samp$CapSurface)]
samp$Bruises = c('t'="bruises",'f'="no")[ as.character(samp$Bruises)]
samp$GillSize = c('b'="broad",'n'="narrow")[ as.character(samp$GillSize)]
The final output is:
> samp
MushroomType CapSurface Bruises GillSize
1 poisonous smooth bruises narrow
2 edible smooth bruises broad
3 edible smooth bruises broad
4 poisonous scaly bruises narrow
5 edible smooth no broad
6 edible scaly bruises broad
7 edible smooth bruises broad
8 edible scaly bruises broad
9 poisonous scaly bruises narrow
10 edible smooth bruises broad
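The replacement step works because indexing a named vector with a character vector returns the elements by name. One thing to watch is that a code missing from the lookup silently becomes `NA`. A minimal sketch of a wrapper that reports such cases is shown here; `decode` is a hypothetical helper, not part of the original code or of any package.

```r
# decode() is a hypothetical helper written for this example.
decode <- function(codes, lookup) {
  # Look each code up by name; names not in `lookup` yield NA.
  out <- unname(lookup[as.character(codes)])
  # Flag codes that were present in the input but had no mapping.
  unknown <- is.na(out) & !is.na(codes)
  if (any(unknown)) {
    warning("unmapped codes: ",
            paste(unique(codes[unknown]), collapse = ", "))
  }
  out
}

gill_size <- c(b = "broad", n = "narrow")
decoded <- decode(c("n", "b", "b"), gill_size)
print(decoded)
```

With this helper, a line such as `samp$GillSize = decode(samp$GillSize, gill_size)` would behave like the original replacement, while also warning about any unexpected letter.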