Replacing Data Values in R

Author: YX_Andrew | Published 2019-05-14 18:56

    Let's go straight to the reference code:

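    # Install the packages (only needed once)
    install.packages("bitops")
    install.packages("RCurl")
    
    library("bitops")
    library("RCurl")
    
    # Input data: a CSV file without a header row
    url = "https://raw.githubusercontent.com/chrisestevez/MSDA-Bridge/master/mushroom.csv"
    
    # Download the file contents as a single string
    Rdata = getURL(url)
    
    # Parse the string as CSV; with header = FALSE the columns are named V1, V2, ...
    MyData = read.csv(text = Rdata, header = FALSE, sep = ",")
    MyFinalData = data.frame(MyData)  # read.csv already returns a data frame
    samp = head(MyFinalData, n = 10)  # keep the first ten rows as the example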
    
    The code above loads the complete data set into memory. To keep the display manageable, only the first ten rows are used as the example:
    
    > samp
       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
    1   p  x  s  n  t  p  f  c  n   k   e   e   s   s
    2   e  x  s  y  t  a  f  c  b   k   e   c   s   s
    3   e  b  s  w  t  l  f  c  b   n   e   c   s   s
    4   p  x  y  w  t  p  f  c  n   n   e   e   s   s
    5   e  x  s  g  f  n  f  w  b   k   t   e   s   s
    6   e  x  y  y  t  a  f  c  b   n   e   c   s   s
    7   e  b  s  w  t  a  f  c  b   g   e   c   s   s
    8   e  b  y  w  t  l  f  c  b   n   e   c   s   s
    9   p  x  y  w  t  p  f  c  n   p   e   e   s   s
    10  e  b  s  y  t  a  f  c  b   g   e   c   s   s
       V15 V16 V17 V18 V19 V20 V21 V22 V23
    1    w   w   p   w   o   p   k   s   u
    2    w   w   p   w   o   p   n   n   g
    3    w   w   p   w   o   p   n   n   m
    4    w   w   p   w   o   p   k   s   u
    5    w   w   p   w   o   e   n   a   g
    6    w   w   p   w   o   p   k   n   g
    7    w   w   p   w   o   p   k   n   m
    8    w   w   p   w   o   p   n   s   m
    9    w   w   p   w   o   p   k   v   g
    10   w   w   p   w   o   p   k   s   m
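The import step can be tried without a network connection. The following is a minimal sketch, assuming a few made-up rows in the same headerless, comma-separated layout; `csv_text` and `demo` are illustrative names that do not appear in the original code. It shows that `read.csv(text = ...)` parses a string, that `header = FALSE` produces the default names `V1`, `V2`, ..., and that the result is already a data frame.

```r
# Three made-up rows in the same headerless, comma-separated layout
csv_text <- "p,x,s,n\ne,x,s,y\ne,b,s,w"

# text = parses a string instead of a file, like the string returned by getURL()
demo <- read.csv(text = csv_text, header = FALSE, sep = ",",
                 stringsAsFactors = FALSE)

print(names(demo))        # default column names for a headerless file
print(is.data.frame(demo))
print(head(demo, n = 2))  # first two rows only
```

Setting `stringsAsFactors = FALSE` explicitly keeps the columns as character vectors on R versions before 4.0, where the default was `TRUE`.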
    Next, take columns V1, V3, V5 and V9 as a subset and replace the label of each column:
    
    samp = subset(samp, select = c(V1,V3,V5,V9))
    colnames(samp) = c("MushroomType","CapSurface","Bruises","GillSize")
    
    The output is as follows:
    
    > samp
       MushroomType CapSurface Bruises GillSize
    1             p          s       t        n
    2             e          s       t        b
    3             e          s       t        b
    4             p          y       t        n
    5             e          s       f        b
    6             e          y       t        b
    7             e          s       t        b
    8             e          y       t        b
    9             p          y       t        n
    10            e          s       t        b
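The select-and-rename step can also be driven by a single named vector, which keeps each old column name next to its new label. This is a sketch on a made-up two-row data frame; `demo`, `new_names` and `sub` are hypothetical names, not part of the original code.

```r
# Made-up stand-in for a few columns of the headerless mushroom data
demo <- data.frame(V1 = c("p", "e"), V2 = c("x", "x"), V3 = c("s", "y"),
                   V5 = c("t", "f"), V9 = c("n", "b"),
                   stringsAsFactors = FALSE)

# Old name = new label; the order of this vector sets the column order
new_names <- c(V1 = "MushroomType", V3 = "CapSurface",
               V5 = "Bruises", V9 = "GillSize")

# Select the columns by name, then apply the new labels
sub <- demo[, names(new_names)]
colnames(sub) <- unname(new_names)
print(sub)
```

With this layout a mismatch between the selected columns and their labels is easier to spot than when the two are written in separate calls.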
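    Next, replace the letter in each cell with the meaning it stands for. A plot would not need every value replaced, but sometimes the table itself has to be presented.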
    
    # Replace the data: index a named lookup vector (code = "meaning") with each column's codes
    samp$MushroomType = c(p = "poisonous", e = "edible")[as.character(samp$MushroomType)]
    samp$CapSurface = c(f = "fibrous", g = "grooves", y = "scaly", s = "smooth")[as.character(samp$CapSurface)]
    samp$Bruises = c(t = "bruises", f = "no")[as.character(samp$Bruises)]
    samp$GillSize = c(b = "broad", n = "narrow")[as.character(samp$GillSize)]
    
    The final output is as follows:
    
    > samp
       MushroomType CapSurface Bruises GillSize
    1     poisonous     smooth bruises   narrow
    2        edible     smooth bruises    broad
    3        edible     smooth bruises    broad
    4     poisonous      scaly bruises   narrow
    5        edible     smooth      no    broad
    6        edible      scaly bruises    broad
    7        edible     smooth bruises    broad
    8        edible      scaly bruises    broad
    9     poisonous      scaly bruises   narrow
    10       edible     smooth bruises    broad
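The replacement works by indexing a named vector with the codes as character strings. The `as.character()` call matters when a column is a factor, because indexing by a factor uses its integer codes instead of its labels. A minimal sketch of the same lookup wrapped in a helper is shown here; `recode_codes` is a hypothetical function written for illustration, and it additionally warns about codes that are missing from the lookup, which the original code does not do.

```r
# Hypothetical helper: translate codes through a named lookup vector
recode_codes <- function(codes, lookup) {
  # Index by name; unname() drops the names the lookup leaves on the result
  out <- unname(lookup[as.character(codes)])
  # A code absent from the lookup comes back as NA; report it
  unmapped <- is.na(out) & !is.na(codes)
  if (any(unmapped)) {
    warning("unmapped codes: ",
            paste(unique(as.character(codes[unmapped])), collapse = ", "))
  }
  out
}

cap_surface <- c(f = "fibrous", g = "grooves", y = "scaly", s = "smooth")
print(recode_codes(c("s", "y", "s"), cap_surface))
```

A column could then be replaced with, for example, `samp$CapSurface = recode_codes(samp$CapSurface, cap_surface)`.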
    
