Reproducing the basic operations from Yihui Xie's R语言忍者秘笈 (R Ninja Secrets)
Chard Liu
January 8, 2019
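# Sys.setlocale('LC_ALL', 'C')  # optional (commented out): switch to the C locale
# readLines() reads a text file and returns a character vector, one element per line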
# R's license file (GPL)
gpl = readLines(file.path(R.home(), "COPYING"))
head(gpl) # first few lines of the GPL
## [1] "\t\t    GNU GENERAL PUBLIC LICENSE"
## [2] "\t\t       Version 2, June 1991"
## [3] ""
## [4] " Copyright (C) 1989, 1991 Free Software Foundation, Inc."
## [5] "                       51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA"
## [6] " Everyone is permitted to copy and distribute verbatim copies"
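A self-contained way to see what readLines() returns, without depending on where R is installed: the sketch below writes three made-up lines to a temporary file (the contents are illustrative only) and reads them back. Each line becomes one element of a character vector, and a blank line becomes an empty string, just like `[3] ""` in the GPL output above.

```r
# Write a small, made-up text file to a temporary location
f = tempfile(fileext = ".txt")
writeLines(c("line one", "", "line three"), f)

x = readLines(f)   # one element per line of the file
unlink(f)          # clean up the temporary file

is.character(x)    # TRUE
length(x)          # 3
x[2]               # "" : the blank line
```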
xie = readLines("https://yihui.name") # Yihui Xie's homepage
head(xie) # HTML source
## [1] "<!DOCTYPE html>"
## [2] "<html lang=\"en-us\">"
## [3] " <head>"
## [4] "\t<meta name=\"generator\" content=\"Hugo 0.25.1\" />"
## [5] " <meta charset=\"utf-8\">"
## [6] " <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
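The page is served as UTF-8, so depending on the system locale (for example a Chinese Windows setup), non-ASCII text such as the Chinese name in the page title may print as garbled characters. One hedged remedy is readLines()'s `encoding` argument, e.g. `readLines("https://yihui.name", encoding = "UTF-8")`, which marks the strings as UTF-8 rather than re-encoding them. The sketch below shows the idea offline with a temporary file instead of the network; the name is written with Unicode escapes so the script itself stays ASCII.

```r
name = "\u8c22\u76ca\u8f89"           # the Chinese name 谢益辉, written as Unicode escapes
f = tempfile(fileext = ".txt")
writeLines(name, f, useBytes = TRUE)  # write the raw UTF-8 bytes unchanged

x = readLines(f, encoding = "UTF-8")  # mark the input as UTF-8 (no re-encoding)
unlink(f)

identical(x, name)   # TRUE
nchar(x)             # 3 characters
```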
nchar(gpl[1:10]) # number of characters in each of the first 10 lines of the GPL
## [1] 32 29 0 56 79 61 58 0 15 0
sum(nchar(gpl))
## [1] 17671
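nchar() counts characters, not display width: each tab, such as the two at the start of the GPL's first line, counts as one character. For non-ASCII text the default `type = "chars"` and `type = "bytes"` give different answers. A small sketch with made-up strings:

```r
x = c("GNU", "", "\t\tab")
nchar(x)                      # 3 0 4: each tab is a single character
sum(nchar(x))                 # 7, the same idea as sum(nchar(gpl))

name = "\u8c22\u76ca\u8f89"   # the Chinese name 谢益辉 as Unicode escapes
nchar(name)                   # 3 characters
nchar(name, type = "bytes")   # 9 bytes in UTF-8
```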
# strsplit() returns a list; each element is a character vector
strsplit(gpl[4:5], " ") # split lines 4 and 5
## [[1]]
## [1] "" "Copyright" "(C)" "1989," "1991"
## [6] "Free" "Software" "Foundation," "Inc."
##
## [[2]]
## [1] "" "" "" "" ""
## [6] "" "" "" "" ""
## [11] "" "" "" "" ""
## [16] "" "" "" "" ""
## [21] "" "" "" "51" "Franklin"
## [26] "St," "Fifth" "Floor," "Boston," "MA"
## [31] "" "02110-1301" "" "USA"
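The run of `""` in the second element appears because strsplit() treats every single space as a separator, so each pair of adjacent spaces leaves an empty string between them. Since `split` is a regular expression, one option is ` +` (one or more spaces), which treats each run of spaces as a single separator; a leading separator still yields one empty string. A sketch on made-up strings:

```r
s = "  a  b"
strsplit(s, " ")[[1]]    # c("", "", "a", "", "b")
strsplit(s, " +")[[1]]   # c("", "a", "b"): a run of spaces acts as one separator

res = strsplit(c("a b", "c"), " ")
length(res)              # 2: one list element per input string
```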
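# Splitting on spaces is not rigorous: punctuation also separates words,
# so we need a regular expression.
# In a regular expression, any separator between words can be written as \\W (a backslash followed by a capital W).
# This special pattern matches any non-word character, so splitting on it leaves only the words.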
words = unlist(strsplit(gpl, "\\W")) # unlist() flattens the list into one character vector
words = words[words != ""] # drop empty strings
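To see what the two lines above do, here is the same pipeline on one made-up string. Note that digits count as word characters (`\w` means letters, digits and underscore), so "1989" survives. Splitting on `"\\W+"` instead treats each run of non-word characters as one separator, which avoids the empty strings in this example (a leading non-word character would still produce one).

```r
s = "Copyright (C) 1989, Free Software"   # made-up sample text
w = unlist(strsplit(s, "\\W"))
w = w[w != ""]     # c("Copyright", "C", "1989", "Free", "Software")

# One separator per run of non-word characters: no empty strings here
w2 = strsplit(s, "\\W+")[[1]]
identical(w, w2)   # TRUE
```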
# the 10 most frequent words
tail(sort(table(tolower(words))), 10) # tolower() lowercases, sort() sorts in ascending order, tail() keeps the last (most frequent) 10
##
## this is a program and you or of to
## 49 53 57 71 72 76 77 104 108
## the
## 194
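Because sort() is ascending by default, the code above takes the last entries with tail(), so the most frequent word comes last. An equivalent that lists the most frequent word first is `sort(..., decreasing = TRUE)` followed by head(). A sketch on a small, hypothetical word vector:

```r
words_demo = c("The", "the", "a", "program", "The", "a")   # hypothetical input
tab = table(tolower(words_demo))   # counts: a = 2, program = 1, the = 3
top = head(sort(tab, decreasing = TRUE), 2)
names(top)       # c("the", "a")
as.vector(top)   # c(3, 2)
```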
# Another way to split: extract by position with substr() and substring()
xie[8]
## [1] "    <title>Yihui Xie | 谢益辉</title>"
substr(xie[8], 12, 20) # extract characters 12 to 20
## [1] "Yihui Xie"
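substr() takes a single start and stop, while substring() is vectorised over its `first` and `last` arguments, which makes it handy for cutting one string into several pieces at once, for example into single characters. A sketch with made-up strings; the title line assumes four leading spaces, which is what positions 12 to 20 imply:

```r
line = "    <title>Yihui Xie</title>"   # hypothetical line with 4 leading spaces
substr(line, 12, 20)                    # "Yihui Xie"

x = "abcdef"
substring(x, 1:3, 3:5)                  # c("abc", "bcd", "cde")
substring(x, 1:nchar(x), 1:nchar(x))    # c("a", "b", "c", "d", "e", "f")
```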
# Learn to concatenate strings
paste(1:3,"a")
## [1] "1 a" "2 a" "3 a"
paste(1:3, "a", sep = "-") # join each element with "a" using "-": element-wise concatenation
## [1] "1-a" "2-a" "3-a"
paste(letters[1:10], collapse = "~") # collapse joins all elements of the vector into one string
## [1] "a~b~c~d~e~f~g~h~i~j"
paste(1:3, "a", sep = "-", collapse = "+") # the classic example: sep and collapse together
## [1] "1-a+2-a+3-a"
# sep goes between the arguments and still returns a vector; collapse "collapses" the resulting character vector into a single string
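A few more checks of the sep/collapse distinction, including paste0(), which is paste() with `sep = ""`. The inputs are made up for illustration:

```r
paste0(1:3, "a")                     # c("1a", "2a", "3a"): same as sep = ""
length(paste(1:3, "a", sep = "-"))   # 3: sep keeps one string per element
paste(c("x", "y"), 1:2, sep = "_", collapse = "|")   # "x_1|y_2"
```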
Now for something playful, imitating the experts:
love = function() cat("I love you\n") # \n is a newline; the parentheses after function hold the arguments, none here
say = function(person) {
  love()
  love()
  cat(paste("I love you dear", person, "\n"))
  love()
}
say("somebody") # sing a line to somebody
## I love you
## I love you
## I love you dear somebody
## I love you
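One detail in say(): paste() puts its default separator (a space) before the `"\n"`, so the third line actually ends with a trailing space. A possible variant, hypothetically named say2() (not from the original), builds the lines as a character vector, uses paste0() to avoid the extra space, and prints them with writeLines(), which adds the newlines itself:

```r
love_line = "I love you"
say2 = function(person) {
  lines = rep(love_line, 4)                        # four "I love you" lines
  lines[3] = paste0(love_line, " dear ", person)   # no trailing space
  writeLines(lines)                                # print one element per line
  invisible(lines)                                 # return the lines without printing them again
}
say2("somebody")
```

Returning the lines invisibly keeps the console output identical while letting other code reuse the text.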