R Learning, Day 3

Author: Peng_001 | Published 2020-05-01 23:02

Reference: DataCamp's Intermediate R course

Functions

When you call a function, its arguments can be matched by position or by name.
For example, sd(x, na.rm = FALSE):
By position

sd(values, TRUE)

By name

sd(x = values, na.rm = TRUE)
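A minimal check, assuming a made-up vector `values` that contains one NA: both call styles give the same answer, and named arguments can even be supplied in any order.

```r
# Made-up example data with one missing value
values <- c(2, 4, NA, 6)

by_position <- sd(values, TRUE)               # 2nd argument is matched to na.rm
by_name     <- sd(x = values, na.rm = TRUE)   # arguments matched by name
reordered   <- sd(na.rm = TRUE, x = values)   # order does not matter when named

c(by_position, by_name, reordered)   # all three are 2
sd(values)                           # NA, because na.rm defaults to FALSE
```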

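Handy tip: args()
args() shows a function's signature (its arguments and their default values) without the long documentation page you get from help().
help() and ?function_name open the function's full documentation.
For example: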

> args(strsplit)
function (x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE) 
NULL

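mean()
mean(x, trim = 0, na.rm = FALSE, ...)
trim means "trim off".
It takes a value between 0 and 0.5: the fraction of observations dropped from each end of the sorted vector before averaging, which limits the influence of outliers.
With trim = 0.1 and 10 elements, 0.1 * 10 = 1 element is dropped from each end (2 in total), and the mean is taken over the remaining 8.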
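A small illustration with made-up data (ten values, one obvious outlier), showing that trim = 0.1 drops the smallest and the largest value before averaging:

```r
x <- c(1:9, 100)      # made-up data: ten values, 100 is an outlier

mean(x)               # 14.5: the outlier drags the mean upwards
mean(x, trim = 0.1)   # 5.5: 1 and 100 are dropped, then 2..9 are averaged
mean(2:9)             # 5.5, the same calculation done by hand
```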

Writing a function

Template

my_fun <- function(arg1, arg2) {
  # body: code that uses arg1 and arg2; the last evaluated value is returned
}

For example, a function that returns the sum of the absolute values of its two arguments:

sum_abs <- function(a, b) {
  abs(a) + abs(b)
}
# Call the function
sum_abs(-3, 2)
# [1] 5

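You can also define a function that needs no input at all:
just leave the argument list of function() empty.

hello <- function() {
  print("Hi there!")
  TRUE
}
hello()
# [1] "Hi there!"
# [1] TRUE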

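Variables created inside a function are local: they exist only while the function runs, so referring to one from outside the function gives an "object not found" error.

A function does not change the variables passed to it. Any modification happens to the function's own copy and is visible only through the return value; computing something from a variable leaves that variable itself unchanged.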
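A sketch of both points, using a hypothetical helper `triple()`: the local variable `y` disappears once the call ends, and the caller's `a` keeps its value even though the function reassigns its argument.

```r
# Hypothetical helper: y exists only while triple() is running
triple <- function(x) {
  y <- 3 * x   # local variable
  x <- y       # changes only the function's own copy of x
  x            # the last value is returned
}

a <- 5
triple(a)      # 15
a              # still 5: the caller's variable is untouched
exists("y")    # FALSE, assuming no y was created outside the function
```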

R packages

Functions such as mean(), list() and sample() all come from a package.
These functions and their packages ship with R and are available by default (installing R installs them).

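install.packages() downloads and installs a package.
library() loads an installed package.
require() also loads an installed package, but unlike library() it returns a logical value: TRUE means the package is installed and was loaded. If the package is not available it returns FALSE with a warning instead of stopping with an error.
search() lists the packages that are currently attached.

Note: library() and require() load one package per call, although many packages can be attached in the same session.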
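A minimal sketch of require()'s return value. "notARealPackage123" is a made-up name chosen so that the package is certainly not installed; stats is one of the packages attached in a standard R session.

```r
# stats is attached in a standard R session, so require() simply returns TRUE
ok <- require(stats)
ok                               # TRUE

# "notARealPackage123" is a made-up name: require() returns FALSE
# instead of stopping with an error the way library() would
missing_ok <- suppressWarnings(require("notARealPackage123", quietly = TRUE))
missing_ok                       # FALSE

# search() shows everything currently attached; many packages coexist
"package:stats" %in% search()    # TRUE
"package:base"  %in% search()    # TRUE
```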

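The lapply() function

lapply() takes a vector or list plus a function, applies the function to every element, and returns the results.

lapply(x, class)
lapply(x, triple)   # triple() stands for a function you defined yourself
lapply(x, FUN)

FUN is whichever function you want to apply.
If that function needs additional arguments, supply them after FUN, as in lapply(x, FUN, ...);
everything passed in ... is forwarded to FUN on each call.
lapply() always returns a list.
You can turn that list into a vector with unlist(lapply(...)).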
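A short sketch with a made-up list `x` and a hypothetical `triple()` helper, covering the list result, unlist(), and extra arguments passed through `...`:

```r
# Made-up input list that mixes types
x <- list(a = 1, b = "text", c = TRUE)

classes <- lapply(x, class)   # a list with one entry per element of x
unlist(classes)               # named character vector: numeric, character, logical

# Hypothetical helper used with lapply()
triple <- function(n) 3 * n
unlist(lapply(1:3, triple))   # 3 6 9

# Arguments after FUN travel through ... to every call of mean()
lapply(list(c(1, NA, 3), c(4, NA)), mean, na.rm = TRUE)   # list(2, 4)
```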

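You can also write your own function and pass it to lapply().

# Split each string at the colon, then convert to lower case with tolower()
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)

# Define a function that returns the first element of a vector
select_first <- function(x) {
  x[1]
}
# Apply select_first() to every element with lapply()
names <- lapply(split_low, select_first)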

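Writing a named function every time you want to use lapply() (for a task that no existing R function already covers) is overkill.
Instead, you can pass an anonymous function directly to lapply().

names <- lapply(split_low, function(x) { x[1] })

Like a pair of disposable chopsticks, this function is created for a single lapply() call and never given a name.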
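A quick sketch confirming that a named helper and an anonymous function give identical results. The variable names `with_named` and `with_anonymous` are new here; they avoid reusing `split` and `names`, which are also names of base R functions.

```r
pioneers  <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_low <- lapply(strsplit(pioneers, split = ":"), tolower)

# Named helper, defined once and kept around
select_first <- function(x) x[1]
with_named <- lapply(split_low, select_first)

# Anonymous function, written inline and discarded after the call
with_anonymous <- lapply(split_low, function(x) x[1])

identical(with_named, with_anonymous)   # TRUE
unlist(with_anonymous)                  # "gauss" "bayes" "pascal" "pearson"
```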

Passing extra arguments to lapply()

# Set up the data
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)
# Function to apply: returns the element at position index
select_el <- function(x, index) {
  x[index]
}
# Call lapply(); index is passed on to select_el() through ...
names <- lapply(split_low, select_el, index = 1)
years <- lapply(split_low, select_el, index = 2)

Summary: lapply() applies a function to every element of a vector or list.

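The sapply() function

sapply() works like lapply() but tries to simplify the result: if every call returns a single value you get a vector, and if every call returns a vector of the same length you get a matrix (array). Otherwise the result stays a list.
In addition, when the input is a character vector, sapply() uses the input elements as names for the result (USE.NAMES = TRUE), pairing each input with its output; lapply() does not add these names.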
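A small sketch with a made-up character vector, showing the names sapply() adds and how USE.NAMES = FALSE turns them off:

```r
animals <- c("cat", "horse", "ox")   # made-up character vector

sapply(animals, nchar)                     # named integer vector: cat 3, horse 5, ox 2
lapply(animals, nchar)                     # unnamed list of three integers
sapply(animals, nchar, USE.NAMES = FALSE)  # plain vector: 3 5 2
```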

How sapply() and lapply() differ

# Create a function that returns min and max of a vector: extremes
extremes <- function(x) {
  c(min = min(x), max = max(x))
}

# Apply extremes() over temp with sapply()
sapply(temp, extremes)

# Apply extremes() over temp with lapply()
lapply(temp, extremes)

> sapply(temp, extremes)
    [,1] [,2] [,3] [,4] [,5] [,6] [,7]
min   -1    5   -3   -2    2   -3    1
max    9   13    8    7    9    9    9

> lapply(temp, extremes)
[[1]]
min max 
 -1   9 
[[2]]
min max 
  5  13 
[[3]]
min max 
 -3   8 
[[4]]
min max 
 -2   7 
[[5]]
min max 
  2   9 
[[6]]
min max 
 -3   9 
[[7]]
min max 
  1   9

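As the output shows, sapply() simplified the seven named length-2 vectors returned by extremes() into a 2 x 7 matrix whose row names (min, max) come from those vector names, whereas lapply() returns a list of seven separate named vectors.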

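If the function's results cannot be simplified, for example because they are vectors of different lengths (as with below_zero() below), sapply() returns exactly the same list as lapply(). identical() confirms this.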

# temp is already prepared for you in the workspace

# Definition of below_zero()
below_zero <- function(x) {
  return(x[x < 0])
}

# Apply below_zero over temp using sapply(): freezing_s
freezing_s <- sapply(temp, below_zero)

# Apply below_zero over temp using lapply(): freezing_l
freezing_l <- lapply(temp, below_zero)

# Are freezing_s and freezing_l identical?
identical(freezing_l, freezing_s)
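Because `temp` is not defined in these notes, here is a self-contained sketch with a made-up stand-in `temp_demo`. It shows the list result when lengths differ, and the matrix result once every call returns the same length:

```r
# Made-up stand-in for temp: three vectors of readings
temp_demo <- list(c(-1, 2, 5), c(3, 4, 6), c(-5, -6, 1))

below_zero <- function(x) x[x < 0]

# Results have lengths 1, 0 and 2, so sapply() cannot simplify them
res_s <- sapply(temp_demo, below_zero)
res_l <- lapply(temp_demo, below_zero)
is.list(res_s)            # TRUE
identical(res_s, res_l)   # TRUE

# When every result has the same length (2 here), sapply() returns a matrix
sapply(temp_demo, range)  # 2 x 3 matrix: minima in row 1, maxima in row 2
```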

sapply() also leaves a list of NULL values unsimplified:

Notice here that, quite surprisingly, sapply() does not simplify the list of NULLs. That's because the 'vector-version' of a list of NULLs would simply be a NULL, which is no longer a vector with the same length as the input.
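A minimal sketch of this case, using a made-up function `nothing()` that always returns NULL:

```r
# A function whose result is always NULL
nothing <- function(x) NULL

res <- sapply(1:3, nothing)
is.list(res)   # TRUE: sapply() leaves the list of NULLs as it is
length(res)    # 3, the same length as the input
```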

vapply

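lapply() and sapply() differ in their output: lapply() always returns a list, while sapply() simplifies the result to a vector or matrix whenever it can. vapply() instead requires you to declare the type and length of each result through FUN.VALUE, and raises an error if any result does not match.
This makes vapply() the safer choice for data processing, because the shape of its output is guaranteed.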

vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)

If FUN.VALUE does not match what FUN actually returns, you get an error such as:

Error: values must be length 2,
 but FUN(X[[1]]) result is length 1
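A sketch reusing the extremes() function from above on made-up data: numeric(2) as FUN.VALUE yields a matrix, while a mismatched template makes vapply() stop. The error is caught with tryCatch() so the message can be inspected.

```r
extremes <- function(x) c(min = min(x), max = max(x))
vals <- list(c(-1, 9, 3), c(5, 13, 7))   # made-up data

# numeric(2) promises that every result is a numeric vector of length 2
m <- vapply(vals, extremes, numeric(2))
m   # 2 x 2 matrix with row names min and max

# A template that does not match makes vapply() stop with an error
msg <- tryCatch(vapply(vals, extremes, numeric(1)),
                error = function(e) conditionMessage(e))
msg  # text of the length-mismatch error
```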

Quick review

Some other commonly used functions

seq()

seq(1, 10, 3)
# from 1 to 10 in steps of 3
# [1]  1  4  7 10

rep()

rep(c(2, 4, 6, 8), times = 2)
# repeat the whole vector twice
# [1] 2 4 6 8 2 4 6 8

# or repeat each element in turn with each
rep(c(2, 4, 6, 8), each = 2)
# [1] 2 2 4 4 6 6 8 8

is.*()
Tests whether an object is of a particular type and returns TRUE if it is.

# li is a list
li <- list(1, "a")
is.list(li)
# [1] TRUE

as.*()
Converts an object to a particular type; for example, as.list() turns a vector into a list.

li2 <- as.list(c(1, 3, 4))
is.list(li2)
# [1] TRUE

As mentioned earlier, unlist() converts a list into a plain vector.

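append() & rev()
rev() reverses the order of a vector or list.
append() adds new elements to a vector or list (at the end, unless a position is given with after).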
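A short sketch on made-up data, including the after argument of append():

```r
x <- c(10, 20, 30)
rev(x)                     # 30 20 10
append(x, 40)              # 10 20 30 40: added at the end by default
append(x, 15, after = 1)   # 10 15 20 30: inserted after position 1

li <- list("a", 2)
rev(li)                    # list(2, "a"): rev() works on lists too
```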

Regular expressions

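grepl() & grep()
grepl() reports for each element whether it matches the pattern (a logical vector).
grep() returns the positions of the matching elements (an integer vector).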

# The emails vector has already been defined for you
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use grepl() to match for "edu"
grepl(pattern = "edu", emails)
# [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE

# Use grep() to match for "edu", save result to hits
hits <- grep(pattern = "edu", emails)
# [1] 1 2 4 5

# Subset emails using hits
emails[hits]

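Using metacharacters to refine a search

# ^      anchors the match to the start of the string
# $      anchors the match to the end of the string
# .*     any sequence of characters, including none (. = any character, * = zero or more times)
# \\.edu the backslash escapes the dot, so it matches a literal "." instead of acting as a metacharacter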

Example: elements that contain "@" and end in ".edu"

grepl(pattern = "@.*\\.edu$", emails)
# [1]  TRUE FALSE FALSE FALSE  TRUE FALSE
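Why the escape matters, sketched with made-up addresses (the last one ends in "edu" with no dot before it). Without the backslash, "." matches any character, so that address slips through:

```r
# Made-up addresses; the last one ends in "edu" but has no dot before it
addresses <- c("john.doe@ivyleague.edu", "education@world.gov",
               "invalid.edu", "someone@schooledu")

grepl(pattern = "@.*.edu$", addresses)    # TRUE FALSE FALSE TRUE: unescaped "." matches the "l"
grepl(pattern = "@.*\\.edu$", addresses)  # TRUE FALSE FALSE FALSE: needs a literal ".edu"
```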

sub() & gsub()
sub() replaces only the first match of the pattern within each string.
gsub() replaces every match within each string.

# The emails vector has already been defined for you
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "global@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use sub() to convert the email domains to datacamp.edu
sub(pattern = "@.*\\.edu$", replacement = "@datacamp.edu", emails)
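The difference between sub() and gsub() is easiest to see on a string with several matches; a sketch with made-up strings:

```r
fruit <- "banana"
sub(pattern = "a", replacement = "o", fruit)    # "bonana": first match only
gsub(pattern = "a", replacement = "o", fruit)   # "bonono": every match

# Both work element by element on a character vector
gsub(pattern = "\\.", replacement = "_", c("a.b.c", "d.e"))   # "a_b_c" "d_e"
```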

Times & dates

Sys.time()
Sys.Date()
as.Date("1999-06-01")
as.Date("1999-13-01", format = "%Y-%d-%m")   # read as 13 January 1999
as.POSIXct("1999-06-01 11:11:11")
my_date <- as.Date("1999-06-01")
my_date + 1
# "1999-06-02"
my_date2 <- as.Date("1999-06-11")   # a second date (example value)
my_date2 - my_date
# Time difference of 10 days

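my_time <- as.POSIXct("1999-06-01 11:11:11")
my_time + 1
# "1999-06-01 11:11:12"
my_time2 <- as.POSIXct("1999-06-01 12:11:11")   # a second time (example value)
my_time2 - my_time
# Time difference of 1 hours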

unclass(my_date)
# number of days from 1970-01-01 to my_date
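A sketch of what unclass() reveals and how date arithmetic follows from it. `later` and `t0` are example values introduced here; tz = "UTC" is set only so the result does not depend on the local time zone.

```r
my_date <- as.Date("1999-06-01")
unclass(my_date)                            # 10743: days since 1970-01-01
as.Date(10743, origin = "1970-01-01")       # back to "1999-06-01"

# Date arithmetic works in whole days
later <- my_date + 14                       # hypothetical second date
as.numeric(later - my_date)                 # 14
difftime(later, my_date, units = "weeks")   # Time difference of 2 weeks

# POSIXct values count seconds, so adding 60 moves one minute forward
t0 <- as.POSIXct("1999-06-01 11:11:11", tz = "UTC")
format(t0 + 60, "%H:%M:%S")                 # "11:12:11"
```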

Several other R packages also offer good support for dates and times.

Date format symbols

%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)

Using these symbols to convert strings into dates

# Definition of character strings representing dates
str1 <- "May 23, '96"
str2 <- "2012-03-15"
str3 <- "30/January/2006"

# Convert the strings to dates: date1, date2, date3
date1 <- as.Date(str1, format = "%b %d, '%y")
date2 <- as.Date(str2)
date3 <- as.Date(str3, format = "%d/%B/%Y")

format() turns a date into a string in whatever layout you specify

format(date1, "%A")
format(date2, "%d")
format(date3, "%b %Y")
# Output (weekday and month names depend on the locale; English shown)
> format(date1, "%A")
[1] "Thursday"
> format(date2, "%d")
[1] "15"
> format(date3, "%b %Y")
[1] "Jan 2006"

Time format symbols

%H: hours as a decimal number (00-23)
%I: hours as a decimal number (01-12)
%M: minutes as a decimal number
%S: seconds as a decimal number
%T: shorthand notation for the typical format %H:%M:%S
%p: AM/PM indicator

Practice with time formats, analogous to the date examples

# Definition of character strings representing times
str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
str2 <- "2012-3-12 14:23:08"

# Convert the strings to POSIXct objects: time1, time2
time1 <- as.POSIXct(str1, format = "%B %d, '%y hours:%H minutes:%M seconds:%S")
time2 <- as.POSIXct(str2, format = "%Y-%m-%d %H:%M:%S")

# Convert times to formatted strings
format(time1, "%M")
# [1] "01"
format(time2, "%I:%M %p")
# [1] "02:23 PM"

And that's the whole Intermediate R course done!
