Getting and cleaning data-quiz1

Author: 小狼小狼_e211 | Published: 2021-03-15 22:00


Question1

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv

and load the data into R. The code book, describing the variable names is here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

How many properties are worth $1,000,000 or more?

if (!file.exists("data")) {
        dir.create("data")
}
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(fileUrl, destfile = "./data/06hid.csv", method = "curl")
dateDownloaded <- date()

HD <- read.csv("./data/06hid.csv")
# VAL is the property value bracket; code 24 is the $1,000,000-or-more bracket
sum(HD$VAL >= 24, na.rm = TRUE)
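Because VAL contains missing values, how the NAs are handled decides whether the count comes out right. The following is a minimal sketch on a made-up data frame (the `toy` values are hypothetical, not taken from the survey) comparing three ways of counting rows in the top bracket:

```r
# Hypothetical toy data standing in for the VAL column; not real survey values
toy <- data.frame(VAL = c(24, NA, 10, 24, NA, 3))

# The comparison propagates NA, so na.rm = TRUE is needed to get a number
count_direct <- sum(toy$VAL >= 24, na.rm = TRUE)

# Logical indexing with NA yields NA entries, which !is.na() then filters out
count_indexed <- sum(!is.na(toy[toy$VAL >= 24, "VAL"]))

# Pitfall: counting the indexed rows also counts the rows where VAL is NA
count_rows <- nrow(toy[toy$VAL >= 24, , drop = FALSE])

print(c(direct = count_direct, indexed = count_indexed, rows = count_rows))
```

The first two counts agree, while the row count is inflated by the NA rows, which is why the answer should not be read off `nrow()` of the subset.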

Question2

Use the data you loaded from Question 1. Consider the variable FES in the code book. Which of the "tidy data" principles does this variable violate?

table(HD$FES)
# Answer: tidy data has one variable per column.
# FES packs two variables (family type and employment status) into one column.
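To make the violation concrete, here is a rough sketch of how a combined code could be split into two tidy columns. The lookup table below is an assumed reading of the FES codes and the labels are paraphrased, so check it against the code book before relying on it; `fes_lookup` and `toy` are hypothetical names introduced only for illustration.

```r
# Assumed reading of the FES codes; verify against the code book
fes_lookup <- data.frame(
  FES = 1:8,
  family_type = rep(c("married couple", "male householder", "female householder"),
                    times = c(4, 2, 2)),
  labor_force = c("both spouses", "husband only", "wife only", "neither spouse",
                  "householder in", "householder out",
                  "householder in", "householder out"),
  stringsAsFactors = FALSE
)

# Hypothetical FES values, including a missing one
toy <- data.frame(FES = c(1, 4, 5, 8, NA, 2))

# match() returns NA for missing codes, so those rows stay NA in both columns
idx <- match(toy$FES, fes_lookup$FES)
toy$family_type <- fes_lookup$family_type[idx]
toy$labor_force <- fes_lookup$labor_force[idx]
print(toy)
```

After the split, each column describes exactly one thing, which is what the "one variable per column" principle asks for.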

Question3

Download the Excel spreadsheet on Natural Gas Acquisition Program here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx

Read rows 18-23 and columns 7-15 into R and assign the result to a variable called:

dat

What is the value of:

sum(dat$Zip*dat$Ext,na.rm=T)

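# First install a suitable version of Java from the official Java website (xlsx depends on rJava)
install.packages("rJava")
install.packages("xlsx")
library(xlsx)
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx"
# mode = "wb" keeps the binary .xlsx file intact on Windows
download.file(fileUrl, destfile = "DATA.gov_NGAP.xlsx", mode = "wb")
rowIndex <- 18:23
colIndex <- 7:15
dat <- read.xlsx("DATA.gov_NGAP.xlsx", sheetIndex = 1, rowIndex = rowIndex,
                 colIndex = colIndex, header = TRUE)
sum(dat$Zip * dat$Ext, na.rm = TRUE)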

Question4

Read the XML data on Baltimore restaurants from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

How many restaurants have zipcode 21231?

library(XML)
# http rather than https: the XML package does not read https URLs directly
fileUrl <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)
sum(xpathSApply(rootNode, "//zipcode", xmlValue) == "21231")
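The same XPath approach can be tried offline on a small in-memory document, which makes it easy to confirm what is being counted. This is only a sketch: the XML text below is invented for illustration, and only the `zipcode` element name is taken from the query above.

```r
library(XML)

# Hypothetical document; only the <zipcode> element name mirrors the real file
txt <- "<response>
  <row><name>A</name><zipcode>21231</zipcode></row>
  <row><name>B</name><zipcode>21202</zipcode></row>
  <row><name>C</name><zipcode>21231</zipcode></row>
</response>"

# asText = TRUE parses the string itself instead of treating it as a file name
doc <- xmlParse(txt, asText = TRUE)

# "//zipcode" selects every zipcode node at any depth; xmlValue extracts its text
zipcodes <- xpathSApply(doc, "//zipcode", xmlValue)
n_match <- sum(zipcodes == "21231")

# table() gives the count for every zipcode at once
counts <- table(zipcodes)
print(counts)
```

Note that `xmlValue` returns character strings, so the comparison is against `"21231"` rather than the number 21231.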

Question5

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv

using the fread() command load the data into an R object

DT

The following are ways to calculate the average value of the variable

pwgtp15

broken down by sex. Using the data.table package, which will deliver the fastest user time?

# Answer: the data.table grouping syntax
DT[, mean(pwgtp15), by = SEX]
# Reference: https://xmuxiaomo.github.io/2015/07/10/Getting-and-Cleaning-Data-Quiz-1/
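One way to check a claim like this is to time the candidates with system.time(). The sketch below uses synthetic data in place of the downloaded file, and the two base R alternatives (`tapply` and `split` plus `sapply`) are chosen here for illustration; the helper `time_user` and the repetition count are assumptions, not part of the quiz.

```r
library(data.table)

# Synthetic stand-in for the survey data; only the two columns used here
set.seed(42)
n <- 100000
DT <- data.table(SEX = sample(1:2, n, replace = TRUE),
                 pwgtp15 = sample(1:500, n, replace = TRUE))

# Three ways to get the mean of pwgtp15 by SEX
by_dt     <- DT[, mean(pwgtp15), by = SEX]
by_tapply <- tapply(DT$pwgtp15, DT$SEX, mean)
by_split  <- sapply(split(DT$pwgtp15, DT$SEX), mean)

# Hypothetical helper: repeat each call so the user time is large enough to measure
time_user <- function(expr_fun, reps = 100) {
  system.time(for (i in seq_len(reps)) expr_fun())[["user.self"]]
}

timings <- c(
  data.table = time_user(function() DT[, mean(pwgtp15), by = SEX]),
  tapply     = time_user(function() tapply(DT$pwgtp15, DT$SEX, mean)),
  split      = time_user(function() sapply(split(DT$pwgtp15, DT$SEX), mean))
)
print(timings)
```

Timings vary by machine and from run to run, so it is worth repeating the measurement a few times before drawing a conclusion.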

Article link: https://www.haomeiwen.com/subject/zoulcltx.html
