Getting and cleaning data-quiz1

Author: 小狼小狼_e211 | Published: 2021-03-15 22:00

    Question1

    The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

    https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv

    and load the data into R. The code book, describing the variable names, is here:

    https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

    How many properties are worth $1,000,000 or more?

    # Create a local data directory if it does not exist yet
    if (!file.exists("data")) {
        dir.create("data")
    }
    fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
    download.file(fileUrl, destfile = "./data/06hid.csv", method = "curl")
    dateDownloaded <- date()

    HD <- read.csv("./data/06hid.csv")
    # In the code book, VAL == 24 is the category for properties worth $1,000,000 or more
    sum(HD$VAL == 24, na.rm = TRUE)
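
Missing values are the reason the count above needs NA handling. The sketch below uses a small made-up vector (the `val` values are hypothetical, not taken from the survey file) to show how logical indexing carries NA entries along, and how counting on the logical vector avoids that.

```r
# Hypothetical VAL codes for seven households; NA means the value is missing
val <- c(24, 10, NA, 24, 3, NA, 24)

# Logical indexing keeps one NA element for every NA in the condition
picked <- val[val >= 24]
length(picked)                 # 5: three matches plus two NAs

# Dropping the NAs afterwards recovers the true count
sum(!is.na(picked))            # 3

# Counting on the logical vector directly gives the same count
sum(val >= 24, na.rm = TRUE)   # 3
```

Either form works on the real data; the second one does not depend on the column position of VAL.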
    

    Question2

    Use the data you loaded from Question 1. Consider the variable FES in the code book. Which of the "tidy data" principles does this variable violate?

    # The data frame from Question 1 is named HD
    table(HD$FES)
    # Answer: tidy data has one variable per column
    

    Question3

    Download the Excel spreadsheet on Natural Gas Acquisition Program here:

    https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx

    Read rows 18-23 and columns 7-15 into R and assign the result to a variable called:

    dat
    

    What is the value of:

    sum(dat$Zip*dat$Ext,na.rm=T)
    
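    # First install a suitable version of Java from the official Java website (rJava needs it)
    install.packages("rJava")
    install.packages("xlsx")
    library(xlsx)
    fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx"
    # mode = "wb" keeps the binary .xlsx file intact on Windows
    download.file(fileUrl, destfile = "DATA.gov_NGAP.xlsx", mode = "wb")
    rowIndex <- 18:23
    colIndex <- 7:15
    dat <- read.xlsx("DATA.gov_NGAP.xlsx", sheetIndex = 1, rowIndex = rowIndex,
                     colIndex = colIndex, header = TRUE)
    sum(dat$Zip * dat$Ext, na.rm = TRUE)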
    

    Question4

    Read the XML data on Baltimore restaurants from here:

    https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

    How many restaurants have zipcode 21231?

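    library(XML)
    # http is used here because the XML package's parser does not fetch https URLs directly
    fileUrl <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
    doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
    rootNode <- xmlRoot(doc)
    # Count the <zipcode> nodes whose text is "21231"
    sum(xpathSApply(rootNode, "//zipcode", xmlValue) == "21231")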
    

    Question5

    The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

    https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv

    Using the fread() command, load the data into an R object

    DT
    

    The following are ways to calculate the average value of the variable

    pwgtp15
    

    broken down by sex. Using the data.table package, which will deliver the fastest user time?

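    library(data.table)
    DT <- fread("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv")
    DT[, mean(pwgtp15), by = SEX]
    # Reference: https://xmuxiaomo.github.io/2015/07/10/Getting-and-Cleaning-Data-Quiz-1/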
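
One way to check a "fastest user time" claim is to time each candidate with `system.time()`. The sketch below is only an illustration under stated assumptions: it uses synthetic data rather than the survey file, the two base R alternatives (`tapply` and `sapply` over `split`) were chosen here for comparison, and `time_user` is a hypothetical helper. The data.table variant runs only if that package is installed, and actual timings will vary by machine.

```r
# Synthetic stand-in for the survey data (made-up values, not the real file)
set.seed(1)
n <- 100000
df <- data.frame(
  SEX = sample(1:2, n, replace = TRUE),
  pwgtp15 = sample(1:500, n, replace = TRUE)
)

# Two base R ways to compute the mean of pwgtp15 by SEX
by_tapply <- function() tapply(df$pwgtp15, df$SEX, mean)
by_split  <- function() sapply(split(df$pwgtp15, df$SEX), mean)

# Both approaches should agree on the group means
stopifnot(isTRUE(all.equal(as.vector(by_tapply()), as.vector(by_split()))))

# Hypothetical helper: user time for repeated runs, since one call is too fast to measure
time_user <- function(f, reps = 50) {
  unname(system.time(for (i in seq_len(reps)) f())["user.self"])
}

timings <- c(tapply = time_user(by_tapply), split = time_user(by_split))

# Add the data.table version only when the package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  suppressPackageStartupMessages(library(data.table))
  DT <- as.data.table(df)
  by_dt <- function() DT[, mean(pwgtp15), by = SEX]
  timings["data.table"] <- time_user(by_dt)
}

print(timings)
```

Repeating each call many times smooths out noise; a single run on a small table often reports a user time of zero for every method.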
    
