Source of the exercises: GitHub
1. Read the dataset from the URL below, assign it to a variable called users, and use 'user_id' as the index
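import io

import numpy as np
import pandas as pd
import requests

link = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user'
# Download the file; .content holds the response body as raw bytes
s = requests.get(link).content
# Decode the bytes to text, wrap the text in a file-like object, and parse the '|'-separated data
users = pd.read_csv(io.StringIO(s.decode('utf-8')), sep='|', index_col='user_id')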
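pandas.read_csv accepts a file path, a URL, or a file-like object as its first argument. A plain string is treated as a path rather than as CSV text, so the downloaded content (bytes returned by requests) is decoded and wrapped with io.StringIO to make it file-like.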
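A minimal offline sketch of the same idea: the bytes below are a hypothetical stand-in for `requests.get(link).content`, written in the same `|`-separated layout (the three rows are illustrative only). It shows two equivalent ways to hand in-memory content to `read_csv`: decode it and wrap the text in `io.StringIO`, or wrap the raw bytes in `io.BytesIO`.

```python
import io

import pandas as pd

# Hypothetical stand-in for requests.get(link).content: raw bytes in the u.user layout
raw = (
    b"user_id|age|gender|occupation|zip_code\n"
    b"1|24|M|technician|85711\n"
    b"2|53|F|other|94043\n"
    b"3|23|M|writer|32067\n"
)

# A plain str passed to read_csv is treated as a path or URL, so the decoded
# text is wrapped in a file-like text buffer instead
from_text = pd.read_csv(io.StringIO(raw.decode("utf-8")), sep="|", index_col="user_id")

# Alternatively, wrap the bytes directly; read_csv decodes them itself (UTF-8 by default)
from_bytes = pd.read_csv(io.BytesIO(raw), sep="|", index_col="user_id")

print(from_text)
```

Because read_csv also accepts a URL, `pd.read_csv(link, sep='|', index_col='user_id')` should work without requests; downloading explicitly is mainly useful when you need control over the HTTP request itself (headers, timeouts and so on).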
2. Show the first 25 entries and the last 10 entries
users.head(25)
users.tail(10)
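`head(n)` and `tail(n)` return the first and last `n` rows by position (both default to 5). A small sketch on a hypothetical 30-row frame shows that they match positional slicing with `iloc`:

```python
import pandas as pd

# Hypothetical 30-row frame standing in for users, indexed by user_id 1..30
df = pd.DataFrame({"age": range(30)}, index=pd.RangeIndex(1, 31, name="user_id"))

first_25 = df.head(25)  # same rows as df.iloc[:25]
last_10 = df.tail(10)   # same rows as df.iloc[-10:]

print(len(first_25), len(last_10))  # → 25 10
```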
3. What is the number of observations in the dataset?
users.shape[0]  # shape is (rows, columns); index 0 is the number of rows
4. What is the number of columns in the dataset?
users.shape[1]
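Since `shape` is a `(rows, columns)` tuple, index 0 answers question 3 and index 1 answers question 4. A column used as the index (via `index_col` or `set_index`) is no longer counted as a column. A sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical 3-row frame; user_id becomes the index, not a column
df = pd.DataFrame(
    {"user_id": [1, 2, 3], "age": [24, 53, 23], "occupation": ["technician", "other", "writer"]}
).set_index("user_id")

n_rows, n_cols = df.shape
print(n_rows, n_cols)  # → 3 2
```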
5. What is the data type of each column?
users.dtypes  # the correct answer
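`dtypes` returns a Series that maps each column name to its dtype. Numeric columns come out as integer or float types, while text columns are typically `object` (newer pandas versions may use a dedicated string dtype instead). A sketch on hypothetical data, using `select_dtypes` and the helpers in `pandas.api.types` so the checks do not depend on which string dtype is in use:

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical columns mirroring the kinds of data in u.user
df = pd.DataFrame({"age": [24, 53, 23], "occupation": ["technician", "other", "writer"]})

print(df.dtypes)

# Select only the numeric columns by dtype
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)                # → ['age']
print(is_integer_dtype(df["age"]))  # → True
```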
6. Print only the occupation column
users['occupation']
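Single brackets return the column as a Series, double brackets return a one-column DataFrame, and attribute access (as in `users.occupation` in question 10) gives the same Series as single brackets. Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. A sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"age": [24, 53], "occupation": ["technician", "other"]})

col_series = df["occupation"]   # single brackets: a Series
col_frame = df[["occupation"]]  # double brackets: a one-column DataFrame
col_attr = df.occupation        # attribute access: the same Series as df["occupation"]

print(type(col_series).__name__, type(col_frame).__name__)  # → Series DataFrame
```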
7. How many different occupations are there in this dataset?
users['occupation'].nunique()
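`nunique()` counts distinct values and skips missing values by default; passing `dropna=False` counts NaN as one more value. A sketch on a hypothetical Series with one missing entry:

```python
import numpy as np
import pandas as pd

occupations = pd.Series(["student", "other", "student", np.nan])

print(occupations.nunique())              # → 2
print(occupations.nunique(dropna=False))  # → 3
```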
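8. What is the most frequent occupation?
users['occupation'].value_counts()  # counts per occupation, most frequent first
users['occupation'].value_counts().idxmax()  # the label with the highest count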
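`value_counts()` alone lists every occupation with its count; to get just the answer, take the label with the highest count. `idxmax()` returns a single label (the first one if several share the maximum), while `mode()` returns every value tied for the highest count. A sketch on a hypothetical Series:

```python
import pandas as pd

occupations = pd.Series(["student", "other", "student", "educator"])

counts = occupations.value_counts()  # sorted by count, descending
most_frequent = counts.idxmax()      # a label with the highest count
modes = occupations.mode()           # every value tied for the highest count

print(most_frequent)  # → student
```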
9. Summarize all the columns
users.describe(include='all')
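By default `describe()` summarizes only the numeric columns when the frame has any; `include='all'` adds the non-numeric columns, filling statistics that do not apply to a column with NaN. A sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [24, 53, 23, 33], "occupation": ["student", "other", "student", "writer"]}
)

numeric_only = df.describe()             # default: numeric columns only
everything = df.describe(include="all")  # numeric and non-numeric columns together

print(everything)
```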
10. Summarize only the occupation column
users.occupation.describe()
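For a non-numeric column, `describe()` reports `count`, `unique`, `top` (the most common value) and `freq` (how often `top` occurs) instead of mean and quartiles. A sketch on a hypothetical Series:

```python
import pandas as pd

occupations = pd.Series(["student", "other", "student", "writer"], name="occupation")

summary = occupations.describe()  # count, unique, top, freq for non-numeric data
print(summary)
```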
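11. Which age occurs least often?
users['age'].value_counts().sort_values()  # ascending by count: the rarest ages come first
users.age.value_counts().tail()  # reference answer: value_counts() sorts descending, so the rarest ages are at the end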
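Both lines above only display candidates, and several ages can share the lowest count. A sketch that keeps every age tied for the minimum count, on hypothetical ages:

```python
import pandas as pd

ages = pd.Series([25, 25, 33, 47, 47, 47, 61], name="age")

counts = ages.value_counts()  # number of occurrences per age
# Keep every age whose count equals the smallest count, then sort the labels
least_common = counts[counts == counts.min()].index.sort_values()

print(least_common.tolist())  # → [33, 61]
```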
The full reference solutions are on GitHub.
reference:
http://landcareweb.com/questions/6234/lai-zi-urlde-pandas-read-csv