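A Series is a building block of a DataFrame, not a subset of it. Reading a file with pandas returns a DataFrame by default, and each column of that DataFrame is a Series.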
import pandas

rating_table = pandas.read_csv("fandango_score_comparison.csv")
print(type(rating_table))   # the whole table
column = rating_table["FILM"]
print(type(column))         # a single column
Output:
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
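Here is a minimal sketch of the same relationship that does not need the CSV file. The film names and scores in the small DataFrame are hypothetical placeholders, not values from fandango_score_comparison.csv. The sketch also shows that passing a list of column names (double brackets) returns a DataFrame rather than a Series.

```python
import pandas

# Hypothetical toy data standing in for the CSV (not real scores)
df = pandas.DataFrame({
    "FILM": ["Film A", "Film B", "Film C"],
    "RottenTomatoes": [74, 85, 80],
})

single = df["FILM"]      # one column label -> Series
subset = df[["FILM"]]    # list of labels -> DataFrame with one column

print(type(df).__name__)      # DataFrame
print(type(single).__name__)  # Series
print(type(subset).__name__)  # DataFrame
```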
1. Read the first 5 values of a column
rating_table = pandas.read_csv("fandango_score_comparison.csv")
column = rating_table["FILM"]
# Positions 0 to 4 (the end of the slice is excluded)
column[0:5]
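With the default integer index that `read_csv` creates, `column[0:5]` slices by position. Here is a small sketch comparing it with two more explicit spellings, `.iloc` and `.head()`. It assumes a hypothetical 7-row column of placeholder names.

```python
import pandas

# Hypothetical column of 7 placeholder film names
column = pandas.Series([f"Film {i}" for i in range(7)], name="FILM")

first_five = column[0:5]      # positional slice, end excluded
via_iloc = column.iloc[0:5]   # explicitly positional
via_head = column.head(5)     # convenience method for the first n rows

print(first_five.equals(via_iloc) and first_five.equals(via_head))  # True
```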
2. Get all values of a column
rating_table = pandas.read_csv("fandango_score_comparison.csv")
film_names = rating_table["FILM"].values  # underlying data; a NumPy ndarray for NumPy-backed dtypes
print(type(film_names))
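The pandas documentation notes that `.values` can return either an ndarray or a pandas extension array, depending on the column's dtype. `.to_numpy()` always returns an ndarray. The sketch below compares the two on a numeric column of hypothetical scores.

```python
import numpy
import pandas

# Hypothetical stand-in for a numeric rating column
scores = pandas.Series([74, 85, 80], name="RottenTomatoes")

arr_values = scores.values       # the attribute used in this tutorial
arr_numpy = scores.to_numpy()    # method that always yields an ndarray

print(type(arr_values))   # <class 'numpy.ndarray'>
print(type(arr_numpy))    # <class 'numpy.ndarray'>
```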
3. Index a column by custom labels
from pandas import Series
rating_table = pandas.read_csv("fandango_score_comparison.csv")
film_names = rating_table["FILM"].values
# All Rotten Tomatoes critic scores
film_rt = rating_table["RottenTomatoes"].values
# Use film_names as the index of film_rt
series_custom = Series(film_rt,index=film_names)
# Look up scores by film name
series_custom[["Minions (2015)","Leviathan (2014)"]]
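Here is a minimal sketch of label-based lookup on a custom index, using hypothetical names and scores instead of the CSV data. A single label returns a scalar. A list of labels returns a Series in the order requested. `.loc` makes the label lookup explicit.

```python
from pandas import Series

# Hypothetical names and scores (placeholders, not the CSV's data)
film_names = ["Film A", "Film B", "Film C"]
film_rt = [74, 85, 80]

series_custom = Series(film_rt, index=film_names)

print(series_custom["Film B"])                  # single label -> scalar
print(series_custom[["Film A", "Film C"]])      # list of labels -> Series
print(series_custom.loc[["Film C", "Film A"]])  # explicit label lookup, requested order kept
```

Requesting a label that is not in the index raises a `KeyError`.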
4. Reindex (sort by film name)
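# Collect the current index labels and sort them alphabetically
original_index = series_custom.index.tolist()
sorted_index = sorted(original_index)
# Reorder the Series to follow the sorted labels
sorted_by_index = series_custom.reindex(sorted_index)
print(sorted_by_index)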
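pandas also provides `sort_index()`, which sorts by the index labels in one step. Below is a sketch on hypothetical data that checks the two approaches agree. It also shows a side effect of `reindex`: labels that are not yet in the index are added and filled with NaN.

```python
from pandas import Series

# Hypothetical scores with an unsorted index
series_custom = Series([74, 85, 80], index=["Film C", "Film A", "Film B"])

# Approach used in the tutorial: sort the labels, then reindex
sorted_by_index = series_custom.reindex(sorted(series_custom.index.tolist()))

# Built-in shortcut that sorts by index labels directly
sorted_builtin = series_custom.sort_index()

print(sorted_by_index.equals(sorted_builtin))  # True

# Labels missing from the original index come back as NaN
expanded = series_custom.reindex(["Film A", "Film D"])
print(expanded)
```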
5. Using Series with NumPy
Because a Series is built on a NumPy array, NumPy functions can be applied to it directly:
import numpy

numpy.add(series_custom,series_custom)  # element-wise addition
numpy.sin(series_custom)                # element-wise sine
numpy.max(series_custom)                # largest score
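Here is a small sketch of what those calls return, using hypothetical scores. Element-wise functions such as `numpy.add` and `numpy.sin` return a Series that keeps the film-name index. A reduction such as `numpy.max` returns a single scalar.

```python
import numpy
from pandas import Series

# Hypothetical scores indexed by placeholder film names
series_custom = Series([74.0, 85.0, 80.0], index=["Film A", "Film B", "Film C"])

doubled = numpy.add(series_custom, series_custom)  # still a Series
sines = numpy.sin(series_custom)                   # index preserved
top = numpy.max(series_custom)                     # scalar

print(type(doubled).__name__)  # Series
print(top)                     # 85.0
```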
Get all films with a score above 80:
series_custom[series_custom>80]
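The comparison `series_custom > 80` produces a boolean Series on the same index. Indexing with that Series keeps the rows marked True. Here is a sketch on hypothetical data. Note that a score of exactly 80 is excluded because the comparison is strict.

```python
from pandas import Series

# Hypothetical scores (placeholders)
series_custom = Series([74, 85, 80, 92], index=["Film A", "Film B", "Film C", "Film D"])

mask = series_custom > 80    # boolean Series aligned on the same index
print(mask.tolist())         # [False, True, False, True]
print(series_custom[mask])   # only the rows where mask is True
```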
Get films scoring above 80 or below 60:
one = series_custom>80
two = series_custom<60
series_custom[one | two]
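The two conditions can also be written inline. In that case each comparison needs its own parentheses, because Python's `|` and `&` bind more tightly than `>` and `<`. Python's `or` and `and` do not work here: they raise a `ValueError`, since a Series has no single truth value. Here is a sketch on hypothetical scores.

```python
from pandas import Series

# Hypothetical scores (placeholders)
series_custom = Series([74, 85, 55, 92], index=["Film A", "Film B", "Film C", "Film D"])

# | keeps rows matching either condition (outside the 60-80 band)
extremes = series_custom[(series_custom > 80) | (series_custom < 60)]

# & keeps rows matching both conditions (inside the band, inclusive)
middle = series_custom[(series_custom >= 60) & (series_custom <= 80)]

print(extremes.index.tolist())  # ['Film B', 'Film C', 'Film D']
print(middle.index.tolist())    # ['Film A']
```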
Average the Rotten Tomatoes critic and user scores for each film:
import pandas
from pandas import Series

rating_table = pandas.read_csv("fandango_score_comparison.csv")
# Index both score columns by film name
rt_critics = Series(rating_table["RottenTomatoes"].values,index=rating_table["FILM"])
rt_user = Series(rating_table["RottenTomatoes_User"].values,index=rating_table["FILM"])
# Arithmetic between two Series matches values by index label
rt_mean = (rt_critics+rt_user) /2
print(rt_mean)
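Arithmetic between two Series aligns them by index label, not by position. This is why both Series above are indexed by `FILM`. Here is a sketch with hypothetical scores whose indexes are ordered differently. A label present in only one Series produces NaN. The sketch also shows an alternative that skips building two Series: averaging the two columns row by row with `DataFrame.mean(axis=1)`.

```python
import pandas
from pandas import Series

# Hypothetical critic and user scores; the indexes differ in order and content
rt_critics = Series([74, 85, 80], index=["Film A", "Film B", "Film C"])
rt_user = Series([70, 90, 60], index=["Film C", "Film A", "Film D"])

rt_mean = (rt_critics + rt_user) / 2   # matched by label; unmatched labels -> NaN
print(rt_mean)

# Alternative: average two columns of a (hypothetical) table row by row
table = pandas.DataFrame({
    "FILM": ["Film A", "Film B"],
    "RottenTomatoes": [74, 85],
    "RottenTomatoes_User": [70, 90],
})
row_mean = table.set_index("FILM")[["RottenTomatoes", "RottenTomatoes_User"]].mean(axis=1)
print(row_mean)
```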