1. Import the pandas module
>>> import pandas as pd
2. Load data from a CSV file
>>> titanic = pd.read_csv(r'C:\Users\Administrator\Desktop\titanic.csv')
pandas supports many file formats and data sources (csv, excel, sql, json, parquet, etc.). Each one has a reader function with the read_* prefix that loads the data into a pandas DataFrame
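As a rough illustration of the read_* family, the sketch below reads the same two records once as CSV and once as JSON. The in-memory text and the names csv_text, json_text, df_csv and df_json are made up for this example, so that it runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV text standing in for a file on disk.
csv_text = "PassengerId,Survived,Fare\n1,0,7.25\n2,1,71.2833\n"

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same two records as JSON, read with the matching read_* function.
json_text = (
    '[{"PassengerId": 1, "Survived": 0, "Fare": 7.25},'
    ' {"PassengerId": 2, "Survived": 1, "Fare": 71.2833}]'
)
df_json = pd.read_json(io.StringIO(json_text))

# Both readers return a DataFrame with the same shape and column names.
print(df_csv.shape, df_json.shape)
print(df_csv.columns.tolist())
```

Whichever reader is used, the result is an ordinary DataFrame, so the later steps work the same way regardless of the source format.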
3. Inspect the imported data. When a large DataFrame is displayed, pandas shows only the first 5 and the last 5 rows by default
>>> titanic
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
.. ... ... ... ... ... ... ...
886 887 0 2 ... 13.0000 NaN S
887 888 1 1 ... 30.0000 B42 S
888 889 0 3 ... 23.4500 NaN S
889 890 1 1 ... 30.0000 C148 C
890 891 0 3 ... 7.7500 NaN Q
4. View the first 8 rows of the DataFrame. When no row count is given, head() returns the first 5 rows
>>> titanic.head(8)
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
5 6 0 3 ... 8.4583 NaN Q
6 7 0 1 ... 51.8625 E46 S
7 8 0 3 ... 21.0750 NaN S
[8 rows x 12 columns]
To view rows from the end instead, use tail(): titanic.tail(10) returns the last 10 rows of the DataFrame
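A minimal sketch of head() and tail() on a toy frame. The 12-row DataFrame df and its values are invented for illustration and are not the Titanic data.

```python
import pandas as pd

# Hypothetical toy frame with 12 rows, standing in for the Titanic data.
df = pd.DataFrame({
    "PassengerId": range(1, 13),
    "Fare": [float(i) for i in range(12)],
})

first_default = df.head()   # no argument: the first 5 rows
first_eight = df.head(8)    # the first 8 rows
last_three = df.tail(3)     # the last 3 rows

print(len(first_default), len(first_eight), len(last_three))
print(last_three["PassengerId"].tolist())
```

Both methods return a new DataFrame that keeps the original row labels, which is why the last three rows still carry their own PassengerId values.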
5. Check the data type of each column with the dtypes attribute
>>> titanic.dtypes
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object
The columns of this DataFrame hold integers (int64), floating-point numbers (float64), and strings (object)
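The sketch below shows how these types arise on a small frame whose column values are invented for illustration. One caveat: depending on the pandas version, text columns may be reported as object or as a dedicated string type, so the example only relies on the numeric columns.

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
df = pd.DataFrame({
    "Pclass": [3, 1, 3],                                    # whole numbers
    "Age": [22.0, 38.0, None],                              # a missing value forces a float column
    "Name": ["passenger_a", "passenger_b", "passenger_c"],  # text
})

# dtypes is an attribute (no parentheses) holding one entry per column.
print(df.dtypes)

# select_dtypes filters columns by type; "number" covers ints and floats.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

The Age column illustrates why a column with missing entries is stored as float64: NaN is a floating-point value.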
6. Save the data to an Excel file
>>> titanic.to_excel(r'C:\Users\Administrator\Desktop\titanic.xlsx', sheet_name='passengers', index=False)
If sheet_name is not specified, the default name Sheet1 is used. Setting index=False keeps the row index labels out of the spreadsheet
7. Load data from an Excel file
>>> titanic = pd.read_excel(r'C:\Users\Administrator\Desktop\titanic.xlsx')
If the workbook contains several sheets, select one with the sheet_name='xxxx' parameter. Without it, the first sheet is read
8. View a detailed summary of the DataFrame
>>> titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 13 columns):
Unnamed: 0 891 non-null int64
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
dtypes: float64(2), int64(6), object(5)
memory usage: 90.6+ KB
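The Unnamed: 0 column in the output above is what typically appears when a file is saved together with its row index and then read back, which suggests the workbook read here was written without index=False. The sketch below reproduces the effect under stated assumptions: it uses in-memory CSV rather than Excel so that it runs without an Excel engine installed, and the frame df and its values are invented. to_excel and read_excel are expected to behave analogously with respect to the index.

```python
import io
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
df = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})

# Saved with the row index (the default): the index becomes an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)

# Saved with index=False: only the real columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reread_without_index = pd.read_csv(without_index)

print(reread_with_index.columns.tolist())     # ['Unnamed: 0', 'PassengerId', 'Survived']
print(reread_without_index.columns.tolist())  # ['PassengerId', 'Survived']
```

If the extra column is already present in a file, passing index_col=0 to the reader is one way to treat it as the index again instead of as data.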