1. Import the pandas module
>>> import pandas as pd
2. Load the data
>>> titanic = pd.read_csv(r'C:\Users\Administrator\Desktop\titanic.csv')
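The path above is specific to one Windows machine; the r prefix keeps the backslashes in it from being read as escape sequences. As a minimal sketch of the same read_csv call that runs anywhere, the block below reads a tiny CSV from an in-memory string instead of a file. The CSV text and its values are invented for illustration and are not taken from titanic.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for titanic.csv
csv_text = "PassengerId,Name,Age\n1,A,22\n2,B,\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)         # (2, 3)
print(df["Age"].dtype)  # float64, because the empty field becomes NaN
```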
3. Select a single column
>>> ages = titanic["Age"]
>>> ages.head()
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
>>> type(ages)
<class 'pandas.core.series.Series'>
Called without a row count, head() shows the first 5 rows by default. A single selected column is a Series.
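To make the default of head() and the Series type concrete, here is a small sketch. It uses a hypothetical seven-row DataFrame named df rather than the Titanic file, so the names and ages are made up.

```python
import pandas as pd

# Hypothetical sample data, standing in for titanic.csv
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0],
})

ages = df["Age"]            # a bare column name returns a Series
first_five = ages.head()    # no argument: first 5 rows
first_two = ages.head(2)    # explicit row count

print(type(ages).__name__)                # Series
print(len(first_five), len(first_two))    # 5 2
```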
4. Select multiple columns
>>> age_sex = titanic[["Name", "Age", "Sex"]]
>>> age_sex.head()
                                                Name   Age     Sex
0                            Braund, Mr. Owen Harris  22.0    male
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0  female
2                             Heikkinen, Miss. Laina  26.0  female
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  35.0  female
4                           Allen, Mr. William Henry  35.0    male
>>> type(age_sex)
<class 'pandas.core.frame.DataFrame'>
>>> age_sex.shape
(891, 3)
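Selecting two or more columns with a list of names still returns a DataFrame. shape is an attribute, not a method: on a DataFrame it returns (rows, columns), and on a Series it returns a one-element tuple (rows,).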
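The following sketch compares shape for the different selection forms. The three-row DataFrame df is hypothetical sample data, not the Titanic file.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [22.0, 38.0, 26.0],
    "Sex": ["male", "female", "female"],
})

subset = df[["Name", "Age"]]   # list of column names -> DataFrame
one_col_frame = df[["Age"]]    # a one-element list still gives a DataFrame
series = df["Age"]             # bare column name -> Series

print(subset.shape)         # (3, 2)
print(one_col_frame.shape)  # (3, 1)
print(series.shape)         # (3,)
```

Note that shape is read without parentheses; writing df.shape() raises a TypeError because a tuple is not callable.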
5. Filter rows by a single condition
>>> age_35 = titanic[titanic["Age"] > 35]
>>> age_35.head()
    PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
1             2         1       1  ...  71.2833   C85         C
6             7         0       1  ...  51.8625   E46         S
11           12         1       1  ...  26.5500  C103         S
13           14         0       3  ...  31.2750   NaN         S
15           16         1       2  ...  16.0000   NaN         S
>>> age_35.shape  # shape after filtering
(217, 12)
>>> titanic.shape  # shape before filtering
(891, 12)
Conditional expressions support the other comparison operators as well: >=, >, ==, !=, <, <=.
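What makes this kind of filter work is that the comparison itself produces a boolean Series, and indexing with it keeps the rows marked True. The sketch below shows this on a hypothetical four-row DataFrame (made-up values), including how a missing age behaves under comparison.

```python
import pandas as pd

# Hypothetical sample data; row "C" has a missing age
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, 38.0, None, 35.0],
})

mask = df["Age"] > 35               # boolean Series, one value per row
older = df[mask]                    # keeps only rows where mask is True
at_least_35 = df[df["Age"] >= 35]
not_22 = df[df["Age"] != 22]        # NaN != 22 is True, so row "C" is kept

print(mask.tolist())                # [False, True, False, False]
print(at_least_35["Name"].tolist()) # ['B', 'D']
print(not_22["Name"].tolist())      # ['B', 'C', 'D']
```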
6. Filter rows by multiple conditions
>>> class_23 = titanic[titanic["Pclass"].isin([2, 3])]
>>> class_23.head()
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
2            3         1       3  ...   7.9250   NaN         S
4            5         0       3  ...   8.0500   NaN         S
5            6         0       3  ...   8.4583   NaN         Q
7            8         0       3  ...  21.0750   NaN         S
[5 rows x 12 columns]
>>> class_23.shape
(675, 12)
>>> class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]
>>> class_23.shape
(675, 12)
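isin() is equivalent to joining several == comparisons with or. When you write the conditions out yourself, use "|" and "&" in place of Python's or and and, and wrap each condition in its own "()".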
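As a sketch of the operators mentioned above, the block below checks that isin() and the explicit "|" form select the same rows, and adds an "&" example and a "~" negation. The six-row DataFrame is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 3, 2, 1],
    "Age": [40.0, 28.0, 19.0, 52.0, 36.0, 61.0],
})

via_isin = df[df["Pclass"].isin([2, 3])]
via_or = df[(df["Pclass"] == 2) | (df["Pclass"] == 3)]   # same rows as isin
both = df[(df["Pclass"] == 1) & (df["Age"] > 50)]        # both conditions must hold
negated = df[~df["Pclass"].isin([2, 3])]                 # "~" inverts the mask

print(via_isin.equals(via_or))   # True
print(both.index.tolist())       # [5]
print(negated.index.tolist())    # [0, 5]
```

The parentheses matter because "|" and "&" bind more tightly than the comparison operators in Python.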
7. Keep only rows with non-missing values
>>> age_no_na = titanic[titanic["Age"].notna()]
>>> age_no_na.shape
(714, 12)
>>> titanic.shape
(891, 12)
8. Select specific rows and columns
>>> adult_names = titanic.loc[titanic["Age"] > 35, ["Name", "Age"]]
>>> adult_names.head()
                                                 Name   Age
1   Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0
6                             McCarthy, Mr. Timothy J  54.0
11                           Bonnell, Miss. Elizabeth  58.0
13                        Andersson, Mr. Anders Johan  39.0
15                    Hewlett, Mrs. (Mary D Kingcome)  55.0
>>>
>>> titanic.iloc[9:25, 2:5]  # rows 10 to 25 and columns 3 to 5 (counting from 1)
    Pclass                                               Name     Sex
9        2                Nasser, Mrs. Nicholas (Adele Achem)  female
10       3                    Sandstrom, Miss. Marguerite Rut  female
11       1                           Bonnell, Miss. Elizabeth  female
12       3                     Saundercock, Mr. William Henry    male
13       3                        Andersson, Mr. Anders Johan    male
14       3               Vestrom, Miss. Hulda Amanda Adolfina  female
15       2                    Hewlett, Mrs. (Mary D Kingcome)  female
16       3                               Rice, Master. Eugene    male
17       2                       Williams, Mr. Charles Eugene    male
18       3  Vander Planke, Mrs. Julius (Emelia Maria Vande...  female
19       3                            Masselmani, Mrs. Fatima  female
20       2                               Fynney, Mr. Joseph J    male
21       2                              Beesley, Mr. Lawrence    male
22       3                        McGowan, Miss. Anna "Annie"  female
23       1                       Sloper, Mr. William Thompson    male
24       3                      Palsson, Miss. Torborg Danira  female
>>> titanic.iloc[0:3, 3] = "anonymous"  # set the first three rows of the column at position 3 (Name, the fourth column) to "anonymous"
>>> titanic.head()
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 anonymous male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 anonymous female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 anonymous female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
About loc/iloc[rows, columns]: loc selects data by row (column) labels.
iloc selects data by row (column) integer position, counting from 0.