import pandas as pd
Creating a Series
Passing a list
s = pd.Series([175, 65, 25])
print(s)
0 175
1 65
2 25
dtype: int64
Specifying the index
s = pd.Series([175, 65, 25],
              index=['height', 'weight', 'age'])
print(s)
height 175
weight 65
age 25
dtype: int64
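Once an index is given, values can be looked up by label. The following is a minimal sketch of that idea using the same Series as above; it only illustrates label access and is not part of the original notes.

```python
import pandas as pd

s = pd.Series([175, 65, 25], index=['height', 'weight', 'age'])

# Look up a single value by its index label
print(s['weight'])        # 65

# The labels and the underlying values can be inspected separately
print(s.index.tolist())   # ['height', 'weight', 'age']
print(s.values.tolist())  # [175, 65, 25]
```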
When the list passed in contains mixed types, the dtype is object
s = pd.Series([175, 65, 25, 'kabisor'],
              index=['height', 'weight', 'age', 'name'])
print(s)
height 175
weight 65
age 25
name kabisor
dtype: object
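To make the object dtype more concrete, here is a small sketch (the variable name numeric is only illustrative). It checks the dtype of the mixed Series, shows that each element keeps its original Python type, and converts back to integers once the string is dropped.

```python
import pandas as pd

# Mixing integers and a string makes pandas fall back to the generic object dtype
s = pd.Series([175, 65, 25, 'kabisor'],
              index=['height', 'weight', 'age', 'name'])
print(s.dtype)                            # object

# Each element is still stored as its original Python object
print(isinstance(s['height'], int))       # True
print(isinstance(s['name'], str))         # True

# After dropping the string, the rest can be converted to a numeric dtype
numeric = s.drop('name').astype('int64')
print(numeric.dtype)                      # int64
```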
Creating a DataFrame
A DataFrame can be viewed as a dict of Series: each key is a column name and each value is that column's contents
df = pd.DataFrame({
    'name': ['kabisor', 'pikaqio'],
    'height': [175, 45],
    'weight': [65, 25],
    'age': [25, 10]
})
print(df)
name height weight age
0 kabisor 175 65 25
1 pikaqio 45 25 10
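The dict-of-Series view can be checked directly. The sketch below (the names height, weight and col are illustrative) builds a DataFrame from two Series and then pulls one column back out by its key.

```python
import pandas as pd

height = pd.Series([175, 45])
weight = pd.Series([65, 25])

# Build the DataFrame from a dict whose values are Series
df = pd.DataFrame({'height': height, 'weight': weight})

# Selecting a column by its key returns a Series
col = df['height']
print(isinstance(col, pd.Series))  # True
print(col.tolist())                # [175, 45]
print(list(df.columns))            # ['height', 'weight']
```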
Specifying the column order
df = pd.DataFrame(data={
        'name': ['kabisor', 'pikaqio'],
        'height': [175, 45],
        'weight': [65, 25],
        'age': [25, 10]
    },
    columns=['name', 'age', 'height', 'weight'])
print(df)
name age height weight
0 kabisor 25 175 65
1 pikaqio 10 45 25
Specifying the index
df = pd.DataFrame(data={
        'name': ['kabisor', 'pikaqio'],
        'height': [175, 45],
        'weight': [65, 25],
        'age': [25, 10]
    },
    columns=['name', 'age', 'height', 'weight'],
    index=['baby1', 'baby2'])
print(df)
name age height weight
baby1 kabisor 25 175 65
baby2 pikaqio 10 45 25
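A custom index is mainly useful for label-based lookup. As a minimal sketch, assuming a cut-down version of the DataFrame above, loc selects a row or a single cell by its labels.

```python
import pandas as pd

df = pd.DataFrame(
    data={'name': ['kabisor', 'pikaqio'], 'age': [25, 10]},
    index=['baby1', 'baby2'],
)

# Selecting one row by its index label returns a Series
row = df.loc['baby1']
print(row['name'])              # kabisor

# Selecting a single cell by row label and column name
print(df.loc['baby2', 'age'])   # 10
```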
Modifying a Series and a DataFrame
- Adding new columns
- Changing existing columns
- Dropping columns
scientists = pd.read_csv('data/scientists.csv')
print(scientists)
Name Born Died Age Occupation
0 Rosaline Franklin 1920-07-25 1958-04-16 37 Chemist
1 William Gosset 1876-06-13 1937-10-16 61 Statistician
2 Florence Nightingale 1820-05-12 1910-08-13 90 Nurse
3 Marie Curie 1867-11-07 1934-07-04 66 Chemist
4 Rachel Carson 1907-05-27 1964-04-14 56 Biologist
5 John Snow 1813-03-15 1858-06-16 45 Physician
6 Alan Turing 1912-06-23 1954-06-07 41 Computer Scientist
7 Johann Gauss 1777-04-30 1855-02-23 77 Mathematician
Adding columns (the new columns are added directly to the original data)
new_Born = scientists.Born + scientists.Died
new_Age = 100 - scientists.Age
scientists['new_Born'], scientists['new_Age'] = (new_Born, new_Age)
print(scientists)
Name Born Died Age Occupation \
0 Rosaline Franklin 1920-07-25 1958-04-16 37 Chemist
1 William Gosset 1876-06-13 1937-10-16 61 Statistician
2 Florence Nightingale 1820-05-12 1910-08-13 90 Nurse
3 Marie Curie 1867-11-07 1934-07-04 66 Chemist
4 Rachel Carson 1907-05-27 1964-04-14 56 Biologist
5 John Snow 1813-03-15 1858-06-16 45 Physician
6 Alan Turing 1912-06-23 1954-06-07 41 Computer Scientist
7 Johann Gauss 1777-04-30 1855-02-23 77 Mathematician
new_Born new_Age
0 1920-07-251958-04-16 63
1 1876-06-131937-10-16 39
2 1820-05-121910-08-13 10
3 1867-11-071934-07-04 34
4 1907-05-271964-04-14 44
5 1813-03-151858-06-16 55
6 1912-06-231954-06-07 59
7 1777-04-301855-02-23 23
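The list above also mentions changing an existing column, and the new_Born output shows why that matters: Born and Died are read as strings, so adding them concatenates the text. The following is a minimal sketch of overwriting columns by assigning to an existing column name. It uses a two-row stand-in for data/scientists.csv, and the days // 365 age calculation is a rough approximation chosen for illustration, not something taken from the original notes.

```python
import pandas as pd

# Two-row stand-in for data/scientists.csv (values copied from the table above)
scientists = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Born': ['1920-07-25', '1876-06-13'],
    'Died': ['1958-04-16', '1937-10-16'],
    'Age': [37, 61],
})

# Assigning to an existing column name overwrites that column
scientists['Born'] = pd.to_datetime(scientists['Born'])
scientists['Died'] = pd.to_datetime(scientists['Died'])

# With real dates, subtraction gives a timedelta rather than a concatenated string
scientists['Age'] = (scientists['Died'] - scientists['Born']).dt.days // 365
print(scientists[['Name', 'Age']])
```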
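Dropping columns with drop (by default the original data is not modified)
# Drop a column
print(scientists.drop('new_Born', axis=1))  # axis=1 drops a column; without it drop looks for the label among the rows and raises a KeyError
print(scientists.drop(7, axis=0))  # axis=0 drops a row and is the default
print(scientists)  # the original data has not been modified
scientists.drop('new_Born', axis=1, inplace=True)
print(scientists)  # inplace=True modifies the original data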
Name Born Died Age Occupation \
0 Rosaline Franklin 1920-07-25 1958-04-16 37 Chemist
1 William Gosset 1876-06-13 1937-10-16 61 Statistician
2 Florence Nightingale 1820-05-12 1910-08-13 90 Nurse
3 Marie Curie 1867-11-07 1934-07-04 66 Chemist
4 Rachel Carson 1907-05-27 1964-04-14 56 Biologist
5 John Snow 1813-03-15 1858-06-16 45 Physician
6 Alan Turing 1912-06-23 1954-06-07 41 Computer Scientist
7 Johann Gauss 1777-04-30 1855-02-23 77 Mathematician
new_Age
0 63
1 39
2 10
3 34
4 44
5 55
6 59
7 23
Name Born Died Age Occupation \
0 Rosaline Franklin 1920-07-25 1958-04-16 37 Chemist
1 William Gosset 1876-06-13 1937-10-16 61 Statistician
2 Florence Nightingale 1820-05-12 1910-08-13 90 Nurse
3 Marie Curie 1867-11-07 1934-07-04 66 Chemist
4 Rachel Carson 1907-05-27 1964-04-14 56 Biologist
5 John Snow 1813-03-15 1858-06-16 45 Physician
6 Alan Turing 1912-06-23 1954-06-07 41 Computer Scientist
new_Born new_Age
0 1920-07-251958-04-16 63
1 1876-06-131937-10-16 39
2 1820-05-121910-08-13 10
3 1867-11-071934-07-04 34
4 1907-05-271964-04-14 44
5 1813-03-151858-06-16 55
6 1912-06-231954-06-07 59
Name Born Died Age Occupation \
0 Rosaline Franklin 1920-07-25 1958-04-16 37 Chemist
1 William Gosset 1876-06-13 1937-10-16 61 Statistician
2 Florence Nightingale 1820-05-12 1910-08-13 90 Nurse
3 Marie Curie 1867-11-07 1934-07-04 66 Chemist
4 Rachel Carson 1907-05-27 1964-04-14 56 Biologist
5 John Snow 1813-03-15 1858-06-16 45 Physician
6 Alan Turing 1912-06-23 1954-06-07 41 Computer Scientist
7 Johann Gauss 1777-04-30 1855-02-23 77 Mathematician
new_Born new_Age
0 1920-07-251958-04-16 63
1 1876-06-131937-10-16 39
2 1820-05-121910-08-13 10
3 1867-11-071934-07-04 34
4 1907-05-271964-04-14 44
5 1813-03-151858-06-16 55
6 1912-06-231954-06-07 59
7 1777-04-301855-02-23 23
Name Born Died Age Occupation \
0 Rosaline Franklin 1920-07-25 1958-04-16 37 Chemist
1 William Gosset 1876-06-13 1937-10-16 61 Statistician
2 Florence Nightingale 1820-05-12 1910-08-13 90 Nurse
3 Marie Curie 1867-11-07 1934-07-04 66 Chemist
4 Rachel Carson 1907-05-27 1964-04-14 56 Biologist
5 John Snow 1813-03-15 1858-06-16 45 Physician
6 Alan Turing 1912-06-23 1954-06-07 41 Computer Scientist
7 Johann Gauss 1777-04-30 1855-02-23 77 Mathematician
new_Age
0 63
1 39
2 10
3 34
4 44
5 55
6 59
7 23
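The difference between a plain drop and one that changes the data is easier to see on a tiny frame. This is a minimal sketch with a made-up two-column DataFrame (the names df, without_b and without_row are illustrative). It also shows the columns= and index= keywords, which pandas accepts as an alternative to axis.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# drop returns a new DataFrame; df itself still has column 'b'
without_b = df.drop('b', axis=1)
print(list(without_b.columns))  # ['a']
print(list(df.columns))         # ['a', 'b']

# index= (or columns=) can be used instead of axis
without_row = df.drop(index=2)
print(len(without_row))         # 2

# Reassigning the result is a common alternative to inplace=True
df = df.drop(columns='b')
print(list(df.columns))         # ['a']
```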
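Some Series methods (called with parentheses)
- max/min/mean/median/mode/quantile (maximum, minimum, mean, median, mode, and the quantile at a given position)
- describe (computes summary statistics; missing values are dropped automatically)
- drop_duplicates (removes duplicate values and returns a Series)
- sample (returns n randomly sampled values)
- replace (replaces values in the Series with the specified values)
- sort_values (sorts by value)
- unique (returns the unique values as a numpy.ndarray)
- equals (checks whether two Series contain the same elements)
- isin (checks, element by element, whether each value in the Series is in the given list)
- cov/corr (computes the covariance/correlation with another Series; missing values are dropped automatically)
- append (concatenates multiple Series; removed in pandas 2.0, where pd.concat is used instead)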
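Several of the methods listed above can be tried on a small Series. The values below are made up for illustration, and pd.concat stands in for append, which is no longer available in pandas 2.0 and later.

```python
import pandas as pd

s = pd.Series([3, 1, 2, 3])

print(s.max(), s.min(), s.mean())         # 3 1 2.25
print(s.median())                         # 2.5
print(s.quantile(0.5))                    # 2.5
print(s.mode().tolist())                  # [3]
print(s.unique())                         # unique values in order of appearance
print(s.drop_duplicates().tolist())       # [3, 1, 2]
print(s.sort_values().tolist())           # [1, 2, 3, 3]
print(s.replace(3, 0).tolist())           # [0, 1, 2, 0]
print(s.isin([1, 2]).tolist())            # [False, True, True, False]
print(s.equals(pd.Series([3, 1, 2, 3])))  # True

# pd.concat joins several Series into one
joined = pd.concat([s, pd.Series([9])], ignore_index=True)
print(joined.tolist())                    # [3, 1, 2, 3, 9]
```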