Pandas - 3. Series and DataFrame


Author: 陈天睡懒觉 | Published 2022-05-21 14:30
import pandas as pd

Creating a Series

From a list

s = pd.Series([175, 65, 25])
print(s)
0    175
1     65
2     25
dtype: int64

Specifying the index

s = pd.Series([175, 65, 25],
             index=['height', 'weight', 'age'])
print(s)
height    175
weight     65
age        25
dtype: int64
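A Series can also be built from a dict, in which case the keys become the index. A minimal sketch reusing the values above, showing label-based lookup afterwards:

```python
import pandas as pd

# The dict keys become the index labels, in insertion order
s = pd.Series({'height': 175, 'weight': 65, 'age': 25})
print(s)

# Look up one value by its label, or several values with a list of labels
print(s['height'])
print(s[['height', 'age']])
```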

When the list mixes types, the dtype becomes object

s = pd.Series([175, 65, 25, 'kabisor'],
             index=['height', 'weight', 'age', 'name'])
print(s)
height        175
weight         65
age            25
name      kabisor
dtype: object
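The dtype is inferred from the values. A short sketch of the usual cases on the values used above, plus one way (pd.to_numeric with errors='coerce') to get numbers back out of an object Series:

```python
import pandas as pd

# All integers: an integer dtype
print(pd.Series([175, 65, 25]).dtype)

# Integers and a float: upcast to float64
print(pd.Series([175, 65.5, 25]).dtype)

# Numbers and a string share no numeric type, so the dtype is object
s = pd.Series([175, 65, 25, 'kabisor'])
print(s.dtype)

# errors='coerce' turns entries that cannot be parsed as numbers into NaN
nums = pd.to_numeric(s, errors='coerce')
print(nums)
```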

Creating a DataFrame

A DataFrame can be thought of as a dictionary of Series: the keys are the column names and the values are the column contents.

df = pd.DataFrame({
    'name':['kabisor','pikaqio'],
    'height':[175, 45],
    'weight':[65, 25],
    'age':[25, 10]
})
print(df)
      name  height  weight  age
0  kabisor     175      65   25
1  pikaqio      45      25   10
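Passing a dict of actual Series (rather than lists) shows the "dictionary of Series" view directly, because pandas aligns the rows on each Series' index. A minimal sketch; the extra label 'newcomer' is hypothetical and only shows what happens when the indexes do not fully match:

```python
import pandas as pd

# Two Series whose indexes only partly overlap ('newcomer' is a made-up label)
height = pd.Series([175, 45], index=['kabisor', 'pikaqio'])
age = pd.Series([25, 10, 3], index=['kabisor', 'pikaqio', 'newcomer'])

# Each dict key becomes a column; rows are aligned on the Series indexes
df = pd.DataFrame({'height': height, 'age': age})
print(df)

# Selecting a column gives a Series back
print(type(df['age']))
```

Labels missing from one Series get NaN in that column, which is why height turns into float64 here.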

Specifying the column order

df = pd.DataFrame(
    data={
        'name':['kabisor','pikaqio'],
        'height':[175, 45],
        'weight':[65, 25],
        'age':[25, 10]
    },
    columns=['name', 'age', 'height', 'weight'])
print(df)
      name  age  height  weight
0  kabisor   25     175      65
1  pikaqio   10      45      25
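An existing DataFrame can be reordered by indexing it with a list of column names. With dict data, the columns argument also selects: keys that are not listed are dropped, and listed names absent from the dict become all-NaN columns. A sketch; the column name 'color' is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['kabisor', 'pikaqio'],
    'height': [175, 45],
    'weight': [65, 25],
    'age': [25, 10],
})

# Reorder the columns of an existing DataFrame
reordered = df[['name', 'age', 'height', 'weight']]
print(reordered.columns.tolist())

# 'height' is not listed, so it is left out; 'color' is not in the dict, so it is all NaN
subset = pd.DataFrame(
    data={'name': ['kabisor', 'pikaqio'], 'height': [175, 45], 'age': [25, 10]},
    columns=['age', 'name', 'color'])
print(subset)
```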

Specifying the index

df = pd.DataFrame(
    data={
        'name':['kabisor','pikaqio'],
        'height':[175, 45],
        'weight':[65, 25],
        'age':[25, 10]
    },
    columns=['name', 'age', 'height', 'weight'],
    index=['baby1', 'baby2'])
print(df)
          name  age  height  weight
baby1  kabisor   25     175      65
baby2  pikaqio   10      45      25
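With the index set, rows can be selected by label (.loc) or by integer position (.iloc). A minimal sketch on a trimmed version of the frame above, with set_index shown as a way to turn an existing column into the index:

```python
import pandas as pd

df = pd.DataFrame(
    data={'name': ['kabisor', 'pikaqio'], 'age': [25, 10]},
    index=['baby1', 'baby2'])

# .loc selects by index label and column name
print(df.loc['baby2', 'name'])

# .iloc selects by position: row 0, column 1 ('age')
print(df.iloc[0, 1])

# set_index moves an existing column into the index
by_name = df.set_index('name')
print(by_name.loc['pikaqio', 'age'])
```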

Modifying Series and DataFrames

  • Adding new columns
  • Changing existing columns
  • Dropping columns
scientists = pd.read_csv('data/scientists.csv')
print(scientists)
                   Name        Born        Died  Age          Occupation
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
1        William Gosset  1876-06-13  1937-10-16   61        Statistician
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
5             John Snow  1813-03-15  1858-06-16   45           Physician
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician
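The file data/scientists.csv may not be available locally, so here is a self-contained sketch that reads a few rows copied from the output above through io.StringIO. It also checks the type of the date columns, which matters for the next step:

```python
import io

import pandas as pd

# A few rows copied from the table above, standing in for data/scientists.csv
csv_text = """Name,Born,Died,Age,Occupation
Rosaline Franklin,1920-07-25,1958-04-16,37,Chemist
William Gosset,1876-06-13,1937-10-16,61,Statistician
Alan Turing,1912-06-23,1954-06-07,41,Computer Scientist
"""

scientists = pd.read_csv(io.StringIO(csv_text))
print(scientists.dtypes)

# Without parse_dates, the date columns are read as plain strings
print(type(scientists.loc[0, 'Born']))
```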

Adding columns (this modifies the original DataFrame directly)

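# Born and Died were read as strings, so + concatenates them rather than doing date arithmetic
new_Born = scientists.Born + scientists.Died
new_Age = 100 - scientists.Age
# Assign both new columns in one statement
scientists['new_Born'], scientists['new_Age'] = (new_Born, new_Age)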
print(scientists)
                   Name        Born        Died  Age          Occupation  \
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist   
1        William Gosset  1876-06-13  1937-10-16   61        Statistician   
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse   
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist   
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist   
5             John Snow  1813-03-15  1858-06-16   45           Physician   
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist   
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician   

               new_Born  new_Age  
0  1920-07-251958-04-16       63  
1  1876-06-131937-10-16       39  
2  1820-05-121910-08-13       10  
3  1867-11-071934-07-04       34  
4  1907-05-271964-04-14       44  
5  1813-03-151858-06-16       55  
6  1912-06-231954-06-07       59  
7  1777-04-301855-02-23       23  
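Because Born and Died are strings, Born + Died glues the two dates together (values like 1920-07-251958-04-16) instead of doing date arithmetic. A minimal sketch of the "changing existing columns" step from the list above, assuming the same column names: overwrite both columns with pd.to_datetime, after which subtraction yields a timedelta. The columns days_lived and years_lived are hypothetical names chosen for illustration:

```python
import pandas as pd

scientists = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Born': ['1920-07-25', '1876-06-13'],
    'Died': ['1958-04-16', '1937-10-16'],
    'Age': [37, 61],
})

# Replace the existing string columns with datetime64 versions
scientists['Born'] = pd.to_datetime(scientists['Born'], format='%Y-%m-%d')
scientists['Died'] = pd.to_datetime(scientists['Died'], format='%Y-%m-%d')

# Date subtraction now works and produces a timedelta64 column
scientists['days_lived'] = scientists['Died'] - scientists['Born']

# Approximate whole years lived, which should agree with the Age column
scientists['years_lived'] = (scientists['days_lived'].dt.days // 365.25).astype(int)
print(scientists[['Name', 'days_lived', 'years_lived', 'Age']])
```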

Dropping with drop (the original data is not modified by default)

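# Drop a column
# axis=1 drops columns; without it, pandas looks for a row labelled 'new_Born' and raises a KeyError
print(scientists.drop('new_Born', axis=1))
# Drop a row; axis=0 (the default) drops rows
print(scientists.drop(7, axis=0))
print(scientists)  # the original data is unchanged
scientists.drop('new_Born', axis=1, inplace=True)
print(scientists)  # inplace=True modifies the original data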
                   Name        Born        Died  Age          Occupation  \
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist   
1        William Gosset  1876-06-13  1937-10-16   61        Statistician   
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse   
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist   
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist   
5             John Snow  1813-03-15  1858-06-16   45           Physician   
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist   
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician   

   new_Age  
0       63  
1       39  
2       10  
3       34  
4       44  
5       55  
6       59  
7       23  
                   Name        Born        Died  Age          Occupation  \
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist   
1        William Gosset  1876-06-13  1937-10-16   61        Statistician   
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse   
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist   
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist   
5             John Snow  1813-03-15  1858-06-16   45           Physician   
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist   

               new_Born  new_Age  
0  1920-07-251958-04-16       63  
1  1876-06-131937-10-16       39  
2  1820-05-121910-08-13       10  
3  1867-11-071934-07-04       34  
4  1907-05-271964-04-14       44  
5  1813-03-151858-06-16       55  
6  1912-06-231954-06-07       59  
                   Name        Born        Died  Age          Occupation  \
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist   
1        William Gosset  1876-06-13  1937-10-16   61        Statistician   
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse   
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist   
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist   
5             John Snow  1813-03-15  1858-06-16   45           Physician   
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist   
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician   

               new_Born  new_Age  
0  1920-07-251958-04-16       63  
1  1876-06-131937-10-16       39  
2  1820-05-121910-08-13       10  
3  1867-11-071934-07-04       34  
4  1907-05-271964-04-14       44  
5  1813-03-151858-06-16       55  
6  1912-06-231954-06-07       59  
7  1777-04-301855-02-23       23  
                   Name        Born        Died  Age          Occupation  \
0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist   
1        William Gosset  1876-06-13  1937-10-16   61        Statistician   
2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse   
3           Marie Curie  1867-11-07  1934-07-04   66             Chemist   
4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist   
5             John Snow  1813-03-15  1858-06-16   45           Physician   
6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist   
7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician   

   new_Age  
0       63  
1       39  
2       10  
3       34  
4       44  
5       55  
6       59  
7       23  
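drop also accepts columns= and index= keywords, which avoid the axis argument altogether. A minimal sketch on a small hypothetical frame holding three rows of the data above; reassigning the result is shown as a common alternative to inplace=True:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Marie Curie', 'Alan Turing', 'Johann Gauss'],
    'Age': [66, 41, 77],
    'new_Age': [34, 59, 23],
})

# Keyword forms, equivalent to drop('new_Age', axis=1) and drop(2, axis=0)
no_col = df.drop(columns=['new_Age'])
no_row = df.drop(index=[2])
print(no_col.columns.tolist())
print(no_row.index.tolist())

# drop returned new objects, so df still has all three columns here
print(df.shape)

# Reassigning the result instead of using inplace=True
df = df.drop(columns='new_Age')
```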

Some Series methods (called with parentheses)

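  1. max/min/mean/median/mode/quantile (maximum, minimum, mean, median, mode, and the quantile at a given fraction)
  2. describe (summary statistics; missing values are excluded automatically)
  3. drop_duplicates (removes duplicate values and returns a Series)
  4. sample (returns n randomly sampled values)
  5. replace (replaces specified values in the Series)
  6. sort_values (sorts by value)
  7. unique (returns the distinct values, in order of appearance, as a numpy.ndarray)
  8. equals (checks whether two Series have the same shape and elements)
  9. isin (checks element by element whether each value is in a given list; returns a boolean Series)
  10. cov/corr (covariance/correlation with another Series; missing values are excluded automatically)
  11. append (concatenates Series; deprecated in pandas 1.4 and removed in 2.0, where pd.concat takes its place)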
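A quick tour of the methods listed above, as a sketch on the Age values from scientists.csv plus a small hypothetical Series with repeated values. random_state is passed to sample only so the draw is repeatable, and pd.concat stands in for the removed Series.append:

```python
import pandas as pd

ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77])

print(ages.max(), ages.min(), ages.mean(), ages.median())
print(ages.quantile(0.5))   # the 0.5 quantile equals the median
print(ages.describe())      # count, mean, std, min, 25%, 50%, 75%, max

dupes = pd.Series([1, 1, 2, 3, 3])
print(dupes.drop_duplicates())   # keeps the first occurrence of each value
print(dupes.unique())            # numpy array of distinct values
print(dupes.mode())              # all most-frequent values, as a Series
print(dupes.replace(1, 100))
print(dupes.isin([2, 3]))        # boolean Series

print(ages.sample(n=3, random_state=0))
print(ages.sort_values(ascending=False).head(3))

# pd.concat joins Series; ignore_index renumbers the result 0..n-1
combined = pd.concat([ages, pd.Series([100])], ignore_index=True)
print(combined.equals(pd.Series([37, 61, 90, 66, 56, 45, 41, 77, 100])))

# corr pairs values by index label; a linear transform gives correlation 1
other = ages * 2 + 1
print(ages.corr(other))
```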


Article link: https://www.haomeiwen.com/subject/zvycprtx.html