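pandas borrows much of its coding style from NumPy.
NumPy is best suited to homogeneous numerical array data and to mathematical computation.
pandas is designed for tabular or heterogeneous data and is a primary tool for data cleaning and analysis.
The two points above are only my own guess.
pandas has two commonly used data structures: Series and DataFrame.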
import pandas as pd
from pandas import Series, DataFrame
1. Series
A Series has two parts: index (the labels) and values (the data).
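As a minimal sketch of the index/values split, the snippet below builds a small Series. The numbers and labels are made up for illustration and are not from the original screenshot.

```python
import pandas as pd

# Illustrative data; the values and labels are assumptions.
obj = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

print(obj.index)   # the labels
print(obj.values)  # the underlying NumPy array
print(obj['a'])    # look up a single value by its label
```

If no index is passed, pandas falls back to the default integer labels 0, 1, 2, and so on.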
Simple operations
'a' in obj ----> True (here 'a' is an index label, not a value)
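A few simple operations might look like the sketch below. The Series obj is an assumed example; the point is that filtering, arithmetic, and NumPy functions all keep the index attached, and that in checks the labels.

```python
import numpy as np
import pandas as pd

# Illustrative data; the values and labels are assumptions.
obj = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

positive = obj[obj > 0]  # boolean filtering keeps the matching labels
doubled = obj * 2        # element-wise arithmetic
exp = np.exp(obj)        # NumPy ufuncs preserve the index

print('a' in obj)  # membership tests the index labels ...
print(-5 in obj)   # ... not the values
```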
We can pass a dict directly to Series:
dic={'a':1,'b':2,'c':3,'d':4}
obj1=pd.Series(dic)
When a Series is built from a dict together with an explicit index, values are placed by matching keys. In the original example three values land in the correct positions, but because the index label 'e' does not appear among the keys of obj1, its value is NaN (not a number).
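The behavior described here can be reproduced with a sketch like the following. The index list is an assumption (three labels that exist in dic plus 'e'), chosen to match the description.

```python
import pandas as pd

dic = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
obj1 = pd.Series(dic)

# Assumed index: three labels that exist in dic plus 'e', which does not.
obj2 = pd.Series(dic, index=['a', 'b', 'c', 'e'])
print(obj2)  # 'e' shows NaN; 'd' is dropped because it is not in the index
```

Because NaN is a float, the resulting Series is stored as float64 even though the dict values were integers.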
Below, two Series are combined:
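One way to combine two Series is pd.concat, sketched here under the assumption that plain concatenation is what is meant. The names s1 and s2 and their contents are illustrative.

```python
import pandas as pd

# Illustrative data; s1 and s2 are assumed names.
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([3, 4], index=['c', 'd'])

combined = pd.concat([s1, s2])  # stacks s2 below s1, keeping the labels
print(combined)
```

Note that adding two Series with + does something different: it aligns them on index labels and produces NaN wherever a label exists in only one of them.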
Two functions worth knowing: isnull and notnull
pd.isnull(obj) pd.notnull(obj)
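A small sketch of both functions, using an assumed Series with one missing value:

```python
import pandas as pd

# 'e' is not a key of the dict, so its value becomes NaN.
obj = pd.Series({'a': 1, 'b': 2}, index=['a', 'b', 'e'])

print(pd.isnull(obj))   # True where the value is missing
print(pd.notnull(obj))  # the element-wise opposite
print(obj.isnull())     # the same check, as a Series method
```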
Both the Series object itself and its index have a name attribute, and this feature ties in with other important pandas functionality: obj.name = 'test', obj.index.name = 'abcd'
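As a rough illustration of how the names are used elsewhere, the sketch below sets both and then converts the Series to a DataFrame. The use of to_frame here is my own choice of example.

```python
import pandas as pd

obj = pd.Series([1, 2, 3, 4])
obj.name = 'test'
obj.index.name = 'abcd'
print(obj)  # both names appear in the printed output

# The Series name becomes the column name; the index name is carried over.
frame = obj.to_frame()
print(frame)
```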
The index of a Series can be changed in place by assigning a new list of labels, which are matched by position: obj.index = ['a', 's', 'd', 'f']
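A short sketch of the assignment, assuming a four-element Series. The new list must have the same length as the Series, otherwise pandas raises a ValueError.

```python
import pandas as pd

obj = pd.Series([1, 2, 3, 4])
obj.index = ['a', 's', 'd', 'f']  # labels are assigned position by position
print(obj)

mismatch_rejected = False
try:
    obj.index = ['x', 'y']  # wrong length: 2 labels for 4 values
except ValueError as err:
    mismatch_rejected = True
    print(err)
```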
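2. DataFrame
A DataFrame represents a rectangular, matrix-like table of data. It contains an ordered collection of columns, and each column can hold a different value type (numeric, string, boolean, etc.). A DataFrame has both a row index and a column index.
Converting a dict to a DataFrame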
The head() method
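A minimal sketch of building a DataFrame from a dict of equal-length lists and previewing it with head(). The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data; each key becomes a column.
dic = {
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'age': [20, 21, 22, 23, 24, 25],
}
frame = pd.DataFrame(dic)

print(frame.head())   # the first five rows by default
print(frame.head(2))  # or the first n rows
```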
Create a DataFrame from a dict and choose which columns to include, and in what order: pd.DataFrame(dic, columns=['xxx', 'xx'])
Create a DataFrame from a dict and set the row index: pd.DataFrame(dic, index=['xxx', 'xx'])
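One thing to note here: columns may list more or fewer names than the dict has keys (a name that is not in the dict becomes a column of NaN), but when the dict values are lists, the length of index must equal the length of those lists, that is, the number of rows.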
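The rule for columns and index can be checked with a sketch like this one. The dict contents and the extra column name 'score' are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with three rows.
dic = {'name': ['a', 'b', 'c'], 'age': [20, 21, 22]}

fewer = pd.DataFrame(dic, columns=['age'])                   # subset of the keys
more = pd.DataFrame(dic, columns=['age', 'name', 'score'])  # 'score' is not in dic
labeled = pd.DataFrame(dic, index=['x', 'y', 'z'])          # one label per row

print(more)  # the 'score' column is all NaN

index_rejected = False
try:
    pd.DataFrame(dic, index=['x', 'y'])  # 2 labels for 3 rows
except ValueError as err:
    index_rejected = True
    print(err)
```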
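Selecting columns
frame['age'] <==> frame.age. The two are actually not fully equivalent; the difference is explained in the note on frame2['result'] further down. Prefer frame['age'], which always works.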
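A small sketch comparing the two access styles. The DataFrame is hypothetical, and the 'count' column is my own example of a case where attribute access returns something else (the DataFrame.count method).

```python
import pandas as pd

# Hypothetical data.
frame = pd.DataFrame({'name': ['a', 'b', 'c'], 'age': [20, 21, 22]})

by_key = frame['age']
by_attr = frame.age  # works because 'age' is a valid, non-conflicting name
print(by_key.equals(by_attr))

# A column named like an existing method is only reachable with brackets.
frame['count'] = [1, 2, 3]
print(frame['count'])         # the column
print(callable(frame.count))  # the method, not the column
```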
Selecting rows: loc and iloc
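As a sketch of the difference: loc selects by label, while iloc selects by integer position. The DataFrame and its row labels are assumed for illustration.

```python
import pandas as pd

# Hypothetical data with string row labels.
frame = pd.DataFrame(
    {'name': ['a', 'b', 'c'], 'age': [20, 21, 22]},
    index=['x', 'y', 'z'],
)

row_by_label = frame.loc['y']  # select by label
row_by_pos = frame.iloc[1]     # select by integer position (second row)

# loc also accepts lists of row labels and column labels.
subset = frame.loc[['x', 'z'], ['age']]
print(subset)
```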
Transposing a DataFrame:
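A minimal sketch of the transpose using the T attribute, on an assumed two-row DataFrame:

```python
import pandas as pd

# Hypothetical data.
frame = pd.DataFrame({'name': ['a', 'b'], 'age': [20, 21]}, index=['x', 'y'])

transposed = frame.T  # rows become columns and columns become rows
print(transposed)
```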
Assignment:
Assigning a Series to a DataFrame column
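The sketch below shows three kinds of column assignment: a scalar, a list, and a Series. The column names 'score' and 'bonus' are hypothetical. When a Series is assigned, it is aligned on the row index, and rows without a matching label get NaN.

```python
import pandas as pd

# Hypothetical data.
frame2 = pd.DataFrame(
    {'name': ['a', 'b', 'c'], 'age': [20, 21, 22]},
    index=['x', 'y', 'z'],
)

frame2['score'] = 0                # a scalar is broadcast to every row
frame2['score'] = [1.5, 2.5, 3.5]  # a list must match the number of rows

val = pd.Series([10, 30], index=['x', 'z'])
frame2['bonus'] = val              # aligned by label; row 'y' gets NaN
print(frame2)
```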
frame2['result'] = frame2['name'] == 'b'
Note: frame2['result'] here cannot be replaced with frame2.result, because the result column does not exist yet. Assigning to frame2.result would only set an ordinary Python attribute on the object instead of creating a column.
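To make the note concrete, here is a sketch with hypothetical data. The second DataFrame, other, and its attribute name flag are my own illustration of what happens when a new column is assigned through attribute syntax.

```python
import warnings

import pandas as pd

# Hypothetical data.
frame2 = pd.DataFrame({'name': ['a', 'b', 'c'], 'age': [20, 21, 22]})

frame2['result'] = frame2['name'] == 'b'  # creates the column
print(frame2.result)  # attribute access works once the column exists

other = pd.DataFrame({'name': ['a', 'b', 'c']})
with warnings.catch_warnings():
    warnings.simplefilter('ignore')    # pandas emits a warning for this
    other.flag = other['name'] == 'b'  # sets an attribute, not a column

print('flag' in other.columns)  # no column was created
```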
Deleting a column with del
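A minimal sketch of del on an assumed DataFrame. For comparison it also shows drop, which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical data.
frame2 = pd.DataFrame({'name': ['a', 'b', 'c'], 'age': [20, 21, 22]})
frame2['result'] = frame2['name'] == 'b'

del frame2['result']  # removes the column in place
print(frame2.columns)

without_age = frame2.drop(columns=['age'])  # frame2 itself keeps 'age'
print(without_age.columns)
```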
Passing a nested dict (a dict of dicts) to DataFrame: the outer keys become the columns and the inner keys become the row index.
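A sketch with a made-up nested dict. The inner dicts deliberately have different keys, to show that the row index is the union of the inner keys and that missing cells become NaN.

```python
import pandas as pd

# Hypothetical nested dict: outer keys are columns, inner keys are row labels.
nested = {
    'age': {'x': 20, 'y': 21},
    'score': {'y': 2.5, 'z': 3.5},
}
frame3 = pd.DataFrame(nested)
print(frame3)
```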
Viewing all of a DataFrame's column labels and row labels
frame2.columns frame2.index
frame2.values
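The three attributes can be inspected with a sketch like the one below, on hypothetical data. With mixed column types, values comes back as a two-dimensional NumPy array of dtype object.

```python
import pandas as pd

# Hypothetical data with one string column and one integer column.
frame2 = pd.DataFrame({'name': ['a', 'b'], 'age': [20, 21]}, index=['x', 'y'])

print(frame2.columns)  # column labels
print(frame2.index)    # row labels
print(frame2.values)   # the data as a 2-D NumPy array

arr = frame2.to_numpy()  # method form that returns the same kind of array
```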