Pandas

Author: Henry_Liu | Published: 2018-04-18 10:23

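    Introduction to Pandas

    pandas is a Python data analysis library built around two main data structures, Series and DataFrame. A Series is a one-dimensional labeled array, commonly used for time-series and other ordered data. A DataFrame holds structured, two-dimensional (tabular) data.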

    Installing Pandas

    pip install pandas
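
A quick way to confirm the installation worked is to import the library and print its version. This is only a minimal sketch; the version string depends on what pip installed.

```python
import pandas as pd

# Read the installed pandas version to confirm the import works
version = pd.__version__
print(version)
```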
    

    Reading data in different formats with Pandas

    Reading a CSV file
    import pandas as pd
    df = pd.read_csv('file.csv')
    print(df)
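
Since `file.csv` is only a placeholder name, the sketch below reads CSV text from an in-memory buffer so that it runs without a file on disk. The column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would live in a file on disk
csv_text = "name,num\nMovies,12\nSports,5\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)          # (2, 2)
print(list(df.columns))  # ['name', 'num']
```

The first line of the text is treated as the header row, which is the default behavior of `read_csv`.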
    
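    Reading an HTML page
    import pandas as pd
    # read_html returns a list of DataFrames, one per <table> element found on the page
    tables = pd.read_html('https://www.jianshu.com/u/e635858eda0b')
    print(tables)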
    

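    Data structures provided by Pandas

    · Series: a one-dimensional labeled array, often used for time-series data.

    · DataFrame: structured two-dimensional data with a row index and column labels.

    · Panel: three-dimensional data. Panel was later deprecated and is no longer available in recent pandas releases (0.25 and up).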
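
As a small illustration of how the first two structures relate, the sketch below builds a DataFrame and pulls one column out of it; the column comes back as a Series. The data is made up for the example.

```python
import pandas as pd

# A DataFrame is two-dimensional: rows and labeled columns
df = pd.DataFrame({'name': ['a', 'b', 'c'], 'score': [1, 2, 3]})

# Selecting a single column returns a one-dimensional Series
col = df['score']

print(type(col).__name__)  # Series
print(df.ndim, col.ndim)   # 2 1
```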

    1. Series

    From an array (list)
    import pandas as pd
    languages = ['python', 'ruby', 'c', 'c++']  # avoid naming a variable "list", which shadows the built-in
    select = pd.Series(languages)
    print(select)
    Output:
    0    python
    1      ruby
    2         c
    3       c++
    dtype: object
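
By default the index is 0, 1, 2, and so on, as in the output above. A Series can also carry custom labels. The sketch below passes an `index` argument, with label names chosen only for illustration.

```python
import pandas as pd

languages = ['python', 'ruby', 'c', 'c++']

# Hypothetical labels; any list of the same length as the data works
select = pd.Series(languages, index=['a', 'b', 'c', 'd'])

print(select['b'])          # ruby
print(list(select.index))   # ['a', 'b', 'c', 'd']
```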
    
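    From a dictionary
    import pandas as pd
    d = {'key1': '1', 'key2': '2', 'key3': '3'}  # avoid naming a variable "dict", which shadows the built-in
    select = pd.Series(d, index=list(d.keys()))
    print(select)
    Output (Python 3.7+, where dictionaries keep insertion order):
    key1    1
    key2    2
    key3    3
    dtype: object
    # .iloc selects by position; [] with a string selects by label
    print(select.iloc[0])
    1
    print(select.iloc[2])
    3
    print(select['key3'])
    3
    print(select.iloc[[2]])
    key3    3
    dtype: object
    print(select.iloc[[0, 2, 1]])
    key1    1
    key3    3
    key2    2
    dtype: object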
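
Plain `[]` indexing on a Series can mean either a label or a position, which is easy to misread. A minimal sketch of the explicit accessors, assuming Python 3.7+ where dictionaries keep insertion order: `.loc` selects by label and `.iloc` selects by position. It also shows that an index label missing from the dictionary (here the made-up `'key9'`) produces NaN.

```python
import pandas as pd

d = {'key1': '1', 'key2': '2', 'key3': '3'}
select = pd.Series(d)

print(select.loc['key3'])  # label-based lookup    -> 3
print(select.iloc[0])      # position-based lookup -> 1

# 'key9' is not in the dictionary, so its value is NaN
partial = pd.Series(d, index=['key1', 'key9'])
print(partial.isna().tolist())  # [False, True]
```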
    
    From a single scalar value
    import pandas as pd
    string = 'henry'
    select = pd.Series(string, index=range(3))  # the scalar is repeated for every index entry
    print(select)
    Output:
    0    henry
    1    henry
    2    henry
    dtype: object
    
    Slicing
    print(select[1:])
    1    henry
    2    henry
    dtype: object
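
Slices behave differently depending on the accessor. In the sketch below, `.iloc` slices by position and excludes the end point, like a Python list, while `.loc` slices by label and includes it. The values and labels are made up for illustration.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'], index=[10, 20, 30, 40])

by_position = s.iloc[1:3]  # positions 1 and 2
by_label = s.loc[20:40]    # labels 20 through 40, end included

print(by_position.tolist())  # ['b', 'c']
print(by_label.tolist())     # ['b', 'c', 'd']
```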
    

    2. DataFrame

    2.1 Creating a DataFrame
    A DataFrame can be created from a dictionary or an array, or from data read from an external source.
    
    From a dictionary
    import pandas as pd
    groups = ['Movies', 'Sports', 'Coding', 'Fishing', 'Dancing']
    num = [12, 5, 18, 99, 88]
    data = {'groups': groups, 'num': num}  # each key becomes a column
    df = pd.DataFrame(data)
    print(df)
    Output:
        groups  num
    0   Movies   12
    1   Sports    5
    2   Coding   18
    3  Fishing   99
    4  Dancing   88
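
A dictionary of lists builds the table column by column. The same kind of data can also be supplied row by row as a list of dictionaries. The sketch below shows that form with a shortened version of the data.

```python
import pandas as pd

# Each dictionary is one row; keys become column names
rows = [
    {'groups': 'Movies', 'num': 12},
    {'groups': 'Sports', 'num': 5},
]
df = pd.DataFrame(rows)

print(list(df.columns))  # ['groups', 'num']
print(df.shape)          # (2, 2)
```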
    
    From an array (list of rows)
    import pandas as pd
    array = [['Movies', 12], ['Sports', 5], ['Coding', 18], ['Fishing', 99], ['Dancing', 88]]
    df = pd.DataFrame(array, columns=['name', 'num'])  # columns names each position in a row
    print(df)
    Output:
          name  num
    0   Movies   12
    1   Sports    5
    2   Coding   18
    3  Fishing   99
    4  Dancing   88
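
The same constructor accepts a NumPy array. A minimal sketch, with made-up numeric data and hypothetical row and column labels passed through the `index` and `columns` arguments:

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4], [5, 6]])

# index labels the rows, columns labels the columns
df = pd.DataFrame(values, index=['r1', 'r2', 'r3'], columns=['x', 'y'])

print(df.loc['r2', 'y'])  # 4
print(df.shape)           # (3, 2)
```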
    
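    2.2 Working with a DataFrame
    Commonly used DataFrame attributes and methods
    .shape       tuple of (number of rows, number of columns)
    .describe()  descriptive statistics for the numeric columns
    .head()      first rows (5 by default)
    .tail()      last rows (5 by default)
    .columns     column labels
    .index       row labels
    .info()      prints a summary of the index, columns, dtypes, and memory usage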
    import pandas as pd
    groups = ['Movies', 'Sports', 'Coding', 'Fishing', 'Dancing']
    num = [12, 5, 18, 99, 88]
    data = {'groups': groups, 'num': num}
    df = pd.DataFrame(data)
    print(df.shape)
    (5, 2)
    print(df.describe())
                 num
    count   5.000000
    mean   44.400000
    std    45.224993
    min     5.000000
    25%    12.000000
    50%    18.000000
    75%    88.000000
    max    99.000000
    print(df.head())
        groups  num
    0   Movies   12
    1   Sports    5
    2   Coding   18
    3  Fishing   99
    4  Dancing   88
    print(df.columns)
    Index(['groups', 'num'], dtype='object')
    print(df.index)
    RangeIndex(start=0, stop=5, step=1)
    # info is a method: call it with parentheses; it prints its summary itself
    df.info()
    print(df.tail(3))
        groups  num
    2   Coding   18
    3  Fishing   99
    4  Dancing   88
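
The results of these methods are themselves pandas objects, so individual values can be picked out of them. The sketch below, using data similar to the example above, reads the mean from `describe()` and captures the text that `info()` would print by passing a string buffer through its `buf` parameter.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'groups': ['Movies', 'Sports', 'Coding', 'Fishing', 'Dancing'],
    'num': [12, 5, 18, 99, 88],
})

# describe() returns a DataFrame indexed by statistic name
stats = df.describe()
mean_num = stats.loc['mean', 'num']
print(mean_num)  # 44.4

# head(n) and tail(n) take the number of rows to return
print(len(df.head(2)), len(df.tail(3)))  # 2 3

# info() writes to stdout by default; buf redirects it to a string buffer
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print('RangeIndex' in summary)  # True
```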
    
