Course 1: Introduction to Data Science

Author: 英天 | Published 2018-01-03 03:32

    Week 1: Python Fundamentals

    • Extract "Christopher" from the string
    x = 'Dr. Christopher Brooks'
    print(x[4:15])
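The slice `x[4:15]` relies on counting characters by hand. A minimal sketch, assuming the first name is always the word between the first and second spaces, derives the same bounds with `str.find` and shows the `split()` alternative:

```python
x = 'Dr. Christopher Brooks'

# Slicing is half-open: x[4:15] keeps indices 4..14, i.e. 11 characters.
print(len(x[4:15]))  # 11

# Locate the name from the spaces around it instead of hard-coding positions.
start = x.find(' ') + 1      # index just after the first space
end = x.find(' ', start)     # index of the next space
print(x[start:end])          # Christopher

# Equivalent: split on whitespace and take the second word.
print(x.split()[1])          # Christopher
```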
    
    • Keep only the title (Dr.) and the last name, using a function and map()
    people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']
    
    def split_title_and_name(person):
        title = person.split()[0]
        lastname = person.split()[-1]
        return '{} {}'.format(title, lastname)
    
    list(map(split_title_and_name, people))
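Because `map()` in Python 3 returns a lazy iterator, it has to be wrapped in `list()` to show the results. The sketch below uses the same data and checks that a list comprehension and a `lambda` produce the same list; splitting each name once and reusing the words is a small tidy-up of my own, not the course's version:

```python
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson',
          'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    # Split once; the first word is the title, the last word is the last name.
    words = person.split()
    return '{} {}'.format(words[0], words[-1])

# map() is lazy in Python 3, so list() forces it to produce the values.
via_map = list(map(split_title_and_name, people))

# The same result with a list comprehension or an inline lambda.
via_comprehension = [split_title_and_name(p) for p in people]
via_lambda = list(map(lambda p: p.split()[0] + ' ' + p.split()[-1], people))

print(via_map)
# ['Dr. Brooks', 'Dr. Collins-Thompson', 'Dr. Vydiswaran', 'Dr. Romero']
```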
    
    • List comprehension
    def times_tables():
        lst = []
        for i in range(10):
            for j in range(10):
                lst.append(i*j)
        return lst
    
    times_tables() == [j*i for i in range(10) for j in range(10)]
    # The list comprehension builds the same list as times_tables(), so the comparison is True
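The comparison holds because, in a comprehension with several `for` clauses, the leftmost clause is the outer loop, in the same order as the nested loops in `times_tables()`. A small sketch, using `range(3)` only to keep the output short, contrasts the flat list with a nested comprehension that builds a 2-D table:

```python
# Two for-clauses: the leftmost (i) is the outer loop, like the nested loops.
flat = [i * j for i in range(3) for j in range(3)]
print(flat)  # [0, 0, 0, 0, 1, 2, 0, 2, 4]

# A comprehension nested inside another builds a list of rows instead.
grid = [[i * j for j in range(3)] for i in range(3)]
print(grid)  # [[0, 0, 0], [0, 1, 2], [0, 2, 4]]

# Flattening the grid row by row gives back the flat list.
print([value for row in grid for value in row] == flat)  # True
```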
    

    Week 2: Basic Data Processing with Pandas

    The DataFrame Data Structure

    • Build a table (DataFrame) from Series objects
    import pandas as pd
    purchase_1 = pd.Series({'Name': 'Chris',
                            'Item Purchased': 'Dog Food',
                            'Cost': 22.50})
    purchase_2 = pd.Series({'Name': 'Kevyn',
                            'Item Purchased': 'Kitty Litter',
                            'Cost': 2.50})
    purchase_3 = pd.Series({'Name': 'Vinod',
                            'Item Purchased': 'Bird Seed',
                            'Cost': 5.00})
    df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
    df.head()
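Each Series becomes one row and its keys become the column labels. Because 'Store 1' appears twice in the index, label-based selection with `.loc` can return more than one row. A short sketch rebuilding the same table:

```python
import pandas as pd

purchase_1 = pd.Series({'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00})

# Each Series becomes one row; its keys become the column labels.
df = pd.DataFrame([purchase_1, purchase_2, purchase_3],
                  index=['Store 1', 'Store 1', 'Store 2'])
print(df.columns.tolist())  # ['Name', 'Item Purchased', 'Cost']

# 'Store 1' is a duplicated label, so .loc returns a two-row DataFrame...
print(df.loc['Store 1'])
# ...while the unique label 'Store 2' returns a single row as a Series.
print(df.loc['Store 2'])

# Row label plus column label selects that column for the matching rows.
print(df.loc['Store 1', 'Cost'].tolist())  # [22.5, 2.5]
```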
    
    
    • Modify the values in one column
    purchase_1 = pd.Series({'Name': 'Chris',
                            'Item Purchased': 'Dog Food',
                            'Cost': 22.50})
    purchase_2 = pd.Series({'Name': 'Kevyn',
                            'Item Purchased': 'Kitty Litter',
                            'Cost': 2.50})
    purchase_3 = pd.Series({'Name': 'Vinod',
                            'Item Purchased': 'Bird Seed',
                            'Cost': 5.00})
    
    df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
    
    
    df['Cost'] *= 0.8  # apply a 20% discount to every cost, in place
    print(df)
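`*=` overwrites the column in place. When the original prices should be kept, one option is `DataFrame.assign()`, which returns a new DataFrame, and another is to store the result in a separate column. A sketch with a trimmed version of the same purchases (the column name 'Discounted Cost' is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Chris', 'Kevyn', 'Vinod'],
                   'Cost': [22.50, 2.50, 5.00]},
                  index=['Store 1', 'Store 1', 'Store 2'])

# assign() returns a new DataFrame with the replaced column; df is untouched.
discounted = df.assign(Cost=df['Cost'] * 0.8)

# Alternatively, keep the original price and add the discounted one as a new column.
df['Discounted Cost'] = df['Cost'] * 0.8

print(discounted)
print(df)
```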
    
    • Read a CSV file
    import pandas as pd
    df = pd.read_csv('olympics.csv')
    df.head()
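olympics.csv is not reproduced here. As a hedged sketch, the example reads made-up CSV text through `io.StringIO` to show two `read_csv` options that files like this often need: `skiprows` to drop a title line above the real header, and `index_col` to use the first column as the index. Whether the actual file needs either option is an assumption:

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file; the first line is a title, not a header.
csv_text = """Medal table (made-up numbers)
Country,Gold,Silver,Bronze
Country A,3,1,2
Country B,0,2,1
"""

# skiprows=1 drops the title line; index_col=0 turns the first column into the index.
df = pd.read_csv(io.StringIO(csv_text), skiprows=1, index_col=0)
print(df.head())
print(df.loc['Country B', 'Silver'])  # 2
```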
    
    • Select the names of purchases that cost more than 3
    purchase_1 = pd.Series({'Name': 'Chris',
                            'Item Purchased': 'Dog Food',
                            'Cost': 22.50})
    purchase_2 = pd.Series({'Name': 'Kevyn',
                            'Item Purchased': 'Kitty Litter',
                            'Cost': 2.50})
    purchase_3 = pd.Series({'Name': 'Vinod',
                            'Item Purchased': 'Bird Seed',
                            'Cost': 5.00})
    
    df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
    
    
    df.loc[df['Cost'] > 3, 'Name']
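`df['Cost'] > 3` produces a boolean Series, and indexing with it keeps only the rows where it is True. A sketch of the same filter written with `.loc`, plus two conditions combined with `&`; each comparison needs its own parentheses because `&` binds more tightly than `>` or `==`:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Chris', 'Kevyn', 'Vinod'],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Cost': [22.50, 2.50, 5.00]},
                  index=['Store 1', 'Store 1', 'Store 2'])

# The comparison yields a boolean Series aligned with df's rows.
mask = df['Cost'] > 3
print(mask.tolist())  # [True, False, True]

# .loc[row_mask, column] selects matching rows of one column in a single step.
print(df.loc[mask, 'Name'].tolist())  # ['Chris', 'Vinod']

# Combine conditions with & (and) or | (or), parenthesizing each comparison.
cheap_in_store_1 = df[(df['Cost'] < 10) & (df.index == 'Store 1')]
print(cheap_in_store_1['Name'].tolist())  # ['Kevyn']
```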
    

    Missing Values

    • Read from a CSV file
    import pandas as pd
    df = pd.read_csv('log.csv')
    df
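log.csv is not included here, so this sketch uses a small hypothetical stand-in (the column names and values are assumptions) to show how to count the missing cells in each column before deciding how to fill them:

```python
import io

import pandas as pd

# Hypothetical stand-in for log.csv; empty fields are read as NaN.
log_text = """time,user,volume,paused
1469974424,cheryl,10,False
1469974454,cheryl,,
1469974544,sue,,
1469974574,sue,7,True
"""
df = pd.read_csv(io.StringIO(log_text))

# isna() marks each missing cell; sum() counts them per column.
print(df.isna().sum())
```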
    
    • Set the time column as the index and sort the rows by it
    df = df.set_index('time')
    df = df.sort_index()
    df
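After `set_index('time')` the timestamps become row labels, and `sort_index()` orders the rows by them. A sketch with made-up timestamps:

```python
import pandas as pd

# Made-up rows, deliberately out of time order.
df = pd.DataFrame({'time': [1469974454, 1469974424, 1469974544],
                   'user': ['cheryl', 'cheryl', 'sue']})

# Move 'time' into the index, then sort the rows by that index.
df = df.set_index('time').sort_index()
print(df.index.tolist())                 # [1469974424, 1469974454, 1469974544]
print(df.index.is_monotonic_increasing)  # True
```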
    
    • Set a two-level index (time and user); reset_index() first turns the current time index back into a column
    df = df.reset_index()
    df = df.set_index(['time', 'user'])
    df
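The result is a two-level MultiIndex. A sketch with made-up rows showing how to select from it, either with a full (time, user) tuple or with a value for the first level only:

```python
import pandas as pd

df = pd.DataFrame({'time': [1469974424, 1469974424, 1469974454],
                   'user': ['cheryl', 'sue', 'cheryl'],
                   'volume': [10, 7, 8]})

# Two columns become a two-level MultiIndex: time first, then user.
df = df.set_index(['time', 'user'])

# A full (time, user) tuple picks out exactly one row.
print(df.loc[(1469974424, 'sue'), 'volume'])  # 7

# A first-level value alone returns every user at that time, indexed by user.
print(df.loc[1469974424])
```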
    
    • Fill missing values by carrying the last valid value forward
    df = df.ffill()  # newer pandas deprecates fillna(method='ffill') in favor of ffill()
    df.head()
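Forward filling copies the last valid value down the rows; `bfill()` and `fillna(value)` are the other common choices. Because the fill follows row order, on a (time, user) index a gap can be filled from a different user's row. A sketch with made-up values; grouping by the `user` level is one way to keep each user's fill separate:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, np.nan, 7, np.nan])
print(s.ffill().tolist())    # [10.0, 10.0, 10.0, 7.0, 7.0]  last valid value carried forward
print(s.bfill().tolist())    # [10.0, 7.0, 7.0, 7.0, nan]    next valid value pulled backward
print(s.fillna(0).tolist())  # [10.0, 0.0, 0.0, 7.0, 0.0]    gaps replaced by a constant

# Made-up log rows indexed by (time, user), with gaps at time 2.
df = pd.DataFrame({'time': [1, 1, 2, 2],
                   'user': ['cheryl', 'sue', 'cheryl', 'sue'],
                   'volume': [10, 7, np.nan, np.nan]}).set_index(['time', 'user'])

# Plain ffill works down the rows regardless of user: cheryl's gap gets sue's 7.
print(df['volume'].ffill().tolist())  # [10.0, 7.0, 7.0, 7.0]

# Grouping by the user level fills each user's gaps from that user's own rows.
print(df.groupby(level='user')['volume'].ffill().tolist())  # [10.0, 7.0, 10.0, 7.0]
```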
    

    Week 3: Advanced Pandas

