Course 1: Introduction to Data Scien

Author: 英天 | Published: 2018-01-03 03:32

Week 1: Python Fundamentals

Extract "Christopher" from the string

x = 'Dr. Christopher Brooks'
print(x[4:15])
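
The slice above depends on fixed character positions. As a small sketch of an alternative, the name can also be taken by splitting on whitespace, which would keep working if the title had a different length. The variable names parts and first_name are illustrative choices, not from the course.

```python
x = 'Dr. Christopher Brooks'

# Split on whitespace: ['Dr.', 'Christopher', 'Brooks']
parts = x.split()

# The first name is the second token, right after the title.
first_name = parts[1]
print(first_name)

# Same result as the fixed-position slice x[4:15].
same_as_slice = (first_name == x[4:15])
```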

Keep only the title (Dr.) and the last name, using a function and map

people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    title = person.split()[0]
    lastname = person.split()[-1]
    return '{} {}'.format(title, lastname)

list(map(split_title_and_name, people))
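
For comparison, the same result can be produced with a list comprehension instead of map. This is only a sketch of an equivalent form; the names with_map and with_comprehension are made up for the example.

```python
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson',
          'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    # First token is the title, last token is the last name.
    parts = person.split()
    return '{} {}'.format(parts[0], parts[-1])

# map applies the function lazily; list() materialises the results.
with_map = list(map(split_title_and_name, people))

# A list comprehension calls the same function once per element.
with_comprehension = [split_title_and_name(p) for p in people]

print(with_map)
```

Both forms visit the list in order, so they give identical lists; which one to use is mostly a matter of readability.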

List comprehension

def times_tables():
    lst = []
    for i in range(10):
        for j in range(10):
            lst.append(i*j)
    return lst

times_tables() == [j*i for i in range(10) for j in range(10)]
# The list comprehension builds the same list as times_tables(), so the comparison evaluates to True.
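
A comprehension can also carry a condition. The sketch below is an illustrative extension, not part of the course material: it keeps only the even products and checks the result against the equivalent nested loop. The names even_products and evens are hypothetical.

```python
def even_products():
    # Explicit nested loops with a filter.
    lst = []
    for i in range(10):
        for j in range(10):
            if (i * j) % 2 == 0:
                lst.append(i * j)
    return lst

# Same loops and the same filter, written as one comprehension.
# The for clauses appear in the same order as the nested loops.
evens = [i * j for i in range(10) for j in range(10) if (i * j) % 2 == 0]

print(even_products() == evens)
```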

Week 2: Basic Data Processing with Pandas

The DataFrame Data Structure

Build a table (DataFrame) from several Series

import pandas as pd
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
df.head()
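
Once the table exists, rows can be looked up by index label with .loc. The sketch below rebuilds the same data from plain dicts (an assumption made to keep the example short) and shows that a repeated label such as 'Store 1' returns several rows, while a unique label returns a single row as a Series.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50},
        {'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50},
        {'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00},
    ],
    index=['Store 1', 'Store 1', 'Store 2'],
)

# 'Store 1' appears twice in the index, so this is a two-row DataFrame.
store_1 = df.loc['Store 1']

# 'Store 2' appears once, so this is a Series holding that single row.
store_2 = df.loc['Store 2']

# Row label and column label together: the costs recorded for Store 1.
store_1_costs = df.loc['Store 1', 'Cost']

print(store_1)
print(store_2)
```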

Modify the values in one column

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])


df['Cost'] *= 0.8
print(df)
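
The in-place multiplication above overwrites the original prices. If the original values are still needed, one option is to produce a new table instead. This is a sketch under the assumption that the same 20% discount is wanted; the name discounted is made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50},
        {'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50},
        {'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00},
    ],
    index=['Store 1', 'Store 1', 'Store 2'],
)

# assign() returns a new DataFrame; df itself keeps the original prices.
discounted = df.assign(Cost=df['Cost'] * 0.8)

print(discounted)
print(df)
```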

Read a CSV file

import pandas as pd
df = pd.read_csv('olympics.csv')
df.head()
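
The file olympics.csv is not reproduced in these notes, so the sketch below uses a few made-up rows held in a string to show how read_csv behaves. The column names and values are hypothetical and do not reflect the real file's layout.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """country,gold,silver,bronze
A,3,1,2
B,0,4,1
"""

# read_csv accepts a path or any file-like object.
# index_col=0 uses the first column as the row index.
medals = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(medals.head())
```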

Select the names of purchases that cost more than 3

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])


# A boolean mask selects the rows; .loc picks the rows and the column in one step.
df.loc[df['Cost'] > 3, 'Name']
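
Boolean masks can be combined with & (and) and | (or), with each condition wrapped in parentheses. The following sketch, which is an illustrative extension of the query above, keeps only purchases over 3 that were made at Store 1.

```python
import pandas as pd

df = pd.DataFrame(
    [
        {'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50},
        {'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50},
        {'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00},
    ],
    index=['Store 1', 'Store 1', 'Store 2'],
)

# One condition: cost greater than 3.
over_3 = df.loc[df['Cost'] > 3, 'Name']

# Two conditions: cost greater than 3 AND bought at Store 1.
mask = (df['Cost'] > 3) & (df.index == 'Store 1')
over_3_store_1 = df.loc[mask, 'Name']

print(over_3)
print(over_3_store_1)
```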

Missing values

Read the log from a CSV file

import pandas as pd
df = pd.read_csv('log.csv')
df

Set the time column as the index and sort by it

df = df.set_index('time')
df = df.sort_index()
df
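
Since log.csv is not included here, the sketch below uses a tiny made-up log to show the effect of the two calls. The columns time, user and volume and all values are hypothetical.

```python
import pandas as pd

# Hypothetical log rows, deliberately out of time order.
log = pd.DataFrame({
    'time': [3, 1, 2],
    'user': ['bob', 'alice', 'bob'],
    'volume': [5.0, 10.0, None],
})

# set_index moves 'time' out of the columns and into the row index;
# sort_index then orders the rows by that index.
by_time = log.set_index('time').sort_index()

print(by_time)
```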

Use a two-level index: time and user

df = df.reset_index()
df = df.set_index(['time', 'user'])
df
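
With a two-level index, a single row is addressed by a tuple of labels. The sketch below again uses a small made-up log (hypothetical columns and values) to illustrate the lookup.

```python
import pandas as pd

# Hypothetical log rows.
log = pd.DataFrame({
    'time': [3, 1, 2],
    'user': ['bob', 'alice', 'bob'],
    'volume': [5.0, 10.0, 7.0],
})

# Passing a list of column names builds a MultiIndex (time, then user).
indexed = log.set_index(['time', 'user']).sort_index()

# A tuple gives one label per index level and selects a single row.
row = indexed.loc[(1, 'alice')]

print(indexed)
print(row['volume'])
```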

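Fill missing values

# Forward-fill: each missing value takes the last valid value above it.
# fillna(method='ffill') is deprecated in recent pandas versions; ffill() is the equivalent call.
df = df.ffill()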
df.head()
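
Forward filling copies the last valid value downwards, which is why the rows are sorted before filling. The sketch below uses made-up values (the columns volume and position are hypothetical) and also shows that a missing value in the first row stays missing, because there is nothing above it to copy.

```python
import numpy as np
import pandas as pd

# Hypothetical log, already sorted by time.
log = pd.DataFrame({
    'time': [1, 2, 3, 4],
    'volume': [10.0, np.nan, np.nan, 5.0],
    'position': [np.nan, 1.0, np.nan, 2.0],
}).set_index('time')

# Each NaN is replaced by the most recent non-missing value in its column.
filled = log.ffill()

print(filled)
```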

Week 3: Advanced Pandas

Article link: https://www.haomeiwen.com/subject/gomcnxtx.html