美文网首页Python
Python进阶学习笔记

Python进阶学习笔记

作者: Jason数据分析生信教室 | 来源:发表于2021-06-22 23:19 被阅读0次

    Intermediate Python -- DataCamp

    进阶数据操作(Dictionary&DataFrame)

    List

    数据单独列出来就是List

    pop = [30.55, 2.77, 39.21]
    countries = ["afghanistan", "albania", "algeria"] 
    ind_alb = countries.index("albania")
    ind_alb
    1
    pop[ind_alb]
    2.77
    

    list使用起来不是很方便。

    Dictionary

    两组list的数据对应起来的话就变成Dictionary了

    world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21}
    world["albania"]
    2.77
    
    • 新增或者删除数值
    world["sealand"] = 0.000028 
    world
    {'afghanistan': 30.55, 'albania': 2.81, 'algeria': 39.21, 'sealand': 2.8e-05}
    
    del(world["sealand"]) world
    {'afghanistan': 30.55, 'albania': 2.81, 'algeria': 39.21}
    
    • 多层字典
      说白了就是字典里面套字典
    # Dictionary of dictionaries
    europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
               'france': { 'capital':'paris', 'population':66.03 },
               'germany': { 'capital':'berlin', 'population':80.62 },
               'norway': { 'capital':'oslo', 'population':5.084 } }
    
    
    # Print out the capital of France
    europe["france"]["capital"]
    
    # Create sub-dictionary data
    data={"capital":"rome","population":59.83}
    
    # Add data to europe under key 'italy'
    europe["italy"]=data
    
    # Print europe
    print(europe)
    {'france': {'population': 66.03, 'capital': 'paris'}, 'italy': {'population': 59.83, 'capital': 'rome'}, 'germany': {'population': 80.62, 'capital': 'berlin'}, 'norway': {'population': 5.084, 'capital': 'oslo'}, 'spain': {'population': 46.77, 'capital': 'madrid'}}
    

    DataFrame

    数据分析里用的最多的其实还是DataFrame(数据框),操作数据框的话会用到pandas工具包 。

    • 把字典转换成数据框
      关键语法
      import xx as x
      pd.DataFrame()
    # Pre-defined lists
    names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
    dr =  [True, False, False, False, True, True, True]
    cpc = [809, 731, 588, 18, 200, 70, 45]
    
    # Import pandas as pd
    import pandas as pd
    
    # Create dictionary my_dict with three key:value pairs: my_dict
    my_dict={"country":names,"drives_right":dr,"cars_per_cap":cpc}
    
    # Build a DataFrame cars from my_dict: cars
    cars=pd.DataFrame(my_dict)
    
    # Print cars
    print(cars)
       cars_per_cap        country  drives_right
    0           809  United States          True
    1           731      Australia         False
    2           588          Japan         False
    3            18          India         False
    4           200         Russia          True
    5            70        Morocco          True
    6            45          Egypt          True
    
    • 设置行名
      cars.index=xxxxx
    import pandas as pd
    
    # Build cars DataFrame
    names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
    dr =  [True, False, False, False, True, True, True]
    cpc = [809, 731, 588, 18, 200, 70, 45]
    cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
    cars = pd.DataFrame(cars_dict,)
    print(cars)
    
    # Definition of row_labels
    row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
    
    # Specify row labels of cars
    cars.index=row_labels
    
    
    # Print cars again
    print(cars)
       cars_per_cap        country  drives_right
    0           809  United States          True
    1           731      Australia         False
    2           588          Japan         False
    3            18          India         False
    4           200         Russia          True
    5            70        Morocco          True
    6            45          Egypt          True
         cars_per_cap        country  drives_right
    US            809  United States          True
    AUS           731      Australia         False
    JPN           588          Japan         False
    IN             18          India         False
    RU            200         Russia          True
    MOR            70        Morocco          True
    EG             45          Egypt          True
    
    • 读取csv文件
    cars= pd.read_csv("cars.csv",index_col = 0)
    

    Pandas的简单操作

    之后会花篇幅详细学习Pandas

    • 选择列
      [][[ ]]的区别,带列名和不带列名
    print(cars["country"])
    US     United States
    AUS        Australia
    JPN            Japan
    IN             India
    RU            Russia
    MOR          Morocco
    EG             Egypt
    Name: country, dtype: object
    
    print(cars[["country"]])
               country
    US   United States
    AUS      Australia
    JPN          Japan
    IN           India
    RU          Russia
    MOR        Morocco
    EG           Egypt
    print(cars[["country","drives_right"]])
               country  drives_right
    US   United States          True
    AUS      Australia         False
    JPN          Japan         False
    IN           India         False
    RU          Russia          True
    MOR        Morocco          True
    EG           Egypt          True
    
    • 选择行
      可以选取指定行,这个和R很相似
    # Import cars data
    import pandas as pd
    cars = pd.read_csv('cars.csv', index_col = 0)
    
    # Print out first 3 observations
    print(cars[0:3])
    
    # Print out fourth, fifth and sixth observation
    print(cars[3:6])
         cars_per_cap        country  drives_right
    US            809  United States          True
    AUS           731      Australia         False
    JPN           588          Japan         False
         cars_per_cap  country  drives_right
    IN             18    India         False
    RU            200   Russia          True
    MOR            70  Morocco          True
    print(cars[0:3])
         cars_per_cap        country  drives_right
    US            809  United States          True
    AUS           731      Australia         False
    JPN           588          Japan         False
    
    <script.py> output:
             cars_per_cap        country  drives_right
        US            809  United States          True
        AUS           731      Australia         False
        JPN           588          Japan         False
             cars_per_cap  country  drives_right
        IN             18    India         False
        RU            200   Russia          True
        MOR            70  Morocco          True
    

    loc and iloc

    # Import cars data
    import pandas as pd
    cars = pd.read_csv('cars.csv', index_col = 0)
    
    # Print out drives_right value of Morocco
    print(cars.loc[["MOR","drives_right"]])
    
    # Print sub-DataFrame
    print(cars.loc[["RU","MOR"],["country","drives_right"]])
    
    <script.py> output:
                      cars_per_cap  country drives_right
        MOR                   70.0  Morocco         True
        drives_right           NaN      NaN          NaN
             country  drives_right
        RU    Russia          True
        MOR  Morocco          True
    

    and, or, not

    看下面这个简单的例子就好,直接用and or

    # Define variables
    my_kitchen = 18.0
    your_kitchen = 14.0
    
    # my_kitchen bigger than 10 and smaller than 18?
    print(my_kitchen > 10 and my_kitchen <10)
    
    # my_kitchen smaller than 14 or bigger than 17?
    print(my_kitchen<14 or my_kitchen>17)
    
    # Double my_kitchen smaller than triple your_kitchen?
    print(2*my_kitchen < 3*your_kitchen)
    
    <script.py> output:
        False
        True
        True
    

    not也是直接用not

    In [1]:
    x = 8
    y = 9
    not(not(x < 3) and not(y > 14 or y > 10))
    Out[1]:
    False
    

    用numpy来进行类似的操作

    np.logical_and
    np.logical_or
    np.logical_not

    # Create arrays
    import numpy as np
    my_house = np.array([18.0, 20.0, 10.75, 9.50])
    your_house = np.array([14.0, 24.0, 14.25, 9.0])
    
    # my_house greater than 18.5 or smaller than 10
    print(np.logical_or(my_house >18.5,my_house<10))
    
    # Both my_house and your_house smaller than 11
    print(np.logical_and(my_house<11,your_house<11))
    

    得到结果

    <script.py> output:
        [False  True False  True]
        [False False False  True]
    

    if, elif, else

    条件后面用的:分隔

    area = 10.0
    if(area < 9) :
        print("small")
    elif(area < 12) :
        print("medium")
    else :
        print("large")
    

    比较一下,R语言里用的是{}

    if (test_expression) {
    statement1
    } else {
    statement2
    }
    

    Python的简洁易懂的优势就体现出来了。

    再举个例子, Python可以没有规矩到条件不需要加()

    # Define variables
    room = "kit"
    area = 14.0
    
    # if-else construct for room
    if room == "kit" :
        print("looking around in the kitchen.")
    else :
        print("looking around elsewhere.")
    
    # if-else construct for area
    if area > 15 :
        print("big place!")
    else :
        print("pretty small.")
    
    # if statement for room
    if room == "kit" :
        print("looking around in the kitchen.")
    
    # if statement for area
    if area>15 :
        print("big place!")
    

    练习题

    Remember about np.logical_and(), np.logical_or() and np.logical_not(), the Numpy variants of the and, or and notoperators? You can also use them on Pandas Series to do more advanced filtering operations.

    Take this example that selects the observations that have a cars_per_cap between 10 and 80. Try out these lines of code step by step to see what's happening.

    cpc = cars['cars_per_cap']
    between = np.logical_and(cpc > 10, cpc < 80)
    medium = cars[between]
    
    # Import cars data
    import pandas as pd
    cars = pd.read_csv('cars.csv', index_col = 0)
    
    # Import numpy, you'll need this
    import numpy as np
    
    # Create medium: observations with cars_per_cap between 100 and 500
    cpc = cars['cars_per_cap']
    between = np.logical_and(cpc > 100, cpc < 500)
    medium = cars[between]
    
    # Print medium
    print(medium)
    

    output

    <script.py> output:
            cars_per_cap country  drives_right
        RU           200  Russia          True
    

    while

    语法结构,用:隔开

    while condition :
        expression
    
    x = 1
    while x < 4 :
        print(x)
        x = x + 1
    
    # Initialize offset
    offset = 8
    
    # Code the while loop
    while offset != 0:
        print("correcting...")
        offset = offset-1
        print(offset)
    

    output

    <script.py> output:
        correcting...
        7
        correcting...
        6
        correcting...
        5
        correcting...
        4
        correcting...
        3
        correcting...
        2
        correcting...
        1
        correcting...
        0
    
    • whileif的混搭
    # Initialize offset
    offset = -6
    
    # Code the while loop
    while offset != 0 :
        print("correcting...")
        if offset > 0 :
          offset = offset -1
        else : 
          offset = offset + 1   
        print(offset)
    

    for循环

    List里的for循环

    # areas list
    areas = [11.25, 18.0, 20.0, 10.75, 9.50]
    
    # Code the for loop
    for i in areas :
        print(i)
    
    • enumerate, 可以同时输出变量的序列和数值
    l = ['Alice', 'Bob', 'Charlie']
    
    for name in l:
        print(name)
    # Alice
    # Bob
    # Charlie
    
    for i, name in enumerate(l):
        print(i, name)
    # 0 Alice
    # 1 Bob
    # 2 Charlie
    

    默认是从0开始,可以指定从1或者从任意数字开始。

    for i, name in enumerate(l, 1):
        print(i, name)
    # 1 Alice
    # 2 Bob
    # 3 Charlie
    

    自定义输出格式

    # areas list
    areas = [11.25, 18.0, 20.0, 10.75, 9.50]
    
    # Code the for loop
    for index, area in enumerate(areas) :
        print("room " + str(index+1) + ": " + str(area))
    

    output

    room 1: 11.25
    room 2: 18.0
    room 3: 20.0
    room 4: 10.75
    room 5: 9.5
    

    双层list的for循环也是毫不费力

    # house list of lists
    house = [["hallway", 11.25], 
             ["kitchen", 18.0], 
             ["living room", 20.0], 
             ["bedroom", 10.75], 
             ["bathroom", 9.50]]
             
    # Build a for loop from scratch
    for x, y in house:
        print("the " + str(x) + " is " + str(y) + " sqm" )
    
    the hallway is 11.25 sqm
    the kitchen is 18.0 sqm
    the living room is 20.0 sqm
    the bedroom is 10.75 sqm
    the bathroom is 9.5 sqm
    

    Dictionary里的for循环

    和list不一样的是需要用到items()这个指令

    # Definition of dictionary
    europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
              'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
              
    # Iterate over europe
    for k, v in europe.items() :
        print("the capital of " + k + " is " + v)
    

    Numpy array里的for循环

    numpy array可以是1D也可以是2D,1D的话和list一样,2D的话要用到np.nditer,把两列按照一列来显示。

    # Import numpy as np
    import numpy as np  
    
    # For loop over np_height
    for x in np_height : 
        print(str(x) + " inches")
    
    # For loop over np_baseball
    for val in np.nditer(np_baseball) :
        print(val)
    

    Dataframe里的for循环

    iterrows()按照行来分析。

    # Import cars data
    import pandas as pd
    cars = pd.read_csv('cars.csv', index_col = 0)
    cars.head()
    
    # Iterate over rows of cars
    for lab, row in cars.iterrows() :
        print(lab)
        print(row)
    

    output

    cars.head()
    
         cars_per_cap        country  drives_right
    US            809  United States          True
    AUS           731      Australia         False
    JPN           588          Japan         False
    IN             18          India         False
    RU            200         Russia          True
    
    <script.py> output:
        US
        cars_per_cap              809
        country         United States
        drives_right             True
        Name: US, dtype: object
        AUS
        cars_per_cap          731
        country         Australia
        drives_right        False
        Name: AUS, dtype: object
        JPN
        cars_per_cap      588
        country         Japan
        drives_right    False
        Name: JPN, dtype: object
        IN
        cars_per_cap       18
        country         India
        drives_right    False
        Name: IN, dtype: object
        RU
        cars_per_cap       200
        country         Russia
        drives_right      True
        Name: RU, dtype: object
        MOR
        cars_per_cap         70
        country         Morocco
        drives_right       True
        Name: MOR, dtype: object
        EG
        cars_per_cap       45
        country         Egypt
        drives_right     True
        Name: EG, dtype: object
    

    两种方法添加列

    # Import cars data
    import pandas as pd
    cars = pd.read_csv('cars.csv', index_col = 0)
    cars.head()
    
    # Code for loop that adds COUNTRY column
    for lab, row in cars.iterrows():
        cars.loc[lab,"COUNTRY"] = str.upper(row["country"])
    
    
    # Print cars
    print(cars)
    

    2.apply

    # Import cars data
    import pandas as pd
    cars = pd.read_csv('cars.csv', index_col = 0)
    cars["country"]
    
    # Use .apply(str.upper)
    for lab, row in cars.iterrows() :
        cars["COUNTRY"] = cars["country"].apply(str.upper)
    
    print(cars)
    

    相关文章

      网友评论

        本文标题:Python进阶学习笔记

        本文链接:https://www.haomeiwen.com/subject/iolvyltx.html