美文网首页
How to process csv file

How to process csv file

作者: 微雨旧时歌丶 | 来源:发表于2019-02-23 16:01 被阅读0次

    CSV is a common, simple file format used to store tabular data, such as a spreadsheet or database. We have a need to process CSV file, as to discover, extract, and analyze the information we are interested in. Pandas are competent.

    This wiki is intended as a guide to getting started with pandas, including examples for

    • Loading file into dataframe;
    • Picking sub dataframe of interest;
    • Displaying data;
    • Saving a dataframe to a CSV file;
    • Creating a new dataframe by hand.

    Let's start!

    1. Loading dataframe from an existing CSV file

    DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. Loading the CSV(example airport data) file by using:

    import pandas as pd
    in_file = 'airports.csv'
    data = pd.read_csv(in_file)
    

    <table>
    <tr>
    <td ><center>[图片上传失败...(image-ec4b2f-1550908860557)] Load into dataframe</center></td>
    <td ><center>[图片上传失败...(image-209f06-1550908860557)] Open with excel</center></td>
    </tr>
    </table>

    For dataframe, just think of it like a spreadsheet.

    2. Picking sub dataframe of interest

    By typing data.shape, we know that the dataframe has 54872 rows, and 18 columns.

    Let's first see what types of airports are included:

    types = data['type'] # only the 'type' column is selected
    types = set(types)
    

    Above, data['type'] is actually a Series(a one-dimensional labeled array capable of holding any data type), by using set(), it is converted into set type. Result is

    types = {'balloonport',
     'closed',
     'heliport',
     'large_airport',
     'medium_airport',
     'seaplane_base',
     'small_airport''}
    

    Similarly, by typing countris = set(data['iso_country']), we know 'ZN'(China) is included.

    So, next, suppose we are just interested in the names and locations of large airports in China, how to do that?

    a. select several columns of interest:

    # a sub dataframe with only our interested columns
    sub_data = data.loc[:,['type','name','latitude_deg','longitude_deg','iso_country']]
    

    b. more precisely, select rows of interest (only 'large_airport' and 'CN'):

    cn_large_airport = sub_data[(sub_data['iso_country']=='CN') & (sub_data['type']=='large_airport')]
    

    3. Displaying data

    After that, we get a dataframe with only 33 rows and 5 columns. Let's display it:

    # pick 'lon/lat' columns and covert them into list.
    lons = cn_large_airport["longitude_deg"].tolist() 
    lats = cn_large_airport["latitude_deg"].tolist()
    # display
    import matplotlib.pyplot as plt
    plt.scatter(lons,lats, s=5)
    

    By using .tolist() function, we can convert Series type into list type (for this display example, it's also ok with no convertion).

    Also display all the airports in China, results are:
    [图片上传失败...(image-652589-1550908860557)]

    4. Saving a dataframe to a CSV file

    For saving a dataframe, use .to_csv() function, like:

    cn_large_airport.to_csv('cn_large_airport.csv')
    

    5. Creating a new dataframe by hand

    相关文章

      网友评论

          本文标题:How to process csv file

          本文链接:https://www.haomeiwen.com/subject/zbqurqtx.html