How to process csv file

作者: 微雨旧时歌丶 | 来源:发表于2019-02-23 16:01 被阅读0次

How to process csv file
GCC
python 常用I/O
读取csv文件中文乱码问题，解决方法
python（读写csv数据）
PHP快速导出csv
csv2json
failed to crunch file
python使用csv读写CSV文件
Tensorflow 读取Txt和Csv格式数据

CSV is a common, simple file format used to store tabular data, such as a spreadsheet or database. We have a need to process CSV file, as to discover, extract, and analyze the information we are interested in. Pandas are competent.

This wiki is intended as a guide to getting started with pandas, including examples for

Loading file into dataframe;
Picking sub dataframe of interest;
Displaying data;
Saving a dataframe to a CSV file;
Creating a new dataframe by hand.

Let's start!

1. Loading dataframe from an existing CSV file

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. Loading the CSV(example airport data) file by using:

import pandas as pd
in_file = 'airports.csv'
data = pd.read_csv(in_file)

<table>
<tr>
<td ><center>[图片上传失败...(image-ec4b2f-1550908860557)] Load into dataframe</center></td>
<td ><center>[图片上传失败...(image-209f06-1550908860557)] Open with excel</center></td>
</tr>
</table>

For dataframe, just think of it like a spreadsheet.

2. Picking sub dataframe of interest

By typing data.shape, we know that the dataframe has 54872 rows, and 18 columns.

Let's first see what types of airports are included:

types = data['type'] # only the 'type' column is selected
types = set(types)

Above, data['type'] is actually a Series(a one-dimensional labeled array capable of holding any data type), by using set(), it is converted into set type. Result is

types = {'balloonport',
 'closed',
 'heliport',
 'large_airport',
 'medium_airport',
 'seaplane_base',
 'small_airport''}

Similarly, by typing countris = set(data['iso_country']), we know 'ZN'(China) is included.

So, next, suppose we are just interested in the names and locations of large airports in China, how to do that?

a. select several columns of interest:

# a sub dataframe with only our interested columns
sub_data = data.loc[:,['type','name','latitude_deg','longitude_deg','iso_country']]

b. more precisely, select rows of interest (only 'large_airport' and 'CN'):

cn_large_airport = sub_data[(sub_data['iso_country']=='CN') & (sub_data['type']=='large_airport')]

3. Displaying data

After that, we get a dataframe with only 33 rows and 5 columns. Let's display it:

# pick 'lon/lat' columns and covert them into list.
lons = cn_large_airport["longitude_deg"].tolist() 
lats = cn_large_airport["latitude_deg"].tolist()
# display
import matplotlib.pyplot as plt
plt.scatter(lons,lats, s=5)

By using .tolist() function, we can convert Series type into list type (for this display example, it's also ok with no convertion).

Also display all the airports in China, results are:
[图片上传失败...(image-652589-1550908860557)]

4. Saving a dataframe to a CSV file

For saving a dataframe, use .to_csv() function, like:

cn_large_airport.to_csv('cn_large_airport.csv')

5. Creating a new dataframe by hand

网友评论

本文标题：How to process csv file

本文链接：https://www.haomeiwen.com/subject/zbqurqtx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！