CSV is a common, simple file format used to store tabular data, such as a spreadsheet or database. We have a need to process CSV file, as to discover, extract, and analyze the information we are interested in. Pandas are competent.
This wiki is intended as a guide to getting started with pandas, including examples for
- Loading file into dataframe;
- Picking sub dataframe of interest;
- Displaying data;
- Saving a dataframe to a CSV file;
- Creating a new dataframe by hand.
Let's start!
1. Loading dataframe from an existing CSV file
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. Loading the CSV(example airport data) file by using:
import pandas as pd
in_file = 'airports.csv'
data = pd.read_csv(in_file)
<table>
<tr>
<td ><center>[图片上传失败...(image-ec4b2f-1550908860557)] Load into dataframe</center></td>
<td ><center>[图片上传失败...(image-209f06-1550908860557)] Open with excel</center></td>
</tr>
</table>
For dataframe, just think of it like a spreadsheet.
2. Picking sub dataframe of interest
By typing data.shape
, we know that the dataframe has 54872 rows, and 18 columns.
Let's first see what types of airports are included:
types = data['type'] # only the 'type' column is selected
types = set(types)
Above, data['type']
is actually a Series(a one-dimensional labeled array capable of holding any data type), by using set()
, it is converted into set type. Result is
types = {'balloonport',
'closed',
'heliport',
'large_airport',
'medium_airport',
'seaplane_base',
'small_airport''}
Similarly, by typing countris = set(data['iso_country'])
, we know 'ZN'(China) is included.
So, next, suppose we are just interested in the names and locations of large airports in China, how to do that?
a. select several columns of interest:
# a sub dataframe with only our interested columns
sub_data = data.loc[:,['type','name','latitude_deg','longitude_deg','iso_country']]
b. more precisely, select rows of interest (only 'large_airport' and 'CN'):
cn_large_airport = sub_data[(sub_data['iso_country']=='CN') & (sub_data['type']=='large_airport')]
3. Displaying data
After that, we get a dataframe with only 33 rows and 5 columns. Let's display it:
# pick 'lon/lat' columns and covert them into list.
lons = cn_large_airport["longitude_deg"].tolist()
lats = cn_large_airport["latitude_deg"].tolist()
# display
import matplotlib.pyplot as plt
plt.scatter(lons,lats, s=5)
By using .tolist()
function, we can convert Series type into list type (for this display example, it's also ok with no convertion).
Also display all the airports in China, results are:
[图片上传失败...(image-652589-1550908860557)]
4. Saving a dataframe to a CSV file
For saving a dataframe, use .to_csv()
function, like:
cn_large_airport.to_csv('cn_large_airport.csv')
网友评论