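On the road to becoming a data analyst, the tool I find handiest is pandas, which is built on top of NumPy. Today I was given a task: work out the region from a Chinese ID card number. I spent two hours learning how and got it working.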
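Approach:
1. Read the ID numbers from Excel as str, so they stay text.
2. Build a DataFrame by hand that maps the first two digits of an ID number to a province.
3. Slice the first two characters off the ID number and convert them to int.
4. Match those two digits against the DataFrame and output the region.
The implementation: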
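Steps 2 to 4 can be sketched for a single ID number with a plain dict instead of a DataFrame. This is only an illustration under assumptions: `AREA_BY_PREFIX` is abbreviated to three entries, `lookup_area` is a hypothetical helper name, and the ID numbers are made up.

```python
# Abbreviated prefix-to-region table for illustration (the full table has more entries).
AREA_BY_PREFIX = {"11": "北京", "31": "上海", "44": "广东"}


def lookup_area(id_number, default="unknown"):
    """Return the region for the first two characters of an ID number."""
    prefix = str(id_number)[:2]  # keep the ID as text so it can be sliced
    return AREA_BY_PREFIX.get(prefix, default)


# Made-up ID numbers, not real ones.
print(lookup_area("110000000000000000"))
print(lookup_area("990000000000000000"))
```

Keeping the prefix as a string key avoids the int conversion in step 3; the DataFrame version stores the codes as integers, so it needs that conversion before comparing.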
#coding=utf-8
import os
import numpy as np
import pandas as pd
os.chdir('d:\\py\\')
#print('work_directory: ', os.getcwd())
#area={"11":"北京","12":"天津","13":"河北","14":"山西","15":"内蒙古","21":"辽宁","22":"吉林","23":"黑龙江","31":"上海","32":"江苏","33":"浙江","34":"安徽","35":"福建","36":"江西","37":"山东","41":"河南","42":"湖北","43":"湖南","44":"广东","45":"广西","46":"海南","50":"重庆","51":"四川","52":"贵州","53":"云南","54":"西藏","61":"陕西","62":"甘肃","63":"青海","64":"宁夏","65":"新疆","71":"台湾","81":"香港","82":"澳门","91":"国外"}
# Build the area lookup by hand: two-digit code -> region name
area = {'areacode':[11,12,13,14,15,21,22,23,31,32,33,34,35,36,37,41,42,43,44,45,46,50,51,52,53,54,61,62,63,64,65,71,81,82,91],'areaname':['北京','天津','河北','山西','内蒙古','辽宁','吉林','黑龙江','上海','江苏','浙江','安徽','福建','江西','山东','河南','湖北','湖南','广东','广西','海南','重庆','四川','贵州','云南','西藏','陕西','甘肃','青海','宁夏','新疆','台湾','香港','澳门','国外']}
# Turn area into a pandas DataFrame; pandas adds a default integer index
area_df = pd.DataFrame(area)
#area_df.to_excel('区域表.xlsx')
#print(area_df)
# Note the dtype: read the 身份证号 column as str, because a numeric value cannot be sliced like a string
df = pd.read_excel("身份证号码.xlsx",dtype={'身份证号':str})
#print(df)
# Take one ID number from df by position: row 0, column 1
code = df.iloc[0,1]
# Take the first two characters of code
s_area = code[0:2]
#print(type(s_area))
# .loc filters area_df with a boolean mask; int() converts s_area to an integer so it can be compared with areacode
print(area_df.loc[area_df["areacode"] == int(s_area)])
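
The script above looks up only the first row. A possible way to handle the whole column at once is to slice every ID with `.str[:2]` and left-merge the result against the lookup table. This is a minimal sketch, not the method used above: the lookup table is abbreviated, and the ID numbers are made up and stand in for the Excel file.

```python
import pandas as pd

# Abbreviated lookup table; the full list of codes is in the script above.
area_df = pd.DataFrame({"areacode": [11, 31, 44],
                        "areaname": ["北京", "上海", "广东"]})

# Made-up ID numbers standing in for the Excel column, kept as str.
df = pd.DataFrame({"身份证号": ["110000000000000000",
                             "440000000000000000",
                             "990000000000000000"]})

# Slice the first two characters of every row, then convert to int to match areacode.
df["areacode"] = df["身份证号"].str[:2].astype(int)

# A left merge keeps every ID row; prefixes missing from area_df get NaN as areaname.
result = df.merge(area_df, on="areacode", how="left")
print(result)
```

The left merge is chosen so that an unrecognized prefix shows up as a missing value instead of silently dropping the row.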