Excel文件解析
1. 访问Excel文件中的工作表:
import xlrd
book = xlrd.open_workbook('data/chp4/SOWC 2014 Stat Tables_Table 9.xlsx')
for sheet in book.sheets():
print(sheet)
用sheet
的时候回输出:<xlrd.sheet.Sheet object at 0x0000017E6B7DEF98> <xlrd.sheet.Sheet object at 0x0000017E6B82C390>
如果需要输出表单的名称则需要用 print(sheet.name)
这时候就会输出Excel文件中的表单名称:
Data Notes
Table 9
2. 访问表单中的数据
计算总行数
import xlrd
book = xlrd.open_workbook('data/chp4/SOWC 2014 Stat Tables_Table 9.xlsx')
sheet = book.sheet_by_name('Table 9 ')
print(sheet.nrows)
3. 找到数据中国家出现的位置
import xlrd
book = xlrd.open_workbook('data/chp4/SOWC 2014 Stat Tables_Table 9.xlsx')
sheet = book.sheet_by_name('Table 9 ')
count = 0
for i in range(sheet.nrows):
if count<20:
row = sheet.row_values(i)
print(i,row)
count +=1
我们将会得到下面的输出结果:
0 ['', 'TABLE 9. CHILD PROTECTION', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
1 ['', '', 'TABLEAU 9. PROTECTION DE L’ENFANT', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''
, '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
2 ['', '', '', 'TABLA 9. PROTECCIÓN INFANTIL', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
3 ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
'', '', '', '', '', '', '', '', '', '']
4 ['', 'Countries and areas', '', '', 'Child labour (%)+\n2005–2012*', '', '', '', '', '', 'Child marriage (%)\n2005–2012*', '', '', '',
'Birth registration (%)+\n2005–2012*', '', 'Female genital mutilation/cutting (%)+\n2002–2012*', '', '', '', '', '', 'Justification of wi
fe beating (%)\n 2005–2012*', '', '', '', 'Violent discipline (%)+\n2005–2012*', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
'', '', '']
5 ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'prevalence', '', '', '', 'attitudes', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
6 ['', '', '', '', 'total', '', 'male', '', 'female', '', 'married by 15', '', 'married by 18', '', 'total', '', 'womena ', '', 'girlsb ', '', 'support for the practicec', '', 'male', '', 'female', '', 'total', '', 'male', '', 'female', '', '', '', '', '', '', '', '', '', '',
'', '', '']
7 ['', 'FRENCH HEADINGS', 'Pays et zones', '', 'Travail des enfants (%)+\n2005–2012*', '', '', '', '', '', 'Mariage d’enfants (%)\n2005–
2012*', '', '', '', 'Enregistrement\ndes naissances\n(%)+\n2005–2012*\n', '', 'Mutilations génitales féminines/excision (%)+\n2002–2012*'
, '', '', '', '', '', 'Justification de la\nviolence conjugale (%)\n2005–2012*', '', '', '', 'Discipline imposée par la violence (%)+\n200
5–2012*', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
8 ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'prévalence', '', '', '', 'attitudes', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
9 ['', '', '', '', 'total', '', 'garçons', '', 'filles', '', 'marié à 15 ans\n', '', 'marié à 18 ans', '', 'total', '', 'femmes a ', '', 'filles b ', '', 'soutien à cette pratique c', '', 'hommes', '', 'femmes', '', 'total', '', 'garçons', '', 'filles', '', '', '', '', '', '', '', '', '', '', '', '', '']
10 ['', 'SPANISH HEADINGS', '', 'Países y zonas', 'Trabajo infantil (%)+ 2005–2012*', '', '', '', '', '', 'Matrimonio precoz (%) 2005–201
2*', '', '', '', 'Inscripción del\nnacimiento (%) + 2005–2012*', '', 'Mutilación/excisión genital (%)+\n2002–2012*', '', '', '', '', '',
'Justificación de golpear\na la mujer (%)\n2005–2012*', '', '', '', 'Disciplina violenta (%)+ 2005–2012*', '', '', '', '', '', '', '', '
', '', '', '', '', '', '', '', '', '']
11 ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'prevalencia', '', '', '', 'actitudes', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
12 ['', '', '', '', 'total', '', 'hombre', '', 'mujer', '', 'casados a los 15 años', '', 'casados a los 18 años', '', 'total', '', 'Mujeresa ', '', 'hijasb ', '', 'apoyo a la prácticac', '', 'hombre', '', 'mujer', '', 'total', '', 'hombre', '', 'mujer', '', '', '', '', '', '', '', '', '', '', '', '', '']
13 ['', '', 'FRENCH COUNTRY NAMES', 'SPANISH COUNTRY NAMES', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
14 ['', 'Afghanistan', 'Afghanistan', 'Afganistán', 10.3, '', 11.0, '', 9.6, '', 15.0, '', 40.4, '', 37.4, '', '–', '', '–', '', '–', ''
, '–', '', 90.2, '', 74.4, '', 74.8, '', 74.1, '', '', '', '', '', '', '', '', '', '', '', '', '']
15 ['', 'Albania', 'Albanie', 'Albania', 12.0, ' ', 14.4, ' ', 9.4, ' ', 0.2, '', 9.6, '', 98.6, '', '–', '', '–', '', '–', '', 36.4,
'', 29.8, '', 75.1, '', 78.3, '', 71.4, '', '', '', '', '', '', '', '', '', '', '', '', '']
16 ['', 'Algeria', 'Algérie', 'Argelia', 4.7, 'y', 5.5, 'y', 3.9, 'y', 0.1, '', 1.8, '', 99.3, '', '–', '', '–', '', '–', '', '–', '',
67.9, '', 87.7, '', 88.8, '', 86.5, '', '', '', '', '', '', '', '', '', '', '', '', '']
17 ['', 'Andorra', 'Andorre', 'Andorra', '– ', ' ', '–', ' ', '–', ' ', '–', '', '–', '', 100.0, 'v', '–', '', '–', '', '–', '', '
–', '', '–', '', '–', '', '–', '', '–', '', '', '', '', '', '', '', '', '', '', '', '', '']
18 ['', 'Angola', 'Angola', 'Angola', 23.5, 'x', 22.1, 'x', 24.8, 'x', '–', '', '–', '', 36.0, 'x', '–', '', '–', '', '–', '', '–', '
', '–', '', '–', '', '–', '', '–', '', '', '', '', '', '', '', '', '', '', '', '', '']
19 ['', 'Antigua and Barbuda', 'Antigua-et-Barbuda', 'Antigua y Barbuda', '–', ' ', '–', ' ', '–', ' ', '–', '', '–', '', '–', '', '
–', '', '–', '', '–', '', '–', '', '–', '', '–', '', '–', '', '–', '', '', '', '', '', '', '', '', '', '', '', '', '']
从输出的结果可以看到在第14行的时候输出了国家的数据;这就是我们想要获得是数据;
在Excel表中可以看到的数据是这样的(其中法语和西班牙语的表头我隐藏了方便查看)
Countries and areas Child labour (%)+
2005–2012* Child marriage (%)
2005–2012* Birth registration (%)+
2005–2012* Female genital mutilation/cutting (%)+
2002–2012* Justification of wife beating (%)
2005–2012* Violent discipline (%)+
2005–2012*
prevalence attitudes
total male female married by 15 married by 18 total womena girlsb support for the practicec male female total male female
Afghanistan 10 11 10 15 40 37 – – – – 90 74 75 74
Albania 12 14 9 0 10 99 – – – 36 30 75 78 71
Algeria 5 y 6 y 4 y 0 2 99 – – – – 68 88 89 87
Andorra – – – – – 100 v – – – – – – – –
Angola 24 x 22 x 25 x – – 36 x – – – – – – – –
在表格中需要找到数据所在的位置:可以查看下图的数据
Countries and areas Child labour (%)+
2005–2012* Child marriage (%)
2005–2012*
total male female married by 15 married by 18
Afghanistan Afghanistan Afganistán 10 11 10 15 40
Albania Albanie Albania 12 14 9 0 10
Algeria Algérie Argelia 5 y 6 y 4 y 0 2
import xlrd
import pprint
book = xlrd.open_workbook('data/chp4/SOWC 2014 Stat Tables_Table 9.xlsx')
sheet = book.sheet_by_name('Table 9 ')
count = 0
data = {}
for i in range(14,sheet.nrows):
row = sheet.row_values(i)
country = row[1]
data[country] = {
'child_labor':{
'total':[row[4],row[5]],
'male':[row[6],row[7]],
'female':[row[8],row[9]],
},
'child_marriage':{
'married_by_15':[row[10],row[11]],
'married_by_18':[row[12],row[13]],
}
}
pprint.pprint(data['Afghanistan']
输出结果为:
{'child_labor': {'female': [9.6, ''], 'male': [11.0, ''], 'total': [10.3, '']},
'child_marriage': {'married_by_15': [15.0, ''], 'married_by_18': [40.4, '']}}
在这里我们引入了pprint
这个库;为了将结果输出好看些;否则在使用print
的时候会输出{'child_labor': {'total': [10.3, ''], 'male': [11.0, ''], 'female': [9.6, '']}, 'child_marriage': {'married_by_15': [15.0, ''], 'married_by_18': [40.4, '']}}
和部方便查看结果!
<font color=red size=5>另外需要注意的地方:row的索引也是从0开始的;</font>
网友评论