美文网首页
处理Excel文件.md

处理Excel文件.md

作者: Jacklylii | 来源:发表于2019-02-22 23:36 被阅读0次

    Excel文件解析

    1. 访问Excel文件中的工作表:

    import xlrd
    book = xlrd.open_workbook('data/chp4/SOWC 2014 Stat Tables_Table 9.xlsx')
    for sheet in book.sheets():
        print(sheet)
    

    sheet 的时候回输出:<xlrd.sheet.Sheet object at 0x0000017E6B7DEF98> <xlrd.sheet.Sheet object at 0x0000017E6B82C390> 如果需要输出表单的名称则需要用 print(sheet.name)
    这时候就会输出Excel文件中的表单名称:

    Data Notes
    Table 9
    

    2. 访问表单中的数据

    计算总行数

    import xlrd
    book = xlrd.open_workbook('data/chp4/SOWC 2014 Stat Tables_Table 9.xlsx')
    sheet = book.sheet_by_name('Table 9 ')
    print(sheet.nrows)
    

    3. 找到数据中国家出现的位置

    import xlrd
    book = xlrd.open_workbook('data/chp4/SOWC 2014 Stat Tables_Table 9.xlsx')
    sheet = book.sheet_by_name('Table 9 ')
    count = 0
    for i in range(sheet.nrows):
        if count<20:
            row = sheet.row_values(i)
            print(i,row)
        count +=1
    

    我们将会得到下面的输出结果:

    0 ['', 'TABLE 9. CHILD PROTECTION', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    1 ['', '', 'TABLEAU 9. PROTECTION DE L’ENFANT', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''
    , '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    2 ['', '', '', 'TABLA 9. PROTECCIÓN INFANTIL', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
    '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    3 ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
    '', '', '', '', '', '', '', '', '', '']
    4 ['', 'Countries and areas', '', '', 'Child labour (%)+\n2005–2012*', '', '', '', '', '', 'Child marriage (%)\n2005–2012*', '', '', '',
    'Birth registration (%)+\n2005–2012*', '', 'Female genital mutilation/cutting (%)+\n2002–2012*', '', '', '', '', '', 'Justification of wi
    fe beating (%)\n 2005–2012*', '', '', '', 'Violent discipline (%)+\n2005–2012*', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
    '', '', '']
    5 ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'prevalence', '', '', '', 'attitudes', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    6 ['', '', '', '', 'total', '', 'male', '', 'female', '', 'married by 15', '', 'married by 18', '', 'total', '', 'womena ', '', 'girlsb ', '', 'support for the practicec', '', 'male', '', 'female', '', 'total', '', 'male', '', 'female', '', '', '', '', '', '', '', '', '', '',
    '', '', '']
    7 ['', 'FRENCH HEADINGS', 'Pays et zones', '', 'Travail des enfants (%)+\n2005–2012*', '', '', '', '', '', 'Mariage d’enfants (%)\n2005–
    2012*', '', '', '', 'Enregistrement\ndes naissances\n(%)+\n2005–2012*\n', '', 'Mutilations génitales féminines/excision (%)+\n2002–2012*'
    , '', '', '', '', '', 'Justification de la\nviolence conjugale (%)\n2005–2012*', '', '', '', 'Discipline imposée par la violence (%)+\n200
    5–2012*', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    8 ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'prévalence', '', '', '', 'attitudes', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    9 ['', '', '', '', 'total', '', 'garçons', '', 'filles', '', 'marié à 15 ans\n', '', 'marié à 18 ans', '', 'total', '', 'femmes a ', '', 'filles b ', '', 'soutien à cette pratique c', '', 'hommes', '', 'femmes', '', 'total', '', 'garçons', '', 'filles', '', '', '', '', '', '', '', '', '', '', '', '', '']
    10 ['', 'SPANISH HEADINGS', '', 'Países y zonas', 'Trabajo infantil (%)+ 2005–2012*', '', '', '', '', '', 'Matrimonio precoz (%) 2005–201
    2*', '', '', '', 'Inscripción del\nnacimiento (%) + 2005–2012*', '', 'Mutilación/excisión genital (%)+\n2002–2012*', '', '', '', '', '',
    'Justificación de golpear\na la mujer (%)\n2005–2012*', '', '', '', 'Disciplina violenta (%)+ 2005–2012*', '', '', '', '', '', '', '', '
    ', '', '', '', '', '', '', '', '', '']
    11 ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'prevalencia', '', '', '', 'actitudes', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    12 ['', '', '', '', 'total', '', 'hombre', '', 'mujer', '', 'casados a los 15 años', '', 'casados a los 18 años', '', 'total', '', 'Mujeresa ', '', 'hijasb ', '', 'apoyo a la prácticac', '', 'hombre', '', 'mujer', '', 'total', '', 'hombre', '', 'mujer', '', '', '', '', '', '', '', '', '', '', '', '', '']
    13 ['', '', 'FRENCH COUNTRY NAMES', 'SPANISH COUNTRY NAMES', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    14 ['', 'Afghanistan', 'Afghanistan', 'Afganistán', 10.3, '', 11.0, '', 9.6, '', 15.0, '', 40.4, '', 37.4, '', '–', '', '–', '', '–', ''
    , '–', '', 90.2, '', 74.4, '', 74.8, '', 74.1, '', '', '', '', '', '', '', '', '', '', '', '', '']
    15 ['', 'Albania', 'Albanie', 'Albania', 12.0, ' ', 14.4, ' ', 9.4, ' ', 0.2, '', 9.6, '', 98.6, '', '–', '', '–', '', '–', '', 36.4,
     '', 29.8, '', 75.1, '', 78.3, '', 71.4, '', '', '', '', '', '', '', '', '', '', '', '', '']
    16 ['', 'Algeria', 'Algérie', 'Argelia', 4.7, 'y', 5.5, 'y', 3.9, 'y', 0.1, '', 1.8, '', 99.3, '', '–', '', '–', '', '–', '', '–', '',
    67.9, '', 87.7, '', 88.8, '', 86.5, '', '', '', '', '', '', '', '', '', '', '', '', '']
    17 ['', 'Andorra', 'Andorre', 'Andorra', '– ', ' ', '–', ' ', '–', ' ', '–', '', '–', '', 100.0, 'v', '–', '', '–', '', '–', '', '
    –', '', '–', '', '–', '', '–', '', '–', '', '', '', '', '', '', '', '', '', '', '', '', '']
    18 ['', 'Angola', 'Angola', 'Angola', 23.5, 'x', 22.1, 'x', 24.8, 'x', '–', '', '–', '', 36.0, 'x', '–', '', '–', '', '–', '', '–', '
    ', '–', '', '–', '', '–', '', '–', '', '', '', '', '', '', '', '', '', '', '', '', '']
    19 ['', 'Antigua and Barbuda', 'Antigua-et-Barbuda', 'Antigua y Barbuda', '–', ' ', '–', ' ', '–', ' ', '–', '', '–', '', '–', '', '
    –', '', '–', '', '–', '', '–', '', '–', '', '–', '', '–', '', '–', '', '', '', '', '', '', '', '', '', '', '', '', '']
    

    从输出的结果可以看到在第14行的时候输出了国家的数据;这就是我们想要获得是数据;
    在Excel表中可以看到的数据是这样的(其中法语和西班牙语的表头我隐藏了方便查看)

    Countries and areas Child labour (%)+
    2005–2012*  Child marriage (%)
    2005–2012*  Birth registration (%)+
    2005–2012*  Female genital mutilation/cutting (%)+
    2002–2012*  Justification of wife beating (%)
     2005–2012* Violent discipline (%)+
    2005–2012*
    prevalence  attitudes
    total   male    female  married by 15   married by 18   total   womena      girlsb  support for the practicec   male    female  total   male    female
    Afghanistan 10      11      10      15      40      37      –       –       –       –       90      74      75      74   
    Albania 12      14      9       0       10      99      –       –       –       36      30      75      78      71   
    Algeria 5   y   6   y   4   y   0       2       99      –       –       –       –       68      88      89      87   
    Andorra –       –       –       –       –       100 v   –       –       –       –       –       –       –       –    
    Angola  24  x   22  x   25  x   –       –       36  x   –       –       –       –       –       –       –       –    
    

    在表格中需要找到数据所在的位置:可以查看下图的数据

    
    Countries and areas         Child labour (%)+
    2005–2012*  Child marriage (%)
    2005–2012*
    
    
    total   male    female  married by 15   married by 18
    
    Afghanistan Afghanistan Afganistán  10      11      10      15      40   
    
    Albania Albanie Albania 12      14      9       0       10   
    
    Algeria Algérie Argelia 5   y   6   y   4   y   0       2    
    
    import xlrd
    import pprint
    book = xlrd.open_workbook('data/chp4/SOWC 2014 Stat Tables_Table 9.xlsx')
    sheet = book.sheet_by_name('Table 9 ')
    count = 0
    data = {}
    
    for i in range(14,sheet.nrows):
        row = sheet.row_values(i)
        country = row[1]
    
        data[country] = {
            'child_labor':{
                'total':[row[4],row[5]],
                'male':[row[6],row[7]],
                'female':[row[8],row[9]],
            },
            'child_marriage':{
                'married_by_15':[row[10],row[11]],
                'married_by_18':[row[12],row[13]],
            }
        }
    pprint.pprint(data['Afghanistan']
    

    输出结果为:

    {'child_labor': {'female': [9.6, ''], 'male': [11.0, ''], 'total': [10.3, '']},
     'child_marriage': {'married_by_15': [15.0, ''], 'married_by_18': [40.4, '']}}
    

    在这里我们引入了pprint这个库;为了将结果输出好看些;否则在使用print的时候会输出{'child_labor': {'total': [10.3, ''], 'male': [11.0, ''], 'female': [9.6, '']}, 'child_marriage': {'married_by_15': [15.0, ''], 'married_by_18': [40.4, '']}}
    和部方便查看结果!
    <font color=red size=5>另外需要注意的地方:row的索引也是从0开始的;</font>

    相关文章

      网友评论

          本文标题:处理Excel文件.md

          本文链接:https://www.haomeiwen.com/subject/zgtuyqtx.html