Python-ElementTree: Processing XML Files

Author: 微雨旧时歌丶 | Published: 2019-05-25 15:56

    https://www.datacamp.com/community/tutorials/python-xml-elementtree

    Data (contents of movies.xml)

    <?xml version='1.0' encoding='utf8'?>
    <collection>
        <genre category="Action">
            <decade years="1980s">
                <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
                    <format multiple="No">DVD</format>
                    <year>1981</year>
                    <rating>PG</rating>
                    <description>
                    'Archaeologist and adventurer Indiana Jones 
                    is hired by the U.S. government to find the Ark of the 
                    Covenant before the Nazis.'
                    </description>
                </movie>
                <movie favorite="True" title="THE KARATE KID">
                   <format multiple="Yes">DVD,Online</format>
                   <year>1984</year>
                   <rating>PG</rating>
                   <description>None provided.</description>
                </movie>
                <movie favorite="False" title="Back 2 the Future">
                   <format multiple="False">Blu-ray</format>
                   <year>1985</year>
                   <rating>PG</rating>
                   <description>Marty McFly</description>
                </movie>
            </decade>
            <decade years="1990s">
                <movie favorite="False" title="X-Men">
                   <format multiple="Yes">dvd, digital</format>
                   <year>2000</year>
                   <rating>PG-13</rating>
                   <description>Two mutants come to a private academy for their kind whose resident superhero team must 
                   oppose a terrorist organization with similar powers.</description>
                </movie>
                <movie favorite="True" title="Batman Returns">
                   <format multiple="No">VHS</format>
                   <year>1992</year>
                   <rating>PG13</rating>
                   <description>NA.</description>
                </movie>
                <movie favorite="False" title="Reservoir Dogs">
                   <format multiple="No">Online</format>
                   <year>1992</year>
                   <rating>R</rating>
                   <description>WhAtEvER I Want!!!?!</description>
                </movie>
            </decade>    
        </genre>
    
        <genre category="Thriller">
            <decade years="1970s">
                <movie favorite="False" title="ALIEN">
                    <format multiple="Yes">DVD</format>
                    <year>1979</year>
                    <rating>R</rating>
                    <description>"""""""""</description>
                </movie>
            </decade>
            <decade years="1980s">
                <movie favorite="True" title="Ferris Bueller's Day Off">
                    <format multiple="No">DVD</format>
                    <year>1986</year>
                    <rating>PG13</rating>
                    <description>Funny movie about a funny guy</description>
                </movie>
                <movie favorite="FALSE" title="American Psycho">
                    <format multiple="No">blue-ray</format>
                    <year>2000</year>
                    <rating>Unrated</rating>
                    <description>psychopathic Bateman</description>
                </movie>
            </decade>
        </genre>
    
        <genre category="Comedy">
            <decade years="1960s">
                <movie favorite="False" title="Batman: The Movie">
                    <format multiple="Yes">DVD,VHS</format>
                    <year>1966</year>
                    <rating>PG</rating>
                    <description>What a joke!</description>
                </movie>
            </decade>
            <decade years="2010s">
                <movie favorite="True" title="Easy A">
                    <format multiple="No">DVD</format>
                    <year>2010</year>
                    <rating>PG--13</rating>
                    <description>Emma Stone = Hester Prynne</description>
                </movie>
                <movie favorite="True" title="Dinner for SCHMUCKS">
                    <format multiple="Yes">DVD,digital,Netflix</format>
                    <year>2011</year>
                    <rating>Unrated</rating>
                    <description>Tim (Rudd) is a rising executive
                     who “succeeds” in finding the perfect guest, 
                     IRS employee Barry (Carell), for his boss’ monthly event, 
                     a so-called “dinner for idiots,” which offers certain 
                     advantages to the exec who shows up with the biggest buffoon.
                     </description>
                </movie>
            </decade>
            <decade years="1980s">
                <movie favorite="False" title="Ghostbusters">
                    <format multiple="No">Online,VHS</format>
                    <year>1984</year>
                    <rating>PG</rating>
                    <description>Who ya gonna call?</description>
                </movie>
            </decade>
            <decade years="1990s">
                <movie favorite="True" title="Robin Hood: Prince of Thieves">
                    <format multiple="No">Blu_Ray</format>
                    <year>1991</year>
                    <rating>Unknown</rating>
                    <description>Robin Hood slaying</description>
                </movie>
            </decade>
        </genre>
    </collection>
    

    Basic operations

    import xml.etree.ElementTree as ET
    tree = ET.parse('movies.xml')
    root = tree.getroot()
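    print(root.tag)     # tag of the root element
    print(root.attrib)  # attributes of the root element, as a dict
    # Loop over the root's direct subelements (children)
    for child in root:
        print(child.tag, child.attrib)
    # List the tag of every element in the tree; this shows neither attributes nor nesting depth
    print([elem.tag for elem in root.iter()])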
    
    # Print the whole XML document
    print(ET.tostring(root, encoding='utf8').decode('utf8'))
    
    # iter() helps find specific elements of interest:
    # root.iter('movie') yields every element in the tree under root, at any depth, whose tag is 'movie'
    for movie in root.iter('movie'):
        print(movie.attrib)
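
    The flat tag list above loses the nesting level and the attributes. A minimal sketch of a recursive walk that keeps both is given here; the helper name walk and the inline SAMPLE string are assumptions made so the snippet runs without movies.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of movies.xml, inlined so the sketch runs on its own.
SAMPLE = """<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID"/>
    </decade>
  </genre>
</collection>"""


def walk(elem, depth=0):
    """Yield (depth, tag, attributes) for elem and all of its descendants."""
    yield depth, elem.tag, dict(elem.attrib)
    for child in elem:
        yield from walk(child, depth + 1)


root = ET.fromstring(SAMPLE)
rows = list(walk(root))
for depth, tag, attrib in rows:
    # Indent each tag by its depth so the hierarchy is visible.
    print("  " * depth + tag, attrib)
```

    Against the real file, the same helper could be called as walk(tree.getroot()).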
    

    XPath Expressions

    # Many elements carry no attributes, only text content; read it from the .text attribute
    for description in root.iter('description'):
        print(description.text)
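
    Several descriptions in the data span multiple lines, and .text is None for an element with no text at all. One possible way to get tidy strings is sketched here; the clean_text helper, the inline sample, and the empty "Untitled" movie are hypothetical and not part of the original tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on movies.xml, inlined for the sketch.
SAMPLE = """<decade years="1980s">
  <movie title="Indiana Jones">
    <description>
      'Archaeologist and adventurer Indiana Jones
      is hired by the U.S. government.'
    </description>
  </movie>
  <movie title="Untitled"><description/></movie>
</decade>"""


def clean_text(elem):
    """Return elem.text with whitespace runs collapsed; '' when text is None."""
    return " ".join((elem.text or "").split())


root = ET.fromstring(SAMPLE)
descriptions = [clean_text(d) for d in root.iter("description")]
print(descriptions)
```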
    
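    # .findall() returns every element matching a path relative to the element it is called on;
    # a bare tag name matches direct children only
    # Example: search the tree for movies released in 1992
    for movie in root.findall("./genre/decade/movie[year='1992']"):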
        print(movie.attrib)
    
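    # Example: you can also search by attribute value, e.g. find the formats flagged as multiple
    for form in root.findall("./genre/decade/movie/format[@multiple='Yes']"):
        print(form.attrib)
    # The loop above prints the attributes of <format>, not of <movie>.
    # Append '/..' to the path to step up to the parent element, the movie.
    for movie in root.findall("./genre/decade/movie/format[@multiple='Yes']/.."):
        print(movie.attrib)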
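
    The three kinds of path used in this section (child-text predicate, attribute predicate, parent step) can be compared side by side on a small inline sample. This is a sketch: the SAMPLE string is a cut-down, hypothetical copy of the data, not the full file.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down copy of movies.xml, inlined for the sketch.
SAMPLE = """<collection>
  <genre category="Action">
    <decade years="1990s">
      <movie title="X-Men">
        <format multiple="Yes">dvd, digital</format>
        <year>2000</year>
      </movie>
      <movie title="Batman Returns">
        <format multiple="No">VHS</format>
        <year>1992</year>
      </movie>
    </decade>
  </genre>
</collection>"""

root = ET.fromstring(SAMPLE)

# [year='1992'] keeps movies that have a <year> child whose text is 1992.
by_year = [m.get("title")
           for m in root.findall("./genre/decade/movie[year='1992']")]

# [@multiple='Yes'] keeps <format> elements by attribute value.
formats = [f.text
           for f in root.findall("./genre/decade/movie/format[@multiple='Yes']")]

# A trailing /.. steps from each matching <format> up to its parent <movie>.
parents = [m.get("title")
           for m in root.findall("./genre/decade/movie/format[@multiple='Yes']/..")]

print(by_year, formats, parents)
```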
    

    Modifying the XML

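    # The titles are messy and need fixing. Example: find the movie titled "Back 2 the Future" and keep it in a variable
    b2tf = root.find("./genre/decade/movie[@title='Back 2 the Future']")
    print(b2tf)
    
    # Changing the title to "Back to the Future" is simple: assign to the attribute directly
    b2tf.attrib["title"] = "Back to the Future"
    print(b2tf.attrib)
    
    # Write the tree back to the file so the change persists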
    tree.write("movies.xml")
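
    When called without an encoding argument, tree.write() emits US-ASCII, so non-ASCII characters such as the curly quotes in the "Dinner for SCHMUCKS" description come out as numeric character references. The sketch below contrasts that with an explicit UTF-8 write. It writes to in-memory buffers rather than movies.xml, and the one-movie document is a made-up stand-in.

```python
import io
import xml.etree.ElementTree as ET

# Made-up one-movie document; \u201c and \u201d are curly quotes.
root = ET.fromstring(
    '<movie title="Back 2 the Future">'
    "<description>a \u201cquoted\u201d word</description>"
    "</movie>"
)
root.set("title", "Back to the Future")  # same effect as assigning to attrib
tree = ET.ElementTree(root)

# Default write: US-ASCII, no XML declaration, non-ASCII text escaped.
ascii_buf = io.BytesIO()
tree.write(ascii_buf)
ascii_bytes = ascii_buf.getvalue()

# Explicit UTF-8 with a declaration keeps the characters readable.
utf8_buf = io.BytesIO()
tree.write(utf8_buf, encoding="utf-8", xml_declaration=True)
utf8_text = utf8_buf.getvalue().decode("utf-8")

print(ascii_bytes)
print(utf8_text)
```

    Both outputs parse back to the same tree; the difference only matters if people read or diff the file.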
    

    Fixing attribute values

    The multiple attribute is incorrect in some places. Use ElementTree to fix the designator based on how many formats the movie comes in. First, print the format attribute and text to see which parts need to be fixed.

    for form in root.findall("./genre/decade/movie/format"):
        print(form.attrib, form.text)
    

    Use a regular expression to check whether a movie has multiple formats (multi-format entries contain a comma)

    import re
    
    for form in root.findall("./genre/decade/movie/format"):
        # Search for the commas in the format text
        match = re.search(',',form.text)
        if match:
            form.set('multiple','Yes')
        else:
            form.set('multiple','No')
    
    # Write out the tree to the file again
    tree.write("movies.xml")
    
    tree = ET.parse('movies.xml')
    root = tree.getroot()
    
    for form in root.findall("./genre/decade/movie/format"):
        print(form.attrib, form.text)
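
    The regular-expression test only asks whether a comma is present. An alternative sketch that counts the comma-separated entries, and also tolerates empty text, might look like this; count_formats and the inline sample are assumptions rather than part of the original tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using format values that appear in movies.xml.
SAMPLE = """<decade years="1980s">
  <movie title="THE KARATE KID"><format multiple="Yes">DVD,Online</format></movie>
  <movie title="Ghostbusters"><format multiple="No">Online,VHS</format></movie>
  <movie title="ALIEN"><format multiple="Yes">DVD</format></movie>
</decade>"""


def count_formats(text):
    """Count non-empty comma-separated entries; None counts as zero."""
    return len([part for part in (text or "").split(",") if part.strip()])


root = ET.fromstring(SAMPLE)
for form in root.iter("format"):
    # More than one entry means the movie comes in multiple formats.
    form.set("multiple", "Yes" if count_formats(form.text) > 1 else "No")

flags = [form.get("multiple") for form in root.iter("format")]
print(flags)
```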
    

    Moving elements

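    Some movies sit under the wrong decade. Use what you have learned about XML and ElementTree to find and fix those errors. Printing the decade tags together with the year tags across the whole document is a useful first step.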

    for decade in root.findall("./genre/decade"):
        print(decade.attrib)
        for year in decade.findall("./movie/year"):
            print(year.text, '\n')
    
    for movie in root.findall("./genre/decade/movie[year='2000']"):
        print(movie.attrib)
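
    Instead of reading the printed years by eye, the mismatch can be detected programmatically. The sketch below assumes decade labels always follow the "1990s" pattern and that every movie has a numeric year; decade_label and the inline sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the Action genre in movies.xml.
SAMPLE = """<genre category="Action">
  <decade years="1990s">
    <movie title="X-Men"><year>2000</year></movie>
    <movie title="Batman Returns"><year>1992</year></movie>
  </decade>
</genre>"""


def decade_label(year_text):
    """Map a year such as '1992' to a decade label such as '1990s'."""
    return f"{int(year_text) // 10 * 10}s"


root = ET.fromstring(SAMPLE)
misplaced = []
for decade in root.findall("./decade"):
    for movie in decade.findall("./movie"):
        expected = decade_label(movie.findtext("year"))
        if expected != decade.get("years"):
            # Record (title, current decade, decade implied by the year).
            misplaced.append((movie.get("title"), decade.get("years"), expected))

print(misplaced)
```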
    

    You have to add a new decade tag, the 2000s, to the Action genre in order to move the X-Men data. The ET.SubElement() function creates this tag and appends it as the last child of the Action genre.

    action = root.find("./genre[@category='Action']")
    new_dec = ET.SubElement(action, 'decade')
    new_dec.attrib["years"] = '2000s'
    
    print(ET.tostring(action, encoding='utf8').decode('utf8'))
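
    Running the snippet above twice would add a second, duplicate 2000s decade. A small guard, sketched here under the assumption that a decade is identified by its years attribute, looks the decade up first and only creates it when it is missing. get_or_add_decade is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up minimal collection with a single Action decade.
root = ET.fromstring(
    '<collection><genre category="Action">'
    '<decade years="1990s"/></genre></collection>'
)


def get_or_add_decade(genre, years):
    """Return the <decade years=...> child of genre, creating it if absent."""
    decade = genre.find(f"./decade[@years='{years}']")
    if decade is None:
        # Extra keyword arguments to SubElement become attributes.
        decade = ET.SubElement(genre, "decade", years=years)
    return decade


action = root.find("./genre[@category='Action']")
first = get_or_add_decade(action, "2000s")
second = get_or_add_decade(action, "2000s")  # reuses the existing element
labels = [d.get("years") for d in action]
print(labels)
```

    The explicit "is None" check matters because a childless element should not be tested for truthiness.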
    

    Now append the X-Men movie to the 2000s and remove it from the 1990s, using .append() and .remove(), respectively.

    xmen = root.find("./genre/decade/movie[@title='X-Men']")
    dec2000s = root.find("./genre[@category='Action']/decade[@years='2000s']")
    dec2000s.append(xmen)
    dec1990s = root.find("./genre[@category='Action']/decade[@years='1990s']")
    dec1990s.remove(xmen)
    
    print(ET.tostring(action, encoding='utf8').decode('utf8'))
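
    The append-then-remove pair needs the old parent to be looked up by hand. Since elements do not keep a reference to their parent, one common approach is to build a child-to-parent map first. The move helper below is a sketch of that idea, not part of the ElementTree API, and the sample is a hypothetical fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the Action genre in movies.xml.
SAMPLE = """<genre category="Action">
  <decade years="1990s">
    <movie title="X-Men"/>
    <movie title="Batman Returns"/>
  </decade>
  <decade years="2000s"/>
</genre>"""


def move(root, elem, new_parent):
    """Detach elem from its current parent under root, then append it to new_parent."""
    # Build a child -> parent lookup for the whole tree.
    parent_map = {child: parent for parent in root.iter() for child in parent}
    parent_map[elem].remove(elem)
    new_parent.append(elem)


root = ET.fromstring(SAMPLE)
xmen = root.find("./decade/movie[@title='X-Men']")
move(root, xmen, root.find("./decade[@years='2000s']"))

layout = {d.get("years"): [m.get("title") for m in d] for d in root}
print(layout)
```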
    
    

    Build XML Documents

    tree.write("movies.xml")
    
    tree = ET.parse('movies.xml')
    root = tree.getroot()
    
    print(ET.tostring(root, encoding='utf8').decode('utf8'))
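
    The same write-then-reparse check also works for a document built from scratch with ET.Element and ET.SubElement. The sketch below writes to a temporary directory so it does not touch movies.xml; the file name movies_check.xml is made up for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a tiny collection in memory instead of parsing an existing file.
root = ET.Element("collection")
genre = ET.SubElement(root, "genre", category="Action")
decade = ET.SubElement(genre, "decade", years="2000s")
movie = ET.SubElement(decade, "movie", favorite="False", title="X-Men")
ET.SubElement(movie, "year").text = "2000"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movies_check.xml")  # hypothetical file name
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    # Parse the file back while it still exists to confirm the round trip.
    reloaded = ET.parse(path).getroot()

titles = [m.get("title") for m in reloaded.iter("movie")]
year = reloaded.findtext("./genre/decade/movie/year")
print(reloaded.tag, titles, year)
```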
    
