XML Processing in Python 3 (xml.etree.ElementT

Author: 羋学僧 | Published: 2021-12-23 16:07

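    Introduction to XML

    XML (eXtensible Markup Language) is a very common file format designed to transport and store data rather than display it (HTML is for displaying data). XML tags are not predefined; you define your own. From Python 3.3 onward, use the xml.etree.ElementTree module, which picks up its C accelerator automatically when one is available.

    XML format

    (1) Tags/elements
    (2) Attributes
    (3) Data

    For example: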

    <?xml version="1.0" encoding="UTF-8"?>
    <breakfast_menu>
        <food>
            <name>Belgian Waffles</name>
            <price>$5.95</price>
            <description>
                Two of our famous Belgian Waffles with plenty of real maple syrup
            </description>
            <calories>650</calories>
        </food>
        <food>
            <name>Strawberry Belgian Waffles</name>
            <price>$7.95</price>
            <description>
                Light Belgian waffles covered with strawberries and whipped cream
            </description>
            <calories>900</calories>
        </food>
    </breakfast_menu>
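
To make the three parts concrete, here is a minimal sketch that parses a single element and prints each part. The `category` attribute is a hypothetical addition for illustration only; the menu above carries no attributes.

```python
import xml.etree.ElementTree as ET

# The `category` attribute is hypothetical; it does not appear in the menu above.
food = ET.fromstring('<food category="waffle"><name>Belgian Waffles</name></food>')

print(food.tag)                 # (1) tag/element name
print(food.attrib)              # (2) attributes, exposed as a dict
print(food.find("name").text)   # (3) data held by the <name> child
```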
    

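    Reading XML

    There are two ways to load XML. Both give you the root node, here the breakfast_menu element, and from there you can add, delete, modify, and query the document.
    (1) Parse from a string: the argument is an XML string, and the call returns the root Element object directly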

    import xml.etree.ElementTree as ET
    xml_string = '''
    <breakfast_menu>
        <food>
            <name>Belgian Waffles</name>
            <price>$5.95</price>
            <description>
                Two of our famous Belgian Waffles with plenty of real maple syrup
            </description>
            <calories>650</calories>
        </food>
        <food>
            <name>Strawberry Belgian Waffles</name>
            <price>$7.95</price>
            <description>
                Light Belgian waffles covered with strawberries and whipped cream
            </description>
            <calories>900</calories>
        </food>
        <food>
            <name>Berry-Berry Belgian Waffles</name>
            <price>$8.95</price>
            <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
            <calories>900</calories>
        </food>
        <food>
            <name>French Toast</name>
            <price>$4.50</price>
            <description>
                Thick slices made from our homemade sourdough bread
            </description>
            <calories>600</calories>
        </food>
        <food>
            <name>Homestyle Breakfast</name>
            <price>$6.95</price>
            <description>
                Two eggs, bacon or sausage, toast, and our ever-popular hash browns
            </description>
            <calories>950</calories>
        </food>
    </breakfast_menu>
    '''
    root = ET.fromstring(xml_string)
    print(root)  # ==> <Element 'breakfast_menu' at 0x101f94a98>
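
One pitfall worth knowing when parsing from a string: the string above has no XML declaration, so the newline right after the opening triple quote is harmless. If you paste in a document that does start with `<?xml ... ?>`, that leading newline makes the parser reject it. The sketch below, using a shortened menu, shows the failure and one possible fix (stripping the string first).

```python
import xml.etree.ElementTree as ET

# A triple-quoted string that starts on the next line begins with "\n",
# so the declaration is no longer the very first thing in the document.
xml_string = '''
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
    <food><name>Belgian Waffles</name></food>
</breakfast_menu>
'''

try:
    ET.fromstring(xml_string)
    parsed_unstripped = True
except ET.ParseError as exc:
    parsed_unstripped = False
    print("parse failed:", exc)

# Stripping the surrounding whitespace puts the declaration first again.
root = ET.fromstring(xml_string.strip())
print(root.tag)
```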
    

    (2) Read from an XML file: call getroot() on the parsed tree to get the root node, which is also an Element object

    import xml.etree.ElementTree as ET
    tree = ET.parse('xml_test')
    root = tree.getroot()
    print(tree)  # ==> <xml.etree.ElementTree.ElementTree object at 0x104100a20>
    print(root)  # ==> <Element 'breakfast_menu' at 0x101d94a98>, the same kind of result as the first approach
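
If you want to try the file-based approach without creating `xml_test` by hand, the following sketch writes a shortened menu to a temporary file and parses it. The file name `menu.xml` and the shortened content are assumptions made for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml_text = "<breakfast_menu><food><name>French Toast</name></food></breakfast_menu>"

# Write the sample to a temporary file so the example does not depend on 'xml_test'.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "menu.xml")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(xml_text)

    tree = ET.parse(path)   # ElementTree object wrapping the whole document
    root = tree.getroot()   # root Element, the same kind of object fromstring returns

print(type(tree).__name__, root.tag)
```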
    

    Accessing XML elements: tag, attributes (attrib), and value (text)

    (1) Access an Element object's tag, attributes, and value

    tag = element.tag
    attrib = element.attrib  # a dict
    value = element.text
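
A few details of these accessors are easy to trip over, so here is a small sketch on a shortened `food` element: `text` keeps the surrounding whitespace exactly as written in the file, `Element.get` is a safer way to read an attribute that may be absent, and `findtext` reads a child's text in one call. The `category` attribute name is hypothetical.

```python
import xml.etree.ElementTree as ET

food = ET.fromstring(
    "<food>"
    "<name>Belgian Waffles</name>"
    "<description>\n    Two of our famous Belgian Waffles\n</description>"
    "</food>"
)

description = food.find("description")
print(repr(description.text))     # raw text keeps the newlines and indentation
print(description.text.strip())   # cleaned value

# Element.get returns None (or the given default) when the attribute is absent,
# whereas food.attrib["category"] would raise KeyError.
print(food.get("category"))
print(food.get("category", "unknown"))

# findtext fetches the text of the first matching child.
print(food.findtext("name"))
```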
    

    (2) Access child elements and their tag, attributes, and value

    # `for child in root` visits only root's direct children; root.iter() walks root itself and every descendant.
    for child in root:
        print(child, child.tag, child.attrib, child.text)
        for child_child in child:
            print(child_child, child_child.tag, child_child.attrib, child_child.text)
    
    # The output looks similar to:
    
    <Element 'food' at 0x101e94bd8> food {} 
            <Element 'name' at 0x104111cc8> name {} Belgian Waffles
            <Element 'price' at 0x104570278> price {} $5.95
            <Element 'description' at 0x1045702c8> description {} 
                      Two of our famous Belgian Waffles with plenty of real maple syrup
            <Element 'calories' at 0x104570908> calories {} 650
            
    <Element 'food' at 0x10457a9f8> food {} 
            <Element 'name' at 0x10457aa48> name {} Strawberry Belgian Waffles
            <Element 'description' at 0x10457ab38> description {} 
                      Light Belgian waffles covered with strawberries and whipped cream
            <Element 'calories' at 0x10457ab88> calories {} 900
    

    (3) Element objects are iterable, so you can convert one to a list with list(element) or index it directly:

    import xml.etree.ElementTree as ET
    tree = ET.parse('xml_test')
    root = tree.getroot()
    print(list(root))  # ==> [<Element 'food' at 0x101c94bd8>, <Element 'food' at 0x10457aa48>, <Element 'food' at 0x10457ac28>, <Element 'food' at 0x10457ae08>, <Element 'food' at 0x10457af98>]
    print(root[0], root[1])
    

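    As shown above, list(root) is a list of root's five child elements. You can read each one's tag, attributes, and value, convert each child to a list in the same way to reach its own children, or simply iterate with a for loop.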
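
Because an Element behaves like a sequence of its direct children, `len`, negative indexes, and slices work as well. A minimal sketch on a shortened three-item menu (the shortened content is an assumption for the example):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<breakfast_menu>"
    "<food><name>Belgian Waffles</name></food>"
    "<food><name>French Toast</name></food>"
    "<food><name>Homestyle Breakfast</name></food>"
    "</breakfast_menu>"
)

print(len(root))                   # number of direct children
first, last = root[0], root[-1]    # indexing, including negative indexes
print(first[0].text, "|", last[0].text)

names = [food[0].text for food in root[1:]]   # a slice is a plain list of elements
print(names)
```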

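    (4) Access or iterate over elements by tag name

     1. Element.iter("tag"): iterates over the element itself and all of its descendants (Element objects), optionally filtered by tag
        root.iter(): returns an iterator that yields every element in the tree, including the root
        list(root.iter()): returns a list containing all of those elements
        root.iter('name'): returns an iterator that yields every Element whose tag is name
        list(root.iter('name')): returns a list of all elements whose tag is name
     2. Element.findall("tag"): finds the direct children of the current element whose tag is "tag"; the argument is required
     3. Element.find("tag"): finds the first direct child whose tag is "tag"; returns None if there is none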
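
The difference between the three calls is easiest to see side by side. The sketch below uses a shortened two-item menu; `drink` is a hypothetical tag that does not exist in the document. It also shows that `findall` accepts a path such as `food/name` when you need to reach below the direct children.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<breakfast_menu>"
    "<food><name>Belgian Waffles</name></food>"
    "<food><name>French Toast</name></food>"
    "</breakfast_menu>"
)

all_tags = [el.tag for el in root.iter()]             # root first, then every descendant
names_deep = [el.text for el in root.iter("name")]    # matches at any depth
names_direct = root.findall("name")                   # direct children only: none here
names_by_path = [el.text for el in root.findall("food/name")]  # a path reaches grandchildren
first_food = root.find("food")                        # first matching direct child
missing = root.find("drink")                          # hypothetical tag, no match

print(all_tags)
print(names_deep, names_direct, names_by_path)
print(first_food[0].text, missing)
```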
    

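    (5) Modify an XML file

    ElementTree.write("xml_test"): writes the tree back to the XML file
    Element.append(element): adds element as a child of the current element
    Element.set(key, value): sets attribute key to value on the current element
    Element.remove(element): removes element, which must be a direct child of the current element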
    
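    import xml.etree.ElementTree as ET

    # Read the XML file to be modified
    updateTree = ET.parse("xml_test")
    root = updateTree.getroot()
    # Create a new node and append it as a child of root
    newEle = ET.Element("NewElement")
    newEle.attrib = {"name":"NewElement","age":"20"}
    newEle.text = "This is a new element"
    root.append(newEle)
    
    # Set the name attribute on sub1, the first food element
    sub1 = root.find("food")
    sub1.set("name","New Name")
    
    # Change the text of sub2; find returns None when no child has that tag
    # (the breakfast menu has no sub2 element), so check before assigning
    sub2 = root.find("sub2")
    if sub2 is not None:
        sub2.text = "New Value"
    
    # Write back to the original file
    updateTree.write("xml_test")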
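
Element.remove is listed above but not exercised, so here is a minimal in-memory sketch. It assumes a shortened two-item menu, a made-up rule (drop items above 900 calories), and a hypothetical `checked` attribute. It loops over the list returned by findall rather than over `root` itself, since removing children while iterating the element directly can skip items.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<breakfast_menu>"
    "<food><name>Belgian Waffles</name><calories>650</calories></food>"
    "<food><name>Homestyle Breakfast</name><calories>950</calories></food>"
    "</breakfast_menu>"
)

# findall returns a separate list, so removing children inside this loop is safe.
for food in root.findall("food"):
    if int(food.findtext("calories")) > 900:
        root.remove(food)   # food must be a direct child of root

root[0].set("checked", "yes")   # hypothetical attribute name

result = ET.tostring(root, encoding="unicode")
print(result)
```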
    
    # sample.xml
    <data data_attrib="hello xml" data_attrib2="hello xml2">
        <country name="Liechtenstein">
            <rank>1</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria">textqqq</neighbor>
            <neighbor direction="W" name="Switzerland" />
        </country>
        <country name="Singapore">
            <rank>4</rank>
            <year>2011</year>
            <gdppc>59900</gdppc>
            <neighbor direction="N" name="Malaysia" />
        </country>
        <country name="Panama">
            <rank>68</rank>
            <year>2012</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
        <country name="China">
            <rank>8</rank>
            <neighbor direction="E" name="Japan">I am Japan</neighbor>
        </country>
    </data>
    
    import xml.etree.ElementTree as ET

    tree = ET.parse('sample.xml')
    root = tree.getroot()
    for country in root.findall('country'):
        if country.attrib["name"] == "China":
            # find returns the first neighbor child of this country
            neighbor = country.find("neighbor")
            neighbor.text = "I am Japan"
            tree.write('sample.xml')
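
The loop-and-compare lookup above can also be expressed with the attribute predicate that find and findall support, and write accepts `encoding` and `xml_declaration` arguments if you want the output to start with an XML declaration. The sketch below is one possible variant on a trimmed copy of sample.xml; it writes to an in-memory buffer instead of a file so it can run anywhere.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of sample.xml, kept in memory for the example.
root = ET.fromstring(
    '<data><country name="China"><neighbor direction="E" name="Japan" /></country></data>'
)
tree = ET.ElementTree(root)

# [@name='China'] selects the country child whose name attribute equals "China".
country = root.find("country[@name='China']")
country.find("neighbor").text = "I am Japan"

# Pass a file name instead of the buffer to write to disk.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```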
    

Article link: https://www.haomeiwen.com/subject/uughqrtx.html