https://www.datacamp.com/community/tutorials/python-xml-elementtree
Data (movies.xml)
<?xml version='1.0' encoding='utf8'?>
<collection>
<genre category="Action">
<decade years="1980s">
<movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
<format multiple="No">DVD</format>
<year>1981</year>
<rating>PG</rating>
<description>
'Archaeologist and adventurer Indiana Jones
is hired by the U.S. government to find the Ark of the
Covenant before the Nazis.'
</description>
</movie>
<movie favorite="True" title="THE KARATE KID">
<format multiple="Yes">DVD,Online</format>
<year>1984</year>
<rating>PG</rating>
<description>None provided.</description>
</movie>
<movie favorite="False" title="Back 2 the Future">
<format multiple="False">Blu-ray</format>
<year>1985</year>
<rating>PG</rating>
<description>Marty McFly</description>
</movie>
</decade>
<decade years="1990s">
<movie favorite="False" title="X-Men">
<format multiple="Yes">dvd, digital</format>
<year>2000</year>
<rating>PG-13</rating>
<description>Two mutants come to a private academy for their kind whose resident superhero team must
oppose a terrorist organization with similar powers.</description>
</movie>
<movie favorite="True" title="Batman Returns">
<format multiple="No">VHS</format>
<year>1992</year>
<rating>PG13</rating>
<description>NA.</description>
</movie>
<movie favorite="False" title="Reservoir Dogs">
<format multiple="No">Online</format>
<year>1992</year>
<rating>R</rating>
<description>WhAtEvER I Want!!!?!</description>
</movie>
</decade>
</genre>
<genre category="Thriller">
<decade years="1970s">
<movie favorite="False" title="ALIEN">
<format multiple="Yes">DVD</format>
<year>1979</year>
<rating>R</rating>
<description>"""""""""</description>
</movie>
</decade>
<decade years="1980s">
<movie favorite="True" title="Ferris Bueller's Day Off">
<format multiple="No">DVD</format>
<year>1986</year>
<rating>PG13</rating>
<description>Funny movie about a funny guy</description>
</movie>
<movie favorite="FALSE" title="American Psycho">
<format multiple="No">blue-ray</format>
<year>2000</year>
<rating>Unrated</rating>
<description>psychopathic Bateman</description>
</movie>
</decade>
</genre>
<genre category="Comedy">
<decade years="1960s">
<movie favorite="False" title="Batman: The Movie">
<format multiple="Yes">DVD,VHS</format>
<year>1966</year>
<rating>PG</rating>
<description>What a joke!</description>
</movie>
</decade>
<decade years="2010s">
<movie favorite="True" title="Easy A">
<format multiple="No">DVD</format>
<year>2010</year>
<rating>PG--13</rating>
<description>Emma Stone = Hester Prynne</description>
</movie>
<movie favorite="True" title="Dinner for SCHMUCKS">
<format multiple="Yes">DVD,digital,Netflix</format>
<year>2011</year>
<rating>Unrated</rating>
<description>Tim (Rudd) is a rising executive
who “succeeds” in finding the perfect guest,
IRS employee Barry (Carell), for his boss’ monthly event,
a so-called “dinner for idiots,” which offers certain
advantages to the exec who shows up with the biggest buffoon.
</description>
</movie>
</decade>
<decade years="1980s">
<movie favorite="False" title="Ghostbusters">
<format multiple="No">Online,VHS</format>
<year>1984</year>
<rating>PG</rating>
<description>Who ya gonna call?</description>
</movie>
</decade>
<decade years="1990s">
<movie favorite="True" title="Robin Hood: Prince of Thieves">
<format multiple="No">Blu_Ray</format>
<year>1991</year>
<rating>Unknown</rating>
<description>Robin Hood slaying</description>
</movie>
</decade>
</genre>
</collection>
Basic operations
import xml.etree.ElementTree as ET
tree = ET.parse('movies.xml')
root = tree.getroot()
root.tag  # tag of the root element
root.attrib  # attributes of the root element, as a dict
# Loop over the root's direct subelements (children)
for child in root:
    print(child.tag, child.attrib)
# List the tag of every element in the tree; this shows neither the attributes nor the nesting level
[elem.tag for elem in root.iter()]
# Print the whole XML document
print(ET.tostring(root, encoding='utf8').decode('utf8'))
# iter() helps find specific elements of interest:
# root.iter('movie') yields every <movie> element below root, at any depth
for movie in root.iter('movie'):
    print(movie.attrib)
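The same calls can be tried without a file on disk. A minimal sketch, assuming a cut-down copy of the data above inlined as a string (SAMPLE is a made-up name, not part of the tutorial), parsed with ET.fromstring():

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of movies.xml, inlined so the example runs without a file.
SAMPLE = """
<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID">
        <format multiple="Yes">DVD,Online</format>
        <year>1984</year>
      </movie>
    </decade>
  </genre>
  <genre category="Comedy">
    <decade years="1980s">
      <movie favorite="False" title="Ghostbusters">
        <format multiple="No">Online,VHS</format>
        <year>1984</year>
      </movie>
    </decade>
  </genre>
</collection>
"""

# fromstring() returns the root Element directly (no ElementTree wrapper).
root = ET.fromstring(SAMPLE)

# Iterating an element visits only its direct children, here the <genre> elements.
genres = [(child.tag, child.attrib) for child in root]

# iter('movie') walks the whole subtree and yields matches at any depth.
titles = [movie.get("title") for movie in root.iter("movie")]

print(root.tag)
print(genres)
print(titles)
```

Looping over root stops at the first level, while iter() goes all the way down; that is why the first list holds only genres and the second reaches the movies.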
XPath Expressions
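# Many elements have no attributes, only text content; read it with the .text attribute
for description in root.iter('description'):
    print(description.text)
# findall() returns the elements matching a path, evaluated relative to the element it is called on
# Example: find the movies released in 1992
for movie in root.findall("./genre/decade/movie[year='1992']"):
    print(movie.attrib)
# Example: search by attribute value, e.g. entries flagged as available in multiple formats.
# This path matches the <format> elements, so it prints their attributes, not the movies'
for form in root.findall("./genre/decade/movie/format[@multiple='Yes']"):
    print(form.attrib)
# Append '/..' to the path to step up to the parent element, here the <movie>
for movie in root.findall("./genre/decade/movie/format[@multiple='Yes']/.."):
    print(movie.attrib)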
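The three kinds of path used here (child-text predicate, attribute predicate, parent step) can be compared side by side. A small sketch on an inlined sample; SAMPLE and the variable names are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-movie sample, shaped like the Action/1990s part of movies.xml.
SAMPLE = """
<collection>
  <genre category="Action">
    <decade years="1990s">
      <movie favorite="True" title="Batman Returns">
        <format multiple="No">VHS</format>
        <year>1992</year>
      </movie>
      <movie favorite="False" title="X-Men">
        <format multiple="Yes">dvd, digital</format>
        <year>2000</year>
      </movie>
    </decade>
  </genre>
</collection>
"""
root = ET.fromstring(SAMPLE)

# [year='1992'] keeps the <movie> elements that have a <year> child with that text.
from_1992 = [m.get("title") for m in root.findall("./genre/decade/movie[year='1992']")]

# [@multiple='Yes'] filters on an attribute; the matches are <format> elements.
format_tags = [f.tag for f in root.findall("./genre/decade/movie/format[@multiple='Yes']")]

# A trailing '/..' steps from each matching <format> up to its parent <movie>.
multi_format = [
    m.get("title")
    for m in root.findall("./genre/decade/movie/format[@multiple='Yes']/..")
]

print(from_1992, format_tags, multi_format)
```

ElementTree implements only a subset of XPath, so it is worth checking the supported syntax in the library documentation before relying on anything more elaborate.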
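Modifying XML
# The titles are messy and need fixing. Example: find the movie titled "Back 2 the Future" and keep it in a variable
b2tf = root.find("./genre/decade/movie[@title='Back 2 the Future']")
print(b2tf)
# Changing the title to "Back to the Future" is simple: assign a new value to the attribute
b2tf.attrib["title"] = "Back to the Future"
print(b2tf.attrib)
# Write the tree back to the file to make the change permanent
tree.write("movies.xml")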
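find() returns None when nothing matches, so a guard is useful before editing. A minimal sketch of the same edit with a round trip through an in-memory buffer instead of movies.xml (the buffer is only an assumption to keep the example self-contained):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-movie document standing in for movies.xml.
root = ET.fromstring(
    '<collection><genre category="Action"><decade years="1980s">'
    '<movie favorite="False" title="Back 2 the Future"/>'
    '</decade></genre></collection>'
)
tree = ET.ElementTree(root)

movie = root.find("./genre/decade/movie[@title='Back 2 the Future']")
if movie is not None:  # find() gives None if no element matches the path
    movie.set("title", "Back to the Future")  # same effect as movie.attrib["title"] = ...

# Serialize to bytes in memory, then parse again to confirm the change was written.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
buffer.seek(0)
reloaded = ET.parse(buffer).getroot()
new_title = reloaded.find("./genre/decade/movie").get("title")

print(new_title)
```

With a real file, tree.write("movies.xml") overwrites the original, so keeping a backup copy first may be worthwhile.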
Fixing attribute values
The multiple attribute is incorrect in some places. Use ElementTree to fix the designator based on how many formats the movie comes in. First, print the format attribute and text to see which parts need to be fixed.
for form in root.findall("./genre/decade/movie/format"):
    print(form.attrib, form.text)
Use a regular expression to check whether an entry lists multiple formats (multiple formats are separated by a comma)
import re
for form in root.findall("./genre/decade/movie/format"):
    # Search for a comma in the format text
    match = re.search(',', form.text)
    if match:
        form.set('multiple', 'Yes')
    else:
        form.set('multiple', 'No')
# Write out the tree to the file again
tree.write("movies.xml")
# Re-read the file and check that the attributes now agree with the text
tree = ET.parse('movies.xml')
root = tree.getroot()
for form in root.findall("./genre/decade/movie/format"):
    print(form.attrib, form.text)
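A regular expression is not strictly needed for a comma check. As an alternative sketch (not the tutorial's method), the text can be split on commas and the non-empty pieces counted, which also tolerates an empty <format> element; has_multiple_formats is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with the same kinds of inconsistencies as movies.xml.
root = ET.fromstring(
    "<decade>"
    '<movie title="A"><format multiple="No">Online,VHS</format></movie>'
    '<movie title="B"><format multiple="Yes">DVD</format></movie>'
    '<movie title="C"><format multiple="False">dvd, digital</format></movie>'
    "</decade>"
)

def has_multiple_formats(text):
    """Hypothetical helper: True when comma-separated text names two or more formats."""
    parts = [part.strip() for part in (text or "").split(",")]
    return len([part for part in parts if part]) > 1

# Normalize every 'multiple' attribute to exactly "Yes" or "No".
for form in root.iter("format"):
    form.set("multiple", "Yes" if has_multiple_formats(form.text) else "No")

result = {m.get("title"): m.find("format").get("multiple") for m in root.iter("movie")}
print(result)
```

Counting pieces rather than searching for a comma means a stray trailing comma such as "DVD," is still treated as a single format.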
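Moving elements
Some of the data has been placed in the wrong decade. Use what you have learned about XML and ElementTree to find and fix the decade errors. Printing the decade tags and the year tags throughout the document is a useful first step.
for decade in root.findall("./genre/decade"):
    print(decade.attrib)
    for year in decade.findall("./movie/year"):
        print(year.text, '\n')
# The movies from the year 2000 are the ones sitting in the wrong decade; list them
for movie in root.findall("./genre/decade/movie[year='2000']"):
    print(movie.attrib)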
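Instead of reading the printout by eye, the comparison between a decade label and its movies' years can be automated. A rough sketch, assuming every years attribute looks like "1990s" (four digits plus "s") and every movie has a numeric <year>; find_misplaced and SAMPLE are made-up names:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample containing the two misfiled year-2000 movies.
SAMPLE = """
<collection>
  <genre category="Action">
    <decade years="1990s">
      <movie title="X-Men"><year>2000</year></movie>
      <movie title="Batman Returns"><year>1992</year></movie>
    </decade>
  </genre>
  <genre category="Thriller">
    <decade years="1980s">
      <movie title="American Psycho"><year>2000</year></movie>
    </decade>
  </genre>
</collection>
"""

def find_misplaced(root):
    """Hypothetical helper: list (title, year, decade label) for movies outside their decade."""
    misplaced = []
    for decade in root.iter("decade"):
        label = decade.get("years")
        start = int(label[:4])  # assumes labels such as "1990s"
        for movie in decade.findall("movie"):
            year = int(movie.findtext("year"))  # assumes a numeric <year> child
            if not (start <= year <= start + 9):
                misplaced.append((movie.get("title"), year, label))
    return misplaced

misplaced = find_misplaced(ET.fromstring(SAMPLE))
print(misplaced)
```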
You have to add a new decade tag, the 2000s, to the Action genre in order to move the X-Men data. The ET.SubElement() function creates the new element and appends it as the last child of the Action genre.
action = root.find("./genre[@category='Action']")
new_dec = ET.SubElement(action, 'decade')
new_dec.attrib["years"] = '2000s'
print(ET.tostring(action, encoding='utf8').decode('utf8'))
Now append the X-Men movie to the 2000s and remove it from the 1990s, using .append() and .remove(), respectively.
xmen = root.find("./genre/decade/movie[@title='X-Men']")
dec2000s = root.find("./genre[@category='Action']/decade[@years='2000s']")
dec2000s.append(xmen)
dec1990s = root.find("./genre[@category='Action']/decade[@years='1990s']")
dec1990s.remove(xmen)
print(ET.tostring(action, encoding='utf8').decode('utf8'))
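The create, append and remove steps can be wrapped into one function. A sketch under stated assumptions: move_movie is a hypothetical helper, it works within a single <genre>, and the title must not contain an apostrophe because it is pasted into the path predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down Action genre with X-Men filed under the wrong decade.
genre = ET.fromstring(
    '<genre category="Action">'
    '<decade years="1990s">'
    '<movie title="X-Men"><year>2000</year></movie>'
    '<movie title="Batman Returns"><year>1992</year></movie>'
    '</decade>'
    '</genre>'
)

def move_movie(genre, title, target_years):
    """Hypothetical helper: move the titled movie into the given decade of this genre."""
    target = genre.find(f"./decade[@years='{target_years}']")
    if target is None:
        # SubElement() creates the decade and appends it as the genre's last child.
        target = ET.SubElement(genre, "decade", {"years": target_years})
    for decade in genre.findall("decade"):
        if decade is target:
            continue
        movie = decade.find(f"./movie[@title='{title}']")
        if movie is not None:
            decade.remove(movie)   # detach from the old decade
            target.append(movie)   # attach to the new one
            return True
    return False

moved = move_movie(genre, "X-Men", "2000s")
layout = {d.get("years"): [m.get("title") for m in d] for d in genre.findall("decade")}
print(moved, layout)
```

An element appended to a second parent is not copied, so without the remove() call the same movie would appear under both decades.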
Build XML Documents
tree.write("movies.xml")
tree = ET.parse('movies.xml')
root = tree.getroot()
print(ET.tostring(root, encoding='utf8').decode('utf8'))