1. HTML file format
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>My test page</title>
</head>
<body>
<p>This is my page</p>
</body>
</html>
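2. Generating an HTML document in Python
On the surface, an .html document is many lines of text containing tags, but it can be abstracted as a tree of tags.
Here xml.etree.ElementTree is used to manage the tag tree of an .html document and to convert that tree into the .html text.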
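To make the "tag tree" idea concrete, here is a minimal sketch that builds the sample page from section 1 by hand and prints its nesting. The helper name `outline` is made up for this illustration and is not part of ElementTree.

```python
import xml.etree.ElementTree as et

# Build the tag tree of the sample page from section 1 by hand.
html = et.Element("html")
head = et.SubElement(html, "head")
et.SubElement(head, "meta", attrib={"charset": "utf-8"})
et.SubElement(head, "title").text = "My test page"
body = et.SubElement(html, "body")
et.SubElement(body, "p").text = "This is my page"


def outline(element, depth=0):
    """Hypothetical helper: list the tag names, indented by nesting depth."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(html)))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the .html text.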
2.1 Basic tree structure
import xml.etree.ElementTree as et

class HtmlTree(object):
    doctype_str = "<!DOCTYPE html>"

    def __init__(self):
        # Root <html> element with its two fixed children, <head> and <body>
        self.html_ele = et.Element("html")
        self.head_ele = et.SubElement(self.html_ele, "head")
        self.body_ele = et.SubElement(self.html_ele, "body")
        # <head> always carries the charset declaration and a (still empty) <title>
        self.charset_ele = et.SubElement(self.head_ele, "meta", attrib={"charset": "utf-8"})
        self.title_ele = et.SubElement(self.head_ele, "title")
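2.2 Converting the tree to a string
class HtmlTree(object):
    # ...
    def __str__(self):
        # encoding="unicode" returns a str instead of bytes.
        # method="html" writes empty elements such as <title></title> in full
        # and leaves <meta> without a closing tag, as HTML expects.
        html_str = et.tostring(self.html_ele, encoding="unicode", method="html")
        return self.doctype_str + '\n' + html_str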
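One detail of `et.tostring` is worth checking when the target is HTML rather than XML: the `method` argument. The sketch below serializes the same small `<head>` subtree with the default XML serializer and with `method="html"`. The variable names are only illustrative.

```python
import xml.etree.ElementTree as et

head = et.Element("head")
et.SubElement(head, "meta", attrib={"charset": "utf-8"})
et.SubElement(head, "title")  # no text assigned yet

# Default serializer (method="xml"): empty elements are self-closed.
as_xml = et.tostring(head, encoding="unicode")
# HTML serializer: empty <title> keeps its end tag, <meta> gets none.
as_html = et.tostring(head, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

With the XML serializer an empty title comes out as `<title />`, which browsers do not treat as a closed element, so `method="html"` is the safer choice for a page that may have empty elements.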
2.3 Setting the title
class HtmlTree(object):
    # ...
    def set_title(self, title_str):
        # The title is plain text stored on the <title> element
        self.title_ele.text = title_str
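Because the title is assigned to `.text`, it is treated as character data rather than markup. A minimal sketch of what that means for special characters (the title string is invented for the example):

```python
import xml.etree.ElementTree as et

title_ele = et.Element("title")
# Assigning to .text stores the raw string; nothing is parsed as tags.
title_ele.text = "Tom & Jerry <draft>"

# Escaping happens at serialization time.
serialized = et.tostring(title_ele, encoding="unicode")
print(serialized)
```

So a title can contain `&`, `<` or `>` without breaking the document, and it can never inject tags into `<head>`.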
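2.4 Setting the body
class HtmlTree(object):
    # ...
    def set_body(self, body_str):
        # Wrap the fragment in <body> so it has a single root element.
        # body_str must be well-formed XML, otherwise et.fromstring raises ParseError.
        body_str = "<body>" + body_str + "</body>"
        body_subtree = et.fromstring(body_str)
        # Copy the contents of the parsed body element (see the source of Element.copy())
        self.body_ele.text = body_subtree.text
        self.body_ele.tail = body_subtree.tail
        self.body_ele[:] = body_subtree  # copy the child nodes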
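Putting the pieces from 2.1 to 2.4 together, the following sketch assembles the class in one place and rebuilds the page from section 1. Two things here are assumptions of the sketch rather than part of the snippets above: `__str__` passes `method="html"` to `et.tostring`, and the children are copied with `list(...)`. The last part shows that `set_body` only accepts well-formed XML fragments.

```python
import xml.etree.ElementTree as et


class HtmlTree(object):
    doctype_str = "<!DOCTYPE html>"

    def __init__(self):
        self.html_ele = et.Element("html")
        self.head_ele = et.SubElement(self.html_ele, "head")
        self.body_ele = et.SubElement(self.html_ele, "body")
        self.charset_ele = et.SubElement(self.head_ele, "meta", attrib={"charset": "utf-8"})
        self.title_ele = et.SubElement(self.head_ele, "title")

    def __str__(self):
        # Assumption of this sketch: serialize with the HTML method.
        html_str = et.tostring(self.html_ele, encoding="unicode", method="html")
        return self.doctype_str + "\n" + html_str

    def set_title(self, title_str):
        self.title_ele.text = title_str

    def set_body(self, body_str):
        # Parsing happens first, so a bad fragment leaves the old body untouched.
        body_subtree = et.fromstring("<body>" + body_str + "</body>")
        self.body_ele.text = body_subtree.text
        self.body_ele[:] = list(body_subtree)


tree = HtmlTree()
tree.set_title("My test page")
tree.set_body("<p>This is my page</p>")
page = str(tree)
print(page)

# A fragment that is valid HTML but not well-formed XML is rejected.
try:
    tree.set_body("<p>unclosed")
    parse_failed = False
except et.ParseError:
    parse_failed = True
print("malformed fragment rejected:", parse_failed)
```

The resulting string can be written to a file with an ordinary `open(..., "w", encoding="utf-8")`. Fragments that rely on HTML-only syntax, such as an unclosed `<br>`, have to be written in XML form (`<br />`) before being passed to `set_body`.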