Web Crawler (2): Web Structure

Author: Yang_silin | Published: 2020-11-03 00:21

Hypertext Markup Language (HTML)

HTML is the markup language in which web pages are written: servers send HTML documents to clients such as browsers, which render them. Together with CSS and a scripting language (usually JavaScript), it makes up a website, and it links pictures, text, and other resources through URLs. An HTML document is shaped like a tree made of many nodes, where every tag <...> represents a node. A node nested inside another node is a child node, and the enclosing node is its parent (father) node.
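
To make the tree picture concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It assumes a small, hypothetical, well-formed page (`PAGE`); real web pages are often not valid XML, so a crawler would normally use a forgiving HTML parser instead. The helper `tree_lines` is hypothetical, written only for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical, well-formed page. Real-world HTML is often not valid
# XML, so ElementTree is used here only to show the shape of the tree.
PAGE = """<html>
  <head><title>Demo</title></head>
  <body>
    <div class="container">
      <div class="row"><p>Hello</p></div>
    </div>
  </body>
</html>"""


def tree_lines(node, depth=0):
    """Return one line per element, indented two spaces per tree level."""
    lines = ["  " * depth + node.tag]
    for child in node:  # iterating an Element yields its direct child nodes
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(PAGE)
print("\n".join(tree_lines(root)))

# ElementTree stores no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in root.iter() for child in parent}
p = root.find(".//p")
print(parents[p].get("class"))  # row
```

The root `<html>` is the only node without a parent, which is why it never appears as a key in `parents`.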

Structure

HTML structures a document by marking up text with semantic elements such as <title>, <head>, and <body>.
Figure: an HTML document drawn as a tree (image source: http://lamyoung.com/img/in-post/201911/2019-11-01-tree.png)

Head

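Tag	Meaning
<head>	container for information about the document (metadata)
<title>	title of the document
<base>	base URL used to resolve all relative URLs in the document
<link>	relationship with an external resource, e.g. a stylesheet
<meta>	metadata, e.g. character set or description
<script>	embedded or external script
<style>	style information (CSS)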
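
A crawler often needs only a few of these head tags. The sketch below uses Python's standard `html.parser.HTMLParser` to pull the `<title>` text and the `<meta name=... content=...>` pairs out of a page; the class `HeadInfoParser` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class HeadInfoParser(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False  # True while we are between <title> and </title>

    def handle_starttag(self, tag, attrs):
        # tag names arrive lower-cased; attrs is a list of (name, value) tuples
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.meta[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


PAGE = """<html><head>
<title>Web Structure</title>
<meta charset="utf-8">
<meta name="description" content="Notes on HTML and XPath">
<link rel="stylesheet" href="style.css">
</head><body></body></html>"""

parser = HeadInfoParser()
parser.feed(PAGE)
parser.close()
print(parser.title)  # Web Structure
print(parser.meta)   # {'description': 'Notes on HTML and XPath'}
```

The `<meta charset="utf-8">` tag has no `name` attribute, so it is skipped; extending `handle_starttag` would capture it as well.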

Body

Inside a tag's angle brackets, attributes are written as name="value" pairs. For example, <div class="container"> is a div element whose attribute class has the value container. An element can be located through such an attribute pair.
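
Locating an element through an attribute pair can be sketched with the same standard `HTMLParser`, which tolerates imperfect HTML. `AttrFinder` is a hypothetical helper that records every start tag whose attribute exactly equals the requested value.

```python
from html.parser import HTMLParser


class AttrFinder(HTMLParser):
    """Collect every start tag whose attribute attr_name equals attr_value."""

    def __init__(self, attr_name, attr_value):
        super().__init__()
        self.attr_name = attr_name
        self.attr_value = attr_value
        self.matches = []  # (tag, attribute dict) for each matching element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if attrs.get(self.attr_name) == self.attr_value:
            self.matches.append((tag, attrs))


PAGE = """<body>
  <div class="container">
    <div class="row">A</div>
    <div class="row">B</div>
  </div>
</body>"""

finder = AttrFinder("class", "container")
finder.feed(PAGE)
finder.close()
print(finder.matches)  # [('div', {'class': 'container'})]
```

Because the comparison is exact, an element written as `class="row main"` would not match `("class", "row")`; splitting the class value on whitespace would cover that case.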

XPath

The location of every tag can be described by either an absolute path or a relative path. With XPath, you can easily locate the element you need.

the absolute path

Using the example above, to select the element <div class="row">, we can write /html/body/div/div[@class="row"]. An absolute path starts at the root element <html> and lists every tag on the way down to the target.
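
Python's standard `xml.etree.ElementTree` supports a limited subset of XPath, which is enough to try a path like this on a small hypothetical document. Its paths are evaluated relative to the element you call `findall` on, so starting from the `<html>` root, the path below takes the same steps as the absolute path `/html/body/div/div[@class="row"]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page.
PAGE = """<html>
  <body>
    <div class="container">
      <div class="row">first row</div>
      <div class="column">a column</div>
    </div>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Each step names one level of the tree; the predicate [@class='row']
# keeps only the <div> whose class attribute equals "row".
rows = root.findall("./body/div/div[@class='row']")
print([r.text for r in rows])  # ['first row']
```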

the relative path

Unlike an absolute path, a relative path does not need to spell out the whole route from the topmost tag (such as <body>); it only describes the part of the path around the target tag.
It is written as //div[@class="row"]. Most importantly, it starts with //, which matches the next step at any depth in the document.
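
In ElementTree's XPath subset the leading `//` is written `.//`, meaning "at any depth below the current element". The sketch below, again on a hypothetical page, finds both `row` divs even though they sit under different parents.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with "row" divs under two different parents.
PAGE = """<html>
  <body>
    <div class="container">
      <div class="row">row inside container</div>
    </div>
    <section>
      <div class="row">row inside section</div>
    </section>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# ".//" matches at any depth, regardless of the tags in between;
# results come back in document order.
rows = root.findall(".//div[@class='row']")
print([r.text for r in rows])  # ['row inside container', 'row inside section']
```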

the other expressions

expression	description
.	the current node
..	the parent node
//@name	select every attribute called name, anywhere in the document (e.g. //@lang)
*	match any element node
@*	match any attribute node
//title | //price	select all title elements and all price elements
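
The sketch below tries these expressions on a small hypothetical bookstore document. ElementTree understands `.`, `..`, and `*`, but it cannot select attribute nodes (`@*`, `//@name`) or take unions (`|`), so those three are emulated in plain Python here. For full XPath 1.0, the third-party lxml library provides an `xpath()` method on its elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
PAGE = """<bookstore>
  <book lang="en"><title>XML Basics</title><price>29.99</price></book>
  <book lang="fr"><title>Le Web</title><price>19.50</price></book>
</bookstore>"""

root = ET.fromstring(PAGE)

# "."  the current node: finding "." returns the element itself
print(root.find(".").tag)  # bookstore

# "*"  any element node: every child element of the first <book>
print([e.tag for e in root.find("book").findall("*")])  # ['title', 'price']

# ".."  the parent node: the <book> that holds the "Le Web" title
book = root.find(".//title[.='Le Web']/..")
print(book.get("lang"))  # fr

# The next three expressions are emulated, since ElementTree lacks them.

# "//@lang"  every attribute named lang, anywhere in the document
langs = [e.get("lang") for e in root.iter() if "lang" in e.attrib]
print(langs)  # ['en', 'fr']

# "@*"  every attribute of the current node (here, the first <book>)
print(root.find("book").attrib)  # {'lang': 'en'}

# "//title | //price"  all title and price elements, in document order
union = [e.tag for e in root.iter() if e.tag in ("title", "price")]
print(union)  # ['title', 'price', 'title', 'price']
```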

Article link: https://www.haomeiwen.com/subject/pbghvktx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

Web Crawler (2): Web Structure

Hypertext Makeup Language

Structure

Header

Body

XPATH

the absolute path

the relevant path

the other expressions

相关文章

Web Crawler (2): Web Structure

Web Crawler (1): What is Web-Cra

MAC版IDEA给web项目添加META-INF

500 Lines or Less:A Web Crawler

Idea打包项目war

500 Lines or Less:A Web Crawler

500 lines or less | 异步协程实现的网络爬虫

（GeekBand）系统设计与实践案例分析

Web Information Paper Review (1)

Android相关工具与项目整理

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读