Web Crawler (2): Web Structure

Author: Yang_silin | Published 2020-11-03 00:21

    Hypertext Markup Language (HTML)

    HTML is the markup language of the web pages that servers send to computers. Together with CSS for styling and a scripting language (usually JavaScript) for behavior, it makes up a website. It links pictures, text and other resources through URLs. An HTML document is shaped like a tree made of many nodes, and every tag written in angle brackets (<>) represents a node. A node nested inside another node is that node's child node, and the enclosing node is its parent node.
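    To see the tree shape in practice, here is a minimal sketch that uses only Python's standard html.parser module to record each element's depth and parent. The sample page and the class name TreePrinter are made up for illustration, and the closing-tag handling assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they never contain child nodes.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class TreePrinter(HTMLParser):
    """Record every element with its depth and its parent's tag name."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags: the path from the root
        self.nodes = []   # (depth, tag, parent tag or None)

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        self.nodes.append((len(self.stack), tag, parent))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (assumes mostly well-formed HTML).
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

page = """<html><head><title>Demo</title></head>
<body><div class="container"><p>Hello</p><img src="a.png"></div></body></html>"""

parser = TreePrinter()
parser.feed(page)
for depth, tag, parent in parser.nodes:
    print("  " * depth + f"<{tag}>  (parent: {parent})")
```

    The printed indentation mirrors the tree: <head> and <body> are children of <html>, and <p> and <img> are children of the <div>.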

    Structure

    HTML structures a document by marking up text with semantic elements such as <title>, <head> and <body>.
    Figure: an HTML document as a tree (image source: http://lamyoung.com/img/in-post/201911/2019-11-01-tree.png)

    Head

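    Tag         Meaning
    <head>      container for information about the document
    <title>     title of the document
    <base>      default base URL (and link target) for relative URLs in the document
    <link>      relationship to an external resource, such as a stylesheet
    <meta>      metadata, such as the character set or a page description
    <script>    client-side script, embedded or loaded from a file
    <style>     style (CSS) information for the document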
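    A crawler often reads these head elements first. The sketch below, again using only the standard html.parser module, collects the title text and the attributes of <meta> and <link> tags; the class name HeadReader and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class HeadReader(HTMLParser):
    """Collect the title text plus <meta> and <link> attributes from <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title = ""
        self.metas = []   # attribute dicts of <meta> tags
        self.links = []   # attribute dicts of <link> tags

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag == "title":
            self.in_title = True
        elif self.in_head and tag == "meta":
            self.metas.append(dict(attrs))
        elif self.in_head and tag == "link":
            self.links.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data   # text may arrive in several chunks

page = """<html><head>
<meta charset="utf-8">
<title>Web Structure</title>
<link rel="stylesheet" href="style.css">
</head><body><p>content</p></body></html>"""

reader = HeadReader()
reader.feed(page)
print(reader.title)   # Web Structure
print(reader.metas)   # [{'charset': 'utf-8'}]
print(reader.links)   # [{'rel': 'stylesheet', 'href': 'style.css'}]
```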

    Body

    Inside the angle brackets of a start tag, a name such as class is an attribute, and the text after the equals sign is its value. For example, <div class="container"> means that the div element has an attribute class with the value container. An element can be located through such an attribute-value pair.
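    Locating an element by an attribute-value pair can be sketched with the standard html.parser module as below. AttrFinder and the sample page are illustrative assumptions; the match is an exact string comparison, so an element with class="container main" would not match "container".

```python
from html.parser import HTMLParser

class AttrFinder(HTMLParser):
    """Record the start tags whose attribute `name` equals `value` exactly."""

    def __init__(self, name, value):
        super().__init__()
        self.name, self.value = name, value
        self.matches = []   # (tag, attributes) for every matching element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get(self.name) == self.value:
            self.matches.append((tag, attributes))

page = """<body>
<div class="container"><div class="row" id="r1">first</div></div>
<div class="footer">bottom</div>
</body>"""

finder = AttrFinder("class", "container")
finder.feed(page)
print(finder.matches)   # [('div', {'class': 'container'})]
```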

    XPath

    XPath (XML Path Language) describes the location of any tag with either an absolute path or a relative path, which makes it easy to locate the element you need.

    the absolute path

    With the example above, to select the element <div class="row">, we can write the absolute path /html/body/div/div[@class="row"]. It starts with a single / at the root of the document and names every tag on the way down.
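    As a rough sketch, Python's standard xml.etree.ElementTree supports a limited subset of XPath. It needs well-formed markup (real-world HTML often has to be cleaned up or parsed with a library such as lxml first), and it evaluates paths relative to an element, so the absolute path is written starting from the <html> root element. The sample page is made up.

```python
import xml.etree.ElementTree as ET

# A small, well-formed page; ElementTree is an XML parser.
page = """<html>
  <body>
    <div class="container">
      <div class="header">title</div>
      <div class="row">first row</div>
    </div>
  </body>
</html>"""

root = ET.fromstring(page)   # the <html> element

# From the <html> root, /html/body/div/div[@class="row"] becomes:
rows = root.findall("./body/div/div[@class='row']")
print([div.text for div in rows])   # ['first row']
```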

    the relative path

    Unlike the absolute path, the relative path does not need to spell out the whole path starting from the topmost tag (such as <html> or <body>); it only describes the part of the path around the target tag.
    It is written like this: //div[@class="row"]. The key point is that it starts with //, which matches the following step at any depth of the document.
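    The difference can be sketched with xml.etree.ElementTree, whose ".//" prefix plays the role of XPath's leading "//". The sample page, with one row nested inside a <section>, is a hypothetical example.

```python
import xml.etree.ElementTree as ET

page = """<html>
  <body>
    <div class="container">
      <div class="row">first row</div>
      <section>
        <div class="row">nested row</div>
      </section>
    </div>
  </body>
</html>"""

root = ET.fromstring(page)

# ".//" searches all descendants, so the row inside <section> is found too.
rows = root.findall(".//div[@class='row']")
print([div.text for div in rows])     # ['first row', 'nested row']

# The absolute-style path only follows the exact chain of tags it names.
direct = root.findall("./body/div/div[@class='row']")
print([div.text for div in direct])   # ['first row']
```

    The relative form is usually more robust for crawling, since it keeps working when wrapper tags above the target change.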

    the other expressions

    Expression          Description
    .                   the current node
    ..                  the parent node
    //@attribute        select every attribute named attribute
    *                   match any element node
    @*                  match any attribute node
    //title | //price   select all "title" nodes and all "price" nodes
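    A sketch of these expressions with xml.etree.ElementTree follows. ElementTree supports ".", "..", and "*", but its paths can only return elements, and it has no "|" union, so the attribute and union rows are emulated in plain Python; a full XPath 1.0 engine (for example lxml) would accept them directly. The bookstore sample is made up.

```python
import xml.etree.ElementTree as ET

page = """<bookstore>
  <book lang="en"><title>XPath Basics</title><price>30</price></book>
  <book lang="zh"><title>Web Crawling</title><price>45</price></book>
</bookstore>"""

root = ET.fromstring(page)
first_book = root.find("book")   # the current node for the examples below

# "." : the current node, used as the starting point of a path
print(first_book.find("./price").text)                      # 30

# "*" : any child element node
print([e.tag for e in first_book.findall("*")])             # ['title', 'price']

# ".." : the parent node (ElementTree resolves it only inside the element
# findall() was called on, so we search from the root here)
print([b.get("lang") for b in root.findall(".//title/..")])  # ['en', 'zh']

# "//@lang" and "@*" select attributes; read them from each element instead.
print([e.get("lang") for e in root.iter() if "lang" in e.attrib])  # ['en', 'zh']
print(first_book.attrib)                                    # {'lang': 'en'}

# "//title | //price" : emulate the union in document order.
union = [e.text for e in root.iter() if e.tag in ("title", "price")]
print(union)   # ['XPath Basics', '30', 'Web Crawling', '45']
```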

          Article link: https://www.haomeiwen.com/subject/pbghvktx.html