Python Practice Plan, Week 1 Homework: 1.2 Parsing a Web Page

Author: chudi1245 | Published 2016-08-07 15:11

    This exercise uses Python to read a local web page and parse out its content.

    The web page to parse

    Implementation

    from bs4 import BeautifulSoup
    
    info = []  # one dict per product on the page
    with open('/Users/Trudy/Desktop/plan-for-combating/week1/1_2/1_2answer_of_homework/index.html', 'r') as wb_data:
        soup = BeautifulSoup(wb_data, 'lxml')
        images = soup.select(
            "body > div > div > div.col-md-9 > div > div > div > img")
        prices = soup.select(
            "body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right")
        titles = soup.select(
            "body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a")
        stars = soup.select(
            "body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)")
        reviews = soup.select(
            "body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right")
    
    for image,price,title,star,review in zip(images,prices,titles,stars,reviews):
        data={
            'image':image.get('src'),  # an <img> tag has no text; its URL is in the src attribute
            'price':price.get_text(),
            'title':title.get_text(),
            'star':len(star.find_all("span","glyphicon glyphicon-star")),
            'review':review.get_text()
        }
        info.append(data)
    
    for i in info:
        print(i['title'],i['price'],i['image'],i['review'],i['star'])
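
The same extraction can be tried without the local file. The following is a minimal sketch, not the homework page itself: `SAMPLE_HTML` is hypothetical markup that only imitates the class names used in the selectors above, `parse_card` is an illustrative helper name, and the built-in `html.parser` is swapped in for `lxml` so that nothing beyond `bs4` is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a single product card. The real index.html is not
# shown in this post; only the class names are taken from the selectors above.
SAMPLE_HTML = """
<div class="thumbnail">
  <img src="img/sample.jpg" alt="">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">Sample Product</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">65 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
    </p>
  </div>
</div>
"""


def parse_card(html):
    """Return the fields of the first product card found in html."""
    soup = BeautifulSoup(html, "html.parser")
    card = soup.select_one("div.thumbnail")
    return {
        # The image URL lives in the src attribute, not in the tag's text.
        "image": card.select_one("img").get("src"),
        "price": card.select_one("div.caption > h4.pull-right").get_text(),
        "title": card.select_one("div.caption > h4 > a").get_text(),
        # span.glyphicon-star matches filled stars only; the empty stars
        # carry the different class glyphicon-star-empty.
        "star": len(card.select("div.ratings span.glyphicon-star")),
        "review": card.select_one("div.ratings > p.pull-right").get_text(),
    }


card = parse_card(SAMPLE_HTML)
print(card["title"], card["price"], card["image"], card["review"], card["star"])
```

Working card by card like this keeps the fields of one product together, whereas zipping five separate lists silently misaligns them if any product lacks one of the elements.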
    
    

    Summary:

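    • p:nth-of-type(2) selects every p element that is the second p element of its parent.
    • find_all() searches all descendant tags of the current tag and returns those that match the filter. Here are a few examples: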
    soup.find_all("title")
    #[<title>The Dormouse's story</title>]
    soup.find_all("p", "title")
    #[<p class="title"><b>The Dormouse's story</b></p>]
    soup.find_all("a")
    #[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    #<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
    #<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    soup.find_all(id="link2")
    #[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
    import re
    soup.find(string=re.compile("sisters"))
    # u'Once upon a time there were three little sisters; and their names were\n'
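
The two points in the summary can be combined in one short sketch. The HTML below is hypothetical and only mimics the `div.ratings` structure assumed by the selectors in the implementation; `html.parser` is used here instead of `lxml` to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each ratings block holds two <p> elements,
# the review count first and the star icons second.
html = """
<div class="ratings">
  <p class="pull-right">15 reviews</p>
  <p><span class="glyphicon glyphicon-star"></span></p>
</div>
<div class="ratings">
  <p class="pull-right">6 reviews</p>
  <p><span class="glyphicon glyphicon-star"></span><span class="glyphicon glyphicon-star"></span></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# p:nth-of-type(2) keeps, for every parent, only its second <p> child.
second_ps = soup.select("div.ratings > p:nth-of-type(2)")

# A string passed as the second argument of find_all() is treated as a CSS
# class filter; a multi-class string must equal the class attribute exactly.
star_counts = [len(p.find_all("span", "glyphicon glyphicon-star")) for p in second_ps]
print(star_counts)
```

Because the multi-class form matches the exact attribute string, it would miss a span whose classes were written in a different order; filtering on the single class `glyphicon-star` is the more tolerant choice when the markup is not under your control.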
    

          Article link: https://www.haomeiwen.com/subject/xkslsttx.html