Python Practical Project Study Notes 0629

Author: 个十滴水 | Published: 2016-06-29 01:50

    Day one of the practical project: I scraped a local web page.

    The final result looks like this:

    [Screenshot: Paste_Image.png]

    My code:

    from bs4 import BeautifulSoup

    info = []  # one dict per scraped item

    # Parse the local page; the file only needs to stay open while it is read.
    with open('E:/PycharmProjects/homework2/homework2/1_2_homework_required/index.html', 'r') as data:
        Soup = BeautifulSoup(data, 'lxml')

    # Each select() returns every node matching the CSS path, in document order.
    images = Soup.select('body > div > div > div.col-md-9 > div > div > div > img')
    titles = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
    prices = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
    grades = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')
    counts = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
    # print(images, titles, grades, prices, counts)

    # zip() pairs up the nth entry of each list, so one iteration is one item.
    for title, image, price, grade, count in zip(titles, images, prices, grades, counts):
        data1 = {
            'title': title.get_text(),
            'image': image.get('src'),
            'price': price.get_text(),
            # The rating is the number of filled-star spans under this node.
            'grade': len(grade.find_all('span', class_='glyphicon glyphicon-star')),
            'count': count.get_text(),
        }
        print(data1)
        info.append(data1)
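
The script above depends on a local index.html that is not reproduced in these notes. As a rough, self-contained sketch of the same select / zip / dict pattern, the version below runs against a small inline page. The markup in SAMPLE_HTML, the shortened selectors, and the scrape() helper are all invented for illustration and may not match the real homework page; html.parser is used only so the sketch does not need lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this sketch; the real index.html may differ.
SAMPLE_HTML = """
<html><body>
<div class="thumbnail">
  <img src="img/a.jpg">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">EarPod</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">65 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
    </p>
  </div>
</div>
<div class="thumbnail">
  <img src="img/b.jpg">
  <div class="caption">
    <h4 class="pull-right">$64.99</h4>
    <h4><a href="#">Learn Python</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">12 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
    </p>
  </div>
</div>
</body></html>
"""


def scrape(html):
    """Return one dict per item, mirroring the structure of data1 above."""
    soup = BeautifulSoup(html, 'html.parser')
    images = soup.select('div.thumbnail > img')
    titles = soup.select('div.caption > h4 > a')
    prices = soup.select('div.caption > h4.pull-right')
    grades = soup.select('div.ratings > p:nth-of-type(2)')
    counts = soup.select('div.ratings > p.pull-right')

    items = []
    for title, image, price, grade, count in zip(titles, images, prices, grades, counts):
        items.append({
            'title': title.get_text(),
            'image': image.get('src'),
            'price': price.get_text(),
            # Count only filled stars; the "-empty" spans have a different class string.
            'grade': len(grade.find_all('span', class_='glyphicon glyphicon-star')),
            'count': count.get_text(),
        })
    return items


result = scrape(SAMPLE_HTML)
for item in result:
    print(item)
```

Note that zip() stops at the shortest list, so if one selector misses an element on some item, the remaining fields silently shift or drop; printing the list lengths first is a cheap sanity check.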
    

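    Summary

    • There are three parsers to choose from, lxml among them (the other two commonly used with BeautifulSoup are html.parser and html5lib).
    • :nth-child(1) > img pins the selector to one specific child node. To grab every matching element, delete that part or change it to nth-of-type.
    • Steps: 1. parse the page with soup; 2. copy the CSS path (the format must be exact, especially the spaces); 3. filter out the information you need; 4. build a dict and grow the result list with info.append(data1).
    • () is a tuple, [] is a list, {} is a dict.
    • grade vs. grades: grades is the list of matched parent nodes, one per item on the page; grade is a single one of those nodes, and the star spans found under it form a list whose length is the rating.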
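
The points about nth-of-type and about grade vs. grades can be sketched in a few lines. This is only an illustration under assumptions: the HTML string, the item and star class names, and the variable names first_only and star_counts are made up here and do not come from the homework page.

```python
from bs4 import BeautifulSoup

# Hypothetical two-item page, invented for this sketch.
HTML = """
<html><body>
<div class="item"><p>label A</p><p><span class="star"></span><span class="star"></span></p></div>
<div class="item"><p>label B</p><p><span class="star"></span></p></div>
</body></html>
"""

soup = BeautifulSoup(HTML, 'html.parser')

# A path copied from the browser usually pins one element by position.
first_only = soup.select('div.item:nth-of-type(1) > p:nth-of-type(2)')

# Dropping the position on the outer div matches the second <p> of every item.
grades = soup.select('div.item > p:nth-of-type(2)')

# grades is list-like (one parent node per item); each grade is a single node.
star_counts = []
for grade in grades:
    stars = grade.find_all('span', class_='star')  # list of star spans under this node
    star_counts.append(len(stars))

print(len(first_only), len(grades), star_counts)
```

The position on the inner p is kept on purpose: it picks out the second paragraph inside each item, while removing the position on the outer div is what lets the selector range over all items.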


    Article link: https://www.haomeiwen.com/subject/neatjttx.html