Key Points Summary: Writing a Crawler's Entry Point and Looping over URLs

Author: NoValue | Source: published 2017-05-18 22:56, read 180 times


Let's go straight to the code:

# -*- coding:utf-8 -*-
# **********************************
# ** http://weibo.com/lixiaodaoaaa #
# ****** by:lixiaodaoaaa ***********


from bs4 import BeautifulSoup
import sys
import requests
import time


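def detailOper(url):
    # Download one listing page and parse it with the lxml parser.
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    # Each listing has a title link and a price element.
    titles = soup.select('div.list > ul > li > div > p.infoBox > a')
    prices = soup.select('div.list > ul > li > div > p.priType-s > span > i')
    print("Opening URL: " + url)

    # Pair each title with its price; zip stops at the shorter list.
    for title, price in zip(titles, prices):
        data = {
            # Keep the title as str: on Python 3, encoding it to UTF-8
            # would make print() show a bytes literal instead of the text.
            'title': title.get_text(),
            'detailHref': title.get('href'),
            # Strip the unit character 万 (ten thousand) and any spaces.
            'price': price.get_text().replace(u'万', '').replace(' ', '')
        }
        print(data['title'])
        print(data['detailHref'])
        print(data['price'])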

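def start():
    # Build the URLs for listing pages 1 through 29; range(1, 30, 1) stops before 30.
    urls = ['http://www.guazi.com/tj/buy/o{}/'.format(str(i)) for i in range(1, 30, 1)]
    for url in urls:
        # Wait 5 seconds before each request to keep the request rate low.
        time.sleep(5)
        detailOper(url)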

if __name__ == '__main__':
    start()

First, look at this line:

if __name__ == '__main__':

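Explanation: this condition is true only when the file is run directly as a script (for example with the python command), not when it is imported as a module. When it is true, the script calls start(), which kicks off the crawl.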
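To make the behavior of the guard concrete, here is a minimal sketch. The helper describe_run_mode and the module name spider are hypothetical names introduced only for illustration, and start() here is a stand-in rather than the crawler's real entry function.

```python
def start():
    """Stand-in for the crawler's entry function (hypothetical body)."""
    return "crawler started"


def describe_run_mode(module_name):
    """Describe how a module with this __name__ value was loaded."""
    if module_name == '__main__':
        return "run directly"
    return "imported as " + module_name


if __name__ == '__main__':
    # Reached only when this file is executed as a script;
    # skipped when another module imports this file.
    print(describe_run_mode(__name__))
    print(start())
```

Python sets __name__ to '__main__' for the file being executed and to the module's own name for anything that is imported, which is why the guard keeps the crawl from starting on import.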

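We often need to visit page 1, page 2, page 3, and so on of the same URL in a loop. How do we do that? Take http://www.guazi.com/tj/buy/o1, o2, o3 as an example: only the trailing number changes, and the prefix stays the same. The set of page URLs is simply a list, and it can be built in a single line: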

  urls = ['http://www.guazi.com/tj/buy/o{}/'.format(str(i)) for i in range(1, 30, 1)]

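The result is an ordinary list, so it can be iterated with a for loop, as start() does. Note that range(1, 30, 1) yields 1 through 29, so the list covers 29 pages. That concludes the summary.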
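The same idea can be sketched as a small reusable function. The names BASE_URL and build_page_urls are assumptions introduced for this example; only the URL pattern comes from the code above.

```python
# URL pattern from the article; {} is replaced by the page number.
BASE_URL = 'http://www.guazi.com/tj/buy/o{}/'


def build_page_urls(first_page, last_page):
    """Return the listing URLs for first_page..last_page, inclusive."""
    # range() excludes its stop value, hence last_page + 1.
    return [BASE_URL.format(page) for page in range(first_page, last_page + 1)]


urls = build_page_urls(1, 3)
for url in urls:
    print(url)
```

Passing an inclusive last page avoids the off-by-one surprise of range(): build_page_urls(1, 29) produces the same 29 URLs as the one-line comprehension.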

Article link: https://www.haomeiwen.com/subject/iceexxtx.html
