The Simplest Web Crawler

Author: 一只失去梦想的程序猿 | Source: published 2017-06-15 11:36, read 146 times

Target site: bt蚂蚁 (btans.com)
Let's search for a bad movie as a test, say 长城: http://www.btans.com/search/%E9%95%BF%E5%9F%8E-first-asc-1
That is simply the percent-encoded form of http://www.btans.com/search/长城-first-asc-1
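The two URLs above differ only in how the keyword is encoded. Here is a minimal sketch, assuming Python 3 and its standard `urllib.parse` module, of how the keyword becomes the `%E9...` form. `build_search_url` and `SEARCH_URL_TEMPLATE` are hypothetical names introduced for illustration, not part of the original code.

```python
from urllib.parse import quote, unquote

# URL pattern taken from the article; the helper name below is hypothetical.
SEARCH_URL_TEMPLATE = 'http://www.btans.com/search/%s-first-asc-1'


def build_search_url(keyword):
    """Percent-encode the keyword as UTF-8 and place it in the search URL."""
    return SEARCH_URL_TEMPLATE % quote(keyword)


encoded_url = build_search_url('长城')
print(encoded_url)           # the percent-encoded form
print(unquote(encoded_url))  # decoded back to the readable form
```

Encoding the keyword explicitly keeps the URL pure ASCII, so what gets printed or logged is exactly what is requested.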
Then the code:

# -*- coding: utf-8 -*-

import requests
from lxml import html


# Example search URL: http://www.btans.com/search/%E9%95%BF%E5%9F%8E-first-asc-1
def analyUrl(name):
    """Return the first search result's title followed by its download links."""
    url = 'http://www.btans.com/search/%s-first-asc-1' % name
    print(url)
    response = requests.get(url).content
    selector = html.fromstring(response)
    # Each search result sits in a <div class="search-item">
    hrefs = selector.xpath('//div[@class="search-item"]')
    sourcelist = []
    if len(hrefs) > 0:
        # Only the first search result is used
        href = hrefs[0]
        title = href.xpath('div[@class="item-title"]/a/span')
        if title:
            sourcelist.append(title[0].text_content())
        # Collect every link in the result's item-bar
        downUrl = href.xpath('div[@class="item-bar"]/a/@href')
        for x in downUrl:
            sourcelist.append(x)
    return sourcelist


def searchFH(name):
    """Join the title and links into one newline-separated string."""
    seedstr = '\n'.join(analyUrl(name))
    return seedstr


print(searchFH('长城'))
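The XPath extraction in `analyUrl` can also be tried offline, without hitting the site. The sketch below runs the same three XPath expressions against a small hand-written HTML fragment. The markup is an assumption: only the class names (`search-item`, `item-title`, `item-bar`) come from the code above, and the real page may be structured differently. `extract_first_result` and `SAMPLE_PAGE` are hypothetical names.

```python
from lxml import html

# Hypothetical markup: only the class names come from the article's XPath
# expressions; the real page structure of the site is not known here.
SAMPLE_PAGE = """
<html><body>
  <div class="search-item">
    <div class="item-title"><a href="/detail/1"><span>Sample Title</span></a></div>
    <div class="item-bar">
      <a href="/download/1">link one</a>
      <a href="/download/2">link two</a>
    </div>
  </div>
</body></html>
"""


def extract_first_result(page_source):
    """Return [title, link, link, ...] for the first search-item, or []."""
    tree = html.fromstring(page_source)
    items = tree.xpath('//div[@class="search-item"]')
    if not items:
        return []
    first = items[0]
    result = []
    titles = first.xpath('div[@class="item-title"]/a/span')
    if titles:
        result.append(titles[0].text_content())
    # '@href' makes XPath return the attribute values as strings
    result.extend(first.xpath('div[@class="item-bar"]/a/@href'))
    return result


first_result = extract_first_result(SAMPLE_PAGE)
print('\n'.join(first_result))
```

Separating parsing from fetching this way makes it possible to check the selectors against a saved page before sending any requests.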

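Learn some Python basics, then write crawlers by following other people's examples, and you will pick it up naturally. At least that's what I think.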

Article link: https://www.haomeiwen.com/subject/limyqxtx.html
