Scraping Python books from the Epubit community (异步社区) and writing them to Excel

Author: 肥宅_Sean | Published 2018-01-29 15:47

Scraping Python books from Epubit
Books on the first page of search results for "Python"

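This time the approach is more sensible: the Epubit search page is downloaded once and saved locally, and later runs parse the saved copy. This speeds up testing, and it also avoids requesting the page so often that it puts a burden on the site's server.
As the saying goes, "even thieves have their code" (盗亦有道); that is roughly the spirit here.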
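The cache-first idea can be isolated into a small helper that touches the network only on a cache miss. This is a minimal sketch, not the author's code: `load_cached` and its `fetch` callback are hypothetical names, and in the real script `fetch` would wrap something like `requests.get(url).text`. A fake fetcher is used here so the behaviour can be checked offline.

```python
import os
import tempfile
from typing import Callable


def load_cached(path: str, fetch: Callable[[], str]) -> str:
    """Return the page cached at `path`, calling `fetch` only if no cache exists (hypothetical helper)."""
    if os.path.exists(path):
        # Cache hit: read the saved HTML, no request is made
        with open(path, encoding='utf-8') as f:
            return f.read()
    # Cache miss: fetch once and save the result for later runs
    html = fetch()
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)
    return html


# Usage: a fake fetcher that records how often the "network" is hit
calls = []


def fake_fetch() -> str:
    calls.append(1)
    return '<html>search results</html>'


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, 'book.txt')
    first = load_cached(cache, fake_fetch)   # miss: fetches and writes the file
    second = load_cached(cache, fake_fetch)  # hit: reads the file, no fetch

print(len(calls))  # 1
```

Passing the fetcher in as a callback keeps the caching logic independent of any HTTP library, which is also what makes it easy to test without contacting the site.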

import os

import requests
import xlwt
from bs4 import BeautifulSoup

# Local cache of the search page; os.path.join works on both Windows and Linux
CACHE_PATH = os.path.join(os.getcwd(), 'book.txt')


def clean(text):
    # Collapse the whitespace runs left by prettify() into single spaces
    return ' '.join(text.split())


def text_of(tag):
    # Some books have no translator etc.; return '' instead of raising AttributeError
    return clean(tag.text) if tag is not None else ''


def getToThetxt(url):
    # Download the search page once and cache it locally
    res = requests.get(url, timeout=10)
    res.encoding = res.apparent_encoding
    soup = BeautifulSoup(res.text, 'lxml')
    # Writing as UTF-8 avoids encoding errors under a non-UTF-8 default locale
    # (e.g. GBK on Chinese Windows), so '\u0142' and '\xa9' need not be stripped
    with open(CACHE_PATH, 'w', encoding='utf-8') as f:
        f.write(soup.prettify())


def getfromtxt():
    # Read the cached HTML back for parsing
    with open(CACHE_PATH, 'r', encoding='utf-8') as f:
        return f.read()


def getBookMeg(html):
    soup = BeautifulSoup(html, 'lxml')
    search = soup.find(attrs={'id': 'search-result'})
    books = search.div.ul.find_all('li', attrs={'class': 'block-item bookList__item'})
    rows = []
    for book in books:
        divs = book.find_all('div')
        # Cover images (divs[0].find('img')) are left for later; only text data is exported
        info = divs[1]
        rows.append([
            clean(info.find('h3').contents[1].string),
            text_of(info.find(attrs={'class': 'bookList__author'})),
            text_of(info.find(attrs={'class': 'bookList__translator'})),
            text_of(info.find(attrs={'class': 'bookList__summary'})),
            text_of(divs[2].find_all('li')[0].find('em').find('del')),
        ])
    # The first argument of xlwt.Workbook is the encoding, not an output path
    work_book = xlwt.Workbook(encoding='utf-8')
    sheet = work_book.add_sheet('sheet1')
    # Row 0 holds the headers; book data starts at row 1
    for col, header in enumerate(["Title", "Author", "Translator", "Summary", "Price"]):
        sheet.write(0, col, header)
    for i, row in enumerate(rows, start=1):
        for col, value in enumerate(row):
            sheet.write(i, col, value)
    work_book.save("book.xls")


if __name__ == "__main__":
    url = "http://www.epubit.com.cn/search?q=python&type=book"
    # Only request the site when there is no cached copy yet
    if not os.path.exists(CACHE_PATH):
        getToThetxt(url)
    getBookMeg(getfromtxt())
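The fields are extracted from prettify() output, which wraps text in newlines and indentation. Deleting every space with `.replace(' ', '')` removes that padding but also glues English words together, whereas splitting on whitespace and rejoining with one space keeps them separate. The title below is a made-up sample, not scraped data.

```python
# Hypothetical cell text as it might appear in prettified HTML
raw = "\n      Python Crash Course\n    "

glued = raw.replace(' ', '').replace('\n', '')  # deletes every space and newline
collapsed = ' '.join(raw.split())                # splits on any whitespace run, rejoins with one space

print(glued)      # PythonCrashCourse
print(collapsed)  # Python Crash Course
```

For Chinese titles, which usually contain no spaces, both approaches give the same result; the difference only shows up on titles and author names written in English.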
