Python 3 Crawler Example (3): Scraping the 就爱广场舞 Dance Team List with bs4 and Saving It

Author: leelian | Published: 2018-11-20 11:50

    Tested with:
    Python 3.7.0
    The complete code is as follows:

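    # -*- coding: utf-8 -*-
    """
    @author:lee
    @create_time:2018/10/25 14:41
    """
    import csv

    import requests
    from bs4 import BeautifulSoup


    def gethtml(url, headers):
        """Fetch the page and return its text, or None if the request fails."""
        try:
            # The request itself must sit inside the try block,
            # otherwise network errors are never caught.
            response = requests.get(url, headers=headers, timeout=10)
            if response.status_code == 200:
                # Set the encoding before response.text is decoded
                response.encoding = 'utf-8'
                print('Page fetched, length:', len(response.text))
                return response.text
            print('Unexpected status code:', response.status_code)
        except requests.RequestException as e:
            print('Error while fetching the page:', e)
        return None


    def get_list(html, names):
        """Append the text of every link inside #zimu_content to names."""
        soup = BeautifulSoup(html, 'lxml')
        for zimu in soup.find_all(attrs={'id': 'zimu_content'}):
            for a in zimu.find_all('a'):
                # a.string is None when the tag has nested children,
                # so get_text() is used instead
                text = a.get_text(strip=True)
                if text:
                    names.append(text)
        print(names)


    def write_file(data):
        """Write one team name per row to wudui.csv."""
        # utf-8-sig lets Excel recognize the file as UTF-8
        with open('wudui.csv', 'w', newline='', encoding='utf-8-sig') as f:
            writer = csv.writer(f)  # create the writer once, not per row
            for name in data:
                # Wrap the name in a list so it stays in a single cell
                writer.writerow([name])


    if __name__ == '__main__':
        url = 'http://www.9igcw.com/wudui/'
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36'}
        names = []  # renamed from "list" to avoid shadowing the built-in
        html = gethtml(url, headers)
        if html:
            get_list(html, names)
            write_file(names)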
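
The parsing step can be tried offline, without hitting the site. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup that imitates a container with `id="zimu_content"` (the real page structure may differ), `extract_names` is a hypothetical helper name, and the built-in `html.parser` is swapped in for `lxml` so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page may be structured differently.
SAMPLE_HTML = """
<div id="zimu_content">
  <a href="/wudui/1/">Team A</a>
  <a href="/wudui/2/"><b>Team</b> B</a>
  <a href="/wudui/3/"></a>
</div>
<div id="other"><a href="/">Home</a></div>
"""


def extract_names(html):
    """Return the non-empty link texts found inside #zimu_content."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for block in soup.find_all(attrs={"id": "zimu_content"}):
        for a in block.find_all("a"):
            # get_text() also handles links with nested tags,
            # where a.string would be None.
            text = a.get_text(" ", strip=True)
            if text:
                names.append(text)
    return names


names = extract_names(SAMPLE_HTML)
print(names)
```

Links outside the container and links without text are skipped, so only the two team names from the sample are collected.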
    

    Result:

    [Screenshot: 图片.png]
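
The one-name-per-cell layout depends on passing a list to `writerow`. A minimal sketch of the difference, written to an in-memory buffer instead of `wudui.csv`; `rows_as_text` and the team name `Sunny` are hypothetical and exist only for this illustration:

```python
import csv
import io


def rows_as_text(rows):
    """Write rows with csv.writer and return the resulting CSV text."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


name = "Sunny"  # hypothetical team name

# A bare string is iterated character by character: one cell per letter.
split_by_char = rows_as_text([name])

# A one-element list keeps the whole name in a single cell.
single_cell = rows_as_text([[name]])

print(repr(split_by_char))
print(repr(single_cell))
```

This is why the script calls `writer.writerow([i])` rather than `writer.writerow(i)`.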
