Scraping Sohu News with Python

Author: 还有半个小时 | Published: 2018-05-19 08:51

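This script uses Python 2.7 with urllib2 and BeautifulSoup to scrape news headlines from Sohu.

The commented-out lines in the listing also walk through some of BeautifulSoup's built-in methods.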
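The BeautifulSoup methods touched on in the listing (attribute-style access, `get()`, `.string`, `get_text()`, `select()` and `find_all()`) can be tried without any network access. The following is a minimal sketch, not part of the original script: the HTML snippet is made up for illustration and is not Sohu's real markup, and it assumes the built-in `html.parser` rather than `lxml` so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example; not Sohu's real page.
HTML = """
<div class="focus-news" id="top">
  <a href="/a/1.html">First headline</a>
  <a href="/a/2.html"><b>Second</b> <i>headline</i></a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

first = soup.a                     # attribute access returns only the first <a>
first_href = first.get("href")     # value of the href attribute on that tag
first_text = first.string          # a single text child, so .string works

second = soup.select("a")[1]       # select() returns a list of matching tags
second_string = second.string      # several children, so .string is None
second_text = second.get_text()    # get_text() joins all descendant text

by_css = soup.select(".focus-news")            # CSS selector: '.' means class
by_id = soup.select("#top")                    # CSS selector: '#' means id
by_kwarg = soup.find_all(class_="focus-news")  # class_ because class is a keyword

print(first_href)
print(first_text)
print(second_string)
print(second_text)
```

The second link shows why `get_text()` is usually the safer choice: once a tag holds more than one child, `.string` gives `None`, while `get_text()` still returns the combined text.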

# -*- coding:utf-8 -*-
import re
import urllib2

from bs4 import BeautifulSoup

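# Scrape Sohu News
url = 'http://news.sohu.com/'
header_ = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0'}

# Wrap the URL and the headers in a Request object
res = urllib2.Request(url, headers=header_)
try:
    # Send the request and keep the response in resp
    resp = urllib2.urlopen(res)
except urllib2.URLError as e:
    print e
    # Stop here: resp is undefined when the request fails
    raise SystemExit(1)

# Read the response body
info = resp.read()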

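# print(info)

# Parse the downloaded data into a BeautifulSoup object
soup = BeautifulSoup(info, 'lxml')
# print type(soup)

# Attribute-style access returns only the first tag of that name
# print soup.title
# print soup.head
# print type(soup.script)
# print soup.a

# get() returns the value of an attribute on a tag
# print soup.a
# print soup.a.get('href')  # href attribute of the first <a> tag

# string
# Returns the text inside a tag, but only when the tag has no child tags
# or exactly one child tag; otherwise it returns None
# print soup.a.string
# print soup.script.string
# print soup.script

# get_text() returns all the text inside a tag, including the text of its
# descendants; this is the most commonly used method
# print soup.select('a')[2].get_text()

# find_all and select both search the tree; find_all returns every node
# under the current one that matches the given filters
# for title in soup.find_all('a'):
#     print title.get_text()

# select() always returns a list, so the results can be iterated over
# Searching by attribute: select() matches a class with '.focus-news'
# and an id with a '#' prefix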

# Implementation
for title in soup.select('.focus-news'):
    # print title
    links = title.select('a')
    for link in links:
        f = link.get_text()
        # vals = f.strip("\n").split("\t")
        # The trailing comma keeps the headlines on one line
        print f.strip(),
    # Finish the line with the href of the first link in this block
    print links[0]['href']

# Searching by attribute with find_all: the class has to be passed as
# class_, because class is a reserved word in Python
# for title in soup.find_all(class_='focus-news'):
#     print title.get_text()
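The listing above targets Python 2.7, where `urllib2` is available. For readers on Python 3, the same fetch-and-select flow might look like the sketch below. It is an assumption-laden port rather than the author's code: `urllib.request` stands in for `urllib2`, `html.parser` stands in for `lxml`, and the helper names `fetch_html` and `extract_headlines` are hypothetical. It also pairs every link with its own `href`, whereas the original prints only the first link's `href` for each block. Whether `.focus-news` still matches anything on the live page is not something this sketch can confirm.

```python
from urllib.error import URLError
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

URL = "http://news.sohu.com/"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) "
                  "Gecko/20100101 Firefox/21.0"
}


def fetch_html(url, headers=HEADERS, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    request = Request(url, headers=headers)
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read()
    except URLError as error:
        # Report the failure instead of continuing with an undefined response
        print(error)
        return None


def extract_headlines(html):
    """Return (text, href) pairs for each link inside a .focus-news block."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for block in soup.select(".focus-news"):
        for link in block.select("a"):
            pairs.append((link.get_text().strip(), link.get("href")))
    return pairs
```

To use it, call `fetch_html(URL)`, check that the result is not `None`, and pass it to `extract_headlines`. Keeping the parsing step separate from the download means it can be tried on a saved page or a hand-written snippet without touching the network.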

Article link: https://www.haomeiwen.com/subject/iclodftx.html
