If life were always as it was at first sight, why would the autumn wind grieve over the painted fan
sudo apt-get install python3
python3 --version
sudo apt-get install python3-pip
pip3 install scrapy
python3 ***.py
Do you know where the edge of the world is? -- urlopen
from urllib.request import urlopen
# Retrieve the HTML from the URL (read() returns bytes, not str)
html = urlopen("http://www.pythonscraping.com/exercises/exercise1.html")
print(html.read())
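Since html.read() returns raw bytes, the print above shows a b'...' literal rather than readable text. Here is a minimal sketch of decoding the response into a string. It assumes the server declares its charset in the Content-Type header and falls back to UTF-8 when it does not. The helper name fetch_text and its default_charset parameter are made up for illustration.

```python
from urllib.request import urlopen


def fetch_text(url, default_charset="utf-8"):
    """Hypothetical helper: download url and return the body as str."""
    with urlopen(url) as response:
        # Use the charset from the Content-Type header when one is declared,
        # otherwise fall back to the assumed default
        charset = response.headers.get_content_charset() or default_charset
        return response.read().decode(charset)
```

Calling fetch_text("http://www.pythonscraping.com/exercises/exercise1.html") should then give the page source as ordinary text that can be printed or searched directly.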
Beautiful Soup, so rich and green -- BeautifulSoup
pip3 install beautifulsoup4 lxml
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/exercises/exercise1.html")
bsObj = BeautifulSoup(html.read(),"lxml")
print(bsObj.h1)
Blue roses, and red snowflakes -- error
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

def getTitle(url):
    # Return the first <h1> inside <body>, or None if it cannot be retrieved
    try:
        html = urlopen(url)
    except HTTPError as e:
        # The server answered with an error status such as 404 or 500
        print(e)
        return None
    try:
        bsObj = BeautifulSoup(html.read(), "lxml")
        title = bsObj.body.h1
    except AttributeError as e:
        # bsObj.body is None when the document has no <body> tag
        return None
    return title

title = getTitle("http://www.pythonscraping.com/exercises/exercise1.html")
if title is None:
    print("Title could not be found")
else:
    print(title)
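HTTPError only covers the case where the server answers with an error status. If the host cannot be reached at all, urlopen raises URLError instead. Below is a sketch of handling both. HTTPError is caught first because it is a subclass of URLError. The function name fetchHtml and its opener parameter are hypothetical, added here so the function can be exercised without a network connection.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def fetchHtml(url, opener=urlopen):
    """Hypothetical helper: return the response body as bytes, or None on failure."""
    try:
        with opener(url) as response:
            return response.read()
    except HTTPError as e:
        # The server responded, but with an error status (e.g. 404)
        print("HTTP error:", e.code)
    except URLError as e:
        # No response at all, e.g. DNS failure or refused connection
        print("Could not reach server:", e.reason)
    return None
```

The returned bytes can be handed to BeautifulSoup in the same way getTitle does, after checking that the result is not None.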