Preparation
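1. This assumes you already have a Python development environment set up and pip installed.
2. Install BeautifulSoup with pip (the package name is beautifulsoup4). The code below also uses requests, which is installed the same way.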
Turning your idea into Python
Creating the database with Python and SQLite3
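import sqlite3 as sql

# Open (or create) the database file tnl.db
connect = sql.connect('tnl.db')
# Create the info table on first run; both URL columns must be unique
connect.execute('CREATE TABLE IF NOT EXISTS info(_id INTEGER PRIMARY KEY,name TEXT NOT NULL,via_url TEXT NOT NULL UNIQUE,zone_url TEXT NOT NULL UNIQUE);')
# Process the list pages one by one
for i in range(10, 1000000):
    show_name_by_page(i, connect)
connect.close()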
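The code above creates a database, tnl.db, with a table named info for storing each model's details, then calls show_name_by_page for every page number in the range.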
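To make the role of the two UNIQUE constraints concrete, here is a minimal sketch that builds the same info table and inserts the same row twice. It uses an in-memory database (an assumption, so the example leaves no file behind), and the helper name try_insert and the sample values are hypothetical.

```python
import sqlite3 as sql

# In-memory database: an assumption for this sketch; the article writes to tnl.db
connect = sql.connect(':memory:')
connect.execute('CREATE TABLE IF NOT EXISTS info(_id INTEGER PRIMARY KEY,name TEXT NOT NULL,'
                'via_url TEXT NOT NULL UNIQUE,zone_url TEXT NOT NULL UNIQUE);')


def try_insert(name, via_url, zone_url):
    """Hypothetical helper: return True if the row was stored, False if it was a duplicate."""
    try:
        connect.execute('INSERT INTO info (name,via_url,zone_url) VALUES (?,?,?)',
                        (name, via_url, zone_url))
        connect.commit()
        return True
    except sql.IntegrityError:
        # Raised when via_url or zone_url already exists in the table
        return False


first = try_insert('example', 'http://example.com/a.jpg', 'http://example.com/zone/1')
second = try_insert('example', 'http://example.com/a.jpg', 'http://example.com/zone/1')
row_count = connect.execute('SELECT COUNT(*) FROM info').fetchone()[0]
print(first, second, row_count)
```

The second insert is rejected, so running the crawler again over pages it has already seen should not create duplicate rows.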
Now let's implement the show_name_by_page function.
We use requests to fetch https://mm.taobao.com/json/request_top_list.htm?page=1 and BeautifulSoup to parse out each model's nickname, avatar URL, and profile page URL.
from bs4 import BeautifulSoup
import requests as net
import sqlite3 as sql

def show_name_by_page(page, connect: object):
    # Fetch one list page and parse the returned HTML
    response = net.get('https://mm.taobao.com/json/request_top_list.htm?page=%d' % page)
    soup = BeautifulSoup(response.text, 'html.parser')
    personal_info_list = soup.find_all(name='div', attrs={'class': 'personal-info'})
    for personal_info in personal_info_list:
        lady_avatar = personal_info.find(name='div', attrs={'class': 'pic s60'})
        lady_name = personal_info.find(name='a', attrs={'class': 'lady-name'})
        # The page returns protocol-relative links, so prefix them with http:
        via_url = 'http:%s' % lady_avatar.find('img').get('src')
        zone_url = 'http:%s' % lady_name.get('href')
        print('姓名:%s' % lady_name.text)  # name
        print('头像地址:%s' % via_url)  # avatar URL
        print('空间地址:%s' % zone_url)  # profile page URL
        try:
            # ? placeholders let sqlite3 quote the values, so a quote in a name cannot break the statement
            connect.execute('INSERT INTO info (name,via_url,zone_url) VALUES (?,?,?)',
                            (lady_name.text, via_url, zone_url))
            connect.commit()
        except sql.OperationalError as e:
            print(e)
        except sql.IntegrityError as e:
            # Duplicate via_url or zone_url: the row is already stored
            print(e)
Output (姓名 = name, 头像地址 = avatar URL, 空间地址 = profile page URL)
姓名:KingKing
头像地址:http://gtd.alicdn.com/sns_logo/i2/T1kCZiFlVXXXb1upjX.jpg_60x60.jpg
空间地址:http://mm.taobao.com/self/model_card.htm?user_id=14306938
姓名:冷玩妹
头像地址:http://gtd.alicdn.com/sns_logo/i3/T1YIKwFqhgXXb1upjX.jpg_60x60.jpg
空间地址:http://mm.taobao.com/self/model_card.htm?user_id=23104539
姓名:张瑞
头像地址:http://gtd.alicdn.com/sns_logo/i2/T1HmqKFrNdXXb1upjX.jpg_60x60.jpg
空间地址:http://mm.taobao.com/self/model_card.htm?user_id=913423950
姓名:程汐儿
头像地址:http://gtd.alicdn.com/sns_logo/i4/TB1SRe5GXXXXXcfXXXXSutbFXXX.jpg_60x60.jpg
空间地址:http://mm.taobao.com/self/model_card.htm?user_id=299135017
姓名:佟小七
头像地址:http://gtd.alicdn.com/sns_logo/i6/TB1SSNcHpXXXXcAXpXXSutbFXXX.jpg_60x60.jpg
空间地址:http://mm.taobao.com/self/model_card.htm?user_id=654742757
姓名:陈孝霙
头像地址:http://gtd.alicdn.com/sns_logo/i6/T1RiWUXzRhXXb1upjX.jpg_60x60.jpg
空间地址:http://mm.taobao.com/self/model_card.htm?user_id=280219592
姓名:魏媛
头像地址:http://gtd.alicdn.com/sns_logo/i6/T1NNwiFopXXXb1upjX.jpg_60x60.jpg
空间地址:http://mm.taobao.com/self/model_card.htm?user_id=511955884
姓名:潘琪琪
头像地址:http://gtd.alicdn.com/sns_logo/i4/T1d8GgFydhXXb1upjX.jpg_60x60.jpg
空间地址:http://mm.taobao.com/self/model_card.htm?user_id=736634638
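The http: prefix in these URLs is added by string formatting, which assumes every src and href on the page is protocol-relative (starting with //). As a hedged alternative, urllib.parse.urljoin from the standard library handles that case as well as links that are already absolute. The base URL and the helper name absolute_url are assumptions made for this sketch.

```python
from urllib.parse import urljoin

# Assumed base; for protocol-relative links only its scheme is used
BASE_URL = 'http://mm.taobao.com/'


def absolute_url(link):
    """Hypothetical helper: resolve a protocol-relative or relative link against BASE_URL."""
    return urljoin(BASE_URL, link)


# A protocol-relative link picks up the scheme of the base
print(absolute_url('//mm.taobao.com/self/model_card.htm?user_id=14306938'))
# An absolute link is returned unchanged
print(absolute_url('https://example.com/a.jpg'))
```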
Complete code
from bs4 import BeautifulSoup
import requests as net
import sqlite3 as sql


def show_name_by_page(page, connect: object):
    # Fetch one list page and parse the returned HTML
    response = net.get('https://mm.taobao.com/json/request_top_list.htm?page=%d' % page)
    soup = BeautifulSoup(response.text, 'html.parser')
    personal_info_list = soup.find_all(name='div', attrs={'class': 'personal-info'})
    for personal_info in personal_info_list:
        lady_avatar = personal_info.find(name='div', attrs={'class': 'pic s60'})
        lady_name = personal_info.find(name='a', attrs={'class': 'lady-name'})
        # The page returns protocol-relative links, so prefix them with http:
        via_url = 'http:%s' % lady_avatar.find('img').get('src')
        zone_url = 'http:%s' % lady_name.get('href')
        print('姓名:%s' % lady_name.text)  # name
        print('头像地址:%s' % via_url)  # avatar URL
        print('空间地址:%s' % zone_url)  # profile page URL
        try:
            # ? placeholders let sqlite3 quote the values, so a quote in a name cannot break the statement
            connect.execute('INSERT INTO info (name,via_url,zone_url) VALUES (?,?,?)',
                            (lady_name.text, via_url, zone_url))
            connect.commit()
        except sql.OperationalError as e:
            print(e)
        except sql.IntegrityError as e:
            # Duplicate via_url or zone_url: the row is already stored
            print(e)


# Open (or create) the database and make sure the info table exists
connect = sql.connect('tnl.db')
connect.execute('CREATE TABLE IF NOT EXISTS info(_id INTEGER PRIMARY KEY,name TEXT NOT NULL'
                ',via_url TEXT NOT NULL UNIQUE,zone_url TEXT NOT NULL UNIQUE);')
# Process the list pages one by one
for i in range(10, 1000000):
    show_name_by_page(i, connect)
connect.close()
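The loop above walks a fixed range of nearly a million page numbers. Here is a minimal sketch of one way to stop earlier and pause between requests. It assumes a page handler that returns how many entries it found; the article's show_name_by_page returns nothing, so handle_page, crawl, and fake_handler are all hypothetical names.

```python
import time


def crawl(handle_page, first_page, last_page, delay_seconds=1.0):
    """Call handle_page(page) for each page and stop at the first page with no entries.

    handle_page is a hypothetical callable returning the number of entries found.
    Returns the number of pages that contained entries.
    """
    pages_with_entries = 0
    for page in range(first_page, last_page + 1):
        if handle_page(page) == 0:
            break  # an empty page is taken to mean the list has ended
        pages_with_entries += 1
        time.sleep(delay_seconds)  # pause so requests are not sent back to back
    return pages_with_entries


# Stand-in handler for demonstration: pretends pages 1 to 3 hold 10 entries each
def fake_handler(page):
    return 10 if page <= 3 else 0


print(crawl(fake_handler, 1, 100, delay_seconds=0))
```

Passing the handler in as an argument keeps the stopping rule separate from the fetching and parsing, so the loop can be tried out without touching the network.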