Stage Summary

Author: qianxun0921 | Published: 2018-12-30 14:04


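Making requests with requests

Commonly used attributes and methods of response:

response.text

Returns the response body decoded to a string.

response.content

Returns the response body as raw bytes.

response.status_code

The HTTP status code of the response.

response.request.headers

The headers of the request that was sent.

response.headers

The response headers.

response.encoding = 'utf-8'

Sets the encoding used to decode response.text.

response.encoding

Gets the current encoding.

response.json()

The built-in JSON decoder; returns the body parsed as JSON. The body must actually be JSON, otherwise parsing fails and an exception is raised.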
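The relationship between these attributes can be sketched without any network access. The snippet below is only an illustration using the standard library: `raw_body` is a made-up payload standing in for `response.content`, and `safe_json` is a hypothetical helper, not part of requests.

```python
import json

# Hypothetical raw body, standing in for response.content (bytes).
raw_body = '{"city": "Zürich", "page": 1}'.encode("utf-8")

# response.text is roughly the raw body decoded with response.encoding.
text_utf8 = raw_body.decode("utf-8")

# Decoding with the wrong encoding does not fail here; it silently yields
# garbled text, which is why setting response.encoding can matter.
text_latin1 = raw_body.decode("latin-1")

# response.json() is roughly json.loads() applied to the decoded body.
parsed = json.loads(text_utf8)


def safe_json(body_text):
    """Return the parsed JSON, or None when the body is not valid JSON."""
    try:
        return json.loads(body_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parsed["city"], parsed["page"])
print(safe_json("<html>not json</html>"))
```

Wrapping the JSON step in a helper like this is one way to handle the exception mentioned above when a server returns an HTML error page instead of JSON.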

Basic POST requests (the data parameter)

(1) The most basic POST

response = requests.post(url=url, data=data)

url: the target URL of the POST request
data: the form data sent with the POST request
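When `data` is a dict, requests sends it as a form-encoded body. The sketch below uses only the standard library to show what that encoding looks like; the field names and values are invented for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields; the names and values are made up.
data = {"email": "user@example.com", "password": "secret", "kw": "a b&c"}

# Form encoding: key=value pairs joined with "&", special characters escaped.
body = urlencode(data)
print(body)

# Decoding the body again recovers the original fields.
decoded = {key: values[0] for key, values in parse_qs(body).items()}
print(decoded)
```

Seeing the encoded body makes it easier to compare a request against what the browser's developer tools show for the same form.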

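(2) Uploading a file with POST

import requests

url = 'https://httpbin.org/post'
# Open the file in binary mode and close it once the upload is done
with open('image.png', 'rb') as image_file:
    files = {'file': image_file}
    response = requests.post(url, files=files)
print(response.text)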

(3) Setting a proxy

import requests

# Choose the proxy according to the protocol of the target URL
proxies = {
    "http": "http://12.34.56.79:9527",
    "https": "http://12.34.56.79:9527",
}

response = requests.get(
    "http://www.baidu.com",
    proxies=proxies
)
print(response.text)

(4) Cookies

import requests

response = requests.get("https://www.douban.com/")

# 1. response.cookies is a CookieJar object
cookiejar = response.cookies

# 2. Convert the CookieJar to a dict
cookiedict = requests.utils.dict_from_cookiejar(
    cookiejar
)

print(cookiejar)

print(cookiedict)
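Cookies often arrive as a single header string copied from the browser rather than as a CookieJar. As a rough sketch, the standard library can turn such a string into a dict; `cookie_header_to_dict` is a hypothetical helper name and the sample values are illustrative.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header):
    """Parse a 'name=value; name2=value2' cookie string into a dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


cookie_dict = cookie_header_to_dict("user_access=1; plat=date_pc; DATE_SHOW_LOC=50")
print(cookie_dict)
```

A dict in this shape is what the `cookies` parameter of `requests.get` accepts, which avoids pasting the whole string into a header by hand.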

(5) Session

import requests

# 1. Create a session object, which keeps cookies between requests
ssion = requests.Session()

# 2. Prepare the headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"
}

# 3. The username and password used to log in
data = {
    "email": "18518753265",
    "password": "ljh123456"
}

# 4. Send the login request; the cookies set after login are stored in ssion
ssion.post(
    "http://www.renren.com/PLogin.do",
    data=data,
    headers=headers
)

# 5. ssion now carries the login cookies, so pages that require login can be fetched directly
response = ssion.get(
    "http://www.renren.com/965722397/profile",
    headers=headers
)

# 6. Print the response body
print(response.text)

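XPath

(1) What is XPath?

XPath (XML Path Language) is a language for finding information in an XML document. It can be used to traverse the elements and attributes of the document.

(2) What is XML?

XML is a markup language designed for transporting data. Its structure is very similar to HTML.

(3) Common XPath syntax:

nodename  selects the child elements named nodename of the current node

/  selects from the root node

//  selects matching nodes anywhere in the document, regardless of their position

.  selects the current node

..  selects the parent of the current node

a/@href  selects the href attribute of the a tag

a/text()  selects the text of the a tag

a[@class="123"]  selects a tags by their class attribute

a[@id="123"]  selects a tags by their id attribute

a[@id="123"][last()]  selects the last a tag whose id is 123
a[@id="123"][position() < 3]  selects the first two a tags whose id is 123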
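A few of these expressions can be tried with the standard library alone. Note the assumption: `xml.etree.ElementTree` supports only a subset of XPath (no `text()`, `@href`, or `position()` comparisons), so those are emulated in plain Python below; a full engine such as lxml would accept them directly. The sample document is invented.

```python
import xml.etree.ElementTree as ET

# A small made-up document to query.
doc = ET.fromstring(
    "<div>"
    "<a id='first' class='nav' href='/one'>One</a>"
    "<a class='nav' href='/two'>Two</a>"
    "<a class='ext' href='/three'>Three</a>"
    "</div>"
)

all_links = doc.findall(".//a")                # //a
nav_links = doc.findall(".//a[@class='nav']")  # //a[@class="nav"]
last_link = doc.find("./a[last()]")            # a[last()]

# Emulated in Python because ElementTree lacks these XPath features:
first_two = doc.findall("./a")[:2]             # a[position() < 3]
hrefs = [a.get("href") for a in nav_links]     # a/@href
texts = [a.text for a in all_links]            # a/text()

print(hrefs, texts, last_link.get("href"))
```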

BeautifulSoup4

Like lxml, Beautiful Soup is an HTML/XML parser; its main job is likewise to parse HTML/XML and extract data from it.

The four kinds of objects

Beautiful Soup turns a complex HTML document into a tree in which every node is a Python object. All objects fall into 4 kinds:

Tag: put simply, a Tag is one of the tags in the HTML, for example:

<head><title>The Dormouse's story</title></head>
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>

NavigableString

Once we have a tag, how do we get the text inside it? Use .string, for example:

print(soup.p.string)
# The Dormouse's story

print(type(soup.p.string))
# <class 'bs4.element.NavigableString'>

BeautifulSoup

The BeautifulSoup object represents the document as a whole. Most of the time it can be treated as a Tag object; it is a special Tag, and we can look at its type, name, and attributes:

print(type(soup.name))
# <class 'str'>

print(soup.name)
# [document]

print(soup.attrs) # the document itself has no attributes
# {}

Comment

A Comment object is a special type of NavigableString; its output does not include the comment markers.

print(soup.a)
# <a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>

print(soup.a.string)
# Elsie

print(type(soup.a.string))
# <class 'bs4.element.Comment'>

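pyquery

pyquery's syntax is similar to jQuery's, and it can be used to parse HTML text.

Basic usage:

pq = PyQuery(html document)
pq('css selector')
items(): when several tags are matched, items() turns the PyQuery object into a generator that can be looped over with for ... in
filter('css selector'): filters the matched elements
text(): gets the text of a tag
attr('attribute name'): gets the value of an attribute

.html() and .text(): get the HTML block or the text content
(selector): gets the target content with a selector
.eq(index): gets the element at the given index (index starts at 0)
.find(): finds nested elements
.filter(): filters elements by class or id
.attr(): gets or sets an attribute value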

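Multitasking

Ways to implement multitasking

Multithreading
Multiprocessing
Coroutines
Multithreading + multiprocessing

Parallelism:

Tasks are started at the same time and run at the same time (for example 4 cores, 4 tasks).

Concurrency:

Tasks are started at the same time, but only one runs at any given moment.

In Python, threads cannot run Python code truly in parallel, because the CPython interpreter has a global interpreter lock (GIL) that ensures only one thread executes at any moment.

Thread:

A thread is the basic unit of CPU scheduling. It uses very few resources, threads of the same process share resources, and a thread cannot exist without a process.
Multithreading is generally suited to I/O-bound work. The order in which threads run is not deterministic.

Process:

A process is the basic unit of resource allocation in the operating system. The order in which processes run is not deterministic either.
Each process has its own memory space, and resources are not shared between processes.
Multiprocessing can make full use of the CPU, so it is generally suited to CPU-bound work.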
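Why threads still help with I/O-bound work despite the GIL can be seen in a small timing sketch. Here `fake_download` is a hypothetical stand-in for a network call: it only sleeps, and sleeping (like waiting on a socket) releases the GIL so other threads can proceed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(task_id, delay=0.2):
    """Pretend to wait on the network, then return the task id."""
    time.sleep(delay)
    return task_id


def run_sequential(n):
    """Run n fake downloads one after another; return results and elapsed time."""
    start = time.perf_counter()
    results = [fake_download(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_threaded(n):
    """Run n fake downloads in n threads; return results and elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_download, range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: %.2fs" % run_sequential(4)[1])
    print("threaded:   %.2fs" % run_threaded(4)[1])
```

The waits overlap in the threaded version, so its elapsed time should be close to a single delay rather than the sum. A CPU-bound function would not show this gain with threads, which is where processes come in.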

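Creating and using threads

from threading import Thread
import threading
import time

data = []
def download_image(url,num):
    """
    Download an image
    :param url: URL of the image
    :param num: number identifying this download
    :return:
    """
    global data
    time.sleep(2)
    print(url,num)
    data.append(num)

def read_data():
    global data
    for i in data:
        print(i)

if __name__ == '__main__':

    # threading.current_thread().name is the name of the current thread
    print('main thread started',threading.current_thread().name)

    # Create the child threads
    """
    target=None: the function the thread will run
    name=None: the name given to the thread when it is created
    args=(): arguments passed to the target function (a tuple)
    """
    thread_sub1 = Thread(
        target=download_image,
        name='download thread',
        # the second value is num; 1 is an arbitrary example
        args=('http://p0.so.qhimgs1.com/bdr/200_200_/t01fea94cc488d93c80.jpg',1))

    thread_sub2 = Thread(
        target=read_data,
        name='read thread',
    )

    # Daemon flag (must be set before the thread is started)
    # daemon=False: when the main thread finishes, the program waits for the child thread to finish its task
    # daemon=True: the child thread is stopped as soon as the main thread exits
    thread_sub1.daemon = True

    # Start the thread
    thread_sub1.start()

    # join blocks: wait for the child thread to finish, then continue in the main thread
    thread_sub1.join()

    thread_sub2.start()

    thread_sub2.join()

    print('main thread finished',threading.current_thread().name)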
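Because threads share data (the `data` list above is visible to both threads), updates that read and then write a shared value are usually protected with a lock. The following is a minimal sketch; `SafeCounter` and `run_workers` are hypothetical names introduced only for this illustration.

```python
import threading


class SafeCounter:
    """A counter whose increments are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write step.
        with self._lock:
            self.value += 1


def run_workers(n_threads=4, times=10000):
    """Increment one shared counter from several threads; return the total."""
    counter = SafeCounter()

    def work():
        for _ in range(times):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers())
```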

Multiprocessing

# Example site: jiayuan.com (世纪佳缘)
"""
1. Create the task queue
2. Create crawler processes that run the crawl tasks
3. Create the data queue
4. Create parser processes that parse the downloaded data
"""
# Wuhan
# http://date.jiayuan.com/eventslist_new.php?page=1&city_id=4201&shop_id=33 (page 1)
# http://date.jiayuan.com/eventslist_new.php?page=2&city_id=4201&shop_id=33 (page 2)
# http://date.jiayuan.com/eventslist_new.php?page=3&city_id=4201&shop_id=33 (page 3)

# Qingdao
# http://date.jiayuan.com/eventslist_new.php?page=2&city_id=3702&shop_id=42
# Chongqing
# http://date.jiayuan.com/eventslist_new.php?page=2&city_id=50&shop_id=5
# Shanghai, Xujiahui store
# http://date.jiayuan.com/eventslist_new.php?page=2&city_id=31&shop_id=15
from multiprocessing import Process,Queue
import requests,json,re
from lxml import etree
import time

def down_page_data(taskQueue,dataQueue):
    """
    Download the pages for the tasks in the task queue
    :param taskQueue: task queue
    :param dataQueue: data queue
    :return:
    """
    sumTime = 0
    # Loop until the task queue has stayed empty for the whole waiting period below
    while True:

        if not taskQueue.empty():
            sumTime = 0
            url = taskQueue.get()# take a task out of the task queue
            response, cur_page = download_page_data(url)
            data_dict = {'data': response.text, 'page': cur_page}
            dataQueue.put(data_dict)

            # Queue the next page
            if cur_page != 1:
                print('====',cur_page)
                if isinstance(response.json(),list):
                    next_page = cur_page + 1
                    next_url = re.sub(r'page=\d+','page='+str(next_page),url)
                    taskQueue.put(next_url)
                else:
                    print('fetched up to page '+str(cur_page),'no more data',response.json())
            elif cur_page == 1:
                next_page = cur_page + 1
                next_url = re.sub(r'page=\d+', 'page=' + str(next_page), url)
                taskQueue.put(next_url)
        else:
            # The task queue is empty for now: wait, and stop after about 5 seconds without tasks
            time.sleep(0.001)
            sumTime += 1
            if sumTime > 5000:
                break

def download_page_data(url):
    """
    Download the data of one listing page
    :param url: URL of the listing page
    :return:
    """
    # http://date.jiayuan.com/eventslist_new.php?
    # page=1&city_id=4201&shop_id=3390 (Wuhan)
    pattern = re.compile(r'.*?page=(\d+)&city_id=(\d+)&shop_id=(\d+)')
    result = re.findall(pattern,url)[0]# findall returns a list
    cur_page = result[0]
    DATE_SHOW_LOC = result[1]
    DATE_SHOW_SHOP = result[2]
    print(cur_page,DATE_SHOW_LOC,DATE_SHOW_SHOP)
    cookie = 'accessID=20181222134852405410; SESSION_HASH=094567ad0821e39554b5022d5ecb88fda6f5b952; user_access=1; PHPSESSID=574da5dabcd848c1cc0a812df9961ac9; plat=date_pc; DATE_FROM=daohang; Hm_lvt_cdce8cda34e84469b1c8015204129522=1545478311,1545479565,1545538425,1545703166; uv_flag=124.205.158.242; DATE_SHOW_LOC=%s; DATE_SHOW_SHOP=%s;'%(DATE_SHOW_LOC,DATE_SHOW_SHOP)
    print(cookie)

    req_header = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Cookie':'accessID=20181222134852405410; SESSION_HASH=094567ad0821e39554b5022d5ecb88fda6f5b952; user_access=1; PHPSESSID=574da5dabcd848c1cc0a812df9961ac9; plat=date_pc; DATE_FROM=daohang; Hm_lvt_cdce8cda34e84469b1c8015204129522=1545478311,1545479565,1545538425,1545703166; uv_flag=124.205.158.242; DATE_SHOW_LOC=50; DATE_SHOW_SHOP=5; Hm_lpvt_cdce8cda34e84469b1c8015204129522=1545704238',
        'Referer':'http://date.jiayuan.com/eventslist.php'
    }
    # cookie_dict = {sub_str.split('=')[0]:sub_str.split('=')[1] for  sub_str in cookie.split('; ')}
    # print(cookie_dict)
    # cookies(cookiejar object or dict)
    response = requests.get(url,headers=req_header)
    if response.status_code == 200:
        print('page '+cur_page+' fetched successfully',DATE_SHOW_SHOP,DATE_SHOW_LOC)
        return response,int(cur_page)

def parse_page_data(dataQueue):
    """
    Parse the downloaded data
    :param dataQueue: data queue
    :return:
    """
    while not dataQueue.empty():
        data = dataQueue.get()# data is a dict
        page = data['page']
        html = data['data']
        if page == 1:
            print('parsing page 1, a static HTML page')
            html_element = etree.HTML(html)
            hot_active = html_element.xpath('//div[@class="hot_detail fn-clear"]')
            for hot_li in hot_active:
                full_detail_url = 'http://date.jiayuan.com' + hot_li.xpath('.//h2[@class="hot_title"]/a/@href')[0]
                # print(full_detail_url)
                response =download_detail_data(full_detail_url)
                parse_detail_data(response)

            more_active = html_element.xpath('//ul[@class="review_detail fn-clear t-activiUl"]/li')
            for more_li in more_active:
                full_detail_url = 'http://date.jiayuan.com' + more_li.xpath('.//a[@class="review_link"]/@href')[0]
                response = download_detail_data(full_detail_url)
                parse_detail_data(response)
        else:
            print('parsing page '+str(page),'not a static page (JSON)')

            # json.loads converts the JSON string into Python data
            json_obj = json.loads(html)
            if isinstance(json_obj,list):# check that the data is a list
                for sub_dict in json_obj:
                    id = sub_dict['id']
                    full_detail_url = 'http://date.jiayuan.com/activityreviewdetail.php?id=%s' % id
                    response = download_detail_data(full_detail_url)
                    parse_detail_data(response)

def download_detail_data(url):
    """
    Request the detail page of an activity
    :param url: URL of the activity detail page
    :return:
    """
    req_header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Cookie':'accessID=20181222134852405410; SESSION_HASH=094567ad0821e39554b5022d5ecb88fda6f5b952; user_access=1; PHPSESSID=574da5dabcd848c1cc0a812df9961ac9; plat=date_pc; DATE_FROM=daohang; Hm_lvt_cdce8cda34e84469b1c8015204129522=1545478311,1545479565,1545538425,1545703166; DATE_SHOW_LOC=50; DATE_SHOW_SHOP=5; uv_flag=106.121.135.204;',
        'Referer': 'http://date.jiayuan.com/eventslist.php'
    }

    response = requests.get(url, headers=req_header)
    if response.status_code == 200:
        print('detail page fetched: '+response.url)
        return response

def parse_detail_data(response):
    """
    Parse the activity detail page
    :param response: response of the detail page request
    :return:
    """
    html_element = etree.HTML(response.text)
    with open('detail.html', 'a+',encoding='utf-8') as file:
        file.write(response.text)
    # Build the item
    item = {}
    # Activity title
    item['title'] = ''.join(html_element.xpath('//h1[@class="detail_title"]/text()')[0])
    # Activity time
    item['time'] = ''.join(
        html_element.xpath('//div[@class="detail_right fn-left"]/ul[@class="detail_info"]/li[1]//text()')[0])
    # Activity address
    item['adress'] = html_element.xpath('//ul[@class="detail_info"]/li[2]/text()')[0]
    # Number of participants
    item['joinnum'] = html_element.xpath('//ul[@class="detail_info"]/li[3]/span[1]/text()')[0]
    # Number of reservations
    item['yuyue'] = html_element.xpath('//ul[@class="detail_info"]/li[3]/span[2]/text()')[0]
    # Introduction
    item['intreduces'] = \
        html_element.xpath('//div[@class="detail_act fn-clear"][1]//p[@class="info_word"]/span[1]/text()')[0]
    # Tips
    item['point'] = html_element.xpath('//div[@class="detail_act fn-clear"][2]//p[@class="info_word"]/text()')[0]
    # Introduction of the offline store
    item['introductionStore'] = ''.join(
        html_element.xpath('//div[@class="detail_act fn-clear"][3]//p[@class="info_word"]/text()'))
    # Cover image link
    item['coverImage'] = html_element.xpath('//div[@class="detail_left fn-left"]/img/@data-original')[0]

    print(item)

    with open('shijijiayuan.json','a+',encoding='utf-8') as file:
        json_str = json.dumps(item,ensure_ascii=False)+'\n'
        file.write(json_str)

if __name__ == '__main__':
    # Create the task queue
    taskQueue = Queue()

    # Put the starting tasks into the queue
    taskQueue.put('http://date.jiayuan.com/eventslist_new.php?page=1&city_id=4201&shop_id=33')
    taskQueue.put('http://date.jiayuan.com/eventslist_new.php?page=2&city_id=31&shop_id=15')
    taskQueue.put('http://date.jiayuan.com/eventslist_new.php?page=2&city_id=3702&shop_id=42')
    taskQueue.put('http://date.jiayuan.com/eventslist_new.php?page=2&city_id=50&shop_id=5')

    # Create the data queue
    dataQueue = Queue()

    # Create the crawler processes
    for i in range(0,4):
        process_crawl = Process(target=down_page_data,args=(taskQueue,dataQueue))

        process_crawl.start()

    time.sleep(10)

    # Create the parser processes
    for i in range(0,4):
        process_parse = Process(target=parse_page_data,args=(dataQueue,))

        process_parse.start()
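The pagination step used by the crawler above (read the current page from the URL, then rewrite it to the next page) can be isolated into small pure functions, which makes it easy to check without hitting the site. This is a sketch; `current_page` and `next_page_url` are hypothetical helper names, not part of the script above.

```python
import re

PAGE_RE = re.compile(r"page=(\d+)")


def current_page(url):
    """Return the page number found in the URL's page parameter."""
    match = PAGE_RE.search(url)
    if match is None:
        raise ValueError("no page parameter in %r" % url)
    return int(match.group(1))


def next_page_url(url):
    """Return the same URL with the page parameter increased by one."""
    return PAGE_RE.sub("page=%d" % (current_page(url) + 1), url, count=1)


start = "http://date.jiayuan.com/eventslist_new.php?page=1&city_id=4201&shop_id=33"
print(next_page_url(start))
```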

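Regular expressions

"""
. : matches any character except a newline
\ : escape character
[a-z] : matches any character from a to z
\d : matches a digit (same as [0-9])
\D : matches a non-digit (same as [^\d])
\s : matches a whitespace character (space, \n, \t ...)
\S : matches a non-whitespace character
\w : matches a word character [A-Za-z0-9_]
\W : matches a non-word character

^ : matches at the start of the string
$ : matches at the end of the string

() : grouping
| : alternation (or)

Quantifiers
* : matches the preceding character zero or more times
+ : matches the preceding character one or more times
? : matches the preceding character zero or one time
{m} : matches the preceding character exactly m times
{m,n} : matches the preceding character m to n times

Non-greedy matching
*?
+?
??
{m,n}?
"""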
import re

# Compile the regular expression into a pattern object
sub_str = 'abcdefabcd'
pattern = re.compile('b')
# match starts at the beginning of the string: the start itself must satisfy the pattern.
# If it matches, the match is returned;
# if not, None is returned. Only one match is attempted
result = re.match(pattern,sub_str)
print(type(result))
if result:
    print(result.group())

# search scans the whole string, also for a single match: it returns as soon as a match is found
# and returns None when nothing matches
result1 = re.search(pattern,sub_str)
print(result1.group())

# findall scans the whole string and collects every match
# as a list
result2 = re.findall(pattern,sub_str)
print(result2)

# finditer scans the whole string and yields every match
# as an iterator of match objects
result3 = re.finditer(pattern,sub_str)
print(type(result3))
for note in result3:
    print(type(note))
    print(note.group())

# Substitution
url = 'http://www.baidu.com/s?kw=aaa&pn=20'
pattern = re.compile(r'pn=\d+')
# re.sub()
# Parameters of sub:
# pattern: the regular expression
# repl: the replacement string
# string: the original string
result4 = re.sub(pattern,'pn=30',url)
print(result4)

# Splitting with re.split()
pattern = re.compile('[=:&]')
result5 = re.split(pattern,url)
print(result5)


sub_html = """
<div class="threadlist_title pull_left j_th_tit ">
    
    <a rel="noreferrer" href="/p/5980575445" title="困啊，早起赶飞机，傻一天" target="_blank" class="j_th_tit ">困啊，早起赶飞机，傻一天</a>
</div>
"""
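# re.S lets . match any character, including a newline
# (the capture groups for href and title are an assumed completion of the unfinished pattern)
pattern = re.compile(
    r'<div.*?class="threadlist_title pull_left j_th_tit ">'
    r'.*?<a.*?href="(.*?)".*?title="(.*?)"',
    re.S
)
result6 = re.findall(pattern,sub_html)
print(result6)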
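The difference between greedy and non-greedy quantifiers, and the effect of re.S, is easiest to see on a tiny input. The strings below are made up for this illustration.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as possible, so the match runs to the last </b>.
greedy = re.findall(r"<b>(.*)</b>", html)

# Non-greedy: .*? stops at the first </b> it can, giving one match per tag.
lazy = re.findall(r"<b>(.*?)</b>", html)

multi_line = "<p>line1\nline2</p>"

# Without re.S the dot does not match the newline, so nothing is found.
without_s = re.findall(r"<p>(.*?)</p>", multi_line)

# With re.S the dot also matches the newline.
with_s = re.findall(r"<p>(.*?)</p>", multi_line, re.S)

print(greedy, lazy, without_s, with_s)
```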

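Using selenium

# selenium is a web automation testing tool that drives a real browser.
# It does not ship with a browser and needs a browser driver. selenium follows the instructions
# in our code to make the browser load pages, so the page source we get has already been rendered,
# and we can then look for nodes in that source (useful for dynamically loaded pages and simulated login)

#pip3 install selenium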
from selenium import webdriver
import time

# Load a page
# driver = webdriver.Firefox(
#     executable_path='/home/ljh/桌面/driver/geckodriver'
# )
# # Open the page with get
# driver.get('https://www.baidu.com/')

# Load a page (PhantomJS, a headless browser)
# selenium warns: 'Selenium support for PhantomJS
# has been deprecated, please use headless '
# so headless Chrome is recommended instead
# driver = webdriver.PhantomJS(
#     executable_path='/home/ljh/桌面/driver/phantomjs'
# )
# driver.get('https://www.baidu.com/')
#
# driver.save_screenshot('baidu.png')

# Load a page (using the Chrome driver)
# Run Chrome in headless mode
opt = webdriver.ChromeOptions()
opt.add_argument('--headless')
driver = webdriver.Chrome(
    options=opt,
    executable_path='E:\\Chrome\\chromdriver2.33\\chromedriver_win32\\chromedriver.exe'
)
# Set the page load timeout
driver.set_page_load_timeout(10)

# Import the selenium exceptions module
from selenium.common import exceptions
try:
    driver.get('https://www.baidu.com/')
except exceptions.TimeoutException as err:
    print(err,'request timed out')

# Information we can read from the driver
# Page source (after the browser has rendered it)
page_html = driver.page_source
with open('baidu.html','w',encoding='utf-8') as file:
    file.write(page_html)
# Cookies, for example:
"""
[
{'domain': 
'.baidu.com', 
'httpOnly': False, 
'path': '/', 
'secure': False, 
'value': '1431_21080_28206_28131_27750_28139_27509', 
'name': 'H_PS_PSSID'}, 
{'domain': '.baidu.com', 'httpOnly': False, 'path': '/', 'expiry': 3693275324.184597, 'secure': False, 'value': '8C1C72599F01E693A201BA4B33C6DFE0', 'name': 'BIDUPSID'}, {'domain': '.baidu.com', 'httpOnly': False, 'path': '/', 'secure': False, 'value': '0', 'name': 'delPer'}, {'domain': '.baidu.com', 'httpOnly': False, 'path': '/', 'expiry': 3693275324.184649, 'secure': False, 'value': '1545791676', 'name': 'PSTM'}, {'domain': 'www.baidu.com', 'httpOnly': False, 'path': '/', 'expiry': 1546655678, 'secure': False, 'value': '123353', 'name': 'BD_UPN'}, {'domain': 'www.baidu.com', 'httpOnly': False, 'path': '/', 'secure': False, 'value': '0', 'name': 'BD_HOME'}, {'domain': '.baidu.com', 'httpOnly': False, 'path': '/', 'expiry': 3693275324.18448, 'secure': False, 'value': '8C1C72599F01E693A201BA4B33C6DFE0:FG=1', 'name': 'BAIDUID'}]

"""
# Get all cookies
cookies = driver.get_cookies()
# Get a single cookie
driver.get_cookie('BD_UPN')
cookies_dict = {cookie['name']:cookie['value'] for cookie in cookies}
print(cookies)
print(cookies_dict)

# # Delete one cookie
# driver.delete_cookie('BD_UPN')
# # Delete all cookies
# driver.delete_all_cookies()
# Add a cookie
# cookie_dict (a dict holding the cookie information)
# driver.add_cookie()

# URL of the currently loaded page
cur_url = driver.current_url
print(cur_url)

# Name of the browser in use
name = driver.name
print(name)

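# Locating and operating on nodes (tags)
"""
driver.find_element_by_xpath()# locate a tag by XPath (first match only)
driver.find_elements_by_xpath()# locate tags by XPath (all matches)
driver.find_element_by_css_selector()# locate a tag by CSS selector
driver.find_element_by_partial_link_text()# locate a link by part of its text
driver.find_element_by_id()# locate a node by id
driver.find_element_by_class_name()# locate a node by class attribute
These find_element_by_* helpers belong to selenium 3; selenium 4 uses driver.find_element(By.ID, 'kw') and similar instead.
"""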
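# Find the node and type into it
driver.find_element_by_id('kw').send_keys('隔壁老王')
# Clear the input box
driver.find_element_by_id('kw').clear()
time.sleep(2)
driver.find_element_by_id('kw').send_keys('隔壁老王')
# Find the button and simulate a click
driver.find_element_by_id('su').click()

# Save a screenshot
driver.save_screenshot('baiduyixia.png')


# # Forward and back
# time.sleep(2)
# # Back
# driver.back()
#
# time.sleep(2)
# # Forward
# driver.forward()


# Waiting for the page:
#   selenium, like a browser, needs time to load a page, especially a dynamic one.
#   Looking for a node before the page has loaded raises an exception,
#   so a page wait has to be set up

time.sleep(3)

# Implicit wait: sets a time limit; if a node is not found immediately, selenium keeps looking for it until the limit is reached
driver.implicitly_wait(10)

# Explicit wait:
#   sets a maximum waiting time
#   and continues as soon as a given condition holds.
#   If the condition is not met within that time (the node was not found),
#   an exception is raised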
from selenium.webdriver.common.by import By # By chooses the strategy used to find a node
from selenium.webdriver.support.ui import WebDriverWait # WebDriverWait sets the waiting time
from selenium.webdriver.support import expected_conditions # expected_conditions sets the condition to wait for

# driver,timeout
a_element = WebDriverWait(driver,10).until(
    expected_conditions.presence_of_element_located((By.CLASS_NAME,'n'))
)
print(a_element.text)

# Get the text and attributes of a node
# .get_attribute('href') returns the value of the tag's attribute
try:
    href = driver.find_element_by_xpath('//h3[@class="t"]/a').get_attribute('href')
    title = driver.find_element_by_xpath('//h3[@class="t"]/a').text
    print(href,title)
except exceptions.NoSuchElementException as err:
    print('node not found')


# Hide all images (this relies on the page itself having jQuery loaded)
imgs = driver.find_elements_by_xpath('//img')
for img in imgs:
    driver.execute_script('$(arguments[0]).fadeOut()',img)

# Closing
# Close the current window
driver.close()
# Quit the browser
driver.quit()
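What an explicit wait does can be sketched without selenium: call a condition repeatedly until it returns something truthy or a deadline passes. The code below is a simplified illustration of that idea, not selenium's implementation; `wait_until` and `WaitTimeout` are hypothetical names, and the default poll interval of 0.5 seconds mirrors WebDriverWait's default.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is not met before the deadline."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.2f seconds" % timeout)
        time.sleep(poll)


# Simulate a node that only "appears" on the third check.
state = {"calls": 0}


def node_appears_on_third_check():
    state["calls"] += 1
    return "node" if state["calls"] >= 3 else None


found = wait_until(node_appears_on_third_check, timeout=2, poll=0.01)
print(found, state["calls"])
```

Unlike a fixed `time.sleep(3)`, this returns as soon as the condition holds and fails loudly when it never does.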