Multithreading and Multiprocessing

Author: 田小田txt | Published: 2018-12-30 17:10

1. Multithreading:

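For I/O-bound work, multithreading and multiprocessing perform about the same, although threads are lighter to schedule.
A thread can be created either by instantiating the Thread class with a target function or by subclassing Thread and overriding run().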

Simulating a multithreaded crawler (fetching list pages and detail pages concurrently):

  import time
  import threading

  # Approach 1: plain functions, to be run with threading.Thread(target=..., args=(url,))

  # Fetch a detail page
  def get_detail_html(url):
      print("get detail html started")
      time.sleep(2)
      print("get detail html end")

  # Extract detail-page URLs from a list page
  def get_detail_url(url):
      print("get detail url started")
      time.sleep(4)
      print("get detail url end")

  # Approach 2: subclass Thread and override run()
  class GetDetailHtml(threading.Thread):
      def __init__(self, name):
          super().__init__(name=name)

      def run(self):
          print("get detail html started")
          time.sleep(2)
          print("get detail html end")

  class GetDetailUrl(threading.Thread):
      def __init__(self, name):
          super().__init__(name=name)

      def run(self):
          print("get detail url started")
          time.sleep(4)
          print("get detail url end")

  if __name__ == "__main__":
      thread1 = GetDetailHtml("get_detail_html")
      thread2 = GetDetailUrl("get_detail_url")
      start_time = time.time()
      thread1.start()
      thread2.start()

      thread1.join()    # block until thread1 finishes before continuing
      thread2.join()

      # Without the join() calls the main thread would reach this line at once,
      # while the non-daemon child threads would still run to completion.
      # The two threads overlap, so the elapsed time is about 4 s rather than 6 s.
      print("last time: {}".format(time.time() - start_time))
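The text mentions that a thread can also be created by instantiating Thread directly instead of subclassing it. A minimal sketch of that approach follows; the helper names fetch and run_crawl and the short sleep times are assumptions made for illustration, not part of the original example.

```python
import threading
import time


def fetch(label, seconds, results):
    # Stand-in for a network request: sleep, then record which task finished.
    time.sleep(seconds)
    results.append(label)  # list.append is thread-safe in CPython


def run_crawl():
    results = []
    threads = [
        threading.Thread(target=fetch, args=("detail_html", 0.1, results)),
        threading.Thread(target=fetch, args=("detail_url", 0.2, results)),
    ]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for both threads before reading the results
    return results, time.time() - start


if __name__ == "__main__":
    finished, elapsed = run_crawl()
    print(finished, "elapsed: {:.2f}s".format(elapsed))
```

Passing a target function is usually enough for short tasks; subclassing tends to pay off when the thread carries its own state or more complex logic.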

2. Thread pool:

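A thread pool reuses threads and manages task state and return values (the main thread can call done() at any time to check whether a task has finished).
The concurrent.futures package exposes the same interface for thread pools and process pools, which reduces development effort.
submit() returns a Future object: a container for a result that is not available yet but can be retrieved from the object once the task completes.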

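  from concurrent.futures import ThreadPoolExecutor
  import time

  def get_html(times):
      time.sleep(times)
      print("get page {} success".format(times))
      return times

  executor = ThreadPoolExecutor(max_workers=2)
  # submit() hands the function to the pool and returns a Future immediately
  task1 = executor.submit(get_html, 3)
  task2 = executor.submit(get_html, 2)
  print(task1.done())      # False: task1 is still sleeping
  print(task1.result())    # blocks until task1 finishes, then prints 3
  print(task2.cancel())    # False: task2 has already finished; cancel() only works before a task starts
  executor.shutdown()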
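Calling done() only tells the main thread about a task when it asks. One way to react as soon as any task finishes is concurrent.futures.as_completed, which yields futures in the order they complete. The sketch below assumes the same get_html stand-in with shorter delays; the crawl helper is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def get_html(seconds):
    # Stand-in for downloading a page.
    time.sleep(seconds)
    return seconds


def crawl(delays):
    finished = []
    # The with-block shuts the pool down once all tasks are done.
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(get_html, d) for d in delays]
        # as_completed yields each future as soon as it finishes,
        # regardless of the order in which the tasks were submitted.
        for future in as_completed(futures):
            finished.append(future.result())
    return finished


if __name__ == "__main__":
    print(crawl([0.3, 0.1]))
```

With two workers both tasks start right away, so the shorter delay would normally come back first even though it was submitted second.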

3. Multiprocessing:

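Because of the GIL, threads in CPython cannot execute Python bytecode in parallel, so CPU-bound work gains nothing from multiple cores under multithreading; use multiple processes for it.
Process switching is relatively expensive, so for frequent I/O operations multithreading is the better choice (lower overhead, more stable).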
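Since the thread-pool and process-pool executors in concurrent.futures share one interface, moving CPU-bound work onto processes can be as small a change as swapping the executor class. A rough sketch, assuming a hypothetical count_primes function as the CPU-bound task:

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    # CPU-bound work: count the primes below limit by trial division.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run(limits):
    # Same submit/map interface as ThreadPoolExecutor, but each task
    # runs in a separate process and is not serialized by the GIL.
    with ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    print(run([10000, 20000]))
```

The `if __name__ == "__main__":` guard matters here: on platforms that spawn child processes, the module is re-imported in each child, and unguarded top-level code would run again.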
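Inter-process communication:
(1) Threads and processes use different communication packages; they are not interchangeable.
(2) Sharing global variables, which works between threads, does not work between processes (each child process gets its own copy of the data).
(3) Process pools: multiprocessing.Queue cannot be used with a process pool; use Manager().Queue() instead.
(4) A pipe (multiprocessing.Pipe) is faster than Queue but only connects two processes.
(5) multiprocessing.Manager provides many shared data structures (dict, list and others); access to them still has to be synchronized.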
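As a sketch of point (5), the following shares a Manager dict between two processes and guards each update with a Manager lock, since a read-then-write on the proxy is not atomic. The function names add_hits and count_hits are assumptions made for illustration.

```python
from multiprocessing import Manager, Process


def add_hits(shared, lock, key, times):
    for _ in range(times):
        # Reading and then writing the proxy are two separate operations,
        # so hold the lock across both to avoid lost updates.
        with lock:
            shared[key] = shared.get(key, 0) + 1


def count_hits(workers=2, times=50):
    with Manager() as manager:
        shared = manager.dict()
        lock = manager.Lock()
        procs = [
            Process(target=add_hits, args=(shared, lock, "hits", times))
            for _ in range(workers)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Copy into a plain dict before the manager shuts down.
        return shared.copy()


if __name__ == "__main__":
    print(count_hits())
```

Without the lock, two processes could read the same old value and one increment would be lost, which is the synchronization issue the list item warns about.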

  import time
  from multiprocessing import Process, Queue

  def producer(queue):
      queue.put("a")
      time.sleep(2)

  def consumer(queue):
      time.sleep(2)
      data = queue.get()
      print(data)

  # The guard is required on platforms that spawn child processes (Windows, macOS)
  if __name__ == "__main__":
      queue = Queue(10)
      my_producer = Process(target=producer, args=(queue,))
      my_consumer = Process(target=consumer, args=(queue,))
      my_producer.start()
      my_consumer.start()
      my_producer.join()
      my_consumer.join()
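For exactly two processes, point (4) above suggests a pipe instead of a queue. A minimal sketch using multiprocessing.Pipe, with the hypothetical helpers worker and receive_from_child:

```python
from multiprocessing import Pipe, Process


def worker(conn):
    # Child side: send one message through its end of the pipe, then close it.
    conn.send("a")
    conn.close()


def receive_from_child():
    # Pipe() returns two connected endpoints; one is handed to the child.
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    data = parent_conn.recv()  # blocks until the child has sent something
    p.join()
    return data


if __name__ == "__main__":
    print(receive_from_child())
```

A pipe has only two endpoints, so this pattern does not extend to several producers or consumers; a Queue is the safer choice once more than two processes are involved.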

4. Process pool:

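  import time
  from multiprocessing import Manager, Pool

  def producer(queue):
      queue.put("a")
      time.sleep(2)

  def consumer(queue):
      time.sleep(2)
      data = queue.get()
      print(data)

  if __name__ == "__main__":
      # A process pool needs the Manager's queue, not multiprocessing.Queue
      queue = Manager().Queue(10)
      pool = Pool(2)

      pool.apply_async(producer, args=(queue,))
      pool.apply_async(consumer, args=(queue,))

      pool.close()    # no more tasks will be submitted
      pool.join()     # wait for the submitted tasks to finish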

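Which queue to import in each case:

  from queue import Queue                 # between threads
  from multiprocessing import Queue       # between Process objects
  from multiprocessing import Manager     # process pool: use Manager().Queue()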
