Gunicorn - How to bind workers to different GPU cards

Author: 红薯爱帅 | Published 2023-02-24 12:29

    1. Overview

    When deploying an inference service on a multi-GPU server, gunicorn needs to be able to assign its workers to different GPU cards.
    In addition, incoming requests should be distributed evenly across the workers, so that the hardware is used as fully as possible.

    The worker-class used here is sync, mainly so that requests are distributed evenly; tests with gevent gave unsatisfactory results.
    To assign each worker to a different GPU card, gunicorn's server hooks are used; the hooks are shown in section 2 below.
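
    As a side note on the worker class: sync is gunicorn's default, so no extra flag is needed. If you prefer to pin it explicitly, it can also be set in gunicorn_conf.py (shown here for completeness, not part of the original configuration):

    # gunicorn_conf.py
    worker_class = 'sync'  # gunicorn's default; gevent gave uneven request distribution in the author's tests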

    Default configuration in the Dockerfile:

    ENV GUNICORN_CMD_ARGS="-b 0.0.0.0:9090 -c gunicorn_conf.py -w 4 --backlog 4 --timeout 600"
    ENTRYPOINT ["gunicorn", "wsgi:app"]
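
    For reference, the entrypoint above expects a WSGI module named wsgi.py that exposes an app object. The real application is not shown in the article; a minimal Flask stand-in could look like this:

    # wsgi.py - hypothetical minimal app behind "gunicorn wsgi:app"
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route('/predict', methods=['POST'])
    def predict():
        # a real handler would run the model on this worker's assigned GPU
        return jsonify({'status': 'ok'})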
    

    Reference

    Related issue

    2. The source code

    In the pre_fork and child_exit hooks, maintain the worker configuration data, such as the mapping from worker to CUDA index.
    In the post_fork hook, add a new key/value to os.environ; the worker can then read that value from os.environ.
    I have tried this approach with the sync worker type, and it works. Note that workers are forked from the master, so the mapping must be maintained in the master process.
    If you have any other questions, the source code below may help.

    • gunicorn_conf.py
    import os
    
    USE_GPU = int(os.environ.get("USE_GPU", 1))
    CUDA_NUMBER = int(os.environ.get("CUDA_NUMBER", 1))
    WORKER_ENV = {}  # worker id -> assigned cuda device, maintained in the master process
    
    if USE_GPU and CUDA_NUMBER > 1:
        # pick the cuda device that currently has the fewest workers assigned
        def get_worker_env():
            counter = {f'cuda:{i}': 0 for i in range(CUDA_NUMBER)}
            for k in WORKER_ENV.keys():
                assert WORKER_ENV[k] in counter
                counter[WORKER_ENV[k]] += 1
            
            min_count, min_cuda = 1024, 'cuda:0'
            for cuda, count in counter.items():
                if count < min_count:
                    min_cuda = cuda
                    min_count = count
            return min_cuda
    
        # worker.age is a unique, monotonically increasing number assigned by the gunicorn arbiter
        def get_worker_id(worker):
            return f'WORKER-{worker.age}'
    
        # running in master process
        def pre_fork(server, worker):
            _id = get_worker_id(worker)
            WORKER_ENV[_id] = get_worker_env()
            server.log.info(f'set master env {_id}: {WORKER_ENV[_id]}')
    
        # running in worker process, and environment is in process scope, not os scope
        def post_fork(server, worker):
            _id = get_worker_id(worker)
            os.environ['CUDA_INDEX'] = WORKER_ENV[_id]
            server.log.info(f'set worker (age: {worker.age}, pid {worker.pid}) env CUDA_INDEX: {WORKER_ENV[_id]}')
    
        # running in master process
        def child_exit(server, worker):
            _id = get_worker_id(worker)
            server.log.info(f'remove worker env {_id}: {WORKER_ENV[_id]}')
            del WORKER_ENV[_id]
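
    To see how the assignment balances out, here is a small standalone check of the same least-loaded selection idea, re-implemented outside gunicorn (a sketch assuming CUDA_NUMBER=2 and four workers):

    CUDA_NUMBER = 2
    WORKER_ENV = {}

    def get_worker_env():
        counter = {f'cuda:{i}': 0 for i in range(CUDA_NUMBER)}
        for dev in WORKER_ENV.values():
            counter[dev] += 1
        # the device with the fewest assigned workers
        return min(counter, key=counter.get)

    for age in range(1, 5):
        WORKER_ENV[f'WORKER-{age}'] = get_worker_env()

    print(WORKER_ENV)
    # {'WORKER-1': 'cuda:0', 'WORKER-2': 'cuda:1', 'WORKER-3': 'cuda:0', 'WORKER-4': 'cuda:1'}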
    

    3. Inference code

    In the inference code, read the CUDA_INDEX environment variable and call the .to() method of nn.Module / Tensor to copy data from host memory to GPU memory, for example:

    data: Tensor
    data = data.to(os.environ['CUDA_INDEX'])  # Tensor.to returns a copy on the target device, e.g. 'cuda:0'
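
    Putting it together, a worker can read CUDA_INDEX once at import time and keep both the model and incoming tensors on its assigned device. The following is a minimal sketch; the model, function names, and fallback device are illustrative and not from the original service:

    import os
    import torch
    from torch import nn

    # CUDA_INDEX is set per worker by the post_fork hook in gunicorn_conf.py
    DEVICE = torch.device(os.environ.get('CUDA_INDEX', 'cuda:0' if torch.cuda.is_available() else 'cpu'))

    # hypothetical model; a real service would load trained weights here
    model = nn.Linear(128, 10).to(DEVICE)
    model.eval()

    def predict(batch: torch.Tensor) -> torch.Tensor:
        # copy the input from host memory to this worker's GPU, run inference, return results on CPU
        with torch.no_grad():
            return model(batch.to(DEVICE)).cpu()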
    
