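Restarting and updating a gunicorn service without downtime
The biggest headache with every project update is the gap while the service restarts. Without load balancing, or with load balancing that is poorly set up, the restart causes a brief outage, which makes for a bad user experience. gunicorn uses a prefork master-worker model and manages the processes it forks, which lets you add and remove worker processes dynamically. This post goes straight to how to update a service with gunicorn without downtime. The official documentation is here: https://docs.gunicorn.org/en/stable/signals.html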
Signals
gunicorn manages its processes through signal handling. First, a look at the signals the master process accepts:
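- QUIT: quick shutdown.
- TERM: graceful shutdown. Waits for workers to finish their current requests, up to the graceful timeout.
- HUP: reload the configuration, start new worker processes with the new configuration, and gracefully shut down the older workers.
- TTIN: increase the number of worker processes by one.
- TTOU: decrease the number of worker processes by one.
- USR1: reopen the log files.
- USR2: upgrade gunicorn on the fly.
- WINCH: gracefully shut down the worker processes when gunicorn is daemonized (running in the background).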
Of the signals above, this post covers only three: HUP, USR2, and TERM.
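The signals above are usually sent with the shell's kill command, but they can also be sent from Python. The following is a minimal sketch, assuming a Unix system; the helper name `send_to_master` and its injectable `kill` parameter are hypothetical and only there for illustration, and the master PID is something you would read from your own pid file or from ps.

```python
import os
import signal

# Signal names the gunicorn master handles, as listed above.
MASTER_SIGNALS = ("QUIT", "TERM", "HUP", "TTIN", "TTOU", "USR1", "USR2", "WINCH")


def send_to_master(master_pid, name, kill=os.kill):
    """Send one of the listed signals to the master process (hypothetical helper).

    `kill` defaults to os.kill; it is a parameter only so the function can be
    exercised without signalling a real process.
    """
    if name not in MASTER_SIGNALS:
        raise ValueError(f"unsupported signal: {name}")
    sig = getattr(signal, f"SIG{name}")  # e.g. "HUP" -> signal.SIGHUP
    kill(master_pid, sig)
    return sig
```

For example, `send_to_master(9146, "HUP")` would have the same effect as running `kill -HUP 9146`.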
HUP
According to the documentation, HUP can be used to restart the service. The log from my test looks like this:
[2021-01-21 17:25:14 +0800] [20388] [INFO] Handling signal: hup
[2021-01-21 17:25:14 +0800] [20388] [INFO] Hang up: Master
[2021-01-21 17:25:14 +0800] [29249] [INFO] Booting worker with pid: 29249
[2021-01-21 17:25:14 +0800] [29248] [INFO] Booting worker with pid: 29248
[2021-01-21 17:25:14 +0800] [29250] [INFO] Booting worker with pid: 29250
[2021-01-21 17:25:14 +0800] [28643] [INFO] Shutting down
[2021-01-21 17:25:14 +0800] [28643] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2021-01-21 17:25:14 +0800] [28640] [INFO] Shutting down
[2021-01-21 17:25:14 +0800] [28640] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2021-01-21 17:25:14 +0800] [28642] [INFO] Shutting down
[2021-01-21 17:25:14 +0800] [28642] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2021-01-21 17:25:14 +0800] [28643] [INFO] Finished server process [28643]
[2021-01-21 17:25:14 +0800] [28643] [INFO] Worker exiting (pid: 28643)
[2021-01-21 17:25:14 +0800] [28640] [INFO] Finished server process [28640]
[2021-01-21 17:25:14 +0800] [28640] [INFO] Worker exiting (pid: 28640)
[2021-01-21 17:25:14 +0800] [28642] [INFO] Finished server process [28642]
[2021-01-21 17:25:14 +0800] [28642] [INFO] Worker exiting (pid: 28642)
[2021-01-21 17:25:15 +0800] [29248] [INFO] Started server process [29248]
[2021-01-21 17:25:15 +0800] [29248] [INFO] Waiting for application startup.
[2021-01-21 17:25:15 +0800] [29248] [INFO] ASGI 'lifespan' protocol appears unsupported.
[2021-01-21 17:25:15 +0800] [29248] [INFO] Application startup complete.
[2021-01-21 17:25:15 +0800] [29249] [INFO] Started server process [29249]
[2021-01-21 17:25:15 +0800] [29249] [INFO] Waiting for application startup.
[2021-01-21 17:25:15 +0800] [29249] [INFO] ASGI 'lifespan' protocol appears unsupported.
[2021-01-21 17:25:15 +0800] [29249] [INFO] Application startup complete.
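Judging by the log, the old workers are shut down before the new ones finish starting up. The gunicorn source, however, shows that it spawns the new workers first and then compares the number of running workers with the configured number to kill the older ones: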
# Simplified HUP handling
# spawn new workers
for _ in range(self.cfg.workers):
    self.spawn_worker()  # the new workers are started here

# manage workers
self.manage_workers()  # kills the oldest workers, based on the age value each worker is given when it is spawned
# The manage_workers method
def manage_workers(self):
    """\
    Maintain the number of workers by spawning or killing
    as required.
    """
    if len(self.WORKERS) < self.num_workers:
        self.spawn_workers()

    workers = self.WORKERS.items()
    workers = sorted(workers, key=lambda w: w[1].age)
    while len(workers) > self.num_workers:
        (pid, _) = workers.pop(0)
        self.kill_worker(pid, signal.SIGTERM)

    active_worker_count = len(workers)
    if self._last_logged_active_worker_count != active_worker_count:
        self._last_logged_active_worker_count = active_worker_count
        self.log.debug("{0} workers".format(active_worker_count),
                       extra={"metric": "gunicorn.workers",
                              "value": active_worker_count,
                              "mtype": "gauge"})
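To make the spawn-first, trim-oldest order easier to see, here is a toy model of the two snippets above. It is a sketch and not gunicorn's code: `ToyArbiter` and `ToyWorker` are hypothetical names, PIDs and ages are plain counters, no real processes are forked, and a "killed" worker is removed from the table immediately instead of exiting on its own after SIGTERM.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class ToyWorker:
    age: int  # spawn order; a smaller value means an older worker


class ToyArbiter:
    """Toy model of the HUP path: spawn new workers, then trim the oldest."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.workers = {}   # pid -> ToyWorker
        self.killed = []    # pids that would have received SIGTERM, in order
        self._ages = count(1)
        self._pids = count(100)

    def spawn_worker(self):
        pid = next(self._pids)
        self.workers[pid] = ToyWorker(age=next(self._ages))
        return pid

    def manage_workers(self):
        # Top up if there are too few workers.
        while len(self.workers) < self.num_workers:
            self.spawn_worker()
        # Trim the oldest workers if there are too many.
        by_age = sorted(self.workers.items(), key=lambda w: w[1].age)
        while len(by_age) > self.num_workers:
            pid, _ = by_age.pop(0)
            self.killed.append(pid)
            del self.workers[pid]

    def reload(self):
        # Same order as the simplified HUP handling: spawn first, then manage.
        for _ in range(self.num_workers):
            self.spawn_worker()
        self.manage_workers()


arbiter = ToyArbiter(num_workers=3)
arbiter.manage_workers()  # initial boot: pids 100, 101, 102
arbiter.reload()          # spawns 103-105, then trims 100-102
print(sorted(arbiter.workers), arbiter.killed)
# prints: [103, 104, 105] [100, 101, 102]
```

In the toy model the trimmed workers are always the three with the lowest age, which matches the sort on `w[1].age` in `manage_workers`.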
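Testing confirmed that there is a problem: requests sent at the moment of the restart raise errors. (My service is Django 3.1 running on uvicorn. uvicorn has no process management of its own, so I start uvicorn through gunicorn, which is also what the uvicorn documentation recommends.)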
USR2
It executes a new binary whose PID file is postfixed with .2 (e.g. /var/run/gunicorn.pid.2), which in turn starts a new master process and new worker processes
Roughly, this means that sending USR2 starts a new master process and new worker processes.
First, the current processes (for readability I removed the last column of the ps output):
[root@Luckybamboo report-web]# ps -ef | grep uvicorn.workers
root 9146 1 0 17:30 pts/7 00:00:00 gunicorn
root 9168 9146 1 17:30 pts/7 00:00:00 gunicorn
root 9169 9146 1 17:30 pts/7 00:00:00 gunicorn
root 9170 9146 1 17:30 pts/7 00:00:00 gunicorn
The current master process is 9146, and the worker processes are 9168, 9169, and 9170.
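Reading the master and workers off the PID and PPID columns can also be done programmatically. The sketch below assumes the `ps -ef` column layout shown above (user, PID, PPID, ...); the function name `group_gunicorn_processes` is hypothetical. It treats any listed process that is the parent of another listed process as a master.

```python
def group_gunicorn_processes(ps_output):
    """Map each master PID to its sorted worker PIDs, based on `ps -ef` rows."""
    parents = {}  # pid -> ppid
    for line in ps_output.strip().splitlines():
        fields = line.split()
        # Skip header or malformed rows.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        parents[int(fields[1])] = int(fields[2])
    # A master is a listed process that has at least one listed child.
    masters = {ppid for ppid in parents.values() if ppid in parents}
    return {
        master: sorted(
            pid for pid, ppid in parents.items()
            if ppid == master and pid not in masters
        )
        for master in sorted(masters)
    }


LISTING = """
root 9146 1 0 17:30 pts/7 00:00:00 gunicorn
root 9168 9146 1 17:30 pts/7 00:00:00 gunicorn
root 9169 9146 1 17:30 pts/7 00:00:00 gunicorn
root 9170 9146 1 17:30 pts/7 00:00:00 gunicorn
"""
print(group_gunicorn_processes(LISTING))
# prints: {9146: [9168, 9169, 9170]}
```

A process that is both a child of one master and the parent of workers is reported as a master of its own and not as a worker.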
After sending the signal, the process list changes to:
[root@Luckybamboo report-web]# kill -USR2 9146
[root@Luckybamboo report-web]# ps -ef | grep uvicorn.workers
root 9146 1 0 17:30 pts/7 00:00:00 gunicorn
root 9168 9146 1 17:30 pts/7 00:00:00 gunicorn
root 9169 9146 1 17:30 pts/7 00:00:00 gunicorn
root 9170 9146 1 17:30 pts/7 00:00:00 gunicorn
root 11562 9146 9 17:32 pts/7 00:00:00 gunicorn
root 11564 11562 30 17:32 pts/7 00:00:00 gunicorn
root 11565 11562 64 17:32 pts/7 00:00:00 gunicorn
root 11566 11562 60 17:32 pts/7 00:00:00 gunicorn
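A new master process, 11562, has now started, along with new worker processes 11564, 11565, and 11566.
At this point you can send TERM to stop the old master, 9146, and keep only the new processes: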
[root@Luckybamboo report-web]# kill -TERM 9146
[root@Luckybamboo report-web]# ps -ef | grep uvicorn.workers
root 11562 1 0 17:32 pts/7 00:00:00 gunicorn
root 11564 11562 2 17:32 pts/7 00:00:00 gunicorn
root 11565 11562 2 17:32 pts/7 00:00:00 gunicorn
root 11566 11562 2 17:32 pts/7 00:00:00 gunicorn
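Now only the new processes remain. What I had hoped was that once the new processes were up, the old ones would stop handling new requests. My testing suggested that this is what happens, but I tested very little and could not find this logic in the source, and this signal is meant for upgrading gunicorn itself on the fly. So it is safest to treat the old processes as normal, live processes. The documentation also says that if you decide not to use the new processes you can kill them, and that you can keep sending any of the signals to the old processes. I hope someone can fill in how to get the behavior I was hoping for.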
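The USR2-then-TERM sequence used in this section can be scripted. The following is a minimal sketch under several assumptions: gunicorn was started with a pid file, the new master's pid file is the old path with .2 appended (as in the documentation quote above), and the function name `upgrade_master`, its timeout, and its injectable `kill` and `sleep` parameters are all hypothetical. The pid file appearing only shows that the new master has started; it does not prove that the new workers are ready to serve, so a real deployment would likely want an additional health check before sending TERM.

```python
import os
import signal
import time


def upgrade_master(old_pid, new_pidfile, timeout=30.0,
                   kill=os.kill, sleep=time.sleep):
    """Send USR2 to the old master, wait for the new pid file, then TERM the old master.

    Returns the new master's PID as read from `new_pidfile`.
    """
    kill(old_pid, signal.SIGUSR2)  # start a new master and new workers

    deadline = time.monotonic() + timeout
    while not os.path.exists(new_pidfile):
        if time.monotonic() > deadline:
            # Give up and leave the old master running untouched.
            raise TimeoutError(f"{new_pidfile} did not appear within {timeout}s")
        sleep(0.5)

    with open(new_pidfile) as fh:
        new_pid = int(fh.read().strip())

    kill(old_pid, signal.SIGTERM)  # gracefully stop the old master and its workers
    return new_pid
```

With the processes from this section, and assuming a pid file at the path from the documentation example, the call would look like `upgrade_master(9146, "/var/run/gunicorn.pid.2")`. On timeout the old master is left alone, which follows the advice above to keep treating the old processes as live.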