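RLHF (Reinforcement Learning from Human Feedback) is a machine learning method that lets an intelligent system learn from its environment and maximize a specific objective. In RLHF, human annotators rank multiple generated outputs for the same input. The resulting labeled data captures human preferences and is used to train a reward model (Reward Model). During reinforcement learning, the reward model judges the ranking of multiple outputs generated by the large language model. Finally, reinforcement learning updates the parameters of the large model so that its outputs satisfy the reward model's judgments. This approach reduces the extensive trial and error that traditional reinforcement learning requires. It also lowers the cost of relying entirely on humans to rank and give feedback on every model output, so the system learns tasks more efficiently and quickly.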
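To make the reward model idea concrete, here is a minimal sketch of a pairwise ranking loss that is commonly used to train reward models from human preference pairs. The source does not state which loss the platform uses, so the Bradley-Terry style formulation and the function name `pairwise_ranking_loss` are assumptions for illustration only.

```python
import math


def pairwise_ranking_loss(score_chosen: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_chosen - score_rejected)).

    Assumption: a Bradley-Terry style preference loss. The loss is small when
    the reward model scores the human-preferred output higher than the other.
    """
    margin = score_chosen - score_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The loss drops as the preferred output's score pulls ahead of the rejected one.
for chosen, rejected in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]:
    print(chosen, rejected, round(pairwise_ranking_loss(chosen, rejected), 4))
```

Minimizing this loss over many annotated pairs pushes the reward model to assign higher scores to the outputs that annotators ranked higher.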
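Development workflow
Prepare the reward model training data
Make sure the reward model training data is ready, or go to Dataset Management to annotate data.
Train the reward model
Use the prepared training data and the pretrained reward model provided by the platform to train your own reward model.
Prepare the reinforcement learning training data
Make sure the reinforcement learning training data is ready, or go to Dataset Management to annotate data.
Reinforcement learning training
Use the training data and the reinforcement learning mechanism to further train the large model so that its behavior fits the business scenario.
Publish the model
After reinforcement learning training completes, you can publish the trained model to the model repository.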
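As a rough illustration of the first step in the workflow above, the sketch below expands one human ranking of several outputs into pairwise preference records, a common intermediate form for reward model training. The record fields `prompt`, `chosen`, and `rejected` are hypothetical names chosen for this example; they are not the platform's dataset schema, which the source does not describe.

```python
from itertools import combinations


def ranking_to_pairs(prompt, ranked_responses):
    """Expand a best-to-worst ranking into (chosen, rejected) records.

    Assumption: ranked_responses is ordered from most to least preferred.
    combinations() keeps input order, so the first element of each pair is
    always the higher-ranked response.
    """
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_responses, 2)
    ]


pairs = ranking_to_pairs("Explain RLHF in one sentence.", ["A", "B", "C"])
for record in pairs:
    print(record["chosen"], ">", record["rejected"])
```

A ranking of n outputs yields n(n-1)/2 pairs, so a single annotated ranking provides several training comparisons.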
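First, train a reward model:
Then run reinforcement learning training based on the reward model:
Finally, publish the model produced by reinforcement learning training. It can then be called through the API.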
To download the model, click Export, then check the export task in the task list.
Log output of the download task:
2023-11-21 13:37:56.949 - Job [13999] Info : [job init done]
2023-11-21 13:37:57.050 - Job [13999] Info : [start to export.]
2023-11-21 13:37:57.172 - Job [13999] Info : [init temp dir]
2023-11-21 13:37:57.356 - Job [13999] Info : [packing model file begin...]
2023-11-21 13:37:57.489 - Job [13999] Info : [ready to download the list of files below]
┌────────────────────────────────┬──────────────┐
│ FILENAME │ SIZE │
├────────────────────────────────┼──────────────┤
│ 1/config.json │ 0.0008 MB │
│ 1/generation_config.json │ 0.0001 MB │
│ 1/model_desc.json │ 0.0001 MB │
│ 1/prompt_template.json │ 0.0003 MB │
│ 1/pytorch_model-00001.bin │ 1960.0009 MB │
│ 1/pytorch_model-00002.bin │ 896.2922 MB │
│ 1/pytorch_model-00003.bin │ 896.2428 MB │
│ 1/pytorch_model-00004.bin │ 992.2597 MB │
│ 1/pytorch_model-00005.bin │ 928.2675 MB │
│ 1/pytorch_model-00006.bin │ 992.2597 MB │
│ 1/pytorch_model-00007.bin │ 928.2675 MB │
│ 1/pytorch_model-00008.bin │ 992.2597 MB │
│ 1/pytorch_model-00009.bin │ 928.2675 MB │
│ 1/pytorch_model-00010.bin │ 992.2597 MB │
│ 1/pytorch_model-00011.bin │ 928.2675 MB │
│ 1/pytorch_model-00012.bin │ 992.2597 MB │
│ 1/pytorch_model-00013.bin │ 928.2675 MB │
│ 1/pytorch_model-00014.bin │ 128.0251 MB │
│ 1/pytorch_model-00015.bin │ 1960.0008 MB │
│ 1/pytorch_model.bin.index.json │ 0.0273 MB │
│ 1/special_tokens_map.json │ 0.0001 MB │
│ 1/tokenizer.json │ 13.8290 MB │
│ 1/tokenizer_config.json │ 0.0011 MB │
└────────────────────────────────┴──────────────┘
2023-11-21 13:37:57.759 - Job [13999] Info : [download model file to temp dir begin...]
2023-11-21 13:37:57.881 - Job [13999] Info : [download progress:1 / 23 [-->__________________________________________________________] 4.35% ? p/s]
2023-11-21 13:37:58.094 - Job [13999] Info : [download progress:2 / 23 [----->_______________________________________________________] 8.70% ? p/s]
2023-11-21 13:37:58.220 - Job [13999] Info : [download progress:3 / 23 [------->____________________________________________________] 13.04% ? p/s]
2023-11-21 13:37:58.424 - Job [13999] Info : [download progress:4 / 23 [---------->_________________________________________________] 17.39% 6 p/s]
2023-11-21 13:38:15.024 - Job [13999] Info : [download progress:5 / 23 [------------->______________________________________________] 21.74% 1 p/s]
2023-11-21 13:38:22.704 - Job [13999] Info : [download progress:6 / 23 [--------------->____________________________________________] 26.09% 0 p/s]
2023-11-21 13:38:30.400 - Job [13999] Info : [download progress:7 / 23 [------------------>_________________________________________] 30.43% 0 p/s]
2023-11-21 13:38:38.827 - Job [13999] Info : [download progress:8 / 23 [-------------------->_______________________________________] 34.78% 0 p/s]
2023-11-21 13:38:46.747 - Job [13999] Info : [download progress:9 / 23 [----------------------->____________________________________] 39.13% 0 p/s]
2023-11-21 13:38:55.305 - Job [13999] Info : [download progress:10 / 23 [------------------------->_________________________________] 43.48% 0 p/s]
2023-11-21 13:39:03.306 - Job [13999] Info : [download progress:11 / 23 [---------------------------->______________________________] 47.83% 0 p/s]
2023-11-21 13:39:11.745 - Job [13999] Info : [download progress:12 / 23 [------------------------------>____________________________] 52.17% 0 p/s]
2023-11-21 13:39:19.646 - Job [13999] Info : [download progress:13 / 23 [--------------------------------->_________________________] 56.52% 0 p/s]
2023-11-21 13:39:29.537 - Job [13999] Info : [download progress:14 / 23 [----------------------------------->_______________________] 60.87% 0 p/s]
2023-11-21 13:39:37.527 - Job [13999] Info : [download progress:15 / 23 [-------------------------------------->____________________] 65.22% 0 p/s]
2023-11-21 13:39:45.963 - Job [13999] Info : [download progress:16 / 23 [----------------------------------------->_________________] 69.57% 0 p/s]
2023-11-21 13:39:53.942 - Job [13999] Info : [download progress:17 / 23 [------------------------------------------->_______________] 73.91% 0 p/s]
2023-11-21 13:39:55.636 - Job [13999] Info : [download progress:18 / 23 [---------------------------------------------->____________] 78.26% 0 p/s]
2023-11-21 13:40:12.087 - Job [13999] Info : [download progress:19 / 23 [------------------------------------------------>__________] 82.61% 0 p/s]
2023-11-21 13:40:13.369 - Job [13999] Info : [download progress:20 / 23 [--------------------------------------------------->_______] 86.96% 0 p/s]
2023-11-21 13:40:13.550 - Job [13999] Info : [download progress:21 / 23 [----------------------------------------------------->_____] 91.30% 0 p/s]
2023-11-21 13:40:13.880 - Job [13999] Info : [download progress:22 / 23 [-------------------------------------------------------->__] 95.65% 0 p/s]
2023-11-21 13:40:14.029 - Job [13999] Info : [download progress:23 / 23 [--------------------------------------------------------->] 100.00% 0 p/s]
2023-11-21 13:40:14.189 - Job [13999] Info : [download model file to temp dir done.]
2023-11-21 13:40:14.387 - Job [13999] Info : [archive model path to temp dir begin...]
2023-11-21 13:40:26.308 - Job [13999] Info : [archive model path to temp dir done.]
2023-11-21 13:40:26.461 - Job [13999] Info : [packing model file done.]
2023-11-21 13:40:26.586 - Job [13999] Info : [calculate model tar file md5...]
2023-11-21 13:40:55.013 - Job [13999] Info : [model tar md5 is:de0bb810218721af491575979301849f]
2023-11-21 13:40:55.194 - Job [13999] Info : [calculate model tar size...]
2023-11-21 13:40:55.336 - Job [13999] Info : [model tar size is:15457.0737 MB]
2023-11-21 13:40:55.510 - Job [13999] Info : [upload archive file begin ...]
2023-11-21 13:41:26.125 - Job [13999] Info : [upload archive file done...]
2023-11-21 13:41:26.271 - Job [13999] Info : [export done.]
2023-11-21 13:41:26.402 - Job [13999] Info : [no err, transaction will be commit.]
2023-11-21 13:41:26.485 - Job [13999] Info : [transaction commit complete.]
The model can be downloaded from the task's details page.
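Because the export log reports the MD5 of the model archive, it can be worth checking the downloaded file against that value. The following is a minimal sketch using Python's standard `hashlib`. The helper names `file_md5` and `matches_expected` and the local file name `model.tar` are hypothetical, since the source does not say what the downloaded archive is called.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Return True when the file's MD5 equals the value reported in the log."""
    return file_md5(path) == expected_md5.strip().lower()
```

For the export shown above, the check would look like `matches_expected("model.tar", "de0bb810218721af491575979301849f")`, where `model.tar` stands in for wherever the archive was saved.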