Instruction fine-tuning Chinese-LLaMA-Alpaca on a single machine with two RTX 3090 GPUs

Author: 水他 | Published 2023-06-22 11:20

    Large models do not answer industry-specific questions well, so the model needs instruction tuning on domain data.

    Before fine-tuning, the Chinese-Alpaca-Plus 7B merge had already been completed. For reference, see:

    simon li: Deploying Chinese-LLaMA-Alpaca 33B with Docker in an offline environment

    https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E6%89%8B%E5%8A%A8%E6%A8%A1%E5%9E%8B%E5%90%88%E5%B9%B6%E4%B8%8E%E8%BD%AC%E6%8D%A2

    This time the Chinese-Alpaca-Plus 7B model is fine-tuned on domain data, and the fine-tuned adapter is then merged with Chinese-Alpaca-Plus 7B to produce the domain model.

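    Environment and process overview

    The company server has two RTX 3090 GPUs with 24 GB of memory each and Python 3.9. nvidia-smi reports NVIDIA-SMI 525.116.04, Driver Version 525.116.04, CUDA Version 12.0.

    1. First run: a single GPU (GPU 0). It failed with an OOM error.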

    OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 23.69 GiB total
    capacity; 9.38 GiB already allocated; 30.94 MiB free; 9.40 GiB reserved in total by PyTorch)
    If reserved memory is >> allocated memory try setting max_split_size_mb to avoid
    fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 109181) of binary: /root/anaconda3/envs/py39/bin/python
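
    The numbers in the OOM message above can be read as a small memory budget. A minimal sketch, assuming the message format shown in this log; the function and key names are illustrative and not part of PyTorch:

```python
import re

# The OOM message from the first run, copied from the log above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 23.69 GiB total "
    "capacity; 9.38 GiB already allocated; 30.94 MiB free; 9.40 GiB reserved "
    "in total by PyTorch)"
)

_UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}
_SIZE = r"([\d.]+) (MiB|GiB)"


def _find_gib(message: str, suffix: str) -> float:
    """Find '<number> <unit><suffix>' in the message and return the size in GiB."""
    match = re.search(_SIZE + suffix, message)
    if match is None:
        raise ValueError(f"could not find a size followed by {suffix!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_GIB[unit]


def summarize_oom(message: str) -> dict:
    """Split the GPU capacity into free, reserved-by-this-process and the rest."""
    total = _find_gib(message, r" total\s+capacity")
    reserved = _find_gib(message, r" reserved")
    free = _find_gib(message, r" free")
    return {
        "total_gib": total,
        "reserved_gib": reserved,
        "free_gib": free,
        # Neither free nor held by this process's caching allocator.
        "outside_allocator_gib": total - reserved - free,
    }


if __name__ == "__main__":
    for key, value in summarize_oom(OOM_MESSAGE).items():
        print(f"{key}: {value:.2f}")
```

    With the values above, roughly 14.26 GiB of GPU 0 is neither free nor reserved by this process's PyTorch allocator. That share covers the CUDA context and anything held by other processes; the log alone does not say which.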
    

    2. Second run: switched to a single machine with two GPUs. Both GPUs ran out of memory.

    Changed parameters: --nnodes 1 --nproc_per_node 2

    OutOfMemoryError: CUDA out of memory.
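
    Adding a second GPU in data-parallel mode does not halve the per-GPU footprint, because ZeRO stage 2 shards only gradients and optimizer states while every rank keeps a full copy of the weights. A rough sketch of that budget, assuming fp16 weights and gradients, Adam with fp32 master weights, and the parameter counts printed by the final run's training log; activations and temporary buffers are ignored, and the function names are illustrative:

```python
GIB = 1024 ** 3

TOTAL_PARAMS = 7_045_402_624    # "all params" in the training log
TRAINABLE_PARAMS = 159_907_840  # "trainable params" in the training log


def replicated_weights_gib(total_params: int) -> float:
    """fp16 weights, 2 bytes each; every rank holds the full copy."""
    return total_params * 2 / GIB


def sharded_state_gib(trainable_params: int, world_size: int) -> float:
    """fp16 gradients (2 bytes) plus fp32 master weight, momentum and
    variance (12 bytes) per trainable parameter, split across ranks."""
    return trainable_params * (2 + 12) / world_size / GIB


if __name__ == "__main__":
    for world_size in (1, 2):
        weights = replicated_weights_gib(TOTAL_PARAMS)
        state = sharded_state_gib(TRAINABLE_PARAMS, world_size)
        print(f"GPUs={world_size}: weights {weights:.2f} GiB, "
              f"gradients + optimizer {state:.2f} GiB per GPU")
```

    Under these assumptions the weights take about 13.1 GiB on each GPU regardless of the GPU count, while the sharded part drops from about 2.1 GiB to about 1.0 GiB per GPU. This is only an estimate and does not by itself explain the OOM in this run.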
    

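    3. Third run: single machine, two GPUs, switched to a memory-saving configuration. Training succeeded, but the merge failed.

    Remove these three lines from the script: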

    • --modules_to_save ${modules_to_save} \
    • --peft_path ${peft_model} \
    • --gradient_checkpointing \

    Merge output:

    Merging with merge_and_unload...
    ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
    │ /data/lsy/Chinese-LLaMA-Alpaca-main/scripts/merge_llama_with_chinese_lora.py:327 in <module>     │
    │                                                                                                  │
    │   324 │   │   │   │   assert base_model_sd[original_key].dtype == torch.float16                  │
    │   325 │   │                                                                                      │
    │   326 │   │   # did we do anything?                                                              │
    │ ❱ 327 │   │   assert not torch.allclose(first_weight_old, first_weight)                          │
    │   328 │                                                                                          │
    │   329 │   tokenizer.save_pretrained(output_dir)                                                  │
    │   330                                                                                            │
    ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
    AssertionError
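
    The failing assertion checks that merging actually changed the weight it inspects. A minimal pure-Python sketch of what a LoRA merge computes, W' = W + (alpha / r) * B @ A, showing one way the check can fail: if the update B @ A is zero, the merged weight equals the original. The matrices and helper names are made up for illustration; this does not diagnose the failure above, which the article only reports as fixed by pinning peft.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) for W (out x in), A (rank x in), B (out x rank)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def allclose(x, y, tol=1e-8):
    """Element-wise comparison of two matrices within a tolerance."""
    return all(abs(a - b) <= tol for r1, r2 in zip(x, y) for a, b in zip(r1, r2))


# Toy shapes: out=2, in=3, rank=1 (illustrative values only).
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
A = [[0.1, 0.2, 0.3]]
B_ZERO = [[0.0], [0.0]]      # an adapter whose B matrix is all zeros
B_TRAINED = [[1.0], [-1.0]]  # an adapter with a non-zero update

if __name__ == "__main__":
    unchanged = merge_lora(W, A, B_ZERO, alpha=2, rank=1)
    changed = merge_lora(W, A, B_TRAINED, alpha=2, rank=1)
    print("zero adapter leaves W unchanged:", allclose(unchanged, W))
    print("trained adapter changes W:", not allclose(changed, W))
```

    An assertion of the form "assert not allclose(old, new)" would fail for the first case and pass for the second.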
    

    4. Fourth run: single machine, two GPUs, memory-saving configuration, with peft pinned to a specific commit. Both training and merge succeeded.

    Install peft from source, update the script, then retrain and merge again.

    huggingface/peft at commit 13e53fc7ee5d89d59b16523051006dddf0fb7a49: github.com/huggingface/peft/tree/13e53fc

    Data preparation

    Convert your own domain data into the instruct format.

    [
        {
            "instruction":"instruction text",
            "input":"question",
            "output":"answer"
        },
        {
            ...
        }
    ]
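
    A minimal sketch of producing and checking a file in this format. The helper names, the sample instruction, and the sample question/answer pairs are hypothetical placeholders, not the author's data or tooling:

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")


def to_instruct_records(qa_pairs, instruction):
    """Wrap (question, answer) pairs into instruction/input/output records."""
    return [
        {"instruction": instruction, "input": question, "output": answer}
        for question, answer in qa_pairs
    ]


def validate(records):
    """Raise ValueError if a record is missing a key or has a non-string or empty field."""
    for index, record in enumerate(records):
        for key in REQUIRED_KEYS:
            if not isinstance(record.get(key), str):
                raise ValueError(f"record {index}: '{key}' must be a string")
        if not record["instruction"].strip() or not record["output"].strip():
            raise ValueError(f"record {index}: empty instruction or output")


def dump(records):
    """Serialize as a JSON list; ensure_ascii=False keeps Chinese text readable."""
    return json.dumps(records, ensure_ascii=False, indent=4)


if __name__ == "__main__":
    # Hypothetical placeholder data.
    pairs = [("question 1", "answer 1"), ("question 2", "answer 2")]
    records = to_instruct_records(pairs, "Answer the question.")
    validate(records)
    print(dump(records))
```

    Writing the returned string to a file such as data_train.json inside the dataset directory gives the training script a list of records in the shape shown above.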
    

    Script

    run_sft.sh

    lora_rank and lora_alpha are kept the same as in the Chinese-Alpaca-Plus 7B model.
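
    The rank and the list of target modules determine how many LoRA parameters are trained. A minimal sketch of that count, assuming the standard LLaMA 7B layer shapes (hidden_size 4096, intermediate_size 11008, 32 layers, as in the LlamaConfig printed in the training log) and bias-free projections; the helper names are illustrative:

```python
HIDDEN = 4096          # hidden_size from the LlamaConfig in the log
INTERMEDIATE = 11008   # intermediate_size from the LlamaConfig in the log
NUM_LAYERS = 32        # num_hidden_layers from the LlamaConfig in the log
LORA_RANK = 64         # lora_rank in run_sft.sh
LORA_ALPHA = 128       # lora_alpha in run_sft.sh

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A is rank x in_features and B is out_features x rank."""
    return rank * (in_features + out_features)


def total_lora_params(rank: int = LORA_RANK) -> int:
    """Sum the LoRA parameters over all targeted modules and all layers."""
    per_layer = sum(lora_params(i, o, rank) for i, o in TARGET_SHAPES.values())
    return NUM_LAYERS * per_layer


if __name__ == "__main__":
    print("trainable LoRA parameters:", total_lora_params())
    print("scaling alpha / rank:", LORA_ALPHA / LORA_RANK)
```

    The total, 159,907,840, equals the "trainable params" value in the training log, and lora_alpha / lora_rank gives a scaling factor of 2.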

    Script reference

    https://github.com/ymcui/Chinese-LLaMA-Alpaca/issues/618

    lr=3e-4
    lora_rank=64
    lora_alpha=128
    lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
    modules_to_save="embed_tokens,lm_head"
    lora_dropout=0.05
    
    pretrained_model=/data/lsy/Chinese-LLaMA-Alpaca-main/cn_llama_alpaca/7B
    chinese_tokenizer_path=/data/lsy/Chinese-LLaMA-Alpaca-main/cn_llama_alpaca/7B
    dataset_dir=/data/lsy/Chinese-LLaMA-Alpaca-main/data/transportation_train
    per_device_train_batch_size=1
    per_device_eval_batch_size=1
    training_steps=1485
    gradient_accumulation_steps=8
    output_dir=/data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model
    peft_model=/data/lsy/Chinese-LLaMA-Alpaca-main/chinese-alpaca-plus-lora-7b
    validation_file=/data/lsy/Chinese-LLaMA-Alpaca-main/data/transportation_val/data_valid.json
    
    deepspeed_config_file=ds_zero2_no_offload.json
    
    torchrun --nnodes 1 --nproc_per_node 2 run_clm_sft_with_peft.py \
        --deepspeed ${deepspeed_config_file} \
        --model_name_or_path ${pretrained_model} \
        --tokenizer_name_or_path ${chinese_tokenizer_path} \
        --dataset_dir ${dataset_dir} \
        --validation_split_percentage 0.001 \
        --per_device_train_batch_size ${per_device_train_batch_size} \
        --per_device_eval_batch_size ${per_device_eval_batch_size} \
        --do_train \
        --do_eval \
        --seed $RANDOM \
        --fp16 \
        --max_steps ${training_steps} \
        --lr_scheduler_type cosine \
        --learning_rate ${lr} \
        --warmup_ratio 0.03 \
        --weight_decay 0 \
        --logging_strategy steps \
        --logging_steps 10 \
        --save_strategy steps \
        --save_total_limit 3 \
        --evaluation_strategy steps \
        --eval_steps 250 \
        --save_steps 500 \
        --gradient_accumulation_steps ${gradient_accumulation_steps} \
        --preprocessing_num_workers 8 \
        --max_seq_length 512 \
        --output_dir ${output_dir} \
        --overwrite_output_dir \
        --ddp_timeout 30000 \
        --logging_first_step True \
        --lora_rank ${lora_rank} \
        --lora_alpha ${lora_alpha} \
        --trainable ${lora_trainable} \
        --lora_dropout ${lora_dropout} \
        --torch_dtype float16 \
        --validation_file ${validation_file} \
        --ddp_find_unused_parameters False
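
    The batch and schedule settings in the script can be checked with a little arithmetic. A minimal sketch, assuming linear warmup over ceil(max_steps * warmup_ratio) steps before the cosine decay; the sample count comes from the training log and the variable names are illustrative:

```python
import math

NUM_SAMPLES = 13437        # "Num examples" in the training log
PER_DEVICE_BATCH = 1       # per_device_train_batch_size
NUM_GPUS = 2               # --nproc_per_node
GRAD_ACCUM = 8             # gradient_accumulation_steps
MAX_STEPS = 1485           # training_steps
WARMUP_RATIO = 0.03        # --warmup_ratio
PEAK_LR = 3e-4             # lr

# Samples consumed per optimizer step across both GPUs.
GLOBAL_BATCH = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM

# Passes over the data implied by max_steps.
EPOCHS = MAX_STEPS * GLOBAL_BATCH / NUM_SAMPLES

# Number of warmup steps under the stated assumption.
WARMUP_STEPS = math.ceil(MAX_STEPS * WARMUP_RATIO)


def warmup_lr(step: int) -> float:
    """Learning rate during linear warmup at a given optimizer step."""
    return PEAK_LR * min(step, WARMUP_STEPS) / WARMUP_STEPS


if __name__ == "__main__":
    print("global batch size:", GLOBAL_BATCH)
    print("epochs covered:", round(EPOCHS, 2))
    print("warmup steps:", WARMUP_STEPS)
    print("lr at step 1:", warmup_lr(1))
    print("lr at step 10:", warmup_lr(10))
```

    The global batch size of 16 matches train_batch_size in the log, about 1.77 passes over the data is consistent with the reported "Num Epochs = 2", and the warmup values at steps 1 and 10 agree with the first two learning rates the log prints.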
    

    Installing peft from source

    pip uninstall peft
    cd peft-13e53fc7ee5d89d59b16523051006dddf0fb7a49
    python setup.py install
    pip list | grep peft
    peft                     0.3.0.dev0
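
    The same check can be done from Python before launching a long run. A minimal sketch using the standard library; the function names are illustrative. Note that the version string alone does not identify a commit, since several development commits can report 0.3.0.dev0.

```python
from importlib import metadata
from typing import Optional

EXPECTED_PEFT_VERSION = "0.3.0.dev0"  # the version shown by pip list above


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_peft(expected: str = EXPECTED_PEFT_VERSION) -> str:
    """Describe whether the installed peft matches the expected version."""
    found = installed_version("peft")
    if found is None:
        return "peft is not installed"
    if found != expected:
        return f"peft {found} is installed, expected {expected}"
    return f"peft {found} matches the expected version"


if __name__ == "__main__":
    print(check_peft())
```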
    

    Fine-tuning

    Training used a little over 40 GB of GPU memory across the two cards and took about 2 hours.

    nvidia-smi
    Thu Jun 22 10:31:00 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 525.116.04   Driver Version: 525.116.04   CUDA Version: 12.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:3B:00.0 Off |                  N/A |
    | 30%   44C    P2   197W / 350W |  20434MiB / 24576MiB |     35%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  NVIDIA GeForce ...  Off  | 00000000:AF:00.0 Off |                  N/A |
    | 30%   43C    P2   189W / 350W |  20754MiB / 24576MiB |     26%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A    187816      C   ...nda3/envs/py39/bin/python    20432MiB |
    |    1   N/A  N/A    187817      C   ...nda3/envs/py39/bin/python    20752MiB |
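
    The total memory figure can be read straight from the table. A minimal sketch that extracts the "used / total" pairs from nvidia-smi's default text output; the two rows are copied from the output above and the function name is illustrative:

```python
import re

# Memory rows copied from the nvidia-smi output above.
NVIDIA_SMI_ROWS = """\
| 30%   44C    P2   197W / 350W |  20434MiB / 24576MiB |     35%      Default |
| 30%   43C    P2   189W / 350W |  20754MiB / 24576MiB |     26%      Default |
"""


def memory_usage_mib(text: str):
    """Return a list of (used_mib, total_mib) pairs, one per GPU row."""
    pairs = re.findall(r"(\d+)MiB\s*/\s*(\d+)MiB", text)
    return [(int(used), int(total)) for used, total in pairs]


if __name__ == "__main__":
    usage = memory_usage_mib(NVIDIA_SMI_ROWS)
    used_total = sum(used for used, _ in usage)
    for index, (used, total) in enumerate(usage):
        print(f"GPU {index}: {used} / {total} MiB ({used / total:.0%})")
    print(f"all GPUs: {used_total} MiB = {used_total / 1024:.1f} GiB")
```

    The two cards together hold 41188 MiB, about 40.2 GiB, with each card at over 80% of its 24 GiB.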
    

    Training log

    (py39) [root@localhost training]# sh run_sft.sh
    WARNING:torch.distributed.run:
    *****************************************
    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
    *****************************************
    [2023-06-22 14:34:01,185] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
    [2023-06-22 14:34:01,212] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
    [2023-06-22 14:34:04,122] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
    [2023-06-22 14:34:04,122] [INFO] [comm.py:594:init_distributed] cdb=None
    [2023-06-22 14:34:04,122] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
    [2023-06-22 14:34:04,262] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
    [2023-06-22 14:34:04,262] [INFO] [comm.py:594:init_distributed] cdb=None
    06/22/2023 14:34:09 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True
    06/22/2023 14:34:10 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
    [INFO|configuration_utils.py:666] 2023-06-22 14:34:10,150 >> loading configuration file /data/lsy/Chinese-LLaMA-Alpaca-main/cn_llama_alpaca/7B/config.json
    [INFO|configuration_utils.py:720] 2023-06-22 14:34:10,152 >> Model config LlamaConfig {
      "_name_or_path": "/data/lsy/Chinese-LLaMA-Alpaca-main/cn_llama_alpaca/7B",
      "architectures": [
        "LlamaForCausalLM"
      ],
      "bos_token_id": 1,
      "eos_token_id": 2,
      "hidden_act": "silu",
      "hidden_size": 4096,
      "initializer_range": 0.02,
      "intermediate_size": 11008,
      "max_position_embeddings": 2048,
      "model_type": "llama",
      "num_attention_heads": 32,
      "num_hidden_layers": 32,
      "pad_token_id": 0,
      "rms_norm_eps": 1e-06,
      "tie_word_embeddings": false,
      "torch_dtype": "float16",
      "transformers_version": "4.28.1",
      "use_cache": true,
      "vocab_size": 49954
    }
    
    [INFO|tokenization_utils_base.py:1807] 2023-06-22 14:34:10,154 >> loading file tokenizer.model
    [INFO|tokenization_utils_base.py:1807] 2023-06-22 14:34:10,154 >> loading file added_tokens.json
    [INFO|tokenization_utils_base.py:1807] 2023-06-22 14:34:10,154 >> loading file special_tokens_map.json
    [INFO|tokenization_utils_base.py:1807] 2023-06-22 14:34:10,154 >> loading file tokenizer_config.json
    06/22/2023 14:34:10 - INFO - __main__ - training files: /data/lsy/Chinese-LLaMA-Alpaca-main/data/transportation_train/data_train.json
    06/22/2023 14:34:10 - WARNING - root - building dataset...
    06/22/2023 14:34:10 - INFO - __name__ - training datasets-/data/lsy/Chinese-LLaMA-Alpaca-main/data/transportation_train/data_train.json has been loaded from disk
    06/22/2023 14:34:12 - INFO - __main__ - Num train_samples  13437
    06/22/2023 14:34:12 - WARNING - root - building dataset...
    06/22/2023 14:34:12 - INFO - __main__ - training example:
    06/22/2023 14:34:12 - INFO - __name__ - training datasets-/data/lsy/Chinese-LLaMA-Alpaca-main/data/transportation_train/data_train.json has been loaded from disk
    06/22/2023 14:34:12 - INFO - __main__ - <s> Below is an instruction that describes a task. Write a response that appropriately completes the request.
    
    ### Instruction:
    请将以xx
    ### Response:  xxxx窗口</s>
    06/22/2023 14:34:12 - INFO - __main__ - training files: /data/lsy/Chinese-LLaMA-Alpaca-main/data/transportation_val/data_valid.json
    06/22/2023 14:34:12 - WARNING - root - building dataset...
    06/22/2023 14:34:12 - INFO - __name__ - training datasets-/data/lsy/Chinese-LLaMA-Alpaca-main/data/transportation_val/data_valid.json has been loaded from disk
    06/22/2023 14:34:12 - WARNING - root - building dataset...
    06/22/2023 14:34:12 - INFO - __main__ - Num eval_samples  9141
    06/22/2023 14:34:12 - INFO - __main__ - eval example:
    06/22/2023 14:34:12 - INFO - __name__ - training datasets-/data/lsy/Chinese-LLaMA-Alpaca-main/data/transportation_val/data_valid.json has been loaded from disk
    06/22/2023 14:34:12 - INFO - __main__ - <s> Below is an instruction that describes a task. Write a response that appropriately completes the request.
    
    ### Instruction:
    请将以下xxxx:
    情况描述:xxxx
    ### Response:  xxx</s>
    [INFO|modeling_utils.py:2531] 2023-06-22 14:34:12,377 >> loading weights file /data/lsy/Chinese-LLaMA-Alpaca-main/cn_llama_alpaca/7B/pytorch_model.bin.index.json
    [INFO|modeling_utils.py:1176] 2023-06-22 14:34:12,378 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
    [INFO|configuration_utils.py:575] 2023-06-22 14:34:12,379 >> Generate config GenerationConfig {
      "_from_model_config": true,
      "bos_token_id": 1,
      "eos_token_id": 2,
      "pad_token_id": 0,
      "transformers_version": "4.28.1"
    }
    
    Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.03s/it]
    Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00,  5.06s/it]
    [INFO|modeling_utils.py:3190] 2023-06-22 14:34:22,670 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
    
    [INFO|modeling_utils.py:3198] 2023-06-22 14:34:22,670 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /data/lsy/Chinese-LLaMA-Alpaca-main/cn_llama_alpaca/7B.
    If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
    [INFO|configuration_utils.py:535] 2023-06-22 14:34:22,673 >> loading configuration file /data/lsy/Chinese-LLaMA-Alpaca-main/cn_llama_alpaca/7B/generation_config.json
    [INFO|configuration_utils.py:575] 2023-06-22 14:34:22,673 >> Generate config GenerationConfig {
      "_from_model_config": true,
      "bos_token_id": 1,
      "eos_token_id": 2,
      "pad_token_id": 0,
      "transformers_version": "4.28.1"
    }
    
    06/22/2023 14:34:22 - INFO - __main__ - len(tokenizer):49954
    06/22/2023 14:34:22 - INFO - __main__ - Init new peft model
    06/22/2023 14:34:22 - INFO - __main__ - target_modules: ['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj']
    06/22/2023 14:34:22 - INFO - __main__ - lora_rank: 64
    trainable params: 159907840 || all params: 7045402624 || trainable%: 2.26967639088897
    06/22/2023 14:36:00 - INFO - __main__ - model.modules_to_save: None
    [INFO|trainer.py:564] 2023-06-22 14:36:00,761 >> max_steps is given, it will override any value given in num_train_epochs
    [INFO|trainer.py:621] 2023-06-22 14:36:00,761 >> Using cuda_amp half precision backend
    /root/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
      warnings.warn(
    [2023-06-22 14:36:00,785] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.4, git-hash=unknown, git-branch=unknown
    trainable params: 159907840 || all params: 7045402624 || trainable%: 2.26967639088897
    /root/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
      warnings.warn(
    06/22/2023 14:36:10 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 0
    06/22/2023 14:36:19 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 1
    06/22/2023 14:36:19 - INFO - torch.distributed.distributed_c10d - Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
    06/22/2023 14:36:19 - INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
    [2023-06-22 14:36:19,787] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
    [2023-06-22 14:36:19,788] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
    [2023-06-22 14:36:19,788] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
    [2023-06-22 14:36:19,832] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
    [2023-06-22 14:36:19,832] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'transformers.optimization.AdamW'>
    [2023-06-22 14:36:19,832] [WARNING] [engine.py:1116:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
    [2023-06-22 14:36:19,832] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
    [2023-06-22 14:36:19,833] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 100000000
    [2023-06-22 14:36:19,833] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 100000000
    [2023-06-22 14:36:19,833] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
    [2023-06-22 14:36:19,833] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
    Using /data0/root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
    Using /data0/root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
    Emitting ninja build file /data0/root/.cache/torch_extensions/py39_cu117/utils/build.ninja...
    Building extension module utils...
    Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
    ninja: no work to do.
    Loading extension module utils...
    Time to load utils op: 0.6561455726623535 seconds
    Loading extension module utils...
    Time to load utils op: 0.6084239482879639 seconds
    Rank: 1 partition count [2] and sizes[(79953920, False)]
    Rank: 0 partition count [2] and sizes[(79953920, False)]
    Using /data0/root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
    No modifications detected for re-loaded extension module utils, skipping build step...
    Loading extension module utils...
    Time to load utils op: 0.0009965896606445312 seconds
    [2023-06-22 14:36:22,319] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
    [2023-06-22 14:36:22,320] [INFO] [utils.py:786:see_memory_usage] MA 13.45 GB         Max_MA 13.6 GB         CA 13.66 GB         Max_CA 14 GB
    [2023-06-22 14:36:22,320] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 10.98 GB, percent = 17.6%
    [2023-06-22 14:36:22,423] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
    [2023-06-22 14:36:22,424] [INFO] [utils.py:786:see_memory_usage] MA 14.05 GB         Max_MA 14.65 GB         CA 14.85 GB         Max_CA 15 GB
    [2023-06-22 14:36:22,425] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 10.98 GB, percent = 17.6%
    [2023-06-22 14:36:22,425] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized
    [2023-06-22 14:36:22,523] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
    [2023-06-22 14:36:22,524] [INFO] [utils.py:786:see_memory_usage] MA 14.05 GB         Max_MA 14.05 GB         CA 14.85 GB         Max_CA 15 GB
    [2023-06-22 14:36:22,524] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 10.98 GB, percent = 17.6%
    [2023-06-22 14:36:22,531] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
    [2023-06-22 14:36:22,531] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
    [2023-06-22 14:36:22,531] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f4a1d8d5a60>
    [2023-06-22 14:36:22,531] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.999)]
    [2023-06-22 14:36:22,533] [INFO] [config.py:960:print] DeepSpeedEngine configuration:
    [2023-06-22 14:36:22,534] [INFO] [config.py:964:print]   activation_checkpointing_config  {
        "partition_activations": false,
        "contiguous_memory_optimization": false,
        "cpu_checkpointing": false,
        "number_checkpoints": null,
        "synchronize_checkpoint_boundary": false,
        "profile": false
    }
    [2023-06-22 14:36:22,534] [INFO] [config.py:964:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
    [2023-06-22 14:36:22,534] [INFO] [config.py:964:print]   amp_enabled .................. False
    [2023-06-22 14:36:22,534] [INFO] [config.py:964:print]   amp_params ................... False
    [2023-06-22 14:36:22,534] [INFO] [config.py:964:print]   autotuning_config ............ {
        "enabled": false,
        "start_step": null,
        "end_step": null,
        "metric_path": null,
        "arg_mappings": null,
        "metric": "throughput",
        "model_info": null,
        "results_dir": "autotuning_results",
        "exps_dir": "autotuning_exps",
        "overwrite": true,
        "fast": true,
        "start_profile_step": 3,
        "end_profile_step": 5,
        "tuner_type": "gridsearch",
        "tuner_early_stopping": 5,
        "tuner_num_trials": 50,
        "model_info_path": null,
        "mp_size": 1,
        "max_train_batch_size": null,
        "min_train_batch_size": 1,
        "max_train_micro_batch_size_per_gpu": 1.024000e+03,
        "min_train_micro_batch_size_per_gpu": 1,
        "num_tuning_micro_batch_sizes": 3
    }
    [2023-06-22 14:36:22,534] [INFO] [config.py:964:print]   bfloat16_enabled ............. False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   checkpoint_parallel_write_pipeline  False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   checkpoint_tag_validation_enabled  True
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   checkpoint_tag_validation_fail  False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f4a1d8d5b20>
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   communication_data_type ...... None
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   curriculum_enabled_legacy .... False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   curriculum_params_legacy ..... False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   data_efficiency_enabled ...... False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   dataloader_drop_last ......... False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   disable_allgather ............ False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   dump_state ................... False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1e-10}
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   eigenvalue_enabled ........... False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   eigenvalue_gas_boundary_resolution  1
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   eigenvalue_layer_name ........ bert.encoder.layer
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   eigenvalue_layer_num ......... 0
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   eigenvalue_max_iter .......... 100
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   eigenvalue_stability ......... 1e-06
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   eigenvalue_tol ............... 0.01
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   eigenvalue_verbose ........... False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   elasticity_enabled ........... False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   flops_profiler_config ........ {
        "enabled": false,
        "recompute_fwd_factor": 0.0,
        "profile_step": 1,
        "module_depth": -1,
        "top_modules": 1,
        "detailed": true,
        "output_file": null
    }
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   fp16_auto_cast ............... False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   fp16_enabled ................. True
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   fp16_master_weights_and_gradients  False
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   global_rank .................. 0
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   grad_accum_dtype ............. None
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   gradient_accumulation_steps .. 8
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   gradient_clipping ............ 1.0
    [2023-06-22 14:36:22,535] [INFO] [config.py:964:print]   gradient_predivide_factor .... 1.0
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   initial_dynamic_scale ........ 65536
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   load_universal_checkpoint .... False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   loss_scale ................... 0
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   memory_breakdown ............. False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   mics_hierarchial_params_gather  False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   mics_shard_size .............. -1
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   nebula_config ................ {
        "enabled": false,
        "persistent_storage_path": null,
        "persistent_time_interval": 100,
        "num_of_version_in_retention": 2,
        "enable_nebula_load": true,
        "load_path": null
    }
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   optimizer_legacy_fusion ...... False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   optimizer_name ............... None
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   optimizer_params ............. None
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   pld_enabled .................. False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   pld_params ................... False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   prescale_gradients ........... False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   scheduler_name ............... None
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   scheduler_params ............. None
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   sparse_attention ............. None
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   sparse_gradients_enabled ..... False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   steps_per_print .............. 2000
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   train_batch_size ............. 16
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   train_micro_batch_size_per_gpu  1
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   use_node_local_storage ....... False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   wall_clock_breakdown ......... False
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   world_size ................... 2
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   zero_allow_untested_optimizer  True
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=100000000 allgather_partitions=True allgather_bucket_size=100000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   zero_enabled ................. True
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   zero_force_ds_cpu_optimizer .. True
    [2023-06-22 14:36:22,536] [INFO] [config.py:964:print]   zero_optimization_stage ...... 2
    [2023-06-22 14:36:22,536] [INFO] [config.py:950:print_user_config]   json = {
        "fp16": {
            "enabled": true,
            "loss_scale": 0,
            "loss_scale_window": 100,
            "initial_scale_power": 16,
            "hysteresis": 2,
            "min_loss_scale": 1e-10
        },
        "zero_optimization": {
            "stage": 2,
            "allgather_partitions": true,
            "allgather_bucket_size": 1.000000e+08,
            "overlap_comm": true,
            "reduce_scatter": true,
            "reduce_bucket_size": 1.000000e+08,
            "contiguous_gradients": true
        },
        "gradient_accumulation_steps": 8,
        "gradient_clipping": 1.0,
        "steps_per_print": 2.000000e+03,
        "train_batch_size": 16,
        "train_micro_batch_size_per_gpu": 1,
        "wall_clock_breakdown": false,
        "zero_allow_untested_optimizer": true
    }
    Using /data0/root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
    No modifications detected for re-loaded extension module utils, skipping build step...
    Loading extension module utils...
    Time to load utils op: 0.0003883838653564453 seconds
    [INFO|trainer.py:1769] 2023-06-22 14:36:22,537 >> ***** Running training *****
    [INFO|trainer.py:1770] 2023-06-22 14:36:22,537 >>   Num examples = 13,437
    [INFO|trainer.py:1771] 2023-06-22 14:36:22,537 >>   Num Epochs = 2
    [INFO|trainer.py:1772] 2023-06-22 14:36:22,537 >>   Instantaneous batch size per device = 1
    [INFO|trainer.py:1773] 2023-06-22 14:36:22,537 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
    [INFO|trainer.py:1774] 2023-06-22 14:36:22,537 >>   Gradient Accumulation steps = 8
    [INFO|trainer.py:1775] 2023-06-22 14:36:22,537 >>   Total optimization steps = 1,485
    [INFO|trainer.py:1776] 2023-06-22 14:36:22,541 >>   Number of trainable parameters = 159,907,840
    {'loss': 4.959, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.0}
    {'loss': 3.6324, 'learning_rate': 6.666666666666666e-05, 'epoch': 0.01}
      1%|█▎                                                                                                          | 18/1485 [00:58<1:13:37,  3.01s/it][2023-06-22 14:37:23,729] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
    {'loss': 1.124, 'learning_rate': 0.00012666666666666666, 'epoch': 0.02}
    {'loss': 0.6802, 'learning_rate': 0.00019333333333333333, 'epoch': 0.04}
    {'loss': 0.5168, 'learning_rate': 0.00026, 'epoch': 0.05}
    {'loss': 0.4843, 'learning_rate': 0.00029999428845962564, 'epoch': 0.06}
    {'loss': 0.4373, 'learning_rate': 0.0002999300386255069, 'epoch': 0.07}
    {'loss': 0.3823, 'learning_rate': 0.000299794430213186, 'epoch': 0.08}
      5%|█████▌                                                                                                      | 76/1485 [03:56<1:12:13,  3.08s/it][2023-06-22 14:40:22,407] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
    {'loss': 0.3796, 'learning_rate': 0.00029961142370060274, 'epoch': 0.1}
    {'loss': 0.2669, 'learning_rate': 0.000299340439849399, 'epoch': 0.11}
    {'loss': 0.3113, 'learning_rate': 0.0002989983780370781, 'epoch': 0.12}
    {'loss': 0.3273, 'learning_rate': 0.00029858540106653656, 'epoch': 0.13}
    {'loss': 0.2959, 'learning_rate': 0.00029810170549244566, 'epoch': 0.14}
    {'loss': 0.251, 'learning_rate': 0.0002975475215277018, 'epoch': 0.15}
    {'loss': 0.2843, 'learning_rate': 0.0002969231129338577, 'epoch': 0.17}
    {'loss': 0.3103, 'learning_rate': 0.00029622877689558616, 'epoch': 0.18}
    {'loss': 0.3339, 'learning_rate': 0.00029546484387923624, 'epoch': 0.19}
    {'loss': 0.2398, 'learning_rate': 0.00029463167747554935, 'epoch': 0.2}
    {'loss': 0.271, 'learning_rate': 0.00029372967422660944, 'epoch': 0.21}
    {'loss': 0.2381, 'learning_rate': 0.00029275926343711074, 'epoch': 0.23}
    {'loss': 0.2446, 'learning_rate': 0.0002917209069700317, 'epoch': 0.24}
    {'loss': 0.2385, 'learning_rate': 0.0002906150990268135, 'epoch': 0.25}
    {'loss': 0.1768, 'learning_rate': 0.00028944236591214666, 'epoch': 0.26}
    {'loss': 0.2687, 'learning_rate': 0.00028820326578347884, 'epoch': 0.27}
    {'loss': 0.2425, 'learning_rate': 0.00028689838838536185, 'epoch': 0.29}
     16%|█████████████████▌                                                                                         | 243/1485 [11:59<1:05:17,  3.15s/it][2023-06-22 14:48:25,063] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
     16%|█████████████████▌                                                                                         | 244/1485 [12:02<1:05:06,  3.15s/it][2023-06-22 14:48:28,214] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
    {'loss': 0.2365, 'learning_rate': 0.00028580754313469326, 'epoch': 0.3}
     17%|██████████████████                                                                                         | 250/1485 [12:21<1:04:53,  3.15s/it][INFO|trainer.py:3129] 2023-06-22 14:48:44,028 >> ***** Running Evaluation *****
    [INFO|trainer.py:3131] 2023-06-22 14:48:44,028 >>   Num examples = 9141
    [INFO|trainer.py:3134] 2023-06-22 14:48:44,028 >>   Batch size = 1
    {'eval_loss': 0.2139892578125, 'eval_runtime': 449.5554, 'eval_samples_per_second': 20.333, 'eval_steps_per_second': 10.168, 'epoch': 0.3}
    {'loss': 0.2332, 'learning_rate': 0.00028438585254959896, 'epoch': 0.31}
    {'loss': 0.2509, 'learning_rate': 0.00028290020157700443, 'epoch': 0.32}
    {'loss': 0.2162, 'learning_rate': 0.00028135129730630994, 'epoch': 0.33}
    {'loss': 0.2229, 'learning_rate': 0.00027973987693206005, 'epoch': 0.35}
    {'loss': 0.2175, 'learning_rate': 0.0002780667074030786, 'epoch': 0.36}
    {'loss': 0.1922, 'learning_rate': 0.00027633258505744293, 'epoch': 0.37}
    {'loss': 0.2571, 'learning_rate': 0.00027453833524346976, 'epoch': 0.38}
    {'loss': 0.2227, 'learning_rate': 0.0002726848119268943, 'epoch': 0.39}
     23%|████████████████████████▊                                                                                    | 338/1485 [24:20<58:52,  3.08s/it][2023-06-22 15:00:45,289] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
    {'loss': 0.2608, 'learning_rate': 0.0002709666906401224, 'epoch': 0.4}
    {'loss': 0.1999, 'learning_rate': 0.00026900300104368524, 'epoch': 0.42}
    {'loss': 0.2033, 'learning_rate': 0.0002669826724639322, 'epoch': 0.43}
    {'loss': 0.2301, 'learning_rate': 0.00026490666646784665, 'epoch': 0.44}
    {'loss': 0.2162, 'learning_rate': 0.0002627759711218466, 'epoch': 0.45}
    {'loss': 0.1683, 'learning_rate': 0.0002605916005215186, 'epoch': 0.46}
    {'loss': 0.2407, 'learning_rate': 0.0002583545943089633, 'epoch': 0.48}
    {'loss': 0.1855, 'learning_rate': 0.00025606601717798207, 'epoch': 0.49}
    {'loss': 0.1697, 'learning_rate': 0.0002537269583673404, 'epoch': 0.5}
    {'loss': 0.1852, 'learning_rate': 0.00025133853114234905, 'epoch': 0.51}
    {'loss': 0.1903, 'learning_rate': 0.0002489018722650103, 'epoch': 0.52}
    {'loss': 0.1633, 'learning_rate': 0.0002464181414529809, 'epoch': 0.54}
    {'loss': 0.1809, 'learning_rate': 0.00024388852082760884, 'epoch': 0.55}
    {'loss': 0.1877, 'learning_rate': 0.00024131421435130807, 'epoch': 0.56}
    {'loss': 0.174, 'learning_rate': 0.00023869644725453735, 'epoch': 0.57}
    {'loss': 0.1743, 'learning_rate': 0.00023603646545265687, 'epoch': 0.58}
     33%|████████████████████████████████████▍                                                                        | 497/1485 [31:43<51:50,  3.15s/it][2023-06-22 15:08:09,138] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, but hysteresis is 2. Reducing hysteresis to 1
    {'loss': 0.2046, 'learning_rate': 0.00023360743420764165, 'epoch': 0.6}
     34%|████████████████████████████████████▋                                                                        | 500/1485 [31:52<51:43,  3.15s/it][INFO|trainer.py:3129] 2023-06-22 15:08:15,475 >> ***** Running Evaluation *****
    [INFO|trainer.py:3131] 2023-06-22 15:08:15,475 >>   Num examples = 9141
    [INFO|trainer.py:3134] 2023-06-22 15:08:15,475 >>   Batch size = 1
    {'eval_loss': 0.198486328125, 'eval_runtime': 448.6031, 'eval_samples_per_second': 20.377, 'eval_steps_per_second': 10.189, 'epoch': 0.6}
     34%|████████████████████████████████████▋                                                                        | 500/1485 [39:21<51:43,  3.15s/it[INFO|trainer.py:2868] 2023-06-22 15:15:44,081 >> Saving model checkpoint to /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500
    [INFO|trainer.py:2880] 2023-06-22 15:15:44,087 >> Trainer.model is not a `PreTrainedModel`, only saving its state dict.
    [INFO|tokenization_utils_base.py:2171] 2023-06-22 15:15:44,655 >> tokenizer config file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/tokenizer_config.json
    [INFO|tokenization_utils_base.py:2178] 2023-06-22 15:15:44,655 >> Special tokens file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/special_tokens_map.json
    [2023-06-22 15:15:44,657] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step500 is about to be saved!
    [2023-06-22 15:15:56,311] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/global_step500/mp_rank_00_model_states.pt
    [2023-06-22 15:15:56,311] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/global_step500/mp_rank_00_model_states.pt...
    [2023-06-22 15:16:40,312] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/global_step500/mp_rank_00_model_states.pt.
    [2023-06-22 15:16:41,358] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/global_step500/zero_pp_rank_0_mp_rank_00_optim_states.pt...
    [2023-06-22 15:16:43,204] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/global_step500/zero_pp_rank_0_mp_rank_00_optim_states.pt.
    [2023-06-22 15:16:43,205] [INFO] [engine.py:3245:_save_zero_checkpoint] zero checkpoint saved /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/global_step500/zero_pp_rank_0_mp_rank_00_optim_states.pt
    [2023-06-22 15:16:43,206] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step500 is ready now!
    [INFO|tokenization_utils_base.py:2171] 2023-06-22 15:16:44,067 >> tokenizer config file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/sft_lora_model/tokenizer_config.json
    [INFO|tokenization_utils_base.py:2178] 2023-06-22 15:16:44,067 >> Special tokens file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-500/sft_lora_model/special_tokens_map.json
    {'loss': 0.1721, 'learning_rate': 0.00023087074843665005, 'epoch': 0.61}
    {'loss': 0.155, 'learning_rate': 0.00022809557256934758, 'epoch': 0.62}
    {'loss': 0.1894, 'learning_rate': 0.00022528322743914416, 'epoch': 0.63}
    {'loss': 0.1734, 'learning_rate': 0.00022243505157000556, 'epoch': 0.64}
    {'loss': 0.1464, 'learning_rate': 0.00021955240053938893, 'epoch': 0.65}
    {'loss': 0.1538, 'learning_rate': 0.0002166366463330609, 'epoch': 0.67}
    {'loss': 0.1555, 'learning_rate': 0.00021368917669210697, 'epoch': 0.68}
    {'loss': 0.1658, 'learning_rate': 0.0002107113944524418, 'epoch': 0.69}
    {'loss': 0.1222, 'learning_rate': 0.00020770471687713541, 'epoch': 0.7}
    {'loss': 0.1707, 'learning_rate': 0.00020467057498187248, 'epoch': 0.71}
    {'loss': 0.1595, 'learning_rate': 0.00020161041285386648, 'epoch': 0.73}
    {'loss': 0.1984, 'learning_rate': 0.00019852568696455275, 'epoch': 0.74}
    {'loss': 0.1362, 'learning_rate': 0.00019541786547638617, 'epoch': 0.75}
    {'loss': 0.1544, 'learning_rate': 0.00019228842754407622, 'epoch': 0.76}
    {'loss': 0.154, 'learning_rate': 0.00018913886261058882, 'epoch': 0.77}
    {'loss': 0.1163, 'learning_rate': 0.00018597066969825248, 'epoch': 0.79}
    {'loss': 0.1342, 'learning_rate': 0.00018278535669530467, 'epoch': 0.8}
    {'loss': 0.1527, 'learning_rate': 0.00017958443963821892, 'epoch': 0.81}
    {'loss': 0.158, 'learning_rate': 0.0001763694419901532, 'epoch': 0.82}
    {'loss': 0.1266, 'learning_rate': 0.0001731418939158643, 'epoch': 0.83}
    {'loss': 0.1459, 'learning_rate': 0.0001699033315534323, 'epoch': 0.85}
    {'loss': 0.1146, 'learning_rate': 0.00016665529628314164, 'epoch': 0.86}
    {'loss': 0.1437, 'learning_rate': 0.00016339933399386804, 'epoch': 0.87}
    {'loss': 0.1516, 'learning_rate': 0.00016013699434731885, 'epoch': 0.88}
    {'loss': 0.1283, 'learning_rate': 0.0001568698300404781, 'epoch': 0.89}
     51%|███████████████████████████████████████████████████████                                                      | 750/1485 [52:04<33:47,  2.76s/it][INFO|trainer.py:3129] 2023-06-22 15:28:27,250 >> ***** Running Evaluation *****
    [INFO|trainer.py:3131] 2023-06-22 15:28:27,251 >>   Num examples = 9141
    [INFO|trainer.py:3134] 2023-06-22 15:28:27,251 >>   Batch size = 1
    {'eval_loss': 0.1422119140625, 'eval_runtime': 448.5858, 'eval_samples_per_second': 20.377, 'eval_steps_per_second': 10.19, 'epoch': 0.89}
    {'loss': 0.1241, 'learning_rate': 0.00015359939606660682, 'epoch': 0.9}
    {'loss': 0.1518, 'learning_rate': 0.00015032724897515054, 'epoch': 0.92}
    {'loss': 0.1263, 'learning_rate': 0.00014705494613090575, 'epoch': 0.93}
    {'loss': 0.1368, 'learning_rate': 0.00014378404497279913, 'epoch': 0.94}
    {'loss': 0.1214, 'learning_rate': 0.0001405161022726304, 'epoch': 0.95}
    {'loss': 0.1199, 'learning_rate': 0.00013725267339413376, 'epoch': 0.96}
     55%|██████████████████████████████████████████████████████████▌                                                | 813/1485 [1:02:37<33:46,  3.01s/it][2023-06-22 15:39:03,043] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, but hysteresis is 2. Reducing hysteresis to 1
    {'loss': 0.14, 'learning_rate': 0.000134320730509852, 'epoch': 0.98}
     55%|███████████████████████████████████████████████████████████                                                | 820/1485 [1:02:58<33:16,  3.00s/it][2023-06-22 15:39:24,070] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144, reducing to 131072
    {'loss': 0.1507, 'learning_rate': 0.00013139483228242723, 'epoch': 0.99}
    {'loss': 0.1128, 'learning_rate': 0.00012815229552227885, 'epoch': 1.0}
    {'loss': 0.1113, 'learning_rate': 0.0001249201570861941, 'epoch': 1.01}
    {'loss': 0.0809, 'learning_rate': 0.00012169995529701673, 'epoch': 1.02}
    {'loss': 0.0847, 'learning_rate': 0.00011849322279639263, 'epoch': 1.04}
    {'loss': 0.0748, 'learning_rate': 0.00011530148581531613, 'epoch': 1.05}
    {'loss': 0.0846, 'learning_rate': 0.00011212626344772628, 'epoch': 1.06}
    {'loss': 0.0788, 'learning_rate': 0.00010896906692750068, 'epoch': 1.07}
    {'loss': 0.0921, 'learning_rate': 0.00010583139890918971, 'epoch': 1.08}
    {'loss': 0.0787, 'learning_rate': 0.00010271475275283457, 'epoch': 1.1}
    {'loss': 0.0935, 'learning_rate': 9.962061181320867e-05, 'epoch': 1.11}
    {'loss': 0.0695, 'learning_rate': 9.655044873382111e-05, 'epoch': 1.12}
    {'loss': 0.0639, 'learning_rate': 9.350572474601839e-05, 'epoch': 1.13}
    {'loss': 0.0997, 'learning_rate': 9.048788897351683e-05, 'epoch': 1.14}
    {'loss': 0.0606, 'learning_rate': 8.749837774269878e-05, 'epoch': 1.15}
    {'loss': 0.09, 'learning_rate': 8.453861389899865e-05, 'epoch': 1.17}
    {'loss': 0.0798, 'learning_rate': 8.161000612970594e-05, 'epoch': 1.18}
    {'loss': 0.0798, 'learning_rate': 7.871394829350579e-05, 'epoch': 1.19}
     67%|███████████████████████████████████████████████████████████████████████▍                                  | 1000/1485 [1:12:19<25:32,  3.16s/it][INFO|trainer.py:3129] 2023-06-22 15:48:41,638 >> ***** Running Evaluation *****
    [INFO|trainer.py:3131] 2023-06-22 15:48:41,638 >>   Num examples = 9141
    [INFO|trainer.py:3134] 2023-06-22 15:48:41,638 >>   Batch size = 1
    {'eval_loss': 0.1202392578125, 'eval_runtime': 451.0076, 'eval_samples_per_second': 20.268, 'eval_steps_per_second': 10.135, 'epoch': 1.19}
     67%|███████████████████████████████████████████████████████████████████████▍                                  | 1000/1485 [1:19:50<25:32,  3.16s/it[INFO|trainer.py:2868] 2023-06-22 15:56:12,650 >> Saving model checkpoint to /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000
    [INFO|trainer.py:2880] 2023-06-22 15:56:12,656 >> Trainer.model is not a `PreTrainedModel`, only saving its state dict.
    [INFO|tokenization_utils_base.py:2171] 2023-06-22 15:56:13,216 >> tokenizer config file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/tokenizer_config.json
    [INFO|tokenization_utils_base.py:2178] 2023-06-22 15:56:13,216 >> Special tokens file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/special_tokens_map.json
    [2023-06-22 15:56:13,219] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is about to be saved!
    [2023-06-22 15:56:25,197] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/global_step1000/mp_rank_00_model_states.pt
    [2023-06-22 15:56:25,197] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/global_step1000/mp_rank_00_model_states.pt...
    [2023-06-22 15:57:09,266] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/global_step1000/mp_rank_00_model_states.pt.
    [2023-06-22 15:57:10,359] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
    [2023-06-22 15:57:12,169] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
    [2023-06-22 15:57:12,169] [INFO] [engine.py:3245:_save_zero_checkpoint] zero checkpoint saved /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt
    [2023-06-22 15:57:12,169] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1000 is ready now!
    [INFO|tokenization_utils_base.py:2171] 2023-06-22 15:57:13,312 >> tokenizer config file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/sft_lora_model/tokenizer_config.json
    [INFO|tokenization_utils_base.py:2178] 2023-06-22 15:57:13,312 >> Special tokens file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/checkpoint-1000/sft_lora_model/special_tokens_map.json
    {'loss': 0.0793, 'learning_rate': 7.585181875707835e-05, 'epoch': 1.2}
    {'loss': 0.0753, 'learning_rate': 7.30249797390705e-05, 'epoch': 1.21}
     69%|████████████████████████████████████████████████████████████████████████▉                                 | 1021/1485 [1:21:59<25:29,  3.30s/it][2023-06-22 15:58:25,164] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, but hysteresis is 2. Reducing hysteresis to 1
     69%|████████████████████████████████████████████████████████████████████████▉                                 | 1022/1485 [1:22:02<24:59,  3.24s/it][2023-06-22 15:58:28,319] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, reducing to 262144
    {'loss': 0.0801, 'learning_rate': 7.07898224024448e-05, 'epoch': 1.23}
    {'loss': 0.0745, 'learning_rate': 6.80298850855435e-05, 'epoch': 1.24}
    {'loss': 0.0838, 'learning_rate': 6.530896110385494e-05, 'epoch': 1.25}
    {'loss': 0.0759, 'learning_rate': 6.262834546982969e-05, 'epoch': 1.26}
    {'loss': 0.067, 'learning_rate': 5.998931401132786e-05, 'epoch': 1.27}
    {'loss': 0.0512, 'learning_rate': 5.739312276439427e-05, 'epoch': 1.29}
    {'loss': 0.0664, 'learning_rate': 5.4841007375453186e-05, 'epoch': 1.3}
    {'loss': 0.0646, 'learning_rate': 5.233418251320765e-05, 'epoch': 1.31}
    {'loss': 0.0501, 'learning_rate': 4.987384129052291e-05, 'epoch': 1.32}
    {'loss': 0.0722, 'learning_rate': 4.7461154696569294e-05, 'epoch': 1.33}
     76%|████████████████████████████████████████████████████████████████████████████████▏                         | 1123/1485 [1:27:22<19:08,  3.17s/it][2023-06-22 16:03:48,418] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, but hysteresis is 2. Reducing hysteresis to 1
     76%|████████████████████████████████████████████████████████████████████████████████▏                         | 1124/1485 [1:27:25<18:58,  3.15s/it][2023-06-22 16:03:51,528] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, reducing to 262144
    {'loss': 0.0557, 'learning_rate': 4.556608919326302e-05, 'epoch': 1.35}
    {'loss': 0.0701, 'learning_rate': 4.3242059076724304e-05, 'epoch': 1.36}
    {'loss': 0.0741, 'learning_rate': 4.096883995807439e-05, 'epoch': 1.37}
    {'loss': 0.0577, 'learning_rate': 3.874751376649664e-05, 'epoch': 1.38}
    {'loss': 0.0714, 'learning_rate': 3.657913773295159e-05, 'epoch': 1.39}
    {'loss': 0.0738, 'learning_rate': 3.44647438869924e-05, 'epoch': 1.4}
    {'loss': 0.0591, 'learning_rate': 3.240533856557452e-05, 'epoch': 1.42}
    {'loss': 0.0714, 'learning_rate': 3.040190193409395e-05, 'epoch': 1.43}
    {'loss': 0.0714, 'learning_rate': 2.845538751988146e-05, 'epoch': 1.44}
    {'loss': 0.0824, 'learning_rate': 2.6566721758375474e-05, 'epoch': 1.45}
     82%|███████████████████████████████████████████████████████████████████████████████████████▍                  | 1225/1485 [1:32:21<11:53,  2.74s/it][2023-06-22 16:08:46,546] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, but hysteresis is 2. Reducing hysteresis to 1
     83%|███████████████████████████████████████████████████████████████████████████████████████▌                  | 1226/1485 [1:32:23<11:47,  2.73s/it][2023-06-22 16:08:49,302] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, reducing to 262144
    {'loss': 0.0778, 'learning_rate': 2.509804517777061e-05, 'epoch': 1.46}
    {'loss': 0.0738, 'learning_rate': 2.331575345019648e-05, 'epoch': 1.48}
    {'loss': 0.0825, 'learning_rate': 2.1593756562784292e-05, 'epoch': 1.49}
     84%|█████████████████████████████████████████████████████████████████████████████████████████▏                | 1250/1485 [1:33:29<10:47,  2.75s/it][INFO|trainer.py:3129] 2023-06-22 16:09:52,499 >> ***** Running Evaluation *****
    [INFO|trainer.py:3131] 2023-06-22 16:09:52,499 >>   Num examples = 9141
    [INFO|trainer.py:3134] 2023-06-22 16:09:52,499 >>   Batch size = 1
    {'eval_loss': 0.11163330078125, 'eval_runtime': 449.4219, 'eval_samples_per_second': 20.339, 'eval_steps_per_second': 10.171, 'epoch': 1.49}
    {'loss': 0.0655, 'learning_rate': 1.9932874092789412e-05, 'epoch': 1.5}
    {'loss': 0.069, 'learning_rate': 1.83338965303145e-05, 'epoch': 1.51}
    {'loss': 0.06, 'learning_rate': 1.6797584902078914e-05, 'epoch': 1.52}
    {'loss': 0.0583, 'learning_rate': 1.5324670409211348e-05, 'epoch': 1.54}
    {'loss': 0.0647, 'learning_rate': 1.3915854079237616e-05, 'epoch': 1.55}
    {'loss': 0.0581, 'learning_rate': 1.257180643242962e-05, 'epoch': 1.56}
    {'loss': 0.0638, 'learning_rate': 1.1293167162673794e-05, 'epoch': 1.57}
     89%|██████████████████████████████████████████████████████████████████████████████████████████████▋           | 1327/1485 [1:44:38<08:23,  3.19s/it][2023-06-22 16:21:03,660] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, but hysteresis is 2. Reducing hysteresis to 1
     89%|██████████████████████████████████████████████████████████████████████████████████████████████▊           | 1328/1485 [1:44:41<08:14,  3.15s/it][2023-06-22 16:21:06,698] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, reducing to 262144
    {'loss': 0.0644, 'learning_rate': 1.0317759786179464e-05, 'epoch': 1.58}
    {'loss': 0.0665, 'learning_rate': 9.158368068765109e-06, 'epoch': 1.6}
    {'loss': 0.0478, 'learning_rate': 8.066009340192802e-06, 'epoch': 1.61}
    {'loss': 0.0606, 'learning_rate': 7.041203504055454e-06, 'epoch': 1.62}
    {'loss': 0.0769, 'learning_rate': 6.084438312427742e-06, 'epoch': 1.63}
    {'loss': 0.0725, 'learning_rate': 5.1961691337226776e-06, 'epoch': 1.64}
    {'loss': 0.0599, 'learning_rate': 4.376818735960785e-06, 'epoch': 1.66}
    {'loss': 0.0681, 'learning_rate': 3.6267770855555765e-06, 'epoch': 1.67}
    {'loss': 0.0574, 'learning_rate': 2.9464011617104986e-06, 'epoch': 1.68}
    {'loss': 0.0487, 'learning_rate': 2.3360147865162605e-06, 'epoch': 1.69}
     96%|██████████████████████████████████████████████████████████████████████████████████████████████████████    | 1429/1485 [1:49:59<02:58,  3.18s/it][2023-06-22 16:26:24,839] [INFO] [loss_scaler.py:190:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, but hysteresis is 2. Reducing hysteresis to 1
    {'loss': 0.0614, 'learning_rate': 1.8467489107293509e-06, 'epoch': 1.7}
     96%|██████████████████████████████████████████████████████████████████████████████████████████████████████    | 1430/1485 [1:50:02<02:53,  3.15s/it][2023-06-22 16:26:27,973] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288, reducing to 262144
    {'loss': 0.0617, 'learning_rate': 1.4145989334634122e-06, 'epoch': 1.71}
    {'loss': 0.0468, 'learning_rate': 1.001621962921878e-06, 'epoch': 1.73}
    {'loss': 0.0599, 'learning_rate': 6.59560150600924e-07, 'epoch': 1.74}
    {'loss': 0.0611, 'learning_rate': 3.885762993972141e-07, 'epoch': 1.75}
    {'loss': 0.0422, 'learning_rate': 1.8879938294748542e-07, 'epoch': 1.76}
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 1485/1485 [1:52:45<00:00,  2.78s/it][INFO|trainer.py:2039] 2023-06-22 16:29:08,246 >>
    
    Training completed. Do not forget to share your model on huggingface.co/models =)
    
    {'train_runtime': 6765.7055, 'train_samples_per_second': 3.512, 'train_steps_per_second': 0.219, 'train_loss': 0.18127430334235684, 'epoch': 1.77}
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 1485/1485 [1:52:45<00:00,  4.56s/it]
    [INFO|tokenization_utils_base.py:2171] 2023-06-22 16:29:09,001 >> tokenizer config file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/sft_lora_model/tokenizer_config.json
    [INFO|tokenization_utils_base.py:2178] 2023-06-22 16:29:09,001 >> Special tokens file saved in /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/sft_lora_model/special_tokens_map.json
    ***** train metrics *****
      epoch                    =       1.77
      train_loss               =     0.1813
      train_runtime            = 1:52:45.70
      train_samples            =      13437
      train_samples_per_second =      3.512
      train_steps_per_second   =      0.219
    06/22/2023 16:29:09 - INFO - __main__ - *** Evaluate ***
    [INFO|trainer.py:3129] 2023-06-22 16:29:09,008 >> ***** Running Evaluation *****
    [INFO|trainer.py:3131] 2023-06-22 16:29:09,008 >>   Num examples = 9141
    [INFO|trainer.py:3134] 2023-06-22 16:29:09,008 >>   Batch size = 1
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4571/4571 [07:28<00:00, 10.19it/s]
    ***** eval metrics *****
      epoch                   =       1.77
      eval_loss               =     0.1071
      eval_runtime            = 0:07:28.54
      eval_samples            =       9141
      eval_samples_per_second =     20.379
      eval_steps_per_second   =     10.191
      perplexity              =      1.113
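The headline numbers in this log can be cross-checked with a little arithmetic. The sketch below assumes each GPU receives `ceil(N / world_size)` samples per epoch and that the Trainer floors the division by the accumulation steps; the variable names are illustrative and not taken from the training script.

```python
import math

# Values copied from the log above.
num_gpus = 2             # world_size
micro_batch = 1          # train_micro_batch_size_per_gpu
grad_accum = 8           # gradient_accumulation_steps
train_examples = 13437   # "Num examples" for training
eval_examples = 9141     # "Num examples" for evaluation
total_steps = 1485       # "Total optimization steps"
eval_loss = 0.1071       # final eval_loss

# Effective batch size per optimizer step (the log reports 16).
effective_batch = num_gpus * micro_batch * grad_accum

# Assumption: samples are split evenly across GPUs, and the number of
# optimizer steps per epoch is floored by the accumulation steps.
batches_per_gpu = math.ceil(train_examples / num_gpus)
steps_per_epoch = batches_per_gpu // grad_accum

# 1485 steps therefore cover a bit less than two full epochs.
epochs_covered = total_steps / steps_per_epoch

# With batch size 1 per GPU, evaluation needs ceil(9141 / 2) steps.
eval_steps = math.ceil(eval_examples / num_gpus)

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)

print(effective_batch, steps_per_epoch, round(epochs_covered, 2),
      eval_steps, round(perplexity, 3))
```

Under these assumptions the results line up with the reported `epoch = 1.77`, the `4571/4571` evaluation progress bar, and `perplexity = 1.113`.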
    

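The recurring `OVERFLOW! ... Skipping step` lines come from fp16 dynamic loss scaling and are expected during mixed-precision training. The class below is a simplified sketch of that idea, not DeepSpeed's actual implementation: `SimpleDynamicLossScaler` is a hypothetical name, and only the default values (initial scale 2^16, window 100, hysteresis 2, minimum scale 1e-10) are taken from the `fp16` section of the config printed in the log.

```python
class SimpleDynamicLossScaler:
    """Simplified dynamic loss scaler with hysteresis (illustrative only)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0,
                 scale_window=100, hysteresis=2, min_scale=1e-10):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.hysteresis = hysteresis
        self.cur_hysteresis = hysteresis
        self.min_scale = min_scale
        self.clean_steps = 0  # steps since the last overflow

    def update(self, overflow: bool) -> None:
        if overflow:
            # The optimizer step is skipped whenever gradients overflow.
            self.clean_steps = 0
            if self.cur_hysteresis > 1:
                # Tolerate this overflow; only use up hysteresis budget.
                self.cur_hysteresis -= 1
            else:
                # Budget exhausted: shrink the loss scale.
                self.scale = max(self.scale / self.scale_factor, self.min_scale)
            return
        self.clean_steps += 1
        if self.clean_steps % self.scale_window == 0:
            # A full window without overflow: grow the scale again
            # and restore the hysteresis budget.
            self.scale *= self.scale_factor
            self.cur_hysteresis = self.hysteresis


scaler = SimpleDynamicLossScaler()
scaler.update(overflow=True)     # first overflow: absorbed by hysteresis
scale_after_first = scaler.scale
scaler.update(overflow=True)     # second overflow: scale is halved
scale_after_second = scaler.scale
for _ in range(100):             # 100 clean steps: scale doubles again
    scaler.update(overflow=False)
print(scale_after_first, scale_after_second, scaler.scale)
```

This mirrors the pattern in the log: the first overflow at scale 65536 only reduces the hysteresis, the next one drops the scale to 32768, and after enough clean steps the scale climbs back.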
    Take a look at the files produced by training:

    (py39) [root@localhost training]# ll /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model
    total 32
    -rw-r--r-- 1 root root   398 Jun 22 16:36 all_results.json
    drwxr-xr-x 4 root root   293 Jun 22 15:57 checkpoint-1000
    drwxr-xr-x 4 root root   292 Jun 22 15:16 checkpoint-500
    -rw-r--r-- 1 root root   223 Jun 22 16:36 eval_results.json
    drwxr-xr-x 3 root root    50 Jun 22 14:36 runs
    drwxr-xr-x 2 root root   141 Jun 22 16:29 sft_lora_model
    -rw-r--r-- 1 root root 19561 Jun 22 16:29 trainer_state.json
    -rw-r--r-- 1 root root   196 Jun 22 16:29 train_results.json
    (py39) [root@localhost training]# ll /data/lsy/Chinese-LLaMA-Alpaca-main/instruct_model/sft_lora_model/
    total 313148
    -rw-r--r-- 1 root root       476 Jun 22 16:29 adapter_config.json
    -rw-r--r-- 1 root root 319886269 Jun 22 16:29 adapter_model.bin
    -rw-r--r-- 1 root root        96 Jun 22 16:29 special_tokens_map.json
    -rw-r--r-- 1 root root       747 Jun 22 16:29 tokenizer_config.json
    -rw-r--r-- 1 root root    757972 Jun 22 16:29 tokenizer.model
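The size of `adapter_model.bin` is consistent with the `Number of trainable parameters = 159,907,840` line in the training log. The sketch below rests on three assumptions: the standard LLaMA-7B dimensions, fp16 storage of the adapter weights, and a LoRA rank of 64. The rank is not stated in this log; it is simply the value that makes the arithmetic match. The seven target modules are the ones the merge script lists per layer.

```python
# Standard LLaMA-7B dimensions.
hidden, intermediate, layers = 4096, 11008, 32
rank = 64  # assumed LoRA rank; inferred from the arithmetic, not from the script

# Each LoRA pair (A: rank x in, B: out x rank) adds rank * (in + out) parameters.
attn = 4 * rank * (hidden + hidden)        # q_proj, k_proj, v_proj, o_proj
mlp = 3 * rank * (hidden + intermediate)   # gate_proj, down_proj, up_proj
trainable = layers * (attn + mlp)

# Assumption: the adapter weights are saved in fp16 (2 bytes per parameter).
fp16_bytes = trainable * 2
adapter_file_bytes = 319_886_269           # from the directory listing
overhead = adapter_file_bytes - fp16_bytes # serialization metadata

print(trainable, fp16_bytes, overhead)
```

The small remainder is plausibly serialization overhead, so the saved adapter appears to contain only the LoRA matrices, which fits with `--modules_to_save` having been removed from the training script.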
    

    Merge the model

    The merge takes about 4 minutes.

    (py39) [root@localhost Chinese-LLaMA-Alpaca-main]# python scripts/merge_llama_with_chinese_lora.py --base_model cn_llama_alpaca/7B --lora_model instruct_model/sft_lora_model --output_type huggingface --output_dir cn_llama_alpaca/my_7B
    [2023-06-22 16:39:50,434] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
    Base model: cn_llama_alpaca/7B
    LoRA model(s) ['instruct_model/sft_lora_model']:
    Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00,  7.21s/it]
    Peft version: 0.3.0.dev0
    Loading LoRA for 7B model
    Loading LoRA instruct_model/sft_lora_model...
    base_model vocab size: 49954
    tokenizer vocab size: 49954
    Loading LoRA weights
    merging base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.0.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.0.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.0.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.0.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.0.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.0.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.1.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.1.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.1.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.1.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.1.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.1.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.1.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.2.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.2.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.2.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.2.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.2.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.2.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.2.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.3.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.3.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.3.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.3.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.3.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.3.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.3.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.4.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.4.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.4.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.4.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.4.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.4.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.4.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.5.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.5.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.5.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.5.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.5.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.5.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.5.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.6.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.6.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.6.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.6.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.6.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.6.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.6.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.7.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.7.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.7.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.7.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.7.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.7.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.7.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.8.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.8.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.8.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.8.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.8.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.8.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.8.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.9.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.9.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.9.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.9.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.9.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.9.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.9.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.10.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.10.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.10.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.10.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.10.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.10.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.10.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.11.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.11.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.11.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.11.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.11.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.11.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.11.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.12.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.12.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.12.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.12.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.12.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.12.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.12.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.13.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.13.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.13.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.13.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.13.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.13.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.13.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.14.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.14.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.14.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.14.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.14.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.14.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.14.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.15.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.15.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.15.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.15.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.15.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.15.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.15.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.16.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.16.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.16.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.16.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.16.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.16.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.16.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.17.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.17.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.17.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.17.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.17.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.17.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.17.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.18.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.18.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.18.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.18.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.18.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.18.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.18.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.19.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.19.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.19.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.19.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.19.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.19.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.19.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.20.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.20.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.20.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.20.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.20.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.20.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.20.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.21.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.21.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.21.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.21.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.21.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.21.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.21.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.22.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.22.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.22.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.22.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.22.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.22.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.22.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.23.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.23.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.23.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.23.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.23.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.23.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.23.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.24.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.24.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.24.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.24.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.24.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.24.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.24.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.25.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.25.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.25.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.25.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.25.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.25.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.25.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.26.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.26.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.26.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.26.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.26.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.26.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.26.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.27.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.27.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.27.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.27.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.27.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.27.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.27.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.28.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.28.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.28.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.28.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.28.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.28.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.28.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.29.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.29.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.29.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.29.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.29.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.29.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.29.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.30.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.30.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.30.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.30.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.30.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.30.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.30.mlp.up_proj.lora_A.weight
    merging base_model.model.model.layers.31.self_attn.q_proj.lora_A.weight
    merging base_model.model.model.layers.31.self_attn.k_proj.lora_A.weight
    merging base_model.model.model.layers.31.self_attn.v_proj.lora_A.weight
    merging base_model.model.model.layers.31.self_attn.o_proj.lora_A.weight
    merging base_model.model.model.layers.31.mlp.gate_proj.lora_A.weight
    merging base_model.model.model.layers.31.mlp.down_proj.lora_A.weight
    merging base_model.model.model.layers.31.mlp.up_proj.lora_A.weight
    Saving to Hugging Face format...
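
The log lists one `lora_A` entry for each of seven projection modules (q_proj, k_proj, v_proj, o_proj, gate_proj, down_proj, up_proj) in layers 0 through 31. Conceptually, merging a LoRA adapter folds the low-rank update into the base weight: W' = W + (alpha / r) * B @ A. Below is a minimal sketch of that arithmetic in pure Python on a toy matrix. The function names and the alpha and rank values are illustrative assumptions and are not taken from merge_llama_with_chinese_lora.py, which works on PyTorch tensors.

```python
# Minimal sketch of folding one LoRA adapter into a base weight matrix.
# All names and numbers here are illustrative, not taken from the merge script.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without modifying the inputs."""
    scaling = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape: out_features x in_features
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy example: a 2x3 base weight with a rank-1 adapter.
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]   # rank x in_features  (1 x 3)
lora_b = [[0.5], [-0.5]]     # out_features x rank (2 x 1)

merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

After the merge the adapter is no longer needed at inference time, which is why the output directory can be loaded as an ordinary Hugging Face model.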
    

    A look at the files in the output directory:

    -rw-r--r-- 1 root root        543 6月  22 16:40 config.json
    -rw-r--r-- 1 root root        132 6月  22 16:40 generation_config.json
    -rw-r--r-- 1 root root 9943340890 6月  22 16:40 pytorch_model-00001-of-00002.bin
    -rw-r--r-- 1 root root 3827767515 6月  22 16:41 pytorch_model-00002-of-00002.bin
    -rw-r--r-- 1 root root      26788 6月  22 16:41 pytorch_model.bin.index.json
    -rw-r--r-- 1 root root         96 6月  22 16:40 special_tokens_map.json
    -rw-r--r-- 1 root root        747 6月  22 16:40 tokenizer_config.json
    -rw-r--r-- 1 root root     757972 6月  22 16:40 tokenizer.model
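
A quick way to sanity-check the merged output is to compare the total shard size with what a 7B model in fp16 should occupy. The sketch below uses the two shard sizes from the listing and the vocab size of 49954 reported in the merge log. The hidden size, layer count, and intermediate size are the standard LLaMA-7B values and are assumptions here; they should be confirmed against config.json.

```python
# Sanity check: do the shard sizes match a 7B fp16 model with a 49,954-token vocabulary?
# The architecture numbers are assumed LLaMA-7B values; verify them in config.json.

shard_bytes = [9943340890, 3827767515]   # shard sizes from the directory listing
total_bytes = sum(shard_bytes)

hidden, layers, intermediate = 4096, 32, 11008   # assumed LLaMA-7B dimensions
vocab = 49954                                    # vocab size reported in the merge log

# Per decoder layer: q/k/v/o projections, gate/up/down projections, two RMSNorm weights.
per_layer = 4 * hidden * hidden + 3 * hidden * intermediate + 2 * hidden
# Whole model: all layers, the final norm, the input embedding and the output head.
params = layers * per_layer + hidden + 2 * vocab * hidden
expected_bytes = params * 2   # fp16 stores 2 bytes per parameter

print(f"total on disk : {total_bytes:,} bytes")
print(f"expected fp16 : {expected_bytes:,} bytes ({params:,} parameters)")
print(f"difference    : {total_bytes - expected_bytes:,} bytes")
```

Under these assumptions the two figures agree to within a fraction of a megabyte, which would be consistent with serialization overhead and suggests the merged weights were saved in half precision with the expanded vocabulary.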
    

    Testing

    python scripts/inference/inference_hf.py --base_model cn_llama_alpaca/my_7B --with_prompt --interactive

    For comparison, the same script can be run against the base model from before fine-tuning:

    python scripts/inference/inference_hf.py --base_model cn_llama_alpaca/7B --with_prompt --interactive
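
When comparing the two models, it helps to send both the same instructions. The `--with_prompt` flag wraps each input in an instruction template before generation. The sketch below assumes an Alpaca-style template without an input field; the exact wording used by inference_hf.py may differ, and the helper name `build_prompt` and the sample questions are hypothetical.

```python
# Sketch of an Alpaca-style instruction wrapper (assumed template, hypothetical helper).
# Check the template in scripts/inference/inference_hf.py before relying on this.

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n{instruction}\n\n### Response:\n\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a raw user instruction in the assumed Alpaca-style template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())

# A fixed list of domain questions to paste into both interactive sessions.
questions = ["What is LoRA?", "  How is a LoRA adapter merged into a base model?  "]
prompts = [build_prompt(q) for q in questions]
print(prompts[0])
```

Keeping the question list fixed makes it easier to see whether differences in the answers come from the fine-tuning rather than from the wording of the input.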

    References

    https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E6%8C%87%E4%BB%A4%E7%B2%B

