Loading a large model with Transformers and generating text with streaming output
import argparse

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Command-line flags reconstructed from the args.* attributes used below; the defaults are illustrative.
parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str, required=True)
parser.add_argument("--tokenizer", type=str, default=None)
parser.add_argument("--streaming", action="store_true")
parser.add_argument("--max-tokens", type=int, default=512)
parser.add_argument("--eos-token", type=str, default="<|endoftext|>")  # set to your model's EOS token
args = parser.parse_args()

model = AutoModelForCausalLM.from_pretrained(
    args.model, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    args.tokenizer or args.model, trust_remote_code=True
)

while True:
    prompt = input("Enter a prompt ('end' to quit): ")
    if prompt == 'end':
        break
    inputs = tokenizer(prompt, return_tensors="pt")
    # Stream tokens to stdout as they are generated, if requested.
    streamer = TextStreamer(tokenizer) if args.streaming else None
    outputs = model.generate(
        inputs.input_ids.to(model.device),
        max_new_tokens=args.max_tokens,
        streamer=streamer,
        eos_token_id=tokenizer.convert_tokens_to_ids(args.eos_token),
        do_sample=True,
        repetition_penalty=1.3,
        no_repeat_ngram_size=5,
        temperature=0.7,
        top_k=40,
        top_p=0.8,
    )
    if streamer is None:
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))
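Assuming the script is saved as text_generation.py and uses the argparse flags sketched above (the flag names are reconstructed from the args.* references in the snippet, not taken from an official demo), a typical run looks like:
python text_generation.py --model /path/to/your-model --streaming --max-tokens 256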
Loading a large model with Transformers and running a chat with streaming output
- This version keeps a simple multi-turn conversation history.
import os
import platform
import sys

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

os_name = platform.system()
# platform.system() returns 'Windows' (capitalized) on Windows.
clear_command = 'cls' if os_name == 'Windows' else 'clear'
model_path = r'/01-aiYi-34B-Chat-4bits'
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
# Since transformers 4.35.0, GPT-Q/AWQ models can be loaded with AutoModelForCausalLM.
# device_map="auto" already places the weights on the GPU, so no extra .cuda() call is needed.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto',
    trust_remote_code=True
).eval()
streamer = TextStreamer(tokenizer)
history = []
print("01.AI Yi-34B-Chat-4bits quantized model. Type a message to chat, 'clear' to reset the history, 'stop' to quit.\n")
while True:
    print("Enter your prompt, then press Ctrl+D (Ctrl+Z on Windows) to finish:")
    content = sys.stdin.readlines()
    content = ' '.join(content)
    content = content.strip()
    if content == "stop":
        break
    if content == "clear":
        history = []
        os.system(clear_command)
        print("01.AI Yi-34B-Chat-4bits quantized model. Type a message to chat, 'clear' to reset the history, 'stop' to quit.")
        continue
    content = content if content else 'hi'
    messages = {"role": "user", "content": content}
    history.append(messages)
    input_ids = tokenizer.apply_chat_template(conversation=history, tokenize=True,
                                              add_generation_prompt=True, return_tensors='pt')
    output_ids = model.generate(input_ids.to('cuda'), streamer=streamer, max_new_tokens=512,
                                do_sample=True,
                                eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
                                bos_token_id=tokenizer.convert_tokens_to_ids("<|im_start|>"),
                                repetition_penalty=1.3, no_repeat_ngram_size=5,
                                temperature=0.7, top_k=40, top_p=0.8)
    # Decode only the newly generated tokens, skipping the prompt.
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
    print('*' * 30)
    print('Response: {}'.format(response))
    history.append({"role": "assistant", "content": response})
- For better multi-turn context handling, you need to generate a standalone question.
You can refer to:
(1) How the condense LLM works in LangChain
In short, the condense LLM is given the new follow-up question plus the chat history and asked to rewrite it as a standalone question:
prompt = f"""
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
Chat History:
Human: What did the president say about Ketanji Brown Jackson
Assistant: The President said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. He described her as one of our nation's top legal minds and mentioned that she comes from a family of public school educators and police officers. He also highlighted that she has received broad support from various groups, including the Fraternal Order of Police and former judges appointed by Democrats and Republicans.
Follow Up Input: Did he mention who she succeeded
Standalone question:
"""
(2) How to construct the prompt for a standalone question?
$instruction = "Generate a standalone question which is based on the new question plus the chat history. Just create the standalone question without commentary. New question: " . $question;
$chatHistory[] = ["role" => "user", "content" => $instruction];
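The same construction in Python, for use with the chat loop above (the value of question is just a placeholder):
question = "Did he mention who she succeeded"  # placeholder follow-up question
instruction = ("Generate a standalone question which is based on the new question plus the "
               "chat history. Just create the standalone question without commentary. "
               "New question: " + question)
# Append it as the latest user turn and send `history` through apply_chat_template()
# exactly as in the chat loop above.
history.append({"role": "user", "content": instruction})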
Distributed deployment of large models
- Resources are usually limited by per-GPU memory, while a large model's parameter count is huge, so the model has to be spread over multiple GPUs.
- Use the accelerate library provided by Hugging Face.
- References:
(1) https://huggingface.co/blog/accelerate-large-models
(2) Deploying a large language model with PyTorch under limited resources (ChatGLM-6B as the example). Note that when calling load_checkpoint_and_dispatch, classes that contain residual connections must not be split across devices; list them in no_split_module_classes.
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
# Build the model skeleton without allocating any weight memory.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
# Load the checkpoint shards and dispatch them across the available devices,
# never splitting a decoder layer (which contains residual connections).
model = load_checkpoint_and_dispatch(model, model_path,
                                     device_map='auto',
                                     offload_folder="offload",
                                     offload_state_dict=True,
                                     dtype="float16",
                                     no_split_module_classes=["LlamaDecoderLayer"])
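After dispatch you can sanity-check the placement and run a quick generation; accelerate attaches hf_device_map to the dispatched model and its hooks move activations between devices automatically (the prompt below is only an example):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Mapping from module name to device, e.g. {'transformer.layers.0': 0, ...}.
print(model.hf_device_map)
# Inputs only need to sit on the device that holds the first layers (usually cuda:0).
inputs = tokenizer("Hello", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0], skip_special_tokens=True))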