Notes on Large Models

Author: IT_小马哥 | Published 2023-11-25 11:42

    Loading a large model with Transformers and streaming generated text

        import argparse

        from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

        # The original references `args` without defining it; a minimal argparse
        # setup is assumed here so the script runs standalone.
        parser = argparse.ArgumentParser()
        parser.add_argument("--model", required=True)
        parser.add_argument("--tokenizer", default=None)
        parser.add_argument("--streaming", action="store_true")
        parser.add_argument("--max_tokens", type=int, default=512)
        parser.add_argument("--eos_token", default="</s>")  # the EOS token is model-dependent
        args = parser.parse_args()

        model = AutoModelForCausalLM.from_pretrained(
            args.model, device_map="auto", torch_dtype="auto", trust_remote_code=True
        )
        tokenizer = AutoTokenizer.from_pretrained(
            args.tokenizer or args.model, trust_remote_code=True
        )

        while True:
            prompt = input("Enter a prompt ('end' to quit): ")
            if prompt == 'end':
                break
            inputs = tokenizer(prompt, return_tensors="pt")
            # TextStreamer prints tokens to stdout as they are generated.
            streamer = TextStreamer(tokenizer) if args.streaming else None
            outputs = model.generate(
                inputs.input_ids.cuda(),
                max_new_tokens=args.max_tokens,
                streamer=streamer,
                eos_token_id=tokenizer.convert_tokens_to_ids(args.eos_token),
                do_sample=True,
                repetition_penalty=1.3,
                no_repeat_ngram_size=5,
                temperature=0.7,
                top_k=40,
                top_p=0.8,
            )
            if streamer is None:
                print(tokenizer.decode(outputs[0], skip_special_tokens=True))
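
    When the generated text is needed as a Python object instead of being printed, transformers also provides TextIteratorStreamer, which yields decoded chunks while generation runs in a background thread. A minimal sketch, reusing the model and tokenizer loaded above:

        from threading import Thread

        from transformers import TextIteratorStreamer

        # The streamer is an iterator, so generate() must run in its own thread
        # while the main thread consumes chunks as they are decoded.
        streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
        inputs = tokenizer("Hello", return_tensors="pt")
        thread = Thread(target=model.generate,
                        kwargs=dict(input_ids=inputs.input_ids.cuda(),
                                    max_new_tokens=256,
                                    streamer=streamer))
        thread.start()
        for chunk in streamer:
            print(chunk, end="", flush=True)
        thread.join()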
    

    Loading a large model with Transformers and running a streaming chat

    • This version keeps a simple conversation history.

        import os
        import platform
        import sys

        from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

        os_name = platform.system()
        # platform.system() returns 'Windows' (capitalized) on Windows.
        clear_command = 'cls' if os_name == 'Windows' else 'clear'
    
        model_path = r'/01-aiYi-34B-Chat-4bits'
        tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
        # Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
        model = AutoModelForCausalLM.from_pretrained(
            model_path,
            device_map="auto",  # accelerate places the weights, so no explicit .cuda() call is needed
            torch_dtype='auto',
            trust_remote_code=True
        ).eval()
    
        streamer = TextStreamer(tokenizer)
        history = []
        print("零一万物 01-aiYi-34B-Chat-4bits 量化模型,输入内容即可进行对话,clear 清空对话历史,stop 终止程序\n")
        while True:
            print("请输入prompt,以Ctrl+D(在Windows上是Ctrl+Z)结束输入:")
            content = sys.stdin.readlines()
            content = ' '.join(content)
            content = content.strip()
            if content == "stop":
                break
            if content == "clear":
                history = []
                os.system(clear_command)
                print("01.AI Yi-34B-Chat-4bits quantized model: type to chat, 'clear' to reset the history, 'stop' to exit")
                continue
    
            content = content if content else 'hi'
            messages= {"role": "user", "content": content}
            history.append(messages)
            input_ids = tokenizer.apply_chat_template(conversation=history, tokenize=True, add_generation_prompt=True,
                                                      return_tensors='pt')
    
            output_ids = model.generate(input_ids.to('cuda'), streamer=streamer, max_new_tokens=512, do_sample=True,
                                        eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
                                        bos_token_id=tokenizer.convert_tokens_to_ids("<|im_start|>"),
                                        repetition_penalty=1.3, no_repeat_ngram_size=5, temperature=0.7, top_k=40, top_p=0.8)
    
            response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
            print('*' * 30)
            print('Response: {}'.format(response))
            history.append({"role": "assistant", "content": response})
    
    • To maintain context more robustly, you need to generate a standalone question.

      For reference:
      (1) How the condense LLM works in LangChain
      In short: given a new question plus the chat history, the model is prompted to rewrite the question so that it stands alone, e.g.:

        prompt = f"""
        Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
        Chat History:
        Human: What did the president say about Ketanji Brown Jackson
        Assistant: The President said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. He described her as one of our nation's top legal minds and mentioned that she comes from a family of public school educators and police officers. He also highlighted that she has received broad support from various groups, including the Fraternal Order of Police and former judges appointed by Democrats and Republicans.
        Follow Up Input: Did he mention who she succeeded
        Standalone question:
        """

      (2) How to construct the prompt for a standalone question? In Python:

        instruction = ("Generate a standalone question which is based on the new question "
                       "plus the chat history. Just create the standalone question without "
                       "commentary. New question: " + question)
        chat_history.append({"role": "user", "content": instruction})

    Distributed deployment of large models

        from accelerate import init_empty_weights, load_checkpoint_and_dispatch
        from transformers import AutoConfig, AutoModelForCausalLM

        # Build the model skeleton without allocating memory for the weights.
        config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
        with init_empty_weights():
            model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

        # Load the checkpoint and shard it across the available devices,
        # offloading what does not fit to CPU/disk.
        model = load_checkpoint_and_dispatch(model, model_path,
                                             device_map='auto',
                                             offload_folder="offload",
                                             offload_state_dict=True,
                                             dtype="float16",
                                             no_split_module_classes=["LlamaDecoderLayer"])
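
    After dispatch, the per-module placement can be inspected via model.hf_device_map, and inference then runs as usual; a minimal usage sketch, assuming a model_path as above:

        from transformers import AutoTokenizer

        print(model.hf_device_map)  # shows which device holds each module

        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        inputs = tokenizer("Hello", return_tensors="pt")
        # accelerate's hooks move activations between shards automatically,
        # so placing the inputs on the first GPU is sufficient.
        outputs = model.generate(inputs.input_ids.to("cuda:0"), max_new_tokens=64)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))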
    
