InternLM2-Chat-20B (Scholar Puyu 2) Large Model Deployment in Practice

Mondo Culture Updated on 2024-02-02

InternLM2 (Scholar Puyu 2) is a new-generation large language model released by Shanghai AI Laboratory together with SenseTime, The Chinese University of Hong Kong, and Fudan University.

Effective support for 200,000-word ultra-long context: the model almost perfectly passes the "needle in a haystack" test on 200,000-word inputs, and its performance on long-text benchmarks such as LongBench and L-Eval also reaches the leading level among open-source models. You can try 200,000-word long-context inference with LMDeploy (a sketch follows below).

Comprehensive performance improvement: compared with the previous generation, all capability dimensions have improved, with particularly significant gains in reasoning, mathematics, dialogue experience, instruction following, and creative writing. Overall performance reaches the leading level among open-source models of the same scale, and InternLM2-Chat-20B matches or even surpasses ChatGPT (GPT-3.5) in key capability evaluations.

Code interpreter and data analysis: with a code interpreter, InternLM2-Chat-20B can reach levels comparable to GPT-4 on GSM8K and MATH. Building on this strong foundation in mathematics and tool use, InternLM2-Chat provides practical data-analysis capabilities.

Overall upgrade of tool-invocation capabilities: based on stronger and more generalized instruction understanding, tool selection, and result reflection, the new models can more reliably support the construction of complex agents, support effective multi-round tool invocation, and complete more complex tasks.
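Below is a minimal sketch of long-context inference with LMDeploy's pipeline API, assuming a recent lmdeploy release is installed; the session length, RoPE scaling factor, and generation parameters are illustrative values to adjust for your hardware, and the prompt is a placeholder:

from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

# Enlarge the session length so the engine can hold a very long input (illustrative values)
backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=160000)
pipe = pipeline("internlm/internlm2-chat-20b", backend_config=backend_config)

gen_config = GenerationConfig(top_p=0.8, temperature=0.8, max_new_tokens=1024)
prompt = "Replace this with your very long input text..."
response = pipe(prompt, gen_config=gen_config)
print(response)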

Dependency Environment:

python >= 3.8
pytorch >= 1.12.0 (2.0.0 or higher recommended)
transformers >= 4.34
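As a quick sanity check before loading the model, you can print the installed versions against these requirements (a minimal sketch using the standard PyPI package names):

import sys
import torch
import transformers

# Expect Python >= 3.8, PyTorch >= 1.12.0 (ideally >= 2.0.0), transformers >= 4.34
print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("cuda available:", torch.cuda.is_available())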

Loading via Transformers

Load the internlm2-chat-7b model through Transformers with the following code (change the model name to load a different model):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b", trust_remote_code=True)
# Set torch_dtype=torch.float16 to load the model in fp16; otherwise GPU memory may be insufficient on your hardware.
model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm2-chat-7b",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
model = model.eval()

response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
# Model output: Hello! Is there anything I can help you with?

response, history = model.chat(tokenizer, "Please provide three suggestions for managing your time.", history=history)
print(response)

(Optional) If you are on a low-resource device, you can use bitsandbytes to load a 4-bit or 8-bit quantized model and further save GPU memory. A 4-bit quantized InternLM 7B consumes approximately 8 GB of video memory.

# pip install -U bitsandbytes
# 8-bit:
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
# 4-bit:
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
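The chat models also expose a streaming interface via their trust_remote_code model class (not the generic Transformers API). A minimal sketch, assuming the tokenizer and model objects loaded above:

# Stream the reply as it is generated instead of waiting for the full response
printed = 0
for response, history in model.stream_chat(tokenizer, "Hello", history=[]):
    # response holds the accumulated reply so far; print only the newly generated part
    print(response[printed:], end="", flush=True)
    printed = len(response)
print()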

Dialogue through the front-end web page

pip install streamlit
pip install "transformers>=4.34"
streamlit run ./chat/web_demo.py
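For reference, here is a heavily simplified sketch of what such a Streamlit chat front end looks like. This is not the actual chat/web_demo.py shipped in the InternLM repository, and the file name minimal_chat_demo.py is only an example:

# minimal_chat_demo.py - simplified illustration, not the official chat/web_demo.py
import streamlit as st
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

@st.cache_resource
def load_model():
    # Cache the model across reruns so Streamlit does not reload it on every interaction
    tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "internlm/internlm2-chat-7b",
        device_map="auto",
        trust_remote_code=True,
        torch_dtype=torch.float16,
    ).eval()
    return tokenizer, model

tokenizer, model = load_model()

if "history" not in st.session_state:
    st.session_state.history = []

# Replay the conversation so far
for user_msg, bot_msg in st.session_state.history:
    st.chat_message("user").write(user_msg)
    st.chat_message("assistant").write(bot_msg)

prompt = st.chat_input("Say something")
if prompt:
    st.chat_message("user").write(prompt)
    response, st.session_state.history = model.chat(tokenizer, prompt, history=st.session_state.history)
    st.chat_message("assistant").write(response)

Launch it the same way: streamlit run minimal_chat_demo.py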

Interface display:

Fine-tuning:

For details, please refer to the official documentation

Performance Score:

InternLM2-Chat was evaluated on AlpacaEval 2.0, and the results show that InternLM2-Chat-20B has surpassed Claude 2, GPT-4 (0613), and Gemini Pro on AlpacaEval.
