Open-source large language models (LLMs) have now reached a level of performance that makes them suitable as reasoning engines to drive agent workflows: Mixtral even surpassed GPT-3.5 in our benchmarks, and its performance could easily be further enhanced through fine-tuning.
Large language models (LLMs) trained for causal language modeling can handle a wide range of tasks, but they often struggle with basic tasks such as logic, calculation, and search. The worst-case scenario is when they perform poorly in a certain domain, like math, yet still try to handle all the calculations themselves.
To overcome this weakness, the LLM can, among other approaches, be integrated into a system where it can call tools: such a system is called an LLM agent.
In this post, we'll explain the inner workings of ReAct agents, then show how to build them using the ChatHuggingFace class recently integrated into LangChain. Finally, we'll benchmark a couple of open-source LLMs against GPT-3.5 and GPT-4.
LLM agents are defined very broadly: they refer to all systems that have an LLM as their core engine and are able to exert influence on their environment based on observations. These systems achieve a given task through multiple iterations of a "Sense, Think, Act" cycle, and are often integrated into planning or knowledge-management systems to improve their performance. See the survey by Xi et al., 2023 for a great overview of the field of agents.
Today, we're focusing on ReAct agents. ReAct builds agents by combining "Reasoning" and "Acting": in the prompt, we describe which tools the model can use and guide it to think "step by step" (also known as chain-of-thought behavior) in order to plan and execute its next actions toward the final goal.
Although the above seems a bit abstract, its core principle is actually quite straightforward.
See this notebook for a basic example of a tool call built with the transformers library.
Essentially, the LLM is called in a loop, with a prompt that contains:

Here is a question: "{question}"
You have access to these tools: {tools_descriptions}
You should first reflect with 'Thought: {your_thoughts}', then you either:
- call a tool with the proper JSON formatting,
- or output your answer with the prefix 'Final Answer:'.
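As a concrete sketch, such a prompt can be assembled from a template (the template string, placeholder names, and helper function below are our own illustration, not the API of any particular library):

```python
# A minimal ReAct-style prompt template (names are illustrative, not a library API).
REACT_PROMPT = """Here is a question: "{question}"
You have access to these tools:
{tools_descriptions}
You should first reflect with 'Thought: <your thoughts>', then you either:
- call a tool with the proper JSON formatting,
- or print your final answer starting with the prefix 'Final Answer:'"""


def build_prompt(question: str, tools: dict) -> str:
    """Render the agent prompt from a question and a {name: description} tool map."""
    descriptions = "\n".join(f"- {name}: {desc}" for name, desc in tools.items())
    return REACT_PROMPT.format(question=question, tools_descriptions=descriptions)


prompt = build_prompt(
    "How many seconds are in 1:23:45?",
    {"convert_time": "converts a time given in hours:minutes:seconds into seconds"},
)
```

At each iteration of the loop, the tool results gathered so far would simply be appended to this rendered string before calling the model again.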
Next, you need to parse the LLM's output:

- If it contains the string 'Final Answer:', the loop ends and you output the answer.
- If it does not, the LLM made a tool call: parse the output to get the tool name and its arguments, then execute the call to the corresponding tool with those arguments. The result of this tool call is appended to the prompt, and you pass the prompt with this new information to the LLM again, until it has enough information to give a final answer to the question.

For example, when answering the question "How many seconds are in 1:23:45?", the LLM's output might look like this:
Thought: I need to convert the time string into seconds.

Action:
{
    "action": "convert_time",
    "action_input": {"time": "1:23:45"}
}
Given that this output does not contain the string 'Final Answer:', it represents a tool call. So we parse this output to get the tool name and its arguments: call the convert_time tool with the argument {"time": "1:23:45"}. Running the tool returns {'seconds': '5025'}. So, we append this whole block of information to the prompt.
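Such a convert_time tool could be implemented in a few lines (a sketch of our own, following the naming used in the example above):

```python
def convert_time(time: str) -> dict:
    """Convert a 'hours:minutes:seconds' string into a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in time.split(":"))
    return {"seconds": str(hours * 3600 + minutes * 60 + seconds)}


result = convert_time("1:23:45")  # {'seconds': '5025'}
```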
The updated prompt is now (in a slightly more detailed version):

Here is a question: "How many seconds are in 1:23:45?"
You have access to the following tools:
- convert_time: converts a time given in hours:minutes:seconds into seconds.
You should first reflect with 'Thought: {your_thoughts}', then you either:
- call a tool with the proper JSON formatting,
- or output your answer with the prefix 'Final Answer:'.

Thought: I need to convert the time string into seconds.

Action:
{
    "action": "convert_time",
    "action_input": {"time": "1:23:45"}
}
Observation: {'seconds': '5025'}
We call the LLM again with this new prompt. Given that it now has access to the tool call's result in the Observation, the LLM is most likely to output:
Thought: I now have the information I need to answer the question.
Final Answer: There are 5025 seconds in 1:23:45.
That's it!
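The parsing step described above can be sketched in a few lines of plain Python (this is our own minimal illustration; the function name and the tool-call format mirror the example above, not any particular framework):

```python
import json
import re


def parse_llm_output(output: str):
    """Return ("final", answer) if the LLM produced a final answer,
    or ("tool", name, arguments) if it made a tool call."""
    if "Final Answer:" in output:
        return ("final", output.split("Final Answer:", 1)[1].strip())
    # Otherwise, extract the JSON blob that follows "Action:".
    match = re.search(r"Action:\s*(\{.*\})", output, re.DOTALL)
    if match is None:
        raise ValueError("Output is neither a final answer nor a tool call")
    call = json.loads(match.group(1))
    return ("tool", call["action"], call["action_input"])


kind, name, args = parse_llm_output(
    'Thought: I need to convert it.\n'
    'Action:\n{"action": "convert_time", "action_input": {"time": "1:23:45"}}'
)
```

A real agent loop would then execute the named tool, append "Observation: ..." to the prompt, and call the LLM again until the "final" branch is reached.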
In general, the difficult parts of running an agent system around an LLM engine include:

Selecting a tool from the provided list that actually helps advance the goal: e.g., when asked "What is the smallest prime number greater than 30,000?", the agent may call a search tool with the query "What is the height of K2?", which doesn't help.

Calling tools with a rigorous argument format: for example, when trying to calculate the speed of a car that travels 3 kilometers in 10 minutes, you have to call a calculator tool that divides distance by time. Even if your calculator tool accepts calls in JSON format, there are many pitfalls, such as:
- misspelling the tool name: "calculator" or "compute" would not be valid;
- giving the name of the arguments instead of their values: "args": "distance/time";
- non-standardized formatting: "args": "3km in 10minutes".
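Pitfalls like these can be caught early by validating each parsed call against a tool registry before executing anything, and feeding the error message back to the LLM. A minimal sketch, with function and registry names of our own choosing:

```python
def validate_tool_call(call: dict, registry: dict):
    """Return an error message to feed back to the LLM, or None if the call is valid.

    `registry` maps each tool name to the set of argument names it requires.
    """
    name = call.get("action")
    if name not in registry:
        return f"Unknown tool '{name}'. Available tools: {sorted(registry)}"
    args = call.get("action_input")
    expected = registry[name]
    if not isinstance(args, dict) or set(args) != expected:
        return f"Tool '{name}' expects arguments {sorted(expected)}, got: {args!r}"
    return None


registry = {"calculator": {"distance", "time"}}

# Misspelled tool name is rejected:
err1 = validate_tool_call({"action": "compute", "action_input": {}}, registry)
# Free-form string instead of structured arguments is rejected:
err2 = validate_tool_call(
    {"action": "calculator", "action_input": "3km in 10minutes"}, registry
)
# A well-formed call passes:
ok = validate_tool_call(
    {"action": "calculator", "action_input": {"distance": 3, "time": 10}}, registry
)
```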
Efficiently absorbing and reusing the information gathered in past observations, whether from the initial context or from the observations returned after each tool call.

So, what would a complete agent setup look like?
We have just integrated the ChatHuggingFace class into LangChain, which