Meta releases Chain-of-Abstraction: large-model tool use gets roughly 6% more accurate and 40% faster!

Mondo Technology Updated on 2024-02-05

In the field of artificial intelligence, large language models (LLMs) have made significant progress, especially in understanding and executing instructions. However, these models still make errors when they need to invoke and combine real-world knowledge to generate responses: for example, they may state facts incorrectly or perform wrong calculations. To address these issues, researchers have proposed reducing such errors with auxiliary tools, such as search engines that provide reliable facts and calculators that perform accurate computations. This has inspired the development of tool-augmented language models, which integrate external API calls into output generation.

Still, current tool-augmented LLMs, such as Toolformer, struggle to use tools reliably and efficiently in multi-step reasoning. In multi-step reasoning tasks, tool calls are often interleaved: the response of one API call frequently forms part of the query of a subsequent call. Without explicitly modeling these interconnections in reasoning chains, LLMs cannot learn effective tool-use planning, which reduces the accuracy of tool-assisted reasoning. At the same time, interleaving text generation with API calls makes inference inefficient, because the model must wait for each API response before it can continue decoding. This inefficiency becomes more pronounced in multi-step scenarios, where each reasoning process typically requires multiple rounds of API calls.

This paper proposes a new method that trains LLMs to learn chains of abstraction (CoA). It evaluates the fine-tuned models on two representative multi-step reasoning domains (mathematical reasoning and Wikipedia-based QA), shows that the method improves both accuracy and tool-use efficiency, and confirms through extensive human evaluation that it guides LLMs toward more accurate reasoning.

Title: Chain of Abstraction: A New Approach to Align Large Language Models with Real-World Knowledge

Statement: This issue's interpretation was not written by a human; the full text was completed independently by the Cyber Malean AI interpretation expert agent and released after manual review and illustration.


In multi-step reasoning tasks, LLMs rely on external knowledge, such as web facts and mathematical or physical rules, to produce reasoning that matches human expectations. Auxiliary tools can give LLMs access to this external knowledge, but fine-tuned tool-augmented LLMs (e.g., Toolformer) still face the two challenges described above: interleaved tool calls are hard to plan without explicitly modeling the interconnections between reasoning steps, and waiting for API responses introduces decoding "wait times" that grow with the number of reasoning rounds.

1. Definition and objectives of CoA reasoning

CoA (Chain-of-Abstraction) reasoning is a new training method that teaches LLMs to plan abstract multi-step reasoning chains. In contrast to traditional CoT (chain-of-thought) reasoning, CoA does not generate concrete values but abstract placeholders, letting LLMs focus on learning general, holistic reasoning strategies without having to store instance-specific knowledge in the model's parameters. In addition, decoupling general reasoning from domain-specific knowledge lets LLM decoding proceed in parallel with API calls across samples: the LLM can begin generating the next abstraction chain while tools fill in the current one, speeding up the overall inference process.
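To make the format concrete, here is a minimal sketch of how a CoA trace differs from a CoT trace, and how a tool pass could fill in the placeholders afterwards. The "[... = yN]" notation follows the paper's examples; the `fill_chain` helper and the use of Python's `eval` as a stand-in calculator are illustrative assumptions, not the paper's implementation.

```python
import re

# A CoT trace commits to concrete values while decoding:
cot_trace = "Tom picked 20 + 35 = 55 apples, so 90 - 55 = 35 apples remain."

# The CoA trace keeps the same reasoning but abstracts the results;
# note that y1 is reused inside the second operation (interleaved calls):
coa_trace = "Tom picked [20 + 35 = y1] apples, so [90 - y1 = y2] apples remain."

def fill_chain(trace: str) -> str:
    """Replace each [expr = yN] span with a value computed by a 'calculator'."""
    values: dict[str, str] = {}
    def solve(m: re.Match) -> str:
        expr, name = m.group(1), m.group(2)
        for var, val in values.items():   # substitute earlier placeholders
            expr = expr.replace(var, val)
        values[name] = str(eval(expr))    # stand-in for the calculator tool
        return values[name]
    return re.sub(r"\[([^=\]]+)=\s*(y\d+)\s*\]", solve, trace)

print(fill_chain(coa_trace))
# Tom picked 55 apples, so 35 apples remain.
```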

2. CoA data construction and training process

To construct CoA data for fine-tuning LLMs, the researchers collected question-answering (QA) samples from existing open-source QA datasets and prompted LLaMA-70B to rewrite the answer of each sampled question. Specifically, they prompted LLaMA-70B to mark the spans corresponding to knowledge operations (e.g., mathematical derivations, statements based on Wikipedia references) in the gold answer, and then to rewrite the sentence with the marked spans as a fillable CoA trace, in which the result of each operation is replaced with an abstract placeholder. For example, mathematical derivations are rewritten as "[20 + 35 = y1]" and "[90 - y1 = y2]". An intermediate result may then appear multiple times in the rewritten answer (such as the calculation result 55 in Figure 2), each time as the same placeholder. This rewriting not only improves the average accuracy of LLMs on math and Wiki QA, but also improves inference efficiency on multi-step reasoning tasks.
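In the paper this rewriting is done by prompting LLaMA-70B; purely to illustrate the target format, here is a hypothetical rule-based rewriter for the arithmetic case. The placeholder scheme follows the paper's examples; the parsing logic is an assumption.

```python
import re

def rewrite_to_coa(gold_answer: str) -> str:
    """Rewrite 'a op b = c' derivations in a gold answer into a fillable CoA
    trace: each result becomes a placeholder yN, and later occurrences of the
    same result (inside expressions or as bare mentions) reuse that placeholder."""
    result_to_var: dict[str, str] = {}

    def abstract(m: re.Match) -> str:
        expr, result = m.group(1).strip(), m.group(2)
        for res, v in result_to_var.items():        # earlier results -> variables
            expr = re.sub(rf"\b{re.escape(res)}\b", v, expr)
        var = result_to_var.setdefault(result, f"y{len(result_to_var) + 1}")
        return f"[{expr} = {var}]"

    trace = re.sub(r"(\d+(?:\s*[-+*/]\s*\d+)+)\s*=\s*(\d+)", abstract, gold_answer)

    # Rewrite bare re-occurrences of computed results, skipping bracketed spans.
    parts = re.split(r"(\[[^\]]*\])", trace)
    for i, part in enumerate(parts):
        if not part.startswith("["):
            for res, v in result_to_var.items():
                part = re.sub(rf"\b{re.escape(res)}\b", v, part)
            parts[i] = part
    return "".join(parts)

gold = "Tom picked 20 + 35 = 55 apples, so 90 - 55 = 35 remain. He sells the 35 apples."
print(rewrite_to_coa(gold))
# Tom picked [20 + 35 = y1] apples, so [90 - y1 = y2] remain. He sells the y2 apples.
```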

1. Experimental setup in the field of mathematical reasoning

To evaluate the effectiveness of the CoA method on mathematical reasoning, the researchers used a series of open-source math problem-solving datasets, including GSM8K and ASDiv. They prompted the LLaMA-70B model to rewrite each original answer as an abstract reasoning chain (CoA), in which concrete numerical results are replaced by abstract placeholders. For example, the derivation "20 + 35 = 55" in an original answer is rewritten as "[20 + 35 = y1]". This design trains the model to learn a generic reasoning strategy rather than to generate instance-specific knowledge.
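Because every operation in the chain is explicit, the whole chain can be handed to an equation solver in a single call rather than invoking a calculator once per step. Below is a minimal sketch with SymPy; the chain format follows the paper's examples, but this particular solver wrapper is an assumption, not the paper's tool.

```python
import re
import sympy

def solve_chain(chain: str) -> dict:
    """Parse every [lhs = rhs] span in a CoA trace and solve them jointly."""
    eqs = [sympy.Eq(sympy.sympify(lhs), sympy.sympify(rhs))
           for lhs, rhs in re.findall(r"\[([^=\]]+)=([^\]]+)\]", chain)]
    return sympy.solve(eqs)   # one solver call covers the whole chain

chain = "Tom picked [20 + 35 = y1] apples, so [90 - y1 = y2] apples remain."
print(solve_chain(chain))    # {y1: 55, y2: 35}
```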

2. Experimental setup in the Wiki QA domain

In the Wiki QA domain, the researchers used the HotpotQA dataset to build fine-grained CoA data. HotpotQA contains 113k multi-hop QA examples, each annotated with two Wikipedia articles that provide supporting knowledge. They chose two question types, bridge QA and comparison QA, which involve identifying an intermediate entity that connects question and answer, and comparing the attributes of two entities, respectively. They used the LLaMA-70B model to rewrite these questions into CoA chains containing WikiSearch and NER queries, and verified the correctness of each CoA with domain tools such as a Wikipedia search engine and an NER toolkit.
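Here is a sketch of what filling a bridge-QA chain could look like, where the first hop's answer (y1) is abstract and reappears inside the second hop's search query. The `wiki_search` and `ner` stubs are hypothetical stand-ins for the paper's Wikipedia search engine and NER toolkit, with canned outputs so the flow runs end to end:

```python
import re

# Hypothetical stand-ins for the Wikipedia search engine and NER toolkit,
# with canned outputs purely for demonstration.
def wiki_search(query: str) -> str:
    passages = {
        "director of Inception": "Inception was directed by Christopher Nolan.",
        "Christopher Nolan birth year": "Christopher Nolan was born in 1970.",
    }
    return passages[query]

def ner(passage: str, ent_type: str) -> str:
    # Toy extraction: last capitalized span for PERSON, last 4-digit number otherwise.
    pattern = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*" if ent_type == "PERSON" else r"\d{4}"
    return re.findall(pattern, passage)[-1]

# A bridge-QA CoA chain: hop 2's query reuses hop 1's abstract answer y1.
chain = [
    ("director of Inception", "PERSON", "y1"),
    ("{y1} birth year", "DATE", "y2"),
]

values: dict[str, str] = {}
for query, ent_type, var in chain:
    query = query.format(**values)            # plug in earlier placeholders
    values[var] = ner(wiki_search(query), ent_type)

print(values)   # {'y1': 'Christopher Nolan', 'y2': '1970'}
```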

1. Results in the field of mathematical reasoning

On mathematical reasoning, the CoA method outperforms several baselines, including CoT-FSP (few-shot prompting) and CoT-FT (fine-tuning), on the GSM8K and ASDiv datasets. CoA stands out in particular on two out-of-distribution datasets, SVAMP and MAWPS, showing its robustness on multi-step reasoning tasks. In addition, CoA surpasses Toolformer, which suggests that planning with abstract variables improves the accuracy of tool-assisted reasoning. Human evaluation showed that CoA effectively reduces arithmetic errors and makes fewer reasoning errors than the baseline methods.

2. Results in the Wiki QA domain

In the Wiki QA domain, the CoA method outperforms Toolformer and FireAct in inference efficiency on the HotpotQA dataset. CoA not only achieves significant performance improvements on the HotpotQA development set, but also demonstrates zero-shot generalization on other open-domain QA datasets, including NaturalQuestions and TriviaQA. These results show that CoA achieves more efficient multi-step reasoning by decoupling the generation of abstract reasoning chains from knowledge retrieval (i.e., tool use).

When solving multi-step reasoning problems, LLMs need to combine the reasoning process with real-world knowledge, such as web facts and mathematical or physical rules. To improve reasoning accuracy, the researchers propose the chain-of-abstraction (CoA) method, which introduces abstract variables to plan tool use and thereby improves performance on multi-step reasoning tasks.

1. Design of chained abstract reasoning

The core of the CoA method is to turn concrete knowledge operations in the reasoning process into abstract variables. This lets the model focus on learning a generic reasoning strategy without having to store instance-specific knowledge in its parameters. For example, in math problem solving, CoA converts concrete calculations into expressions with abstract placeholders, rewriting "20 + 35 = 55" as "[20 + 35 = y1]", where "y1" is an abstract variable. This design allows the model to form a complete abstract reasoning chain before calling any external API (e.g., a calculator).

2. Advantages of long-chain inference

The CoA approach shows clear advantages on problems that require long reasoning chains. The study finds that when a problem requires more reasoning steps, the CoA method generates chains whose length matches the gold reasoning chain better than traditional chain-of-thought (CoT) methods do. This is reflected in the paper's heat-map statistics, where the lengths of CoA-generated chains lie closer to the diagonal, i.e., they agree more often with the gold chain lengths. Moreover, the model achieves higher QA accuracy when the number of reasoning steps in its generated answer coincides with that of the gold reference. Together, these results suggest that CoA-trained models are better at producing reasoning chains of the appropriate length.

To fully validate that the CoA method improves both knowledge operations (e.g., arithmetic) and reasoning accuracy, the researchers conducted a human evaluation. They randomly selected 200 GSM8K test questions and asked human annotators to judge whether each model answer contained any arithmetic errors (e.g., incorrect calculations, invalid equations) or reasoning errors unrelated to the mathematical derivation (e.g., misunderstanding the question, choosing an inappropriate solution strategy).

1. Human assessment results

The study found that CoA reduces arithmetic errors to zero, thanks to the equation solver performing exact calculations. More importantly, CoA produces fewer reasoning errors than the baselines, confirming that planning whole abstract reasoning chains guides the model toward more accurate reasoning. By contrast, ordinary fine-tuning (CoT-FT) yields only limited reasoning improvements over few-shot prompting (CoT-FSP), and it also fails to effectively suppress arithmetic errors.

2. Inference efficiency

The performance benefits of CoA reasoning do not come at a higher computational cost. The study measured the average wall-clock time (in seconds) the CoA and baseline methods (based on LLaMA-2-Chat-7B) need to answer a question. Compared with the CoT baselines, CoA is faster than the few-shot baseline CoT-FSP, which must also encode the additional in-context examples; it is slightly slower than CoT-FT, probably because of the extra tokens it decodes, such as "[" and "]". Compared with Toolformer, CoA has a lower and flatter inference-time curve, indicating better scalability as the number of reasoning steps grows.

The reason for this difference is that CoA decouples the generation of (abstract) reasoning chains from knowledge retrieval (i.e., tool use), so a chain can be decoded in full before any tool is invoked. This amortizes inference cost in two ways. First, tool calls are made after the CoA chain is decoded, which allows parallel tool calls for the same chain (e.g., one call to the equation solver instead of multiple calculator calls) and avoids the delay of waiting for each external API response. Second, across multiple examples, the model can generate the CoA chain of the next example while tools are being called for the previous one, parallelizing CoA decoding with tool invocation.
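As an illustration of the first amortization, a comparison-QA chain contains two sub-queries that do not depend on each other, so once the full chain is decoded they can be sent to the tool concurrently. The `wiki_lookup` stub with simulated latency is an assumption for demonstration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def wiki_lookup(query: str) -> str:
    time.sleep(1.0)                        # simulated external API round-trip
    return f"<passage for {query!r}>"

# Comparison-QA chain: y1 and y2 are independent; only the final comparison
# step needs both, so both lookups can be issued at once.
subqueries = {"y1": "height of the Eiffel Tower",
              "y2": "height of the Statue of Liberty"}

start = time.time()
with ThreadPoolExecutor() as pool:
    results = dict(zip(subqueries, pool.map(wiki_lookup, subqueries.values())))
print(results)
print(f"elapsed: {time.time() - start:.1f}s")   # ~1s, vs ~2s if called sequentially
```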

The computational cost of the CoA (Chain-of-Abstraction) method is an important consideration when LLMs perform multi-step reasoning. The CoA approach aims to improve reasoning accuracy by introducing abstract variables and delegating concrete knowledge operations to external tools, and it has shown significant performance gains in multi-step reasoning domains such as mathematical reasoning and Wiki QA.

1. Decoupling reasoning-chain generation from tool calls

The CoA method achieves higher inference efficiency by decoupling reasoning-chain generation from tool calls. In a traditional Toolformer-style model, tool calls during inference are sequential, producing "wait times" while the model idles for each API response. The CoA approach instead lets the model start generating the next abstract reasoning chain while tools are still filling in the current one, speeding up the overall inference process.
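A timing sketch of this overlap, with `decode_chain` standing in for LLM decoding and `fill_chain` for the tool pass (both simulated with sleeps; the pipeline shape, not the implementation, is the point):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def decode_chain(question: str) -> str:
    time.sleep(0.5)                        # simulated LLM decoding of one CoA chain
    return f"[... = y1] chain for {question!r}"

def fill_chain(chain: str) -> str:
    time.sleep(0.5)                        # simulated tool/API latency
    return chain.replace("y1", "55")

questions = [f"q{i}" for i in range(4)]

start = time.time()
with ThreadPoolExecutor() as pool:
    futures = []
    for q in questions:
        chain = decode_chain(q)                         # decode the next chain...
        futures.append(pool.submit(fill_chain, chain))  # ...while tools fill the last
    answers = [f.result() for f in futures]
print(f"pipelined: {time.time() - start:.1f}s")         # ~2.5s vs ~4.0s sequential
```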

2. An empirical analysis of reasoning efficiency

In the empirical analysis, the CoA method is about 1.47x faster than previous tool-augmented methods on mathematical reasoning tasks and about 1.33x faster on Wiki QA tasks. This shows that CoA not only improves reasoning accuracy but also significantly improves reasoning speed.

3. Efficiency in multi-step inference scenarios

The CoA method is especially efficient in multi-step reasoning scenarios, speeding up large-model tool use by about 40%. Because CoA makes tool calls only after decoding the abstract reasoning chain, parallel tool calls can be issued for the same chain, avoiding the delay of waiting on each external API response. In addition, CoA's inference time grows more slowly as problems require more reasoning steps, indicating that it maintains its efficiency better as reasoning chains get longer.

By decoupling the general reasoning ability of LLMs from the use of external tools for specific knowledge, the CoA method improves not only reasoning accuracy but also the speed of multi-step inference. Its simple and effective design shows its potential on diverse tasks such as mathematical reasoning and open-domain question answering, and suggests it can adapt to new reasoning scenarios in the future.

1. The potential of the CoA approach

The CoA method shows significant potential for improving the accuracy and efficiency of multi-step reasoning in LLMs. By planning abstract reasoning chains, CoA adapts better to shifts in out-of-distribution knowledge and performs well across different reasoning scenarios.

2. Future directions

Future research could explore the CoA approach in a wider range of application scenarios, such as law, finance, or other domains that require complex reasoning. Researchers could also further optimize CoA's inference efficiency and reduce its dependence on external tool calls to achieve even faster inference.

3. Implications for future LLMs

The CoA method offers a new perspective for the future development of LLMs: a model's reasoning performance can be improved by planning abstract reasoning chains and using external tools effectively. This provides important guidance for designing smarter and more efficient LLMs and is expected to advance the application of artificial intelligence to multi-step reasoning tasks.

