Finance Associated Press, December 19 (edited by Niu Zhanlin). If there is one thing AI models should be good at, it is basic text processing. Yet researchers at a startup called Patronus AI have found that even the most powerful large models available today cannot reliably analyze the financial filings that companies submit to the US Securities and Exchange Commission (SEC).
OpenAI's GPT-4-Turbo is arguably the best-performing AI model on the market right now, yet in Patronus AI's latest test it answered only 79% of questions about SEC filings correctly.
When ordinary AI tools are asked these kinds of questions, they either fail to answer or "hallucinate", that is, they make up numbers and facts that are not in the SEC documents.
Anand Kannappan, co-founder of Patronus AI, said: "Such performance is absolutely unacceptable, and it has to be much more accurate to really start working in an automated and production-ready way."
These findings highlight some of the challenges facing AI models, as large companies, especially in regulated industries such as finance, are seeking to incorporate cutting-edge technologies into their businesses, whether it's customer service or data research.
Since the launch of ChatGPT late last year, the ability to quickly extract important numbers and text and to analyze financial statements has been seen as one of the most promising applications of chatbots. SEC filings are full of important data, and if AI can accurately summarize that data or quickly answer questions about it, it could give users an edge in the competitive financial industry.
Major investment banks and financial companies are therefore preparing for this. Bloomberg, the world's largest financial information company, has released BloombergGPT, a large model built specifically for the financial sector; business school professors have studied whether ChatGPT can analyze financial headlines; and JPMorgan Chase is developing an AI-powered automated investment tool. According to a recent McKinsey report, generative AI could generate trillions of dollars in revenue for the banking industry each year.
Applications in the financial sector
But the entry of AI into the financial industry has not been smooth. When Microsoft first launched the Bing chatbot using OpenAI's large model, one of its showcase examples was a quick summary of an earnings press release. Observers quickly realized that the figures in Microsoft's example were wrong, and some of them were even completely made up.
According to the co-founders of Patronus AI, part of the challenge of incorporating large models into actual products is that they are non-deterministic: they do not guarantee the same output for the same input every time. This means that companies need to conduct more rigorous testing to ensure that the models function correctly, stay on topic, and provide reliable results.
Patronus AI tested four large models: OpenAI's GPT-4 and GPT-4-Turbo, Anthropic's Claude 2 and Meta's Llama 2. After running the tests, the two co-founders of Patronus AI were surprised by how poorly the models performed.
Rebecca Qian of Patronus AI noted, "Surprisingly, large models often refuse to answer questions, and the rejection rate is very high, even if the answer is in context, even if it is a question that an ordinary person can answer."
However, the company also believes that if AI continues to advance, large models like GPT will have great potential to help people in the financial industry – whether analysts or investors.
A representative of OpenAI noted that the company's usage guidelines prohibit the use of OpenAI models to provide tailored financial advice without a qualified person reviewing the information, and require a disclaimer from anyone who uses OpenAI models in the financial industry. OpenAI's usage policy also states that OpenAI's models have not been fine-tuned to provide financial advice.