Recently, the U.S. Department of Defense admitted that machine learning algorithms were used to identify targets in 85 airstrikes in Iraq and Syria this year, which is the first time that the U.S. side has admitted that artificial intelligence technology has been used in actual combat.
For the first time, artificial intelligence was used in real combat
According to Bloomberg, the U.S. ** Command, which is responsible for the Middle East, Central Asia and parts of South Asia, used target recognition algorithms from the military artificial intelligence project M**en in seven air strikes on Feb. 2 this year, covering multiple locations in Iraq and Syria.
Schuyler Moore, chief technology officer of U.S. Command, said the military began deploying the "M**EN Project" computer vision system in combat operations after Hamas surprised Israel last year.
Everything changed on Oct. 7," Moore told Bloomberg, "and we immediately started running at high speed, much faster than we were used to." ”
These object recognition algorithms are used to identify potential targets and are finally operated by humans** the system. The U.S. reportedly uses the software to identify enemy rockets, missiles, drones, and militia facilities.
In fact, as early as 2017, the Pentagon began to develop a military artificial intelligence project codenamed "M**EN", and looked for a company that could develop object recognition software for the images taken by drones. In 2017, U.S. Marine Corps Colonel Drew Cuckor said the Pentagon wanted to integrate M**EN software with the platform in order to gather intelligence.
While Google pulled out of the project due to its employees' use of AI for war, other Silicon Valley tech companies were happy to help drive the M**en project.
Transform the battlefield with large language models
The U.S. military is not satisfied with using target recognition machine learning algorithms to improve the accuracy and efficiency of air strikes, and the "battlefield brain" based on large language models is the key technology that can really change the battlefield situation. The ability of large language models to analyze and generate text can improve the Pentagon's ability to gather intelligence and plan operations to guide battlefield decision-making.
In 2023, with the advent of large language models, the United States has accelerated the militarization of artificial intelligence technology.
In August 2023, the U.S. Department of Defense established Task Force Lima to study what engineering intelligence can do for the military, with the stated goal of protecting the task force was formed by Deputy Secretary of Defense Kathleen Hicks and led by the Chief Office of Artificial Intelligence. It will analyze different tools such as large language models and figure out how to integrate them into the military's software systems.
In October 2023, the U.S. Bureau of Artificial Intelligence Security established the AI Security Center to oversee AI development and integrate it into defense and defense systems. According to the U.S. Department of Defense, the AI Security Center will centrally manage best practices for AI applications in critical systems, as well as assessment and risk frameworks.
Craig Martell, the chief AI officer at the U.S. Department of Defense, pictured a scenario for large language models to guide operational decision-making last week at the "Defense Advantage: Defense Data and Artificial Intelligence Symposium" last week: "Imagine a world where combat commanders can see everything they need to make strategic decisions, and turnaround times for situational awareness (on the battlefield) have been reduced from one or two days to 10 minutes." ”
Although Martel's idea is tempting enough, the US side's "large-scale model operation" plan does not seem to be going well. "In the last 60 to 90 days, we've had more opportunities for target identification," Moore revealed, adding that U.S. Command is also trying to run an artificial intelligence recommendation engine to see if it can suggest the best combination to use in military operations and develop a plan for attack. However, the technology "often falls short".
The challenge of military large language models
The biggest obstacle to military large language models is that the accuracy of current large language models is far from the "military level" that can operate independently and reliably. "No algorithm can operate completely autonomously, draw conclusions and then advance to the next step," Moore noted, "Every step involving AI ultimately has a human check." ”
Data security is also an AI security issue that the United States is focusing on. Although ChatGPT is currently the most powerful large language model application (and OpenAI has revised its use policy in a month to acquiesce to the military's use), the U.S. Department of Defense obviously cannot accept the data security problems that are prevalent in ChatGPT, a general-purpose large language model. According to reports, the U.S. military has banned the use of tools like ChatGPT internally. For example, the U.S. Space Force told staff not to use ChatGPT for fear that military secrets could be leaked or extracted.
Because military data is often highly sensitive, the U.S. military is concerned that an instant injection attack or API misuse could lead to a data breach if the data gets into a large language model.
In search of an ideal replacement for ChatGPT, the U.S. Department of Defense is ramping up efforts to integrate and test the combat capabilities of artificial intelligence. For example, the U.S. Department of Defense is partnering with startup Scale AI to test military generative AI models.
The biggest security problem with the entry of military large language models into actual combat is that they are prone to produce inaccurate or false information, the so-called machine hallucination. The Pentagon believes that by introducing Scale AI, the performance of different models can be tested to identify potential risks before considering using them to support operations or intelligence.
ScaleAI is responsible for developing a framework of tools and datasets for the Pentagon to evaluate military large language models. The framework's capabilities include "measuring large model performance, providing real-time feedback to warfighters, and creating specialized public sector evaluation sets to test AI models for military applications."
It is the responsibility of the U.S. Department of Defense to pursue generative AI models while taking appropriate protective measures and mitigating the risks that may arise from issues such as mismanagement of training data," said Martel, Chief AI Officer of the U.S. Department of Defense. ”