"Social engineering" is a tried-and-true tactic used by hackers against the human factor of a computer's security system, often because it is easier than defeating sophisticated security techniques. As new artificial intelligence becomes more human-like, will this approach work for them?
Not to be confused with the morally dubious concept of the same name in political science, social engineering in cybersecurity is the art of using psychological manipulation to get people to do what you want. If you're a hacker, the things you want people to do include giving away sensitive information, handing over passwords, or making payments directly into your account.
There are many hacking techniques in the realm of social engineering. Leaving malware-infected flash drives lying around, for example, preys on human curiosity. Since the "human" on the other end of the line or in a chat window with you is now almost certainly going to be some type of AI chatbot, this raises the question of whether the art of social engineering still works on synthetic targets.
Chatbot jailbreaks have been around for a while, and there are plenty of examples of how someone can convince a chatbot to violate its own rules of behavior, or otherwise do something completely inappropriate.
In principle, the existence and effectiveness of jailbreaks suggests that chatbots may indeed be vulnerable to social engineering. Chatbot developers have had to repeatedly tighten their models and put strict guardrails in place to keep them behaving properly, which in turn seems to spark another round of jailbreak attempts to see whether those guardrails can be exposed or circumvented.
Some examples have been posted by users on X, such as Dr. Paris Buttfield-Addison, who posted a screenshot apparently showing how a bank's chatbot could be convinced to change its name.
The idea that a bank chatbot could be persuaded to give up sensitive information, for example, is genuinely worrying. Then again, the first line of defense against this kind of abuse is simply not to give these chatbots access to such information in the first place. It remains to be seen how much responsibility we can hand over to such software without any human oversight.
On the other hand, these AI programs need access to information in order to function, so simply walling them off from data isn't a real solution. If an AI program is handling a hotel reservation, for example, it needs the guest's details to get the job done. The fear, then, is that a shrewd social engineer might convince the AI to leak who is staying in the hotel, and in which room.
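One way to square that circle is to scope the data the chatbot can reach to the task at hand. Below is a minimal sketch of that idea, not anything from the original article: the reservation bot never sees the whole guest database, only a lookup tied to the authenticated guest. All names here (`get_reservation`, `ReservationContext`, and so on) are hypothetical.

```python
# Sketch: least-privilege data access for a reservation-handling chatbot.
# The model is given a narrowly scoped tool instead of a database connection,
# so prompts like "who else is staying on my floor?" have nothing to work with.

from dataclasses import dataclass


@dataclass
class ReservationContext:
    """Only the fields the bot needs for this task."""
    guest_name: str
    room_number: str
    check_in: str
    check_out: str


# Pretend backing store; in reality this lives behind an API the model never sees.
_RESERVATIONS = {
    "guest-123": ReservationContext("A. Guest", "204", "2024-06-01", "2024-06-03"),
}


def get_reservation(authenticated_guest_id: str) -> ReservationContext:
    """Return only the record belonging to the guest who is actually logged in."""
    return _RESERVATIONS[authenticated_guest_id]


if __name__ == "__main__":
    ctx = get_reservation("guest-123")
    print(f"{ctx.guest_name} is booked into room {ctx.room_number}")
```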
Another possible solution might be a "buddy system", in which one AI chatbot monitors another and intervenes when it starts to go off the rails. Having an AI supervisor review each response before it is passed on to the user could be one way to mitigate these attacks.
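To make the buddy-system idea concrete, here is a minimal sketch under stated assumptions: both "models" are stand-in functions (`draft_reply` and `supervisor_approves` are hypothetical names), and the supervisor is reduced to a crude pattern check where a real deployment would prompt a second model to judge whether the draft discloses anything it shouldn't.

```python
# Sketch of a "buddy system": a supervisor reviews every draft reply
# before the user sees it.

import re


def draft_reply(user_message: str) -> str:
    """Stand-in for the primary chatbot generating a response."""
    return f"Sure! Guest records show room 204 is occupied. (asked: {user_message!r})"


def supervisor_approves(reply: str) -> bool:
    """Stand-in for the supervising model.

    Here it is just a regex check for leaked room assignments; a real
    supervisor would be a separately prompted model asked to flag
    responses that disclose protected information.
    """
    return not re.search(r"room \d+ is occupied", reply, re.IGNORECASE)


def respond(user_message: str) -> str:
    """Only pass the draft along if the supervisor signs off on it."""
    draft = draft_reply(user_message)
    if supervisor_approves(draft):
        return draft
    return "Sorry, I can't share that."


if __name__ == "__main__":
    print(respond("Which rooms are occupied tonight?"))
```

The design choice worth noting is that the supervisor sits outside the conversation, so a jailbreak that talks the primary bot into misbehaving still has to get its output past a reviewer that never saw the manipulative prompt.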