What is shared today is [2023 China Unicom Artificial Intelligence Privacy Protection*** Report Producer: China Unicom.
Featured Reports** Public Title: A global repository of industry reports
1. Generative AI privacy risks
The endogenous privacy risk of generative AI is mainly the risk of data leakage caused by the process of using generative AI models. On the one hand, when users interact with generative AI models such as ChatGPT, they sometimes input prompt commands containing private data, and these instructions are recorded and stored indiscriminately. Due to the lack of access restrictions to the respective data, there is a risk that the user privacy contained in these directives will be compromised. On the other hand, generative AI models generate new data through the Xi of massive training data, and the current generative AI models represented by ChatGPT are basically recombinant innovations, and when performing forward inference, the model has the risk of transforming and splicing the privacy data contained in the training data to generate output, exposing it to irrelevant users.
2. Artificial intelligence privacy protection and control technology
Permission management restricts users' access to authorized resources based on preset rules or policies to protect system security and data integrity. Access control is a means of ensuring that the resources of a data processing system can only be accessed by authorized entities in an authorized manner. The privacy and security of AI data and models can be ensured by implementing access and use control mechanisms for access and use of AI systems, so that only authorized personnel can access and use specific data.
3. Artificial intelligence privacy protection data encryption technology
Homomorphic encryption is a form of encryption that allows users to perform specific algebraic operations directly on ciphertext, and the resulting data is still the result of encryption, and the result is the same as performing the same operation on the plaintext and then encrypting the result. Homomorphic encryption technology was first used to encrypt statistical data, and the homomorphism of the algorithm ensures that users can operate on sensitive data without leaking data information. Homomorphic encryption can be further divided into partial homomorphic encryption, slightly homomorphic encryption, and fully homomorphic encryption. Among them, partial homomorphic encryption technology only supports partial computation of ciphertext, slightly homomorphic encryption represented by BGN algorithm supports a finite number of calculations, and fully homomorphic encryption can perform arbitrary homomorphic operations on ciphertext for an unlimited number of times. In the field of machine Xi, in order to achieve the confidentiality of user data, it is necessary to combine encryption technology to protect data. However, the computational complexity of traditional cryptography methods is very large, while fully homomorphic encryption has obvious advantages in computational cost because it allows arbitrary operations to be performed on encrypted data without decryption. The machine-based Xi privacy protection scheme based on homomorphic encryption is divided into the homomorphic encryption privacy protection scheme without polynomial approximation and the homomorphic encryption privacy protection scheme based on polynomial approximation.
4. Artificial intelligence privacy protection attack defense technology
Recently, with the rapid rise and application of large language models, researchers have proposed a prompt attack defense method and a generated content detection and filtering defense method to prevent the threat of prompt attack and the privacy leakage of generated content by large models. For prompt injection attack defense, a simple and straightforward prompt injection attack defense strategy is to add the defense strategy to the instruction to increase the robustness of the instruction to enforce the desired behavior. Commonly used techniques include adjusting the position of the cue, marking with special symbols, etc. At the same time, the investigators proposed to construct a prompt detector to detect, classify, or filter prompts to prevent sensitive and harmful prompt input. At present, OpenAI's ChatGPT, Microsoft's NewBing, etc., have adopted this defense strategy. For generative content filtering defenses, the goal is to identify and avoid outputting private content. The methods of generating content detection mainly include the method of building a rule collection and the method of building a moderation model. By using these methods, the output content is detected and identified first, and then the privacy content is masked and filtered based on the detection and identification results, so as to avoid the generation of sensitive and risky content.
5. Emerging technologies for AI privacy protection
Fed Xi erated learning, proposed by Google in 2016, is a distributed machine learning Xi method in which multiple participants interact with model parameter information or gradient information through security mechanisms without interacting with data, so as to achieve collaborative training effects. Compared with the traditional centralized storage and training model, the federated academic Xi has the characteristics of "decentralization", which can achieve the balance between data privacy protection and data sharing analysis, and realize "data is available and invisible". The commonly used federated Xi technologies can be divided into three categories according to the data collection dimension, including horizontal federated Xi, vertical federated Xi and federated migration Xi, among which the representative algorithms and architectures are the Fed**G algorithm and WeBank's FATE architecture. At present, the federated Xi Xi technology is widely used by technology companies such as Facebook, Amazon, and Apple, and domestic fintech companies and universities are also making efforts to data privacy and security technology. In the field of communication, the federated academic Xi and the data of various network devices can be used to jointly train the model to optimize network site planning, and in addition, it can also promote cross-domain ecological cooperation centered on communication operators.
This article is for informational purposes only and does not represent any investment advice from us. To use the information, please refer to the original report. )
Featured Reports** Public Title: A global repository of industry reports