Lu Xiaohua: Multimodal Communication and Multidimensional Competition, a New Stage of Intelligent Communication

Mondo Technology Updated on 2024-02-01

Author: Lu Xiaohua (Dean and Chair Professor, School of New Media and Communication, Tianjin University; Head of the Research Base for International Governance of Cyberspace, Tianjin University; academic advisor of this journal).

Source: Young Journalists, Issue 17, 2023.

Introduction: Content generated directly by artificial intelligence in response to human prompts has entered circulation, pushing intelligent communication into a new stage in which multimodal communication has become an important communication mode.

Communication modes are changing in complex and diverse ways.

The continuous development of the basic technologies underlying communication has driven successive generations of media to replace one another, and different communication modes have emerged along the way: newspapers and magazines with print communication, radio and television with broadcast communication, the Internet with network communication, mobile platforms with mobile communication, recommendation algorithms with intelligent communication, and so on.

Examining the changes in communication modes during the digital transformation from another dimension, we find that other modes also exist in real-world communication. Multimodal communication is one of them: it embodies the special laws and underlying logic of the new communication, has a huge impact on actual communication activities, and has become an important constraint on, and driving force for, reach, influence, and guidance in the new stage of intelligent communication.

Intelligent content generation technology, which generates content directly from human prompts and releases it into circulation, has brought intelligent communication to a new stage. Content produced and distributed by humans with the assistance of artificial intelligence and content generated directly by artificial intelligence have begun to affect people together. The notion of multi-form content that journalism and communication studies previously used to describe combinations of text, images, and other forms, together with the communication laws derived from it, can hardly capture today's reality or the future. The concept of "multimodality" used in artificial intelligence should therefore be borrowed. On the one hand, multimodal communication should be used to understand the communication mode that embodies the disruptive innovation of the intelligent transformation, and to deepen our understanding of the laws of multimodal communication. On the other hand, the ability to use multimodal communication effectively should be improved on the basis of a deeper understanding of regulatory rules and collaborative governance patterns.

Multimodal communication is a disruptive and innovative communication mode adapted to the intelligent transformation. In the stage of intelligent communication, content generated directly by artificial intelligence has entered circulation. In response to human prompts, AI can generate textual content such as reports, plans, poems, and reviews; digital content such as **, web pages, and data analyses; and **, among other things. As generative AI technology develops, AI may in the future directly generate still other forms of content from human prompts or from other kinds of instructions, including some "content" that we do not regard as content today. More importantly, this "content" may also enter circulation and become an important force shaping how people acquire information, extract knowledge, express opinions, and form their views. Therefore, from the perspective of the combination of technology and content, and from the perspective of the evolving trends and laws of communication modes, multimodal communication has arrived and is becoming an important factor in determining communication power and influence.

To grasp the laws of communication and communicate more effectively, it is not enough to understand changes in the communication mode from the technical dimension alone, or to derive communication laws from the step-by-step progression of print, broadcast, and network communication. We must also recognize and grasp the changes in communication modes that have actually arisen because the underlying logic of communication has changed during the digital and intelligent transformations. Communicators need to look at the communication laws and competitive patterns of this era with fresh eyes and fresh thinking, and adjust their thinking and behavior accordingly.

Multimodal content and multimodal communication.

In the field of artificial intelligence, multimodality refers to the perception, recognition, processing, understanding, and collaborative reasoning of heterogeneous modal data such as text, images, and speech, so as to understand the external world more accurately.

The multimodal content in multimodal communication includes at least nine categories: first, converged content that combines text, graphics, voice, and emojis, in the sense used in converged communication; second, live broadcasts, including professional live broadcasts by radio and television institutions, new-style live broadcasts that offer information flow, a sense of ritual, and interactivity while fully integrating various new means of expression, and mobile-phone live streams with distinctive personality; third, new forms of content that can be heard, seen, interacted with, felt, experienced, and shared; fourth, data visualization content based on data mining; fifth, data set and data flow content built on first-class database platforms, such as the financial information service systems of Reuters and Bloomberg; sixth, data products that embody new concepts, processing methods, and expressive logic, such as Johns Hopkins University's "Global Epidemic Map", which links epidemic data to location, captures and updates data automatically, and reviews and corrects it after release; seventh, interactive content that is used and disseminated through technical means capable of delivering immersive experiences, such as VR; eighth, text, images, audio, virtual scenes, and other content produced with generative synthesis algorithms based on deep learning, virtual reality, and similar technologies; ninth, content generated directly by intelligent content generation tools and applications based on multimodal large models.

Multimodal communication refers to the dissemination, with the help of recommendation algorithms, of both content produced with the assistance of digital technologies and applications (including artificial intelligence) and content generated by generative artificial intelligence. Compared with intelligent communication, multimodal communication attends not only to the characteristics, rules, and methods of communication based on data analysis and algorithmic recommendation, but even more to the communication characteristics, rules, methods, and modes of multimodal content generated by deep synthesis technology and intelligent content generation technology.

From converged mobile communication to multimodal communication.

Exploring the use of artificial intelligence in all aspects of news communication is a change brought about by the rapid development of new artificial intelligence technologies and applications.

Ever since news began to be sent by SMS, turning SMS into the fifth**, mobile demand has become the primary demand, and mobility has become the main feature of communication. As digital technology was applied to content collection, production, and distribution, communication shifted from abstract communication based on textual expression to more concrete communication with the help of ** and charts. Changes brought by digital technology, such as the expectation that "a picture is proof", have accelerated the combination and even fusion of text, images, and other content forms and modes of expression. These changes are fully reflected in mobile communication, giving rise to converged mobile communication, a form that integrates multiple kinds of content and meets the requirements of mobile communication.

Driven by digital technology, the form of communication has evolved into converged mobile communication: it is both mobile and converged, and the two have been combined with a new logic at a new level. The communication activities of every communicator therefore face a new competitive environment, with new connotations and a new competitive logic. This manifests itself in the following five aspects.

First, technical conditions and social development have changed the living environment and competitive logic of professional communicators.

At present, some ** still operate in the traditional way because they have not seen the change in the logic of competition. That technology changes life is the basic social reality we face: today everyone gets information through mobile phones, and people's psychological makeup and ways of judging information have changed fundamentally. The mobile Internet has penetrated deeply into every level of society, and to ignore this is to ignore the basic external environment we face.

Second, group behavior and audience demand have changed the communication path and influence logic.

The relationship between the quality of content products and the motivation of the audience is well worth studying. If we look at the dissemination of movies and TV series with fresh eyes, we find that although the works themselves are the entry point to content, the ways in which information is transmitted and the communication paths that determine whether people watch them have changed fundamentally. "The Wandering Earth" bucked the trend set by its initial screening schedule because audiences learned about it on their mobile devices. Mobile awareness, large screen**, social identity, and empathy as a driver: these four points determine the basic communication path of today's film and television dramas. Empathy brings in new audiences. Yet the promotion of some film and television dramas is generally still stuck at arousing interest by creating topics and releasing a few spoilers. If the basic communication framework and logic of film and television dramas do not change, it will be difficult to win a large audience. This is one reason why the ratings and box office of many good-quality TV series and movies fall short of expectations.

Third, image communication is changing to multi-information dimensional image communication.

When we discuss **, it is customary to divide it into long****, medium****, and short**. Under the conditions of converged mobile communication, if one's vision and way of thinking are confined to the length of the image and ignore narrative logic and multiple information dimensions, one cannot adapt to the technical conditions and audience needs of the digital transformation. Today, the composition of the content produced by TV institutions and production companies is changing: it pays more attention not only to communication value but also to providing content that conveys knowledge and methods. This is how communicators seek new competitive advantages amid fierce competition. At present, content production should pay more attention to finding value points and entry points, exploring the points of resonance with the audience's needs, and ultimately finding the points of resonance with the audience's values. Only by grasping these four points in handling the narrative logic of the image itself can content be welcomed by the audience.

Fourth, content imagination and multiple interactions expand the connotation and creative logic of professional communication.

If the dissemination of film and television dramas is regarded as a kind of professional communication, its connotation and creative logic have changed in important ways. Taking science communication as an example, today's audience wants to see not only existing scientific knowledge visualized and popularized, but also content that stimulates imagination and curiosity and satisfies people's need to explore the unknown and the future. Today, science communication must be underpinned by humanistic values and must have both warmth and rigor. Works and programs about science, technology, and science fiction have therefore become more popular. This analytical framework can be applied to other areas as well.

Fifth, the pursuit of promoting development and benefiting mankind has expanded the performance space and narrative logic of content products.

China has increasingly moved toward the center of the world stage, and Chinese audiences have a broader perspective than ever before. Content production should therefore give more consideration to the requirements of both international and domestic communication: not only to express oneself, but also to show a willingness to exchange ideas, learn from one another, share knowledge, and deepen mutual understanding. The success of "Nezha" lies in rediscovering the cultural symbols of the Chinese nation from a global perspective, and "Bing Dwen Dwen", the mascot of the Beijing Winter Olympics, likewise recreates the panda, a classic Chinese element, with a new vision. These successful cases show how to handle the relationship between globalization and national characteristics today, and how to handle the relationship between local content and national, wide-area, and global communication. Being good at telling not only Chinese stories but also stories of mutual learning between civilizations is one of the important entry points for telling Chinese stories to the world.

More importantly, it is necessary to move from converged mobile communication to multimodal communication, and to the combination of new forms of content with digital technology, so as to adapt better to the changing underlying logic of the new communication. This requires accurately grasping the new trends in digital technology development, with stronger scientific and technological thinking and a firmer command of digital technology routes, and especially a clear understanding of the available technologies and their impacts. Only then can communicators move from producing multimodal content to carrying out multimodal communication that meets the requirements of the new stage of intelligent communication.

Multi-constraints: development and application of intelligent content generation.

Given the huge impact of generative AI, more entities are investing resources to join the competition. At the same time, rule-making and collaborative supervision are markedly more intense than when earlier new technologies appeared. Several cases are worth pondering.

One of the latest cases is the opening of patents. On August 11, 2023, Alibaba DAMO Academy announced that it would open 100 AI patent licenses to the public free of charge, covering many AI technology fields such as image technology, ** technology, and 3D vision, and including patents with broad application scenarios such as "traffic light perception", "suspected infringement ** detection", "time series data**", "point cloud data processing", and "intelligent subtitle generation". This is the largest opening of artificial intelligence patents in China to date.

There are many precedents in the history of the Internet for building an ecosystem through a free and open strategy in order to enhance competitiveness. In the competition among mobile operating systems, Android initially took on Apple's iOS through free licensing. Originally developed by Andy Rubin to support mobile phones, the Android operating system was acquired by Google in August 2005. In November 2007, Google and 84 hardware manufacturers, software developers and telecom operators formed the Open Handset Alliance to jointly develop and improve the Android system. Google subsequently released the Android source code under the Apache open source license. In October 2008, the first Android smartphone was released, and Android has since gradually expanded to tablets, TVs, digital cameras, game consoles, smart watches, and other devices. In the first quarter of 2011, Android's global market share surpassed that of the Symbian system used in Nokia mobile phones for the first time, making it the world's leading platform. In the fourth quarter of 2013, the global market share of Android phones reached 78.1%. On October 29, 2018, Google officially began charging licensing fees for Android in the EU region.

Objectively, the Android system attracted a large number of developers by virtue of its openness, which in turn strengthened its ecosystem. Since 2023, many large model developers have opened their APIs to attract other developers to build applications around their models. The free opening of 100 AI patent licenses by Alibaba DAMO Academy will not only introduce new factors and conditions into the development of, and competition among, large models, but also provide new opportunities for applying intelligent content generation technology in news, communication, and publishing.

Another case: on June 28, 2023, two writers filed a copyright class action lawsuit against OpenAI in the U.S. District Court for the Northern District of California, accusing OpenAI of using the plaintiffs' copyrighted books to train ChatGPT for commercial gain. The case has been called "the first representative ChatGPT copyright infringement lawsuit". According to the complaint, the plaintiffs never authorized OpenAI to use their copyrighted books for model training, yet ChatGPT was able to generate quite accurate summaries of them when prompted (although with a few errors). From the available facts and information, the plaintiffs inferred that the only plausible explanation for ChatGPT's ability to summarize a specific book accurately is that OpenAI obtained and copied the book in question and used it to train its large language model (GPT-3.5 or GPT-4). [1]

There are four points in this case that deserve special attention from large model researchers, industry users, and regulators.

First, the plaintiffs in this case found a method for demonstrating infringement by a large model, which others, including in China, may later use as a reference.

In this case, the plaintiffs sought to show that their copyrighted books had been used to train the large model behind ChatGPT by asking ChatGPT to output summaries of the books whose copyright they own. "While much of the content is used to train large language models, books have been the core corpus of training datasets because they provide the best examples of high-quality long-form writing." [2] In "Improving Language Understanding by Generative Pre-Training", published in June 2018, OpenAI disclosed that the training of GPT-1 relied on the "BookCorpus" dataset, which contains about 7,000 books covering fields such as adventure, fantasy, and romance. Objectively, books are an important part of the datasets used for large model training. According to statistics, more than 90 large models had been released in China by the end of June 2023. Do the datasets used to train these models contain unauthorized books? Are there potential legal risks? How should these constraints on the development of large models be viewed and handled?

Second, the plaintiffs in this case cited the company's own enterprise ** to prove infringement, which will discourage disclosure of the technical details of large models.

Judging from reports on this case, the plaintiffs' complaint cited the relevant enterprise ** to prove infringement, such as the above-mentioned "Improving Language Understanding by Generative Pre-Training". By searching the information that OpenAI had voluntarily made public (enterprise **), the plaintiffs hope to demonstrate that the training of the GPT series of models rests on the unauthorized use of massive amounts of book content. For example, in "Language Models are Few-Shot Learners", published in July 2020, OpenAI disclosed that 15% of the content in the GPT-3 training dataset came from two e-book corpora named "Books1" and "Books2". According to OpenAI's disclosure, Books1 is 9 times the size of BookCorpus (about 63,000 books) and Books2 is 42 times its size (about 294,000 books). From this, the plaintiffs inferred that "'Books1' is most likely 'Project Gutenberg' or the 'Standardized Project Gutenberg Corpus'" and that "'Books2' is very likely a 'shadow library' on the Internet". [3] In fact, when OpenAI released the GPT-4 enterprise ** in March 2023, it stated that, in view of the competitive landscape and product safety considerations, it would no longer disclose the structure and content of the training dataset.
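The corpus sizes quoted in the complaint follow from simple multiplication, which the short sketch below reproduces. It assumes only the figures given in the text (a BookCorpus of about 7,000 books and multipliers of 9 and 42); the function and variable names are illustrative and do not come from the complaint or from OpenAI.

```python
# Figures taken from the text: BookCorpus holds about 7,000 books, and
# Books1 and Books2 are described as 9 and 42 times its size.
BOOKCORPUS_BOOKS = 7_000
MULTIPLIERS = {"Books1": 9, "Books2": 42}


def estimate_books(base: int, multipliers: dict[str, int]) -> dict[str, int]:
    """Scale the base corpus size by each multiplier (illustrative helper)."""
    return {name: base * factor for name, factor in multipliers.items()}


estimates = estimate_books(BOOKCORPUS_BOOKS, MULTIPLIERS)
for name, count in estimates.items():
    print(f"{name}: about {count:,} books")
# Books1: about 63,000 books
# Books2: about 294,000 books
```

These are rough orders of magnitude rather than exact counts, since the size of BookCorpus is itself an approximation.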

Third, the plaintiffs' method of having the large model "prove the case against itself" offers inspiration and a point of reference.

To prove that OpenAI was infringing, the plaintiffs in this case conversed with ChatGPT and asked it to "introduce itself", then submitted the answers as evidence. The content includes the ways in which services are provided and**: for example, the web version costs $20 per month, while application developers exchange data with ChatGPT through the API and are billed on a pay-as-you-go basis. This not only shows people how a large model is used, but also reminds large model developers, industry application developers, and service providers to pay more attention to the "self-introduction" content a model generates, to have it generate self-introductions along multiple dimensions during training, and to evaluate their accuracy and the risks that may follow from them.

Fourth, the provision of services by large models through the Internet requires the study of international artificial intelligence regulatory rules.

On June 14, 2023, the European Parliament passed its draft of the Artificial Intelligence Act by an overwhelming majority, with 499 votes in favor, 28 against, and 93 abstentions. This means that the European Parliament, the EU member states, and the European Commission are about to enter "tripartite negotiations" to determine the final terms of the bill. The European Commission proposed the first draft law to regulate AI in 2021, and the European Parliament and the Council of the European Union have since revised and discussed it several times. After generative AI based on large models was launched, corresponding provisions were added to the draft.

It is worth noting, first, that Article 2 of the draft clearly stipulates the law's scope of application: it applies to entities that place AI systems on the market or put them into service in the EU (regardless of whether the entity is located in the EU or in a third country), to entities that use AI systems in the EU, and to entities that use AI systems in third countries where the system's output is used in the EU or affects persons in the EU. This effectively establishes the principle of extraterritorial jurisdiction for the law.
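The scope rule described above can be read as a simple "any one of three conditions" test. The sketch below illustrates that reading only; it is a simplification of the article's summary of Article 2, not a rendering of the legal text, and the `Deployment` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of how an AI system reaches its users."""
    placed_on_eu_market: bool  # placed on the market or put into service in the EU
    used_in_eu: bool           # the entity using the system is in the EU
    output_used_in_eu: bool    # used in a third country, but output is used in or affects the EU


def in_scope(d: Deployment) -> bool:
    """Any single condition suffices; where the provider is based plays no role."""
    return d.placed_on_eu_market or d.used_in_eu or d.output_used_in_eu


# A model developed and operated outside the EU whose output reaches EU users.
overseas_model = Deployment(
    placed_on_eu_market=False, used_in_eu=False, output_used_in_eu=True
)
print(in_scope(overseas_model))  # True
```

The example shows why the third condition matters for developers outside the EU: the rule looks at where the output lands, not at where the system runs.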

Second, the draft divides AI risks into four levels (unacceptable risk, high risk, limited risk, and minimal risk), each with its own regulatory requirements. It proposes a strict ban on AI systems that pose unacceptable risks to human safety, including systems that deploy subliminal or purposefully manipulative techniques, exploit people's vulnerabilities, or are used for social scoring.

Third, the bill requires each EU member state to establish a supervisory authority. [4]

Fourth, the bill added new provisions in response to the launch of ChatGPT. These raise the transparency and risk assessment requirements for large models and require disclosure of whether copyrighted materials were used to train them. [5] This law deserves the attention and study of large model researchers, industry application researchers, and service providers, including those in China. After all, users abroad may also use multimodal large models developed in China.

These two cases show that generative AI has entered a development environment with multiple constraints, and is entering or has already entered a stage of multi-dimensional competition. How to adapt to agile and collaborative supervision in the field of artificial intelligence, and how to detect and prevent possible copyright and infringement risks early, are major issues that service providers and users of intelligent content generation technology must face.

Multi-dimensional competition: the basic characteristics of the new stage of intelligent communication.

Viewed calmly, multi-dimensional competition is the basic feature of the new stage of intelligent communication.

First, from the perspective of how content influence is formed, the different kinds of multimodal content compete with one another in multimodal communication. Among the nine categories of multimodal content, converged content combining text, charts, voice, emojis, and so on is the focus of deep-integration transformation, and its quantity, quality, and variety keep improving. At the same time, data visualization content, data products, new forms of content, and interactive content are becoming more and more influential.

Second, content generated directly by artificial intelligence and content produced by humans are competing for influence. This is reflected in the various effects that the relevance, knowledgeability, and experience of AI-generated content have had since the launch of ChatGPT.

Third, there is competition among the different kinds of AI-generated multimodal content. In the early days, people paid attention to whether AI could compose poems, write plans, change **, and so on. Later, attention shifted to whether it could generate ** in a variety of styles according to prompts. Such content is not only usable and capable of directly replacing human work, but also has peculiar characteristics that produce unexpected results.

Fourth, competition among providers of intelligent content generation is taking shape. It includes both attracting users through targeted development for specific industries and building ecosystems around open APIs.

Fifth, there is competition arising from the diverse uses of intelligent content generation. In the early days after the launch of ChatGPT and Wenxin Yiyan, people paid attention to whether the content they generated from user prompts was accurate, professional, and relevant. At this stage, more imaginative users value the uncertainty of the content that large models generate, and use various combinations of prompts to explore and amplify that uncertainty in order to stimulate creativity and expand the value of applications.

Therefore, in the new stage of intelligent communication, a communicator's level of understanding of multimodal content, imagination in generating and using intelligent content, and ability to draw on methods eclectically determine, in a sense, its competitiveness in the age of intelligence.

This paper is an interim result of the National Social Science ** Major Project "Research on the Theory, Methods and Practice of Digital Journalism" (Grant No.: 20 ZD317).

References:

[1][2] kaysen. ChatGPT copyright case No. 1: OpenAI faces six charges after being "caught" outputting book summaries [EB/OL]. (2023-08-06).

[3] kaysen. ChatGPT copyright case No. 1: OpenAI faces six charges after being "caught" outputting book summaries [EB/OL]. Project Gutenberg is a collection of e-books whose copyright protection has expired, and it is often used for AI model training. In September 2020, Project Gutenberg announced that it had included more than 60,000 books. In 2018, an AI research team built on it to create the Standardized Project Gutenberg Corpus. The term "shadow library" was coined in 2011 by the Social Science Research Council of the United States in a report entitled "Media Piracy in Emerging Economies" to refer to large collections of books that are gathered in violation of copyright and made freely available to the public.

[4] Wang Wei. Seizing regulatory opportunities: the European Union's Artificial Intelligence Act enters the final negotiation stage. Rule of Law**, 2023-07-03(6).

[5] The European Parliament approved the Artificial Intelligence Act, requiring the disclosure of the copyright of generative AI training data [EB/OL]. The Paper. (2023-06-15).

This article refers to the citation format:

Lu Xiaohua. Multimodal Communication and Multidimensional Competition: A New Stage of Intelligent Communication [J]. Young Journalists, 2023(17): 60-63.
