Today in AI: OpenAI sued by a group of writers again, and Meta releases AI translation models with 2-second latency

Mondo Culture Updated on 2024-01-30

OpenAI recently announced that ChatGPT's voice feature is now available to all free users. This means that, without paying anything, you can hold a natural voice conversation with ChatGPT that feels much like talking to a real person.

When the app updates on the Google Play Store, the release notes read: "Now you can interact naturally with ChatGPT with your voice, whether it's small talk on the go, reading a bedtime story to your family, or settling a debate over dinner, ChatGPT has you covered." Just open the ChatGPT app and tap the headset icon to start a conversation. Apple's App Store has not carried a corresponding announcement yet, but the feature is already available in the iOS app.

On December 21, Yidu Technology officially released its independently developed model for the medical vertical, the first professional large language model in China aimed at multiple scenarios in that field. It provides professional, medical-grade personalized services to consumers and helps business users improve quality and efficiency across clinical care, teaching, research, and administration.

At present, the Yidu Technology model's evaluation performance exceeds GPT-3.5 on several well-defined medical task scenarios, such as triage guidance, basic medicine, and general medicine, and it has already been deployed in a number of leading hospitals. On the same day, Yidu Technology and Huawei signed an agreement to deepen their cooperation, jointly launching smart medical solutions to promote the intelligent transformation of the healthcare industry.

On December 20, local time, 11 American nonfiction writers, including Pulitzer Prize winners Taylor Branch and Stacy Schiff, sued OpenAI and Microsoft in Manhattan federal court in New York, accusing them of misusing their works to train ChatGPT.

The writers told the court that OpenAI had infringed their copyrights by copying their works from the internet in bulk, without permission, and including them in ChatGPT's training data. They also argued that because Microsoft was "deeply involved" in training and developing the AI models, it should be liable for the infringement as well. The writers are seeking unspecified damages and asking the court to order the companies to stop the infringing conduct.

Meta recently released a family of AI translation models that achieve real-time speech translation with latency of no more than 2 seconds, support translation across many languages, and can preserve speaker characteristics such as tone, speaking rate, and emotion. The family, called Seamless Communication, includes SeamlessExpressive, SeamlessStreaming, SeamlessM4T v2, and Seamless; the first three have been open-sourced on GitHub.
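For readers who want to try the released models, here is a minimal sketch of text-to-speech translation with SeamlessM4T v2 through the Hugging Face transformers integration; the model ID facebook/seamless-m4t-v2-large and the SeamlessM4Tv2Model class come from that integration (transformers >= 4.37), not from Meta's announcement itself.

```python
# Minimal sketch: translate English text into spoken French with
# SeamlessM4T v2 via Hugging Face transformers (assumes transformers >= 4.37,
# which added SeamlessM4Tv2Model).
import torch
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Prepare an English input sentence.
inputs = processor(text="The meeting starts in ten minutes.",
                   src_lang="eng", return_tensors="pt")

# Generate French speech; the first element returned is the waveform tensor.
with torch.no_grad():
    audio = model.generate(**inputs, tgt_lang="fra")[0]

waveform = audio.cpu().numpy().squeeze()  # 16 kHz mono audio
```

Per the transformers documentation, passing generate_speech=False to the same generate call returns translated text instead of audio, which is how one model covers text-to-text and speech-to-text tasks as well.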

To ensure translation accuracy and prevent abuse, Meta applies toxicity-mitigation techniques: "toxic content" is filtered out of the training data, and toxic words are automatically detected and adjusted while translations are generated. To further combat misuse, Meta also watermarks the generated audio, embedding an imperceptible signal in the waveform that allows the audio to be traced accurately and resists a variety of attack vectors.
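Meta's announcement does not spell out how its watermark works, but the general idea of embedding an imperceptible, key-dependent signal and later detecting it by correlation can be illustrated with a toy sketch (hypothetical code, not Meta's method):

```python
# Toy spread-spectrum-style audio watermark (illustrative only; NOT Meta's
# scheme): add a key-seeded pseudo-random pattern at inaudible amplitude,
# then detect it by correlating the audio against that same pattern.
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 2e-3) -> np.ndarray:
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape[0])
    return audio + strength * pattern

def detection_score(audio: np.ndarray, key: int) -> float:
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape[0])
    n = audio.shape[0]
    corr = float(audio @ pattern) / n             # ~= strength if marked
    noise = float(np.std(audio)) / np.sqrt(n)     # expected scale if unmarked
    return corr / noise                           # z-like score; >> 1 => marked

sr = 16_000
clean = 0.1 * np.sin(2 * np.pi * 440.0 * np.arange(5 * sr) / sr)  # 5 s tone
marked = embed_watermark(clean, key=42)
print(detection_score(marked, key=42))  # large positive score: watermark found
print(detection_score(clean, key=42))   # near zero: no watermark
```

A production watermark must additionally survive compression, resampling, and deliberate removal attempts, which is what the claim about combating "various attack vectors" refers to.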

According to the Tianyancha app, Beijing Chehejia Information Technology, an affiliate of Li Auto, has recently applied once again to register the "Li-AI" trademark in the scientific-instruments category; the application is currently awaiting substantive examination. Reportedly, the company applied for "Li-AI" trademarks covering scientific instruments and services last September, and both of those earlier applications were recently rejected.

Recent evaluations of Gemini-Pro reportedly show significant progress in the multimodal space, matching GPT-4V and in some respects outperforming it. First, on the comprehensive multimodal benchmark MME, Gemini-Pro achieved a high score of 1933.4, surpassing GPT-4V and demonstrating an all-around advantage in both perception and cognition.

Second, across 37 visual-understanding tasks, Gemini-Pro excelled at tasks such as text translation, color and landmark recognition, and OCR, while GPT-4V scored 0 on the celebrity-recognition task. In advanced cognition, challenging visual tasks, and various expert capabilities, Gemini-Pro demonstrated strong visual perception and understanding, though it performed poorly on position-recognition tasks.
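For context on the 1933.4 figure: in MME, each image carries two yes/no questions, and a subtask's score is plain accuracy plus a stricter "accuracy+" that credits an image only when both of its questions are answered correctly; the headline number is the total over all subtasks. A minimal sketch of that aggregation, on invented data and following the MME paper's protocol as I understand it:

```python
# Sketch of MME-style scoring: two yes/no questions per image.
# subtask score = accuracy(%) + accuracy+(%), max 200 per subtask;
# the headline figure sums subtask scores across perception and cognition.
def mme_subtask_score(per_image):
    """per_image: list of (q1_correct, q2_correct) booleans."""
    n = len(per_image)
    correct = sum(int(a) + int(b) for a, b in per_image)  # out of 2n answers
    both = sum(1 for a, b in per_image if a and b)        # both right per image
    return 100.0 * correct / (2 * n) + 100.0 * both / n

# Hypothetical results for two subtasks, three images each:
results = {
    "OCR":   [(True, True), (True, False), (True, True)],
    "color": [(True, True), (True, True), (False, False)],
}
total = sum(mme_subtask_score(r) for r in results.values())
print(total)  # 283.33... for this toy data
```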

On December 20, UnionTech Software officially released UnionTech UOS AI V1.1 and signed "Lighthouse Project" agreements with a number of large-model partners. According to the official introduction, UOS AI V1.0 delivered unified management of large models, completed adaptation for five mainstream large models, and successfully connected to a local model. In addition, applications such as the browser, global search, email, and writing tools are fully connected to UOS AI, bringing an intelligent upgrade to the application experience.

UOS AI V1.1 brings a newly upgraded desktop intelligent assistant that supports natural-language interaction across more than 40 scenarios, such as opening apps, changing system settings, and creating schedules, and also supports knowledge Q&A and content creation. At the same time, UOS AI V1.1 supports both cloud-side and device-side model access: the cloud side connects to mainstream large models at home and abroad, including Qianfan, Xunfei Xinghuo, Zhipu, and 360 Zhinao, while the device side connects to local models for text-to-image, speech, natural-language search, processing, and classification.

On December 21, KLCII announced the release of Emu2, a multimodal large model with 37 billion parameters.

According to reports, Emu2 substantially surpasses mainstream multimodal pre-trained large models such as Flamingo-80B and IDEFICS-80B on few-shot multimodal understanding tasks, and achieves the best performance on a number of few-shot comprehension, visual question answering, and subject-driven image generation benchmarks, including VQAv2, OKVQA, MSVD, MM-Vet, and TouchStone.

Emu2 exhibits strong multimodal in-context learning capabilities and can even solve tasks that require on-the-fly reasoning, such as visual prompting and object-grounded generation. Emu2-Chat, fine-tuned from Emu2, can accurately understand interleaved image-and-text instructions, enabling better information perception, intent understanding, and decision planning. Emu2-Gen can accept sequences of images, text, and interleaved locations as input, enabling flexible, controllable, high-quality image and video generation. The research team also stated that Emu2 can serve as a base model and a common interface for a wide variety of multimodal tasks.
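As an illustration of the interleaved image-text interface described above, here is a sketch of few-shot prompting with the released Emu2-Chat checkpoint. The "[<IMG_PLH>]" placeholder and the build_input_ids helper follow my reading of the BAAI/Emu2-Chat model card's custom code and should be treated as assumptions to verify against the card before use.

```python
# Sketch: few-shot interleaved image-text prompting with Emu2-Chat.
# Placeholder token and build_input_ids are assumptions based on the
# BAAI/Emu2-Chat model card (trust_remote_code custom code) -- verify there.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/Emu2-Chat")
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Emu2-Chat", torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True, trust_remote_code=True).to("cuda").eval()

# Two in-context examples, then a query image; each "[<IMG_PLH>]" is
# replaced by the embedding of the corresponding image, in order.
prompt = "[<IMG_PLH>]a red apple. [<IMG_PLH>]a yellow banana. [<IMG_PLH>]"
images = [Image.open(p).convert("RGB")
          for p in ("apple.jpg", "banana.jpg", "query.jpg")]  # local files

inputs = model.build_input_ids(text=[prompt], tokenizer=tokenizer, image=images)
with torch.no_grad():
    out = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        image=inputs["image"].to(torch.bfloat16),
        max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```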
