Meta Releases New AI Models for Text, Image, and Music Generation

Meta’s FAIR team releases Chameleon model for processing text and images together.
New multi-token prediction method improves LLM efficiency.
JASCO model allows controlled AI music generation with text and other inputs.
AudioSeal detects AI-generated speech faster and more accurately.
Meta introduces tools to enhance diversity in text-to-image generation.

Meta’s Fundamental AI Research (FAIR) team has spent over a decade conducting open AI research. As the field advances rapidly, collaboration with the global AI community becomes ever more vital to Meta.

Today, Meta is sharing some of its latest FAIR research models with the global community. By sharing this work openly, it aims to spark further iterations on the research and to advance AI responsibly.

Meta’s Chameleon Model Can Process and Generate Both Text and Images

Meta is releasing key components of the Chameleon models under an academic research license. Chameleon is a family of mixed-modal models that can understand and generate both images and text. Just as humans can process words and images simultaneously, Chameleon can process and deliver both image and text at the same time. While most large language models produce unimodal results (turning text into images, for instance), Chameleon can take any combination of text and images as input and produce any combination as output. This opens up many possibilities, from generating creative captions for images to using a mix of text prompts and images to create an entirely new scene.

Multi-Token Prediction Helps AI Models Predict Words Accurately

Trained on large volumes of text, large language models (LLMs) have proved valuable in helping people generate creative text, brainstorm ideas, and answer questions. They are trained on a single objective, predicting the next word, which makes the approach simple but inefficient: children need far less text to reach language fluency.

Meta recently proposed a method for building better and faster LLMs: multi-token prediction. With this approach, language models are trained to predict several future words at once instead of one word at a time. In the spirit of responsible open science, Meta is releasing the pretrained models for code completion under a noncommercial research license.
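
The difference between the two training objectives can be illustrated with a minimal sketch. This is not Meta’s training code: the function name `multi_token_targets` and the parameter `n_future` are hypothetical, and whole words stand in for tokens to keep the example readable. It only shows how each context is paired with one target under next-word prediction and with several targets under multi-token prediction.

```python
from typing import List, Tuple


def multi_token_targets(
    tokens: List[str], n_future: int
) -> List[Tuple[List[str], List[str]]]:
    """Pair each context prefix with the next `n_future` tokens to predict.

    Hypothetical helper for illustration only; real models operate on
    token IDs and compute a loss for each predicted position.
    """
    examples = []
    # Stop early enough that every context has a full window of targets.
    for i in range(1, len(tokens) - n_future + 1):
        context = tokens[:i]
        targets = tokens[i:i + n_future]
        examples.append((context, targets))
    return examples


tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Classic objective: one target word per context.
next_token = multi_token_targets(tokens, n_future=1)

# Multi-token objective: three target words per context.
multi_token = multi_token_targets(tokens, n_future=3)

for context, targets in multi_token:
    print(context, "->", targets)
```

With `n_future=1` the helper reproduces ordinary next-word prediction, so the same sentence yields one training signal per position; with a larger value, each position supplies several signals at once.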

JASCO Offers More Control Over AI Music Generation

Generative AI has enabled people to explore their creativity in new ways, such as turning a text prompt into a clip of music. While existing text-to-music models such as MusicGen rely mainly on text input, Meta’s new model, JASCO, can also accept other inputs, such as chords or beats, giving users more control over the generated music.

This makes it possible to incorporate both symbols and audio in a single text-to-music generation model.

Results indicate that JASCO is comparable to the evaluated baselines in generation quality while offering more versatile control over the generated music.

AudioSeal Helps Spot AI-Generated Speech

Meta has also introduced AudioSeal, which it describes as the first audio watermarking technique designed specifically for the localized detection of AI-generated speech. AudioSeal makes it possible to pinpoint individual AI-generated segments within a longer audio snippet.

AudioSeal departs from conventional methods with its localized detection approach, which enables faster and more efficient detection. Meta reports detection speeds up to 485 times faster than previous methods, making AudioSeal suitable for large-scale and real-time applications.
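
To make the idea of localized detection concrete, here is a minimal sketch of how per-frame detection scores could be turned into flagged segments. It does not use AudioSeal’s actual API: the scores are made-up values, and the function name `flagged_segments` and the 0.5 threshold are assumptions chosen for illustration.

```python
from typing import List, Tuple


def flagged_segments(
    scores: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, scoring above threshold.

    Hypothetical helper: `scores` stands in for a detector's per-frame
    probability that a watermark is present.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i  # a flagged run begins here
        elif score <= threshold and start is not None:
            segments.append((start, i))  # the run ended on the previous frame
            start = None
    if start is not None:
        # The clip ended while a run was still open.
        segments.append((start, len(scores)))
    return segments


# Made-up per-frame scores for a short clip.
scores = [0.1, 0.2, 0.9, 0.95, 0.8, 0.1, 0.05, 0.7, 0.9]
segments = flagged_segments(scores)
print(segments)
```

Because each frame is scored independently, only the flagged ranges need further attention, rather than labeling the whole clip as AI-generated or not.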

AudioSeal will be made available under a commercial license and represents one line of research conducted by Meta to prevent misuse of generative AI tools.

Enhancing Diversity in Text-To-Image Generation Systems

Text-to-image models should serve all groups equally and accurately reflect a globalized society. To measure how well they do, Meta has developed automatic indicators that evaluate potential geographical disparities in text-to-image models.

Meta also conducted a large annotation study to understand how people in different regions perceive geographic representation. It collected more than 65,000 annotations and more than 20 survey responses per example, covering appeal, similarity, consistency, and shared recommendations for improving automatic and human evaluation of text-to-image models. The goal is greater diversity and better representation in AI-generated images.

Today, Meta is releasing its geographic disparities evaluation code and annotations, in the hope that they will help the community improve diversity in generative models.
