Releasing New AI Research Models to Accelerate Innovation at Scale
Meta’s Fundamental AI Research (FAIR) team has announced the public release of several cutting-edge AI research models, aiming to accelerate innovation and collaboration within the global AI community.
This move marks a significant step forward in the field of artificial intelligence, as the rapid pace of innovation demands increased collaboration to ensure responsible and beneficial advancements.
Meta Chameleon: Mixed-Modal Models for Text and Images
The Chameleon models, released under a research-only license, represent a significant breakthrough in mixed-modal AI capabilities.
Unlike most large language models, which operate in a single modality at a time (for example, text-to-image models that only output images), Chameleon can process and generate both text and images within a single model. This enables creative applications such as generating captions for images or combining text prompts and images to compose new scenes.
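The release itself documents the model's interface; as an illustration only, the core idea of one interleaved stream carrying both modalities can be sketched with toy, hypothetical special tokens and a routine that partitions the stream back into text and image spans:

```python
# Hypothetical markers delimiting image-token spans inside one shared stream.
BOI, EOI = "<img>", "</img>"

def split_modalities(tokens):
    """Partition one interleaved token stream into (modality, span) chunks."""
    chunks, span, in_image = [], [], False
    for tok in tokens:
        if tok == BOI:
            if span:                       # flush any pending text span
                chunks.append(("text", span))
                span = []
            in_image = True
        elif tok == EOI:
            chunks.append(("image", span)) # close the image span
            span, in_image = [], False
        else:
            span.append(tok)
    if span:                               # flush the trailing span
        chunks.append(("image" if in_image else "text", span))
    return chunks

stream = ["A", "cat", BOI, "i1", "i2", "i3", EOI, "on", "a", "mat"]
chunks = split_modalities(stream)
# -> [("text", ["A", "cat"]), ("image", ["i1", "i2", "i3"]), ("text", ["on", "a", "mat"])]
```

Because text and image tokens live in the same sequence, a single decoder can emit any interleaving of the two, which is what makes captioning and mixed text-and-image scene creation possible in one model.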
Multi-Token Prediction for Faster Language Models
To improve the efficiency of large language models (LLMs), Meta has proposed a new approach called multi-token prediction. This method trains language models to predict multiple future words at once, rather than one word at a time.
The pre-trained models for code completion are being released under a non-commercial, research-only license, with the goal of inspiring further iterations and advancements in language models.
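The training objective can be pictured with a minimal NumPy sketch (the tiny trunk, the independent output heads, and all sizes below are illustrative assumptions, not Meta's implementation): instead of one cross-entropy term for the next token, the loss sums cross-entropy terms from n heads that each predict one of the next n tokens from the same hidden state.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_token_loss(hidden, heads, targets):
    """Cross-entropy summed over n output heads, each predicting one of
    the next n ground-truth tokens from the same trunk representation."""
    total = 0.0
    for k, head in enumerate(heads):
        logits = hidden @ head              # (vocab,) scores for position t+k+1
        probs = softmax(logits)
        total += -np.log(probs[targets[k]])
    return total

rng = np.random.default_rng(0)
d, vocab, n = 8, 16, 4
hidden = rng.normal(size=d)                 # trunk encoding of the prefix
heads = [rng.normal(size=(d, vocab)) for _ in range(n)]
targets = [3, 7, 1, 12]                     # the next n ground-truth token ids
loss = multi_token_loss(hidden, heads, targets)
```

At inference time the extra heads can propose several tokens per forward pass, which is where the speed-up over strict one-token-at-a-time decoding comes from.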
JASCO: Enhanced Control Over AI Music Generation
JASCO, a new text-to-music generation model, offers more control over generated music outputs by accepting various inputs such as chords or beats.
This allows symbolic conditions (such as chord sequences) and audio conditions to be incorporated in the same model, providing significantly more versatile control over the generated music. The results suggest that JASCO is comparable to existing baselines in generation quality while offering this finer-grained control.
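One simple way to picture combining a symbolic condition with an audio-derived one is to fuse both into a single conditioning vector for the generator. The sketch below is purely illustrative (the toy chord vocabulary, random embedding table, and mean-pooling fusion are assumptions, not JASCO's architecture):

```python
import numpy as np

CHORD_VOCAB = {"C": 0, "Am": 1, "F": 2, "G": 3}  # toy symbolic vocabulary

def condition_vector(chords, beat_times, dim=4, rng=None):
    """Fuse a symbolic chord sequence and audio-derived beat times into one
    fixed-size conditioning vector (mean-pooled chord embeddings + tempo)."""
    rng = rng or np.random.default_rng(0)
    chord_emb = rng.normal(size=(len(CHORD_VOCAB), dim))  # toy embedding table
    pooled = chord_emb[[CHORD_VOCAB[c] for c in chords]].mean(axis=0)
    tempo = 60.0 / np.mean(np.diff(beat_times))           # beats per minute
    return np.concatenate([pooled, [tempo]])

cond = condition_vector(["C", "Am", "F", "G"],
                        beat_times=[0.0, 0.5, 1.0, 1.5])
# cond[-1] == 120.0  (a beat every 0.5 s -> 120 BPM)
```

The point of the sketch is that once both kinds of input are mapped into one conditioning signal, the same generative model can be steered by either or both.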
AudioSeal: Detecting AI-Generated Speech
AudioSeal, an audio watermarking technique, is designed to detect AI-generated speech within longer audio snippets.
This localized detection approach is faster and more efficient than traditional decoding-based methods, speeding up detection by up to 485 times. AudioSeal is being released under a commercial license as part of Meta's efforts to prevent the misuse of generative AI tools.
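Localized detection means the detector flags watermarked regions within a clip rather than giving one verdict for the whole file. As a toy illustration (the per-sample probability array and threshold below are assumptions, not AudioSeal's API), segment localization reduces to finding contiguous runs above a threshold:

```python
import numpy as np

def localize_watermark(probs, threshold=0.5):
    """Return (start, end) index pairs of contiguous runs where the
    per-sample watermark probability exceeds the threshold."""
    segments, start = [], None
    for i, flagged in enumerate(probs > threshold):
        if flagged and start is None:
            start = i                      # a watermarked run begins
        elif not flagged and start is not None:
            segments.append((start, i))    # the run ends before sample i
            start = None
    if start is not None:                  # run extends to the end of the clip
        segments.append((start, len(probs)))
    return segments

probs = np.array([0.1, 0.2, 0.9, 0.95, 0.8, 0.1, 0.05, 0.7, 0.9, 0.2])
segments = localize_watermark(probs)
# -> [(2, 5), (7, 9)]: two AI-generated spans inside a longer clip
```

A single pass over per-sample scores like this is what makes localized detection cheap compared with repeatedly decoding candidate windows.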
Increasing Diversity in Text-To-Image Generation
To ensure that text-to-image models work well for everyone and reflect geographical and cultural diversity, Meta has developed automatic indicators to evaluate potential geographical disparities.
A large-scale annotation study collected over 65,000 annotations and survey responses to improve automatic and human evaluations of text-to-image models. The geographic disparities evaluation code and annotations are being released to help the community improve diversity across their generative models.
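An automatic disparity indicator can be as simple as measuring how unevenly a quality score is spread across regions. The sketch below is a hypothetical example of such an indicator (the region names, scores, and normalized max-min gap are illustrative assumptions, not Meta's released metric):

```python
def disparity_indicator(region_scores):
    """Spread-based indicator: gap between the best- and worst-served
    region's quality score, normalized by the mean score."""
    scores = list(region_scores.values())
    mean = sum(scores) / len(scores)
    return (max(scores) - min(scores)) / mean

# Hypothetical per-region quality scores for one text-to-image model.
scores = {"Europe": 0.82, "Africa": 0.61, "Asia": 0.74, "Americas": 0.79}
gap = disparity_indicator(scores)          # 0 would mean perfectly even quality
```

Tracking an indicator like this over model versions makes geographic regressions visible, which is the role the released evaluation code and annotations are meant to play for the community.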
By publicly sharing these research models, Meta aims to inspire further innovation and collaboration in the AI community, ultimately advancing AI in a responsible and beneficial manner.