27 | Llama 3.1: What's the strategy behind open-source AI? | Mistral Large 2 is live


Read time: 10 Minutes
WEEKLY DIGEST
The Open-Source Gambit: Meta's Llama 3.1 and the Future of AI 🦙 🏆
Meta has just dropped Llama 3.1, and for the first time, we're seeing an open-source model that can go toe-to-toe with the closed-source heavyweights like GPT-4 and Claude 3.5 Sonnet.
What does it mean?
Imagine for a moment that the AI world is a bustling marketplace. In one corner, you have the luxury boutiques – the GPT-4s and Claude 3.5s of the world – offering high-end, proprietary AI models at premium prices. And then, seemingly out of nowhere, Meta shows up and starts handing out free samples that are nearly as good as the expensive stuff.
That's essentially what's happening with Llama 3.1. Meta has released not one, but three powerful AI models that anyone can download and use, even for commercial purposes. The largest model, 405B, is performing on par with the best closed-source models out there. It's like giving away a sports car that can keep up with a Formula 1 racer.

The Big Picture 🖼️
Open-Source Power: These models are open-source, allowing researchers and developers to use them freely, even for commercial purposes (with some limitations*).
Massive Context Window: The context window has been expanded from 8,000 tokens to a whopping 128,000 tokens – roughly 200 pages of text!
Competitive Performance: The 405B model is giving GPT-4 and Claude 3.5 a run for their money in various benchmarks.
Distilled Knowledge: The smaller models (8B and 70B) have been improved using knowledge distilled from the 405B model.
Multimodal Promise: While not yet available, Meta hints at future multimodal capabilities for Llama 3.
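That 200-page figure is easy to sanity-check with back-of-the-envelope arithmetic. The conversion factors below (~0.75 English words per token, ~500 words per printed page) are common rules of thumb, not numbers from Meta:

```python
# Rough sanity check on the "128K tokens ~ 200 pages" claim.
# Assumed rules of thumb (not from Meta's announcement):
WORDS_PER_TOKEN = 0.75   # typical for English text with BPE-style tokenizers
WORDS_PER_PAGE = 500     # a densely printed page

def tokens_to_pages(tokens: int) -> float:
    """Convert a token count to an approximate page count."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(round(tokens_to_pages(128_000)))  # ~192 pages, i.e. "about 200"
```

Close enough: the expanded window really does hold a short book's worth of text in a single prompt.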
The model also comes with new pieces in the "Llama system" - a system of tools to make it easier for developers to build applications with Llama models. It's a long way from "I have an open-source LLM" to "I have a working LLM-enabled product."
The Llama ecosystem has now grown to include an agentic framework, content and jailbreaking safeguards, and a proposal for new developer interfaces.

*License of Llama 3.1
Permissively licensed, including commercial use.
Any derivative artifact, including models and datasets, must be distributed with the Llama 3.1 license.
Companies with more than 700 million monthly active users at the time of release must request a special license from Meta.
These terms are a modification of the fairly restrictive Llama 3 license.
The Secret Sauce: Knowledge Distillation
Here's where it gets really interesting. Meta didn't just create three separate models – they used a clever technique called "knowledge distillation" to make the smaller models punch above their weight class.
Think of it like this: Imagine you had a world-class chef (the 405B model) teach a talented home cook (the smaller models) all their secrets. The home cook might not be able to replicate every complex dish, but they'd certainly pick up techniques that make their cooking far better than before. That's essentially what Meta has done, using the massive 405B model to improve the capabilities of the smaller, more accessible models.
It's similar to what OpenAI is believed to have done with GPT-4o mini, announced last week.
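Worth noting: Meta's paper describes using the 405B model mainly to generate high-quality synthetic training data for the 8B and 70B models, rather than classic logit-matching distillation. For intuition, though, here is a minimal sketch of the classic Hinton-style distillation loss – the student is trained to match the teacher's temperature-softened output distribution. All logit values below are made up for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions -- what the student minimizes.
    A higher temperature exposes the teacher's 'dark knowledge' about
    the relative plausibility of wrong answers."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]  # hypothetical next-token logits from the big model
student = [2.5, 1.5, 1.0]  # the small model's logits for the same context
print(distillation_loss(teacher, student))  # > 0: student hasn't matched the teacher yet
```

The loss is zero exactly when the student reproduces the teacher's distribution, which is the "home cook learns the chef's technique" story in one equation.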
Democratizing AI
The true significance of Llama 3.1 isn't just its technical capabilities – it's what it means for the AI ecosystem as a whole. By releasing such powerful models as open-source, Meta is essentially lowering the barrier to entry for AI innovation.
This could lead to:
An AI Gold Rush: When powerful AI becomes a free resource, innovation explodes.
Price Wars: As open-source models improve, commercial AI services will likely need to drop their prices or offer unique value to stay competitive.
Ethical Challenges: With great power comes great responsibility. As advanced AI becomes more accessible, we'll need to grapple with ensuring its responsible use.
Llama 3.1's architecture is simpler than other, more complex strategies like a mixture of experts. Despite its simplicity, it requires significant resources: Meta used 16,000 H100 GPUs to train the 405B model. In the words of Ben Thompson, "Meta trained Llama like a big company with lots of resources, while OpenAI trained GPT-4 like a startup."
But that brings us back to the central question:

Why did Meta repurpose $100M worth of hardware to give away the result for free?
Mark Zuckerberg's letter provides valuable insights into Meta's strategy in the competitive AI landscape. He emphasizes the need for developers, CEOs, and governments to control their models, ensuring data protection, fine-tuning for specific use cases, and making appropriate long-term technology choices. Larger organizations often require more control over the technology they use, making open-source AI like Llama an attractive option.
Meta doesn't want to depend on a single AI provider or proprietary systems, so they benefit from Llama being open-source. Zuckerberg recognizes the intense competition in AI development, meaning the best models will keep changing. Open-sourcing a model won't give a lasting edge in this fast-paced environment.

Moreover, OpenAI, Anthropic, DeepMind, and other foundation model developers make money by selling access to their models, either per token or via product subscriptions. Meta already has a multi-billion dollar ad business, which means it can afford to give away the AI stuff for free.
That allows them to apply a longstanding idea in tech, "commoditize your complement": make the adjacent layers of the stack, supply chain, or ecosystem into commodities, so that your own product benefits from more options while your competitors fight each other.
With the release of Llama 3.1, a significant ecosystem of partners is already in place, offering the model on various cloud platforms and providing fine-tuning and RAG systems.

This extensive support seems to create a much larger moat than having a model that marginally outperforms others on benchmarks, especially as AI models become increasingly commoditized. Additionally, Meta can position itself as an advocate for AI safety.
Numbers Don't Lie, do they? 📊
Let's talk performance. The 405B model is flexing hard:
MMLU-Chat (General Knowledge): 88.6
GSM8K (Math): 96.8
HumanEval (Coding): 89
Interpretation from Clément Delangue


What's Next? 🔮
Meta is integrating Llama 3.1 across Facebook, Instagram, WhatsApp, and even its AI glasses.
Will we see a world where AI is as ubiquitous as cat photos on social media? Only time will tell!
Access
Read the 92-page paper here
For API-based access, check out Meta's current partners
Huggingface page to download Llama 3.1 models
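If you do download the instruct checkpoints, note that they expect prompts built from a specific set of special tokens. Here is a minimal sketch of that template based on the published format – in practice, prefer `tokenizer.apply_chat_template` from Hugging Face `transformers`, which applies the model's own template authoritatively:

```python
def format_llama31_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3.1 instruct prompt from its special tokens.
    Sketch of the published chat format; real code should use
    tokenizer.apply_chat_template from the transformers library."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Leave the assistant header open so the model generates the reply:
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama31_prompt(
    "You are a helpful assistant.",
    "What's new in Llama 3.1?",
)
print(prompt)
```

Getting this template wrong is one of the classic "learning curve" pitfalls of running open-weights models yourself.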
But of course, for Europe there are and will be restrictions.
Meta will *not* release the multimodal versions of its AI products and models in the EU because of an unpredictable regulatory environment.
This means that EU users of Ray-Ban Meta won't be able to use the image understanding features.
It also means that the EU industry will not… x.com/i/web/status/1…— Yann LeCun (@ylecun)
2:36 PM • Jul 19, 2024
The Catch (Because There's Always a Catch)
Before you go all-in on Llama 3.1, keep in mind:
It's not yet multimodal (no image or video processing... yet).
You'll need some serious computing power to run the full 405B model.
There's a learning curve to effectively utilizing open-source models.
The Llama 3.1 license is closer to freeware than open-source.
While the new Llama 3.1 license seems generous, it requires downstream models to include "Llama" in their name, potentially boosting the brand and ecosystem.
Meta might also influence standard interfaces and APIs for open-source LLMs, which raises concerns similar to Big Tech's efforts to define browser standards, a topic that privacy and open internet advocates have been fighting against for years.
We should indeed acknowledge Zuckerberg and Meta for supporting open-source AI. However, it's essential to remain cautious about relying on a tech giant, as their incentives may not always align with the developer community's interests, despite current synergies.
WEEKLY DIGEST
MISTRAL LARGE 2 DROPS! Now Rival to GPT-4 & Claude 3.5 🇫🇷💪
Mistral Large 2
Mistral dropped a new open-weights model (Mistral Large 2) that’s competitive with Llama 3.1 and, by extension, GPT-4 (try it on Mistral's La Plateforme or Hugging Face).
Performance:
Comparable to or outperforming Llama 3.1 405B and GPT-4o on various benchmarks
Excels in coding, math, and reasoning tasks
Strong performance in multilingual capabilities (12 human languages, 80 coding languages)
Improved instruction following and function calling
Availability:
Open weights released on Hugging Face
Non-commercial use allowed under Mistral Research License
Commercial use requires a separate license
Key Features:
Trained to acknowledge when it lacks information or confidence
Improved performance on mathematical benchmarks
Enhanced reasoning and problem-solving skills
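"Function calling" means the model can emit a structured request to invoke a developer-defined tool instead of answering in free text. A minimal sketch, using the JSON-Schema style of tool definition that Mistral's chat API accepts – the tool name and fields here are purely illustrative:

```python
import json

# Hypothetical tool definition (illustrative names), in the JSON-Schema
# style used for function calling with Mistral's chat API.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# The model replies with a tool call like this; the application then
# parses the arguments, runs the real function, and sends the result back.
model_tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})}
args = json.loads(model_tool_call["arguments"])
print(args["city"])  # Paris
```

The "improved function calling" claim is about how reliably the model produces well-formed calls like this one, which is what matters for agent-style applications.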

Mistral NeMo
Key Features
12B parameters
128,000 token context window
Outperforms Mistral 7B in conversation tracking and multilingual capabilities
Open-source, available on Hugging Face and La Plateforme
More capable than GPT-3.5 despite smaller size
Significance
Demonstrates the trend of smaller, more efficient models
Challenges the need for expensive, large model training
Suggests shorter utility windows for AI models
Enables easier experimentation for entrepreneurs
Offers lower operational costs (per token and hosting)

WEEKLY DIGEST
Waitlist for OpenAI’s search engine
OpenAI's search engine comes to light, at least as an accessible prototype with a waitlist. It combines GPT technology with the information sources Altman has been securing deals with over the past few months.
A new Perplexity? openai.com/index/searchgpt-prototype/
We’re testing SearchGPT, a temporary prototype of new AI search features that give you fast and timely answers with clear and relevant sources.
We’re launching with a small group of users for feedback and plan to integrate the experience into ChatGPT. openai.com/index/searchgp…
— OpenAI (@OpenAI)
6:09 PM • Jul 25, 2024
JOBS AND TRAINING
Andrej Karpathy's "Zero to Hero"
Andrej Karpathy's "Zero to Hero" neural networks course is an extensive learning resource that includes YouTube videos and a GitHub repository with Jupyter notebooks and exercises. It covers neural network basics to advanced topics like GPT, providing hands-on experience in each lecture.
The course includes five main topics:
Backpropagation and Neural Network Training: Learn how to train neural networks using backpropagation.
Language Modeling: Build character-level language models and advance to more complex models like GPT.
Multilayer Perceptrons (MLPs): Understand and implement MLPs, including training and evaluation techniques.
Convolutional Neural Networks (CNNs): Transform a deeper MLP into a CNN, learning about CNN architecture and applications.
Tokenization in Language Models: Create a GPT Tokenizer to understand tokenization's role in large language models and its impact on performance.
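The first topic, backpropagation, is exactly what the course builds up by hand in its micrograd library. A minimal sketch of the core idea – scalar values that remember how to propagate gradients backward through `+` and `*` (only those two operations are supported here):

```python
class Value:
    """Minimal scalar autograd node, in the spirit of the course's micrograd."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = 1, d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
out = a * b + a          # out = a*b + a, so d(out)/da = b + 1
out.backward()
print(a.grad)            # 4.0
print(b.grad)            # 2.0
```

That forty-line core, scaled up to tensors, is what every deep learning framework does under the hood – which is why the course starts there.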

We put your money to work
Betterment’s financial experts and automated investing technology are working behind the scenes to make your money hustle while you do whatever you want.
Thank you, see you next week!