WEEKLY DIGEST
The Open-Source Gambit: Meta's Llama 3.1 and the Future of AI 🦙 🏆

Meta has just dropped Llama 3.1, and for the first time, we're seeing an open-source model that can go toe-to-toe with the closed-source heavyweights like GPT-4 and Claude 3.5 Sonnet.

What does it mean?

Imagine for a moment that the AI world is a bustling marketplace. In one corner, you have the luxury boutiques – the GPT-4s and Claude 3.5s of the world – offering high-end, proprietary AI models at premium prices. And then, seemingly out of nowhere, Meta shows up and starts handing out free samples that are nearly as good as the expensive stuff.

That's essentially what's happening with Llama 3.1. Meta has released not one, but three powerful AI models that anyone can download and use, even for commercial purposes. The largest model, 405B, is performing on par with the best closed-source models out there. It's like giving away a sports car that can keep up with a Formula 1 racer.

The Big Picture 🖼️

  1. Open-Source Power: These models are open-source, allowing researchers and developers to use them freely, even for commercial purposes (with some limitations*).

  2. Massive Context Window: The context window has been expanded from 8,000 tokens to a whopping 128,000 tokens. That's about 200 pages of text!

  3. Competitive Performance: The 405B model is giving GPT-4 and Claude 3.5 a run for their money in various benchmarks.

  4. Distilled Knowledge: The smaller models (8B and 70B) have been improved using knowledge distilled from the 405B model.

  5. Multimodal Promise: While not yet available, Meta hints at future multimodal capabilities for Llama 3.

  6. Llama System: The release also comes with new pieces of the "Llama system", a set of tools that makes it easier for developers to build applications with Llama models. It's a long way from "I have an open-source LLM" to "I have a working LLM-enabled product."

  7. Growing Ecosystem: The Llama ecosystem has now grown to include an agentic framework, content and jailbreaking safeguards, and a proposal for new developer interfaces.

*License of Llama 3.1

  • Permissively licensed, including commercial use.

  • Any derivative artifact, including models and datasets, must be distributed with the Llama 3.1 license.

  • Companies with more than 700 million monthly active users at the time of release cannot use the model without requesting a separate license from Meta.

These terms are a modification of the fairly restrictive Llama 3 license.
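To put the 128,000-token context window from the list above in perspective, here is a back-of-the-envelope conversion. The two ratios (about 0.75 English words per token and about 500 words per page) are rough rules of thumb assumed for illustration, not figures published by Meta, and real numbers vary with the tokenizer and the text.

```python
# Rough conversion from tokens to pages of English text.
# Both ratios below are assumptions, not official figures.
WORDS_PER_TOKEN = 0.75   # assumed average for English prose
WORDS_PER_PAGE = 500     # assumed dense, single-spaced page


def tokens_to_pages(tokens: int) -> float:
    """Estimate how many pages of text fit in `tokens` tokens."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


old_pages = tokens_to_pages(8_000)      # previous context window
new_pages = tokens_to_pages(128_000)    # Llama 3.1 context window

print(f"8K context:   ~{old_pages:.0f} pages")
print(f"128K context: ~{new_pages:.0f} pages")
```

Under these assumptions the new window works out to a little under 200 pages, which is in line with the estimate above.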

The Secret Sauce: Knowledge Distillation

Here's where it gets really interesting. Meta didn't just create three separate models – they used a clever technique called "knowledge distillation" to make the smaller models punch above their weight class.

Think of it like this: Imagine you had a world-class chef (the 405B model) teach a talented home cook (the smaller models) all their secrets. The home cook might not be able to replicate every complex dish, but they'd certainly pick up techniques that make their cooking far better than before. That's essentially what Meta has done, using the massive 405B model to improve the capabilities of the smaller, more accessible models.

It's similar to what OpenAI did with GPT-4o mini, announced last week.
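For the curious, here is a minimal sketch of what "learning from the teacher" can look like in code. It shows the classic textbook form of distillation, where the student is penalized for diverging from the teacher's softened output probabilities. This is an illustration only: nothing here says which distillation recipe Meta actually used, and the logits, temperature, and function names are made up for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher ("the chef")
    q = softmax(student_logits, temperature)  # student ("the home cook")
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical next-token scores over a three-word vocabulary.
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
poor_student = [0.5, 1.0, 4.0]   # prefers the wrong token

print("good student loss:", distillation_loss(teacher, good_student))
print("poor student loss:", distillation_loss(teacher, poor_student))
```

The student that ranks tokens the way the teacher does gets a much smaller loss, and training nudges the smaller model in that direction.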

Democratizing AI

The true significance of Llama 3.1 isn't just its technical capabilities – it's what it means for the AI ecosystem as a whole. By releasing such powerful models as open-source, Meta is essentially lowering the barrier to entry for AI innovation.

This could lead to:

  1. AI Gold Rush: When powerful AI becomes a free resource, innovation explodes.

  2. Price Wars: As open-source models improve, commercial AI services will likely need to drop their prices or offer unique value to stay competitive.

  3. Ethical Challenges: With great power comes great responsibility. As advanced AI becomes more accessible, we'll need to grapple with ensuring its responsible use.

Llama 3.1's architecture is simple compared with more complex designs such as mixture of experts. Despite that simplicity, training it required significant resources: Meta used 16,000 H100 GPUs to train the 405B model. In the words of Ben Thompson, "Meta trained Llama like a big company with lots of resources, while OpenAI trained GPT-4 like a startup."

But that brings us back to the question:

Why did Meta repurpose $100M worth of hardware to give away the result for free?

Mark Zuckerberg's letter provides valuable insights into Meta's strategy in the competitive AI landscape. He emphasizes the need for developers, CEOs, and governments to control their models, ensuring data protection, fine-tuning for specific use cases, and making appropriate long-term technology choices. Larger organizations often require more control over the technology they use, making open-source AI like Llama an attractive option.

Meta doesn't want to depend on a single AI provider or on proprietary systems, so it benefits from Llama being open-source. Zuckerberg also recognizes that competition in AI development is intense, which means the best models will keep changing. In such a fast-paced environment, open-sourcing any single model doesn't give away a lasting edge.

Moreover, OpenAI, Anthropic, DeepMind, and other foundation model developers make money by selling access to their models, either per token or via product subscriptions. Meta already has a multi-billion dollar ad business, which means it can afford to give away the AI stuff for free.

That allows Meta to apply a longstanding idea in tech, "commoditize your complement": turn the adjacent layers of the stack, supply chain, or ecosystem into commodities, so that your product benefits from more options while your competitors fight each other.

With the release of Llama 3.1, a significant ecosystem of partners is already in place, offering the model on various cloud platforms and providing fine-tuning and RAG systems.

This extensive support seems to create a much larger moat than having a model that marginally outperforms others on benchmarks, especially as AI models become increasingly commoditized. Additionally, Meta can position itself as an advocate for AI safety.

Numbers Don't Lie, Do They? 📊

Let's talk performance. The 405B model is flexing hard:

  • MMLU-Chat (General Knowledge): 88.6

  • GSM8K (Math): 96.8

  • HumanEval (Coding): 89

[Image: interpretation from Clement Delangue]

What's Next? 🔮

Meta is integrating Llama 3.1 across Facebook, Instagram, WhatsApp, and even AI glasses.

Will we see a world where AI is as ubiquitous as cat photos on social media? Only time will tell!

Access

But of course, there are and will be restrictions for users in Europe.

The Catch (Because There's Always a Catch)

Before you go all-in on Llama 3.1, keep in mind:

  • It's not yet multimodal (no image or video processing... yet).

  • You'll need some serious computing power to run the full 405B model.

  • There's a learning curve to effectively utilizing open-source models.

  • The Llama 3.1 license is closer to freeware than open-source.
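How serious is "serious computing power"? A rough estimate, counting only the memory needed to hold the weights (the KV cache, activations, and runtime overhead come on top), gives a feel for it. The precision options and the 80 GB per-GPU figure are assumptions picked for illustration, not a deployment recommendation.

```python
import math

# Parameter counts of the three Llama 3.1 models.
PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}

# Bytes per parameter at common precisions (assumed options).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

GPU_MEMORY_GB = 80  # assumed: one 80 GB accelerator, e.g. an H100


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, n in PARAMS.items():
    row = ", ".join(
        f"{prec}: {weight_memory_gb(n, size):,.1f} GB"
        for prec, size in BYTES_PER_PARAM.items()
    )
    print(f"{name:>4} -> {row}")

# Minimum number of 80 GB GPUs just to hold the 405B weights in bf16.
min_gpus = math.ceil(weight_memory_gb(PARAMS["405B"], 2.0) / GPU_MEMORY_GB)
print("GPUs needed for 405B weights in bf16:", min_gpus)
```

Under these assumptions the 405B weights alone take about 810 GB at 16-bit precision, which is more than a single 8-GPU server with 80 GB cards can hold without quantization. The 8B model, by contrast, fits on a single high-end consumer GPU.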

While the new Llama 3.1 license seems generous, it requires downstream models to include "Llama" in their name, potentially boosting the brand and ecosystem.

Meta might also influence standard interfaces and APIs for open-source LLMs, which raises concerns similar to Big Tech's efforts to define browser standards, a topic that privacy and open internet advocates have been fighting against for years.

We should indeed acknowledge Zuckerberg and Meta for supporting open-source AI. However, it's essential to remain cautious about relying on a tech giant, as their incentives may not always align with the developer community's interests, despite current synergies.

Thank you, see you next week!
