29 | OpenAI launched o1 yesterday | Waymo is proving safety | Pixtral Multimodal


Our Menu :)
Actionable Tips
Jobs and Training
Next Week
Read time: 10 Minutes
WEEKLY DIGEST
OpenAI Unveils New "o1" Model Series
Sep 12, 2024. OpenAI, the company behind ChatGPT, has just released a new AI system called o1 (also known as "Strawberry"). This new system represents a big change in how AI works and thinks. Here's what you need to know:

What's New About o1?
Thinking Before Speaking: Unlike older AI systems, o1 takes time to think before giving an answer, and you can even see a summary of its thought process.
Solving Tough Problems: o1 shines at tasks that need careful reasoning, like advanced math and complex coding, sometimes doing better than human experts.
Using Less to Do More: o1 doesn't need to be as big as other AI systems to be smart; it's about how it uses information, not how much it knows.
Learning While Working: During training, every time o1 worked through a problem, it got better at solving similar ones, steadily refining how it reasons.
Availability and Versions:
o1-mini: a faster, more efficient version
o1-preview: the more powerful version, available to select users
A third, more advanced version is reportedly being used internally at OpenAI, potentially to generate synthetic data to train GPT-5.
ChatGPT message limits at launch: 30 messages/week for o1-preview (the big berry) and 50 messages/week for o1-mini (80% cheaper and 3-5x faster).
Train of Thought
OpenAI doesn't share many details about how their model is trained, which is typical for them. They mention two things: first, the model is trained to "think before it answers," and second, this training uses a method called reinforcement learning.
"Think before you answer" is a technique where the model explains its reasoning step by step. You might have seen this if you've worked with advanced prompts. In this case, the model is trained to give detailed, thoughtful answers.
What ChatGPT shows you is a summary of the model's reasoning. For safety and competitive reasons, OpenAI doesn't expose the raw chain of thought; instead, another model summarizes the steps, and these summaries might not be perfect.
This use of secondary AI models is a common theme in the training process. OpenAI is getting better at using existing models to improve and speed up new training. For example, after collecting the initial training data, they used a special tool to remove harmful examples, like explicit content.
Similarly, another model called GPT-4o was used to check for safety and preparedness. It acted as both a security guard and a partner in tests. In two exercises, "MakeMeSay" and "MakeMePay," the model played the roles of a manipulator and a con artist, respectively.
This brings us to the topic of safety with this model.

Why Is This Important?
More Human-Like Reasoning: The way o1 thinks through problems is closer to how humans reason, which could make it easier for people to work with and understand.
New Possibilities in Many Fields: From education to scientific research, o1 could open up new ways of learning and discovering.
Technical aspects
49th percentile in the 2024 International Olympiad in Informatics
83.3% score on AIME math exam with consensus voting (GPT-4o: 13.4%)
Elo rating of 1673 on Codeforces, outperforming 89% of human competitors
362.14 points (above gold medal threshold) with relaxed submission constraints in International Olympiad in Informatics
o1-preview is top-ranked on LiveBench, a contamination-free benchmark.
How Does Reasoning Work?
The o1 models introduce reasoning tokens. The models use these reasoning tokens to "think", breaking down their understanding of the prompt and considering multiple approaches to generating a response.
The reinforcement learning approach trains the model to think through problems, similar to AlphaGo's Monte Carlo tree search.
This shifts computational focus from pre-training to inference time, allowing the model to explore multiple strategies and scenarios during problem-solving.
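For API users, that inference-time thinking shows up as billable "reasoning tokens" you never see directly, only as a count. Here is a minimal sketch with the openai Python SDK (it assumes API access to o1-preview; the usage field names such as completion_tokens_details reflect the SDK around o1's launch and may differ by version):

```python
# Minimal sketch: calling o1-preview and inspecting its reasoning-token usage.
# Assumes the `openai` Python SDK and API access to the o1 models; the
# usage field names (e.g. completion_tokens_details) may differ by SDK version.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        # At launch o1 accepts only user/assistant messages (no system prompt,
        # temperature, or streaming), so put all instructions in the user turn.
        {
            "role": "user",
            "content": "Seat 12 guests at 3 tables given these 5 constraints: ...",
        }
    ],
)

print(response.choices[0].message.content)

# The chain of thought itself stays hidden; only its size is reported and billed.
usage = response.usage
reasoning = usage.completion_tokens_details.reasoning_tokens
print("reasoning tokens:", reasoning)
print("visible output tokens:", usage.completion_tokens - reasoning)
```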
In conclusion, this release is closer in spirit to the original GPT-3 than to the ChatGPT launch. It introduces a new approach that some may find incredibly valuable, in ways even OpenAI can't predict.
However, it's not a mass-market product that works seamlessly and unlocks new value for everyone effortlessly.
I'm confident that we will have another significant moment like ChatGPT soon!
There are already some interesting use cases
Favorite o1 Use-Cases: Standouts
An absolute must-watch: A researcher's reaction after o1 wrote his PhD code in just one hour, a task that originally took him ten months. The reaction video went so viral that he had to make a second response video.
YouTube video by Kyle Kabasares: "ChatGPT o1 Preview + Mini: Wrote My PhD Code in 1 Hour*—What Took Me ~1 Year"
Another notable example: A doctor used o1 to write a "major cancer treatment project" in under a minute, a task that would have taken him days. o1 even provided at least one creative idea he might not have thought of in "thirty years."
Aaron Levie’s take: o1 could potentially replace many enterprise SaaS use-cases by handling complex tasks, such as determining a contract's effective date through reasoning and self-validation. Levie (of Box fame) says this capability is "extremely useful for complex business processes" typically solved by SaaS products.
PRO(MPT) TIP: Ethan Mollick emphasized phrasing your question in a way the AI might find interesting; otherwise it might not think as hard.
More o1 Prompting Tips (an example prompt follows the list):
First, have a casual back-and-forth with 4o or Claude Sonnet about your ideas. Then, clearly summarize what you want done, along with any constraints.
Be specific about what you're asking. Instead of a broad question, narrow it down to a particular area or issue.
When the AI gives you an answer, ask it to explain why. This helps you spot any mistakes in its reasoning.
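Putting those tips together, a prompt for o1 might look something like this (a made-up example, not an official template):

```text
I need a migration plan for moving our Flask app's nightly jobs from cron scripts to Celery.

Context (from an earlier brainstorm with 4o): about 40 jobs, mostly nightly reports; two are latency-sensitive.
Constraints: no new cloud services, Redis is already available, the downtime window is 30 minutes.

Focus only on the job-scheduling layer, not the web app itself.
After the plan, explain the reasoning behind the ordering of the steps so I can check it for mistakes.
```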
P.S.: If you want to learn more about how o1 works, this is a must-read. o1 can handle problems with up to 200 steps, but struggles approaching ~250 steps… so it’s perfect for difficult seating charts!
Just combined @OpenAI o1 and Cursor Composer to create an iOS app in under 10 mins!
o1 mini kicks off the project (o1 was taking too long to think), then switch to o1 to finish off the details.
And boom—full Weather app for iOS with animations, in under 10 🌤️
Video sped up!
— Ammaar Reshi (@ammaar)
9:47 PM • Sep 12, 2024
Update September 15th:
WILD o1 Demos We Discovered - YOU CAN TRY:
Generated an Animated Solar System: Created in just 5 prompts.
Wrote a Poem with Incredibly Strict Rules: Showcasing its ability to adhere to complex constraints.
Solved Complex Crossword Clues and Intelligence Tests: Demonstrating advanced problem-solving capabilities.
Created Fractal Art Using JavaScript and Adobe Firefly: Combining coding and artistic creativity.
Completed White-Collar Tasks: Such as estimating the number of Chinese people with an annual disposable income over 100K Yuan.
Achieved First Place on the Norway Mensa IQ Test: Outperforming other models tested, with an IQ score of approximately 120.
WEEKLY DIGEST
Waymo Leads the Pack in Autonomous Vehicle Safety, New Data Reveals

Recent data and analysis paint a picture of a technology that's not just futuristic, but increasingly safer than human drivers.
Widespread Deployment
Waymo's autonomous vehicles are no longer confined to controlled environments. They're now operating or testing in major U.S. cities including San Francisco, Phoenix, Austin, Miami, Seattle, Los Angeles, and Atlanta. This widespread deployment is a testament to the company's confidence in its technology and its ability to navigate diverse urban environments.
Impressive Growth and Safety Record
The numbers speak volumes about Waymo's progress:
A fleet of approximately 700 vehicles
100,000 rides per week (up from 10,000 a year ago)
22 million miles driven over 2 million paid rides
But it's the safety statistics that are truly turning heads. Waymo reports that its autonomous vehicles are involved in fewer than one injury-causing crash per million miles driven. To put this into perspective, over those 22 million miles, Waymo has reported:
20 crashes with injuries
5 crashes serious enough to deploy airbags
Only 1 crash involving serious injuries
Waymo claims that human drivers in San Francisco and Phoenix would have caused 64 crashes over the same distance, 31 of which would have been severe enough to trigger airbags.
Human Error: The Biggest Threat to Self-Driving Cars?
An in-depth analysis by Timothy B. Lee of Understanding AI reveals an interesting twist: out of the 23 most serious crashes involving Waymo vehicles, 19 were actually caused by human error. Specifically:
16 incidents involved humans rear-ending Waymo cars
3 cases where humans ran red lights and collided with Waymo vehicles
This leaves only 4 serious crashes that could be attributed to Waymo's technology itself.

Safety Comparison with Human Drivers
Lee's analysis suggests that Waymo's injury crash rates are approximately 60-70% lower than those of human drivers. This stark contrast highlights the potential of autonomous vehicles to significantly reduce road accidents and improve overall traffic safety.
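As a back-of-the-envelope check on those figures, using only the counts quoted above (the arithmetic here is ours, not Waymo's or Lee's):

```python
# Back-of-the-envelope check of the crash rates quoted above.
# Inputs are the figures reported in this article; the arithmetic is ours.
miles = 22_000_000
waymo_injury_crashes = 20      # injury crashes reported by Waymo
expected_human_crashes = 64    # Waymo's estimate for human drivers over the same miles

def per_million(crashes: int) -> float:
    """Crashes per million miles driven."""
    return crashes / (miles / 1_000_000)

print(f"Waymo: {per_million(waymo_injury_crashes):.2f} injury crashes per million miles")  # ~0.91
print(f"Human benchmark: {per_million(expected_human_crashes):.2f} per million miles")     # ~2.91
print(f"Relative reduction: {1 - waymo_injury_crashes / expected_human_crashes:.0%}")      # ~69%, in line with Lee's 60-70%
```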
The Road Ahead
As the technology continues to improve and gain public trust, the autonomous vehicle market is poised for significant growth. S&P Global projects that by 2034, there will be 230,000 driverless taxis sold in the U.S. alone.
However, the irony isn't lost on industry observers: the biggest challenge facing self-driving cars might not be the technology itself, but rather the unpredictable behavior of human drivers sharing the road.

The Bottom Line
Waymo's impressive safety record and rapid expansion across major U.S. cities signal a turning point in the autonomous vehicle industry. As public perception catches up with the reality of the technology's safety benefits, we may be witnessing the early stages of a transportation revolution.
The question now isn't whether self-driving cars will become mainstream, but how quickly society will adapt to this safer, more efficient mode of transportation. As one industry insider quipped, "The safest self-driving car tech will be the one everyone uses—we might not be the safest drivers, but us humans don't mess around when it comes to 'unsafe' new tech."
With Waymo leading the charge, the future of transportation looks not just autonomous, but demonstrably safer.
WEEKLY DIGEST
Mistral Challenges Tech Giants with Launch of Pixtral 12B, Its First Multimodal AI Model

In a bold move that signals its ambitions in the AI race, French startup Mistral AI has unveiled Pixtral 12B, its first-ever multimodal model capable of processing both language and visual data. This release marks Mistral's entry into the competitive field of vision-language models, putting it in direct competition with tech giants like Meta and OpenAI.
A New Contender in Multimodal AI
Pixtral 12B builds upon Mistral's text-based Nemo 12B model, incorporating a 400 million parameter vision adapter. This architectural choice allows the model to leverage Mistral's existing language prowess while adding robust image processing capabilities.
"With Pixtral 12B, we're pushing the boundaries of what's possible in AI," said Arthur Mensch, CEO and co-founder of Mistral AI. "This model represents a significant step forward in our mission to create more versatile and powerful AI systems."
Technical Specifications and Features
Pixtral 12B boasts impressive specifications:
Text backbone: Mistral Nemo 12B
Vision Adapter: 400M parameters
Vocabulary size: 131,072 tokens
Image processing: Up to 1024x1024 pixels, divided into 16x16 pixel patches
New special tokens: [IMG], [IMG_BREAK], [IMG_END]
Model weights in bfloat16 format
Total download size: 24GB
The model uses GeLU activation for the vision adapter and 2D Rotary Position Embedding (RoPE) for the vision encoder, techniques that have shown promise in improving model performance and efficiency.
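Those specs imply a hefty token budget per image. A quick sketch of the arithmetic, derived from the patch size above (the separator-token count is our approximation):

```python
# Rough token cost of one full-resolution image for Pixtral 12B, from the specs above.
# A 1024x1024 image cut into 16x16-pixel patches gives a 64x64 grid of image tokens.
image_side = 1024
patch_side = 16

patches_per_side = image_side // patch_side   # 64
patch_tokens = patches_per_side ** 2          # 4096 image tokens
separators = patches_per_side                 # roughly one [IMG_BREAK] per row plus [IMG_END] (approximate)

print(f"{patch_tokens} patch tokens + ~{separators} separators "
      f"= about {patch_tokens + separators} tokens per image")
```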

Accessibility and Implementation
In line with Mistral's commitment to open-source AI, the company has made Pixtral 12B's weights available on the Hugging Face Hub. Developers can prepare prompts for the model with the mistral_common Python package, which now supports image input alongside text in user messages.
Installation is straightforward:
pip install --upgrade mistral_common
This integration allows for seamless incorporation of image data within the text processing pipeline, potentially enabling applications like visual question answering and image captioning.
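As a rough sketch of what that looks like in practice, based on the mistral_common chat-completion interface documented at release (class names and arguments may differ across versions, so treat it as illustrative):

```python
# Sketch: tokenizing a mixed text+image prompt for Pixtral with mistral_common.
# Based on the package's chat-completion interface as documented at release;
# class names and arguments may differ across versions.
from mistral_common.protocol.instruct.messages import ImageURLChunk, TextChunk, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.from_model("pixtral")

request = ChatCompletionRequest(
    messages=[
        UserMessage(
            content=[
                TextChunk(text="Describe this chart in two sentences."),
                ImageURLChunk(image_url="https://example.com/chart.png"),  # hypothetical URL
            ]
        )
    ]
)

encoded = tokenizer.encode_chat_completion(request)
print(len(encoded.tokens), "tokens (text plus image patches)")
print(len(encoded.images), "image(s) prepared for the vision encoder")
```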
Additionally, following Mistral's previous practice, the model is also distributed via a peer-to-peer torrent network, further enhancing accessibility for developers worldwide.
Performance and Future Prospects
While specific benchmarks are yet to be released, the AI community is eagerly anticipating performance comparisons with other multimodal models like OpenAI's GPT-4V or Meta's Chameleon.
"The release of Pixtral 12B opens up exciting new possibilities in AI application development," noted AI researcher Dr. Emily Chen, unaffiliated with Mistral. "Its large scale and advanced architecture could enable more sophisticated AI applications across various industries, from content creation to data analysis."
As Mistral continues to push the boundaries of AI technology, all eyes will be on how Pixtral 12B performs in real-world applications and how it stacks up against offerings from more established players in the field.
With this release, Mistral has made it clear that it's not just participating in the AI race – it's aiming to lead it. As the company refines its multimodal capabilities, the broader implications for the AI industry and beyond remain to be seen.
ACTIONABLE TIPS
Listen to anything on the go with the highest quality voices

This week introduces an exciting free mobile app from Eleven Labs that transforms any text into high-quality audio using advanced AI voices.
Why It’s Cool and Useful:
Versatile Import Options: Easily import text by typing, pasting URLs, uploading files like PDFs, or using your phone’s camera for OCR to read text from images.
Customizable Playback: Adjust the speed and choose from a variety of voices based on language, gender, and style to suit your preferences.
Additional Tip: For complex documents like white papers that include charts and extensive formatting, use an LLM like ChatGPT to adapt the content for audio (a scripted sketch follows the steps below). This involves:
Uploading the Document: Import the PDF into the LLM platform.
Preparing a Prompt: Instruct the LLM to translate (if needed) and simplify the text, removing citations and complex formatting.
Generating a Clean Version: The LLM creates a streamlined version suitable for narration.
Listening on the App: Load the adapted document into the Eleven Labs app for a smooth listening experience.
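If you'd rather script the clean-up step than do it in the ChatGPT UI, here is a minimal sketch using the openai Python SDK (it assumes the PDF text has already been extracted to a string, and the model choice and prompt wording are only examples):

```python
# Sketch: asking an LLM to rewrite an extracted white paper for audio narration.
# Assumes `document_text` already holds the PDF's extracted text and the `openai`
# SDK is configured; the model choice and prompt wording are only examples.
from openai import OpenAI

client = OpenAI()
document_text = "...extracted white paper text..."  # placeholder

prompt = (
    "Rewrite the following document so it reads well as spoken narration: "
    "remove citations, figure and table references, and complex formatting, "
    "and keep the content in plain, flowing prose.\n\n" + document_text
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

clean_text = response.choices[0].message.content
print(clean_text)  # paste or import this into the Eleven Labs app
```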
ACTIONABLE TIPS
Replit Agent: the AI development tool everybody is talking about
AI is incredible at writing code.
But that's not enough to create software. You need to set up a dev environment, install packages, configure DB, and, if lucky, deploy.
It's time to automate all this.
Announcing Replit Agent in early access—available today for subscribers:
— Amjad Masad (@amasad)
4:27 PM • Sep 5, 2024

Thank you, see you next week!