23 | Everything about "Her" => GPT-4o / Google I/O

Our Menu :)

  • Actionable Tips

  • Job Opportunities

Read time: 6 minutes

GEN AI AT WORK
All you need to know about GPT-4o 

May 13, 2024. On Monday evening, I rewatched the movie "Her". It tells the story of a man who falls in love with an AI (voiced by Scarlett Johansson). šŸ‘ØšŸ’»ā¤ļøšŸ¤–

With this announcement, we are getting closer to that reality.

OpenAI launched its new model, GPT-4o. Here are some facts about it:

  • Available through the Desktop App and the API

  • 2x faster

  • 50% cheaper

  • 5x higher rate limits compared to GPT-4 Turbo

  • Available to all users for free

  • And it is the most conversational model ever made

BREAK DOWN =>

What does "truly multimodal" mean?

GPT-4o (ā€œoā€ for ā€œomniā€) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs.

šŸ“„ Input: Text, Text + Image, Text + Audio, Text + Video, Audio

šŸ“¤ Output: Image, Image + Text, Text, Audio
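To make the input side concrete, here is a minimal sketch of a text + image request, assuming the openai Python SDK (v1+), an OPENAI_API_KEY in the environment, and a placeholder image URL:

```python
# Minimal sketch: ask GPT-4o a question about an image.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this picture?"},
                {
                    # Placeholder URL; swap in a real, publicly reachable image.
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)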

This opens up many possibilities for people with disabilities. Think Be My Eyes + GPT-4o helping a blind person ā€œseeā€ what’s in front of him, even flagging down an available taxi.

GPT-4o is getting closer and closer to "Her"

šŸŽ¤ Near real-time audio, with an average response time of 320 ms, similar to the pace of human conversation.

Voice Mode feels like chatting with a real human—it captures your tone, language, and expressions in real time. Many are describing it as a real-life "Her" (the voice in the demos might actually be Scarlett Johansson).

Explore what it can do here:

  • live language translation (link).

  • real-time conversational speech (link).

  • lullabies and whispers (link).

  • sarcasm (link).

  • even singing (link) ⭐

How intelligent is the model?

šŸ–¼ļø 69.1% on MMU; 92.8% on DocVQA

When can we use it?

GPT-4o will roll out from Monday, May 13th for all ChatGPT users through the web UI, even for free-tier users. It is also available through the API and the Playground.
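For the API route, a minimal text-only sketch, again assuming the openai Python SDK and an OPENAI_API_KEY in the environment:

```python
# Minimal sketch: plain-text chat with GPT-4o through the API.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello in French."}],
)
print(resp.choices[0].message.content)
```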

GPT-4o vs DALL-E 3

One of the biggest improvements is the ability to render text within the generated image.

If you are familiar with DALL-E 3, you know it struggled to produce legible text in its images; now look at some examples of what the new model can do.

GEN AI AT WORK
Anthropic is expanding to Europe

May 14, 2024. Anthropic has launched its AI assistant Claude for users and businesses across Europe, aiming to boost productivity and creativity through:

  • Claude.ai: Web interface for the next-gen AI assistant

  • Claude iOS app: Free mobile version with intuitive experience

  • Claude Team plan: Secure access to AI capabilities and Claude 3 models for businesses

Key Features:

  • Strong comprehension of multiple European languages

  • User-friendly interface for seamless AI model integration

Pricing:

  • Free access to basic web and mobile offerings

  • €18 + VAT/month for Claude Pro, which includes the advanced Claude 3 Opus model

  • €28 + VAT/user/month for the business Team plan (min. 5 seats)

With a focus on human-centric AI systems, Anthropic brings Claude's advanced language capabilities to Europe, allowing individuals and organizations to leverage its state-of-the-art models.

Anthropic vs prompt engineers

  • Anthropic has released Metaprompt, a tool designed to improve performance in Claude-powered applications by converting brief task descriptions into optimized prompt templates. It uses a few-shot prompting method with variables like subject, length, and tone (a hedged usage sketch follows below).
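To show the idea, here is a sketch that fills a Metaprompt-style template and sends it to Claude, assuming the anthropic Python SDK and an ANTHROPIC_API_KEY in the environment; the template text and variable names are illustrative, not Metaprompt's actual output:

```python
# Sketch: fill a prompt template with variables (in the spirit of
# Metaprompt's output) and send it to Claude 3 Opus.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY in the environment.
import anthropic

# Illustrative template; a real Metaprompt-generated template is richer.
TEMPLATE = (
    "Write a {length} {tone} summary about {subject}. "
    "Stick to verifiable facts."
)

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": TEMPLATE.format(
                subject="the EU launch of Claude",
                length="three-sentence",
                tone="neutral",
            ),
        }
    ],
)
print(message.content[0].text)
```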

GEN AI AT WORK
All you need to know about the Google I/O conference

May 14, 2024. Allow me to summarise those two hours in just a few lines. Among other announcements at the Google I/O conference, they introduced the following:

  • Google is integrating AI across its entire ecosystem.

  • Veo: their most capable video generation model

  • Project Astra: their new project focused on building a future AI assistant

  • Updates to Gemini 1.5 Pro: two new versions, one that's more lightweight, and another with a 2M token context length

  • AlphaFold 3: expanded biochemical modeling capabilities

BREAK DOWN =>

AI is here in Google Workspace.

This means paying users will have a ChatGPT-esque assistant right beside their screen that knows everything their Google apps know about them (so, literally everything).

Why it matters: when you're working in Docs, Sheets, Gmail, or Slides, you can ask Gemini to retrieve or summarise any content from all of these apps.

My personal all-time favorite product

"Ask Photos" is a new feature that will upgrade searching through photos from simple keywords like ā€œflowersā€ to ā€œshow me all the images of my son playing with our dogā€ and ā€œwhat was the first place we visited on our Japan tripā€.

Veo

It is Google DeepMind's most capable video generation model to date. It generates videos:

  • In high quality, at 1080p resolution

  • That can run for over a minute

  • In a wide range of cinematic and visual styles

Veo can take as input an image or video along with a text prompt. It can animate the image or edit the video when passed in as the input.

Moreover, it supports masked editing: add a mask over a specific area of the video along with a text prompt, and Veo changes only that region.

Project Astra

Astra is Google's new project focused on building a future AI assistant, very similar to OpenAI's GPT-4o that was showcased live yesterday.

Google's new assistant is powered by Gemini and supports audio, text, video and image sharing in real-time.

Google still presents this project as a prototype, and the capabilities of Astra were only shared through pre-recorded videos since it is still not available to all users.

Early testers report higher latency and less emotional intelligence and tone for Astra compared to GPT-4o, but strong text-to-speech and potentially better ongoing-video and long-context support.

Gemini 1.5 Pro

Google unveiled two iterations of their flagship model Gemini 1.5 Pro.

  • Gemini 1.5 Flash is the lightweight, fast, and cost-efficient version of the model; it is still multimodal and has a 1M token context length. The performance cost is small, with an MMLU score of 78.9% compared to 81.9% for the original Gemini 1.5 Pro model.

  • Gemini 1.5 Pro had its context length doubled to 2M tokens. The new model is available via a waitlist for select developers building through the API (a minimal usage sketch follows this list).
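As a minimal sketch of calling the lightweight model, assuming the google-generativeai Python SDK, a GOOGLE_API_KEY in the environment, and the "gemini-1.5-flash-latest" model name exposed in AI Studio at the time of writing:

```python
# Sketch: one-shot text generation with Gemini 1.5 Flash.
# Assumes `pip install google-generativeai` and GOOGLE_API_KEY in the env.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Model name as exposed in Google AI Studio; treat it as an assumption.
model = genai.GenerativeModel("gemini-1.5-flash-latest")
response = model.generate_content(
    "Summarize the key Google I/O announcements in two sentences."
)
print(response.text)
```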

AlphaFold 3 Expands Biochemical Modeling Capabilities

  • DeepMind's latest AlphaFold 3 model can predict 3D structures of not just proteins, but all biomolecules like DNA, RNA, and drug compounds, as well as their interactions.

  • Unlike previous versions, AlphaFold 3 is not open source and will be commercially controlled by Isomorphic Labs, a DeepMind spin-off focused on drug discovery.

  • AlphaFold 3 outperforms existing methods in accurately modeling molecular structures and interactions, achieving up to 77% success on benchmark datasets.

Business Impact:

  • Unprecedented accuracy in predicting drug-protein interactions could accelerate drug development and improve treatment efficacy.

  • Ability to model antibody-protein binding could aid in developing treatments for future pandemics and diseases.

  • Commercial control by Isomorphic Labs means businesses may need to license or partner for access to this powerful technology.

Other announcements

  • Gemma 2—Google’s best open model yet. It is a 27B-parameter model that outperforms the previous version and will be available starting in June.

  • Imagen 3—their most capable image generation model, which will be available in multiple versions, each optimized for different types of tasks, from generating quick sketches to high-resolution images.

  • Gemini Live—a voice conversation feature for Gemini, similar to ChatGPT's Voice Mode, coming later this year.

GEN AI AT WORK
FRESH NEWS! Add a file from Google Drive to ChatGPT

GEN AI Jobs
Job Opportunities in AI

Remember that we have launched our beta job watcher.

Thank you, see you next week!
