23 | Everything about "Her" => GPT-4o / Google I/O

Our Menu :)

  • Actionable Tips

  • Job Opportunities

Read time: 6 minutes

GEN AI AT WORK
All you need to know about GPT-4o 

May 13, 2024. On Monday evening, I rewatched the movie "Her". It tells the story of a man who falls in love with an AI (voiced by Scarlett Johansson). šŸ‘ØšŸ’»ā¤ļøšŸ¤–

With this announcement, we are getting closer to that reality.

OpenAI launched its new model, GPT-4o. Here are some facts about it:

  • Available through the Desktop App and the API

  • 2x faster

  • 50% cheaper

  • 5x higher rate limits compared to GPT-4 Turbo

  • Available to all users for free

  • And it is the most conversational model ever made

BREAK DOWN =>

What does "truly multimodal" mean?

GPT-4o (ā€œoā€ for ā€œomniā€) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs.

šŸ“„ Input: Text, Text + Image, Text + Audio, Text + Video, Audio

šŸ“¤ Output: Image, Image + Text, Text, Audio
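To make the input side concrete, here is a minimal sketch of a text + image request, assuming the openai Python SDK (v1+), an OPENAI_API_KEY in the environment, and a placeholder image URL:

```python
# Minimal sketch: ask GPT-4o a question about an image.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this picture?"},
                {
                    # Placeholder URL; swap in a real, publicly reachable image.
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)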

This opens up many possibilities for people with disabilities. Think Be My Eyes + GPT-4o helping a blind person ā€œseeā€ what’s in front of him, even flagging down an available taxi.

GPT-4o is getting closer and closer to "Her"

šŸŽ¤ Near real-time audio, with an average response time of 320 ms, similar to the pace of human conversation.

Voice Mode feels like chatting with a real human—it captures your tone, language, and expressions in real time. Many are describing it as a real-life "Her" (the voice in the demos might actually be Scarlett Johansson).

Explore what it can do here:

  • live language translation (link).

  • real-time conversational speech (link).

  • lullabies and whispers (link).

  • sarcasm (link).

  • even singing (link) ⭐

How intelligent is the model?

šŸ–¼ļø 69.1% on MMU; 92.8% on DocVQA

When can we use it?

GPT-4o will roll out from Monday, May 13th for all ChatGPT users through the web UI, even for free-tier users. It is also available through the API and the Playground.
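For the API route, a minimal text-only sketch, again assuming the openai Python SDK and an OPENAI_API_KEY in the environment:

```python
# Minimal sketch: plain-text chat with GPT-4o through the API.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello in French."}],
)
print(resp.choices[0].message.content)
```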

GPT-4o vs DALL-E 3

One of the biggest improvements is the ability to render text within the generated image.

If you are familiar with DALL-E 3, you know it struggled to produce legible text in its images; now look at some examples of what the new model can do.

GEN AI AT WORK
Anthropic is expanding to Europe

May 14, 2024. Anthropic has launched its AI assistant Claude for users and businesses across Europe, aiming to boost productivity and creativity through:

  • Claude.ai: Web interface for the next-gen AI assistant

  • Claude iOS app: Free mobile version with intuitive experience

  • Claude Team plan: Secure access to AI capabilities and Claude 3 models for businesses

Key Features:

  • Strong comprehension of multiple European languages

  • User-friendly interface for seamless AI model integration

Pricing:

  • Free access to basic web and mobile offerings

  • €18 + VAT/month for Claude Pro, which includes the advanced Claude 3 Opus model

  • €28 + VAT/user/month for the business Team plan (min. 5 seats)

With a focus on human-centric AI systems, Anthropic brings Claude's advanced language capabilities to Europe, allowing individuals and organizations to leverage its state-of-the-art models.

Anthropic vs prompt engineers

  • Anthropic has released Metaprompt, a tool designed to improve performance in Claude-powered applications by converting brief task descriptions into optimized prompt templates. It uses a few-shot prompting method with variables like subject, length, and tone (a hedged usage sketch follows below).
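To show the idea, here is a sketch that fills a Metaprompt-style template and sends it to Claude, assuming the anthropic Python SDK and an ANTHROPIC_API_KEY in the environment; the template text and variable names are illustrative, not Metaprompt's actual output:

```python
# Sketch: fill a prompt template with variables (in the spirit of
# Metaprompt's output) and send it to Claude 3 Opus.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY in the environment.
import anthropic

# Illustrative template; a real Metaprompt-generated template is richer.
TEMPLATE = (
    "Write a {length} {tone} summary about {subject}. "
    "Stick to verifiable facts."
)

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": TEMPLATE.format(
                subject="the EU launch of Claude",
                length="three-sentence",
                tone="neutral",
            ),
        }
    ],
)
print(message.content[0].text)
```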

GEN AI AT WORK
All you need to know about the Google I/O conference

May 14, 2024. Allow me to summarise those two hours in just a few lines. Among other announcements at the Google I/O conference, they introduced the following:

  • Google is integrating AI across its entire ecosystem.

  • Veo: their most capable video generation model

  • Project Astra: their new project focused on building a future AI assistant

  • Updates to Gemini 1.5 Pro: two new versions, one that's more lightweight, and another with a 2M token context length

  • AlphaFold 3: expanded biochemical modeling capabilities

BREAK DOWN =>

AI is here in Google Workspace.

This means paying users will have a ChatGPT-esque assistant right beside their screen that knows everything their Google apps know about them (so, literally everything).

Why it matters: when you're working in Docs, Sheets, Gmail, or Slides, you can ask Gemini to retrieve or summarise any content from all of these apps.

My personal all-time favorite product

"Ask Photos" is a new feature that will upgrade searching through photos from simple keywords like ā€œflowersā€ to ā€œshow me all the images of my son playing with our dogā€ and ā€œwhat was the first place we visited on our Japan tripā€.

Veo

It is Google DeepMind's most capable video generation model to date. It generates videos:

  • In high quality, at 1080p resolution

  • That can run for over a minute

  • In a wide range of cinematic and visual styles

Veo can take as input an image or video along with a text prompt. It can animate the image or edit the video when passed in as the input.

Moreover, it supports masked editing: add a mask over a specific area of the video along with a text prompt, and Veo changes only that region.

Project Astra

Astra is Google's new project focused on building a future AI assistant, very similar to OpenAI's GPT-4o that was showcased live yesterday.

Google's new assistant is powered by Gemini and supports audio, text, video and image sharing in real-time.

Google still presents this project as a prototype, and the capabilities of Astra were only shared through pre-recorded videos since it is still not available to all users.

Early testers report higher latency and less emotional intelligence and tone for Astra compared to GPT-4o, but strong text-to-speech and potentially better ongoing-video and long-context support.

Gemini 1.5 Pro

Google unveiled two iterations of their flagship model Gemini 1.5 Pro.

  • Gemini 1.5 Flash is the lightweight, fast, and cost-efficient version of the model; it is still multimodal and has a 1M token context length. The performance cost is small, with an MMLU score of 78.9% compared to 81.9% for the original Gemini 1.5 Pro model.

  • Gemini 1.5 Pro had its context length doubled to 2M tokens. The new model is available via a waitlist for select developers building through the API (a minimal usage sketch follows this list).
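As a minimal sketch of calling the lightweight model, assuming the google-generativeai Python SDK, a GOOGLE_API_KEY in the environment, and the "gemini-1.5-flash-latest" model name exposed in AI Studio at the time of writing:

```python
# Sketch: one-shot text generation with Gemini 1.5 Flash.
# Assumes `pip install google-generativeai` and GOOGLE_API_KEY in the env.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Model name as exposed in Google AI Studio; treat it as an assumption.
model = genai.GenerativeModel("gemini-1.5-flash-latest")
response = model.generate_content(
    "Summarize the key Google I/O announcements in two sentences."
)
print(response.text)
```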

AlphaFold 3 Expands Biochemical Modeling Capabilities

  • DeepMind's latest AlphaFold 3 model can predict 3D structures of not just proteins, but all biomolecules like DNA, RNA, and drug compounds, as well as their interactions.

  • Unlike previous versions, AlphaFold 3 is not open source and will be commercially controlled by Isomorphic Labs, a DeepMind spin-off focused on drug discovery.

  • AlphaFold 3 outperforms existing methods in accurately modeling molecular structures and interactions, achieving up to 77% success on benchmark datasets.

Business Impact:

  • Unprecedented accuracy in predicting drug-protein interactions could accelerate drug development and improve treatment efficacy.

  • Ability to model antibody-protein binding could aid in developing treatments for future pandemics and diseases.

  • Commercial control by Isomorphic Labs means businesses may need to license or partner for access to this powerful technology.

Other announcements

  • Gemma 2—Google’s best open model yet. It is a 27B-parameter model that outperforms the previous version and will be available starting in June.

  • Imagen 3—their most capable image generation model, which will be available in multiple versions, each optimized for different types of tasks, from generating quick sketches to high-resolution images.

  • Gemini Live—a voice conversation feature for Gemini, similar to ChatGPT's Voice Mode, coming later this year.

GEN AI AT WORK
FRESH NEWS! Add a file from Google Drive to ChatGPT

GEN AI Jobs
Job Opportunities in AI

Remember that we have launched our beta job watcher.

Thank you, see you next week!
