10 | Google Gemini Era or Hangover?

This week, we witnessed the much-anticipated launch of Gemini, which has set social networks ablaze. But why is there so much hype?
In this brief article, we will cover everything you need to know about Gemini LLM, including how to use it and its use cases.
Google is finally alert and active, firing in all directions. We might expect a response from Apple soon.
Our Menu :)
Weekly Digest:
Pixel 8 Pro is the first smartphone engineered to run Gemini Nano 📱
Gemini Research Paper Briefing 📑
Actionable Tips:
Gemini Use Cases 🎯
Training & Crash Courses:
Google AI Training - Now Free 🆓
WEEKLY DIGEST
From Gemini Era to Gemini Hangover

What is Gemini?
On December 6th, 2023, Google DeepMind introduced its most advanced artificial intelligence model, a technology capable of processing various forms of information such as video, audio, and text.
3 Different Sizes for 3 Different Use Cases:
Gemini Ultra: The largest model, designed for highly complex tasks (competing with GPT-4 and GPT-4V).
Gemini Pro: Scales across a wide range of tasks (comparable to GPT-3.5 Turbo).
Gemini Nano: Built for specific tasks and mobile devices (a competitor to Apple's Siri).
How Can You Use Google Gemini?
Gemini Pro will be integrated into Google products such as Gmail, YouTube, Docs, and more. It will be available on Google Bard starting from December 6th, 2023.
Note: Bard is now available in English in more than 170 countries, but NOT in Canada, the UK, or Europe.
Gemini Pro (API): Developers and enterprise customers
Starting December 13th, 2023, Gemini Pro will be available for developers and enterprise customers through an API on Google AI Studio or Google Cloud Vertex AI.
This allows them to start exploring and integrating Gemini into their projects.
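As a quick illustration, here is a minimal sketch of how a developer could call Gemini Pro through that API using Google's generative AI Python SDK; the package, model name, and prompt below follow the public developer docs but should be treated as assumptions that may change:

```python
# pip install google-generativeai
import google.generativeai as genai

# Configure the client with an API key created in Google AI Studio (placeholder key).
genai.configure(api_key="YOUR_API_KEY")

# "gemini-pro" is the text model announced for the December 13 API rollout.
model = genai.GenerativeModel("gemini-pro")

response = model.generate_content(
    "Summarize the differences between Gemini Ultra, Pro, and Nano in three bullet points."
)
print(response.text)
```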
Gemini Nano:
Currently exclusive to Pixel 8 Pro devices.
May become available for other Android devices in the future.
Why is it Relevant?
To understand the significance of its marketing campaign, consider its promotional video. It is truly 🤯🤯🤯
Here we see a clear separation of Gemini from the rest of the players.
Let's recap =>
Takeaways from the Paper and Announcements:
🤔 Gemini Ultra analyzes video in real time, with such an understanding of the frames (photograms) that it implies the model makes inferences only about the relevant objects shown, with no further guidance.
🤔 Gemini Ultra exhibits an understanding of context; for instance, when a person made a hand gesture for Rock-Paper-Scissors, the model recognized that the person was playing a game, rather than merely describing the hand's movement.
🤔 Gemini Ultra shows a level of human-like interaction never seen before. For example, it was able to create a question-based game in real time within seconds, reflecting a high degree of creativity and initiative.
Even the most advanced models, like GPT-4V, cannot achieve what was claimed today, as they are incapable of analyzing a sequence of frames along the temporal axis. Yet it's worth noting that GPT-4V is already a breathtaking model that astonished the world just two weeks ago.
🌟 The Gemini suite was developed with native multimodality. This is significant because, for instance, it enables a nuanced understanding of languages like Mandarin, where tone is crucial. This differs from other models that transcribe audio to text and lose the tone.
🎨 As part of its multimodality capabilities, Gemini can process inputs like images, text, video, and audio, and output text and images.
🏆 Google asserts that Gemini surpasses GPT-4 in "30 of the 32 widely used academic benchmarks."
🤔 With a score of 90%, Gemini Ultra is the first AI model to outperform human experts on the MMLU benchmark.
📊 Gemini is available in three tailored versions: Ultra for complex tasks, Pro for scalability, and Nano for on-device efficiency.
🛠️ Starting December 13, Gemini Pro will be accessible to developers through an API on Google AI Studio or Google Cloud Vertex AI.

Google Gemini Hangover
Let's move beyond the initial buzz from yesterday to gain a clearer understanding of what we witnessed, starting with the less impressive aspects and moving to the more remarkable ones.
About the MMLU benchmark 🤔
It stands for Massive Multitask Language Understanding, and it is a relatively new way to test AI models. The goal is to see whether a model can use what it knows to answer questions, even if it hasn't been specifically trained on those questions.
The benchmark also provides an estimate of expert-level human performance, which is around 89.8% accuracy. Simply put, if an AI model scores more than 89.8%, it performs better than human experts.
The MMLU benchmark covers 57 different subjects, including science, history, and more, and it has questions that range from easy to very hard.
According to Google, Gemini Ultra outperforms human experts across these 57 subjects taken together, something not even ChatGPT (GPT-4), the current state-of-the-art LLM, can do. This is a bold statement.
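To make the mechanics concrete, here is a toy sketch of how an MMLU-style evaluation is scored: each question is multiple choice, the model picks one option, and accuracy is simply the fraction of correct picks, reported per subject and averaged overall. The sample questions and the ask_model placeholder below are illustrative inventions, not real benchmark items or a real model call:

```python
from collections import defaultdict

# Toy MMLU-style items: (subject, question, options, correct letter). Illustrative only.
QUESTIONS = [
    ("astronomy", "Which planet is closest to the Sun?",
     {"A": "Venus", "B": "Mercury", "C": "Mars", "D": "Earth"}, "B"),
    ("world_history", "In which year did World War II end?",
     {"A": "1942", "B": "1945", "C": "1948", "D": "1950"}, "B"),
]

def ask_model(question: str, options: dict) -> str:
    """Placeholder for a call to the model under test; must return 'A', 'B', 'C', or 'D'."""
    return "B"

per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
for subject, question, options, answer in QUESTIONS:
    pick = ask_model(question, options)
    per_subject[subject][0] += int(pick == answer)
    per_subject[subject][1] += 1

# MMLU reports accuracy for each of its 57 subjects plus an overall average.
for subject, (correct, total) in per_subject.items():
    print(f"{subject}: {correct / total:.1%}")
```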

The benchmark is not without its issues. The credibility of the claimed 89.8% human-expert baseline is questionable, considering there is a likely margin of error of 2-3% in the test itself.
Furthermore, the different prompting strategies used for each model make direct comparisons dubious, and presenting MMLU results to two decimal places is misleading given the inherent errors in the test. So while MMLU is a useful benchmark, it should not be over-interpreted when comparing the performance of large language models.
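As a back-of-the-envelope illustration of why two decimal places overstate the precision, the sketch below computes the binomial sampling error for an accuracy measured on a finite test set; the question count and score are assumptions chosen only for illustration, and sampling error is just one of the error sources mentioned above:

```python
import math

def accuracy_ci(accuracy: float, n_questions: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% binomial confidence interval for an accuracy measured on n_questions items."""
    se = math.sqrt(accuracy * (1 - accuracy) / n_questions)
    return accuracy - z * se, accuracy + z * se

# Illustrative only: assume roughly 14,000 test questions and a measured accuracy of 90.04%.
low, high = accuracy_ci(0.9004, 14_000)
print(f"90.04% measured -> 95% CI roughly {low:.2%} to {high:.2%}")
# Sampling noise alone already spans about ±0.5%, before prompting-strategy differences are considered.
```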
Source 1: Hugging Face
Source 2: AI Explained
About the Promotional Video
"It's not what happens to you that matters, but how you react to it."
The authenticity of the video we witnessed is questionable, or at the very least, it might not be entirely accurate.
Shortly after the release of the promotional video, Google published an article detailing how the video was produced. (Linked below)
This revelation raises numerous questions and suggests that Gemini's responses were not generated in real time from the video and the user's voice. Instead, it appears that Gemini's reactions were based on a series of pre-selected frames (photograms) and carefully crafted prompts that directed the responses of Google's Gemini Ultra.
For example, at 2:45 minutes in the video, we see footage of a hand playing Rock-Paper-Scissors, which Gemini appears to recognize instantly.
However, the reality behind this demonstration is different:
1️⃣ The model was shown three separate pictures representing each hand movement: Rock, Paper, and Scissors.
Each image was paired with a prompt to guide the model towards the correct inference (an example is provided below).

2️⃣ They combined the three pictures with well-crafted text in a single prompt, thereby leading Gemini to the intended response.

After providing the initial information, Google used additional prompts to ensure that Gemini fully grasped the context.
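A hypothetical reconstruction of that frame-plus-prompt setup with the public Gemini API might look like the sketch below; the file names and prompt wording are assumptions, and 'gemini-pro-vision' is the multimodal model name exposed through Google's generative AI SDK:

```python
# pip install google-generativeai pillow
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro-vision")

# Three pre-selected frames, one per hand shape (hypothetical file names).
frames = [Image.open(path) for path in ("rock.jpg", "paper.jpg", "scissors.jpg")]

# A single prompt combining the frames with guiding text: the hint about a game
# steers the model toward the intended inference, rather than leaving it unguided.
response = model.generate_content(
    ["What do you think I'm doing? Hint: it's a game.", *frames]
)
print(response.text)
```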
…
With CHATGPT
We replicated the same sequence using ChatGPT with GPT-4 and arrived at a similar conclusion to that of Gemini.
To illustrate, here's how GPT-4 responded to the action described in step 2️⃣.
It appears that you are playing a game of "Rock, Paper, Scissors." The three images represent the three elements of the game:
1) The first image with the hand open and fingers extended represents "Paper."
2) The second image with a clenched fist represents "Rock."
3) The third image with two fingers extended in a V shape represents "Scissors."
This hand game is often used as a selection method in a similar way to tossing a coin or drawing straws, to decide between two or more parties where there is a dispute or to determine who gets to go first in some activity.
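For anyone who wants to reproduce the experiment, here is a minimal sketch of sending the same three frames in a single message to GPT-4 with vision via the OpenAI Python SDK; the model name, file names, and prompt wording are assumptions based on the public documentation available at the time:

```python
# pip install openai
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def image_part(path: str) -> dict:
    """Encode a local frame as a base64 data URL in the chat 'image_url' content format."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What do you think I'm doing? Hint: it's a game."},
            image_part("rock.jpg"),
            image_part("paper.jpg"),
            image_part("scissors.jpg"),
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```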
...Continuing in the same vein...
There is a significant difference between an AI making real-time inferences about a situation based on its analysis of video footage and user voice commands, and training the machine using pre-selected images while also employing carefully crafted prompts to steer the AI's response.
⁉️ Many ambiguities undermine the true achievements of Google's Gemini Ultra, which are, nonetheless, remarkable and compete shoulder to shoulder with GPT-4.
Below is the article where Google explains the making of the video
WEEKLY DIGEST
📱 Pixel 8 Pro is the first smartphone engineered to run Gemini Nano
The arrival of LLMs natively on phones
Despite the buzz around comparing Gemini Ultra with GPT-4, a major headline is the introduction of Large Language Models (LLMs) natively to smartphones.
While ChatGPT indeed has a dedicated application, it is constrained by the operating system's limitations and faces challenges in interoperability with other apps. Its functionality also ceases when offline.
This is where the power of Gemini Nano comes into play, split into two versions: Nano-1 with 1.8 billion parameters and Nano-2 with 3.25 billion parameters. These models are tailor-made for on-device operations, especially optimized for performance within Android environments.
A key feature of Gemini Nano is its ability to function without an internet connection. According to the announcement, it excels in:
Summarization,
Reading comprehension,
Text completion tasks,
Impressive reasoning capabilities,
STEM and coding support
(Science, Technology, Engineering, and Mathematics)
Multimodal interactions (Voice, Image, Text, and Video),
Multilingual tasks, relative to their size
WEEKLY DIGEST
Gemini Research Paper Briefing
✏️ Supports text, vision, and audio inputs, with text and image outputs (e.g., transcription and image generation).
📏 Features a decoder architecture with a 32,000-token context length and Multi-Query Attention (MQA); a minimal sketch of MQA follows after the links below.
👀 Incorporates a visual encoder inspired by the Flamingo model.
📚 Trained on a diverse range of data including web documents, books, code, images, audio, and video. However, the total number of tokens used in training is unspecified.
⚡️ Utilizes both TPUv5e and TPUv4 for training.
⬆️ The performance of Gemini Ultra is comparable to, or slightly surpasses, that of GPT-4.
📇 AlphaCode 2 demonstrates revolutionary AI coding ability, solving problems GPT-4 cannot by massively sampling candidate solutions and scoring them with fine-tuned Gemini models.
📇 AlphaCode 2 scores around the 87th percentile among human competitors.
💪 Exhibits strong capabilities in reasoning, coding, and language understanding.
🔁 Employs Reinforcement Learning from Human Feedback (RLHF) for fine-tuning.
❌ Lacks information on the sizes of the Ultra and Pro models.
❌ Training data specifics are not comprehensively detailed.
Blog post: https://lnkd.in/eud5Wd74
Technical Report: https://lnkd.in/e5NvbhRv
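Since the report highlights multi-query attention, here is a minimal, framework-agnostic sketch of the idea mentioned in the architecture bullet above: each head keeps its own query projection while all heads share a single key and value projection, which shrinks the key/value cache during decoding. The dimensions below are illustrative and not Gemini's actual configuration:

```python
import numpy as np

def multi_query_attention(x: np.ndarray, n_heads: int, d_head: int) -> np.ndarray:
    """Toy multi-query attention: per-head queries, one shared key/value head."""
    rng = np.random.default_rng(0)
    seq_len, d_model = x.shape
    w_q = rng.standard_normal((d_model, n_heads * d_head)) / np.sqrt(d_model)
    w_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)  # shared across heads
    w_v = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)  # shared across heads

    q = (x @ w_q).reshape(seq_len, n_heads, d_head)              # (T, H, d)
    k, v = x @ w_k, x @ w_v                                       # (T, d) each: a single KV head
    scores = np.einsum("thd,sd->hts", q, k) / np.sqrt(d_head)     # (H, T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                # softmax over the key axis
    out = np.einsum("hts,sd->thd", weights, v)                    # (T, H, d)
    return out.reshape(seq_len, n_heads * d_head)

# Illustrative shapes only: 8 tokens, model width 64, 4 heads of size 16.
tokens = np.random.default_rng(1).standard_normal((8, 64))
print(multi_query_attention(tokens, n_heads=4, d_head=16).shape)  # (8, 64)
```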

ACTIONABLE TIPS
Use cases
Research => A task that would take researchers hundreds, if not thousands, of hours was achieved over a lunch break.
Coding => Gemini is great at competitive programming.
AlphaCode 2 performs better than an estimated 85% of its human counterparts.
That percentile climbs even higher when the model is paired with a human.
Creativity =>
Education =>
TRAINING
Google launched a 100% free learning path for Generative AI 🎉
🤖 Intro to Generative AI
🤖 Intro to Large Language Models
🤖 Intro to
Since it is Google week 🙂 👉
Thank you, see you next week!