Innovation Village | Technology, Product Reviews, Business

    Google unveils Gemini 2.0 Flash AI model with advanced image and audio generation capabilities

    By Tapiwa Matthew Mutisi on December 11, 2024 | Artificial Intelligence, Google, News, Technology

    Google has unveiled its next major AI model, Gemini 2.0 Flash, designed to compete with the latest offerings from OpenAI. The new model, announced on Wednesday, can natively generate images and audio in addition to text. It can also use third-party apps and services, which lets it tap into Google Search, execute code, and more.

    An experimental release of 2.0 Flash is now available through the Gemini API and Google’s AI developer platforms, AI Studio and Vertex AI. However, the audio and image generation capabilities are initially accessible only to “early access partners,” with a broader rollout planned for January.
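
For developers who want to try the experimental release, a text request to the Gemini API might be assembled roughly as follows. This is a minimal sketch, not something taken from Google's announcement: the endpoint path, the `gemini-2.0-flash-exp` model ID, and the `x-goog-api-key` header are assumptions based on the public Gemini REST API, and `build_request` is a hypothetical helper. It only builds the request and does not send it.

```python
import json
import urllib.request

# Assumptions (not stated in the article): the REST base URL and the
# experimental model ID below follow Google's public Gemini API conventions.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL_ID = "gemini-2.0-flash-exp"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent POST request for a text prompt."""
    url = f"{API_BASE}/models/{MODEL_ID}:generateContent"
    # A single-turn request body: one content entry holding one text part.
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # assumed API-key header
        },
        method="POST",
    )
```

With a valid key, passing the returned object to `urllib.request.urlopen` would send the request; the response is expected to be JSON, but its exact shape should be checked against the current API reference.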

    In the coming months, Google plans to integrate 2.0 Flash into a variety of products, including Android Studio, Chrome DevTools, Firebase, Gemini Code Assist, and others.

    Flash, Upgraded

    The first-generation Flash, known as 1.5 Flash, was limited to text generation and not designed for particularly demanding workloads. The new 2.0 Flash model is significantly more versatile, partly because it can call tools like Search and interact with external APIs.

    “We know Flash is extremely popular with developers for its balance of speed and performance,” said Tulsee Doshi, head of product for the Gemini model at Google, during a briefing on Tuesday. “And with 2.0 Flash, it’s just as fast as ever, but now it’s even more powerful.”

    Google says that, in its own testing, 2.0 Flash is twice as fast as the company’s Gemini 1.5 Pro model on certain benchmarks. The new model is also “significantly” improved in areas such as coding and image analysis, and it replaces 1.5 Pro as the flagship Gemini model thanks to its superior math skills and “factuality.”

    New Capabilities

    2.0 Flash can generate and modify images alongside text. It can also ingest photos, videos, and audio recordings to answer questions about them, such as “What did he say?”

    Another key feature of 2.0 Flash is its audio generation capability, which Doshi described as “steerable” and “customizable.” For instance, the model can narrate text using one of eight voices optimized for different accents and languages. Users can ask it to adjust the speed of speech or even to speak in a specific style, like a pirate.

    Despite these advancements, Google has not yet provided image or audio samples from 2.0 Flash, so it remains to be seen how the quality of its outputs compares with that of other models.

    To address concerns about misuse, Google is using its SynthID technology to watermark all audio and images generated by 2.0 Flash. On software and platforms that support SynthID, the model’s outputs will be flagged as synthetic. This measure aims to mitigate the growing threat of deepfakes, which saw a fourfold increase in detections worldwide from 2023 to 2024, according to ID verification service Sumsub.

    Multimodal API

    The production version of 2.0 Flash is set to launch in January. In the meantime, Google is releasing an API called the Multimodal Live API to help developers build apps with real-time audio and video streaming functionality.

    The Multimodal Live API allows developers to create real-time, multimodal apps with audio and video inputs from cameras or screens. It supports the integration of tools to accomplish tasks and can handle “natural conversation patterns” such as interruptions, similar to OpenAI’s Realtime API.

    The Multimodal Live API is generally available as of today, so developers can start building on 2.0 Flash’s capabilities in their applications.

    Tapiwa Matthew Mutisi has been covering blockchain technology, intelligent technologies, cryptocurrency, cybersecurity, telecommunications technology, sustainability, autonomous vehicles, and other topics for Innovation Village since 2017. In the years since, he has published over 4,000 articles — a mix of breaking news, reviews, helpful how-tos, industry analysis, and more. | Open DM on Twitter @TapiwaMutisi

    Copyright ©, 2013-2024 Innovation-Village.com. All Rights Reserved
