AI Like Her: Imperfect Artificial Intelligence

TechCrunch AI Newsletter: Initial Impressions of OpenAI’s Advanced Voice Mode with Vision

Welcome to the latest edition of TechCrunch’s AI newsletter. To receive this directly in your inbox each Wednesday, please subscribe via the provided link.

OpenAI’s New Feature: Advanced Voice Mode with Vision

Recently, OpenAI introduced Advanced Voice Mode with Vision, a capability that provides ChatGPT with real-time video input. This allows the chatbot to process visual information and expand its understanding beyond text-based interactions.

The core idea behind this advancement is to enhance ChatGPT’s contextual awareness, leading to more natural and intuitive responses.

Early Experiences and Inaccuracies

However, initial testing revealed inaccurate interpretations. During a first attempt, the chatbot mistook an ottoman for a sofa when asked to describe a living room.

Upon correction, ChatGPT acknowledged the error, but continued to characterize the space as comfortable.

From Demo to Reality

Advanced Voice Mode with Vision debuted roughly seven months after its initial demonstration. OpenAI initially presented it as a significant step towards achieving AI functionality similar to that depicted in the film “Her.”

The company suggested this feature would empower ChatGPT with abilities such as solving visual math problems, interpreting emotions, and responding thoughtfully to written correspondence.

Reliability Remains a Key Challenge

While the feature demonstrates progress in several areas, it hasn’t fully addressed ChatGPT’s fundamental issue of reliability. In some cases, Advanced Voice Mode with Vision actually makes the bot’s inaccuracies more noticeable.

Fashion Feedback and Further Slip-Ups

When tested with a fashion-related query, ChatGPT provided feedback on an outfit but consistently failed to recognize a brown jacket that was being worn.

This wasn’t an isolated incident. OpenAI president Greg Brockman also experienced a miscalculation during a demonstration on “60 Minutes,” where ChatGPT incorrectly identified the height of a triangle when calculating its area.

The Question of Trust

This raises a critical question: what value does AI with “Her”-like capabilities hold if its accuracy cannot be guaranteed?

User Experience and Implicit Trust

Each misinterpretation makes it less appealing to engage with the feature, which already takes several steps to launch. The feature’s optimistic presentation aims to build trust.

When that trust is broken, the experience is both unsettling and disappointing.

Looking Ahead

While OpenAI may eventually resolve the issue of hallucinations, the current state leaves us with a bot that perceives the world with inherent flaws. It remains uncertain whether this is a desirable outcome for users.

Recent Developments in Artificial Intelligence

OpenAI is maintaining a rapid pace of innovation with its “shipmas” event, unveiling new products daily leading up to December 20th. A comprehensive summary of these announcements is being continuously updated.

Content Creator Control on YouTube

YouTube is giving content creators more control over how their material is used. Specifically, creators can now indicate whether they permit third parties to use their content to train AI models.

Enhancements to Meta’s Smart Glasses

Meta’s Ray-Ban Meta smart glasses have received several new updates leveraging AI technology. These include the capability for continuous dialogue with Meta’s AI assistant and real-time language translation.

DeepMind’s Veo 2: A Challenge to OpenAI

Google DeepMind is actively developing advanced video generation AI. Veo 2, announced on Monday, is designed to compete with OpenAI’s offerings and can produce videos longer than two minutes at resolutions up to 4K (4,096 × 2,160 pixels).

Tragic Loss of OpenAI Researcher

Suchir Balaji, a former employee of OpenAI, was recently found deceased in his San Francisco residence, as confirmed by the San Francisco Office of the Chief Medical Examiner. Prior to his death, the 26-year-old AI researcher had voiced concerns regarding potential copyright infringements by OpenAI during an interview with The New York Times.

Grammarly’s Acquisition of Coda

Grammarly, renowned for its grammar and style checking software, has acquired the productivity startup Coda. The financial terms of the acquisition were not disclosed. Shishir Mehrotra, Coda’s CEO and co-founder, will assume the role of CEO at Grammarly as part of this transaction.

Cohere and Palantir Collaboration

A partnership between Cohere, an AI startup valued at $5.5 billion, and Palantir, a data analytics company, has been exclusively reported by TechCrunch. Palantir is well-known for its collaborations with U.S. defense and intelligence agencies, which have sometimes been subject to scrutiny.

Key Takeaways

OpenAI continues to release new AI products at a rapid rate.
Content creators gain more control over AI training data usage on YouTube.
Meta is integrating AI into its smart glasses for enhanced functionality.
DeepMind is developing a powerful video generation AI, Veo 2.
The AI community mourns the loss of former OpenAI researcher Suchir Balaji.
Grammarly expands its capabilities through the acquisition of Coda.
Cohere partners with Palantir for data analytics applications.

A Weekly Look at AI Research: Anthropic's Clio

Anthropic has recently revealed details about Clio, a system designed for analyzing how users interact with its AI models. Clio, described as analogous to tools like Google Trends, is being used to enhance the safety features of Anthropic’s artificial intelligence.

The company utilized Clio to gather and analyze anonymized user data, with a portion of these findings released publicly. This data sheds light on the diverse applications of Anthropic’s AI.

Key Use Cases for Anthropic AI

Customers are employing Anthropic’s AI for a broad spectrum of tasks. Web and mobile app development, content creation, and academic research are currently the most prevalent applications.

Interestingly, usage patterns differ based on language. For instance, users communicating in Japanese are significantly more inclined to request anime analysis from the AI than those using Spanish.

The insights gained from Clio are proving to be “valuable” according to Anthropic, directly contributing to improvements in AI safety protocols.

Pika 2: A New Advancement in AI Video Generation

The AI company Pika has recently unveiled its latest video generation model, designated Pika 2. This new iteration possesses the capability to produce video clips based on user-provided inputs of characters, objects, and settings.

Pika’s platform allows users to upload several reference materials, such as images depicting a conference room and personnel. Pika 2 then intelligently determines the function of each reference before integrating them into a cohesive visual scene.

Understanding Pika 2’s Capabilities

While no current model achieves perfection, Pika 2 demonstrates significant progress. One example clip illustrates the model’s ability to maintain consistency, though it also exhibits the aesthetic anomalies characteristic of AI-generated video.

Despite these minor imperfections, video generation tools are advancing at a remarkable pace. That progress is prompting both excitement and concern among creative professionals as AI continues to reshape video production.

AI Safety Assessments: A New Index

The Future of Life Institute (FLI), a nonprofit established with the involvement of MIT cosmologist Max Tegmark, has published an “AI Safety Index.” This index serves as a means of assessing the safety protocols implemented by prominent AI developers.

The evaluation focuses on five crucial domains: current harms, safety frameworks, existential safety strategy, governance and accountability, and transparency and communication.

Index Findings and Company Performance

According to the Index, Meta received the lowest score among the companies analyzed, earning an overall grade of F. The scoring system employs both numerical values and a GPA-style assessment.

While Anthropic achieved the highest ranking, its performance still resulted in a C grade. This indicates that substantial enhancements are needed across the board in AI safety practices.

Key Areas of Evaluation

Current Harms: Assessing the immediate negative impacts of AI systems.
Safety Frameworks: Examining the established procedures for ensuring AI safety.
Existential Safety Strategy: Evaluating plans to address potential long-term, catastrophic risks.
Governance and Accountability: Determining the structures in place for responsible AI development.
Transparency and Communication: Measuring the openness of AI companies regarding their safety efforts.

The FLI’s index provides a benchmark for evaluating the commitment of AI organizations to responsible development. It highlights the need for continued progress in bolstering AI safety across all evaluated categories.
