The AI Weekly Breakthrough | Issue 5 | April 9, 2024

Welcome to The AI Weekly Breakthrough, a roundup of the news, technologies, and companies changing the way we work and live.

More Agents Is All You Need

Midjourney depiction of robots as call center agents

A study using a simple sampling-and-voting method showed that LLM performance scales with the number of agents used. By increasing the number of agents (that is, instances of an LLM queried on the same task), researchers from Tencent found that overall performance improves across various LLMs and types of tasks, ranging from arithmetic reasoning to code generation. Their experiments revealed that a brute-force ensemble of smaller LLMs could match or even surpass the performance of larger LLMs. The study also examines how task difficulty affects the observed gains and concludes by proposing ways to make the “More Agents” approach more effective.
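To make the idea concrete, here is a minimal sketch of sampling and voting, not the researchers' implementation. It assumes answers can be compared as exact strings and picks the most frequent one. The names `sample_and_vote`, `sample_fn`, and `make_noisy_agent` are hypothetical, and the toy agent simply stands in for a real LLM call.

```python
import random
from collections import Counter
from typing import Callable, List


def sample_and_vote(query: str, sample_fn: Callable[[str], str], num_agents: int) -> str:
    """Ask `sample_fn` the same query `num_agents` times; return the most common answer.

    `sample_fn` is a hypothetical stand-in for one LLM call (an assumption of this sketch).
    """
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    # Sampling phase: collect one answer per agent.
    answers: List[str] = [sample_fn(query).strip() for _ in range(num_agents)]
    # Voting phase: majority vote; ties go to the answer seen first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


def make_noisy_agent(correct: str, accuracy: float, rng: random.Random) -> Callable[[str], str]:
    """Toy agent: returns `correct` with probability `accuracy`, else a random digit."""
    def agent(query: str) -> str:
        if rng.random() < accuracy:
            return correct
        return str(rng.randint(0, 9))
    return agent


if __name__ == "__main__":
    rng = random.Random(0)
    agent = make_noisy_agent(correct="42", accuracy=0.5, rng=rng)
    for n in (1, 5, 25):
        print(n, "agents ->", sample_and_vote("What is 6 * 7?", agent, n))
```

Because the toy agent's wrong answers are scattered across many values while its correct answers agree, the vote tends to settle on the correct answer as the number of agents grows. Free-form outputs such as generated code would need a similarity-based comparison rather than exact string matching.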

Many-Shot Jailbreaking

Midjourney depiction of a robot sitting in a prison cell reading a book

Anthropic’s researchers have documented a jailbreaking technique that bypasses the safety measures of current LLMs. Many-shot jailbreaking exploits large context windows by packing a lengthy faux dialogue between a human and an AI assistant into a single prompt. In this dialogue, the AI assistant readily answers a series of potentially harmful queries from the human; only the final query is left unanswered, for the target LLM to complete. The study found that the more faux exchanges the prompt contains, the more likely the LLM is to comply with the harmful final query. Anthropic is publishing its findings openly so that the wider AI community can accelerate the development of more robust defenses against such vulnerabilities.

Cohere’s New Enterprise-Grade LLM

CommandR from Cohere

Cohere debuts its latest LLM, Command R+, on Microsoft Azure. Built on the foundations of the earlier Command R model, this enterprise-grade model features a 128k-token context window and offers retrieval-augmented generation (RAG) with citations to reduce inaccuracies, multilingual support in 10 business languages, and a Tool Use API for automating complex workflows. According to Cohere, Command R+ is “optimized for conversational interaction and long-context tasks [and] aims at being extremely performant, enabling companies to move beyond proof of concept and into production.”

Google Contemplates Paid Search

Google logo on the wall

Google is contemplating charging for a premium GenAI-powered version of its search engine, marking a significant departure from its long-held business model. This initiative aims to add AI-enhanced search capabilities to its subscription services, which already include access to Gemini AI in Gmail and Docs. Although the tech giant’s traditional search services will remain free and ad-supported, this move signifies Google’s first foray into charging for improvements to its primary search product.

Epic Moves into AI

Midjourney depiction of a scientist looking at patient's health records

Epic, a leader in electronic health records, will launch AI validation software designed to let healthcare organizations evaluate and monitor AI models locally for accuracy and efficacy. The software automates data collection and mapping, provides real-time metrics and analysis on tested AI models, and features reporting dashboards and a common monitoring template. It supports local auditing of AI models against an organization's own patient populations and workflows, which matters because a rural hospital’s needs can differ significantly from those of a specialized urban center. Experts have welcomed the step, noting that it helps ensure AI models are validated and monitored in the specific context of each healthcare setting.

Augment Yourself 🤖
