As enterprises race to integrate AI agents into operations, many are discovering a hard truth: it’s not the models holding them back—it’s the mess.

Specifically, the mess of unstructured data.

While much of the excitement in enterprise AI focuses on models, tools, and interfaces, one fundamental layer is too often ignored: the quality, consistency, and readiness of the knowledge feeding these systems.

In the recent Shelf webinar, “AI Agents: From Hype to Reality – A Strategic Roadmap for Enterprise Transformation,” Jan Stihec, Director of Data and Generative AI at Shelf, cut straight to the point:

“Data quality is the number one barrier preventing GenAI success. It shows up in survey after survey, and yet, 85% of professionals say their leadership isn’t addressing it.”

The message is clear: AI agents are only as good as the knowledge they’re built on. And if that knowledge is incomplete, inconsistent, or outdated, the agents will behave accordingly—poorly, sometimes dangerously.

The Hidden Complexity of Unstructured Enterprise Knowledge

What exactly are we talking about when we say “unstructured data”?

It’s the everyday fabric of enterprise knowledge: PDFs, PowerPoints, emails, policies, presentations, training manuals, customer service scripts, outdated spreadsheets, internal wikis, and product catalogs. Unlike structured databases, these formats are harder to classify, link, deduplicate, or update.

Stihec shared survey findings from over 500 enterprise teams, revealing where unstructured data is most commonly used in GenAI initiatives:

  • 79% use PDFs and Word documents
  • 67% tap into email repositories
  • 63% rely on PowerPoint and other presentations
  • 52% leverage internal knowledge base articles

These are not edge cases. These are the core ingredients of enterprise knowledge work—and therefore the raw material for enterprise AI agents.

Unfortunately, this raw material is often deeply flawed.

“Over 90% of files contain at least one major inaccuracy,” said Stihec, referencing Shelf’s proprietary analysis of millions of enterprise documents. “33% are duplicates or redundant. A full quarter are outdated. And 12% pose compliance risks.”

These issues are not theoretical. They’re operational hazards—especially in the context of AI agents designed to act on knowledge.

The AI Agent Amplification Effect

The consequences of poor data quality are magnified in AI agent systems.

Traditional chatbots might generate incorrect answers based on bad inputs. That’s problematic, but often manageable. AI agents, on the other hand, don’t just answer—they act. They make decisions, update records, trigger workflows, and plan next steps autonomously.

“With AI agents, inaccurate data doesn’t just produce bad responses—it produces bad actions,” said Stihec. “This has downstream consequences that can impact customer experience, revenue, and compliance.”

Consider a contact center agent that retrieves a deprecated product catalog. The agent might quote the wrong price to a customer, apply the wrong discount, or overlook a critical promotion.

Or a sales enablement agent that provides an account executive with an outdated pricing model, leading to a misquoted proposal—and potentially, a lost deal or financial write-down.

Worse still, in multi-agent architectures—where outputs from one agent become inputs for another—errors cascade. A single flawed document can compromise the entire agent network.

“We’ve seen cases where one inaccurate file created a ripple effect across four or five agents,” said Stihec. “This is not hypothetical—it’s happening right now.”
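To make the failure mode concrete, here is a minimal, hypothetical sketch (in Python) of the kind of pre-action guard that would have caught the deprecated-catalog scenario above. The field names (status, last_reviewed) and the 180-day freshness window are assumptions made for this illustration, not a description of any particular product.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical pre-action guard: the "status" and "last_reviewed" fields and
# the 180-day freshness window are assumptions made for this sketch.
MAX_AGE = timedelta(days=180)

def safe_to_act(doc: dict, now: datetime | None = None) -> bool:
    """Return True only if the retrieved document looks current enough to act on."""
    now = now or datetime.now(timezone.utc)
    if doc.get("status") == "deprecated":
        return False
    last_reviewed = doc.get("last_reviewed")
    if last_reviewed is None or now - last_reviewed > MAX_AGE:
        return False
    return True

# A contact-center agent retrieves a price from an old catalog entry.
catalog_entry = {
    "title": "Product catalog 2022",
    "status": "deprecated",
    "last_reviewed": datetime(2022, 3, 1, tzinfo=timezone.utc),
    "price": 499.00,
}

if safe_to_act(catalog_entry):
    print(f"Quote price: {catalog_entry['price']}")  # agent proceeds with the action
else:
    print("Stale or deprecated source: escalate to a human before quoting.")
```

The point is not the specific check but where it sits: before the agent commits an action, rather than after a customer has already received the wrong quote.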

Why Traditional Data Management Isn’t Enough

Enterprise teams have long focused on structured data governance—cleaning CRMs, updating fields, managing taxonomies. But unstructured data has largely escaped the same rigor.

There are several reasons:

  • It’s spread across platforms—SharePoint, Google Drive, Dropbox, Confluence, Slack, and more
  • It exists in multiple versions, authored and edited by different teams
  • It wasn’t designed for AI—most legacy content was never meant to power autonomous systems

“No one was creating PDFs 10 years ago expecting them to be used by AI agents,” Stihec noted. “So the formats, language, structure, and even the truthfulness of those documents may no longer be valid.”

In other words, enterprise knowledge is often a museum—not a mission control center.

What Enterprises Actually Need: Continuous Knowledge Validation

So what’s the solution?

First, the answer is not to hire an army of content managers to manually audit and rewrite every policy or training document. That’s neither realistic nor scalable.

Instead, organizations need automated, intelligent systems (sketched after this list) that can:

  • Continuously assess data quality in real-time
  • Identify outdated, conflicting, duplicated, or incomplete content
  • Flag risky documents before they enter an AI agent’s memory
  • Enable teams to fix content at the source—and keep it fixed
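As a rough illustration of the first two capabilities, here is a minimal, hypothetical Python sketch of a corpus scan run before documents reach an agent’s knowledge base. The document fields (doc_id, text, last_modified) and the one-year staleness threshold are assumptions made for the example, not Shelf’s implementation.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Hypothetical corpus scan: document fields and the one-year staleness
# threshold are assumptions made for this sketch.
STALE_AFTER = timedelta(days=365)

def scan_corpus(docs: list[dict], now: datetime | None = None) -> dict:
    """Flag exact duplicates and stale documents before they reach an agent."""
    now = now or datetime.now(timezone.utc)
    seen: dict[str, str] = {}  # content fingerprint -> first doc_id seen
    report = {"duplicates": [], "stale": []}

    for doc in docs:
        # Normalize whitespace so trivially re-saved copies hash identically.
        fingerprint = hashlib.sha256(" ".join(doc["text"].split()).encode()).hexdigest()
        if fingerprint in seen:
            report["duplicates"].append((doc["doc_id"], seen[fingerprint]))
        else:
            seen[fingerprint] = doc["doc_id"]

        if now - doc["last_modified"] > STALE_AFTER:
            report["stale"].append(doc["doc_id"])

    return report

# Example: one redundant copy and one outdated policy.
corpus = [
    {"doc_id": "refund-policy-v1", "text": "Refunds within 30 days.",
     "last_modified": datetime(2021, 5, 1, tzinfo=timezone.utc)},
    {"doc_id": "refund-policy-v1-copy", "text": "Refunds  within 30 days.",
     "last_modified": datetime(2024, 2, 1, tzinfo=timezone.utc)},
]

print(scan_corpus(corpus, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))
# {'duplicates': [('refund-policy-v1-copy', 'refund-policy-v1')], 'stale': ['refund-policy-v1']}
```

A production system would add near-duplicate detection, conflict detection across versions, and compliance checks, but even a simple gate like this keeps the most obviously stale or redundant files out of an agent’s context.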

“If you had a magic wand,” Stihec said, “the three capabilities companies most want are: the ability to identify inaccuracies, detect conflicting content, and surface missing knowledge. That’s exactly what Shelf is building.”

Shelf’s AI-powered knowledge platform is designed around these needs. It doesn’t just store and serve content—it validates it, monitors its impact on downstream AI behavior, and makes it easy to fix issues at the root.

For example, if an AI agent pulls the wrong answer from a conflicting policy document, Shelf can trace the source, compare similar content, and help knowledge owners reconcile the conflict—all with full transparency.

“You can’t fix what you can’t see,” said Stihec. “Our goal is to make the invisible visible—so enterprises can regain trust in their knowledge layer.”

AI Agents Are Only As Smart As Your Worst Document

The AI agent revolution promises autonomy, orchestration, and scale. But if unstructured data remains unchecked, those agents will act on noise, not knowledge.

This is why data quality is not just an IT concern; it’s a strategic imperative. Unstructured content must be treated as a first-class citizen in the AI transformation roadmap.

And as Stihec concluded, “It’s not the next-gen models that will fix your data problems. You have to address them directly, at the source—because no matter how smart the model, it can’t outreason bad input.”

The call to action for enterprise leaders is urgent: Audit your unstructured data. Invest in tools that surface and solve quality issues. Establish a governance layer that evolves with your AI maturity.

Because when it comes to AI agents, it’s simple: Garbage in, failure out. But the inverse is just as true: Good knowledge in, transformational intelligence out.