Agents in the Enterprise: A Field Guide

We recently hosted an event with some of the top CIOs and AI leaders here in the Bay Area, and used the opportunity to dive into the use of Agents in the Enterprise.

Key topics

McKinsey: The Potential of AI Agents
What’s the Definition of an Agent?
How Will Agents Impact Automation?
How Do Agents Work?
How Can Today’s CXOs Get Started on Deploying Gen AI Automation?

Part I: McKinsey: The Potential of AI Agents

In 2017, McKinsey predicted that AI could automate 40% of tasks; today they’re predicting 70%
Gen AI could yield $2.6 trillion to $4.4 trillion annually in value across more than 60 enterprise use cases – How much of this value is realized as business growth and productivity will depend on how quickly enterprises can reimagine and truly transform work in priority domains, that is, user journeys or processes across an entire chain of activities or a function
People want workflows around LLMs, not point solutions – Today, the new hotness is scaffolding, which essentially boils down to checking for quality and executing properly by addressing gaps in logic, hallucinations, context windows, memory, etc.
Workflows Will Become More Advanced Over Time – This will depend on how advanced the LLM is and how well it performs at inference. GPT-5 is expected to use at least 100x the training compute of GPT-4. It may be cost-prohibitive to replace these models over the next couple of years. Many problems, however, require increasingly less compute to solve (complex math is down 10x YoY)
Once Quality and Inference Are Solved, It Comes Down to the Engineers – Enterprise-grade, ready-to-apply software will be the gamechanger
Value Will Move Upstream – The hyperscalers and NVIDIA have already won. Microsoft, for example, is launching a generic agent framework that you’ll be able to use to build an agent. They want to be the OS of companies, controlling both data and workflows
Lots of Early Adoption in Healthcare and Financial Services – Similar to the ML wave years prior
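
The "scaffolding" idea above (checking output quality before acting on it) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `fake_llm` is a scripted stand-in rather than a real model API, and the JSON shape with a `total` field is a made-up quality rule.

```python
import json
from typing import Callable, Dict

def fake_llm(prompt: str, attempt: int) -> str:
    """Hypothetical stand-in for a model call; the first answer is malformed."""
    if attempt == 0:
        return "Sure! The total is 42"  # chatty text, not the JSON we asked for
    return json.dumps({"total": 42})

def is_valid(raw: str) -> bool:
    """Quality gate (assumed rule): output must be a JSON object with a numeric 'total'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and isinstance(data.get("total"), (int, float))

def call_with_scaffolding(prompt: str,
                          llm: Callable[[str, int], str],
                          max_attempts: int = 3) -> Dict:
    """Retry the model until the output passes the quality gate or attempts run out."""
    for attempt in range(max_attempts):
        raw = llm(prompt, attempt)
        if is_valid(raw):
            return json.loads(raw)
    raise ValueError("model never produced valid output")

result = call_with_scaffolding("Return the invoice total as JSON.", fake_llm)
print(result)
```

In a real deployment the gate would likely check much more (schema, business rules, source grounding), but the shape stays the same: generate, verify, retry or escalate.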

Part II: What’s the Definition of an Agent?

Today’s large language models are just the start of the GenAI revolution. Companies need to prepare for what’s coming next: autonomous agents that work independently to achieve an assigned goal.

What is an Agent: Unlike stand-alone LLM-based applications that still require humans to input prompts, autonomous agents can plan how to execute tasks end to end, monitor the output, adapt, and use tools to accomplish goals. Autonomous agents are able to sense and act on their environment. Building on GenAI’s ability to mimic human behavior, agents could make it possible to run simulations at scale for a wide range of products and services. Companies need to start preparing today for agents’ arrival to the mainstream in three to five years with a robust transformation roadmap.

What is Not an Agent: Agents are not LLMs interpreting a prompt, or LLMs with RAG or fine-tuning. An LLM is really just a set of weights: you input a prompt and receive a probabilistic projection. Additionally, LLMs are NOT software – the codebase of GPT-3 had <300 lines of code in total.

Agents react, extract, and take full action – a software wrapper around the LLM is required to take things to the next level
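
To make the "software wrapper" point concrete, here is a minimal sketch of an agent loop: the wrapper asks the model what to do next, runs the chosen tool, feeds the observation back, and stops when the model says it is finished. The model is a scripted stub (`fake_llm`), and the tool name `get_invoice_total` and the invoice data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]  # (action taken, observation returned)

def get_invoice_total(invoice_id: str) -> str:
    """Hypothetical tool: look up an invoice total in a toy in-memory table."""
    return {"INV-1": "120.00"}.get(invoice_id, "unknown")

TOOLS: Dict[str, Callable[[str], str]] = {"get_invoice_total": get_invoice_total}

def fake_llm(goal: str, history: History) -> Dict[str, str]:
    """Scripted stand-in for a model: call one tool, then finish with the result."""
    if not history:
        return {"action": "get_invoice_total", "input": "INV-1"}
    return {"action": "finish", "input": f"Invoice INV-1 totals {history[-1][1]}"}

def run_agent(goal: str,
              llm: Callable[[str, History], Dict[str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """The wrapper: plan (ask the model), act (run a tool), observe, repeat."""
    history: History = []
    for _ in range(max_steps):
        decision = llm(goal, history)
        if decision["action"] == "finish":
            return decision["input"]
        tool = tools.get(decision["action"])
        observation = tool(decision["input"]) if tool else "error: unknown tool"
        history.append((decision["action"], observation))
    return "stopped: step limit reached"

print(run_agent("What is the total of invoice INV-1?", fake_llm, TOOLS))
```

The step limit is a deliberate design choice in this sketch: the wrapper, not the model, decides when to give up.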

Part III: How Will Agents Impact Automation?

Automation is and has been a difficult problem to solve. Today, most practitioners are deploying human-in-the-loop as their preferred framework, since LLMs are still struggling in areas such as context, memory, and deterministic execution. All that being said, the true dream of AI is to automate simple (or perhaps more complex) workflows as the technology advances. Getting this right is going to involve a lot of “scaffolding,” in terms of data and architecture, including:

AI Development: E.g. a framework or set of tools that assist in building and deploying AI agents. This can include things like:

Pre-built components for common tasks like perception or decision-making.
Tools for ensuring data quality and security during training.
Standards and best practices for developing reliable and trustworthy AI.

Agentic Learning: E.g. how AI agents themselves learn and interact with the world. This can include things like:

Providing prompts, hints, or feedback during the training process.
Designing the environment where the agent learns to encourage desired behaviors.

The fundamental goal is really to provide the agent with temporary structures or guidance that help it develop its own capabilities. Over time, the scaffolding can be removed as the agent becomes more proficient.

Early automation efforts including RPA, middleware and low-code have all come with a lot of bootstrapping, rigidity in workflow, and fragility when it comes to changing operating environments. GenAI agents have the potential to leapfrog these kinds of automations, moving from scoped workflows to more sophisticated tasks. In the past, ML was only capable of very rigid workflows such as sentiment analysis, computer vision, or NLP – in the future, we could see greater understanding and reasoning, environmentally adaptive planning, and persistent memory. LLM providers are already beginning to offer no-code platforms to build custom versions of their LLMs – for individual tasks, this is a good place to start experimenting.
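
The human-in-the-loop framework mentioned at the start of this part can be sketched as a simple review gate: actions the agent is confident about run automatically, and everything else waits for a person. The `confidence` score and the 0.9 threshold are assumptions for illustration; how such a score would be produced is left open.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProposedAction:
    description: str
    confidence: float  # hypothetical score between 0 and 1

@dataclass
class ReviewGate:
    threshold: float = 0.9  # assumed cut-off; tune per use case
    executed: List[str] = field(default_factory=list)
    needs_review: List[str] = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        """Auto-run high-confidence actions; queue the rest for a human."""
        if action.confidence >= self.threshold:
            self.executed.append(action.description)
            return "executed"
        self.needs_review.append(action.description)
        return "queued for human review"

gate = ReviewGate()
print(gate.submit(ProposedAction("Send standard receipt email", 0.97)))
print(gate.submit(ProposedAction("Issue $5,000 refund", 0.62)))
print(gate.needs_review)
```

Raising the threshold keeps more work with humans; lowering it is one way the "scaffolding" could be removed over time.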

Part IV: How Do Agents Work?

Agents are fundamentally software wrappers around an LLM, which enable that LLM to model, collaborate, access tools or features, etc. LLM providers now offer integrations with data sources and applications, allowing AI agents to leverage external data as part of their workflow. RAG additionally allows access to proprietary data, and APIs bring in external tools like search.
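
As a rough illustration of the RAG step, the sketch below retrieves the most relevant internal document for a question and places it in the prompt. It uses naive keyword overlap purely to stay self-contained; production systems typically use embeddings and a vector store instead. The documents and function names are hypothetical.

```python
import re
from typing import Dict, List, Set

# Hypothetical proprietary documents
DOCS: Dict[str, str] = {
    "policy": "Refunds are issued within 14 days of purchase.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
}

def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, docs: Dict[str, str], k: int = 1) -> List[str]:
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda name: len(q & tokenize(docs[name])), reverse=True)
    return [docs[name] for name in ranked[:k]]

def build_prompt(question: str, docs: Dict[str, str]) -> str:
    """Ground the model by putting the retrieved text ahead of the question."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How long does shipping take?", DOCS))
```

The resulting prompt is what would be sent to the LLM, so the model answers from company data rather than from its weights alone.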

Additionally, creating a multi-agent framework (such as a workflow with a sequence of tasks) will require gluing together multiple agents into a single unified agent via code.
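
One simple way to picture that "gluing" is a pipeline in which each agent's output becomes the next agent's input. In this sketch the two agents are plain deterministic functions standing in for LLM-backed agents; `extract_agent`, `count_agent`, and `compose` are hypothetical names.

```python
from typing import Callable, List

Agent = Callable[[str], str]  # an agent takes text in and returns text out

def extract_agent(log_text: str) -> str:
    """First agent (stub): keep only the lines that report an error."""
    return "\n".join(line for line in log_text.splitlines() if "ERROR" in line)

def count_agent(error_lines: str) -> str:
    """Second agent (stub): summarize what the first agent found."""
    n = len(error_lines.splitlines())
    return f"{n} error line(s) found"

def compose(agents: List[Agent]) -> Agent:
    """Glue a sequence of agents into one unified agent."""
    def pipeline(task: str) -> str:
        result = task
        for agent in agents:
            result = agent(result)
        return result
    return pipeline

triage = compose([extract_agent, count_agent])
print(triage("INFO ok\nERROR disk full\nERROR timeout"))
```

Real multi-agent frameworks add routing, shared memory, and error handling, but a sequential workflow reduces to this kind of composition.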

“Cognitive architectures explained for non-developers.” April 22, 2024. Antti Karjalainen, Sema4.ai

Part V: How Can Today’s CXOs Get Started on Deploying Gen AI Automation?

On the Tech Side…

Figure out what old tools work well for, and what new tools work well for – Most enterprises today are already using ML and RPA – figure out via cost analysis where the gaps exist that genAI can help fill today. This includes figuring out the cost of implementation, including costs that don’t immediately meet the eye. Low risk use cases are a good starting point – easy wins are a good thing and a lot of results are actually coming out of projects once people just get things going. The difficulty is in the initial setup and friction
Crawl before you run – A deep understanding of each use case, the end user, proper performance benchmarks, and what LLMs are actually capable of will be critical. As a next step, scaffolding will be required to better incorporate broader context, external tooling or data, etc.
Fix your data – LLMs aren’t magic. It’s easy for an LLM to read the wrong data if it’s organized poorly, or if your data itself is wrong. Data volume isn’t there either: even 10 years of data may not be enough to go it alone on your own LLM. For a lot of enterprises, the journey begins with creating clean and focused data sets and pipelines that can ground the models
Choose a model/platform based on your use case – The LLM landscape is evolving rapidly with the impending release of GPT-5/Llama 3. At the same time, multiple models at GPT-4-level performance are now available at attractive cost points. Enterprises now have models from different sources, at different cost-performance levels, to choose from based on use-case and functionality needs. Additionally, incumbent players are embedding AI into their customer solutions, and many startups are taking an AI-native approach to reinvent vertical use cases or create new platforms. Workflow and performance benchmarks should drive choices
Reimagine your architecture – LLMs and agents need access to systems. How will the software of tomorrow be built? A lot of this is back to basics – new frameworks and reference architectures are required. We will need APIs, standardized interfaces, and modern architecture. Adoption of new APIs, orchestration and microservices will be gated by industry frameworks – you will need a CoE to help navigate
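
The advice above to let benchmarks drive model choice can be expressed as a small selection rule: among the models that clear your own use-case benchmark, pick the cheapest. The model names, scores, and costs below are invented placeholders, not real measurements.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ModelOption:
    name: str
    benchmark_score: float    # score on your own use-case benchmark, 0 to 1
    cost_per_1k_tasks: float  # USD, hypothetical figure

def choose_model(options: List[ModelOption], min_score: float) -> Optional[ModelOption]:
    """Return the cheapest model that meets the quality bar, or None if none does."""
    qualified = [m for m in options if m.benchmark_score >= min_score]
    return min(qualified, key=lambda m: m.cost_per_1k_tasks, default=None)

# Placeholder candidates for illustration only
CANDIDATES = [
    ModelOption("model-a", benchmark_score=0.92, cost_per_1k_tasks=30.0),
    ModelOption("model-b", benchmark_score=0.88, cost_per_1k_tasks=8.0),
    ModelOption("model-c", benchmark_score=0.81, cost_per_1k_tasks=2.0),
]

best = choose_model(CANDIDATES, min_score=0.85)
print(best.name if best else "no model meets the bar")
```

Because the rule depends only on your benchmark and your costs, it can be rerun whenever a new model is released.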

On the Talent Side…

Train employees on how to use the tools – LLMs today are very sensitive to prompting, and slight variations could cause drift in the model output. Establishing clear measures of performance for each use case is critical. The same goes for governance and data security. Human-in-the-loop is a fundamental feature of all AI deployments today
Build a center of excellence – A hub-and-spoke model works very well for this use case. Just make sure that it’s the center of “yes” and not the center of “no” – otherwise you will see lots of shadow IT. The model for AI must be federated by design – this can’t be an IT initiative – companies will need business sponsors who have budgets
Get ahead of change management – Executive buy-in is required for cross-functional or even functional initiatives, and AI productivity cannot be realized by employees if they are not interested in or educated on the topic. Teams that are segregated functionally or regionally will be incapable of implementing effective early use cases
Talent – Do you have the talent you need today? If your company doesn’t have a foundation, it’s time to start thinking about partnerships and/or acquisitions
When Starting Out, Partners > DIY – Build a platform around use cases with major vendors to start, and in the meantime develop a responsible AI policy (What are the questions we’ll allow a non-technical person to ask? How do we insert controls into the source of data and logic? How can we turn everything into an API call?)
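
The prompt-sensitivity point above suggests one clear measure of performance: ask the same question several ways and check how often the answers agree. The sketch below uses a stub model (`fake_llm`) that is deliberately sensitive to one word; the prompts and the metric name are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, List

def fake_llm(prompt: str) -> str:
    """Hypothetical stub that changes its answer when the wording shifts slightly."""
    return "negative" if "honestly" in prompt.lower() else "positive"

def consistency(llm: Callable[[str], str], variants: List[str]) -> float:
    """Share of prompt variants that produce the most common answer (1.0 = fully stable)."""
    if not variants:
        raise ValueError("need at least one prompt variant")
    answers = [llm(p) for p in variants]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)

VARIANTS = [
    "Is this review positive or negative? 'Great service.'",
    "Classify the sentiment: 'Great service.'",
    "Honestly, is this review positive or negative? 'Great service.'",
    "Sentiment of 'Great service.'?",
]

print(consistency(fake_llm, VARIANTS))  # 0.75: three of four variants agree
```

A score like this can be tracked per use case, and a drop is a signal to revisit the prompt templates or the training given to the people writing them.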

06.2024