Written by

Perplexity Team

Published on

Agent API: A Managed Runtime for Agentic Workflows

Today we are releasing the Perplexity Agent API, a managed runtime for building agentic workflows with integrated search, tool execution, and multi-model orchestration.

It replaces a model router, a search layer, an embeddings provider, a sandbox service, and a monitoring stack with a single integration point.

The agentic loop as a compute model

A conventional CPU executes a deterministic cycle: fetch an instruction, decode it, execute it, store the result. The program counter advances. The processor never decides what to do.

The Agent API implements a different compute model. The processor is a frontier language model. It receives an objective and determines how to achieve it. It decomposes that objective into a plan, selects which tools to use from its available tool set, executes, observes the results, evaluates whether the objective is met, and iterates. The context window serves as registers. Reasoning and orchestration serve as the scheduler. 

Consider preparing for a sales call with a prospect you've spoken to a couple of times. You send a single request to the Agent API with three tools: one to search your internal CRM, web_search, and fetch_url. The model calls your CRM tool first, retrieving context from past conversations. It then calls web_search to find recent news and competitive intelligence, returning several relevant pages. It decides two of those pages warrant deeper reading and calls fetch_url on each. In three stages, the model has assembled internal history, broad web context, and full-page detail into a single, grounded response. That is the agent loop.
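The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the Agent API's implementation: the function names, the stub tools, the canned data, and the scripted policy that stands in for the model are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Every name below is hypothetical and exists only for this illustration.
Tool = Callable[[str], List[str]]
Step = Tuple[str, str, List[str]]        # (tool name, input, observation)
Action = Optional[Tuple[str, str]]       # None means "objective met"


def run_agent_loop(objective: str,
                   tools: Dict[str, Tool],
                   decide: Callable[[str, List[Step]], Action],
                   max_steps: int = 8) -> List[Step]:
    """Plan, act, observe, and repeat until `decide` returns None."""
    context: List[Step] = []             # stands in for the context window
    for _ in range(max_steps):
        action = decide(objective, context)
        if action is None:
            break
        name, tool_input = action
        observation = tools[name](tool_input)
        context.append((name, tool_input, observation))
    return context


# Stub tools with canned data; real tools would call a CRM, a search index, etc.
tools: Dict[str, Tool] = {
    "crm_search": lambda q: ["call notes from two earlier conversations"],
    "web_search": lambda q: ["https://example.com/a",
                             "https://example.com/b",
                             "https://example.com/c"],
    "fetch_url": lambda url: [f"full text of {url}"],
}


def scripted_policy(objective: str, context: List[Step]) -> Action:
    """Stand-in for the model: a fixed policy mirroring the sales-call example."""
    used = [name for name, _, _ in context]
    if "crm_search" not in used:
        return ("crm_search", objective)
    if "web_search" not in used:
        return ("web_search", objective + " recent news")
    pages = next(obs for name, _, obs in context if name == "web_search")
    fetched = [inp for name, inp, _ in context if name == "fetch_url"]
    for url in pages[:2]:                # read only two of the returned pages
        if url not in fetched:
            return ("fetch_url", url)
    return None                          # enough context gathered


trace = run_agent_loop("Acme Corp sales call prep", tools, scripted_policy)
for name, tool_input, _ in trace:
    print(name, "<-", tool_input)
```

The trace contains four tool calls: one CRM lookup, one web search, and two page fetches. In a real agent the policy is the model itself, so the order and number of calls are not fixed in advance; the `max_steps` cap is one way to bound the loop.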

Orchestration of the full agentic loop 

It is important to distinguish the Agent API from model routing services. The Agent API is a managed runtime that orchestrates the full agentic loop: retrieval, reasoning, multi-model fallback, and execution of both the built-in tools and any custom tools you give it access to. It replaces a model router, a search layer, an embeddings provider, a sandbox service, and a monitoring stack with a single endpoint, account, and API key.

The API is model-agnostic across all frontier model providers. For high-availability applications, the API supports model fallback chains: specify multiple models, and the API automatically tries the next if one is unavailable. A request then fails only if every model in the chain is unavailable, which keeps availability close to 100%.
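The semantics of a fallback chain can be sketched as follows. This is an illustration of the idea only: the Agent API handles fallback on the server side, and `ModelUnavailable`, `call_with_fallback`, `fake_provider`, and the model names are hypothetical.

```python
from typing import Callable, List, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error raised when a model cannot serve a request."""


def call_with_fallback(models: List[str],
                       call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model used, response) on first success."""
    if not models:
        raise ValueError("fallback chain must name at least one model")
    failures = []
    for model in models:
        try:
            return model, call(model)
        except ModelUnavailable as exc:
            failures.append(f"{model}: {exc}")   # remember why, then move on
    raise RuntimeError("every model in the chain failed: " + "; ".join(failures))


def fake_provider(model: str) -> str:
    """Stub backend in which the primary model is down."""
    if model == "primary-model":
        raise ModelUnavailable("503 from provider")
    return f"answer from {model}"


used, answer = call_with_fallback(["primary-model", "backup-model"], fake_provider)
print(used, "->", answer)
```

As a rough guide, if each model in a chain were unavailable 1% of the time and failures were independent, a two-model chain would be unavailable about 0.01% of the time. Real outages are often correlated, so treat that figure as an upper bound on the benefit rather than a guarantee.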

Powerful, built-in tools

Two built-in tools are available: web_search and fetch_url. web_search supports domain filtering (allowlist and denylist, up to 20 domains), recency filtering, date range filtering, language filtering, and configurable content budgets per page. fetch_url retrieves and extracts full page content from specific URLs. 
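To make the domain filtering concrete, here is a small sketch of how an allowlist and denylist might be applied to a set of result URLs. Only the limit of 20 domains comes from the text above. The function names, the subdomain-matching rule, the choice to apply the limit to each list separately, and the rule that the denylist wins over the allowlist are assumptions for illustration, not documented behavior of web_search.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlparse

MAX_DOMAINS = 20  # the limit stated above; everything else here is an assumption


def _matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def filter_urls(urls: Iterable[str],
                allow: Optional[List[str]] = None,
                deny: Optional[List[str]] = None) -> List[str]:
    """Keep URLs whose host passes the denylist and, if given, the allowlist."""
    allow = [d.lower() for d in (allow or [])]
    deny = [d.lower() for d in (deny or [])]
    for name, domains in (("allow", allow), ("deny", deny)):
        if len(domains) > MAX_DOMAINS:
            raise ValueError(
                f"{name} list has {len(domains)} domains; limit is {MAX_DOMAINS}")
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if any(_matches(host, d) for d in deny):
            continue                     # denylist wins (assumed precedence)
        if allow and not any(_matches(host, d) for d in allow):
            continue                     # an allowlist, when present, is exclusive
        kept.append(url)
    return kept


results = [
    "https://news.example.com/launch",
    "https://example.org/post",
    "https://spam.example.net/ad",
]
print(filter_urls(results, allow=["example.com", "example.org"]))
print(filter_urls(results, deny=["example.net"]))
```

Both calls keep the first two URLs and drop the third: the first because its host is not on the allowlist, the second because its host is on the denylist.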

Beyond the built-in tools, custom functions allow developers to connect the agent to their own backends, databases, and APIs.

Continuously optimized frontier model presets

Building an effective agent configuration from scratch requires choosing the right model, calibrating reasoning depth, selecting tools, and tuning token budgets. Perplexity does this continuously for its own products, backed by an in-house evaluation team that benchmarks configurations against real workloads.

Presets share that expertise. Each preset is a fully transparent, pre-configured setup optimized for one of four use cases: fast factual lookups, balanced research, deep multi-source analysis, or institutional-grade research. We publish the recommended system prompt, tools, and cost profile for each. As the model landscape evolves, we update the underlying configurations so the preset always reflects the current state of the art at a predictable cost. All preset parameters are overridable: developers can use a preset as a starting point and adjust the model, tools, step count, or token budgets in a single request.
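The "preset as a starting point" pattern amounts to merging per-request overrides on top of a set of defaults. The sketch below shows that merge in isolation. The preset names, field names, and values are placeholders invented for the example and do not reflect the published configuration of any real preset.

```python
import copy
from typing import Any, Dict

# Hypothetical preset table; names, fields, and values are placeholders.
PRESETS: Dict[str, Dict[str, Any]] = {
    "example-fast": {
        "model": "small-model",
        "tools": ["web_search"],
        "max_steps": 1,
        "max_output_tokens": 512,
    },
    "example-deep": {
        "model": "large-model",
        "tools": ["web_search", "fetch_url"],
        "max_steps": 10,
        "max_output_tokens": 4096,
    },
}


def resolve_config(preset: str, **overrides: Any) -> Dict[str, Any]:
    """Start from a preset's defaults and apply per-request overrides."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset}")
    unknown = set(overrides) - set(PRESETS[preset])
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    config = copy.deepcopy(PRESETS[preset])   # never mutate the shared table
    config.update(overrides)
    return config


config = resolve_config("example-deep", max_steps=5, tools=["web_search"])
print(config)
```

Here the request keeps the preset's model and token budget but lowers the step count and narrows the tool set. Rejecting unknown parameter names is a design choice in this sketch that catches typos early.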

Deep Research 2.0, available through the advanced-deep-research preset, is the same multi-step reasoning engine that powers Perplexity's consumer product. It performs dozens of searches per query, reads hundreds of source documents, and iteratively refines its analysis. Performance on DRACO, Scale AI's ResearchRubrics, and Google DeepMind's DeepSearchQA is detailed in our DRACO benchmark post.

The Agent API is available today. Documentation and quickstart guides are at docs.perplexity.ai.
