The Problem

For LLM researchers, setting up an LLM training or reinforcement learning environment for real-world tool use is complex and painful:
  • Managing different environments or test accounts
  • Implementing MCP Servers and handling various authentication issues
  • Initializing realistic data
  • Resetting states between multiple runs
  • Ensuring isolation across concurrent sessions

The Solution

Klavis MCP Sandbox as a Service solves these challenges. In addition to letting your model interact with our comprehensive MCP server ecosystem, you can use our sandbox infrastructure to easily verify and reset data on any concurrent run.
Our sandbox infrastructure is horizontally scalable, so it can handle as many concurrent sessions as you need.

Lifecycle

1. Create

Request a sandbox based on the external services you need (Snowflake, GitHub, Notion, WooCommerce, etc.) and get an MCP server URL for that isolated instance.

2. Initialize

Load a deterministic “world state” in JSON format or via API. We handle everything, including creating databases and setting up data.

3. Interact via MCP

Let your LLM / AI agent use MCP tools against the sandbox as if it were the real app. You can use multiple MCP servers with many tools simultaneously.

4. Verify

Access the full sandbox state and programmatically compare it against your ground truth to determine whether your LLM completed the task correctly.

5. Reset / Delete

Wipe the sandbox back to a clean state and kick off the next run.
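
To make the initialize, interact, verify, and reset steps concrete, here is a minimal sketch in Python. It does not call the Klavis API: `ToySandbox`, its methods, the `close_issue` tool, and the shape of the world state are all hypothetical stand-ins that keep the example runnable offline. The part most likely to carry over to a real setup is `diff_states`, which compares an exported state against a ground-truth JSON document and reports every mismatch.

```python
import copy
from typing import Any


class ToySandbox:
    """In-memory stand-in for a sandbox (hypothetical, NOT the Klavis API)."""

    def __init__(self) -> None:
        self._state: dict[str, Any] = {}

    def initialize(self, world_state: dict[str, Any]) -> None:
        # Deep-copy so later mutations never leak back into the seed data.
        self._state = copy.deepcopy(world_state)

    def call_tool(self, name: str, **args: Any) -> None:
        # Hypothetical tool; a real agent would call MCP tools instead.
        if name != "close_issue":
            raise ValueError(f"unknown tool: {name}")
        for issue in self._state["issues"]:
            if issue["id"] == args["issue_id"]:
                issue["status"] = "closed"

    def export(self) -> dict[str, Any]:
        # Return a copy so callers cannot mutate the sandbox by accident.
        return copy.deepcopy(self._state)

    def reset(self) -> None:
        self._state = {}


def diff_states(actual: Any, expected: Any, path: str = "") -> list[str]:
    """Return human-readable mismatches between two JSON-like values."""
    if isinstance(actual, dict) and isinstance(expected, dict):
        mismatches: list[str] = []
        for key in sorted(set(actual) | set(expected)):
            child = f"{path}.{key}" if path else str(key)
            if key not in actual:
                mismatches.append(f"{child}: missing")
            elif key not in expected:
                mismatches.append(f"{child}: unexpected")
            else:
                mismatches.extend(diff_states(actual[key], expected[key], child))
        return mismatches
    if isinstance(actual, list) and isinstance(expected, list):
        if len(actual) != len(expected):
            return [f"{path}: length {len(actual)} != {len(expected)}"]
        mismatches = []
        for i, (a, e) in enumerate(zip(actual, expected)):
            mismatches.extend(diff_states(a, e, f"{path}[{i}]"))
        return mismatches
    return [] if actual == expected else [f"{path}: {actual!r} != {expected!r}"]


# Deterministic world state and the ground truth for the task
# "close issue 1" (both made up for this sketch).
seed = {"issues": [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]}
expected = {"issues": [{"id": 1, "status": "closed"}, {"id": 2, "status": "open"}]}

sandbox = ToySandbox()
sandbox.initialize(seed)                       # Initialize
sandbox.call_tool("close_issue", issue_id=1)   # Interact (stand-in for the agent)
mismatches = diff_states(sandbox.export(), expected)  # Verify
task_passed = not mismatches
sandbox.reset()                                # Reset before the next run
```

Returning a list of mismatches rather than a single boolean makes failed runs easier to debug, and in an RL setting the same list could feed a partial-credit reward, though that is a design choice rather than anything the sandbox requires.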

Resources

Example Notebook

Create sandboxes, seed data, run an agent, then verify and clean up.

Sandbox API

Manage isolated sandbox environments for training/eval: pooling, init, export, teardown.

Fireworks + Klavis

Use Klavis MCP Sandbox with Eval Protocol for model training and RL at scale.