docs: dev-documentation for adding new backend/mcp tools (#7011)

## 📝 Summary

This adds developer documentation for adding new MCP and backend tools,
following the example of `add-lint-rules.md`. It also shares best
practices, common pitfalls (that I faced), and visual examples.

## 🔍 Description of Changes


## 📋 Checklist

- [x] I have read the [contributor
guidelines](https://github.com/marimo-team/marimo/blob/main/CONTRIBUTING.md).
- [ ] For large changes, or changes that affect the public API: this
change was discussed or approved through an issue, on
[Discord](https://marimo.io/discord?ref=pr), or the community
[discussions](https://github.com/marimo-team/marimo/discussions) (Please
provide a link if applicable).
- [ ] I have added tests for the changes made.
- [ ] I have run the code and verified that it works as expected.
Joaquin Coromina
2025-10-30 21:26:55 +01:00
committed by GitHub
parent f4436b0326
commit f4718f2a36

@@ -0,0 +1,655 @@
# Adding Backend and MCP Tools to marimo
This guide explains how to create tools that are accessible via both the backend (chat panel) and MCP (Model Context Protocol) server endpoints.
## Overview
marimo provides a unified framework for creating tools that can be used by AI assistants to interact with notebooks. These tools are automatically registered in both:
1. **Backend Tools**: Used by the marimo chat panel (ask/agent modes)
2. **MCP Tools**: Exposed via the MCP server endpoint for external AI clients (like Claude Desktop)
The unified architecture means you write a tool once and it works in both contexts.
## Step-by-Step Implementation
### 1. Create the Tool File
Create a new file in `marimo/_ai/_tools/tools/your_tool.py` for your tool implementation.
### 2. Define Input and Output Types
Create dataclasses for your tool's arguments and output. Place these at the top of your tool file.
**Template:**
```python
from dataclasses import dataclass, field

from marimo._ai._tools.types import SuccessResult
from marimo._types.ids import SessionId


@dataclass
class YourToolArgs:
    """Arguments for your tool."""

    session_id: SessionId
    # Add other required parameters
    optional_param: str = "default_value"


@dataclass
class YourToolOutput(SuccessResult):
    """Output from your tool."""

    # Add your output fields
    data: dict = field(default_factory=dict)
    count: int = 0
```
**Important Type Patterns:**
- **Naming Convention**: Input dataclasses must end with `Args`, output dataclasses must end with `Output`
- Input args should use plain dataclasses
- Output should inherit from `SuccessResult` (provides `status`, `next_steps`, `message`, etc.)
- Use `field(default_factory=...)` for mutable defaults (lists, dicts)
- Use marimo types like `SessionId`, `CellId_t` for consistency
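
The `field(default_factory=...)` rule comes from the standard-library `dataclasses` module rather than from marimo itself. Here is a minimal sketch, using hypothetical names (`ExampleOutput`, `define_bad_output`) that are not part of marimo, of what happens with and without it: a bare mutable default is rejected when the class is defined, while `default_factory` gives every instance its own container.

```python
from dataclasses import dataclass, field


@dataclass
class ExampleOutput:
    """Hypothetical output type for illustration; not part of marimo."""

    # Each instance gets a fresh dict and a fresh list.
    data: dict = field(default_factory=dict)
    items: list = field(default_factory=list)
    count: int = 0


def define_bad_output() -> str:
    """Try to declare a dataclass with a bare mutable default."""
    try:

        @dataclass
        class BadOutput:
            data: dict = {}  # rejected by the dataclasses module

    except ValueError as e:
        return f"ValueError: {e}"
    return "no error"


first = ExampleOutput()
second = ExampleOutput()

# Mutating one instance leaves the other untouched.
first.data["key"] = "value"
first.items.append(1)
```

Because `second` keeps its own empty `data` and `items`, results from one tool call cannot leak into the next one through a shared default.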
### 3. Create the Tool Class
Implement your tool class in the same file:
**Template:**
```python
# Copyright 2025 Marimo. All rights reserved.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import TYPE_CHECKING

from marimo._ai._tools.base import ToolBase
from marimo._ai._tools.types import SuccessResult, ToolGuidelines
from marimo._ai._tools.utils.exceptions import ToolExecutionError
from marimo._types.ids import SessionId

if TYPE_CHECKING:
    from marimo._server.sessions import Session


@dataclass
class YourToolArgs:
    """Arguments for your tool."""

    session_id: SessionId
    # Add parameters here


@dataclass
class YourToolOutput(SuccessResult):
    """Output from your tool."""

    # Add output fields here
    sample_dict: dict = field(default_factory=dict)


class YourTool(ToolBase[YourToolArgs, YourToolOutput]):
    """Brief description of what this tool does.

    More detailed explanation of the tool's purpose and functionality.
    This docstring becomes the tool's description shown to AI assistants.

    Args:
        session_id: The session ID of the notebook
        # Document other args

    Returns:
        A success result containing [describe what it returns].
    """

    guidelines = ToolGuidelines(
        when_to_use=[
            "When [describe primary use case]",
        ],
        prerequisites=[
            "You must [describe args that need additional explanation]",
        ],
        avoid_if=[
            "When [describe when not to use]",
        ],
        additional_info=(
            "Any additional context or warnings about tool usage."
        ),
    )

    def handle(self, args: YourToolArgs) -> YourToolOutput:
        """Implement your tool logic here."""
        # ToolContext provides access to sessions, notebooks, and all marimo state
        context = self.context
        session_id = args.session_id
        session = context.get_session(session_id)

        # Implement your logic
        sample_dict = self._do_work(session)

        return YourToolOutput(
            sample_dict=sample_dict,
            next_steps=[
                "Review the results",
                "Consider next actions",
            ],
            message="Optionally add this if the results require more explanation",
        )

    # Helper methods (prefix with _)
    def _do_work(self, session: Session) -> dict:
        """Private helper method."""
        # Implementation details
        return {}
```
### 4. Understanding ToolContext
`ToolContext` is your gateway to all marimo state—sessions, notebooks, cells, errors, and more. It's available via `self.context` in your tool.
#### How to access ToolContext in your tool
Access via `self.context` in your `handle()` method:
```python
def handle(self, args: YourToolArgs) -> YourToolOutput:
    # Access ToolContext
    context = self.context
    # Use context methods
    session = context.get_session(args.session_id)
    errors = context.get_notebook_errors(args.session_id)
```
#### When to add to ToolContext vs a helper method
**Add to ToolContext when:**
- The functionality will be used by **multiple tools**
- It accesses core marimo state (sessions, cells, errors)
- It provides a common pattern that should be consistent across tools
**Use helper methods when:**
- The logic is specific to **your tool only**
- It's a one-off data transformation or validation
- It doesn't need to access marimo state beyond what you already have
**Example:**
```python
class YourTool(ToolBase[YourToolArgs, YourToolOutput]):
    def handle(self, args: YourToolArgs) -> YourToolOutput:
        # Use ToolContext for common operations
        session = self.context.get_session(args.session_id)
        errors = self.context.get_notebook_errors(args.session_id)

        # Use helper methods for tool-specific logic
        filtered_data = self._filter_by_criteria(errors, args.criteria)
        return YourToolOutput(data=filtered_data)

    def _filter_by_criteria(self, errors: list, criteria: str) -> list:
        """Tool-specific logic as a helper method."""
        return [e for e in errors if criteria in e.message]
```
#### Available ToolContext Methods
For the current and complete list of available methods, see `marimo/_ai/_tools/base.py` in the `ToolContext` class. Common methods include:
- `get_session(session_id)` - Get a notebook session
- `get_notebook_errors(session_id, include_stderr)` - Get all errors in a notebook
- `get_cell_errors(session_id, cell_id)` - Get errors for a specific cell
- `get_active_sessions_internal()` - Get list of active notebook sessions
### 5. Understanding ToolGuidelines
`ToolGuidelines` help AI assistants understand when and how to use your tool. Customize based on your tool's specific use case.
**Fields:**
- **`when_to_use`**: List specific scenarios where your tool is appropriate
  - Example: `"When the user needs to inspect cell outputs"`
- **`avoid_if`**: List scenarios where your tool should NOT be used
  - Example: `"When the session hasn't been started yet"`
- **`prerequisites`**: Required state or information before using the tool
  - Example: `"Valid session ID from an active notebook"` (only if accessing notebook data)
- **`side_effects`**: Any state changes your tool makes
  - Example: `"Modifies notebook cells"`, `"Triggers cell re-execution"`
- **`additional_info`**: Additional context or warnings (single string)
  - Example: `"This tool provides static analysis only"`
**⚠️ Warning:** Too many guidelines can confuse the AI agent. Less is more: only add guidelines when you clearly understand the use cases. If you're unsure, keep it minimal.
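
As a rough illustration of keeping guidelines minimal, the sketch below uses a stand-in `GuidelinesSketch` dataclass that only mirrors the field names listed above. The real `ToolGuidelines` lives in `marimo/_ai/_tools/types.py`, and its exact definition and defaults may differ. `filled_fields` is a hypothetical helper, included only to make the point concrete.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class GuidelinesSketch:
    """Stand-in mirroring the field names above; NOT marimo's ToolGuidelines."""

    when_to_use: list[str] = field(default_factory=list)
    avoid_if: list[str] = field(default_factory=list)
    prerequisites: list[str] = field(default_factory=list)
    side_effects: list[str] = field(default_factory=list)
    additional_info: Optional[str] = None


def filled_fields(guidelines: GuidelinesSketch) -> list[str]:
    """Hypothetical helper: names of the fields that were actually set."""
    return [f.name for f in fields(guidelines) if getattr(guidelines, f.name)]


# A minimal set of guidelines: one clear use case, nothing speculative.
minimal = GuidelinesSketch(
    when_to_use=["When the user needs to inspect cell outputs"],
)
```

Starting from a single `when_to_use` entry and adding more fields only once real usage shows they are needed is one way to follow the warning above.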
### 6. Error Handling
#### When to Use Try/Except
**Only use try/except when you need to catch a specific error and provide tailored guidance to the AI agent.**
- **Use try/except**: For expected errors where you want to guide the agent (e.g., "Use get_lightweight_cell_map to find valid cell IDs")
- **Don't use try/except**: For unexpected errors. They're automatically wrapped in `ToolExecutionError` and surfaced to the agent
#### Using ToolExecutionError
Use `ToolExecutionError` for expected failures:
```python
from marimo._ai._tools.utils.exceptions import ToolExecutionError

# Raise structured errors
raise ToolExecutionError(
    "Clear description of what went wrong",
    code="ERROR_CODE",  # Machine-readable code
    is_retryable=True,  # Can the user retry?
    suggested_fix="How to fix the issue",  # User-friendly guidance
    meta={"session_id": session_id},  # Additional context
)
```
**Common Error Codes:**
- `SESSION_NOT_FOUND`: Session ID doesn't exist
- `CELL_NOT_FOUND`: Cell ID doesn't exist
- `BAD_ARGUMENTS`: Invalid arguments passed
- `OPERATION_FAILED`: Generic operation failure
- `UNEXPECTED_ERROR`: Uncaught exception (handled automatically)
**Error Handling Best Practices:**
```python
def handle(self, args: YourToolArgs) -> YourToolOutput:
    # ToolContext methods automatically raise ToolExecutionError if session not found
    session = self.context.get_session(args.session_id)

    # Validate inputs - raise ToolExecutionError directly for validation errors
    if args.count < 0:
        raise ToolExecutionError(
            "Count must be non-negative",
            code="INVALID_COUNT",
            is_retryable=False,
            suggested_fix="Provide a count >= 0",
        )

    # Only use try/except for specific expected errors where you want to guide the agent
    try:
        result = self._operation_that_might_fail()
    except ValueError as e:
        # Caught specific error - provide tailored guidance.
        # `from e` keeps the original exception as the cause in the traceback.
        raise ToolExecutionError(
            f"Invalid cell ID: {e}",
            code="INVALID_CELL_ID",
            is_retryable=False,
            suggested_fix="Use get_lightweight_cell_map to find valid cell IDs",
        ) from e

    # Don't wrap everything in try/except - unexpected errors are handled automatically
    return YourToolOutput(data=result)
```
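
To try the translate-and-guide pattern without a marimo checkout, here is a self-contained sketch. `ToolErrorSketch` is a stand-in that accepts the same keyword arguments shown above; it is not marimo's `ToolExecutionError`, and its defaults are assumptions. `KNOWN_CELLS`, `get_code`, and `describe_failure` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from typing import Any, Optional


class ToolErrorSketch(Exception):
    """Stand-in with the keyword arguments shown above; NOT marimo's class."""

    def __init__(
        self,
        message: str,
        *,
        code: str = "UNEXPECTED_ERROR",  # assumed default, for illustration
        is_retryable: bool = False,
        suggested_fix: Optional[str] = None,
        meta: Optional[dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.code = code
        self.is_retryable = is_retryable
        self.suggested_fix = suggested_fix
        self.meta = meta or {}


KNOWN_CELLS = {"cell-1": "x = 1"}  # hypothetical notebook contents


def get_code(cell_id: str) -> str:
    """Translate an expected KeyError into a structured, guided error."""
    try:
        return KNOWN_CELLS[cell_id]
    except KeyError as e:
        raise ToolErrorSketch(
            f"Cell not found: {cell_id}",
            code="CELL_NOT_FOUND",
            is_retryable=False,
            suggested_fix="Use get_lightweight_cell_map to find valid cell IDs",
            meta={"cell_id": cell_id},
        ) from e


def describe_failure(cell_id: str) -> str:
    """Show what a caller can read off the structured error."""
    try:
        get_code(cell_id)
    except ToolErrorSketch as err:
        return f"{err.code}: {err} ({err.suggested_fix})"
    return "ok"
```

Only the one expected failure (`KeyError`) is caught; anything else would propagate unchanged, which matches the advice above to leave unexpected errors to the framework.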
### 7. Register the Tool
Add your tool to the registry in `marimo/_ai/_tools/tools_registry.py`:
```python
from marimo._ai._tools.tools.your_tool import YourTool

SUPPORTED_BACKEND_AND_MCP_TOOLS: list[type[ToolBase[Any, Any]]] = [
    GetMarimoRules,
    GetActiveNotebooks,
    # ... existing tools ...
    YourTool,  # Add your tool here
]
```
**That's it!** Your tool is now automatically registered in both backend and MCP contexts.
### 8. Add Args and Output to msgspec tests
Add your tool's Args and Output classes to the `TOOL_IO_CLASSES` list in `tests/_utils/test_msgspec_basestruct.py`. This ensures type compatibility between our serialization system and pydantic (used by the Python MCP SDK).
```python
from marimo._ai._tools.tools.your_tool import (
    YourToolArgs,
    YourToolOutput,
)

TOOL_IO_CLASSES = [
    # ... existing classes ...
    YourToolArgs,
    YourToolOutput,
]
```
### 9. Create Tests
#### Unit Tests
Create `tests/_ai/tools/tools/test_your_tool.py`:
```python
from __future__ import annotations

from unittest.mock import Mock

import pytest

from marimo._ai._tools.base import ToolContext
from marimo._ai._tools.tools.your_tool import (
    YourTool,
    YourToolArgs,
)
from marimo._ai._tools.utils.exceptions import ToolExecutionError
from marimo._types.ids import SessionId


@pytest.fixture
def tool() -> YourTool:
    """Create a YourTool instance."""
    return YourTool(ToolContext())


@pytest.fixture
def mock_context() -> Mock:
    """Create a mock ToolContext."""
    return Mock(spec=ToolContext)


def test_your_tool_basic_case(mock_context: Mock) -> None:
    """Test basic functionality."""
    # Setup mock
    mock_session = Mock()
    mock_context.get_session.return_value = mock_session
    tool = YourTool(ToolContext())
    tool.context = mock_context

    # Execute tool
    result = tool.handle(YourToolArgs(session_id=SessionId("test")))

    # Assertions
    assert result.status == "success"
    assert result.data is not None


def test_your_tool_error_handling(mock_context: Mock) -> None:
    """Test error handling."""
    # Setup mock to raise error
    mock_context.get_session.side_effect = ToolExecutionError(
        "Session not found",
        code="SESSION_NOT_FOUND",
    )
    tool = YourTool(ToolContext())
    tool.context = mock_context

    # Should raise ToolExecutionError
    with pytest.raises(ToolExecutionError) as exc_info:
        tool.handle(YourToolArgs(session_id=SessionId("invalid")))
    assert exc_info.value.code == "SESSION_NOT_FOUND"


# if necessary
def test_your_tool_with_edge_cases(mock_context: Mock) -> None:
    """Test edge cases and boundary conditions."""
    # Test your tool with edge cases
    pass
```
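
The tests above rely on two `unittest.mock` features: `spec=` and `side_effect`. The following standalone sketch shows both, using a hypothetical `ContextSketch` class in place of marimo's `ToolContext` so it runs with the standard library alone.

```python
from unittest.mock import Mock


class ContextSketch:
    """Stand-in for a context object; NOT marimo's ToolContext."""

    def get_session(self, session_id: str) -> object:
        raise NotImplementedError


# spec= restricts the mock to attributes that exist on the real class.
mock_context = Mock(spec=ContextSketch)
mock_context.get_session.return_value = "fake-session"

# Methods that exist on the spec'd class can be stubbed and inspected.
session = mock_context.get_session("test")
mock_context.get_session.assert_called_once_with("test")


def has_attribute(mock: Mock, name: str) -> bool:
    """Report whether the spec'd mock exposes the given attribute."""
    try:
        getattr(mock, name)
    except AttributeError:
        return False
    return True


# side_effect makes the stubbed method raise instead of returning a value.
failing_context = Mock(spec=ContextSketch)
failing_context.get_session.side_effect = LookupError("Session not found")
```

With `spec=`, a misspelled method name in a test fails with `AttributeError` instead of silently returning another mock, which is the main reason to prefer `Mock(spec=ToolContext)` over a bare `Mock()`.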
### 10. Run Tests
Run tests:
```bash
# Run all tool tests
hatch run +py=3.12 test:test tests/_ai/tools
# Run your specific test
hatch run +py=3.12 test:test tests/_ai/tools/tools/test_your_tool.py
# Run with verbose output
hatch run +py=3.12 test:test tests/_ai/tools/tools/test_your_tool.py -v
```
### 11. Update Documentation
Add your tool to the user-facing documentation in `docs/guides/editor_features/tools.md`. Add a row to the appropriate category table:
```markdown
## Available tools
### [Appropriate Category]
| Tool | Description |
|------|-------------|
| **your_tool_name** | Brief description of what the tool does. Takes `param1` and `param2` parameters. Returns description of output. |
```
Choose the appropriate category:
- **Inspection**: Tools for exploring notebook structure and runtime
- **Data**: Tools for accessing variables and database information
- **Debugging**: Tools for finding and fixing issues
- **Reference**: Tools for accessing marimo documentation
## Best Practices
### Type Safety
- **Use dataclasses** for all input/output types
- **Add type hints** for all methods and attributes
- **Use TYPE_CHECKING** for imports only needed for type checking
- **Import from marimo types** (`SessionId`, `CellId_t`, etc.)
- **Keep types in your tool file** unless they're used by multiple tools—only add to `marimo/_ai/_tools/types.py` if shared across many files
### Documentation
- **Write clear docstrings** following the template
- **Document all Args** in the class docstring
- **Describe Returns** in the class docstring
- **Provide ToolGuidelines** to help AI assistants
- **Include examples** in docstrings when helpful
### Output Design
Design helpful outputs:
```python
return YourToolOutput(
    data=result,
    # Provide actionable next steps
    next_steps=[
        "Use get_cell_runtime_data to inspect cells",
        "Check errors with get_notebook_errors",
    ],
    # Optional user-facing message
    message="Found 5 items matching your query",
    # Optional metadata
    meta={"query_time": 0.5},
)
```
### Helper Methods
- **Prefix private methods with `_`**
- **Keep handle() method focused** on orchestration
- **Extract complex logic** into helper methods
- **Reuse ToolContext methods** instead of duplicating logic
## Common Pitfalls
### ❌ Don't: Duplicate ToolContext Logic
```python
# Bad: Reimplementing context logic
def handle(self, args: Args) -> Output:
    session = self.context.get_session(args.session_id)
    cell_ops = session.session_view.cell_operations
    errors = []
    for cell_id, op in cell_ops.items():
        if op.output and op.output.channel == CellChannel.MARIMO_ERROR:
            errors.append(...)  # Duplicating error extraction
```
### ✅ Do: Use ToolContext Methods
```python
# Good: Using context methods
def handle(self, args: Args) -> Output:
    errors = self.context.get_notebook_errors(
        args.session_id,
        include_stderr=True,
    )
```
### ❌ Don't: Raise Generic Exceptions
```python
# Bad: Using generic exceptions
if not found:
    raise ValueError("Not found")
```
### ✅ Do: Raise ToolExecutionError
```python
# Good: Structured error with metadata
if not found:
    raise ToolExecutionError(
        "Cell not found in session",
        code="CELL_NOT_FOUND",
        is_retryable=False,
        suggested_fix="Use get_lightweight_cell_map to find valid cell IDs",
    )
```
### ❌ Don't: Return Unstructured Data
```python
# Bad: Returning raw data
def handle(self, args: Args) -> Output:
    return {"data": [...], "count": 5}  # type: ignore
```
### ✅ Do: Use Typed Dataclass Output
```python
# Good: Structured output with SuccessResult
def handle(self, args: Args) -> Output:
    return YourToolOutput(
        data=[...],
        count=5,
        next_steps=["Review the results"],
    )
```
### ❌ Don't: Use TypedDict or Other Type Annotations
```python
# Bad: Using TypedDict for tool input/output
from typing import TypedDict


class YourToolArgs(TypedDict):
    session_id: str
    count: int
```
### ✅ Do: Use Dataclasses
```python
# Good: Using dataclasses as required
from dataclasses import dataclass

from marimo._types.ids import SessionId


@dataclass
class YourToolArgs:
    session_id: SessionId
    count: int = 0
```
**Why?** The tool system requires dataclasses for proper serialization, validation, and compatibility with both backend and MCP contexts.
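
One runtime difference between the two can be shown with the standard library alone. This sketch does not reproduce marimo's actual serialization or validation path, and the class names (`ArgsAsTypedDict`, `ArgsAsDataclass`) are hypothetical; it only shows that a `TypedDict` is a plain `dict` at runtime, while a dataclass enforces required fields and applies defaults.

```python
from dataclasses import asdict, dataclass
from typing import TypedDict


class ArgsAsTypedDict(TypedDict):
    session_id: str
    count: int


@dataclass
class ArgsAsDataclass:
    session_id: str
    count: int = 0


# A TypedDict is a plain dict at runtime: the missing "count" key is not
# detected, and there is nowhere to declare a default for it.
loose = ArgsAsTypedDict(session_id="s1")  # type: ignore[typeddict-item]

# A dataclass fills in the declared default.
strict = ArgsAsDataclass(session_id="s1")


def build_without_session_id() -> str:
    """Constructing the dataclass without a required field fails loudly."""
    try:
        ArgsAsDataclass()  # type: ignore[call-arg]
    except TypeError:
        return "TypeError"
    return "no error"
```

A static type checker would flag the incomplete `TypedDict`, but nothing stops it at runtime, which matters when the arguments come from an AI client rather than from type-checked code.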
## Advanced Topics
### Async Tools
For operations that need async/await:
```python
class AsyncTool(ToolBase[Args, Output]):
    """Tool with async operations."""

    async def handle(self, args: Args) -> Output:  # type: ignore[override]
        """Note: Add type: ignore[override] for async handle."""
        session = self.context.get_session(args.session_id)
        result = await self._async_work(session)
        return Output(result=result)
```
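
When testing an async tool, keep in mind that calling an `async def handle` returns a coroutine, not the output itself. The sketch below uses stand-in classes (`EchoArgs`, `EchoOutput`, `AsyncToolSketch`) that do not subclass marimo's `ToolBase`; how marimo itself schedules an async `handle` is not covered here.

```python
import asyncio
import inspect
from dataclasses import dataclass


@dataclass
class EchoArgs:
    value: str


@dataclass
class EchoOutput:
    result: str


class AsyncToolSketch:
    """Stand-in tool; does NOT subclass marimo's ToolBase."""

    async def handle(self, args: EchoArgs) -> EchoOutput:
        result = await self._async_work(args.value)
        return EchoOutput(result=result)

    async def _async_work(self, value: str) -> str:
        await asyncio.sleep(0)  # placeholder for real async I/O
        return value.upper()


tool = AsyncToolSketch()

# Calling handle() does not run it; it only creates a coroutine object.
pending = tool.handle(EchoArgs(value="hello"))
is_coroutine = inspect.iscoroutine(pending)

# The coroutine must be awaited, or driven by an event loop as here.
output = asyncio.run(pending)
```

In a unit test, this means the call needs to be awaited (for example inside an async test function) or wrapped in `asyncio.run(...)` before asserting on the result.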
### Tools with Side Effects
In general, it's better to avoid side effects in your tool. If they can't be avoided, make sure to document them in the guidelines:
```python
guidelines = ToolGuidelines(
    side_effects=[
        "Modifies notebook cells",
        "Triggers cell re-execution",
    ],
)
```
### Complex Return Types
Use nested dataclasses for complex outputs:
```python
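from dataclasses import dataclass, field
from typing import Any

from marimo._ai._tools.types import SuccessResult


@dataclass
class CellInfo:
    cell_id: str
    code: str


@dataclass
class ComplexOutput(SuccessResult):
    cells: list[CellInfo] = field(default_factory=list)
    summary: dict[str, Any] = field(default_factory=dict)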
```
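
To see what a nested output looks like once flattened to plain data, here is a small sketch using `dataclasses.asdict` from the standard library. The `...Sketch` classes are hypothetical and do not inherit from `SuccessResult`, and marimo's own serialization may produce a different shape; the point is only that nested dataclasses convert recursively into JSON-friendly structures.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class CellInfoSketch:
    cell_id: str
    code: str


@dataclass
class ComplexOutputSketch:
    """Stand-in output; does NOT inherit from marimo's SuccessResult."""

    cells: list[CellInfoSketch] = field(default_factory=list)
    summary: dict[str, Any] = field(default_factory=dict)


output = ComplexOutputSketch(
    cells=[CellInfoSketch(cell_id="a", code="x = 1")],
    summary={"total": 1},
)

# asdict() recurses into nested dataclasses, lists, and dicts.
as_plain = asdict(output)

# The result contains only built-in types, so it round-trips through JSON.
encoded = json.dumps(as_plain)
```

Keeping nested types as dataclasses with simple field types (strings, numbers, lists, dicts) makes this kind of conversion predictable.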
## Review Checklist
Before submitting your tool:
- [ ] Tool class inherits from `ToolBase[ArgsT, OutT]`
- [ ] Input args are dataclasses ending with `Args`
- [ ] Output inherits from `SuccessResult` and ends with `Output`
- [ ] `handle()` method is implemented
- [ ] Tool is registered in `tools_registry.py`
- [ ] Args and Output added to `TOOL_IO_CLASSES` in `tests/_utils/test_msgspec_basestruct.py`
- [ ] Comprehensive docstring with Args/Returns
- [ ] `ToolGuidelines` provided (only if use cases are clear)
- [ ] Error handling uses `ToolExecutionError` for expected failures only
- [ ] Unit tests cover happy path and errors
- [ ] Tests mock `ToolContext` appropriately
- [ ] All tests pass
- [ ] Type hints are complete
- [ ] Documentation updated in `docs/guides/editor_features/tools.md`
## Additional Resources
- **Base Tool Class**: `marimo/_ai/_tools/base.py`
- **Tool Context**: `marimo/_ai/_tools/base.py` (`ToolContext`)
- **Exception Handling**: `marimo/_ai/_tools/utils/exceptions.py`
- **Type Definitions**: `marimo/_ai/_tools/types.py`
- **MCP Server Setup**: `marimo/_mcp/server/main.py`
- **Backend Tool Manager**: `marimo/_server/ai/tools/tool_manager.py`
## Questions?
If you have questions or run into issues:
1. Check existing tools in `marimo/_ai/_tools/tools/` for examples
2. Review tests in `tests/_ai/tools/tools/` for testing patterns
3. Ask in the marimo community channels