This project demonstrates how to implement different types of guardrails using the OpenAI Agents SDK, with Trigger.dev handling execution.
It serves as both:
- A practical guide for integrating the OpenAI Agents SDK with Trigger.dev for production AI workflows
- Educational examples of implementing three different types of guardrails for AI safety and control
Guardrails are safety mechanisms that run alongside AI agents to:
- Validate input before processing
- Check output before returning responses
- Monitor streaming content in real time
- Prevent unwanted or harmful behavior
This project includes three different guardrail implementations:
1. Input Guardrails (input-guardrails.py)
Purpose: Validates user input before the agent processes it.
Example: A math tutor agent that only responds to mathematics-related questions. If you ask about anything else (like "What's the weather?"), the guardrail triggers and returns a refusal message.
How it works:
- Uses an agent-based guardrail to check if input is math-related
- Raises an InputGuardrailTripwireTriggered exception when non-math topics are detected
- Returns a polite refusal instead of processing the request
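The flow above can be sketched without the SDK. In this minimal, standard-library-only sketch, a keyword check stands in for the guardrail agent, and `InputTripwire`, `GuardrailResult`, `math_guardrail`, and `handle_prompt` are hypothetical names chosen for illustration; none of them come from the project or the SDK.

```python
from dataclasses import dataclass


class InputTripwire(Exception):
    """Hypothetical stand-in for the SDK's InputGuardrailTripwireTriggered."""


@dataclass
class GuardrailResult:
    tripwire_triggered: bool
    reason: str


# Assumption: a keyword check replaces the guardrail agent's LLM call.
MATH_HINTS = ("solve", "equation", "integral", "derivative", "+", "=")


def math_guardrail(user_input: str) -> GuardrailResult:
    """Decide whether the input looks math-related."""
    text = user_input.lower()
    is_math = any(hint in text for hint in MATH_HINTS)
    reason = "math-related" if is_math else "not math-related"
    return GuardrailResult(tripwire_triggered=not is_math, reason=reason)


def run_math_tutor(user_input: str) -> str:
    """Run the guardrail first; the 'agent' only runs if it passes."""
    result = math_guardrail(user_input)
    if result.tripwire_triggered:
        raise InputTripwire(result.reason)
    return f"Let's work through it: {user_input}"


def handle_prompt(user_input: str) -> str:
    """Turn a tripped guardrail into a polite refusal."""
    try:
        return run_math_tutor(user_input)
    except InputTripwire:
        return "Sorry, I can only help with math questions."
```

Calling `handle_prompt("What's the weather?")` takes the refusal path, while a prompt such as `"Solve 2x + 3 = 11"` reaches the tutor. In the real script, the guardrail decision would come from the guardrail agent rather than a keyword list.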
2. Output Guardrails (output-guardrails.py)
Purpose: Validates the agent's response before returning it to the user.
Example: Ensures that a "Math Assistant" actually provides mathematical content in its responses. If the response doesn't contain sufficient math content, the guardrail triggers.
How it works:
- Agent generates a response first
- Guardrail agent evaluates if the response contains actual mathematical content
- Raises OutputGuardrailTripwireTriggered if the response lacks math content
- The caller can then either retry or return an error message
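The generate-then-check order, and the retry-or-error choice, can be sketched as follows. This is an illustrative sketch under stated assumptions: a regular expression stands in for the guardrail agent's judgment, `generate` is any callable that returns text, and `OutputTripwire`, `checked_generate`, and `answer_with_retry` are hypothetical names rather than SDK or project APIs.

```python
import re
from typing import Callable


class OutputTripwire(Exception):
    """Hypothetical stand-in for the SDK's OutputGuardrailTripwireTriggered."""


def contains_math(response: str) -> bool:
    # Assumption: a regex replaces the guardrail agent's evaluation.
    # It matches patterns such as "3 = 11" or "2 + 2".
    return bool(re.search(r"\d\s*[-+*/=^]\s*\d", response))


def checked_generate(generate: Callable[[str], str], prompt: str) -> str:
    """Generate a response first, then validate it before returning."""
    response = generate(prompt)
    if not contains_math(response):
        raise OutputTripwire(response)
    return response


def answer_with_retry(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 2
) -> str:
    """Retry when the guardrail trips; fall back to an error message."""
    for _ in range(max_attempts):
        try:
            return checked_generate(generate, prompt)
        except OutputTripwire:
            continue
    return "Error: the response did not contain enough mathematical content."
```

Whether to retry or fail immediately is a design choice: retries cost extra model calls, so a small `max_attempts` keeps latency and spend bounded.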
3. Streaming Guardrails (streaming-guardrails.py)
Purpose: Monitors content in real time as it streams, allowing early termination.
Example: Checks if streaming responses use language too complex for a 10-year-old. If complex terms are detected while streaming, it immediately stops generation.
How it works:
- Streams response text to stdout in real time
- Runs guardrail checks every N characters (configurable interval)
- Immediately stops streaming if guardrail triggers
- Provides detailed metrics about where/when the guardrail activated
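The interval-based check described above can be sketched with a simulated stream. This sketch makes several assumptions: `fake_stream` replaces the SDK's event stream, a small word list replaces the guardrail agent, the check is awaited inline rather than run concurrently, and the metric field names (`triggered`, `stopped_at_char`, `checks_run`) are hypothetical.

```python
import asyncio
from typing import AsyncIterator, Iterable

# Assumption: a word list stands in for the guardrail agent's judgment.
COMPLEX_WORDS = {"photosynthesis", "thermodynamics", "mitochondria"}


async def fake_stream(chunks: Iterable[str]) -> AsyncIterator[str]:
    """Simulate text deltas arriving one at a time."""
    for chunk in chunks:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield chunk


async def is_too_complex(text: str) -> bool:
    """Guardrail check: does the text so far contain a complex term?"""
    lowered = text.lower()
    return any(word in lowered for word in COMPLEX_WORDS)


async def stream_with_guardrail(
    deltas: AsyncIterator[str], check_interval: int = 20
) -> dict:
    """Print deltas as they arrive; check every `check_interval` characters."""
    text = ""
    next_check = check_interval
    checks_run = 0
    async for delta in deltas:
        print(delta, end="", flush=True)
        text += delta
        if len(text) >= next_check:
            checks_run += 1
            next_check = len(text) + check_interval
            if await is_too_complex(text):
                # Stop consuming the stream as soon as the guardrail trips.
                return {"triggered": True, "stopped_at_char": len(text),
                        "checks_run": checks_run, "text": text}
    return {"triggered": False, "stopped_at_char": len(text),
            "checks_run": checks_run, "text": text}
```

A smaller `check_interval` catches problems sooner but runs more checks. Note that this sketch does not run a final check on any trailing text shorter than the interval; a production version would likely want one.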
- Clone the repo and run npm install to install the dependencies
- Create a virtual environment: python -m venv venv
- Activate the virtual environment:
  - On Mac/Linux: source venv/bin/activate
  - On Windows: venv\Scripts\activate
- Install the Python dependencies: pip install -r requirements.txt
- Copy the project ref from your Trigger.dev dashboard and add it to the trigger.config.ts file
- Run the Trigger.dev CLI dev command
- Test the guardrail tasks in the dashboard
- Deploy the tasks to production using the Trigger.dev CLI deploy command
- Trigger Tasks:
  - inputGuardrails.ts - Passes user prompts to the Python script and handles InputGuardrailTripwireTriggered exceptions
  - outputGuardrails.ts - Runs agent generation and catches OutputGuardrailTripwireTriggered exceptions with detailed error info
  - streamingGuardrails.ts - Executes the streaming Python script and parses JSON output containing guardrail metrics
- Python Implementations:
  - input-guardrails.py - Agent with an @input_guardrail decorator that raises an exception before the main agent runs
  - output-guardrails.py - Agent with an @output_guardrail decorator that validates generated responses using a separate guardrail agent
  - streaming-guardrails.py - Processes ResponseTextDeltaEvent streams with async guardrail checks at configurable intervals
- Configuration: trigger.config.ts - Uses the Trigger.dev Python extension
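Because the streaming script writes response text to stdout and the task parses JSON from its output, the two need some convention for telling them apart. One possible convention, shown here purely as an assumption and not as the project's actual protocol, is to print the metrics as a single JSON line at the very end. The helper names and the metric fields are hypothetical.

```python
import json


def emit_result(metrics: dict) -> str:
    """Serialize guardrail metrics as one compact JSON line, to be printed last."""
    return json.dumps(metrics, separators=(",", ":"))


def parse_last_json_line(stdout: str) -> dict:
    """What a caller might do: treat the final non-empty line as the JSON result."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    return json.loads(lines[-1])


# Example: streamed text followed by the metrics line.
captured = "Plants eat sunlight.\n" + emit_result(
    {"triggered": True, "stopped_at_char": 52}
) + "\n"
result = parse_last_json_line(captured)
```

In the project itself the parsing happens in the TypeScript task; the Python parser here only illustrates the shape of the contract.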