The AI Guardrails Agent is a pre-execution safety and compliance layer that validates all user prompts BEFORE any AI agents begin execution. This prevents misuse, ensures ethical AI development, and maintains legal compliance.
User Submits Prompt
        ↓
🛡️ AI Guardrails Agent (Pre-check)
        ↓
Validation Rules
(Regex patterns + keywords)
        ↓
├─ CRITICAL violation → ❌ REJECT (Stop pipeline)
├─ HIGH violation     → ⚠️ WARN (Allow with notice)
└─ No violations      → ✅ APPROVE (Continue to pipeline)
        ↓
Pipeline Execution
(Ideation → Architecture → Development...)
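The decision flow above can be sketched in a few lines. This is a minimal illustration of the severity-to-decision mapping, not the agent's actual implementation; the `precheck` name and record shapes are assumptions that mirror the documented response fields.

```python
# Minimal sketch of the pre-check decision flow shown above.
# "precheck" is a hypothetical helper; severity levels and the
# status/warnings fields follow the documented response format.

def precheck(violations: list) -> dict:
    """Map detected violations to a pipeline decision."""
    severities = {v["severity"] for v in violations}
    if "critical" in severities:
        # CRITICAL violation -> reject, stop the pipeline
        return {"status": "rejected", "violations": violations}
    if "high" in severities:
        # HIGH violation -> allow, but attach a warning notice
        return {"status": "approved", "warnings": violations}
    # No violations -> approve and continue to pipeline execution
    return {"status": "approved", "warnings": None}

print(precheck([]))  # {'status': 'approved', 'warnings': None}
```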
Physical Harm:
Psychological Harm:
Privacy:
Discrimination:
Financial Fraud (CRITICAL):
Regulated Content (HIGH):
Copyright (HIGH):
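The categories above are enforced with keyword and regex checks. A hedged sketch of how such detection might work; the pattern strings, rule table, and `detect_violations` helper here are illustrative stand-ins, not the agent's real rule set.

```python
import re

# Illustrative rule table: each violation type maps to regex
# "keywords", a severity, and a category. These two rules are
# examples only, not the agent's actual patterns.
VIOLATION_PATTERNS = {
    "physical_harm": {
        "keywords": [r"\b(weapon|explosive)s?\b"],
        "severity": "critical",
        "category": "Safety",
    },
    "copyright": {
        "keywords": [r"\bclone\s+\w+\s+with\s+all\s+their\b"],
        "severity": "high",
        "category": "Compliance",
    },
}

def detect_violations(prompt: str) -> list:
    """Return a violation record for every matching rule."""
    hits = []
    for vtype, rule in VIOLATION_PATTERNS.items():
        for pattern in rule["keywords"]:
            if re.search(pattern, prompt, re.IGNORECASE):
                hits.append({"type": vtype,
                             "severity": rule["severity"],
                             "category": rule["category"]})
                break  # one hit per rule is enough
    return hits

print(detect_violations("Build a weapon detection system"))
# → [{'type': 'physical_harm', 'severity': 'critical', 'category': 'Safety'}]
```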
from agents.guardrails_agent import validate_prompt

# Validate a prompt before running the pipeline
is_valid, result = validate_prompt(
    prompt="Build a fitness tracking app",
    goal="production",
    deployment_target="mobile",
)

if not is_valid:
    print(f"❌ REJECTED: {result['message']}")
    print(f"Violations: {result['violations']}")
    # Inside a request handler, you would return the error to the caller:
    # return {"error": result['message']}
else:
    # Continue with pipeline execution
    print("✅ Prompt approved")
Approved Prompt:
{
  "status": "approved",
  "message": "Prompt passed all AI guardrails",
  "warnings": null,
  "stats": {
    "total_checked": 42,
    "blocked": 3,
    "allowed": 39
  }
}
Rejected Prompt:
{
  "status": "rejected",
  "reason": "guardrails_violation",
  "message": "⚠️ Your request cannot be processed due to AI safety and compliance guardrails...",
  "violations": [
    {
      "type": "physical_harm",
      "category": "Safety",
      "severity": "critical",
      "description": "Prompt requests content that could cause physical harm"
    }
  ],
  "support_message": "If you believe this is an error, please contact support with details."
}
The guardrails agent runs as the first step in pipeline execution (services/gateway/routers/pipeline.py):
@router.post("/execute")
async def execute_pipeline(request: PipelineRequest):
    # 🛡️ AI GUARDRAILS: Validate prompt BEFORE pipeline execution
    from agents.guardrails_agent import validate_prompt

    is_valid, guardrails_result = validate_prompt(
        prompt=request.prompt,
        goal=request.goal,
        deployment_target=request.deploymentTarget,
    )

    if not is_valid:
        # Reject execution; "reason" lets the frontend identify guardrails blocks
        return {
            'status': 'rejected',
            'reason': 'guardrails_violation',
            'error': guardrails_result['message'],
            'violations': guardrails_result['violations'],
        }

    # Continue with pipeline...
The frontend displays user-friendly error messages for guardrails violations (frontend-vue/src/App.vue):
const response = await axios.post(API_ENDPOINTS.runPipeline, requestPayload);

// Check if guardrails blocked the request
if (response.data.status === 'rejected' && response.data.reason === 'guardrails_violation') {
  const violations = response.data.violations || [];
  const categories = violations.map(v => v.category).join(', ');
  alert(`🛡️ AI GUARDRAILS VIOLATION\n\n${response.data.error}\n\nCategories: ${categories}`);
  return;
}
Run the test suite to verify guardrails are working:
cd ai_solutionbuilder
pytest tests/test_guardrails_agent.py -v
Test Coverage:
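Test cases typically pair example prompts with expected outcomes. A hedged sketch of what such coverage might look like; `is_blocked` is a toy stand-in for the real agent, and the actual suite in tests/test_guardrails_agent.py will differ.

```python
import re
import pytest

def is_blocked(prompt: str) -> bool:
    """Toy stand-in for the real guardrails check (illustrative only)."""
    return bool(re.search(r"\b(hack|weapon|fake news)\b", prompt, re.IGNORECASE))

# Each case pairs a prompt with the expected blocked/allowed outcome
@pytest.mark.parametrize("prompt,blocked", [
    ("Build an app to hack into Facebook accounts", True),
    ("Create a weapon detection system for schools", True),
    ("Build an e-commerce platform for selling handmade jewelry", False),
    ("Create a fitness tracking app with calorie counting", False),
])
def test_guardrails(prompt, blocked):
    assert is_blocked(prompt) is blocked
```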
Track guardrails statistics to monitor misuse attempts:
from agents.guardrails_agent import guardrails_agent
stats = guardrails_agent.get_stats()
print(f"Total validations: {stats['total_validations']}")
print(f"Blocked: {stats['blocked']}")
print(f"Allowed: {stats['allowed']}")
print(f"Block rate: {stats['block_rate']:.2%}")
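The counters behind `get_stats()` could be as simple as the following sketch. The `GuardrailsStats` class is hypothetical; only the field names follow the keys printed above.

```python
class GuardrailsStats:
    """Minimal counter sketch matching the stats fields shown above."""

    def __init__(self):
        self.blocked = 0
        self.allowed = 0

    def record(self, is_valid: bool):
        """Count one validation result."""
        if is_valid:
            self.allowed += 1
        else:
            self.blocked += 1

    def get_stats(self) -> dict:
        total = self.blocked + self.allowed
        return {
            "total_validations": total,
            "blocked": self.blocked,
            "allowed": self.allowed,
            # Guard against division by zero before any validations
            "block_rate": (self.blocked / total) if total else 0.0,
        }

s = GuardrailsStats()
for ok in (True, True, False):
    s.record(ok)
print(s.get_stats())  # block_rate is 1/3 ≈ 0.33
```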
Edit services/agents/guardrails_agent.py:
def _load_violation_patterns(self) -> Dict[str, Dict]:
    return {
        # ... existing patterns ...
        'new_violation_type': {
            'keywords': [
                r'\b(bad|harmful|dangerous)\s+(pattern)\b',
            ],
            'description': 'Description of what this violation prevents',
            'severity': 'critical',  # 'critical', 'high', or 'medium'
            'category': 'Safety'     # 'Safety', 'Ethics', 'Compliance', 'Reliability'
        }
    }
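After adding a rule, you can sanity-check the regex in isolation before wiring it into the agent. This uses the example pattern shown above; case-insensitive matching is an assumption about how the agent applies its patterns.

```python
import re

# The example keyword pattern from the snippet above
pattern = r"\b(bad|harmful|dangerous)\s+(pattern)\b"

# Should match: the phrase "dangerous pattern" appears
assert re.search(pattern, "Build a dangerous pattern generator", re.IGNORECASE)

# Should not match: the adjective appears without the word "pattern"
assert re.search(pattern, "Build a dangerous app", re.IGNORECASE) is None

print("pattern behaves as expected")
```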
"Build an app to hack into Facebook accounts"
❌ REJECTED: Privacy violation

"Create a weapon detection system for schools"
❌ REJECTED: Physical harm (weapon keyword)

"Make a platform to spread fake news about elections"
❌ REJECTED: Misinformation

"Clone Netflix with all their movies"
❌ REJECTED: Copyright violation

"Build an e-commerce platform for selling handmade jewelry"
✅ APPROVED: Legitimate business use case

"Create a fitness tracking app with calorie counting"
✅ APPROVED: Health & wellness (non-diagnostic)

"Build a social platform for photographers"
✅ APPROVED: Creative community platform

"Make a medical symptom checker with disclaimer: 'Not medical advice, consult a doctor'"
✅ APPROVED: Proper disclaimers included
This guardrails system helps with compliance but does NOT replace:
Consult legal counsel for comprehensive compliance strategy.
If a legitimate prompt is incorrectly blocked:
Future enhancements:
Version: 1.0.0
Last Updated: December 5, 2025
Status: Production Ready ✅