Purple8-platform

AI Guardrails Agent - Documentation

Overview

The AI Guardrails Agent is a pre-execution safety and compliance layer that validates all user prompts BEFORE any AI agents begin execution. This prevents misuse, ensures ethical AI development, and maintains legal compliance.

Key Goals

  1. Safety: Prevent physical or psychological harm
  2. Ethics: Uphold fairness, privacy, and human dignity
  3. Compliance: Meet legal and regulatory requirements (GDPR, HIPAA, etc.)
  4. Reliability: Ensure accurate, consistent, and trustworthy results
  5. Brand Alignment: Maintain consistent brand voice and values

Architecture

User Submits Prompt
        ↓
🛡️ AI Guardrails Agent (Pre-check)
        ↓
   Validation Rules
   (Regex patterns + keywords)
        ↓
   ├─ CRITICAL violation → ❌ REJECT (Stop pipeline)
   ├─ HIGH violation → ⚠️ WARN (Allow with notice)
   └─ No violations → ✅ APPROVE (Continue to pipeline)
        ↓
   Pipeline Execution
   (Ideation → Architecture → Development...)
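The severity routing above can be sketched as a minimal, self-contained validator. The pattern table and function name below are illustrative only; the real rules live in services/agents/guardrails_agent.py:

```python
import re

# Hypothetical pattern table keyed by violation type; each entry carries a
# severity that drives the reject/warn/approve decision shown above.
PATTERNS = {
    "physical_harm": {"regex": re.compile(r"\b(weapon|explosive)\b", re.I),
                      "severity": "critical", "category": "Safety"},
    "regulated_content": {"regex": re.compile(r"\b(gambling|tobacco)\b", re.I),
                          "severity": "high", "category": "Compliance"},
}

def check_prompt(prompt: str) -> str:
    """Return 'reject', 'warn', or 'approve' for a prompt."""
    severities = {entry["severity"]
                  for entry in PATTERNS.values()
                  if entry["regex"].search(prompt)}
    if "critical" in severities:
        return "reject"   # CRITICAL violation -> stop pipeline
    if "high" in severities:
        return "warn"     # HIGH violation -> allow with notice
    return "approve"      # no violations -> continue to pipeline
```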

Violation Categories

1. Safety Violations (CRITICAL)

Physical Harm:

Psychological Harm:

2. Ethical Violations (CRITICAL)

Privacy:

Discrimination:

3. Compliance Violations

Financial Fraud (CRITICAL):

Regulated Content (HIGH):

Copyright (HIGH):

4. Misinformation (CRITICAL)

5. Child Safety (CRITICAL)

Usage

Python (Backend)

from agents.guardrails_agent import validate_prompt

# Validate a prompt
is_valid, result = validate_prompt(
    prompt="Build a fitness tracking app",
    goal="production",
    deployment_target="mobile"
)

if not is_valid:
    print(f"❌ REJECTED: {result['message']}")
    print(f"Violations: {result['violations']}")
    # Inside an API handler, surface the error to the caller, e.g.:
    # return {"error": result['message']}
else:
    # Continue with pipeline execution
    print("✅ Prompt approved")

Response Format

Approved Prompt:

{
    "status": "approved",
    "message": "Prompt passed all AI guardrails",
    "warnings": null,
    "stats": {
        "total_checked": 42,
        "blocked": 3,
        "allowed": 39
    }
}

Rejected Prompt:

{
    "status": "rejected",
    "reason": "guardrails_violation",
    "message": "⚠️ Your request cannot be processed due to AI safety and compliance guardrails...",
    "violations": [
        {
            "type": "physical_harm",
            "category": "Safety",
            "severity": "critical",
            "description": "Prompt requests content that could cause physical harm"
        }
    ],
    "support_message": "If you believe this is an error, please contact support with details."
}
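A caller can branch on the two response shapes above. A minimal handler sketch follows; field names are taken from the JSON examples, but the helper name is hypothetical:

```python
def handle_guardrails_response(result: dict) -> bool:
    """Return True if the prompt was approved; print violation details otherwise."""
    if result.get("status") == "approved":
        return True
    # Rejected: summarize each violation for the user.
    for v in result.get("violations", []):
        print(f"[{v['severity'].upper()}] {v['category']}: {v['description']}")
    print(result.get("support_message", ""))
    return False
```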

Integration Points

1. Pipeline Router (services/gateway/routers/pipeline.py)

The guardrails agent runs as the first step in pipeline execution:

@router.post("/execute")
async def execute_pipeline(request: PipelineRequest):
    # πŸ›‘οΈ AI GUARDRAILS: Validate prompt BEFORE pipeline execution
    from agents.guardrails_agent import validate_prompt
    
    is_valid, guardrails_result = validate_prompt(
        prompt=request.prompt,
        goal=request.goal,
        deployment_target=request.deploymentTarget
    )
    
    if not is_valid:
        # Reject execution
        return {
            'status': 'rejected',
            'error': guardrails_result['message'],
            'violations': guardrails_result['violations']
        }
    
    # Continue with pipeline...

2. Frontend (frontend-vue/src/App.vue)

The frontend displays user-friendly error messages for guardrails violations:

const response = await axios.post(API_ENDPOINTS.runPipeline, requestPayload);

// Check if guardrails blocked the request
if (response.data.status === 'rejected' && response.data.reason === 'guardrails_violation') {
    const violations = response.data.violations || [];
    const categories = violations.map(v => v.category).join(', ');
    
    alert(`πŸ›‘οΈ AI GUARDRAILS VIOLATION\n\n${response.data.error}\n\nCategories: ${categories}`);
    return;
}

Testing

Run the test suite to verify guardrails are working:

cd ai_solutionbuilder
pytest tests/test_guardrails_agent.py -v

Test Coverage:

Monitoring

Track guardrails statistics to monitor misuse attempts:

from agents.guardrails_agent import guardrails_agent

stats = guardrails_agent.get_stats()
print(f"Total validations: {stats['total_validations']}")
print(f"Blocked: {stats['blocked']}")
print(f"Allowed: {stats['allowed']}")
print(f"Block rate: {stats['block_rate']:.2%}")

Customization

Adding New Violation Patterns

Edit services/agents/guardrails_agent.py:

def _load_violation_patterns(self) -> Dict[str, Dict]:
    return {
        # ... existing patterns ...
        
        'new_violation_type': {
            'keywords': [
                r'\b(bad|harmful|dangerous)\s+(pattern)\b',
            ],
            'description': 'Description of what this violation prevents',
            'severity': 'critical',  # 'critical', 'high', or 'medium'
            'category': 'Safety'  # 'Safety', 'Ethics', 'Compliance', 'Reliability'
        }
    }
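Before shipping a new pattern, it's worth sanity-checking the regex in isolation, since word boundaries and whitespace classes are easy to get subtly wrong. A standalone check, not part of the agent:

```python
import re

# Keyword entries are regex strings; compile one and probe its behavior.
pattern = re.compile(r"\b(bad|harmful|dangerous)\s+(pattern)\b", re.IGNORECASE)

assert pattern.search("This is a HARMFUL pattern")      # case-insensitive hit
assert pattern.search("a dangerous   pattern")          # \s+ spans any whitespace
assert not pattern.search("a dangerous patterning")     # \b blocks partial words
```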

Adjusting Severity

Examples

❌ Blocked Requests

"Build an app to hack into Facebook accounts"
→ REJECTED: Privacy violation

"Create a weapon detection system for schools"
→ REJECTED: Physical harm (weapon keyword)

"Make a platform to spread fake news about elections"
→ REJECTED: Misinformation

"Clone Netflix with all their movies"
→ REJECTED: Copyright violation

✅ Approved Requests

"Build an e-commerce platform for selling handmade jewelry"
→ APPROVED: Legitimate business use case

"Create a fitness tracking app with calorie counting"
→ APPROVED: Health & wellness (non-diagnostic)

"Build a social platform for photographers"
→ APPROVED: Creative community platform

"Make a medical symptom checker with disclaimer: 'Not medical advice, consult a doctor'"
→ APPROVED: Proper disclaimers included

Best Practices

  1. Run guardrails FIRST - Before any AI processing
  2. Log all rejections - Monitor for abuse patterns
  3. Clear user feedback - Explain WHY a prompt was rejected
  4. Regular updates - Add new patterns as threats evolve
  5. Test thoroughly - Ensure legitimate uses aren't blocked
  6. Monitor false positives - Adjust patterns if too restrictive

This guardrails system helps with compliance but does NOT replace legal review. Consult legal counsel for a comprehensive compliance strategy.

Support

If a legitimate prompt is incorrectly blocked:

  1. Check violation details in response
  2. Revise prompt to avoid trigger keywords
  3. Add proper disclaimers (medical, legal, financial)
  4. Contact support with details if still blocked

Roadmap

Future enhancements:


Version: 1.0.0
Last Updated: December 5, 2025
Status: Production Ready ✅