Interfaze provides configurable content safety guardrails that automatically detect and filter potentially harmful or inappropriate content in both text and images, helping your applications maintain appropriate content standards.
The guard system uses a comprehensive set of safety categories to evaluate content and can be customized to match your specific requirements and compliance needs.
The Llama Guard system supports the following safety categories:
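The category list itself does not appear on this page. For reference, the `S`-prefixed codes used below follow the standard Llama Guard 3 hazard taxonomy, sketched here as a Python mapping; Interfaze's exact set may differ, and `S12_IMAGE` (used later on this page) appears to be an image-specific extension of S12.

```python
# Standard Llama Guard 3 hazard taxonomy (MLCommons). Interfaze's
# actual category list may differ slightly; S12_IMAGE, mentioned
# later on this page, is assumed to be an image-specific variant
# of S12.
LLAMA_GUARD_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
}
```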
To enable content safety guardrails, include the guard configuration in your system prompt using the following format:
Basic Safety Guardrails:
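A minimal sketch of enabling a few core guards. This page does not show the actual directive syntax, so the `guards:` line below is a hypothetical placeholder; only the category codes themselves come from the taxonomy.

```python
# Hypothetical sketch: the guard configuration is assumed to be a
# "guards:" line listing category codes at the top of the system
# prompt. Interfaze's actual format may differ.
system_prompt = (
    "guards: S1, S9, S10\n"  # violent crimes, weapons, hate
    "You are a helpful customer-support assistant."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Hello!"},
]
```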
Comprehensive Content Filtering:
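To enable every text guard at once, all category codes can be listed together. As above, the `guards:` directive syntax is an assumption, not the confirmed Interfaze format.

```python
# Hypothetical sketch: enable all text guards (S1-S13) by listing
# every category code. The directive syntax is assumed.
all_guards = ", ".join(f"S{i}" for i in range(1, 14))
system_prompt = f"guards: {all_guards}\nYou are a fully moderated assistant."
```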
Image NSFW Detection:
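The `S12_IMAGE` code comes from this page; the surrounding directive syntax is, again, a hypothetical placeholder.

```python
# Hypothetical sketch: enable the S12_IMAGE guard (named on this
# page) to screen uploaded images for NSFW content.
system_prompt = (
    "guards: S12_IMAGE\n"
    "Describe the attached image."
)
```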
Custom Combination:
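Text and image guards can presumably be mixed to match specific compliance needs; the particular combination and directive syntax below are illustrative assumptions.

```python
# Hypothetical sketch: combine selected text guards with the image
# guard. Both the syntax and the chosen categories are illustrative.
system_prompt = (
    "guards: S4, S12, S12_IMAGE\n"  # child exploitation, sexual content, NSFW images
    "You are a moderation-aware assistant."
)
```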
When guardrails are triggered, the system returns a structured response indicating which safety category was violated:
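The response example is missing from this page; a plausible shape, with every field name an assumption rather than the documented schema, might be handled like this:

```python
import json

# Hypothetical violation response. Every field name here is an
# assumption; the actual Interfaze schema is not shown on this page.
raw = '{"blocked": true, "category": "S10", "message": "Content violates the Hate guard."}'

result = json.loads(raw)
if result.get("blocked"):
    # Surface which guard fired so the caller can react appropriately.
    print(f"Request blocked by guard {result['category']}: {result['message']}")
```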
When guardrails are enabled, the system automatically evaluates all text content against the specified safety categories. If content violates any enabled guard, the request will be blocked with an appropriate error message.
For image content, use the S12_IMAGE guard to automatically detect and filter NSFW content. The system analyzes uploaded images and blocks requests containing inappropriate visual content.
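A sketch of an image request with the S12_IMAGE guard enabled. The message shape mirrors common chat-completions APIs and is an assumption; only the S12_IMAGE code itself comes from this page.

```python
import base64

# Hypothetical sketch of attaching an image to a guarded request.
# The content-part structure follows common chat-completions
# conventions; Interfaze's actual request format may differ.
image_bytes = b"\x89PNG placeholder image data"  # stand-in for real image bytes
encoded = base64.b64encode(image_bytes).decode("ascii")

messages = [
    {"role": "system", "content": "guards: S12_IMAGE"},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{encoded}"}},
        ],
    },
]
# If the image is flagged as NSFW, the request is blocked before the
# model processes it.
```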