Alpha

docs

sign in

Get Started

Introduction

Core Concepts

Vision

Web

Speech to Text

Code Execution

Guardrails

Utilities

Structured Outputs

Function Calling

Reasoning

Streaming

Security

Integrations

OpenAI SDK

Vercel AI SDK

Langchain

Guardrails

Interfaze provides configurable content safety guardrails. It allows you to automatically detect and filter potentially harmful or inappropriate content in both text and images, ensuring your applications maintain appropriate content standards.

The guard system uses a comprehensive set of safety categories to evaluate content and can be customized to match your specific requirements and compliance needs.

Guard Categories

The LLAMA Guard system supports the following safety categories:

S1: Violent Crimes
S2: Non-Violent Crimes
S3: Sex-Related Crimes
S4: Child Sexual Exploitation
S5: Defamation
S6: Specialized Advice
S7: Privacy
S8: Intellectual Property
S9: Indiscriminate Weapons
S10: Hate
S11: Suicide & Self-Harm
S12: Sexual Content
S12_IMAGE: Sexual Content (Image)
S13: Elections
S14: Code Interpreter Abuse

How to Enable Guardrails

To enable content safety guardrails, include the guard configuration in your system prompt using the following format:

...

Example System Prompts

Basic Safety Guardrails:

...

Comprehensive Content Filtering:

...

Image NSFW Detection:

...

Custom Combination:

...

Guard Response

When guardrails are triggered, the system returns a structured response indicating which safety category was violated:

...

Implementation

Text Content Guarding

When guardrails are enabled, the system automatically evaluates all text content against the specified safety categories. If content violates any enabled guard, the request will be blocked with an appropriate error message.

Image NSFW Detection

For image content, use the S12_IMAGE guard to automatically detect and filter NSFW content. The system will analyze uploaded images and block requests containing inappropriate visual content.

NSFW Guardrail Example