
Policies & Rate Limits

Policies control what traffic your projects can send and how much of it is recorded: request rates, token budgets, audit retention, and content rules.

Rate limits

Rate limits cap the number of requests or tokens that a project (or, for the monthly token limit, an account) can consume per time window. When a limit is exceeded, the gateway immediately returns 429 Too Many Requests, and the request never reaches the upstream provider.
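
Because a 429 is returned before the request reaches the upstream provider, a client can usually retry safely once the window has moved on. The following is a minimal client-side sketch, not part of any official SDK. It assumes the caller supplies a hypothetical `send` function that returns `(status, headers, body)`, and that the gateway may include a `Retry-After` header in seconds; the source does not confirm that header, so the code falls back to exponential backoff with jitter when it is absent.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for illustration: (status, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    response = send()
    for attempt in range(1, max_attempts):
        status, headers, _ = response
        if status != 429:
            return response
        # Assumption: the gateway may send Retry-After as whole seconds.
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Exponential backoff, capped, with jitter to avoid retry bursts.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)
        sleep(delay)
        response = send()
    # Either a success on the final attempt or the last 429.
    return response
```

Passing `sleep` as a parameter keeps the function easy to test without real delays. Retrying suits the per-minute limits; for a daily or monthly limit the wait is long enough that surfacing the error to the caller is usually the better choice.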

Limit type	Window	Scope
Requests per minute (RPM)	60 seconds	Per project
Tokens per minute (TPM)	60 seconds	Per project
Requests per day (RPD)	24 hours (UTC)	Per project
Tokens per month	Calendar month	Per account
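
To make the windows and scopes in the table concrete, here is a rough sketch of a fixed-window counter. It is an illustration only: the source does not say whether the gateway uses fixed or sliding windows, and the class name, the limit keys (`rpm`, `tpm`, `rpd`, `tokens_per_month`), and the idea that the token count is known before the check are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def window_key(limit: str, now: datetime) -> str:
    """Return the bucket that a timezone-aware timestamp falls in."""
    now = now.astimezone(timezone.utc)
    if limit in ("rpm", "tpm"):
        return now.strftime("%Y-%m-%dT%H:%M")  # 60-second window
    if limit == "rpd":
        return now.strftime("%Y-%m-%d")        # 24-hour window (UTC)
    if limit == "tokens_per_month":
        return now.strftime("%Y-%m")           # calendar month
    raise ValueError(f"unknown limit type: {limit}")


class FixedWindowLimiter:
    """Hypothetical in-memory limiter; not the gateway's implementation."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits
        # (scope id, limit type, window key) -> amount consumed so far
        self.used = defaultdict(int)

    def check(self, project: str, account: str, tokens: int, now: datetime) -> int:
        """Return 200 and record usage, or 429 if any limit would be exceeded."""
        charges = []
        for limit, cap in self.limits.items():
            # The monthly token limit is per account; the rest are per project.
            scope = account if limit == "tokens_per_month" else project
            cost = 1 if limit in ("rpm", "rpd") else tokens
            key = (scope, limit, window_key(limit, now))
            if self.used[key] + cost > cap:
                return 429  # rejected requests consume nothing
            charges.append((key, cost))
        for key, cost in charges:
            self.used[key] += cost
        return 200
```

Usage is checked against every limit first and recorded only if all of them pass, so a rejected request does not count against any window.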

Rate limit configuration is available in the project settings. Default limits apply to all new projects.

Quota (coming soon)

Hard monthly token budgets per project. When a project exhausts its quota, requests are blocked until the next billing cycle begins or you raise the limit.

Audit retention

Control how much detail the audit log stores per request. There are three tiers:

zero_retention: No audit record is written. Lowest storage cost; no forensics.

metadata: All fields except the raw request and response bodies. This is the default.

full: The complete record, including request and response bodies, for replay and debugging.
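
The difference between the tiers can be sketched as a filter applied to each audit event before it is written. This is an illustration under assumptions: the field names `request_body` and `response_body` and the function name are hypothetical, and only the three tier names come from the text above.

```python
from typing import Any, Optional

TIERS = ("zero_retention", "metadata", "full")
# Hypothetical names for the raw body fields of an audit event.
BODY_FIELDS = ("request_body", "response_body")


def build_audit_record(tier: str, event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the record to store for this tier, or None to store nothing."""
    if tier not in TIERS:
        raise ValueError(f"unknown retention tier: {tier}")
    if tier == "zero_retention":
        return None  # no audit record written
    if tier == "metadata":
        # Keep every field except the raw request/response bodies.
        return {k: v for k, v in event.items() if k not in BODY_FIELDS}
    return dict(event)  # full: complete record, bodies included
```

For example, an event with a model name, a status code, and both bodies would be stored without the bodies under `metadata`, stored whole under `full`, and not stored at all under `zero_retention`.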

Content policies (coming soon)

Define allow/deny rules for model outputs based on content categories. Denied responses are replaced with a configurable fallback message and logged as policy violations in the audit trail.
