Policies & Rate Limits
Policies control what traffic your projects can send — request rates, token budgets, and content rules.
Rate limits
Rate limits cap the number of requests or tokens a project can consume per time window. When a limit is exceeded, the gateway returns 429 Too Many Requests immediately, and the request never reaches the upstream provider.
| Limit type | Window | Scope |
|---|---|---|
| Requests per minute (RPM) | 60 seconds | Per project |
| Tokens per minute (TPM) | 60 seconds | Per project |
| Requests per day (RPD) | 24 hours (UTC) | Per project |
| Tokens per month | Calendar month | Per account |
Rate limit configuration is available in the project settings. Default limits apply to all new projects.
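Because a rate-limited request is rejected with 429 before it reaches the provider, clients usually retry after a pause. The following is a minimal sketch of that pattern, not an official client: `call_with_backoff`, its parameters, and the `send` callable (which is assumed to return a `(status, body)` pair) are hypothetical names introduced for illustration.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry with exponential backoff while it returns 429.

    `send` is a hypothetical callable that performs one request through
    the gateway and returns (status_code, body).
    """
    attempt = 0
    while True:
        status, body = send()
        # Return on any non-429 status, or once the retry budget is spent.
        if status != 429 or attempt >= max_retries:
            return status, body
        # Exponential backoff with full jitter, capped at max_delay seconds.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
        attempt += 1


if __name__ == "__main__":
    # Simulated gateway: two rate-limited responses, then success.
    fake_responses = iter([(429, "limited"), (429, "limited"), (200, "ok")])
    result = call_with_backoff(lambda: next(fake_responses), sleep=lambda s: None)
    print(result)
```

Jitter spreads retries out so that many clients hitting the same per-minute window do not all retry at the same instant. Backoff helps with the 60-second windows; for the daily and monthly limits, retrying within seconds is unlikely to succeed, so a caller may prefer to surface the 429 instead.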
Quota (coming soon)
A hard monthly token budget per project. When a project exhausts its quota, requests are blocked until the next billing cycle starts or you increase the limit.
Audit retention
Control how much detail is stored in the audit log per request. Three tiers:
| Tier | What is stored |
|---|---|
| zero_retention | No audit record written. Lowest storage cost; no forensics. |
| metadata | All fields except raw request/response bodies. Default. |
| full | Complete record including request and response bodies for replay and debugging. |
Content policies (coming soon)
Define allow/deny rules for model outputs based on content categories. Denied responses are replaced with a configurable fallback message and logged as policy violations in the audit trail.
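Returning to the three audit retention tiers described under Audit retention, the sketch below illustrates how each tier changes what ends up in a stored record. It is an illustration only, not the gateway's implementation: the function `build_audit_record` and the field names `request_body` and `response_body` are assumptions; only the tier names come from this page.

```python
from typing import Any, Dict, Optional

# Tier names as listed on this page.
TIERS = ("zero_retention", "metadata", "full")

# Hypothetical names for the fields that hold raw bodies.
BODY_FIELDS = ("request_body", "response_body")


def build_audit_record(tier: str, event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the record that would be stored for `event` under `tier`."""
    if tier not in TIERS:
        raise ValueError(f"unknown retention tier: {tier!r}")
    if tier == "zero_retention":
        # Nothing is written to the audit log.
        return None
    if tier == "metadata":
        # Keep every field except the raw request/response bodies.
        return {k: v for k, v in event.items() if k not in BODY_FIELDS}
    # "full": keep the complete record, bodies included.
    return dict(event)
```

Under these assumptions, an event with fields `project`, `status`, `request_body`, and `response_body` would be stored with only `project` and `status` at the `metadata` tier, and not stored at all at `zero_retention`.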