Plurence is currently in Public Beta. Features and pricing may change. Not recommended for production workloads. See the Beta Terms.


Policies & Rate Limits

Policies control what traffic your projects can send — request rates, token budgets, and content rules.

Rate limits

Rate limits cap the number of requests or tokens that a project (or, for the monthly token limit, an account) can consume per time window. When a limit is exceeded, the gateway returns 429 Too Many Requests immediately, and the request never reaches the upstream provider.

Limit type                   Window            Scope
Requests per minute (RPM)    60 seconds        Per project
Tokens per minute (TPM)      60 seconds        Per project
Requests per day (RPD)       24 hours (UTC)    Per project
Tokens per month             Calendar month    Per account

Rate limit configuration is available in the project settings. Default limits apply to all new projects.
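
Because a rate-limited request is rejected with 429 before it reaches the provider, it is generally safe for a client to retry it later. The following is a minimal sketch of client-side retry with exponential backoff, not an official Plurence client. The `send` callable, the attempt count, and the delay values are all assumptions made for illustration; this page does not say whether the gateway sends a Retry-After header, so the sketch does not rely on one.

```python
import random
import time
from typing import Callable, Tuple


def send_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable that performs one request through
    the gateway and returns (status_code, body).
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Exponential backoff: base_delay, 2x, 4x, ... plus up to 10% jitter.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay * 0.1))
        status, body = send()
    return status, body
```

If every attempt is rate limited, the function returns the last 429 response so the caller can decide what to do next. Backoff helps with the per-minute limits; for the daily and monthly windows, retrying within seconds is unlikely to succeed.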

Quota (coming soon)

Quotas are hard monthly token budgets set per project. When a project exhausts its quota, requests are blocked until the next billing cycle starts or you increase the limit.
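
The blocking behavior described above can be modeled as follows. This is an illustrative sketch only: the feature is not yet released, and the class and method names are hypothetical rather than part of any Plurence API.

```python
class ProjectQuota:
    """Toy model of a hard monthly token budget for one project."""

    def __init__(self, monthly_limit: int) -> None:
        self.monthly_limit = monthly_limit
        self.used = 0

    def is_blocked(self) -> bool:
        # Requests are blocked once the budget is exhausted.
        return self.used >= self.monthly_limit

    def record_usage(self, tokens: int) -> None:
        # Add the tokens consumed by a completed request.
        self.used += tokens

    def start_new_cycle(self) -> None:
        # A new billing cycle resets consumption to zero.
        self.used = 0
```

In this model a blocked project is unblocked either by calling `start_new_cycle()` or by assigning a larger `monthly_limit`, mirroring the two ways out described above.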

Audit retention

Control how much detail is stored in the audit log per request. Three tiers:

zero_retention    No audit record written. Lowest storage cost; no forensics.
metadata          All fields except raw request/response bodies. Default.
full              Complete record including request and response bodies for replay and debugging.
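
To make the difference between the tiers concrete, here is a small sketch of how a record might be filtered before storage. The tier names come from the list above; the field names `request_body` and `response_body` and the function itself are assumptions for illustration, not the actual audit log schema.

```python
from typing import Any, Dict, Optional

# Hypothetical names for the raw body fields in an audit record.
BODY_FIELDS = {"request_body", "response_body"}


def build_audit_record(tier: str, record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return what would be stored for `record` under the given tier."""
    if tier == "zero_retention":
        return None  # nothing is written
    if tier == "metadata":
        # Keep every field except the raw bodies.
        return {k: v for k, v in record.items() if k not in BODY_FIELDS}
    if tier == "full":
        return dict(record)  # complete copy, bodies included
    raise ValueError(f"unknown retention tier: {tier!r}")
```

Under `metadata`, a record such as `{"model": "m", "status": 200, "request_body": "..."}` would be stored without its `request_body` field.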

Content policies (coming soon)

Define allow/deny rules for model outputs based on content categories. Denied responses are replaced with a configurable fallback message and logged as policy violations in the audit trail.
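
The deny-and-replace flow can be sketched as below. This is a hypothetical illustration of the described behavior, since the feature is not yet available: the function name, the audit entry shape, and the idea that content categories arrive as a set of labels from some upstream classifier are all assumptions.

```python
from typing import Any, Dict, List, Set


def apply_content_policy(
    response_text: str,
    categories: Set[str],
    denied: Set[str],
    fallback: str,
    audit_log: List[Dict[str, Any]],
) -> str:
    """Return the response, or the fallback if it hits a denied category."""
    hits = sorted(categories & denied)
    if not hits:
        return response_text
    # Log the violation, then replace the denied response.
    audit_log.append({"event": "policy_violation", "categories": hits})
    return fallback
```

A response whose categories do not intersect the deny list passes through unchanged and adds nothing to the log.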
