Why 'unlimited free' has a catch
Every token costs a provider real GPU time, so any offer that is both unlimited and free is paid for or capped somewhere, usually through rate limits, a queue, a trial clock, or ads. The honest goal isn't a mythical infinite free plan; it's stacking the legitimate free capacity that already exists so you rarely hit a wall.
How close you can actually get
Three mechanisms, combined, cover most real workloads at zero or near-zero cost:
| Mechanism | How 'unlimited' it is | The honest limit |
|---|---|---|
| Free model tier | Free every day | A daily request allowance |
| Your own provider free tiers (BYOK) | As large as the provider gives you | The provider's own quota — at $0 gateway markup |
| Shared key pool | Grows with the community | You must donate spare quota to draw from it |
Bring-your-own-keys (BYOK) is the closest thing to 'unlimited free' that's also sustainable: if a provider hands you a generous free tier, routing it through a gateway adds failover, logs, and unified billing without adding any markup. You get the provider's full free quota, just better organised.
Stack them behind one API
Because it's all one OpenAI-compatible endpoint, you can start on the free tier, add your own free-tier provider keys, and let traffic fall back across them automatically, with no code change as your free capacity grows.
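To make the fallback idea concrete, here is a minimal sketch of the logic a gateway might apply on your behalf. It makes no network calls and is not the gateway's actual implementation: `FreeSource`, `QuotaExhausted`, `complete`, the source names, and the allowance numbers are all hypothetical, chosen only to show how requests spill from one free source to the next once a daily allowance runs out.

```python
from __future__ import annotations

from dataclasses import dataclass


class QuotaExhausted(Exception):
    """Raised when a source has no free requests left for the day."""


@dataclass
class FreeSource:
    """A stand-in for one source of free capacity (hypothetical model)."""

    name: str             # illustrative label, e.g. "free-tier"
    daily_allowance: int  # made-up number, not a real provider quota
    used: int = 0

    def send(self, prompt: str) -> str:
        # Refuse the request once today's allowance is spent.
        if self.used >= self.daily_allowance:
            raise QuotaExhausted(self.name)
        self.used += 1
        return f"[{self.name}] reply to: {prompt}"


def complete(prompt: str, sources: list[FreeSource]) -> str:
    """Try each source in order, falling through when one is out of quota."""
    for source in sources:
        try:
            return source.send(prompt)
        except QuotaExhausted:
            continue
    raise QuotaExhausted("all free sources exhausted for today")


# Two stacked sources: the free model tier first, then a BYOK free tier.
sources = [
    FreeSource("free-tier", daily_allowance=2),
    FreeSource("byok-provider", daily_allowance=3),
]

# Five requests: the first two use the free tier, the rest fall back.
replies = [complete(f"question {i}", sources) for i in range(5)]
for reply in replies:
    print(reply)
```

The calling code only ever invokes `complete`; adding another free source means appending to the list, which mirrors the "no code change" claim above. In this sketch the stacked capacity is simply the sum of the allowances, and a further request fails only after every source is spent.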