* Storage is charged even if you scale to zero, due to pre-caching of images across our fleet.
* e.g. 5 max replicas with scale to zero and 20 GiB per replica is charged 5 * 20 = 100 GiB = $8 / mo
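The storage arithmetic above can be sketched as a small calculation. This is only an illustration: the rate of $0.08 per GiB per month is not stated anywhere and is inferred from the example ($8 for 100 GiB), and the function name `monthly_storage_cost` is hypothetical.

```python
# Assumption: $8 / 100 GiB in the example implies $0.08 per GiB per month.
ASSUMED_USD_PER_GIB_MONTH = 0.08


def monthly_storage_cost(
    max_replicas: int,
    gib_per_replica: float,
    usd_per_gib_month: float = ASSUMED_USD_PER_GIB_MONTH,
) -> float:
    """Estimate monthly storage cost in USD.

    Storage is billed on the configured max replicas, not on how many
    replicas are currently running, so scale to zero does not reduce it.
    """
    if max_replicas < 0 or gib_per_replica < 0:
        raise ValueError("max_replicas and gib_per_replica must be non-negative")
    billed_gib = max_replicas * gib_per_replica
    return round(billed_gib * usd_per_gib_month, 2)


# The example from the text: 5 max replicas at 20 GiB each.
print(monthly_storage_cost(5, 20))  # 8.0
```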
We understand that sending even 1 extra request to a single replica can make generative AI models respond slower or, in the worst case, cause GPUs to run out of memory (OOM). We use Redis to track every single request in real time while scaling up.
Min / Max Replicas
Configure the minimum and maximum number of replicas each app can scale to. Scale to zero by setting min replicas to 0. Scale-ups do incur cold starts, and we do our best to freeze GPU state to bring cold starts down to 200 ms.
Requests Per Replica
Precisely control how many requests a single replica can handle. We use Redis to track every single request in real time while scaling up. Whenever your total request volume exceeds what your configured replicas can handle, we scale up more replicas within seconds to meet your demand.
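One way to picture how a requests-per-replica limit interacts with the min / max replica bounds is the sketch below. The platform's actual scaling algorithm is not described here, so this is an assumption: it divides the number of in-flight requests by the per-replica limit, rounds up, and clamps the result to the configured bounds. The function and parameter names are hypothetical.

```python
import math


def desired_replicas(
    in_flight_requests: int,
    requests_per_replica: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Return a replica count so that no replica exceeds its request limit.

    Illustrative only: ceil(in_flight / per_replica), clamped to [min, max].
    """
    if requests_per_replica < 1:
        raise ValueError("requests_per_replica must be at least 1")
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("need 0 <= min_replicas <= max_replicas")
    needed = math.ceil(in_flight_requests / requests_per_replica)
    return max(min_replicas, min(needed, max_replicas))


# 9 in-flight requests at 4 requests per replica need 3 replicas.
print(desired_replicas(9, 4, min_replicas=0, max_replicas=5))  # 3
```

With `min_replicas=0` and no in-flight requests the result is 0, which corresponds to scale to zero; once demand exceeds `max_replicas * requests_per_replica`, the count stays pinned at the maximum.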
Scale Down Delay
Define a scale-down delay, in seconds, to give replicas extra time to handle more requests and avoid scaling down prematurely.
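The idea behind a scale-down delay can be sketched as a simple gate: scale down only after load has stayed below capacity for the whole delay, and reset the timer whenever demand returns. This is a minimal illustration under that assumption, not the platform's implementation; the class `ScaleDownGate` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScaleDownGate:
    """Allow a scale-down only after `delay_seconds` of continuous low load."""

    delay_seconds: float
    # Timestamp at which load first dropped below capacity, or None if busy.
    _idle_since: Optional[float] = field(default=None, init=False, repr=False)

    def should_scale_down(self, now: float, below_capacity: bool) -> bool:
        if not below_capacity:
            # Demand came back: cancel any pending scale-down.
            self._idle_since = None
            return False
        if self._idle_since is None:
            self._idle_since = now
        return now - self._idle_since >= self.delay_seconds


gate = ScaleDownGate(delay_seconds=30)
print(gate.should_scale_down(now=0, below_capacity=True))    # False
print(gate.should_scale_down(now=25, below_capacity=False))  # False, timer reset
print(gate.should_scale_down(now=30, below_capacity=True))   # False
print(gate.should_scale_down(now=60, below_capacity=True))   # True
```

Timestamps are passed in explicitly so the behavior is easy to test; a real caller would likely supply `time.monotonic()`.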
Teams
Invite members and create teams for different projects or environments.
Billing
Real-time billing updates within minutes with aggregated views.
Access Control
Assign roles to members and manage their access levels.
API Keys
Create API keys to programmatically access 8Scale APIs.