8Scale uses P2P GPU compute to keep prices low while optimizing for performance and scaling up replicas in seconds.
GPU Pricing
Nvidia 4090 • 24 GiB vRAM
2x CPU • 18 GiB RAM: $0.30 / hr ($0.00008 / sec)
4x CPU • 24 GiB RAM: $0.35 / hr ($0.00009 / sec)
6x CPU • 48 GiB RAM: $0.45 / hr ($0.00012 / sec)
Nvidia 5090 • 32 GiB vRAM
4x CPU • 32 GiB RAM: $0.45 / hr ($0.00012 / sec)
6x CPU • 48 GiB RAM: $0.55 / hr ($0.00015 / sec)
8x CPU • 64 GiB RAM: $0.65 / hr ($0.00018 / sec)
Storage
$0.08 / GiB / mo (e.g. 100 GiB = $8 / mo)
* Storage is charged even when you scale to zero, because container images stay pre-cached across our fleet.
* Example: an app with 5 max replicas, scale to zero enabled, and 20 GiB per replica is charged for 5 × 20 = 100 GiB, or $8 / mo.
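The storage rule above can be sketched as a small calculation. This is an illustrative estimate only, not 8Scale's billing code; the function and constant names are hypothetical, and the only inputs taken from this page are the $0.08 / GiB / mo price and the "max replicas × GiB per replica" rule.

```python
# Price taken from the Storage section above (USD per GiB per month).
STORAGE_PRICE_PER_GIB_MONTH = 0.08


def monthly_storage_cost(max_replicas: int, gib_per_replica: float) -> float:
    """Estimate monthly storage cost in USD.

    Storage is billed on max replicas, not on how many replicas are
    currently running, so scale to zero does not reduce this figure.
    """
    if max_replicas < 0 or gib_per_replica < 0:
        raise ValueError("max_replicas and gib_per_replica must be non-negative")
    billed_gib = max_replicas * gib_per_replica
    return round(billed_gib * STORAGE_PRICE_PER_GIB_MONTH, 2)


# The example from this page: 5 max replicas at 20 GiB each.
print(monthly_storage_cost(5, 20))  # 8.0
```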
Bandwidth
FREE
Frequently asked questions
What is a replica?
A replica is a single instance of a container. We scale apps by letting you define min and max replicas. Min replicas is the fewest replicas running at any given time. Setting min replicas to 0 allows your app to scale to zero when there is no traffic. Max replicas is the largest number of instances that can run to handle your traffic.
Storage is always charged based on max replicas. There is no scale to zero for storage; this is how we enable very fast scaling across our infra. Max replicas also determines the number of container image copies we distribute across our infra.
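As a rough illustration of how min and max replicas bound an app's replica count, here is a minimal sketch. The class and method names are hypothetical and do not reflect 8Scale's API or its actual autoscaling logic; the sketch only encodes the bounds described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaLimits:
    """Hypothetical container for an app's min/max replica settings."""

    min_replicas: int
    max_replicas: int

    def __post_init__(self) -> None:
        if self.min_replicas < 0 or self.max_replicas < self.min_replicas:
            raise ValueError("need 0 <= min_replicas <= max_replicas")

    @property
    def scales_to_zero(self) -> bool:
        # Min replicas of 0 lets the app run no replicas when idle.
        return self.min_replicas == 0

    def clamp(self, desired: int) -> int:
        """Bound a desired replica count to the configured range."""
        return max(self.min_replicas, min(desired, self.max_replicas))


limits = ReplicaLimits(min_replicas=0, max_replicas=5)
print(limits.scales_to_zero)  # True
print(limits.clamp(8))        # 5
```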
What does scale to zero mean?
Scale to zero saves you money when there are no requests for your app: we automatically scale down all replicas. While all replicas are scaled down, you are still charged for storage to keep container images pre-cached, but you are not charged for any GPU or CPU compute.
When is compute billed?
GPU compute is billed only while replicas are actively running. You are not charged while we download container images and prepare replicas. Charges begin when a replica starts, which includes cold-start time spent loading models into GPU vRAM and other startup tasks.
We decrement your balance every minute, but usage is metered by the second and rounded up. If your replica ran for 20.5 seconds within a minute, you are charged for 21 seconds.
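The rounding rule above can be sketched as follows. This is an estimate under stated assumptions, not 8Scale's billing implementation: the function names are hypothetical, and the per-second rate is assumed to be the hourly rate divided by 3600, which may differ slightly from the rounded per-second prices listed on this page.

```python
import math


def billed_seconds(runtime_seconds: float) -> int:
    """Round a replica's runtime within a billing minute up to a whole second."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    return math.ceil(runtime_seconds)


def compute_charge(runtime_seconds: float, hourly_rate_usd: float) -> float:
    """Estimate the charge in USD for one replica.

    Assumption: per-second rate = hourly rate / 3600.
    """
    return billed_seconds(runtime_seconds) * hourly_rate_usd / 3600


# The example from this page: 20.5 seconds of runtime is billed as 21 seconds.
print(billed_seconds(20.5))  # 21
print(f"${compute_charge(20.5, 0.30):.5f}")
```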
How does pre-pay work?
We maintain a credits balance in your billing section. Every time you pre-pay, credits are added to your balance, and there is no expiration for these credits. As you use resources like GPU compute and storage, your balance will go down over time.
What happens if my balance reaches 0?
If your balance reaches $1, we stop all replicas and block further scale-ups. You need a balance above $1 to scale up any app.
If your balance reaches $0, we set min and max replicas to 0 for all your apps. This means we remove all copies of your container images from our nodes. You can add more credits later and scale your apps back up.
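The two balance thresholds can be summarized in a short sketch. The state names and function are hypothetical, and "reaches $1" is assumed to mean a balance at or below $1, consistent with the rule that scaling up requires a balance over $1.

```python
from enum import Enum


class AccountState(Enum):
    ACTIVE = "active"    # balance over $1: apps can scale up
    STOPPED = "stopped"  # balance at or below $1: replicas stopped, no scale-up
    REMOVED = "removed"  # balance at or below $0: min/max set to 0, images removed


def account_state(balance_usd: float) -> AccountState:
    """Map a credit balance to the behavior described above."""
    if balance_usd <= 0:
        return AccountState.REMOVED
    if balance_usd <= 1:
        return AccountState.STOPPED
    return AccountState.ACTIVE


print(account_state(5.00).value)  # active
print(account_state(1.00).value)  # stopped
print(account_state(0.00).value)  # removed
```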