GPU Compute Exchange · Beta

Route AI compute at cost.

Flopex is a real-time GPU exchange that routes your inference jobs to the fastest, cheapest provider in milliseconds — across Groq, DeepInfra, Together AI, Featherless, and RunPod.

16K+ models available
<200ms avg. routing latency
5 live GPU providers
Quick start
# One line to switch from OpenAI
import requests

prompt = "Explain LPUs in one sentence."  # any prompt

response = requests.post(
    "https://api.flopex.ai/v1/inference",
    headers={"Authorization": "Bearer sk_live_..."},
    json={"model": "llama-3-70b", "input": prompt},
)
# billing.cost_usd is included in every response
Groq LPU · 40ms · DeepInfra · $0.07/1M · Together AI · 154 models · Featherless · 15,886 models · RunPod · GPU burst · Real-time pricing · Automatic failover · 35% cheaper on average
How it works

The exchange clears in
milliseconds.

01
Request
You send a job
One API call with your model, prompt, and performance profile. Economy, balanced, or fast.
02
Quote
All providers bid
Flopex pings every eligible provider simultaneously. Each returns a real-time cost and latency quote.
03
Route
Exchange clears
The winning provider is selected by price, latency, reliability, and your profile. No human in the loop.
04
Settle
You see the cost
Every response includes exact token counts and cost in USD. Your balance updates in real time.
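Client-side, the four steps above collapse into a single call. The sketch below is illustrative only: the `profile` field and the exact `billing` keys are assumptions extrapolated from the step descriptions, not a confirmed Flopex schema.

```python
# Hypothetical sketch of the request → quote → route → settle flow.
# The "profile" value and the billing keys are assumptions, not a
# documented schema. Quoting and routing happen entirely server-side.

API_URL = "https://api.flopex.ai/v1/inference"

def build_job(prompt, model="llama-3-70b", profile="balanced"):
    """Step 01: one payload with model, prompt, and performance profile."""
    assert profile in ("economy", "balanced", "fast")
    return {"model": model, "input": prompt, "profile": profile}

def settle_summary(billing):
    """Step 04: exact token counts and USD cost from the response."""
    return f"{billing['total_tokens']} tokens, ${billing['cost_usd']:.4f}"

def run_job(prompt, profile="balanced"):
    import requests  # only needed when actually calling the exchange
    resp = requests.post(
        API_URL,
        headers={"Authorization": "Bearer sk_live_..."},
        json=build_job(prompt, profile=profile),
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    print(settle_summary(body["billing"]))
    return body
```

Because the exchange clears per request, there is no provider to pick ahead of time; the profile is the only routing knob the client holds.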
Live providers

Five GPU networks,
one API.

Groq · Speed tier (LPU) · ~40ms avg. latency
DeepInfra · Economy tier · $0.07 per 1M tokens
Together AI · Balanced tier · 154 models available
Featherless · Long tail · 15,886 models available
RunPod · GPU burst · H100 on-demand GPUs
Transparent pricing
You pay exactly what we charge. No hidden markups.
Model           Direct provider   Via Flopex
llama-3.1-8b    $0.20/1M          $0.07/1M  (-65%)
llama-3.3-70b   $0.88/1M          $0.59/1M  (-33%)
qwen-2.5-7b     $0.30/1M          $0.10/1M  (-67%)
deepseek-r1     $3.00/1M          $3.00/1M  (best)
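The discount column follows directly from the two price lists. A quick check of the arithmetic (all prices in USD per 1M tokens, taken from the table above):

```python
# Verify the per-model discounts implied by the pricing table.
direct = {"llama-3.1-8b": 0.20, "llama-3.3-70b": 0.88,
          "qwen-2.5-7b": 0.30, "deepseek-r1": 3.00}
flopex = {"llama-3.1-8b": 0.07, "llama-3.3-70b": 0.59,
          "qwen-2.5-7b": 0.10, "deepseek-r1": 3.00}

def discount_pct(model):
    """Percent saved vs. going direct, rounded to a whole percent."""
    return round(100 * (direct[model] - flopex[model]) / direct[model])

for m in direct:
    print(m, f"-{discount_pct(m)}%")
# llama-3.1-8b -65%, llama-3.3-70b -33%, qwen-2.5-7b -67%, deepseek-r1 -0%
```

For deepseek-r1 the exchange price matches the best direct price, hence "best" rather than a discount.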