One HTTPS endpoint for decentralized GPU power
What Relay does
Relay is the routing layer that sits between your application and the OpenGPU network. Instead of managing clusters, scheduling, and GPU provisioning, you call a single HTTPS endpoint. Relay takes care of provider selection, routing, retries, and failover in the background. You keep your existing stack. Relay slots in as a simple API that your backends, agents, and internal tools can call whenever they need GPU power.
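As a rough illustration of what calling a single HTTPS endpoint could look like from a backend, here is a minimal Python sketch. The URL, the bearer-token header, and the JSON field names (`model`, `inputs`, `params`) are assumptions made for illustration, not Relay's documented API. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint used for illustration; the real Relay URL may differ.
RELAY_URL = "https://relay.example.invalid/v1/jobs"


def build_relay_request(api_key: str, model: str, inputs: dict, params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request describing one job."""
    body = json.dumps({"model": model, "inputs": inputs, "params": params}).encode("utf-8")
    return urllib.request.Request(
        RELAY_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


request = build_relay_request(
    api_key="YOUR_API_KEY",
    model="example-llm",  # placeholder model name
    inputs={"prompt": "Hello"},
    params={"max_tokens": 64},
)
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

Because the call is plain HTTPS with a JSON body, the same pattern should carry over to any backend, agent, or internal tool that can make an HTTP request.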

Enterprise billing that works like any cloud
Relay includes a full fiat billing system so teams can use decentralized GPUs the same way they use AWS, Google Cloud, or Azure. No tokens, no crypto wallets, no blockchain steps required. Monthly invoices and standard payment methods. Cost visibility and predictable billing. No idle GPU waste: you only pay when jobs run. 60%–80% cheaper than centralized clouds due to zero overhead. This makes Relay drop-in compatible with procurement, finance teams, and enterprise workflows.

How Relay is designed
Relay focuses on three things that matter in production environments: a simple interface, smart routing, and reliability by default.

One HTTPS endpoint
Send AI workloads through a single API. Works with any backend or framework.
Global GPU mesh
Relay selects the best providers based on performance, memory, and live health.
Automatic failover
Relay retries or re-routes automatically if a provider slows or fails.
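To make provider selection and failover more concrete, here is a small Python sketch of the general idea: filter providers by memory and health, rank them by performance, then retry or move on when one fails. This is not Relay's actual implementation. The `Provider` fields, the `score` metric, and the retry count are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Provider:
    name: str
    free_memory_gb: float
    healthy: bool
    score: float  # assumed metric: higher means better recent performance


def rank_providers(providers: List[Provider], required_memory_gb: float) -> List[Provider]:
    """Keep healthy providers with enough memory, best score first."""
    eligible = [p for p in providers if p.healthy and p.free_memory_gb >= required_memory_gb]
    return sorted(eligible, key=lambda p: p.score, reverse=True)


def run_with_failover(
    providers: List[Provider],
    required_memory_gb: float,
    run_job: Callable[[Provider], str],
    retries_per_provider: int = 1,
) -> str:
    """Try each ranked provider; retry it, then fall through to the next one."""
    last_error = None
    for provider in rank_providers(providers, required_memory_gb):
        for _ in range(1 + retries_per_provider):
            try:
                return run_job(provider)
            except RuntimeError as exc:  # stand-in for a timeout or provider error
                last_error = exc
    raise RuntimeError("no provider could complete the job") from last_error


pool = [
    Provider("a", free_memory_gb=24, healthy=True, score=0.9),
    Provider("b", free_memory_gb=80, healthy=True, score=0.7),
    Provider("c", free_memory_gb=80, healthy=False, score=0.99),
]


def flaky_run(provider: Provider) -> str:
    """Simulated job runner in which provider "a" always fails."""
    if provider.name == "a":
        raise RuntimeError("provider a timed out")
    return f"done on {provider.name}"


result = run_with_failover(pool, required_memory_gb=16, run_job=flaky_run)
print(result)  # done on b
```

In this toy run, provider "c" is skipped as unhealthy, "a" is tried first and fails twice, and the job completes on "b" without the caller handling any of it.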
What you can run through Relay
Relay moves real AI workloads across the OpenGPU network, not benchmarks or demos.


LLM inference
Low-latency inference for agents, chatbots, tools, and real products.
Image and video generation
Diffusion and rendering workloads managed automatically across providers.
Batch and offline jobs
Embeddings, indexing, and long-running tasks routed efficiently.
Agents and automation
Relay keeps agent systems running even when providers churn.
How Relay works in practice
Relay continuously evaluates and routes jobs behind the scenes:
- You send a request. Relay receives model + inputs + params.
- Relay evaluates the job. It checks memory, type, duration.
- Matched to providers. Relay selects one or more GPU providers.
- Execution + monitoring. Relay tracks progress and timeouts.
- Results returned. You get output + run metadata.

OpenGPU Relay Pricing
Relay exposes the network through a simple HTTPS endpoint with fiat billing. On average, pricing is already more than 50 percent cheaper than centralized services, and many workloads land in the 60–80 percent savings band compared to major clouds.
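For a sense of what the 60–80 percent band means in practice, the arithmetic can be sketched in a few lines of Python. The 1,000 baseline is a made-up figure for illustration, not a quoted price, and actual savings depend on the workload.

```python
def relay_cost_range(baseline_cost: float, low_saving_pct: int = 60, high_saving_pct: int = 80):
    """Return (lowest, highest) expected cost for a given centralized-cloud baseline."""
    lowest = baseline_cost * (100 - high_saving_pct) / 100
    highest = baseline_cost * (100 - low_saving_pct) / 100
    return lowest, highest


# Hypothetical monthly spend of 1,000 on a centralized cloud.
low, high = relay_cost_range(1000.0)
print(low, high)  # 200.0 400.0
```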

Ready to try Relay?
Connect your stack through a single HTTPS endpoint and scale at your own pace.
