The process of running a pre-trained AI model to generate predictions or outputs from new input data. Inference is the most common GPU workload type, used for chatbots, image generation, and more.
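To make the distinction concrete, here is a minimal, hypothetical sketch of inference in plain Python: the "pre-trained model" is just fixed weights applied to new input, with no learning happening at this stage. The tiny linear model stands in for a real network such as a chatbot LLM or an image-generation model.

```python
def predict(weights, bias, features):
    """Run one inference pass: a weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias

# These values would have been fixed by an earlier training run;
# inference only reads them, it never updates them.
trained_weights = [0.4, -0.2, 0.1]
trained_bias = 0.5

# New, unseen input data produces a fresh prediction.
new_input = [1.0, 2.0, 3.0]
print(predict(trained_weights, trained_bias, new_input))
```

The same call runs unchanged whether the weights live on a CPU or a GPU; production inference simply executes this forward pass at much larger scale.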
Benchmark OpenGPU against any cloud.
Measure inference or training workloads on distributed GPUs with instant elasticity and real-world performance.
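As an illustration of what such a measurement involves, the sketch below times repeated runs of a workload and reports mean and p95 latency. The `workload` function is a stand-in for a real GPU inference call; the names and numbers are illustrative assumptions, not OpenGPU's API.

```python
import time
import statistics

def workload():
    # Placeholder for a model forward pass on a GPU.
    return sum(i * i for i in range(10_000))

def benchmark(fn, runs=50):
    """Time repeated runs and report mean and p95 latency in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    # p95: the latency 95% of runs stayed under (nearest-rank on sorted data).
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {"mean_ms": statistics.mean(latencies), "p95_ms": p95}

print(benchmark(workload))
```

Reporting tail latency (p95) alongside the mean matters for benchmarks: elasticity claims are only meaningful if performance holds up in the slowest runs, not just on average.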