
OpenGPU Network: A True Decentralized Computing Ecosystem

by OpenGPU Research Team


Abstract

The OpenGPU Network aims to establish a true decentralized physical infrastructure network (DePIN) scheme and an ecosystem on top of it for the efficient distribution of computation-heavy tasks, such as AI training and inference, across a global network of peers.

This whitepaper introduces the incentives, architecture, workflow, and security measures that enable an accessible and scalable ecosystem for decentralized computing. By leveraging blockchain technology, OpenGPU presents an open and collaborative environment where individuals, small enterprises, and large organizations can contribute and utilize computational resources without relying on centralized procedures or entities.

1. Introduction

The rapid growth of AI and machine learning has outpaced the capabilities of traditional centralized infrastructures in terms of accessibility and efficiency. AI applications, especially those involving deep learning, natural language processing, and generative models, require immense computational power.

According to recent industry reports, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud collectively control over 60% of the global cloud infrastructure market, highlighting the concentration of power in the hands of a few large entities (Synergy Research Group, 2024; Statista, 2024). The dominance of these centralized cloud systems has produced a market with limited competition that hinders innovation. Individuals and organizations needing GPU resources face significant challenges, such as budget limitations, scalability constraints, vendor lock-in, and susceptibility to single points of failure.

1.1. Current Landscape of AI Computation

As computational needs grow, infrastructure must scale accordingly, and the expertise and time required to manage such large systems become a significant barrier for artificial intelligence enterprises. Cloud services offer significant advantages over maintaining local infrastructure, particularly in terms of flexibility and process automation. However, this arrangement involves inherent trade-offs that may not be immediately apparent.

One major concern when using cloud services is vendor lock-in, where switching providers or migrating workloads becomes costly and complex due to proprietary services and data formats. According to Flexera's 2020 CIO Priorities Report, 68% of CIOs are concerned about vendor lock-in with cloud services (Flexera, 2020).

Building massive data centers to host cloud services is economically in the best interest of cloud providers. Capitalizing on their market dominance and the leverage that vendor lock-in gives them, they have the power to influence client decisions. They could, for example, charge more for serving from multiple locations or overcharge for their software services and computational capacity. This could push clients to concentrate their workloads in fewer locations, creating central points of failure, and would effectively increase the client's price per unit of computation, resulting in high costs when scaling.

Large cloud providers have the power to influence the decisions of AI initiatives, undermining freedom and innovation. By tightly integrating certain software within their platforms, these providers steer the market and limit diversity.

A pressing challenge of the emerging era of AI computing is maximizing the utilization of available hardware, both in individual devices and in data centers. Despite the accelerating production of processing units, especially GPUs, many devices owned by individuals sit idle most of the time. Similarly, data centers often operate below optimal capacity.

Leading suppliers of GPUs used in AI computations have increasingly focused on selling their most advanced chips to hyperscale data centers and research institutions. This preferential access effectively sidelines smaller players, creating an uneven playing field in the AI landscape (Wall Street Journal, 2024).

1.2. Decentralization Efforts

The current range of decentralized computing solutions uses blockchain technology as a broker for peer-to-peer transactions. In these systems, clients who need computing power connect with hardware providers through a blockchain. The blockchain manages transactions and contracts, allowing clients to find and rent the needed resources directly. Payments are typically made using cryptocurrencies, which provide a secure and automated way to complete the transactions.

We acknowledge that these attempts have been well received and pave the way for further innovation in decentralized computing; nonetheless, they remain immature. We believe that a true Decentralized Physical Infrastructure Network (DePIN) should go further and decentralize the entire end-to-end process. Only then can the potential of a free, competitive, and equitable distribution of computing resources be unlocked to power a democratized AI revolution.

2. The OpenGPU Network

The OpenGPU Network is an ecosystem designed to establish a fully democratized process for executing computational tasks across a network of diverse providers. It operates on a blockchain-based, permissionless, and trustless model with a low barrier to entry for all peers. By creating a free, competitive, real-time market whose capacity can keep growing as new providers join, it aims to address the shortcomings of the current industry described above.

Clients of the network propose tasks in a predetermined format specifying the workload, conditions, and payment function. Providers, connected to the network via an accessible software suite, evaluate the feasibility and profitability of tasks and submit bids in real time. Once an agreement is reached, tasks are executed either atomically or collaboratively.

2.1. Preliminary Concepts

The ecosystem comprises three types of actors:

Clients are peers in need of computational resources to run their tasks, such as an AI company deploying a language model to serve its customers.
Providers are entities with computational resources to offer for profit. These could range from individuals with spare GPU capacity to fully operational data centers.
Users subscribe to the services provided by clients, such as the customers of the aforementioned AI company.

The tasks are categorized into two types:

Batch-tasks have quantitative limits or targets and are intended for a fixed set of providers. Training an AI model with given settings for a set amount of time is a good example of a batch-task.
Stream-tasks are flexible and dynamic. Once a group of providers has accepted their terms a priori, the client submits stream-tasks in real time and on demand. A good example is a client's large language model operating under a pay-per-response agreement.

For a given task:

Executor is the provider or the set of providers selected to execute the task.
Council is the group of providers assigned to validate the task through the computation consensus.

2.2. Architectural Overview

At the foundation of the OpenGPU Network is a proof-of-stake blockchain serving as a distributed ledger and a decentralized state machine. The network builds on Ethereum, which was first described by Vitalik Buterin (2014) and initially used a proof-of-work consensus mechanism. Ethereum has since transitioned to proof-of-stake, an upgrade originally referred to as Ethereum 2.0 (Consensys, 2020).

The blockchain component enables actors to submit and trace tasks in a trustless and transparent manner. Providers act as the blockchain's block validators, ensuring the secure and decentralized execution of transactions and storage of public information.

Providers execute client tasks and collaboratively form a consensus to verify the honesty and integrity of their responses. This is achieved in the computation layer, implemented as a computation consensus system on top of the proof-of-stake mechanism. Verification records are stored publicly on the blockchain, so that all network peers are able to query them.

Figure 1: Solid arrows indicate providers maintaining the blockchain and computation layer, while dashed arrows represent interaction between entities.
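As an illustration of how a council might verify a response in the computation layer, the following Python sketch applies the client's equivalence function e(r, r') to each council member's result. It accepts the executor's response when a quorum agrees. Several details are assumptions made for this example and are not specified by the whitepaper: the 2/3 quorum, JSON-serializable responses, and the idea that only a SHA-256 digest of the response goes into the public verification record.

```python
import hashlib
import json
from fractions import Fraction

def verify(executor_response, council_responses, equivalent, quorum=Fraction(2, 3)):
    """Check the executor's response against the council's using e(r, r').

    The response is accepted when at least `quorum` of the council members
    produced an equivalent result. The quorum value is an assumption.
    """
    agree = sum(1 for r in council_responses if equivalent(executor_response, r))
    accepted = len(council_responses) > 0 and agree >= quorum * len(council_responses)
    # Hypothetical verification record: only a digest of the response is kept,
    # so the record stays small enough to publish on the blockchain.
    digest = hashlib.sha256(
        json.dumps(executor_response, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {
        "response_hash": digest,
        "agreeing": agree,
        "council_size": len(council_responses),
        "accepted": accepted,
    }

# Example: numeric outputs count as equivalent within a tolerance of 0.01.
close_enough = lambda a, b: abs(a - b) <= 0.01
print(verify(1.0, [1.0, 1.005, 2.0], close_enough)["accepted"])  # True
```

Using `Fraction` keeps the quorum comparison exact, so a council of three with two agreeing members meets a 2/3 quorum without floating-point rounding surprises.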

2.3. Task Workflows

A task prescription is denoted as:

T = {D, M, I, f(I'), e(r, r'), H}

Where D is the dataset related to the task, M is the task model, I is the set of instructions specifying the requirements and quantitative targets of execution, and f(I') is a function that calculates the amount the client is willing to pay given the instructions I' that are fulfilled. e(r, r') is a function with a binary codomain that determines whether two responses r and r' are equivalent. H is the set of conditions for halting or finalizing the process.
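One way to make the prescription concrete is a typed record with one field per element of T. The Python sketch below is hypothetical. The field types, the per-hour payment rule, and the placeholder dataset and model identifiers are assumptions chosen for illustration and are not part of the protocol.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

@dataclass(frozen=True)
class TaskPrescription:
    """Hypothetical encoding of T = {D, M, I, f(I'), e(r, r'), H}."""
    dataset: str                                     # D: e.g. a URI or content hash (assumed)
    model: str                                       # M: e.g. a model identifier (assumed)
    instructions: Mapping[str, float]                # I: requirement -> quantitative target
    payment: Callable[[Mapping[str, float]], float]  # f(I'): payment for fulfilled instructions I'
    equivalent: Callable[[Any, Any], bool]           # e(r, r'): binary equivalence test
    halting: Callable[[Mapping[str, Any]], bool]     # H: True once the process should halt

def pay_per_hour(fulfilled):
    """Example f(I'): 0.5 oGPU per completed training hour, capped at 10 hours."""
    return 0.5 * min(fulfilled.get("train_hours", 0.0), 10.0)

task = TaskPrescription(
    dataset="ipfs://example-dataset",   # hypothetical placeholder
    model="example-model-v1",           # hypothetical placeholder
    instructions={"train_hours": 10.0},
    payment=pay_per_hour,
    equivalent=lambda r1, r2: abs(r1 - r2) < 1e-3,   # numeric outputs within tolerance
    halting=lambda state: state.get("bids", 0) >= 3,  # halt after three bids (assumed)
)
print(task.payment({"train_hours": 4.0}))  # 2.0
```

Making f, e and H callables keeps the prescription self-describing. Any peer can evaluate the payment, equivalence, and halting rules exactly as the client defined them.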

Batch-task workflow:

Client submits a task prescription via the protocol onto the blockchain.
Available providers read the submission event and evaluate the prescription. A provider p places a bid if the execution is feasible and profitable and no better bid has already been placed by another provider.
When the halting conditions are met, the protocol announces the executor and the council.
The executor executes the prescribed task while the council validates the work at checkpoints designated by the protocol.
The final response to the batch-task is sent to the client and the process is finalized upon mutual confirmation.
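The bidding and announcement steps above can be simulated in a few lines of Python. The sketch relies on several assumptions for illustration, which the whitepaper leaves to the protocol:

- The halting condition is a maximum number of bids.
- Each provider bids its own cost plus a fixed margin.
- The lowest bid becomes the executor, and the next-lowest bidders form the council.
- The `Provider` fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provider:
    name: str
    capacity: float  # hypothetical: compute units the provider can dedicate
    cost: float      # hypothetical: provider's own cost of running the task

def place_bids(providers, required_capacity, max_payment, max_bids, margin=0.1):
    """Providers bid in arrival order until the (assumed) halting condition is met."""
    bids = []  # (price, provider name)
    for p in providers:
        if len(bids) >= max_bids:  # halting condition H: enough bids collected
            break
        price = p.cost * (1 + margin)
        feasible = p.capacity >= required_capacity
        profitable = price <= max_payment
        best = min((b for b, _ in bids), default=float("inf"))
        if feasible and profitable and price < best:  # bid only if it beats the best bid
            bids.append((price, p.name))
    return bids

def announce(bids, council_size):
    """Lowest bid wins execution; the next-lowest bidders form the council (assumed)."""
    if not bids:
        raise ValueError("no bids were placed")
    ranked = sorted(bids)
    executor = ranked[0][1]
    council = [name for _, name in ranked[1:1 + council_size]]
    return executor, council

providers = [
    Provider("a", capacity=8, cost=10.0),
    Provider("b", capacity=4, cost=5.0),   # not enough capacity: no bid
    Provider("c", capacity=16, cost=8.0),
    Provider("d", capacity=16, cost=6.0),
    Provider("e", capacity=16, cost=7.0),  # arrives after halting: no bid
]
bids = place_bids(providers, required_capacity=8, max_payment=12.0, max_bids=3)
print(announce(bids, council_size=2))  # ('d', ['c', 'a'])
```

In this toy rule, the council consists of the providers that were outbid. They have already evaluated the prescription, which makes them natural validators at the checkpoints.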

Stream-task response methods:

Optimistic response: The first submitted response is sufficient to mark the task as complete. Later responses from the other council members validate the first response asynchronously via consensus.
Weighted response: A weight w is assigned to the subscription. The task is complete once the responses submitted by the executors reach the threshold set by w. If the responses differ, the most common one is sent.
Unanimous response: A task is complete only after all the providers in the council submit responses and a consensus is reached.

Figure 2: (a) Optimistic response (b) Weighted response (c) Unanimous response. Solid lines represent validation occurring before the response is sent, while dashed lines represent validation occurring after.
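The three response methods can be sketched as completion rules over the list of responses received so far, in submission order. The whitepaper does not pin down the following details, so this Python version assumes them:

- Responses are hashable, and equivalence e(r, r') is plain equality.
- w is the fraction of the council whose responses are required.
- "Consensus" in the unanimous method means that all responses agree.

Each function returns the response to send, or `None` while the task is not yet complete.

```python
from collections import Counter

def optimistic(responses):
    """Send the first response at once; council validation follows asynchronously."""
    return responses[0] if responses else None

def weighted(responses, council_size, w):
    """Complete once a fraction w of the council has responded; send the most common response."""
    if not responses or len(responses) < w * council_size:
        return None  # threshold not reached yet
    # most_common breaks ties in favor of the response encountered first
    return Counter(responses).most_common(1)[0][0]

def unanimous(responses, council_size):
    """Complete only when every council member has responded and all responses agree."""
    if len(responses) < council_size or len(set(responses)) != 1:
        return None
    return responses[0]

print(optimistic(["a", "b"]))               # a
print(weighted(["x", "y", "x"], 4, w=0.75))  # x
print(unanimous(["z", "z", "q"], 3))         # None
```

The methods trade latency for assurance. The optimistic method answers after a single response, while the unanimous method waits for the slowest council member.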

Stream-task workflow:

Client submits a task prescription via the protocol onto the blockchain.
Available providers read the submission event and evaluate the prescription. If the execution is feasible and potentially profitable, the provider sets up the given model and subscribes to the stream.
Once enough providers have subscribed to form a council, streaming starts with the preferred response method. The stream is interrupted if:
- The number of executors drops below the minimum c due to idle or unsubscribed providers.
- Consensus fails critically.
- The client retracts.
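The interruption conditions amount to a check the protocol could run after every round of the stream. The Python sketch below makes two assumptions. It detects idle executors using a heartbeat timestamp for each subscribed provider, and it treats a critical consensus failure as a run of consecutive failed rounds. The timeout and the failure limit are hypothetical parameters, and c is the minimum number of executors named in the workflow.

```python
from enum import Enum

class StopReason(Enum):
    CLIENT_RETRACTED = "client retracted"
    TOO_FEW_EXECUTORS = "number of executors dropped below c"
    CONSENSUS_FAILURE = "consensus failed critically"

def active_executors(last_seen, unsubscribed, now, idle_timeout):
    """Executors still subscribed whose last heartbeat is recent enough (assumed mechanism)."""
    return {p for p, t in last_seen.items()
            if p not in unsubscribed and now - t <= idle_timeout}

def check_interrupt(n_active, c, failed_rounds, max_failed_rounds, retracted):
    """Return the reason the stream must stop, or None to keep streaming."""
    if retracted:
        return StopReason.CLIENT_RETRACTED
    if n_active < c:
        return StopReason.TOO_FEW_EXECUTORS
    if failed_rounds >= max_failed_rounds:  # assumed definition of "critical"
        return StopReason.CONSENSUS_FAILURE
    return None

last_seen = {"p1": 100.0, "p2": 95.0, "p3": 40.0}  # provider -> last heartbeat (seconds)
active = active_executors(last_seen, unsubscribed={"p2"}, now=100.0, idle_timeout=30.0)
print(sorted(active))  # ['p1']
print(check_interrupt(len(active), c=2, failed_rounds=0, max_failed_rounds=3, retracted=False))
# prints StopReason.TOO_FEW_EXECUTORS
```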

3. Implementation

3.1. The Blockchain

At the core of the OpenGPU Network lies a blockchain infrastructure. Among the available options, Ethereum stands out as the standard-setter and the most widely adopted blockchain for decentralized applications. Its proven technology, modular architecture, and mature ecosystem make it a natural choice.

Ethereum serves as the decentralized ledger and state machine, providing the foundation for task submission, assignment, and verification. By leveraging Ethereum's proof-of-stake consensus mechanism, the network ensures both scalability and energy efficiency.

The native currency of the blockchain, $oGPU, is integral to the network's operation. It is used to pay gas fees for blockchain transactions and serves as the default currency for task payments.

3.2. The Computation Layer

The computation layer is implemented as a system atop the Ethereum stack, serving as an intermediary between the blockchain and task execution workflows. This layer handles task-specific computation requirements, including consensus validation for batch-tasks and stream-tasks.

The integrity and reliability of the task workflows are underpinned by crypto-economic incentives:

Staking: Providers and council members must stake $oGPU to participate in task execution and validation.
Penalties: Malicious actors attempting to submit fraudulent responses face penalties, including slashing of their stakes.
Rewards: Honest execution and validation are rewarded in proportion to the client-supplied payment functions.
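The three incentive rules can be illustrated with a toy in-memory ledger. The following are all assumptions in this Python sketch: the minimum stake, the slashed fraction, and splitting the payment f(I') in proportion to each honest participant's contribution. On the OpenGPU Network this logic would presumably live in the protocol's smart contracts rather than in off-chain code.

```python
class StakeLedger:
    """Toy model of staking, slashing and proportional rewards (not the actual contracts)."""

    def __init__(self, min_stake, slash_fraction):
        self.min_stake = min_stake            # hypothetical participation threshold in $oGPU
        self.slash_fraction = slash_fraction  # hypothetical share of stake lost on fraud
        self.stakes = {}
        self.balances = {}

    def stake(self, provider, amount):
        self.stakes[provider] = self.stakes.get(provider, 0) + amount

    def can_participate(self, provider):
        """Only providers with enough stake may execute or validate tasks."""
        return self.stakes.get(provider, 0) >= self.min_stake

    def slash(self, provider):
        """Penalize a provider caught submitting a fraudulent response."""
        penalty = self.stakes.get(provider, 0) * self.slash_fraction
        self.stakes[provider] = self.stakes.get(provider, 0) - penalty
        return penalty

    def reward(self, payment, contributions):
        """Split the client's payment among honest participants by contribution weight."""
        total = sum(contributions.values())
        for provider, weight in contributions.items():
            share = payment * weight / total
            self.balances[provider] = self.balances.get(provider, 0) + share

ledger = StakeLedger(min_stake=100, slash_fraction=0.5)
ledger.stake("executor", 150)
ledger.stake("validator", 100)
ledger.stake("cheater", 120)
print(ledger.slash("cheater"), ledger.can_participate("cheater"))  # 60.0 False
ledger.reward(90, {"executor": 2, "validator": 1})
print(ledger.balances)  # {'executor': 60.0, 'validator': 30.0}
```

Slashing can push a provider below the minimum stake. This removes it from future tasks until it stakes again, which is what makes fraud costly beyond the immediate penalty.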

3.3. The Protocol Suite

The core protocol serves as the intermediary through which clients and providers interact with the OpenGPU Network. It is implemented as a decentralized application suite, comprising:

A client application, enabling clients to define task prescriptions, specify requirements, and monitor task progress.
A provider application, allowing providers to register their resources, evaluate task prescriptions, and participate in the bidding process.
A system of smart contracts that serve as the backbone of the protocol by automating task submissions, bidding processes, payment distributions, and dispute resolutions.

The governance of the protocol should ideally be managed by a decentralized autonomous organization (DAO) as the ecosystem matures.

4. Domain of AI Agents

Within the OpenGPU Network, AI agents bridge the gap between the intricate workings of the decentralized ecosystem and its human peers. These autonomous, intelligent digital entities actively assist clients in defining task prescriptions, evaluating feasibility, and optimizing requirements. They can predict task costs based on market conditions, recommend adjustments to maximize value, and ensure security by aligning task configurations with network standards.

For providers, AI agents manage bidding strategies and assess profitability based on real-time market conditions and the computational costs of tasks. They ensure adherence to agreed terms and maintain high levels of reliability.
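As a hedged example of the pricing logic a provider's agent might apply, the Python sketch below starts from the median of recent winning bids and undercuts it slightly. It never bids below cost plus a minimum margin, and it declines tasks it cannot serve within the client's maximum payment. The heuristic, its 2% undercut, and its parameters are hypothetical; the whitepaper does not prescribe a bidding strategy.

```python
from statistics import median

def suggest_bid(compute_hours, cost_per_hour, recent_winning_bids, max_payment,
                min_margin=0.05, undercut=0.02):
    """Return a bid price, or None if the task is not profitable within max_payment."""
    cost = compute_hours * cost_per_hour
    floor = cost * (1 + min_margin)  # never bid below cost plus margin
    if recent_winning_bids:
        target = median(recent_winning_bids) * (1 - undercut)  # slightly beat the market
    else:
        target = max_payment  # no market data: ask the maximum
    price = max(floor, min(target, max_payment))
    return price if price <= max_payment else None

print(suggest_bid(10, 0.5, [6.0, 7.0, 8.0], max_payment=10.0))  # about 6.86
print(suggest_bid(10, 2.0, [6.0, 7.0, 8.0], max_payment=10.0))  # None
```

The median is less sensitive to a single outlier bid than the mean, which matters when only a handful of recent auctions are observed.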

AI agents could also play an active role in the overall security of the ecosystem by focusing on the safety of tasks. Tasks submitted by clients or responses from providers may, intentionally or not, contain harmful content that poses risks to other peers. While crypto-economic security is the main protection against bad actors, AI agents act as vigilant overseers, detecting anomalous activities.

As the OpenGPU Network evolves, AI agents will presumably become integral to its decentralized infrastructure, while themselves being powered by the network.

5. Conclusion

The OpenGPU Network is a transformative step towards the true decentralization of computing power at a global scale. It supersedes previous decentralization approaches by integrating the complete workflow onto the blockchain infrastructure while maintaining a very low barrier to entry for peers. The realization and adoption of the OpenGPU Network will undoubtedly accelerate the coming AI revolution.

References

Statista, 2024, "Cloud computing - statistics & facts" by Lionel Sujay Vailshery
Synergy Research Group, 2024, "Cloud market growth stays strong in Q2 while Amazon, Google, and Oracle nudge higher"
Flexera, 2020, "The Flexera 2020 CIO Priorities Report"
New York Times, 2019, "Prime Leverage: How Amazon Wields Power in the Technology World"
Uptime Institute, 2024, "Global Data Center Survey Results"
Wall Street Journal (WSJ), 2024, "AI's Future and Nvidia's Fortunes Ride on the Race to Pack More Chips Into One Place"
Vitalik Buterin, 2014, "Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform"
Consensys, 2020, "What is Proof of Stake?" by Everett Muzzy