Debates around blockchain scalability solutions often revolve around throughput, finality, transaction fees, and security (part of the blockchain trilemma). But there’s a deeper issue underlying attempts at scaling blockchains: data availability.
Data availability in blockchains is a guarantee that data for newly proposed blocks is accessible to all participants on the network. Data availability is crucial for secure and sustainable operations: if leader nodes (i.e., block producers) advance the blockchain’s state without publishing transaction data, other nodes cannot verify the validity of transactions nor the correctness of state transitions.
Data availability also matters for liveness: withholding transaction data prevents others from interpreting the state of the blockchain (e.g., checking account balances) or interacting with it (e.g., adding new blocks). When that happens, the chain can no longer make progress for its users, and liveness is lost.
Solutions to the data availability problem include forcing nodes to publish transaction data on the network when proposing blocks, and storing transaction data redundantly across multiple nodes without compromising bandwidth.
The first ensures safety since it eliminates the possibility of hiding malicious transactions in blocks, while the second ensures liveness: data required to reconstruct the chain’s state is guaranteed to be available from multiple peers. But storing blockchain data this way places limits on scalability due to bandwidth. Because all nodes must download and process the same transaction data, the total work the network spends on each transaction grows linearly with the number of blockchain nodes.
Scaling blockchains, therefore, requires managing data availability efficiently without compromising scalability and security. If anything, the optimality of any scaling solution depends largely on its approach to data availability. Data availability committees (DACs) are one solution to the data availability problem being adopted by protocol developers.
Infura is already part of such committees: it joined the StarkEx DAC in June 2020 and, most recently, the Arbitrum Nova DAC this year. In the Arbitrum Nova DAC, Infura is joined by leading organizations from Web2 and Web3, including Offchain Labs, Reddit, Google Cloud, FTX, P2P, and QuickNode. The selection of Infura for these DACs demonstrates our long-standing role as a leading and trusted provider of high-availability infrastructure and reliable access to blockchain data.
In this article, we’ll take a deep dive into how DACs work, why they are important scalability solutions, and what the future of DACs could hold.
What is a data availability committee (DAC)?
A data availability committee (DAC) is a set of nodes charged with storing copies of off-chain data and making it available on request. DACs feature in scaling solutions that increase throughput on a blockchain by processing transactions on a separate layer (i.e., off-chain scaling).
While off-chain scaling solutions are easier to implement, they must confront the data availability problem. Without access to transactions processed off-chain, the main chain cannot guarantee the safety and liveness of the scaling protocol.
Some scaling solutions ("layer 2s") work around this issue by posting transaction data on the base layer, albeit in compressed form to avoid reintroducing scalability issues described earlier. However, this approach reduces gains in scalability since throughput on the L2 chain is limited by the L1 chain's data processing capacity.
A data availability committee (DAC) is an alternative to on-chain data storage. Instead of publishing transaction data on the base layer, block producers send blocks to the DAC for storage off-chain. This reduces data posted on the underlying blockchain—increasing scalability—and decentralizes data storage to improve data availability guarantees.
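To make the split between on-chain and off-chain data concrete, here is a minimal sketch. It assumes a hypothetical in-memory `Committee` class and uses a plain SHA-256 digest as the on-chain commitment; real DACs use their own storage backends and commitment schemes.

```python
import hashlib


class Committee:
    """Hypothetical off-chain store that keeps batches keyed by their hash."""

    def __init__(self):
        self._store = {}

    def store(self, batch: bytes) -> str:
        digest = hashlib.sha256(batch).hexdigest()
        self._store[digest] = batch
        return digest

    def fetch(self, digest: str) -> bytes:
        return self._store[digest]


def matches_commitment(batch: bytes, onchain_commitment: str) -> bool:
    """Check that data served off-chain matches what was committed on-chain."""
    return hashlib.sha256(batch).hexdigest() == onchain_commitment


dac = Committee()
batch = b"alice->bob:5;bob->carol:2"
commitment = dac.store(batch)  # only this 32-byte digest would go on-chain
print(matches_commitment(dac.fetch(commitment), commitment))
```

The base layer only ever sees the fixed-size commitment, while anyone who later obtains the batch from the committee can check it against that commitment.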
Scaling solutions approach data availability in various ways. (source)
How data availability committees work
Data availability committees are either trusted or trustless. In a trusted DAC scheme, data availability managers are known beforehand and appointed to their roles. Such DACs comprise entities with real-world reputations, so committee members can be held accountable by the community for their actions.
A trustless DAC is permissionless—nodes can join the protocol and participate in off-chain data storage without needing approval from an authority. Real-world identities of nodes are hidden (pseudonymity), so social accountability cannot be used to enforce honest behavior.
Trustless DACs, however, leverage other mechanisms for ensuring security, namely cryptoeconomic incentives. Here, data availability managers are required to provide a bond as a guarantee of honest behavior, which can be slashed (destroyed) as punishment for malicious behavior.
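A bond-and-slash mechanism can be sketched in a few lines. The `Member` type, the bond units, and the 50% penalty below are illustrative assumptions, not parameters of any specific protocol.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    bond: int  # staked amount, in arbitrary units (assumption)


def slash(member: Member, fraction: float) -> int:
    """Destroy a fraction of the member's bond and return the amount burned."""
    burned = int(member.bond * fraction)
    member.bond -= burned
    return burned


node = Member("node-1", bond=1_000)
burned = slash(node, 0.5)  # hypothetical penalty for withholding data
print(burned, node.bond)
```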
A trustless DAC is considered the ideal data availability solution in the blockchain community. But decentralized DACs have one fundamental issue that remains unsolved: the Fisherman’s Problem. In data availability literature, the Fisherman’s Problem is used to illustrate issues that appear in interactions between clients requesting data and nodes storing data in a trustless DAC protocol.
Below is a brief description of the Fisherman’s Problem:
Imagine a client requesting data from a node finds out parts of a block are unavailable and alerts other peers on the network. The node, however, releases the data afterwards so that other nodes find that the block's data is available upon inspection. This creates a dilemma: Was the node deliberately withholding data, or was the client raising a false alarm?
Data unavailability is not a uniquely attributable fault. (source)
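The dilemma can be illustrated with a toy model of what a third-party peer is able to observe. The `Observation` type and `what_peers_see` function are hypothetical names for this sketch. It assumes, as in the description above, that the node has released the data by the time other peers inspect it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    alarm_raised: bool
    data_available_on_inspection: bool


def what_peers_see(node_withheld: bool, client_lied: bool) -> Observation:
    """Return what a third-party peer observes after an alarm.

    In both histories the data is served by the time peers inspect it,
    so the peer sees only an alarm plus an available block.
    """
    alarm = node_withheld or client_lied
    return Observation(alarm_raised=alarm, data_available_on_inspection=True)


honest_client_case = what_peers_see(node_withheld=True, client_lied=False)
false_alarm_case = what_peers_see(node_withheld=False, client_lied=True)
print(honest_client_case == false_alarm_case)
```

Because both histories produce the same observation, peers have no evidence on which to decide whether the node or the client should be penalized.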
The Fisherman’s Problem has implications for the security of any DAC protocol under different scenarios:
1. A client (i.e., the fisherman) receives a reward for detecting a node withholding data, even if the latter releases the data immediately to other nodes: This could lead to collusion between nodes and clients to earn money from false slashings (undermining the DAC’s economic security in the process).
2. A client receives no reward for detecting unavailable blocks: This could lead to low-cost denial-of-service (DoS) attacks where nodes are forced to download blocks to check if a data unavailability claim checks out. This is especially the case if clients don’t incur any costs for initiating challenges relating to the availability of a block.
3. A client incurs a cost (or "negative reward") for disputing unavailable blocks: This would discourage clients from detecting data unavailability, giving malicious nodes the freedom to perform data withholding attacks.
The Fisherman’s Problem is one reason creating a secure and efficient decentralized data availability layer is still an open research problem (being worked on by projects like EigenLayr, Celestia, and Polygon Avail). In contrast, there are multiple examples of permissioned DAC protocols already in use today.
How do permissioned and permissionless DACs compare?
A permissioned DAC operates with a fixed set of trusted participants, making it smaller in size and easier to coordinate. This type of DAC is also simpler to design and implement and incurs lower operating costs.
However, using a trusted DAC forces users to trust the honesty of its members (although the strength of the honesty assumption can vary). A corrupt DAC can perform data withholding attacks on its own or act in collusion with block producers. This leads to safety and liveness issues, as explored in the following hypothetical scenarios:
1. Safety issues: An optimistic rollup sequencer executes a transaction to steal funds and bribes the DAC to ignore requests from users for transaction data. Without access to data to independently reconstruct the L2 state, users cannot create fraud proofs and challenge invalid state updates.
2. Liveness issues: A malicious DAC withholds data for creating Merkle proofs (needed to prove ownership of funds) from users of a validium chain, thereby freezing withdrawals. Users are forced to pay a ransom before withdrawing funds back to the main chain.
Even if members of a DAC are honest, liveness may still be at risk. As permissioned DACs store data with a limited set of nodes, the ability to tolerate faults (e.g., nodes being unable to provide data due to system crashes) is low. Thus, there’s always the possibility of users never getting access to critical state data.
At first glance, permissionless DACs solve most of the problems described above. In a permissionless DAC, participants have more incentive to act honestly, as stakes can be slashed if they attempt to subvert the protocol by withholding data. Off-chain data is also replicated across a larger set of nodes, reducing the likelihood of data unavailability even if some nodes are faulty on purpose or by accident.
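The effect of replication on availability can be estimated with a simple model. This sketch assumes every replica holds a full copy and that nodes fail independently with the same probability, which is a simplification (correlated outages and collusion break it).

```python
import math


def p_unavailable(p_node_down: float, replicas: int) -> float:
    """Probability that every replica is down at once (independence assumed)."""
    return p_node_down ** replicas


small_committee = p_unavailable(0.1, 3)   # hypothetical 3-member committee
large_network = p_unavailable(0.1, 30)    # hypothetical 30-node network
print(small_committee, large_network)
print(math.isclose(small_committee, 0.001))
```

Under these assumptions, each additional replica multiplies the chance of total unavailability by the per-node failure probability.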
But the decentralized nature of a permissionless DAC—one of its key benefits—can also be disadvantageous.
With participants free to join and leave as they wish, coordinating communication between nodes incurs significant overhead. Unlike a permissioned DAC, which operates with a fixed set of approved participants who know how to reach each other, nodes in a permissionless DAC are often unknown to each other (and untrusted), which complicates individual participation in data storage and retrieval.
Also, permissionless DACs often require complex mechanisms to ensure security since adversarial participants cannot be reliably distinguished from honest parties (as seen in the Fisherman’s Problem). This can significantly increase the costs of using such data availability solutions—a cost that is likely passed down to application users.
Why are data availability committees important for blockchain scalability?
Better security model
The existence of data availability committees makes it possible to create scaling solutions that store data off-chain without heavily reducing security guarantees. For example, validiums and optimistic chains are two off-chain solutions using DACs for managing data availability.
Optimistic chains are similar to optimistic rollups. For instance, an optimistic chain assumes transactions are valid until challenged (via fraud proofs), and has 1-of-N honesty assumptions (i.e., there’s always an honest node available to challenge transactions and advance the chain’s state). But optimistic chains store data off-chain, unlike optimistic rollups, which post transaction data on the L1 chain.
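The following sketch shows why a challenger needs the transaction data: a fraud check amounts to re-executing the batch and comparing the result with the claimed state root. The balance-map state, the transfer format, and the SHA-256-over-JSON "state root" are simplifying assumptions, not how any production chain encodes state.

```python
import hashlib
import json


def apply_tx(state: dict, tx: dict) -> dict:
    """Apply a simple transfer, ignoring it if the sender lacks funds."""
    new_state = dict(state)
    if new_state.get(tx["from"], 0) < tx["amount"]:
        return new_state
    new_state[tx["from"]] -= tx["amount"]
    new_state[tx["to"]] = new_state.get(tx["to"], 0) + tx["amount"]
    return new_state


def state_root(state: dict) -> str:
    """Toy commitment to the state (a real chain would use a Merkle root)."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def is_fraudulent(pre_state: dict, txs: list, claimed_root: str) -> bool:
    """Re-execute the batch and compare against the claimed post-state root."""
    state = pre_state
    for tx in txs:
        state = apply_tx(state, tx)
    return state_root(state) != claimed_root


pre = {"alice": 10, "bob": 0}
txs = [{"from": "alice", "to": "bob", "amount": 4}]
honest_root = state_root({"alice": 6, "bob": 4})
bogus_root = state_root({"alice": 0, "bob": 10})
print(is_fraudulent(pre, txs, honest_root), is_fraudulent(pre, txs, bogus_root))
```

Without `txs`, the challenger cannot run `is_fraudulent` at all, which is why withholding data undermines the 1-of-N assumption.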
Validiums are similar to zero-knowledge rollups. For example, validity proofs are generated for blocks and verified on the base layer, guaranteeing the correctness of off-chain computation. However, block producers on a validium chain store data off-chain instead of publishing it to the main chain as ZK-rollups do.
Storing transaction data off-chain means validiums and optimistic chains do not inherit security from the underlying blockchain. Nevertheless, relying on data availability committees—as most validiums and optimistic chains do today—provides better security guarantees than other scaling solutions (e.g., sidechains and plasma chains).
Reduction in fees
Rollups pay for publishing transaction data to L1, and this cost is typically borne by users of applications running on L2. Storing data with a DAC instead of putting it on the base layer minimizes on-chain footprint and reduces costs for application users.
This is why blockchains built with high-volume applications in mind find data availability committees appealing. Because such applications must process a considerable number of transactions, the cost of publishing their data to L1 as CALLDATA would be high. Off-chain data storage by a DAC solves this problem, allowing users to benefit from lower fees.
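For a rough sense of the cost being avoided, the sketch below estimates the gas charged for a payload posted as calldata on Ethereum, using the per-byte prices from EIP-2028 (16 gas per non-zero byte, 4 gas per zero byte). It ignores the base transaction cost and any compression, and the example payload is made up.

```python
ZERO_BYTE_GAS = 4       # per EIP-2028
NONZERO_BYTE_GAS = 16   # per EIP-2028


def calldata_gas(data: bytes) -> int:
    """Gas charged for the calldata bytes alone (no base transaction cost)."""
    zero_bytes = data.count(0)
    return ZERO_BYTE_GAS * zero_bytes + NONZERO_BYTE_GAS * (len(data) - zero_bytes)


batch = bytes(range(1, 101)) * 10  # hypothetical 1,000-byte batch, no zero bytes
print(calldata_gas(batch))
```

The cost grows with the size of every batch, whereas a DAC-based design posts only a small, fixed-size attestation.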
A real-world example is Arbitrum Nova, an optimistic chain designed for social and gaming applications. As these applications require sending many transactions, they’d be infeasible to use on Ethereum due to high gas fees.
Arbitrum Nova sequencers send transaction batches to a DAC off-chain and publish an attestation from DAC members (comprising signatures from a quorum) confirming the availability of data. The cost of publishing and verifying DA attestations on L1 is low and fixed, compared to the variable and sometimes expensive cost of posting rollup data to L1.
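A quorum attestation check can be sketched as follows. This is not the AnyTrust implementation: HMAC with per-member secrets stands in for real digital signatures, and the member names, keys, and quorum size are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical committee keys; HMAC stands in for real digital signatures.
KEYS = {f"member-{i}": f"secret-{i}".encode() for i in range(6)}


def sign(member: str, batch_hash: bytes) -> bytes:
    return hmac.new(KEYS[member], batch_hash, hashlib.sha256).digest()


def attestation_valid(batch_hash: bytes, sigs: dict, quorum: int) -> bool:
    """Accept the attestation only if enough known members signed this batch."""
    valid = sum(
        1
        for member, sig in sigs.items()
        if member in KEYS and hmac.compare_digest(sig, sign(member, batch_hash))
    )
    return valid >= quorum


batch_hash = hashlib.sha256(b"batch-42").digest()
sigs = {member: sign(member, batch_hash) for member in list(KEYS)[:5]}
print(attestation_valid(batch_hash, sigs, quorum=5))
```

The verifier's work depends on the number of signers rather than on the size of the batch, which is why the on-chain cost stays roughly fixed.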
Higher throughput
Rollups scale better than the base layer, but their throughput is still limited by the bandwidth of the base-layer blockchain. As compressed transactions from a rollup must compete with other L1 transactions for blockspace, the block size limit places an effective upper bound on how many transactions a rollup can process per second.
Rollup-like constructions with off-chain data storage such as validiums and optimistic chains aren’t constrained by throughput on L1. Thus, they can afford to increase block sizes and boost throughput.
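The upper bound on rollup throughput follows from simple arithmetic: the bytes of L1 blockspace available per block, divided by the bytes each compressed transaction needs and by the block time. All figures in this sketch are hypothetical placeholders, not measurements of any network.

```python
def max_tps(data_bytes_per_block: int, bytes_per_tx: int, block_time_s: float) -> float:
    """Upper bound on transactions per second when data must fit in L1 blocks."""
    return data_bytes_per_block / bytes_per_tx / block_time_s


# Hypothetical: 120 kB of usable blockspace, 100-byte transactions, 12 s blocks.
print(max_tps(120_000, 100, 12))
```

A chain that stores data with a DAC removes the `data_bytes_per_block` term from this bound, so its throughput is limited by other factors instead, such as execution speed.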
What are some examples of data availability committees?
Arbitrum Nova DAC
Arbitrum Nova, which we discussed earlier, is an Ethereum sidechain (optimistic chain) built to reduce gas fees and provide faster transactions. Arbitrum Nova is different from Arbitrum One (an optimistic rollup), as sequencers store blocks with a data availability committee instead of publishing them to Ethereum as CALLDATA.
The Arbitrum Nova DAC comprises reputable organizations, including Infura, FTX, Google Cloud, and Reddit. Nova is based on Arbitrum’s AnyTrust technology, which requires a quorum of the DAC to sign a data availability (DA) certificate posted on Ethereum along with the Merkle root of the transaction batch.
If the DAC is unavailable or unwilling to sign the DA attestation, the Nova chain falls back to rollup mode, where new transaction batches are accepted by the L1 contract only if the transaction data is available on L1. (This ensures a malicious DAC cannot freeze the chain at will by refusing to cooperate.)
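The fallback logic can be summarized in a short sketch. The function name, the return shape, and the use of a SHA-256 digest as the posted commitment are assumptions for illustration; they are not taken from the Nova contracts.

```python
import hashlib


def build_l1_submission(batch: bytes, signatures: int, quorum: int) -> dict:
    """Decide what gets posted to L1 for a batch.

    With a quorum of committee signatures, only a commitment is posted.
    Otherwise the full batch is posted, as a rollup would do.
    """
    if signatures >= quorum:
        return {"mode": "dac", "posted": hashlib.sha256(batch).digest()}
    return {"mode": "rollup", "posted": batch}


batch = b"tx-batch"
print(build_l1_submission(batch, signatures=5, quorum=5)["mode"])
print(build_l1_submission(batch, signatures=1, quorum=5)["mode"])
```

Either path leaves the data recoverable, so an uncooperative committee raises costs for the chain but cannot halt it.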
Starkware StarkEx DAC
StarkEx is a validium operated by Starkware (which also operates the StarkNet ZK-rollup). After processing transactions and generating a validity proof, the StarkEx operator sends transaction data to a permissioned DAC and obtains an attestation from committee members promising to make data available. This attestation functions as a proof of data availability and is verified along with the zero-knowledge proof on Ethereum. Note that StarkEx can operate either as a validium or as a ZK-rollup (as it does for dYdX). It can also operate as a Volition, which lets users decide which data availability method they want for each transaction.
As a validium, StarkEx doesn't face the risk of a malicious sequencer stealing user funds (validity proofs guarantee correctness of state transitions). But users still need access to transaction data when creating Merkle proofs to prove ownership of funds. This is particularly important when the sequencer goes rogue and users must use the escape-hatch mechanism.
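To see why users need the underlying data, consider a generic Merkle inclusion proof: building the proof requires the full set of leaves, while verifying it needs only the root. This sketch uses SHA-256 and a simple binary tree as assumptions; StarkEx uses its own tree layout and hash function.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_proof(leaves: list, index: int):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_merkle_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"alice:10", b"bob:4", b"carol:7"]  # hypothetical account balances
root, proof = merkle_proof(leaves, 2)
print(verify_merkle_proof(b"carol:7", proof, root))
```

If the committee withholds `leaves`, a user cannot compute `proof`, even though the root is publicly available on-chain.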
Celestia and Polygon Avail
Celestia and Polygon Avail are "modular blockchains" that focus solely on providing data availability to other blockchains (e.g., rollups). Transactions are sent from external block producers, such as rollup sequencers, to Celestia and Polygon Avail full nodes for storage. Light nodes on both chains don’t store blocks, but can reliably verify if blocks are available through data availability sampling.
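The intuition behind data availability sampling can be captured with a basic probability estimate. This sketch assumes each sample is drawn uniformly and independently, and that a fixed fraction of the block's chunks is being withheld. It does not model the erasure coding these protocols rely on, and the parameter values are hypothetical.

```python
def detection_confidence(withheld_fraction: float, samples: int) -> float:
    """Probability that at least one of `samples` random queries hits a withheld chunk."""
    return 1.0 - (1.0 - withheld_fraction) ** samples


for samples in (1, 10, 30):
    print(samples, detection_confidence(0.25, samples))
```

Confidence rises quickly with the number of samples, which is why a light node can gain strong assurance about availability without downloading the whole block.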
Celestia and Polygon Avail are essentially permissionless DACs. Anyone can run a full node and store data from other chains, but they must first provide a stake. This stake can be slashed if they act maliciously (e.g., storing unavailable blocks), providing cryptoeconomic guarantees of honesty.
Celestia and Polygon Avail exist as general-purpose data availability layers for multiple blockchains. This makes them different from permissioned DACs, which are often designed to serve a single protocol.
The future of data availability committees
As we have seen, data availability committees can bridge the security-vs-scalability tradeoff when designing new scaling solutions. But the most exciting future benefit of DACs is that they enable more flexibility in designing new blockchains.
For example, Celestia touts Celestiums—rollups that use Celestia for data availability and Ethereum for dispute arbitration. Instead of publishing CALLDATA to L1, the rollup sends a data availability attestation (the Merkle root of the L2 transaction batch signed by a quorum of Celestia validators) to an L1 bridge contract.
When a new L2 state root is published on L1, the rollup contract confirms that the data is available by verifying the DA attestation stored in the bridge contract. In the event of a disputed transaction, rollup users can request data from Celestia nodes to run fraud proofs on Mainnet. This means Celestium-style rollups can still inherit some security from L1 while further reducing costs and improving scale.
Proposed workflow for rollups using Celestia as a DA layer. (source)
Nevertheless, data availability committees—especially permissionless types—face roadblocks to adoption. Designing cryptoeconomic security mechanisms for DACs is a non-trivial task, and data availability sampling in its current form faces challenges.
How these problems are solved will eventually determine the place of DACs in the race to scale public blockchains and prepare Web3 for mass adoption. Infura is ready to take part in this journey and continue to make providing data availability a core goal of our service.