Hot DA on Cold Storage: Building cost-effective DA on FileCoin
Thanks to Yuki Yuminaga, Jonathan Victor, Matthew Frehlich, Connor Ohara, and Kobby Chen for great discussions and feedback. Thanks to Terry Chung and Wei Dai for their work on Rollups DA and Execution.
Data availability (DA) is a core technology in the scaling of Ethereum, allowing a node to efficiently verify that data is available to the network without having to host the data in question. This is essential for the efficient building of rollups and other forms of vertical scaling, allowing execution nodes to ensure that transaction data is available during the settlement period. It is also crucial for sharding, a planned future update to the Ethereum network, and for other forms of horizontal scaling, as nodes will need to prove that transaction data (or blobs) stored in network shards is indeed available to the network.
Several DA solutions have been discussed and released recently (e.g., Celestia, EigenDA, Avail), all with the intent of providing performant and secure infrastructure for applications to post DA.
The advantage of an external DA solution over an L1 such as Ethereum is that it provides an inexpensive and performant vehicle for on-chain data. DA solutions often consist of their own public chains built to enable cheap and permissionless storage. Even with modifications, the fact remains that hosting data natively from a blockchain is extremely inefficient.
Thus, we find it intuitive to explore a storage-optimized solution such as FileCoin as the basis of a DA layer. FileCoin uses its blockchain to coordinate storage deals between clients and storage providers but allows data to be stored off-chain.
In this post, we investigate the viability of a DA solution built on top of a Decentralized Storage Network (DSN). We consider FileCoin specifically, as it is the most adopted DSN to date. We outline the opportunities that such a solution would offer, and the challenges that need to be overcome to build it.
A DA layer provides the following to services relying on it:
- Client Safety: No node can be convinced that unavailable data is available.
- Global Safety: The un/availability of data is agreed upon by all except at most a small minority of nodes.
- Efficient data retrievability.
All of this needs to be done efficiently to enable scaling. A DA layer provides higher performance at a lower cost across the three points above. For example, any node can request a full copy of the data to prove custody, but this is inefficient. By having a system that provides all three of these, we achieve a DA layer that provides the security required for L2s to coordinate with an L1, along with stronger lower bounds in the presence of a malicious majority.
Custody of Data
Data posted to a DA solution has a useful lifetime: long enough to settle disputes or verify a state transition. Transaction data needs to be available only long enough to verify a correct state transition or to give validators enough opportunity to construct fraud proofs. As of writing, Ethereum calldata is the most common solution used by projects (rollups) requiring data availability.
Efficient Verification of Data
Data Availability Sampling (DAS) is the standard method of answering the question of DA. It comes with additional security benefits, strengthening network actors’ ability to verify state information from their peers. However, it relies on nodes to perform sampling: DAS requests must be answered to ensure mined transactions won’t be rejected, but a node that requests samples earns no reward for doing so and faces no penalty for skipping DAS. As an example, Celestia provides the first and only light client implementation to perform DAS, delivering stronger security assumptions to users and reducing the cost of data verification.
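To make the sampling argument concrete, the sketch below estimates how likely it is that withheld data goes unnoticed after a number of successful samples. It is a simplified model, not a description of any DAS implementation: it assumes samples are drawn uniformly and independently, and that an adversary withholds a fixed fraction of chunks. The function names and parameters are hypothetical.

```python
def undetected_probability(withheld_fraction: float, samples: int) -> float:
    """Chance that every sample lands on an available chunk.

    Assumes independent, uniform sampling (an illustrative simplification).
    """
    if not 0.0 <= withheld_fraction <= 1.0:
        raise ValueError("withheld_fraction must be in [0, 1]")
    if samples < 0:
        raise ValueError("samples must be non-negative")
    return (1.0 - withheld_fraction) ** samples


def samples_needed(withheld_fraction: float, target: float) -> int:
    """Smallest sample count that pushes the undetected probability to <= target."""
    if not 0.0 < withheld_fraction <= 1.0:
        raise ValueError("withheld_fraction must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    count, probability = 0, 1.0
    while probability > target:
        probability *= 1.0 - withheld_fraction
        count += 1
    return count
```

Under these assumptions, confidence grows exponentially with the number of samples, which is why a light node can check availability without downloading the full data.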
Efficient Access
A DA needs to provide efficient access to data to the projects using it. A slow DA may become the bottleneck for the services relying on it, causing inefficiencies at best and system failures at worst.
Decentralized Storage Network
A Decentralized Storage Network (DSN, as formalized in the FileCoin Whitepaper¹) is a permissionless network of storage providers that offer storage services for users of the network. Informally, it allows independent storage providers to coordinate storage deals with clients that need storage services, giving those clients cheap and resilient data storage. This is coordinated through a blockchain that records storage deals and enables the execution of smart contracts.
A DSN scheme is a tuple of three protocols: Put, Get, and Manage. This tuple comes with properties such as fault tolerance guarantees and participation incentives.
Put(data) → key
Clients execute Put to store data under a unique key. This is achieved by specifying the duration for which data will be stored on the network, the number of replicas of the data that are to be stored for redundancy, and a negotiated price with storage providers.
Get(key) → data
Clients execute Get to retrieve data that is being stored under a key.
Manage()
The Manage protocol is called by network participants to coordinate the storage space and services made available by providers and repair faults. In the case of FileCoin, this is managed via a blockchain. This blockchain records data deals being made between clients and data providers and proofs of correctly stored data to ensure that data deals are being upheld. Correctly stored data is proved via the posting of proofs generated by data providers in response to challenges from the network. A storage fault occurs when a storage provider fails to generate a Proof-of-Replication or Proof-of-Spacetime promptly when requested by the Manage protocol, which results in the slashing of the storage provider’s stake. Deals can self-heal in the case of a storage fault if more than one provider is hosting a copy of the data on the network by finding a new storage provider to honor the storage deal.
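The Put, Get, and Manage tuple described above can be sketched as a toy, in-memory model. This is only an illustration of the interface: it has no proofs, collateral, or blockchain, a SHA-256 digest stands in for the key, and the class and field names (ToyDSN, Deal) are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Deal:
    key: str
    duration_epochs: int
    replicas: int
    price: float
    providers: list = field(default_factory=list)


class ToyDSN:
    def __init__(self, provider_names):
        # provider name -> {key: data}; stands in for each provider's disk
        self.providers = {name: {} for name in provider_names}
        # key -> Deal; stands in for the on-chain deal record
        self.deals = {}

    def put(self, data: bytes, duration_epochs: int, replicas: int, price: float) -> str:
        if replicas > len(self.providers):
            raise ValueError("not enough providers for requested replicas")
        key = hashlib.sha256(data).hexdigest()
        chosen = sorted(self.providers)[:replicas]
        for name in chosen:
            self.providers[name][key] = data
        self.deals[key] = Deal(key, duration_epochs, replicas, price, chosen)
        return key

    def get(self, key: str) -> bytes:
        for name in self.deals[key].providers:
            if key in self.providers[name]:
                return self.providers[name][key]
        raise KeyError(key)

    def manage(self):
        """Audit deals, report (provider, key) faults, and repair from a healthy copy."""
        faults = []
        for key, deal in self.deals.items():
            healthy = [p for p in deal.providers if key in self.providers[p]]
            faults += [(p, key) for p in deal.providers if p not in healthy]
            spares = [p for p in sorted(self.providers) if p not in deal.providers]
            while healthy and len(healthy) < deal.replicas and spares:
                replacement = spares.pop(0)
                self.providers[replacement][key] = self.providers[healthy[0]][key]
                healthy.append(replacement)
            deal.providers = healthy
        return faults
```

In this model a deal self-heals only while at least one healthy replica remains, which mirrors the condition stated above for FileCoin deals.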
DSN Opportunities
The work done thus far in DA projects has been to transform a blockchain into a platform for hot storage. Since a DSN is storage-optimized, rather than transforming a blockchain into a storage platform, we can simply transform a storage platform into one that provides data availability. The collateral of storage providers in the form of native FIL token can provide crypto-economic security that guarantees data is stored. Finally, the programmability of storage deals can provide flexibility around the terms of data availability.
The most compelling motivation to transform the capabilities of a DSN to solve DA is the reduction in the cost of data storage under the DA solution. As we discuss below, storing data on FileCoin is significantly cheaper than storing data on Ethereum. Given current Ether/USD prices, it costs over 3 million USD to write 1 GB of calldata to Ethereum, only for it to be pruned after 21 days. This calldata expense can contribute to over half of the transaction cost of an Ethereum-based rollup. However, 1 GB of storage on FileCoin costs less than 0.0002 USD per month. Securing DA at this or any similar price would bring transaction costs down for users and contribute to the performance and scalability of Web3.
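The calldata figure can be reproduced approximately with a short calculation. The only protocol constant used is the 16 gas charged per non-zero calldata byte (EIP-2028). The gas price and Ether price below are illustrative assumptions picked to land near the figure quoted above, not values taken from this analysis, and the sketch ignores per-transaction base costs and block gas limits.

```python
GAS_PER_NONZERO_CALLDATA_BYTE = 16  # per EIP-2028


def calldata_cost_usd(num_bytes: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Approximate cost of posting num_bytes of non-zero calldata."""
    gas = num_bytes * GAS_PER_NONZERO_CALLDATA_BYTE
    eth = gas * gas_price_gwei * 1e-9  # 1 gwei = 1e-9 ETH
    return eth * eth_usd


# Hypothetical market conditions (assumptions, not measurements).
ASSUMED_GAS_PRICE_GWEI = 60
ASSUMED_ETH_USD = 3_300

one_gb_cost = calldata_cost_usd(10**9, ASSUMED_GAS_PRICE_GWEI, ASSUMED_ETH_USD)
print(f"1 GB of calldata: about {one_gb_cost:,.0f} USD")
```

Because the cost is linear in both gas price and Ether price, the result moves directly with market conditions.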
Economic Security
In FileCoin, collateral is required to make storage space available. This collateral is slashed when a provider fails to honor its deals or uphold network guarantees. A storage provider that fails to provide services faces losing both its posted collateral and any profit that would have been earned from providing storage.
Incentive Alignment
Many of FileCoin’s protocol incentives align with the goals of DA. FileCoin provides disincentives for malicious or lazy behavior: storage providers must actively provide proofs of storage during consensus in the form of Proof-of-Replication and Proof-of-Spacetime, continuously proving that the storage exists without honest majority assumptions. Failure of a storage provider to provide proof results in stake slashing and removal from consensus, among other penalties. Current DA solutions lack incentive for nodes to perform DAS, relying on ad-hoc altruistic behavior for proof of DA.
Programmability
The ability to customize data deals also makes a DSN an attractive platform for DA. Data deals can have varying durations, allowing users of a DSN-based DA to pay for only the DA that they need. Fault tolerance can also be tuned by setting the number of copies that are to be stored throughout the network. Further customization is supported via smart contracts on FileCoin (called Actors), which are executed on the FEVM. This leads to FileCoin’s growing ecosystem of DApps, from compute-over-storage solutions such as Bacalhau to DeFi and liquid staking solutions such as Glif. Retriev makes use of FileCoin Actors to provide incentive-aligned retrieval with permissioned referees. FileCoin’s programmability can be used to tailor DA requirements needed for different solutions, so that platforms that rely on DA are not paying for more DA than they need.
Challenges to a DSN-Based DA Architecture
In our investigation, we have identified significant challenges that need to be overcome before a DA service can be built on a DSN. As we now talk about the feasibility of implementation, we will use FileCoin as our main focus of the discussion.
Proof Latency
The cryptographic proofs that ensure the integrity of deals and stored data on FileCoin take time to generate. When data is committed to the network, it is partitioned into 32 gigabyte sectors and “sealed.” The sealing of data is the foundation of both the Proof-of-Replication (PoRep), which proves that a storage provider is storing one or more unique copies of the data, and Proof-of-Spacetime (PoST), which proves that a storage provider stored a unique copy continuously throughout the duration of the storage deal. Sealing has to be computationally expensive to ensure that storage providers aren’t sealing data on demand to undermine the required PoRep. When the protocol presents the periodic challenge to a storage provider to provide proof of unique and continuous storage, sealing has to safely take longer than the response window so that a storage provider can’t falsify proofs or replicas on the fly. For this reason, it can take providers approximately three hours to seal a sector of data.
Storage Threshold
Because of the computational expense of the sealing operation, the sector size of the data being sealed has to be economically worthwhile. The price of storage has to justify the cost of sealing to the storage provider, and likewise, the resulting cost of data being stored has to be low enough at scale (in this case, for an approximately 32 GB chunk) for a client to want to store data on FileCoin. Although smaller sectors could be sealed, this would drive up the price of storage to compensate storage providers. To get around this, data aggregators collect smaller pieces of data from users to be committed to FileCoin as a chunk close to 32 GB. Data aggregators commit to users’ data via a Proof-of-Data-Segment-Inclusion (PoDSI), which guarantees the inclusion of a user’s data in a sector, and a sub-piece CID (pCID), which the user will be able to use to retrieve the data from the network.
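The aggregation step can be sketched as a simple batching routine. This is a hypothetical greedy packer, not the logic of any real FileCoin aggregator: it ignores piece padding and alignment rules, treats a sector as 32 GiB, and flushes a batch as soon as the next piece would not fit.

```python
SECTOR_BYTES = 32 * 2**30  # assumption: a 32 GiB sector


def pack_into_sectors(piece_sizes, sector_bytes=SECTOR_BYTES):
    """Group piece indices into batches whose total size fits in one sector.

    Pieces are taken in arrival order; a batch is closed when the next
    piece would overflow it.
    """
    sectors, current, used = [], [], 0
    for index, size in enumerate(piece_sizes):
        if size <= 0 or size > sector_bytes:
            raise ValueError(f"piece {index} has unsupported size {size}")
        if used + size > sector_bytes:
            sectors.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        sectors.append(current)
    return sectors
```

Arrival-order batching keeps latency predictable for clients; a packer that reorders pieces could fill sectors more tightly at the cost of delaying some blobs.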
Consensus Constraints
FileCoin’s consensus mechanism, Expected Consensus, has a block time of 30 seconds and finality within hours, which may improve in the near future (see FIP-0086 for fast finality on FileCoin). This is generally too slow to support the transaction throughput needed for a Layer 2 relying on DA for transaction data. FileCoin’s block time is lower-bounded by storage provider hardware; the lower the block time, the more difficult it is for storage providers to generate and provide proofs of storage, and the more storage providers will be falsely penalized for missing the proving window for the proper storage of data. To overcome this, InterPlanetary Consensus (IPC) subnets can be leveraged to take advantage of faster consensus times. IPC uses Tendermint-like consensus and DRAND for randomness: in the case that DRAND is the bottleneck, we would be able to achieve a 3-second block-time with an IPC subnet. In the case of a Tendermint bottleneck, PoCs such as Narwhal have achieved blocktimes in the hundreds of milliseconds.
Retrieval Speed
The final barrier-to-build is retrieval. From the constraints above, we can deduce that FileCoin is suitable for cold or lukewarm storage. However, DA data is hot and needs to support performant applications. Incentive-aligned retrieval is difficult in FileCoin; data needs to be unsealed before it is served to clients, which adds latency. Currently, rapid retrieval is done via SLAs or the storage of un-sealed data alongside sealed sectors, neither of which can be relied on in the architecture of a secure and permissionless application on FileCoin. With Retriev showing that retrieval can be guaranteed via the FVM, incentive-aligned rapid retrieval on FileCoin remains a promising area for further exploration.
Cost Analysis
In this section, we consider the cost that comes from these design considerations. We show the cost of storing 32GB as Ethereum calldata, Celestia blobdata, EigenDA blobdata, and as a sector on FileCoin using near-current market prices.
The analysis highlights the price of Ethereum calldata: 100 million USD for 32 GB of data. This price showcases the cost of security behind Ethereum’s consensus, and is subject to the volatility of Ether and gas prices. The Dencun upgrade, which introduced Proto-Danksharding (EIP-4844), added blob transactions with a target of 3 blobs per block of approximately 125 KB each, and variable blob gas pricing to maintain the target number of blobs per block. This upgrade cut the cost of Ethereum DA to ⅕ of the calldata price: 20 million USD for 32 GB of blob data.
Celestia and EigenDA provide significant improvements: 8,000 and 26,000 USD for 32 GB of data, respectively. Both are subject to the volatility of market prices and reflect to some extent the cost of consensus securing their data: Celestia with its native TIA token, and EigenDA with Ether.
In all of the above cases, the data stored is not permanent. Ethereum calldata is stored for 3 weeks, with blobs stored for 18 days. EigenDA stores blobs for a default of 14 days. As of the current Celestia implementation, blob data is stored indefinitely by archival nodes but only sampled by light nodes for a maximum of 30 days.
The final two tables are direct comparisons between FileCoin and current DA solutions. Cost equivalence first lists the cost of a single byte of data on the given platform. The amount of FileCoin bytes that can be stored for the same amount of time for the same cost is then shown.
This shows that FileCoin is orders of magnitude cheaper than current DA solutions, costing fractions of a cent to store the same amount of data for the same amount of time. Unlike the nodes of Ethereum and other DA solutions, FileCoin’s nodes are optimized to provide storage services, and its proof system allows nodes to prove storage rather than replicate storage across every node in the network. Without accounting for the economics of storage providers (such as the energy cost to seal data), it shows that the basic overhead of the storage process on FileCoin is negligible. This points to a market opportunity in the millions of USD per gigabyte compared to Ethereum for a system that can provide secure and performant DA services on FileCoin.
Throughput
Below, we consider the capacity of DA solutions and the demand that is generated by major layer 2 rollups.
Because FileCoin’s blockchain is organized in tipsets with multiple blocks at every block-height, the number of deals that can be done is not restricted by consensus or block size. The strict data constraint of FileCoin is that of its network-wide storage capacity, not what is allowed via consensus.
For daily DA demand, we pull data from Rollups DA and Execution by Terry Chung and Wei Dai, which includes a daily average across 30 days and a single sampled day. This allows us to consider average demand while not overlooking aberrations from the average (for example, Optimism’s demand on 8/15/2023 of approximately 261,000,000 bytes was over 4x its 30-day average of 64,000,000 bytes).
From this selection, we see that despite the opportunity of lower DA cost, we would need a dramatic increase in DA demand to make efficient use of the 32 GB sector size of FileCoin. Although sealing 32 GB sectors with less than 32 GB of data would be a waste of resources, we could do so while still reaping a cost advantage.
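The gap between demand and sector size can be quantified with the two Optimism figures quoted above. The sketch below treats 32 GB as 32 × 10⁹ bytes and assumes demand stays constant, which is a simplification; the function names are illustrative.

```python
SECTOR_BYTES = 32 * 10**9  # 32 GB, decimal, as used in this analysis


def days_to_fill_sector(daily_bytes: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Days of constant demand needed to accumulate one full sector."""
    return sector_bytes / daily_bytes


def daily_fill_fraction(daily_bytes: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Share of a sector used if one sector were sealed every day."""
    return daily_bytes / sector_bytes


for label, demand in [("30-day average", 64_000_000), ("8/15/2023 peak", 261_000_000)]:
    print(
        f"{label}: {days_to_fill_sector(demand):.1f} days to fill, "
        f"{daily_fill_fraction(demand):.2%} of a sector per day"
    )
```

At the 30-day average a single rollup would take 500 days to fill a sector, and even the sampled peak day fills under 1% of one, which is why aggregation across many clients or sealing partially filled sectors is needed.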
Architecture
In this section, we consider the technical architecture that can be achieved if we were to build this today. We will consider this architecture in the context of arbitrary L2 applications and an L1 chain that the L2 is serving. Since this solution is an external DA solution, like that of Celestia and EigenDA, we do not consider FileCoin as an example L1.
Components
Even at a high level, a DA on FileCoin will make use of many different features of the FileCoin ecosystem.
Transactions: Downstream users make transactions on a platform that requires DA. This could be an L2.
Platforms Using DA: These are the platforms that use DA as a service. This could be an L2 which posts transaction data to the FileCoin DA and commitments to an L1, such as Ethereum.
Layer 1: This is any L1 that contains commitments pointing to data on the DA solution. This could be Ethereum, supporting an L2 that leverages the FileCoin DA solution.
Aggregator: The frontend of a FileCoin-based DA solution is an aggregator, a centralized component which receives transaction data from L2s and other DA clients and aggregates them into 32 GB sectors suitable for sealing. Although a simple proof-of-concept would include a centralized aggregator, platforms using the DA solution could also run their own aggregator, for example as a sidecar to an L2 sequencer. The centralization of the aggregator can be seen as similar to that of an L2 sequencer or EigenDA’s disperser. Once the aggregator has compiled a payload near 32 GB, it makes a storage deal with storage providers to store the data. Clients are given a guarantee that their data will be included in the sector in the form of a PoDSI (Proof of Data Segment Inclusion), and a pCID to identify their data once it is on the network. This pCID is what would be included in the state commitments on the L1 to reference supporting transaction data.
Verifiers: Verifiers request the data from the storage providers to ensure the integrity of state commitments and build fraud proofs, which are committed to the L1 in the case of provable fraud.
Storage Deal: Once the aggregator has compiled a payload near 32GB, the aggregator makes a storage deal with storage providers to store the data.
Posting blobs (Put): To initiate a put, a DA client will submit their blob containing transaction data to the aggregator. This can be done in an off-chain manner, or an on-chain manner via an on-chain aggregator oracle. To confirm receipt of the blob, the aggregator returns a PoDSI to the client to prove that their blob is included in the aggregated sector that will be committed to the subnet. A pCID (sub-piece Content IDentifier) is also returned. This is what the client and any other interested party will use to reference the blob once it is being served on FileCoin.
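The inclusion guarantee returned to the client can be illustrated with a generic Merkle inclusion proof. This is a stand-in for the idea behind a PoDSI, not the actual PoDSI construction: it uses a plain SHA-256 binary tree over whole blobs and duplicates the last node on odd levels, all of which are assumptions made for the sketch.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(blobs, index):
    """Return (root, proof) where proof is a list of (sibling_hash, sibling_is_left)."""
    if not blobs or not 0 <= index < len(blobs):
        raise ValueError("index out of range")
    level = [_hash(blob) for blob in blobs]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_inclusion(blob: bytes, proof, root: bytes) -> bool:
    """Recompute the root from the blob and its sibling path."""
    node = _hash(blob)
    for sibling, sibling_is_left in proof:
        node = _hash(sibling + node) if sibling_is_left else _hash(node + sibling)
    return node == root
```

A client holding only its own blob, the proof, and the root committed by the aggregator can check inclusion without seeing any other client's data.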
Data deals would appear on-chain within minutes of the deal being made. The largest barrier to latency is the sealing time, which can take 3 hours. This means that although the deal has been made, and the client can be confident that the data will appear in the network, the data cannot be guaranteed to be queryable until the sealing process is complete. The Lotus client has a fast-retrieval feature in which an unsealed copy of the data is stored alongside the sealed copy that may be able to be served as soon as the unsealed data is transferred to the data storage provider, as long as a retrieval deal does not depend on the proof of sealed data to appear on the network. However, this functionality is at the discretion of the data provider, and is not cryptographically guaranteed as part of the protocol. If a fast-retrieval guarantee is to be provided, there would need to be changes to consensus and dis/incentive mechanisms in place to enforce it.
Retrieving blobs (Get): Retrieval is similar to a put operation. A retrieval deal needs to be made, which will appear on-chain within minutes. Retrieval latency will depend on the terms of the deal and whether an unsealed copy of data is stored for fast retrieval. In the fast retrieval case, the latency will depend on network conditions. Without fast retrieval, data will need to be unsealed before being served to the client, which takes the same amount of time as sealing, on the order of 3 hours. Thus, without optimizations, we have a maximum round-trip of 6 hours. Major improvements in data serving would need to be made before this becomes a viable system for DA or fraud proofs.
Proof of DA: Proof of DA can be considered in two steps: first, the PoDSI that is given when the data is committed to the aggregator while the deal is being made, and then the continued commitment of PoRep and PoST that storage providers provide via FileCoin’s consensus mechanism. As discussed above, the PoRep and PoST give scheduled and provable guarantees of data custody and persistence.
This solution will make heavy use of bridging, as any client that relies on DA (regardless of the construction of proofs) will need to be able to interact with FileCoin. In the case of the pCID included in the state transition that is posted to the L1, a verifier can make an initial check to make sure that a bogus pCID wasn’t committed. There are several ways that this could be done, for example, via an oracle that posts FileCoin data on the L1 or via verifiers that verify the existence of a data deal or sector that corresponds to the pCID. Likewise, the verification of validity or fraud proofs that get posted to the L1 may need to make use of a bridge to be convinced of a proof. Currently available bridges include Axelar and Celer.
Security Analysis
FileCoin’s integrity is enforced through the slashing of collateral. Collateral can be slashed in two cases: storage faults or consensus faults. A storage fault corresponds to a storage provider not being able to provide proof of stored data (either PoRep or PoST), which would correlate to a lack of data availability in our model. A consensus fault corresponds to malicious action in consensus, the protocol that manages the transaction ledger from which the FEVM is abstracted.
- A Sector Fault refers to the penalty incurred from the failure to post proof of continuous storage. Storage providers are allowed a one-day grace period during which a penalty is not incurred for faulty storage. After 42 days from a sector becoming faulty, the sector is terminated. Incurred fees are burnt.
BR(t) = ProjectedRewardFraction(t) * SectorQualityAdjustedPower
- A Sector Termination occurs after a sector has been faulty for 42 days or a storage provider purposefully terminates a deal. Termination fees are equivalent to the maximum amount that a sector has earned up to termination, with an upper bound of 90 days’ worth of earning. Unpaid deal fees are returned to the client. Incurred fees are burnt.
max(SP(t), BR(StartEpoch, 20d) + BR(StartEpoch, 1d) * terminationRewardFactor * min(SectorAgeInDays, 140))
- Storage Market Actor Slashing occurs in the event of a terminated deal. This is the slashing of the collateral that the storage provider puts up behind the deal.
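The termination fee formula above can be evaluated directly. In the sketch below, SP(t) and the BR values are treated as opaque inputs because their computation is not spelled out here, and terminationRewardFactor is assumed to be 0.5, a value chosen because it reproduces the 90-day upper bound stated above (20 + 0.5 × 140 = 90).

```python
ASSUMED_TERMINATION_REWARD_FACTOR = 0.5  # assumption: yields the 90-day cap
MAX_COUNTED_SECTOR_AGE_DAYS = 140


def termination_fee(
    sp: float,
    br_20d: float,
    br_1d: float,
    sector_age_days: float,
    termination_reward_factor: float = ASSUMED_TERMINATION_REWARD_FACTOR,
) -> float:
    """max(SP(t), BR(StartEpoch, 20d) + BR(StartEpoch, 1d) * factor * min(age, 140))."""
    counted_age = min(sector_age_days, MAX_COUNTED_SECTOR_AGE_DAYS)
    return max(sp, br_20d + br_1d * termination_reward_factor * counted_age)


# Express rewards in units of one day's expected reward.
daily_reward = 1.0
print(termination_fee(sp=0.0, br_20d=20 * daily_reward, br_1d=daily_reward, sector_age_days=200))
```

With rewards expressed in units of one day's expected reward, a sector older than 140 days pays 90 units, while a 10-day-old sector pays 25.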
The security provided by FileCoin is very different from that of other blockchains. Whereas blockchain data is typically secured via consensus, FileCoin’s consensus only secures the transaction ledger, not the data referred to by the transaction. The data that is stored on FileCoin has only enough security to incentive-align storage providers to provide storage. This means that the data stored on FileCoin is secured via fault penalties and business incentives such as reputation with clients. In other words, a data fault on a blockchain is equivalent to a breach of consensus, and breaks the safety of the chain or its notion of the validity of transactions. FileCoin is designed to be fault tolerant when it comes to data storage, and therefore only uses its consensus to secure its dealbook and deal-related activities. The cost of a storage miner not fulfilling its data deal is at most 90 days’ worth of storage reward in penalties, plus the loss of the collateral put up by the miner to secure the deal.
Therefore, the cost of a data withholding attack launched by FileCoin providers is simply the opportunity cost of a retrieval deal. Data retrieval on FileCoin relies on the storage miner being incentivized by a fee paid by the client. However, there is no negative impact on a miner for not responding to a data retrieval request. To mitigate the risk of a single storage miner ignoring or refusing data retrieval deals, data on FileCoin can be stored by multiple miners.
Since the economic security behind the data being stored on FileCoin is considerably less than that of blockchain-based solutions, the prevention of data manipulation must also be considered. Data is protected from manipulation by FileCoin’s proof system. Data is referred to via CIDs, through which data corruption is immediately detectable. A provider therefore cannot serve corrupt data, as it is easy to verify whether the fetched data matches the requested CID. Data providers also cannot store corrupted data in the place of uncorrupted data. Upon the receipt of client data, providers must provide proof of a correctly sealed data sector to initiate the data deal. Therefore, a storage deal cannot be started with corrupt data. During the lifetime of the storage deal, PoSTs are provided to prove custody (recall that this proves both custody of the sealed data sector and custody since the last PoST). Since the PoST is reliant on the sealed sector at the time of proof generation, a corrupt sector would result in a bogus PoST, resulting in a sector failure. Therefore, a storage provider can neither store nor serve corrupted data, cannot claim rewards as though the data were uncorrupted, and cannot avoid being penalized for tampering with a client’s data.
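The retrieval-side check is simple because the identifier is derived from the content. The sketch below uses a bare SHA-256 hex digest as a stand-in identifier. Real CIDs encode more than a raw hash, so this shows the principle rather than FileCoin's actual CID format, and the function names are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Simplified content identifier: hex SHA-256 of the bytes (not a real CID)."""
    return hashlib.sha256(data).hexdigest()


def verify_fetched(requested_id: str, fetched: bytes) -> bool:
    """Accept fetched bytes only if they hash to the identifier that was requested."""
    return content_id(fetched) == requested_id


original = b"batch of rollup transactions"
requested = content_id(original)

print(verify_fetched(requested, original))
print(verify_fetched(requested, original + b" (tampered)"))
```

Any change to the served bytes changes the digest, so a client detects corruption without trusting the provider.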
Security can be strengthened by increasing the collateral committed by the storage provider to the Storage Market Actor, which is currently decided by the storage provider and the client. If we assume that this collateral is sufficiently high (for example, the same stake as an Ethereum validator) to incentivize a provider not to default, we can think about what is left to secure (even though this would be extremely capital-inefficient, as this stake would be needed to secure each transaction blob or sector with aggregated blobs). Now, a data provider could choose to make data unavailable for up to 41 days at a time before the storage deal is terminated by the Storage Market Actor. Assuming a shorter data deal, the data could be made unavailable until the last day of the deal. In the absence of coordinated malicious actors, this can be mitigated via replication on multiple storage providers so that the data can continue being served.
We can consider the cost of an attacker overriding consensus to either accept a bogus proof or rewrite ledger history to remove a deal from the orderbook without penalizing the responsible storage provider. It is worth noting, however, that in the case of such a safety violation, an attacker would be able to manipulate FileCoin’s ledger however they want. To carry out such an attack, they would need at least a majority stake in the FileCoin chain. Stake is related to storage provided to the network; with roughly 25 EiB (about 2.9 × 10¹⁹ bytes) of storage currently securing the FileCoin chain, at least 12.5 EiB would be needed for a malicious actor to offer its own chain that would win the fork-choice rule. This is further mitigated by slashing related to consensus faults, for which the penalty is the loss of all pledged collateral and block rewards and suspension from participation in consensus.
Aside: Withholding attacks on other DA solutions
Although the above shows that FileCoin is lacking in protecting data from withholding attacks, it is not alone.
- Ethereum: In general, the only way to guarantee that a request to the Ethereum network is answered is to run a full node. Full nodes have no requirement to fulfill data retrieval requests outside of consensus, and can therefore ignore such requests without penalty. Constructs such as PeerDAS introduce a peer scoring system for a node’s responses to data retrieval in which a node with a low enough score (essentially a DA reputation) could be isolated from the network.
- Celestia: Even though Celestia has much stronger security per-byte against withholding attacks in comparison to our FileCoin construction, the only way to take advantage of this security is to host your own full node. Requests to Celestia infrastructure that are not owned and operated in-house can be censored without penalty.
- EigenDA: Similar to Celestia, any service can run an EigenDA Operator node to ensure retrieval of their own data. As such, any out-of-protocol data retrieval request can be censored. Also note that EigenDA has a centralized and trusted disperser in charge of data encoding, KZG commitment, and data dispersal, similar to our aggregator.
Retrieval Security
Retrievability is necessary for DA. Ideally, market forces motivate economically rational miners to accept retrieval deals and to compete with other miners to keep prices for clients low. It is assumed that this is enough for data providers to provide retrieval services. However, given the importance of DA, it is reasonable to require more security.
Retrieval is currently not guaranteed via the economic security stipulated above. This is because it is cryptographically difficult to prove that data wasn’t received by a client (in the case where a client needs to refute a storage miner’s claim of sending data) in a trust-minimized manner. A protocol-native retrieval guarantee would be required in order for retrieval to be secured through FileCoin’s economic security. With minimal changes to the protocol, this means that a failed retrieval would need to be associated with a sector fault or deal termination. Retriev is a proof-of-concept which was able to provide data retrieval guarantees by using trusted “referees” to mediate data retrieval disputes.
Aside: Retrieval on other DA solutions
As can be seen above, FileCoin lacks the protocol-native retrieval guarantees necessary to keep storage (or retrieval) providers from acting selfishly. In the case of Ethereum and Celestia, the only way to guarantee that data from the protocol can be read is to self-host a full node or trust an SLA from an infrastructure provider. Guaranteeing retrieval in the same way on FileCoin is not trivial; the analogous setting would be to become a storage provider (requiring significant infrastructure cost) and successfully accept, as a storage provider, the same storage deal that one posted as a user, at which point one would be paying oneself to provide storage to oneself.
Latency Analysis
Latency on FileCoin is determined by several factors, such as network topology, storage mining client configuration, and hardware capabilities. We provide a theoretical analysis which discusses these factors and the performance that can be expected from our construct.
Due to the design of FileCoin’s proof system and lack of retrieval incentives, FileCoin is not optimized to provide high-performance round-trip latency from the initial posting of data to the initial retrieval of data. High-performance retrieval on FileCoin is an active area of research that is constantly changing as storage providers increase their capabilities and as FileCoin introduces new features. We define a “round trip” as the time from the submission of a data deal to the earliest moment the data submitted to FileCoin can be downloaded.
Block Time
In FileCoin’s Expected Consensus, data deals can be included within the block time of 30 seconds. The typical confirmation time for sensitive on-chain data (such as coin transfers) is 1 hour.
Data Processing
Data processing time varies widely between storage providers and configurations. The sealing process is designed to take 3 hours with standard storage mining hardware. Miners often outperform this 3 hour threshold via special client configurations, parallelization, and investing in more capable hardware. This variation also affects the duration of sector un-sealing, which can be circumvented altogether by quick retrieval options in FileCoin client implementations such as Lotus. The quick retrieval setting stores an unsealed copy of data alongside sealed data, significantly speeding up retrieval time. Based on this, we can assume a worst-case delay of three hours from the acceptance of a data deal to when the data is available on-chain.
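The figures above can be combined into a rough round-trip budget. The following is a minimal sketch, not a measurement: it uses only the 30-second block time, the 1-hour confirmation window, and the 3-hour sealing time quoted in this section. The names (LatencyBudget, round_trip_s, confirmation_blocks) are hypothetical, and it assumes that the stages run strictly one after another and that unsealing costs nothing when an unsealed copy is kept for quick retrieval. Network transfer time is ignored.

```python
from dataclasses import dataclass

# Figures quoted in the text above.
BLOCK_TIME_S = 30             # Expected Consensus block time
CONFIRMATION_S = 60 * 60      # typical confirmation wait for sensitive data
SEALING_S = 3 * 60 * 60       # designed sealing time on standard hardware


def confirmation_blocks() -> int:
    """Number of 30-second blocks that fit in the 1-hour confirmation window."""
    return CONFIRMATION_S // BLOCK_TIME_S


@dataclass
class LatencyBudget:
    """Hypothetical worst-case round-trip model; stages are assumed serial."""
    wait_for_confirmation: bool = False  # assumption: the client may skip the 1-hour wait
    sealing_s: int = SEALING_S           # faster miners can lower this value
    unseal_s: int = 0                    # assumed 0 when an unsealed copy is kept

    def round_trip_s(self) -> int:
        # Deal inclusion takes one block, or the full confirmation window.
        inclusion = CONFIRMATION_S if self.wait_for_confirmation else BLOCK_TIME_S
        return inclusion + self.sealing_s + self.unseal_s


if __name__ == "__main__":
    for budget in (LatencyBudget(), LatencyBudget(wait_for_confirmation=True)):
        print(budget, f"{budget.round_trip_s() / 3600:.2f} h")
```

Under these assumptions, sealing dominates the budget: block inclusion adds only 30 seconds to the 3-hour sealing time, while waiting for the full confirmation window adds one hour.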
Conclusion and Future Directions
This article explores building a DA layer by leveraging an existing DSN, FileCoin. We consider the requirements of DA with respect to its role as a critical element of scaling infrastructure in Ethereum. We use FileCoin as a case study for the viability of DA on a DSN, and consider the opportunities that a solution on FileCoin would provide to the Ethereum ecosystem, or to any ecosystem that would benefit from a cost-effective DA layer.
FileCoin proves that a DSN can dramatically improve the efficiency of data storage in a distributed, blockchain-based system, with a proven saving of 100 million USD per 32 GB written at current market prices. Even though the demand for DA is not yet high enough to fill 32 GB sectors, the cost advantage of a DA still holds if empty sectors are sealed. Although the current latency of storage and retrieval on FileCoin is not appropriate for hot storage needs, storage miner-specific implementations can provide reasonable performance, with data being available in under 3 hours.
The increased trust placed in FileCoin storage providers can be tuned via variable collateral, as in EigenDA. FileCoin extends this tunable security by allowing a number of replicas to be stored across the network, adding tunable Byzantine tolerance. Guaranteed and performant data retrieval would need to be solved in order to robustly deter data withholding attacks. However, as with any other solution, the only way to truly guarantee retrievability is to self-host a node or trust infrastructure providers.
We see opportunities for DA in the further development of PoDSI, which could be used (alongside FileCoin’s current proofs) in place of DAS to guarantee data inclusion in a larger sealed sector. Depending on how this looks, this may make slow turnaround of data tolerable, as fraud proofs could be posted in a window of 1 day to 1 week, while DA could be guaranteed on demand. PoDSIs are still new and under heavy development, and so we make no implication yet on what an efficient PoDSI could look like, or the machinery needed to build a system around it. As there are solutions for compute on top of FileCoin data, the idea of a solution that computes a PoDSI on sealed or unsealed data may not be out of the realm of near-future possibilities.
As both the field of DA and FileCoin grow, new combinations of solutions and enabling technologies may enable new proofs of concept. As Solana’s integration with the FileCoin network shows, DSNs hold potential as a scaling technology. The cost of data storage on FileCoin provides an open opportunity with a large window of optimization. Although the challenges discussed in this article are presented in the context of enabling DA, their eventual solution will open up a plethora of new tools and systems to be built beyond DA.
¹ Although this isn’t the construction of FileCoin, it is useful for those who are unfamiliar with programmable decentralized storage.
Graph data from FileCoin spec, EIP-4844, EigenDA, Celestia implementation, Celenium, Starboard, file.app, Rollups DA and Execution, and current approximate market prices.