Are you curious about Swarm and how it may be useful for Ethereum developers? Great! You’ve come to the right place. To learn blockchain development and get certified, I recommend visiting Ivan on Tech Academy.
Blockchain is currently the #1 ranked skill on LinkedIn, so you should definitely learn more about Ethereum if you want a full-time position in crypto during 2020.
Now to the topic at hand: what is Swarm and why is it useful for Ethereum developers?
In my first and second pieces, I discussed Ethereum 2.0 and the best tools for developers. In my third and fourth articles, I discussed quadratic voting and open governance models. Today, I want to look at Ethereum’s infrastructure and how storage works.
In particular, I’ll discuss one of the key aspects of back-end development: data storage. Ethereum is a decentralised network, so the file storage that serves the Ethereum protocol should be decentralised as well.
Swarm is one of my favourite projects on Ethereum. It is arguably a central piece of the entire decentralised ecosystem.
According to its website, Swarm is a censorship-resistant, permissionless, decentralised storage and communication infrastructure layer.
The main purpose of Swarm is to be a decentralised store for dApp code, user data, blockchain data, and state data.
Swarm sets out to provide various base layer services for Web 3.0. Services include node-to-node messaging, media streaming, decentralised database services, and scalable state-channel infrastructure for decentralised service economies.
Swarm’s record keeping
Before I dive deeper into Swarm’s technical structure, I’ll outline how Swarm records, stores, and maintains data, and how retrievers can access that data at any time.
The idea is that documents are stored on effectively random nodes, and nodes only need to keep a tag of the root hash and the subsequent directory hashes. Rendering a document is then straightforward: the requester looks up the entry for a path such as page.html in the manifest and pulls the content it points to.
Swarm’s base-layer infrastructure provides the services mentioned above, and it works well because the nodes providing each service can contribute resources to one another.
These contributions are accurately accounted for on a peer-to-peer basis. Nodes trade resource for resource, and nodes that consume less than they serve receive monetary compensation for the difference.
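To make that accounting concrete, here is a minimal sketch of a per-peer ledger kept by one node: serving a peer raises its balance, receiving service from it lowers the balance, and once the imbalance reaches a threshold it is settled with a payment. The class, its methods, and the `payment_threshold` rule are hypothetical illustrations, not Swarm’s actual accounting code or parameters.

```python
from __future__ import annotations

from collections import defaultdict


class PeerLedger:
    """Hypothetical per-peer accounting kept by one node.

    A positive balance means the peer owes us; a negative one means we owe the peer
    (that side of the imbalance would be settled by the peer's own ledger).
    """

    def __init__(self, payment_threshold: int) -> None:
        self.payment_threshold = payment_threshold
        self.balances: defaultdict[str, int] = defaultdict(int)
        self.settlements: list[tuple[str, int]] = []  # (peer, amount) payments due to us

    def served(self, peer: str, units: int) -> None:
        """We delivered `units` of resources (e.g. chunks) to `peer`."""
        self.balances[peer] += units
        self._settle_if_needed(peer)

    def consumed(self, peer: str, units: int) -> None:
        """`peer` delivered `units` to us, which offsets what it owes."""
        self.balances[peer] -= units

    def _settle_if_needed(self, peer: str) -> None:
        # Service traded in both directions cancels out; only a lasting
        # imbalance at or above the threshold is settled with a payment.
        owed = self.balances[peer]
        if owed >= self.payment_threshold:
            self.settlements.append((peer, owed))
            self.balances[peer] = 0
```

With a threshold of 10, serving a peer 6 units, consuming 4 from it, and serving 5 more leaves a balance of 7 and no payment; serving another 3 triggers a settlement of 10.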
Swarm uses existing smart contract platforms such as Ethereum to implement its incentive mechanism, which I come back to below.
But first, let’s look into Swarm’s data structure. There are three main components that make up the Swarm decentralised storage system:
- Chunks: These are pieces of data of limited size (max 4 KB) that act as the basic unit of storage and retrieval in Swarm. Each chunk is identified by an address derived from the hash of its content.
- Reference: This is a unique identifier of a file that allows clients to retrieve and access the content.
- Manifest: This is a data structure describing file collections. It specifies paths and corresponding content hashes allowing for URL-based content retrieval.
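The chunk definition above can be illustrated with a minimal sketch that splits a blob into chunks of at most 4096 bytes and stores each one under the hash of its content. It uses SHA-256 from Python’s standard library as a stand-in; Swarm itself uses a Keccak-256-based binary Merkle tree (BMT) hash, and the function names here are hypothetical.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4096  # maximum chunk payload in bytes ("4 KB")


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a blob into consecutive pieces of at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_address(chunk: bytes) -> str:
    """Content address of a chunk (SHA-256 as a stand-in for Swarm's BMT hash)."""
    return hashlib.sha256(chunk).hexdigest()


def store_blob(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Put every chunk into a content-addressed store; return the addresses in order."""
    addresses = []
    for chunk in split_into_chunks(data):
        addr = chunk_address(chunk)
        store[addr] = chunk  # identical content always maps to the same address
        addresses.append(addr)
    return addresses
```

Because addresses come from content, two identical chunks are stored only once, and anyone holding the list of addresses can check each retrieved chunk against its hash.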
The image above shows how a request is rendered through Swarm. Essentially, chunks hold the hashed content of files such as “page.html” or “page.css”.
Each file’s reference is recorded in the Manifest, which tells the requester how to retrieve and render the information.
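The manifest lookup can be sketched as a mapping from paths to references, with the manifest itself stored under its own hash so that a single root hash identifies the whole collection. This is a simplified model: real Swarm manifests are stored as chunks and have a richer structure, and SHA-256 plus JSON stand in for Swarm’s actual hashing and encoding. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def put(store: dict[str, bytes], content: bytes) -> str:
    """Store content under its hash and return that hash as its reference."""
    ref = hashlib.sha256(content).hexdigest()
    store[ref] = content
    return ref


def upload_site(store: dict[str, bytes], files: dict[str, bytes]) -> str:
    """Upload each file, then a manifest mapping path -> reference; return the root hash."""
    manifest = {path: put(store, body) for path, body in files.items()}
    # sort_keys makes the manifest bytes, and therefore the root hash, deterministic.
    return put(store, json.dumps(manifest, sort_keys=True).encode())


def fetch(store: dict[str, bytes], root_hash: str, path: str) -> bytes:
    """Resolve `path` through the manifest stored at `root_hash`, then fetch the content."""
    manifest = json.loads(store[root_hash])
    return store[manifest[path]]
```

A requester that only knows the root hash can ask for "page.html" or "page.css" and receive the matching content, which mirrors the request flow in the image.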
Next, I’ll look into Swarm’s architecture and how different nodes write and upload data to the network.
Swarm stack: upload
First, the Distributed Pre-image Archive, or DPA, splits each data blob into many chunks. Which nodes store which chunks is effectively random, because placement follows from each chunk’s hash. Each chunk is tagged with its address and dropped into the bin of the nodes responsible for it, and those nodes store it locally. After receiving the data, nodes communicate with other nodes on the same network, or address space.
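Chunk placement looks random because it is driven by hashes. In Swarm’s Kademlia-based design, a chunk is kept by the nodes whose addresses are closest to the chunk’s address under the XOR metric. The sketch below shows that selection rule; SHA-256 stands in for Swarm’s hash, and the `replicas` parameter is a hypothetical knob for how many nearby nodes keep a copy.

```python
from __future__ import annotations

import hashlib


def address(data: bytes) -> int:
    """256-bit address as an integer (SHA-256 as a stand-in for Swarm's hash)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def closest_nodes(chunk_addr: int, node_addrs: list[int], replicas: int = 2) -> list[int]:
    """Return the `replicas` node addresses nearest to the chunk under the XOR metric."""
    return sorted(node_addrs, key=lambda node: node ^ chunk_addr)[:replicas]
```

For example, with nodes 0b0000, 0b0110, 0b1000, and 0b1111, a chunk at 0b0111 lands on 0b0110 (distance 1) and 0b0000 (distance 7). No coordinator is needed: every node can compute the same answer from the addresses alone.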
Nodes automatically sync the data in the order given by each chunk’s timestamp, so every chunk ends up replicated across the nodes in its neighbourhood. This redundancy is designed to protect blobs against data loss and corruption.
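One way to picture timestamp-ordered syncing is a node that records chunks in the order it stored them, while each peer remembers how far into that record it has already synced and asks only for newer entries. This is a minimal sketch: the class is hypothetical, a position in the storage log plays the role of the timestamp, and Swarm’s real sync protocol has more moving parts.

```python
from __future__ import annotations


class SyncingNode:
    """Toy node that records chunks in storage order so peers can sync incrementally."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.chunks: dict[str, bytes] = {}  # address -> chunk data
        self.log: list[str] = []  # addresses in the order they were stored
        self.cursors: dict[str, int] = {}  # peer name -> position reached in that peer's log

    def store(self, addr: str, data: bytes) -> None:
        if addr not in self.chunks:  # already-held chunks are skipped, so re-syncing is harmless
            self.chunks[addr] = data
            self.log.append(addr)

    def sync_from(self, peer: SyncingNode) -> int:
        """Pull every chunk `peer` stored since our last sync with it; return how many were new."""
        start = self.cursors.get(peer.name, 0)
        before = len(self.chunks)
        for addr in peer.log[start:]:
            self.store(addr, peer.chunks[addr])
        self.cursors[peer.name] = len(peer.log)
        return len(self.chunks) - before
```

Repeated syncs transfer only what is new, and after a sync both nodes hold the same chunks, which is what makes the replicated copies useful for recovery.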
Finally, the bins (0, 1, … , 31) show how nodes in the same region of the address space store related chunks.
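The bins correspond to proximity order in a Kademlia-style table: a peer falls into bin n when its address shares exactly n leading bits with the node’s own address, so higher bins hold closer neighbours. The sketch below assumes 32-bit addresses so that distinct addresses land in bins 0 to 31 as in the diagram; Swarm’s real addresses are 256 bits, and the function names are hypothetical.

```python
from __future__ import annotations

ADDRESS_BITS = 32  # assumption: short addresses so bins run 0..31 as in the diagram


def proximity_order(a: int, b: int, bits: int = ADDRESS_BITS) -> int:
    """Number of leading bits shared by two `bits`-wide addresses (the peer's bin)."""
    distance = a ^ b
    if distance == 0:
        return bits  # identical addresses share every bit
    return bits - distance.bit_length()


def bin_peers(own: int, peers: list[int]) -> dict[int, list[int]]:
    """Group peers into bins by their proximity order to our own address."""
    bins: dict[int, list[int]] = {}
    for peer in peers:
        bins.setdefault(proximity_order(own, peer), []).append(peer)
    return bins
```

Half of all possible addresses fall into bin 0, a quarter into bin 1, and so on, which is why the high bins form a small neighbourhood of nodes responsible for the same chunks.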
Because nodes store, sync, and share chunks with one another, any node a retriever contacts can gather the chunks from its peers and send the entire data piece back.
To conclude this section, it’s important to underline that a requester can retrieve a piece of data at any time (an asynchronous model).
Swarm stack: storage
The actual storage layer of Swarm consists of two main components: the LocalStore and the NetStore. The LocalStore combines an in-memory fast cache (Memstore) with persistent disk storage (DBStore). The NetStore extends the LocalStore into Swarm’s distributed storage and implements the DPA.
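The layering described above can be sketched as a lookup chain: check the in-memory cache, then persistent storage, and only then ask the network, keeping a local copy of whatever comes back. The class names mirror the text, but their methods and the `fetch_from_network` callback are hypothetical simplifications rather than Swarm’s actual Go API, and plain dictionaries stand in for both memory and disk.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable, Optional


class LocalStore:
    """In-memory LRU cache (Memstore role) in front of persistent storage (DBStore role)."""

    def __init__(self, cache_size: int) -> None:
        self.mem: OrderedDict[str, bytes] = OrderedDict()
        self.db: dict[str, bytes] = {}  # stand-in for on-disk storage
        self.cache_size = cache_size

    def put(self, addr: str, chunk: bytes) -> None:
        self.db[addr] = chunk
        self._cache(addr, chunk)

    def get(self, addr: str) -> Optional[bytes]:
        if addr in self.mem:
            self.mem.move_to_end(addr)  # mark as recently used
            return self.mem[addr]
        chunk = self.db.get(addr)
        if chunk is not None:
            self._cache(addr, chunk)  # promote disk hits into the cache
        return chunk

    def _cache(self, addr: str, chunk: bytes) -> None:
        self.mem[addr] = chunk
        self.mem.move_to_end(addr)
        if len(self.mem) > self.cache_size:
            self.mem.popitem(last=False)  # evict the least recently used entry


class NetStore:
    """Local lookups first; on a miss, retrieve from the network and keep a local copy."""

    def __init__(self, local: LocalStore, fetch_from_network: Callable[[str], bytes]) -> None:
        self.local = local
        self.fetch_from_network = fetch_from_network

    def get(self, addr: str) -> bytes:
        chunk = self.local.get(addr)
        if chunk is None:
            chunk = self.fetch_from_network(addr)
            self.local.put(addr, chunk)
        return chunk
```

The design choice here is that the network is the slowest tier, so each chunk is fetched from it at most once per node; after that, requests are served from disk or memory.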
The FileStore is the local interface for storage and retrieval of files. When a file is handed to the FileStore for storage, it chunks the document into a Merkle hash tree and hands its root key back to the caller.
That root key is later used to retrieve the document.
When the document is requested, the FileStore takes the Swarm hash (the root key) and uses the NetStore to retrieve the document’s root chunk for the user.
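The store-and-retrieve cycle of the FileStore can be sketched as a small hash-tree chunker: data is cut into leaf chunks, each intermediate chunk packs the keys of its children, and the single key left at the top is the root key handed back to the caller. With 4096-byte chunks and 32-byte keys, an intermediate chunk holds up to 128 children, which matches Swarm’s branching factor. This toy version uses SHA-256 with a leaf/inner prefix as a stand-in hash and omits real Swarm details such as chunk metadata and encryption; the class and method names are hypothetical.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4096
BRANCHES = 128  # 4096-byte chunk / 32-byte keys


class FileStore:
    """Toy hash-tree chunker: leaves hold data, inner chunks hold child keys."""

    def __init__(self, chunk_size: int = CHUNK_SIZE, branches: int = BRANCHES) -> None:
        self.chunk_size = chunk_size
        self.branches = branches
        self.chunks: dict[bytes, tuple[bool, bytes]] = {}  # key -> (is_leaf, payload)

    def _put(self, is_leaf: bool, payload: bytes) -> bytes:
        # Prefix separates leaf and inner chunks so their keys can never collide.
        prefix = b"\x00" if is_leaf else b"\x01"
        key = hashlib.sha256(prefix + payload).digest()  # 32-byte key
        self.chunks[key] = (is_leaf, payload)
        return key

    def store(self, data: bytes) -> bytes:
        """Chunk `data` into a hash tree and return the root key."""
        level = [self._put(True, data[i:i + self.chunk_size])
                 for i in range(0, max(len(data), 1), self.chunk_size)]
        while len(level) > 1:
            # Each inner chunk is the concatenation of up to `branches` child keys.
            level = [self._put(False, b"".join(level[i:i + self.branches]))
                     for i in range(0, len(level), self.branches)]
        return level[0]

    def retrieve(self, root: bytes) -> bytes:
        """Walk the tree from the root key and reassemble the original data."""
        is_leaf, payload = self.chunks[root]
        if is_leaf:
            return payload
        children = [payload[i:i + 32] for i in range(0, len(payload), 32)]
        return b"".join(self.retrieve(child) for child in children)
```

Since every key is derived from content, the same document always yields the same root key, so the root key doubles as a tamper-evident identifier for the whole file.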
From the end user’s perspective, Swarm does not change how navigation works or how websites behave.
However, in the back end, content is hosted by a peer-to-peer storage network instead of individual servers. This network is self-sustaining thanks to a built-in incentive system, and those incentives are only possible because a public blockchain allows resources to be traded for payment.
Swarm is also deeply integrated with Ethereum’s DevP2P multi-protocol network layer. DevP2P is a set of network protocols that together form the Ethereum peer-to-peer network.
In addition, Swarm relies on the Ethereum blockchain for domain name resolution (via the Ethereum Name Service, ENS), service payments, and content availability insurance.
Swarm vs IPFS vs Filecoin
To conclude this piece, I would like to underline the key differences between Swarm and other distributed filestores such as IPFS and Filecoin.
IPFS, the InterPlanetary File System, is a peer-to-peer hypermedia protocol designed to make the web faster, safer, and more open. Filecoin, on the other hand, is a decentralised file-storage system whose token serves as an incentive for node operators.
To better understand how Swarm differs from both, let me make a few simple comparison points:
- Swarm’s core storage component is an immutable, content-addressed chunk store rather than a generic distributed hash table, or DHT (IPFS uses a DHT).
- Swarm, Filecoin, and IPFS use different network communication layers and peer management protocols.
- Swarm is deeply integrated with the Ethereum blockchain, and its incentive system benefits from both smart contracts and the semi-stable peer pool. Filecoin ties mining to proofs of storage (Proof-of-Replication and Proof-of-Spacetime). IPFS has no incentive mechanism built in.