Cryptocurrencies have changed the way people transact, whether for business or personal purposes. The graph below shows the number of Ethereum transactions per month over the years.
Image source: Etherscan.io
It is impressive how much cryptocurrencies such as Ether have grown over the years. The problem is that cryptocurrencies were not designed for such widespread use, and as these virtual currencies have grown, many problems have surfaced. The most pressing of them is scalability.
What is Scalability?
According to Wikipedia, Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. An algorithm is only considered scalable if it is efficient enough when applied to a large number of problems.
Let’s try to understand it in simple terms. Say you want to pay someone using PayPal or Visa. While you transact, many other people are transacting at the same time. The network that carries these transactions can handle only a certain number of transactions per second, so as the number of transactions increases, the speed decreases. The methods used to deal with this problem fall under scalability.
PayPal manages 193 transactions per second and Visa manages 1667, while Ethereum manages only 20 and Bitcoin is the lowest at only 7. The only way Ethereum and Bitcoin can improve their performance is by working on their scalability.
The scalability problem can be split into 2 parts:
a) The time taken to put a transaction into a block
b) The time taken to reach a consensus
The time taken to put a transaction into a block
In Bitcoin, Ethereum (as of now) and other blockchains that follow the Proof of Work (PoW) protocol, a transaction only goes through when a miner puts the transaction data into a block they have mined. The transaction is considered complete only once it has been added to a block. As these cryptocurrencies grow, more and more people transact every day, so more transactions compete for the limited space in each block and the wait to be included grows. Miners also have to spend more and more resources on mining. For this reason, you pay the miner an amount known as the transaction fee for your transaction to go through. The higher the transaction fee, the sooner a miner will put that transaction into a block. This is a serious problem because many people do not hold a large amount of bitcoins or ethers and cannot pay when the fee is high, so their transactions take longer to complete. Below is a graph showing how long people had to wait for their transactions to be completed.
Image source: blockchain.info
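To make the effect of fees on waiting time concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model that is not taken from any real client: the mempool contents are hypothetical, no new transactions arrive, each block holds a fixed number of transactions, and miners always pick the highest fees first.

```python
import heapq

def simulate_blocks(pending, block_capacity):
    """Return {tx_id: block_index} when miners always pick the highest fees first.

    pending: list of (tx_id, fee) tuples waiting to be mined (hypothetical data).
    block_capacity: maximum number of transactions per block (simplified limit).
    """
    # heapq is a min-heap, so the fee is negated to pop the highest fee first.
    heap = [(-fee, tx_id) for tx_id, fee in pending]
    heapq.heapify(heap)
    included_in = {}
    block_index = 0
    while heap:
        # Fill one block with the best-paying transactions still waiting.
        for _ in range(min(block_capacity, len(heap))):
            _, tx_id = heapq.heappop(heap)
            included_in[tx_id] = block_index
        block_index += 1
    return included_in

mempool = [("a", 5), ("b", 50), ("c", 1), ("d", 20), ("e", 10)]
result = simulate_blocks(mempool, block_capacity=2)
print(result)  # {'b': 0, 'd': 0, 'e': 1, 'a': 1, 'c': 2}
```

In this toy run the transaction with the lowest fee ("c") is the last one to be included, two blocks after the best-paying ones.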
Sometimes, people even had to wait for a new block to be mined because the previous blocks had reached their limit. (In cryptocurrency, the limit of a block is the amount of transaction data it can hold before being added to the blockchain.) The block limit of a Bitcoin block is 1 MB.
Theoretically speaking, Ethereum should be able to process around 1000 transactions per second, but it is limited by the gas limit of 6.7 million on each block. (Gas is the fuel you need in Ethereum for any process to run. It can be considered analogous to the fuel used in cars.)
Since each block has a gas limit, miners can only add transactions whose total gas does not exceed that limit. This restricts the number of transactions going through.
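The arithmetic behind this limit can be sketched as follows. The 6.7 million gas limit comes from the text above and 21,000 gas is the base cost of a plain ether transfer; the 15 second block time is an assumed approximate average, used only for illustration.

```python
BLOCK_GAS_LIMIT = 6_700_000   # gas limit per block, as quoted in the text
GAS_PER_TRANSFER = 21_000     # base gas cost of a plain ether transfer
BLOCK_TIME_SECONDS = 15       # assumption: approximate average block time

def max_transfers_per_block(gas_limit, gas_per_tx):
    """Number of simple transfers whose total gas fits under the block limit."""
    return gas_limit // gas_per_tx

def transactions_per_second(gas_limit, gas_per_tx, block_time):
    """Rough upper bound on throughput for simple transfers only."""
    return max_transfers_per_block(gas_limit, gas_per_tx) / block_time

per_block = max_transfers_per_block(BLOCK_GAS_LIMIT, GAS_PER_TRANSFER)
tps = transactions_per_second(BLOCK_GAS_LIMIT, GAS_PER_TRANSFER, BLOCK_TIME_SECONDS)
print(per_block)      # 319
print(round(tps, 1))  # 21.3
```

Under these assumptions the estimate lands close to the roughly 20 transactions per second quoted earlier. Transactions that call smart contracts use more gas, so fewer of them fit into a block.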
The time taken to reach a consensus
All cryptocurrencies work on a peer-to-peer network. This ensures that all the participants connected to the blockchain (aka nodes) are treated as equals, that no extra privilege is given to any particular node, and that there is no governing body which can decide the value of the currency. The whole idea is to create an egalitarian network which does not require the involvement of third parties (governing bodies).
However, without a central governing body, how do people come to know that a certain transaction has happened? This problem is solved by a "gossip protocol". When one node makes a transaction, its nearest neighbours come to know about it. They then tell their neighbouring nodes, which in turn tell their own neighbours, and this goes on until everyone on the same blockchain is aware of the transaction. The problem here is that if A tells B that a transaction is valid, B cannot blindly trust A and has to do some calculations itself to see whether the transaction is really valid. This means that every node must have a copy of the blockchain to check the validity of a transaction, which slows down the whole process.
As the number of nodes increases, the process becomes slower and slower. What's more, consensus happens in a linear fashion: if there are 3 nodes, say A, B and C, first A verifies the transaction, then B and then C. If one more node, say D, is added to the chain, it slows down the overall process.
This is especially true of Ethereum, as it has many more nodes than Bitcoin. In fact, in May 2017, Ethereum had 25000 nodes whereas Bitcoin had only 7000. That's more than 3 times as many!
So what can be done to improve scalability and make transactions faster? Both Ethereum and Bitcoin have come up with some ideas to face this challenge, such as:
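The gossip idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peer graph, the transaction format and the `is_valid` check are all hypothetical, and real networks gossip asynchronously instead of in tidy rounds.

```python
from collections import deque

def gossip(peers, origin, tx, is_valid):
    """Spread tx from origin to its neighbours, their neighbours, and so on.

    Returns (heard_in_round, checks): the round in which each node first
    hears about tx, and how many times tx was validated in total.
    """
    heard_in_round = {origin: 0}
    checks = 0
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        checks += 1               # every node re-checks the transaction itself
        if not is_valid(tx):
            continue              # an invalid transaction is not passed on
        for neighbour in peers[node]:
            if neighbour not in heard_in_round:
                heard_in_round[neighbour] = heard_in_round[node] + 1
                queue.append(neighbour)
    return heard_in_round, checks

# Hypothetical network: each node lists the peers it talks to.
peers = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
tx = {"sender": "A", "amount": 3}
rounds, checks = gossip(peers, "A", tx, is_valid=lambda t: t["amount"] > 0)
print(rounds)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 3}
print(checks)  # 5
```

The count of 5 checks for 5 nodes shows the cost described above: the same transaction is validated once by every node in the network.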
- Increasing the Block size
- Proof of Stake
- Off-chain state channels
- Sharding
In this article, we will cover Sharding in detail. Let's try to understand what sharding is and how it can make transactions faster.
What is Sharding?
The term Sharding is taken from database systems. Let's first understand what sharding in database systems means.
In reference to database
Suppose you are building a website and its database holds a large amount of data. Keeping everything in a single database not only slows down accessing and searching the data, but also causes scalability problems. As a result, the site becomes slower. So what do we do in such cases?
The answer is to partition the data (table) horizontally and store it in different database servers; or in other words, Sharding of the database. The figure below shows how this can be done.
image courtesy: agildata.com
Follow the link to read more on database sharding.
The partition has to be horizontal. A vertical partition would result in an entirely new table. Look at the example below to understand this.
Image: Vertical partitioning
Image: Horizontal partitioning
So as we see from the above tables, vertical partition results in 2 additional tables whereas horizontal partition reduces the data stored in one server thus reducing the time taken to fetch the data.
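The difference between the two kinds of partitioning can be sketched with a small in-memory table. The table contents, column names and the rule of splitting by `id % shard_count` are assumptions made for illustration; production systems often hash the key or use ranges instead.

```python
users = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ben", "city": "Oslo"},
    {"id": 3, "name": "Chen", "city": "Lima"},
    {"id": 4, "name": "Dara", "city": "Cork"},
]

def vertical_partition(rows, key, left_columns, right_columns):
    """Split by columns: every row appears in both tables, with fewer fields."""
    left = [{key: row[key], **{c: row[c] for c in left_columns}} for row in rows]
    right = [{key: row[key], **{c: row[c] for c in right_columns}} for row in rows]
    return left, right

def horizontal_partition(rows, key, shard_count):
    """Split by rows: each shard holds complete rows, but only some of them."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[row[key] % shard_count].append(row)
    return shards

names, cities = vertical_partition(users, "id", ["name"], ["city"])
shards = horizontal_partition(users, "id", shard_count=2)
print(len(names), len(cities))           # 4 4
print([row["id"] for row in shards[0]])  # [2, 4]
print([row["id"] for row in shards[1]])  # [1, 3]
```

After the vertical split both tables still contain all four rows, while after the horizontal split each server only has to search half of them.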
In reference to Blockchain
Sharding is an effective way to scale up database systems, and the same method can be applied to blockchains. But before we go into the details of how it reduces transaction time, here are some important terms that you need to understand.
State: The entire set of information that describes a system at any given time. In Ethereum, this is the current set of accounts, containing information like current balances, smart contract code, and nonces.
Transaction: Any operation performed by a user which changes the state of the system.
Merkle Tree: A data structure that uses cryptographic hashes and can store a large amount of data. It makes it possible to check whether a piece of data is part of the structure with little time and effort. Read more about Concept of Merkle Tree in Ethereum.
Receipt: A side-effect of a transaction that is not stored in the state of the system, but is kept in a Merkle tree so that its existence can be easily proven to a node.
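To show why a Merkle tree makes membership checks cheap, here is a minimal sketch of a plain binary Merkle tree. It is a simplification: Ethereum itself uses Merkle Patricia tries with Keccak-256, whereas this example uses SHA-256, duplicates the last hash on odd-sized levels, and uses hypothetical transaction bytes.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    proof is a list of (sibling_hash, sibling_is_left) pairs, one per level.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # duplicate the last hash on odd levels
        sibling = index ^ 1           # neighbour in the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify(leaf, proof, root):
    """Recompute the path from the leaf to the root using only the proof."""
    current = h(leaf)
    for sibling, sibling_is_left in proof:
        current = h(sibling + current) if sibling_is_left else h(current + sibling)
    return current == root

transactions = [b"tx1", b"tx2", b"tx3", b"tx4", b"tx5"]
root, proof = merkle_root_and_proof(transactions, index=2)
print(verify(b"tx3", proof, root))   # True
print(verify(b"fake", proof, root))  # False
print(len(proof))                    # 3
```

The proof for one of five transactions contains only three hashes, so a node can check membership without holding all of the data.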
Now let us take a closer look at the sharded blockchain structure of Ethereum.
This is what a normal Ethereum or Bitcoin block looks like. It has a header and a data part consisting of the transactions. The Merkle Tree root of all the transactions is stored in the block header.
This is what a sharded blockchain would look like (Merkle Tree). We have shards which are further divided into sub-states. In layman's terms, the idea is that a group of nodes would constitute a shard and would only be able to see and verify the transactions in their own shard. These shards would then be taken up by the super nodes, which check whether each shard fulfills the acceptance criteria. If it does, they add it to a block, which is in turn added to the blockchain. Let us understand this concept in more detail, step by step.
First, we have nodes called collators in each shard. Their task is to create a collation (a specific structure that encompasses important information about the shard in question).
These collations are like mini-descriptions of the state and the transactions on a certain shard. Each has something called a collation header, which is a piece of data containing the following:
- Information about the shard to which the collation corresponds
- Information about the state of the shard before and after the transactions have been applied
- Digital signatures from at least 2/3 of the collators confirming that the collation is legitimate
Lastly, we have supernodes which collect the information about these shards and, if the collations are found valid (the validity criteria are discussed below), add them to a block which in turn is added to the blockchain.
With this structure, we have to define some rules to decide whether a collation is valid. A collation is valid if it meets the following criteria:
- all the transactions in the collation are valid
- the pre-state recorded in the collation is the same as the current state of the shard before the transactions are applied
- the state of the shard after the transactions have been applied is the same as the post-state that the collation header specifies
- the collation is signed by at least 2/3 of all collators
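The four criteria can be sketched as a single check. Everything here is a simplified assumption for illustration and not Ethereum's actual data format: the shard state is a dictionary of balances, the "state root" is a plain hash of that dictionary, transactions are (sender, receiver, amount) tuples, and signatures are represented by the names of the collators who signed.

```python
import hashlib
import json
from dataclasses import dataclass

def state_root(state):
    """Stand-in for a real state root: a hash of the sorted balances."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

@dataclass
class Collation:
    shard_id: int
    pre_state_root: str
    post_state_root: str
    transactions: list   # (sender, receiver, amount) tuples
    signers: set         # names of collators who signed (stand-in for signatures)

def apply_transactions(state, transactions):
    """Return the new state, or None if any transaction is invalid."""
    new_state = dict(state)
    for sender, receiver, amount in transactions:
        if amount <= 0 or new_state.get(sender, 0) < amount:
            return None
        new_state[sender] -= amount
        new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state

def is_valid_collation(collation, shard_state, collators):
    new_state = apply_transactions(shard_state, collation.transactions)
    if new_state is None:                                     # criterion 1
        return False
    if collation.pre_state_root != state_root(shard_state):  # criterion 2
        return False
    if collation.post_state_root != state_root(new_state):   # criterion 3
        return False
    valid_signers = collation.signers & set(collators)
    return 3 * len(valid_signers) >= 2 * len(collators)      # criterion 4

state = {"alice": 10, "bob": 0}
txs = [("alice", "bob", 4)]
after = apply_transactions(state, txs)
collators = ["c1", "c2", "c3"]
good = Collation(0, state_root(state), state_root(after), txs, {"c1", "c2"})
weak = Collation(0, state_root(state), state_root(after), txs, {"c1"})
print(is_valid_collation(good, state, collators))  # True
print(is_valid_collation(weak, state, collators))  # False
```

The second collation fails only because one signer out of three is below the 2/3 threshold.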
Operations on a regular blockchain (without sharding) involve one level of interaction, which limits how far the process can scale. Ethereum therefore proposes changing this into a 2-level interaction scheme.
The first level
The first level of interaction is the transaction group. As you can see in the image below, each shard has its own group of transactions.
Image source: Hackernoon
This is the transaction group of a single shard, divided into 2 parts: the Transaction group header and the Transaction group body.
The header is further divided into left and right parts, where
the left part contains:
- Shard ID: The ID of the shard to which the Transaction group belongs
- Pre-state root: The state root of the shard before applying the transactions
- Post-state root: The state root of the shard after applying the transactions
- Receipt root: The receipt root of the shard after all transactions are applied
The right part contains the validators, chosen randomly, who need to verify the transactions in the shard itself.
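As a rough sketch, the header described above could be modelled like this. The field names follow the list above, but the root values, the validator pool and the seed are hypothetical, and Python's `random` module is used only to illustrate the idea of random selection; a real protocol would need a source of randomness that participants cannot manipulate.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class TransactionGroupHeader:
    shard_id: int
    pre_state_root: str    # state root of the shard before the transactions
    post_state_root: str   # state root of the shard after the transactions
    receipt_root: str      # root of the receipts produced by the transactions
    validators: tuple      # randomly chosen validators for this group

def choose_validators(validator_pool, count, seed):
    """Pick `count` distinct validators; the same seed gives the same choice."""
    rng = random.Random(seed)
    return tuple(rng.sample(validator_pool, count))

pool = ["v1", "v2", "v3", "v4", "v5", "v6"]
header = TransactionGroupHeader(
    shard_id=3,
    pre_state_root="0xaaa",    # placeholder values, not real roots
    post_state_root="0xbbb",
    receipt_root="0xccc",
    validators=choose_validators(pool, count=3, seed="block-42"),
)
print(header.shard_id, len(header.validators))  # 3 3
```

Seeding the selection makes it reproducible, so every node that knows the seed can work out which validators were chosen.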
Now, let's take a look at the second level
image source: Hackernoon
As you can see in the picture, there is a normal blockchain, but now each block contains 2 roots:
- The state root
- The transaction group root
The state root represents the entire state, and as we have seen before, the state is broken down into shards, which contain their own substates.
The transaction group root contains all the transaction groups inside that particular block.
With this we have seen almost every aspect of sharding. However, there is still one question remaining. If the shard groups can only transact among themselves, is cross-shard communication possible, and if so, how? One thing you need to understand here is that the purpose of sharding is to enable many parallel transactions to increase performance. If arbitrary cross-shard communication were allowed, sharding would lose its purpose. To tackle this, Ethereum follows the receipt paradigm for cross-shard communication.
image source: Hackernoon
As the image shows, any transaction receipt can be accessed via multiple Merkle Trees from the transaction group Merkle root. Every transaction in a shard will do two things:
- Change the state of the shard it belongs to
- Generate a receipt
The receipts are stored in a distributed shared memory, which can be seen by other shards but not modified. The image below shows how cross-shard communication happens.
image source: Hackernoon
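The receipt paradigm can be sketched with two toy shards. This is an illustrative model under stated assumptions, not Ethereum's implementation: the shared receipt storage is a plain Python list that is only ever appended to, and each shard records in its own state which receipts it has already claimed, since receipts themselves cannot be modified.

```python
class Shard:
    def __init__(self, shard_id, balances):
        self.shard_id = shard_id
        self.balances = dict(balances)
        self.claimed = set()   # receipt IDs this shard has already used

def send_cross_shard(source, sender, target_shard_id, receiver, amount, receipts):
    """Change the source shard's state and generate a receipt."""
    if amount <= 0 or source.balances.get(sender, 0) < amount:
        raise ValueError("invalid transfer")
    source.balances[sender] -= amount
    receipt = {
        "id": len(receipts),
        "from_shard": source.shard_id,
        "to_shard": target_shard_id,
        "receiver": receiver,
        "amount": amount,
    }
    receipts.append(receipt)   # receipts are appended, never edited
    return receipt["id"]

def claim_receipt(target, receipt_id, receipts):
    """Read a receipt from the shared storage and credit the receiver once."""
    receipt = receipts[receipt_id]
    if receipt["to_shard"] != target.shard_id:
        raise ValueError("receipt is addressed to another shard")
    if receipt_id in target.claimed:
        raise ValueError("receipt already claimed")
    target.claimed.add(receipt_id)
    receiver = receipt["receiver"]
    target.balances[receiver] = target.balances.get(receiver, 0) + receipt["amount"]

receipts = []   # stand-in for the shared receipt storage
shard_a = Shard(0, {"alice": 10})
shard_b = Shard(1, {"bob": 0})
rid = send_cross_shard(shard_a, "alice", 1, "bob", 4, receipts)
claim_receipt(shard_b, rid, receipts)
print(shard_a.balances, shard_b.balances)  # {'alice': 6} {'bob': 4}
```

Each shard only ever changes its own state; the receipt is the only thing that crosses the shard boundary.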
Challenges Of Implementation
There must be a mechanism to know which node is assigned to which shard. This needs to be done in a secure way to ensure both parallelization and security.
Since a blockchain is a trustless system, once sharding is implemented, each node will have to come up with some proof that it has finished its part of the work.
Disclaimer: This is not investment advice and should NOT be viewed as a project endorsement by EtherWorld. Readers are advised to do their own research before investing in any project.