Overview of blockchain and bitcoin
Disclaimer
Articles in this series use content from many resources on the internet, whose origins are hard to trace. The majority of the ideas come from secondary research.
I also do not provide any financial advice, such as which cryptocurrency to invest in, when to buy or sell, or price predictions. I am only looking at this from a technology and application viewpoint.
First, let’s start with what blockchain and bitcoin are.
What is it
Blockchain is a technology. It was proposed by an anonymous person or group using the name Satoshi Nakamoto.
Bitcoin is an application of blockchain, among many others.
Bitcoin’s purpose is to improve the current payment system. It is decentralized, persistent, anonymous, auditable…
When using bitcoin, you do not need an intermediary organization to validate your account and transfer money. Everything is done inside the system itself. The bitcoin system is run and operated by its own participants. Everything is ready to work after you join.
To achieve these goals, bitcoin employs several techniques.
Architecture
At the bottom of the architecture is the blockchain itself. It acts as a data structure and works as a database. This database is stored on every full node in the bitcoin system. On top of the blockchain is the bitcoin protocol, implemented by the Bitcoin Core project. With that protocol, coins (bitcoins, specifically) are created and transferred.
Difference between token and coin:
Tokens are used to bind the physical world to the digital world. A token can represent money (in the case of a coin), a company’s share, a physical product, a song, an idea… anything.
A coin (of any kind: bitcoin, ether, litecoin…) is a special token that is used like money.
On top of bitcoin are applications like wallets, exchange platforms, merchants accepting bitcoin… We usually interact with applications at this level and not with the lower layers.
There are many other coin systems (altcoins), which have an architecture similar to bitcoin’s.
Ethereum is another story, with a different philosophy, design and algorithm. Ethereum is a platform; Ether is the cryptocurrency built on that platform. It has many useful applications, which I will research later.
That is the overall architecture. Let’s go to the detailed design of bitcoin.
Design
Coin
Any digital currency must solve the double spending problem.
Double spending is when a unit of digital currency is replicated and spent more than once.
This problem becomes harder in a decentralized environment, without any central authority. To avoid it, every smallest piece of a coin is traced from its creation, through every transaction, to its current status. Every coin is authenticated before it is spent.
Within this flow, a coin has to be an output of one transaction before it can become an input of another. The total value of a transaction’s outputs is always less than or equal to the total value of its inputs. Coins can only be transferred from transaction to transaction; they cannot be validly replicated. The origin of every coin is traceable. Mining is the only way to create coins from nothing.
If several transactions try to spend the same output, only the first one to be confirmed in a block takes effect; the others are discarded.
A transaction has many outputs when someone transfers money to more than one address.
A transaction has many inputs when no single earlier output covers the required amount, so the amount has to be gathered from several earlier outputs that the sender can spend.
An unspent transaction output (UTXO) is an output amount that has not been spent in any transaction. In other words, that amount has not been used as an input of any transaction.
The transaction fee is the difference between total input and total output. It is paid to miners for validating the transaction.
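To make the input/output bookkeeping concrete, here is a minimal sketch of a UTXO set. It is a toy model, not bitcoin's real data structures: the names OutPoint, UtxoSet, add_coinbase and apply are my own, and signatures are left out entirely. It only shows the three rules described above: an output can be spent once, outputs cannot exceed inputs, and the difference is the fee.

```python
from dataclasses import dataclass

COIN = 100_000_000  # satoshis per bitcoin


@dataclass(frozen=True)
class OutPoint:
    txid: str   # id of the transaction that created the output
    index: int  # position of the output inside that transaction


class UtxoSet:
    """Toy set of unspent outputs: OutPoint -> amount in satoshis."""

    def __init__(self):
        self.unspent = {}

    def add_coinbase(self, txid, amounts):
        # Mining is the only way to create outputs without spending inputs.
        for i, amount in enumerate(amounts):
            self.unspent[OutPoint(txid, i)] = amount

    def apply(self, txid, inputs, amounts):
        """Spend `inputs`, create new outputs, and return the fee."""
        if len(set(inputs)) != len(inputs):
            raise ValueError("same output listed twice")
        for op in inputs:
            if op not in self.unspent:
                raise ValueError(f"double spend or unknown output: {op}")
        total_in = sum(self.unspent[op] for op in inputs)
        total_out = sum(amounts)
        if total_out > total_in:
            raise ValueError("outputs exceed inputs")
        for op in inputs:
            del self.unspent[op]
        for i, amount in enumerate(amounts):
            self.unspent[OutPoint(txid, i)] = amount
        return total_in - total_out  # the difference goes to the miner


utxos = UtxoSet()
utxos.add_coinbase("cb0", [50 * COIN])
# Pay 30 BTC to someone, send the change back, leave 10,000 satoshis as fee.
fee = utxos.apply("tx1", [OutPoint("cb0", 0)], [30 * COIN, 20 * COIN - 10_000])
```

Trying to spend OutPoint("cb0", 0) a second time raises ValueError, which is the double spending check in miniature.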
Transaction
Every transaction includes sender address, receiver address, bitcoin amount (input-output), signature of the sender, and a ton of other things. Transactions are chained together by the input-output mechanism: the output of a previous transaction is the input of the next one. If someone modifies the amount of a transaction, he/she must modify all transactions subsequent to that one, every one of them, and must somehow make the whole network accept that modification. This is practically impossible.
Every address (an address is derived from a public key) has a private key that proves ownership of that address. The sender uses that private key to authorize a transaction by signing the information attached to it; the private key itself is never revealed. If the owner loses that private key, everything stored at that address is lost forever and cannot be refunded; if the key is stolen, the thief can spend it.
The private key signs not only addresses (public keys) and the transfer amount but also other information related to the transaction: previous transaction id, output index number, amount… If someone modifies a transaction, he/she not only needs to modify all subsequent transactions, but also needs the private keys of everybody who received that coin. Getting the private keys of that many people, who are anonymous, is extremely hard, needs huge effort, and is practically impossible.
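A small sketch can show why changing one transaction breaks everything after it. It assumes a made-up JSON serialization (real bitcoin uses a binary format) and leaves out signatures, since Python's standard library has no ECDSA. The helper names txid and link_is_valid are mine. The only real ingredient is the double SHA-256 hash used for transaction ids.

```python
import hashlib
import json


def txid(tx):
    """Double SHA-256 of a toy JSON serialization (not the real wire format)."""
    raw = json.dumps(tx, sort_keys=True).encode()
    return hashlib.sha256(hashlib.sha256(raw).digest()).hexdigest()


def link_is_valid(prev_tx, next_tx):
    # The next transaction refers to the previous one by its id.
    return next_tx["inputs"][0]["prev_txid"] == txid(prev_tx)


tx_a = {"inputs": [], "outputs": [{"to": "alice", "amount": 50}]}
tx_b = {
    "inputs": [{"prev_txid": txid(tx_a), "index": 0}],
    "outputs": [{"to": "bob", "amount": 50}],
}

ok_before = link_is_valid(tx_a, tx_b)
tx_a["outputs"][0]["amount"] = 5000  # tamper with the earlier transaction
ok_after = link_is_valid(tx_a, tx_b)
```

After the tampering, tx_a has a different id, so tx_b no longer points at it. The attacker would have to rewrite tx_b too, and then everything that spends tx_b, and re-sign each one.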
The coinbase transaction is the first transaction in every block (transaction 0). It is the only place where bitcoins are created from nothing, as a reward for validating transactions, putting them into blocks, and solving hard hashing problems. The coinbase transaction also collects the fees of all transactions included in the block.
The reward started at 50 bitcoins per block in 2009 and is halved about every 4 years. This limits the total number of bitcoins to 21 million.
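The 21 million limit can be checked with a few lines of arithmetic. This sketch assumes the reward is halved every 210,000 blocks (which is about 4 years at 10 minutes per block) and that amounts are counted in whole satoshis, so the halving rounds down and the reward eventually reaches zero. The function name is my own.

```python
COIN = 100_000_000          # satoshis per bitcoin
HALVING_INTERVAL = 210_000  # blocks between halvings (about 4 years)


def total_supply_satoshis():
    total = 0
    subsidy = 50 * COIN
    while subsidy > 0:
        total += HALVING_INTERVAL * subsidy
        subsidy //= 2       # integer halving, so the reward eventually hits zero
    return total


supply_btc = total_supply_satoshis() / COIN
```

The sum is a geometric series, 210,000 * 50 * (1 + 1/2 + 1/4 + …), which approaches 21 million from below.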
Block
A block is where transactions are grouped, sealed and stored. Transactions are grouped together by a mechanism called a Merkle tree. The computer hashes all transactions in a block, pair by pair, to find the Merkle root. The Merkle root hash is stored in the block header.
If a transaction hash cannot find another one to form a pair, it is hashed with itself.
The Merkle root is the final hash, after hashing all transactions in a block.
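Here is a minimal sketch of how a Merkle root can be computed, following the pairing rule described above. It uses double SHA-256 like bitcoin does, but the "transactions" are just placeholder strings, and the function names dsha256 and merkle_root are mine.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes):
    if not tx_hashes:
        raise ValueError("a block needs at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        # Hash each neighbouring pair to build the next level up.
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


txs = [dsha256(name.encode()) for name in ("tx0", "tx1", "tx2")]
root = merkle_root(txs)
tampered_root = merkle_root([txs[0], dsha256(b"evil"), txs[2]])
```

Changing any single transaction gives a different root, which is why the root in the block header is enough to seal the whole list.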
To save storage, some nodes may not store all transactions, but keep only intermediate hash results. Reducing storage by skipping transactions like this decreases the security of the whole system, because these lightweight nodes cannot verify transactions themselves and are therefore more vulnerable to attacks.
If any transaction in a block is altered, the Merkle root must be updated. If the Merkle root of a block is updated, that block’s header data changes. Hence the block must be re-hashed, through a process we call proof-of-work.
Proof-of-work
Once all the required data is in the block header, the block is ready for calculating the block hash.
Calculating a block hash from block header data is super easy; a regular computer can finish that job in far less than one second. But what if everybody could easily rewrite the blockchain, modify some blocks, insert invalid transactions, then say that the modified version is the valid one? The process of producing an acceptable block hash must be made harder, costing a huge amount of compute power, which makes a brute force attack longer and more costly.
The method used is a target threshold. The calculated block hash must be below a pre-defined threshold. This makes a valid block hash start with some zeros, which you can see if you look at any block in the bitcoin blockchain.
The hashing process is one-way. Nobody can recover the original data from the hashed result. The only way to check whether a piece of data hashes to a specific value is to hash it. This is brute force. Because the data in the block header is already fixed, the way to reach a hash below the target threshold is to add an extra piece of data that can be varied freely. That piece of data is called the nonce.
When the nonce changes, the block header data changes and so does the hash result. The process is: add a nonce to the block data -> hash the block -> check if the hash result is below the target threshold. If so, notify the network that a new block has been found and built. Otherwise, try a new nonce. Repeat until finding a valid hash result, or until another computer announces that it found one before you.
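The nonce loop above can be sketched in a few lines. This is a toy, not bitcoin's real header format: the header is an arbitrary byte string, the hash is read as a big-endian number for simplicity, and TOY_TARGET is an easy threshold I picked so the search finishes quickly on a laptop. The names block_hash and mine are my own.

```python
import hashlib


def block_hash(header: bytes, nonce: int) -> int:
    data = header + nonce.to_bytes(4, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")


def mine(header: bytes, target: int, max_nonce: int = 2**32):
    for nonce in range(max_nonce):
        if block_hash(header, nonce) < target:
            return nonce
    return None  # nonce space exhausted; a real miner would change other header data


TOY_TARGET = 1 << (256 - 16)  # hash must start with 16 zero bits
nonce = mine(b"toy block header", TOY_TARGET)
```

On average this needs about 65,000 tries. Every extra zero bit in the target doubles the work, while checking a found nonce still takes a single hash.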
Block generation is controlled to take about 10 minutes per block. But sometimes the computers in the network do it faster or slower. After every 2016 blocks (about 2 weeks), the difficulty (target threshold) is adjusted: up to 300% harder if the last 2016 blocks were generated in less than 2 weeks, or up to 75% easier if they took longer than 2 weeks. This keeps proof-of-work hard enough regardless of total compute power, so that investing in more power does not make rewriting the chain cheap.
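The adjustment rule can be written as a small function. This is a simplified sketch: it scales the target by the ratio of actual time to expected time and limits one adjustment to a factor of 4 in either direction, which matches the "300% harder / 75% easier" bounds above. Real bitcoin also caps the target at a maximum value, which I leave out. The name retarget is mine. Remember that a lower target means a harder puzzle.

```python
TARGET_TIMESPAN = 14 * 24 * 60 * 60  # two weeks in seconds (2016 blocks * 10 minutes)


def retarget(old_target: int, actual_timespan: int) -> int:
    # Clamp so that one adjustment changes the target by at most a factor of 4.
    clamped = max(TARGET_TIMESPAN // 4, min(actual_timespan, TARGET_TIMESPAN * 4))
    return old_target * clamped // TARGET_TIMESPAN


same = retarget(1000, TARGET_TIMESPAN)          # blocks arrived on schedule
harder = retarget(1000, TARGET_TIMESPAN // 2)   # blocks arrived twice as fast
```

If the last 2016 blocks took only one week, the target is halved, so finding a block becomes twice as hard.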
Chain
After the block hash is calculated as proof-of-work, blocks are chained together. Every block (except the first one) stores the previous block’s hash in its header. If any transaction or block is changed, every subsequent block must be changed, too. So the whole costly proof-of-work hashing process of every dependent block must be redone. This mechanism strengthens the blockchain and reinforces the confirmation of transactions. If someone wants to change a transaction, that person must rewrite the blockchain from the block holding that transaction, and must do all the hashing faster than the whole honest network combined, to form a longer chain. This is considered extremely hard if the attacker does not have the majority (51%) of compute power.
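The chaining itself is simple to sketch. In this toy version a block is just a dict with the previous block's hash and a text payload, and there is no proof-of-work; the names header_hash, build_chain and first_broken_link are my own. It only shows how a change in one block is detected by the block after it.

```python
import hashlib

GENESIS_PREV = "0" * 64  # the first block has no predecessor


def header_hash(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def build_chain(payloads):
    chain, prev = [], GENESIS_PREV
    for payload in payloads:
        chain.append({"prev": prev, "payload": payload})
        prev = header_hash(prev, payload)
    return chain


def first_broken_link(chain):
    """Index of the first block whose stored `prev` no longer matches, or None."""
    prev = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["prev"] != prev:
            return i
        prev = header_hash(block["prev"], block["payload"])
    return None


chain = build_chain(["a pays b", "b pays c", "c pays d"])
intact = first_broken_link(chain)
chain[0]["payload"] = "a pays mallory"  # tamper with the oldest block
broken_at = first_broken_link(chain)
```

Tampering with block 0 is noticed at block 1. To hide it, the attacker would have to recompute block 1, then block 2, and so on, and in real bitcoin each of those recomputations costs a full proof-of-work.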
The number of confirmations is the number of blocks added after a transaction. The more confirmations, the more secure that transaction is. For small transactions like buying coffee, a merchant may require only 1 confirmation after the bitcoin is transferred, to keep it fast and convenient. But for larger transactions like buying a phone, a merchant may ask for 6 confirmations. The more confirmations, the harder it is to rewrite the blockchain and undo the bitcoin transfer.
The blockchain will collapse if a person or group controls 51% of the compute power of the whole system. With that compute power, that person/group can rewrite blockchain data and insert transactions that benefit themselves, damaging everyone else.
Consensus rule
What if many chains coexist at one time? (This happens regularly.)
Consensus rule: the longest chain (the one with the most proof-of-work) is the honest one.
If someone wants to cheat the system, that person must produce a valid (proof-of-work) chain longer than the chain that all honest people are creating, and do so faster. To do that, the person must control 51% of the computing power. This is called a 51% attack. Currently, no single computer system (including those of governments) is known to have that much computing power.
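To get a feeling for why more confirmations help and why 51% is the magic number, here is a sketch of the simplified gambler's-ruin estimate used in the original bitcoin paper: if the attacker has a share q of the hash power and the honest network has p = 1 - q, the chance of ever catching up from z blocks behind is (q/p)^z when q < p, and 1 otherwise. The function name is mine, and the model ignores network delays and other real-world details.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Chance that an attacker with hash-power share q ever overtakes a lead of z blocks."""
    if not 0 <= q <= 1:
        raise ValueError("q must be a fraction between 0 and 1")
    p = 1 - q
    if q >= p:
        return 1.0  # with half or more of the hash power, success is certain
    return (q / p) ** z


one_block = catch_up_probability(0.10, 1)
six_blocks = catch_up_probability(0.10, 6)
majority = catch_up_probability(0.51, 6)
```

With 10% of the hash power the chance drops from about 1 in 9 for one block to a tiny fraction for six blocks. With a majority it is always 1, no matter how many confirmations the merchant waits for.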
But we also need to know that the 5 largest mining organizations own more than 75% of total mining capacity, and 58% of hashing power comes from China. Therefore, bitcoin mining pools are monitored closely by the community.
The bitcoin system is designed so that honestly generating blocks is more profitable than cheating. Generating blocks earns bitcoin rewards, which is better than using compute power to start a race against all honest computers. But in the future, when the bitcoin reward decreases (it is halved every 4 years), or the bitcoin to USD price decreases, while the number of miners is huge, nobody can guarantee that honesty will win.
Because of those design principles, the growth in the number of transactions, and the booming bitcoin price, the bitcoin system has revealed a lot of limitations.
Limitations
1. Time
Bitcoin block size is limited to 1MB. This rule limits the number of transactions that can be stored in each block. Blocks are added to the blockchain about every 10 minutes. So only about 7 transactions can be added to the bitcoin blockchain every second.
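The 7 transactions per second figure follows from simple arithmetic. The block size and interval come from the text above; the average transaction size of 250 bytes is my own assumption for illustration, and the real average varies.

```python
BLOCK_SIZE_BYTES = 1_000_000
BLOCK_INTERVAL_SECONDS = 600
AVG_TX_SIZE_BYTES = 250  # assumption for illustration, not a measured figure

tx_per_block = BLOCK_SIZE_BYTES // AVG_TX_SIZE_BYTES
tx_per_second = tx_per_block / BLOCK_INTERVAL_SECONDS
```

With these numbers a block holds about 4,000 transactions, which gives a little under 7 per second. Bigger transactions mean fewer per block.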
But that’s not the whole story about speed. Some transactions, like transferring bitcoin to buy a phone, require 6 confirmations to be considered valid. A 6-confirmation requirement means that after the transaction is added to a block, confirmed, and broadcast to the network, at least 6 subsequent blocks must be added to the blockchain after that block. This requirement reduces the chance that an attacker can change the transaction after receiving the product. The more confirmations, the more secure (harder to alter) the transaction is. Now maybe you see the problem. To get that kind of transaction confirmed in a trustworthy way, the buyer must wait at least 60 minutes to buy a phone. And that is the best case. When the bitcoin price is rising, if the fee the buyer attached to the transaction is not attractive enough, miners will not add that transaction to a block soon. Then waiting days for the first confirmation is common. Transfer money, then pay a high fee or wait for some days? Which do you choose? That may not be convenient enough for any financial system.
There were some attempts to increase the block size. The most famous one is Bitcoin Cash, which increased the block size from 1MB to 8MB. That was a hard fork. But increasing the block size also increases the delay in block broadcasting. That’s why the bitcoin developer team did not agree with the 8MB proposal. Increasing the block size can improve transaction throughput (more transactions stored in each block), but it slows down block broadcasting, leading to multiple branches existing in the blockchain more often. These then take time to resolve and waste the blocks on the shorter chains. Increasing the block size is a tradeoff.
A hard fork is a change to bitcoin’s software rules that is not backward compatible, so nodes that do not upgrade end up on a separate chain.
Decreasing the difficulty of the hashing puzzle may increase hashing (in other words: mining, validation, confirmation) speed, but it will decrease the system’s security.
Now that the bitcoin price has increased (20 times since the beginning of the year), 1MB every 10 minutes is not enough to store the huge number of transactions. That’s why bitcoin transfers take so much time and are expensive.
2. Cost
For those 7 transactions/second, bitcoin is already estimated to use 35 times as much energy as Visa. If you brought bitcoin’s transaction volume up to Visa’s, it would use as much electricity as the rest of the world put together. Because the bitcoin system consumes tremendous compute power, billions of dollars of electricity, network bandwidth and storage capacity, it is not as cheap as planned. Miners want to recover their investment and make a profit, in the form of transaction fees (while the block reward is halved every 4 years and is now only 1/4 of what it was in the first days). So mining investors are always attracted to high-fee transactions.
Let’s look back to the early days, when a transaction cost only a small portion, some percent of a bitcoin. But recently, that “small portion” has been worth about $20 to $50, which is not cheap anymore. If I want to transfer a small amount of bitcoin, worth about $2, to my friend, and do not accept paying tens of dollars in fees, my transaction waits for days in a prioritized-by-fee queue before it is confirmed. My friend has no way to receive that money until the transaction is confirmed, perhaps 2 days later.
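The prioritized-by-fee queue can be illustrated with a small sketch. It assumes miners simply pick the transactions with the highest fee per byte until the block is full, which is a simplification of what real mining software does. The function name, the transaction names and the numbers are all made up for the example.

```python
import heapq


def select_for_block(mempool, max_block_bytes):
    """Greedy pick by fee per byte; mempool items are (txid, fee_satoshis, size_bytes)."""
    # heapq is a min-heap, so the fee rate is negated to pop the best rate first.
    heap = [(-fee / size, txid, fee, size) for txid, fee, size in mempool]
    heapq.heapify(heap)
    chosen, used = [], 0
    while heap:
        _, txid, fee, size = heapq.heappop(heap)
        if used + size <= max_block_bytes:
            chosen.append(txid)
            used += size
    return chosen


mempool = [("coffee", 500, 250), ("phone", 50_000, 250), ("rent", 20_000, 500)]
picked = select_for_block(mempool, max_block_bytes=750)
```

The low-fee "coffee" transaction is left out of this block and has to wait for a later one, which is exactly what happens to my $2 transfer.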
More people are joining, and more transactions are generated. Along with the increase in bitcoin price, transaction processing time and cost are increasing too. This makes it harder for people to do normal financial activities. This is a paradox.
3. Privacy
Bitcoin originally promoted privacy, since it is not possible to know who is behind an address just by reading it. But the blockchain system is not transactionally private. The values of all transactions and balances are publicly visible. Given an address, anyone can get all transactions in and out of that address from the beginning of time, and that address’s balance. If I transfer bitcoin to a friend, I know his bitcoin address, so I can see all his transactions and his balance. The solution to this problem is using one-off addresses. But it is not easy for normal people to use such an annoying method, because in the end all your coins have to be transferred to just a few addresses. It does not make sense if today I have 1000 addresses and next year I have 5000.
Another privacy leak is that a user’s transactions can be linked to reveal the user’s information. Your financial status can be linked to your identity. If you want to hide, you should hide your payments and income as well. But bitcoin is not transactionally private. So you should keep your address private (!), or one-off.
Bitcoin employs a peer-to-peer network on top of the internet. Every communication in the bitcoin system goes through the internet, which uses IP addresses. And IP addresses are linked to real-world addresses by the Internet Service Provider. Knowing an IP, anybody can easily look up its approximate location. From a real-world address to a personal identity is not a long way. So bitcoin users need to hide their IP. How? By using a Virtual Private Network, or Tor. I am thinking about teaching my non-tech girlfriend to use a VPN…
But in the real world, many people make transactions with the same address frequently. I know a lot of my friends, who are software engineers, use only 1 or 2 bitcoin addresses for trading online. After sending them some coins, I can query the transaction history of their address, and I know how much money they have. Asking people to use multiple bitcoin addresses is just like convincing them to change bank accounts after each activity. That does not make sense at all.
4. Security
Everybody must keep their private key(s) secret, as secure as an ATM card PIN. If they lose those keys to bad people, they will lose everything protected by them. People are relying on a single point of protection, the private key, rather than a more sophisticated mechanism.
If people lose their house’s door key, valuable property in their house may disappear. But they can get money back through insurance. If people want to transfer a big amount of money to someone, they have to sign some pieces of paper. If someone steals your money, they can be caught by the police. These mechanisms exist to keep people safe. They do not exist in the bitcoin world. You have to protect yourself, with all risks acknowledged.
A bitcoin private key is hard or impossible to remember. It is just a sequence of meaningless characters like: e9873d79c6d87dc0fb6a… Can you remember it? Mini private keys exist, but they do not help much. Besides being impossible to remember, another problem with private keys is how to keep them secret yet findable when needed. There are several ways: online, offline, printed copy… Online: you must trust the online service. Offline: save it on a hard drive, but you must keep that computer virus-free and protect its login password. Printed copy: keep it more secret than your house key. Hence, bitcoin keys are harder to keep than physical keys. There are people who printed their private key on paper, cut that paper into pieces and kept the pieces in multiple bank vaults.
What if you have many one-off addresses (along with many private keys), one for each transaction? How do you manage these addresses and private keys, keeping them all secret? This is where wallet software comes in. But then you have to trust the software with all your bitcoins. And not only the software: what about malware, viruses, trojans, hardware failure… There were people whose bitcoin accounts were drained because their email had been hacked and their password was stolen. They were stunned to have no recourse.
If everybody has to do all of the above to protect their anonymity and security, the usability of bitcoin decreases severely.
5. Usability
Bitcoin is easy for people to use, but very hard for them to use safely. The list of best practices is long. Researchers are working on finding new ways to attack and protect blockchain while keeping it convenient. In the meantime, people trade bitcoins and try to keep a balance between usability and other benefits.
6. Storage
In the early days, bitcoin was expected to use only spare storage, which is very cheap. But in recent years, while the bitcoin price has been booming, people have been putting their money into bitcoin. Transactions occur every second. Storage is becoming big and expensive. Moore’s law has not kept pace with bitcoin’s growth. Not only that, in the bitcoin world all miners and other full nodes must download the whole chain, which is now 150GB. And it is growing.
With 1MB per block every 10 minutes, the bitcoin blockchain grows by 52.5 GB a year. Assume that this yearly volume increases 25% annually: after 10 years the yearly growth would be about 52.5 * 1.25^10 ~ 490 GB, and the chain would have gained roughly 1.7TB over that decade.
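This kind of growth estimate is easy to check with a short script. The sketch below takes one reading of the assumption: the first year adds 1MB per block at one block every 10 minutes, and each following year adds 25% more than the year before. The function name added_gb is mine. Under this reading, about 1.7TB of new data is added over ten years, on top of the existing chain.

```python
MB_PER_BLOCK = 1
BLOCKS_PER_YEAR = 6 * 24 * 365  # one block every 10 minutes
base_gb_per_year = MB_PER_BLOCK * BLOCKS_PER_YEAR / 1000


def added_gb(years: int, annual_growth: float = 0.25) -> float:
    """Total data added over `years`, with yearly volume growing by `annual_growth`."""
    return sum(base_gb_per_year * (1 + annual_growth) ** k for k in range(years))


ten_year_total_gb = added_gb(10)
rate_after_ten_years_gb = base_gb_per_year * 1.25 ** 10
```

Other readings of "25% annually" (for example, growth of the whole chain rather than of the yearly volume) give different totals, so treat the result as an order of magnitude.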
Before 2014, the bitcoin blockchain was under 20GB, which is small, nearly free, and could be stored on any personal computer. But nowadays, it is big enough to need careful consideration. The total storage used by the bitcoin system is many times bigger still, because as more people join, the chain is copied everywhere in the world.
People can use a lightweight node to save disk storage, but that decreases the security of the system as a whole and makes it more vulnerable to attacks. People are encouraged to run a full node, with the full blockchain downloaded, to support the decentralized system’s security. So, bitcoin is not cheap any more.
7. Scalability
Combining the other problems of bitcoin, we get this big one. Storage keeps growing without bound, and people wait days for transaction confirmation if they don’t want to put tens of dollars into the transaction fee… These roadblocks are slowing bitcoin down on its way to being widely accepted.
There have been some proposals. One is increasing the block size from 1MB to 8MB (as mentioned above). Another is periodically removing old transactions and storing only the balances of non-empty addresses in a distributed database. A third is sharding the validation process: not all nodes validate all transactions; instead each transaction is assigned to a subset of nodes. This method sacrifices security for the benefit of scalability. With sharding, bitcoin is more vulnerable to a 51% attack, because the attacker now only needs to control half of a subset instead of half of the whole network.
More proposals are being researched and applied to improve bitcoin’s security and scalability. If you have an idea, you can submit it as a Bitcoin Improvement Proposal. If a proposal is not approved by the bitcoin development team, it can be implemented by a separate team, which forms a hard fork and creates a new coin “branch”: Bitcoin Cash, Bitcoin Gold, Super Bitcoin… Or that separate team gives birth to an entirely new cryptocurrency. There are thousands of them: see Cryptocurrency Market Capitalizations.
8. Vulnerability
Bitcoin is vulnerable to news. When good or bad news appears, especially from governments, the bitcoin price goes up or down very quickly. If a government bans bitcoin usage, the price falls. If governments take blockchain technology into consideration, it rises. The bitcoin to USD price is dominated by news, not by ordinary supply and demand. And because bitcoin does not represent any physical or real asset, and exists only on the internet and in people’s minds, it is vulnerable.
When the price decreases, the drop can be tremendous, as happened when the Mt. Gox exchange went down.
The bitcoin system as a whole is also vulnerable to security attacks. If an attack succeeds, nothing is returned. Everything goes somewhere, but nobody knows where.
Those limitations must be fixed if bitcoin wants to compete with the current financial system and other cryptocurrencies, and move to the next milestone.
Threats
The bitcoin software is not stable yet; its version is now just 0.15.1. The first stable version usually starts at 1. Bitcoin developers say that “Bitcoin is an experimental digital currency…” The bitcoin price is going up nowadays, and people are putting their money into it. They should know that in case of a security leak or technological failure, their assets may disappear.
To be able to trade on a cryptocurrency exchange platform, people must hand their bitcoins over to it, immediately giving up ownership. They hold only a non-contractual IOU (I Owe You). If something unexpected happens, they lose everything they deposited. An IOU does not carry any legal obligation. When Mt. Gox was hacked, traders lost $400 million in bitcoin (worth several billion now). Mt. Gox simply went bankrupt and closed. Nobody could get their money back. This year, a similar situation happened to Youbit, a South Korean cryptocurrency exchange. It lost 17% of its total assets after being hacked, and could not get any of it returned.
Although the blockchain itself has never been hacked successfully, applications and services around it have. Someone may say that blockchain is secure. It is. But attackers usually aim at people and applications, because doing so is easier and cheaper than attacking the blockchain. So you must trust the services you are using. Choose them very carefully.
Bitcoin wealth distribution
A big minus point for the bitcoin community is its wealth distribution. A small number of holders own the majority of the market. This is because large mining rewards were paid out in the early days to a few people. At that time, constructing a block earned a 50BTC reward, and very few miners had joined the game. Those early miners now have assets worth billions of dollars.
Cryptocurrency wealth distribution is not transparent. The identities of owners behind transactions are hidden from the public. That is not good for small traders. When whales do something, everybody is affected. When these whales cooperate, they can drive the direction of the market without restrictions.
Whales are holders of large amounts of bitcoin.
Real life has a similar wealth distribution model but with more control and restrictions. Big shareholders and managers of a company must publicly disclose their buying/selling deals. In the stock market, if someone uses inside information for trading, or uses many accounts (of one or many people) to manipulate a stock price, that person can be arrested. These activities are illegal in the real world. But on cryptocurrency exchanges, these illegal-in-real-world methods are not policed carefully; they still work, and there is nothing out there to protect small traders. People must be aware and protect themselves.
Trade at your own risk.
Protect yourself
The cryptocurrency world is unsafe, but there are several methods that people can use to protect their money. These methods are not easy to adopt, but they are helpful at the moment. Some of them are:
- Choose your service provider carefully. You totally depend on them. All your assets are in their wallet. If they are hacked, or they intentionally steal your money, or there is just an accident, you have no way to get your money back. An apology from the service provider is all you get. The same applies to applications and wallet software.
- Keep only a small portion of your assets on a computer/mobile for daily use. If that computer is hacked, or the mobile phone is lost, the remaining money is still safe elsewhere.
- Back up your wallet regularly, and encrypt it. If you lose the computer or the hard drive fails, the data is still recoverable. There was a person who threw a hard drive, which stored a hundred million dollars’ worth of bitcoin, in the garbage. He is trying to get government permission to search for that drive in the city landfill.
- Remember to encrypt. Without encryption, anyone who has the data can get the money. Make sure you can still recall your strong, hard-to-remember passwords, even after years without using them. If you lose that password, you will lose everything. Some people print the password, split it, and keep the pieces in vaults.
- Use cold storage. It is a computer that has been reset to the manufacturer’s configuration, is always disconnected from the network, and stores only sensitive information. You can use it like this: create a transaction on an online computer, copy that transaction to a USB drive, plug the USB drive into the cold storage computer, sign the transaction, plug the USB drive back into the online computer, and broadcast the transaction.
- Use a hardware wallet. These devices have only one function: signing transactions. They are safe from viruses because no software can be installed on them. But you must keep them as carefully as physical keys.
- Have a backup plan for your family, in case of death or disability. In the real world, the law can help. But in the cryptocurrency world, without the private key, your relatives cannot do anything. Your assets will stay there forever, but nobody will be able to access them.
- Keep your software up to date. Newer versions fix security problems. The bitcoin environment is not stable, there are many things to examine, and improvements are announced regularly.
- Keep up with the news. The cryptocurrency world is vulnerable to news.
- Keep up with new methods and techniques. Bitcoin and blockchain are now being researched by many organizations.
Not only bitcoin
Many people think bitcoin is the only application of blockchain, but it is not. Cryptocurrency is one of thousands of applications that blockchain can bring. Let’s take a look, and see what blockchain can do:
The most famous one is the smart contract. Every rule is written clearly in source code. Any modification related to the contract is recorded in the blockchain, publicly visible and irreversible. The contract is coded and run by computers, so it cannot be cheated (assuming the code is correct). That kind of contract forces participants to comply, without an intermediary organization like a government or a court. The rules in the contract are self-executing, fast, and trustworthy. Examples are:
- When a car drives into a parking space, it can automatically pay for that slot.
- A licensed song is published to the blockchain. If someone wants to listen to or remix it, they have to pay a specified amount. Payment conditions are coded and executed right after the action occurs.
- When a package arrives, the system automatically sends 1BTC to the merchant and 0.1 BTC to the shipper.
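The package example can be sketched as a tiny escrow object. This is plain Python, not a real smart contract language, and nothing here runs on a blockchain; the class DeliveryEscrow and its methods are names I made up. It only illustrates the idea that the payout rule is fixed in code and executes by itself once the condition is met. Amounts are in satoshis to avoid rounding problems.

```python
COIN = 100_000_000  # satoshis per bitcoin


class DeliveryEscrow:
    """Toy escrow: the buyer locks funds, delivery releases them automatically."""

    def __init__(self, merchant_amount, shipper_amount):
        self.payouts = {"merchant": merchant_amount, "shipper": shipper_amount}
        self.locked = 0
        self.settled = False

    def deposit(self, amount):
        self.locked += amount

    def confirm_delivery(self):
        total = sum(self.payouts.values())
        if self.settled:
            raise RuntimeError("contract already settled")
        if self.locked < total:
            raise RuntimeError("not enough funds locked")
        self.locked -= total
        self.settled = True
        return dict(self.payouts)  # who gets paid, and how much


escrow = DeliveryEscrow(merchant_amount=1 * COIN, shipper_amount=COIN // 10)
escrow.deposit(1 * COIN + COIN // 10)
paid = escrow.confirm_delivery()
```

In a real system the "package arrived" signal would have to come from a trusted source such as a tracking device, and that is usually the hard part.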
Another application is the supply chain. Blockchain can be used to track the position and status of containers all around the world, using IoT (Internet of Things). Beyond containers, it can be used to verify the origin of products, check the status of packages, and support customs services.
Blockchain can be used in healthcare. Health records are stored in the blockchain, accessible everywhere, synchronized. It becomes the single source of truth. It is public, but anonymous. Health data can only be accessed by the patient and the doctor. Anonymous data in the blockchain can be used for medical research.
Blockchain can be used by governments for citizen records, or in schools for student scoreboards. It can be used for reputation systems for people (for small loans), products, or services.
Data in the blockchain is responsive, updated in real time. It is open, but keeps anonymity. It is useful for small startups and large enterprises. Even the most imaginative mind cannot foresee what people will do with this data.
Blockchain is not only digital coins; it can hold any kind of digital asset, or a digital representation of any physical asset.
The real world maps into the blockchain; the blockchain represents the real world.
What is next
Blockchain has grown to versions 2.0 and 3.0. Besides bitcoin, thousands of cryptocurrencies are on the market, each solving some specific problems. Many blockchain frameworks have been published, and some are still in development. They bring a huge number of applications, solve real-world problems, and make our lives easier. Some limitations of the first version of blockchain have been solved, leading to a new era of technology.