Bitcoin and Ethereum provide a decentralized means of handling money, contracts, and ownership tokens. From a technical perspective, they have a lot of moving parts and provide a good way to demo a programming language.
This article will develop a simple blockchain-like data structure, to demonstrate these in Haskell:
Writing a binary serializer and deserializer
Using cryptographic primitives to calculate hashes
Automatically adjusting the difficulty of a miner in response to computation time.
We’ll name it Haskoin. Note that it won’t have any networking or wallet security until a future article.
What is a Blockchain?
The first step when writing any software application is always to figure out your data structures. This is true whether it’s Haskell, Perl, C, or SQL. We’ll put the major types and typeclass instances in their own module:
MerkleF is a higher-order Merkle tree type that adds a layer onto some other type. The Cofree MerkleF Block does two things: It recursively applies MerkleF to produce a type for all depths of Merkle trees, and it attaches an annotation of type Block to each node in the tree.
When using Cofree, anno :< xf will construct one of these annotated values.
It will be more useful to look at an “inverted” tree where each node knows its parent, rather than one where each node knows its children. If each node knew its children, adding a single new block to the end would require changing every node in the tree. So MerkleF produces a chain, not a tree.
Protolude is a replacement Prelude that I’ve been using recently in moderately-sized projects. Prelude has a lot of backwards-compatibility concerns, so a lot of people shut it off with the NoImplicitPrelude language extension and import a custom one.
Why do we choose this weird MerkleF type over the simpler one below?
The main reason is to get those Functor, Traversable, and Foldable instances, because we can use them to work with our Merkle tree without having to write any code. For example, given a blockchain
, here’s how you can get all of its transactions:
I tested the above using stack ghci to enter an interactive prompt.
Real blockchains have a lot of useful things in the header, such as timestamps or nonce values. We can add them to BlockHeader as we need them.
Constructing Chains
A bunch of abstract types that are awkward to use aren’t very useful by themselves. We need a way to mine new blocks to do anything interesting. In other words, we want to define mineOn and makeGenesis:
The genesis block is pretty easy, since it doesn’t have a header:
We can write mineOn without any difficulty, transaction limiting, or security pretty easily if we knew how to calculate a parent node’s hash:
Crypto.Hash has plenty of ways to hash something, and we’ve chosen type HaskoinHash = Digest SHA1 earlier. But in order to use it, we need some actual bytes to hash. That means we need a way to serialize and deserialize a Blockchain. A common library to do that is binary, which provides a Binary typeclass that we’ll implement for our types.
It’s not difficult to write instances by hand, but one of the advantages of using weird recursive types is that the compiler can generate Binary instances for us. Here’s complete code to serialize and deserialize every type we need:
I only included deserialize and serialize to make it clearer what the end result of this module is. Let’s drop them in favor of decode and encode from Data.Binary.
Generic is a way of converting a value into a very lightweight “syntax tree” that can be used by serializers(JSON, XML, Binary, etc.) and many other typeclasses to provide useful default definitions. The Haskell wiki has a good overview. binary uses these Generic instances to define serializers that work on just about anything.
We had to hand-write a Binary instance for HaskoinHash because Digest SHA1 from the Crypto.Hash library didn’t provide it or a Generic instance. That’s okay - digests are pretty much bytestrings anyways, so it was only a few lines.
Here’s how to use them to implement mineOn:
And here’s how to test that this actually works:
If you’re testing serialization code at home, you may prefer to use the base16-bytestring library to hex-encode your ByteStrings:
Note that it will probably be a PITA for a C programmer trying to follow our serialization/deserialization code because the byte-wrangling is hidden in a lot of really generic code. If you want to produce a spec for people to use(always a good idea), you’ll probably want to hand-roll your serialization code so it’s self-documenting.
Mining
There are a couple mining-related problems with this so-called blockchain:
People can have negative balances, so people can create a “scapegoat account” that they transfer unlimited amounts of money from.
There is no transaction limiting, so someone could create a huge block and run our miners out of memory.
We always mine empty blocks, so nobody can transfer money.
There is no difficulty, so miners aren’t proving they’ve done any work.
I say that these are all mining problems because the code that miners run is going to deal with them.
#3 we’ll wait for Networking to solve. The rest we can do now.
To solve #1, we need account balances for anyone with a transaction that we’re mining a block for. Let’s go ahead and calculate all possible account balances:
And then once we have a parent blockchain, we know how to filter out the invalid transactions:
To solve #2, I’ll let the current miner choose however many transactions he wants to put in his block. That means I’ll put a constant globalTransactionLimit = 1000 at the top that we’ll use when mining, but we won’t verify past blocks using it.
To solve #4, we need to add a nonce field to our BlockHeader that the miner can increment until he finds a good hash. We’ll make it an arbitrarily-large integer to avoid the scenario that no nonce values yield a sufficiently-difficult hash. And since we want to adjust our difficulty so blocks take roughly the same time to mine, we’ll store a timestamp in the header.
We enter loop and keep incrementing the counter and fetching the time until we find a candidate with the right difficulty. The actual difficulty of a Blockchain is just its hash converted to an integer:
How do we know what the right difficulty is? To start with, we’ll calculate the average time-between-blocks for the last 100 blocks:
Let’s have a target time of 10 seconds. Suppose blockTimeAverage bc gives 2 seconds, so we want blocks to take 5 times as long: adjustmentFactor = targetTime / blockTimeAverage bc = 5. Which means we want only 1/5 of the originally-accepted blocks to be accepted.
Since hashes are uniformly-distributed, 1/5 of the original hashes are less than originalDifficulty / 5, which will be our new difficulty. That’s what Bitcoin does: difficulty = oldDifficulty * (2 weeks) / (time for past 2015 blocks).
Here are a few recent mining times using these calculations:
They hover around 10s because targetTime = 10.
Persistence
We’ll save the blockchain on disk, and give people 3 tools:
A tool to mine blocks and create a new chain
A tool to list account balances
The first tool is the miner:
The second one prints all of the account balances
Here’s its output:
So I’ve apparently mined 23 blocks just testing stack exec mine.
Conclusion
We developed a simple blockchain data structure. You can browse the repository on Github.
Future Haskoin-related articles may cover
Using networking and concurrency primitives to set up a peer-to-peer network.
Securing accounts in wallets, so other people can’t transfer money out of your account.
Building a ‘blockchain explorer’ website
GPU-accelerating our hashing
FPGA-accelerating our hashing
Future cryptocurrency-related articles may cover:
You may have heard about proof-of-work and proof-of-stake. What about proof-of-proof - where the miners compete to prove novel theorems in an approriate logic?
Michael Burge has years of experience developing complex software, including custom compilers, data stores for "big data", and machine learning models.