Bitcoin and Ethereum provide a decentralized means of handling money, contracts, and ownership tokens. From a technical perspective, they have a lot of moving parts and provide a good way to demo a programming language.

This article will develop a simple blockchain-like data structure, to demonstrate these in Haskell:

  • Writing a binary serializer and deserializer
  • Using cryptographic primitives to calculate hashes
  • Automatically adjusting the difficulty of a miner in response to computation time.

We’ll name it Haskoin. Note that it won’t have any networking or wallet security until a future article.

What is a Blockchain?

The first step when writing any software application is always to figure out your data structures. This is true whether it’s Haskell, Perl, C, or SQL. We’ll put the major types and typeclass instances in their own module:

{-# LANGUAGE GeneralizedNewtypeDeriving, NoImplicitPrelude, DeriveTraversable, DeriveDataTypeable, StandaloneDeriving, TypeSynonymInstances, FlexibleInstances #-}

module Haskoin.Types where

import Protolude
import Crypto.Hash

import Control.Comonad.Cofree
import Data.Data
import qualified Data.Vector as V

newtype Account = Account Integer deriving (Eq, Show, Num)

data Transaction = Transaction {
  _from   :: Account,
  _to     :: Account,
  _amount :: Integer
  } deriving (Eq, Show)

newtype BlockF a = Block (V.Vector a) deriving (Eq, Show, Foldable, Traversable, Functor, Monoid)
type Block = BlockF Transaction

type HaskoinHash = Digest SHA1

data BlockHeader = BlockHeader {
  _miner       :: Account,
  _parentHash  :: HaskoinHash
  } deriving (Eq, Show)

data MerkleF a = Genesis
               | Node BlockHeader a
               deriving (Eq, Show, Functor, Traversable, Foldable)

type Blockchain = Cofree MerkleF Block

MerkleF is a higher-order Merkle tree type that adds a layer onto some other type. The Cofree MerkleF Block does two things: It recursively applies MerkleF to produce a type for all depths of Merkle trees, and it attaches an annotation of type Block to each node in the tree.

When using Cofree, anno :< xf will construct one of these annotated values.

It will be more useful to look at an “inverted” tree where each node knows its parent, rather than one where each node knows its children. If each node knew its children, adding a single new block to the end would require changing every node in the tree. So MerkleF produces a chain, not a tree.

Protolude is a replacement Prelude that I’ve been using recently in moderately-sized projects. Prelude has a lot of backwards-compatibility concerns, so a lot of people shut it off with the NoImplicitPrelude language extension and import a custom one.

Why do we choose this weird MerkleF type over the simpler one below?

newtype Block = Block (V.Vector Transaction)
data Blockchain = Genesis Block
                | Node Block BlockHeader Blockchain

The main reason is to get those Functor, Traversable, and Foldable instances, because we can use them to work with our Merkle tree without having to write any code. For example, given a blockchain

import qualified Data.Vector as V

let genesis_block = Block (V.fromList [])
let block1 = Block (V.fromList [Transaction 0 1 1000])
let genesis_chain = genesis_block :< Genesis
let chain1 = block1 :< Node (BlockHeader { _miner = 0, _parentHash = undefined }) genesis_chain
let chain2 = block1 :< Node (BlockHeader { _miner = 0, _parentHash = undefined }) chain1

, here’s how you can get all of its transactions:

let txns = toList $ mconcat $ toList chain2
-- [Transaction {_from = Account 0, _to = Account 1, _amount = 1000},Transaction {_from = Account 0, _to = Account 1, _amount = 1000}]
let totalVolume = sum $ map _amount txns
-- 2000

I tested the above using stack ghci to enter an interactive prompt.

Real blockchains have a lot of useful things in the header, such as timestamps or nonce values. We can add them to BlockHeader as we need them.

Constructing Chains

A bunch of abstract types that are awkward to use aren’t very useful by themselves. We need a way to mine new blocks to do anything interesting. In other words, we want to define mineOn and makeGenesis:

module Haskoin.Mining where

type TransactionPool = IO [Transaction]

mineOn :: TransactionPool -> Account -> Blockchain -> IO Blockchain
mineOn pendingTransactions minerAccount root = undefined

makeGenesis :: IO Blockchain
makeGenesis = undefined

The genesis block is pretty easy, since it doesn’t have a header:

makeGenesis = return $ Block (V.fromList []) :< Genesis

We can write mineOn without any difficulty, transaction limiting, or security pretty easily if we knew how to calculate a parent node’s hash:

mineOn :: TransactionPool -> Account -> Blockchain -> IO Blockchain
mineOn pendingTransactions minerAccount parent = do
  ts <- pendingTransactions
  let block = Block (V.fromList ts)
  let header = BlockHeader {
        _miner = minerAccount,
        _parentHash = hash parent
        }
  return $ block :< Node header parent

hash :: Blockchain -> HaskoinHash
hash = undefined

Crypto.Hash has plenty of ways to hash something, and we’ve chosen type HaskoinHash = Digest SHA1 earlier. But in order to use it, we need some actual bytes to hash. That means we need a way to serialize and deserialize a Blockchain. A common library to do that is binary, which provides a Binary typeclass that we’ll implement for our types.

It’s not difficult to write instances by hand, but one of the advantages of using weird recursive types is that the compiler can generate Binary instances for us. Here’s complete code to serialize and deserialize every type we need:

{-# LANGUAGE StandaloneDeriving, TypeSynonymInstances, FlexibleInstances, UndecidableInstances, DeriveGeneric, GeneralizedNewtypeDeriving #-}

module Haskoin.Serialization where

import Haskoin.Types
import Control.Comonad.Cofree
import Crypto.Hash
import Data.Binary
import Data.Binary.Get
import Data.ByteArray
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import Data.Vector.Binary
import GHC.Generics

instance (Binary (f (Cofree f a)), Binary a) => Binary (Cofree f a) where
instance (Binary a) => Binary (MerkleF a) where
instance Binary BlockHeader where
instance Binary Transaction where
deriving instance Binary Account
deriving instance Binary Block

deriving instance Generic (Cofree f a)
deriving instance Generic (MerkleF a)
deriving instance Generic BlockHeader
deriving instance Generic Transaction
instance Binary HaskoinHash where
  get = do
    mDigest <- digestFromByteString <$> (get :: Get BS.ByteString)
    case mDigest of
      Nothing -> fail "Not a valid digest"
      Just digest -> return digest
  put digest = put $ (convert digest :: BS.ByteString)

deserialize :: BSL.ByteString -> Blockchain
deserialize = decode

serialize :: Blockchain -> BSL.ByteString
serialize = encode

I only included deserialize and serialize to make it clearer what the end result of this module is. Let’s drop them in favor of decode and encode from Data.Binary.

Generic is a way of converting a value into a very lightweight “syntax tree” that can be used by serializers(JSON, XML, Binary, etc.) and many other typeclasses to provide useful default definitions. The Haskell wiki has a good overview. binary uses these Generic instances to define serializers that work on just about anything.

We had to hand-write a Binary instance for HaskoinHash because Digest SHA1 from the Crypto.Hash library didn’t provide it or a Generic instance. That’s okay - digests are pretty much bytestrings anyways, so it was only a few lines.

Here’s how to use them to implement mineOn:

import Crypto.Hash(hashlazy)

mineOn :: TransactionPool -> Account -> Blockchain -> IO Blockchain
mineOn pendingTransactions minerAccount parent = do
  ts <- pendingTransactions
  let block = Block (V.fromList ts)
  let header = BlockHeader {
        _miner = minerAccount,
        _parentHash = hashlazy $ encode parent
        }
  return $ block :< Node header parent

And here’s how to test that this actually works:

testMining :: IO Blockchain
testMining = do
  let txnPool = return []
  chain <- makeGenesis
  chain <- mineOn txnPool 0 chain
  chain <- mineOn txnPool 0 chain
  chain <- mineOn txnPool 0 chain
  chain <- mineOn txnPool 0 chain
  chain <- mineOn txnPool 0 chain
  return chain

-- GHCI
> chain <- testMining
Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = efb3febc87c41fffb673a81ed14a6fb4f736df79}) (
  Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = 2accb557297850656de70bfc3e13ea92a4ddac29}) (
    Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = f51e30233feb41a228706d1357892d16af69c03b}) (
      Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = 0072e83ae8e9e22d5711fd832d350f5a279c1c12}) (
        Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = c259e771b237769cb6bce9a5ab734c576a6da3e1}) (
          Block [] :< Genesis)))))
> encode chain
"\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4\239\179\254\188\135\196\US\255\182s\168\RS\209Jo\180\247\&6\223y\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4*\204\181W)xPem\231\v\252>\DC3\234\146\164\221\172)\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4\245\RS0#?\235A\162(pm\DC3W\137-\SYN\175i\192;\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4\NULr\232:\232\233\226-W\DC1\253\131-5\SIZ'\156\FS\DC2\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4\194Y\231q\178\&7v\156\182\188\233\165\171sLWjm\163\225\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL"
> (decode $ encode chain) :: Blockchain
Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = efb3febc87c41fffb673a81ed14a6fb4f736df79}) (
  Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = 2accb557297850656de70bfc3e13ea92a4ddac29}) (
    Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = f51e30233feb41a228706d1357892d16af69c03b}) (
      Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = 0072e83ae8e9e22d5711fd832d350f5a279c1c12}) (
        Block [] :< Node (BlockHeader {_miner = Account 0, _parentHash = c259e771b237769cb6bce9a5ab734c576a6da3e1}) (
          Block [] :< Genesis)))))

If you’re testing serialization code at home, you may prefer to use the base16-bytestring library to hex-encode your ByteStrings:

> import qualified Data.ByteString.Base16.Lazy as BSL
> chain <- testMining
> BSL.encode $ encode chain
00000000000000000100000000000000000000000014efb3febc87c41fffb673a81ed14a6fb4f736df79000000000000000001000000000000000000000000142accb557297850656de70bfc3e13ea92a4ddac2900000000000000000100000000000000000000000014f51e30233feb41a228706d1357892d16af69c03b000000000000000001000000000000000000000000140072e83ae8e9e22d5711fd832d350f5a279c1c1200000000000000000100000000000000000000000014c259e771b237769cb6bce9a5ab734c576a6da3e1000000000000000000

Note that it will probably be a PITA for a C programmer trying to follow our serialization/deserialization code because the byte-wrangling is hidden in a lot of really generic code. If you want to produce a spec for people to use(always a good idea), you’ll probably want to hand-roll your serialization code so it’s self-documenting.

Mining

There are a couple mining-related problems with this so-called blockchain:

  1. People can have negative balances, so people can create a “scapegoat account” that they transfer unlimited amounts of money from.
  2. There is no transaction limiting, so someone could create a huge block and run our miners out of memory.
  3. We always mine empty blocks, so nobody can transfer money.
  4. There is no difficulty, so miners aren’t proving they’ve done any work.

I say that these are all mining problems because the code that miners run is going to deal with them.

#3 we’ll wait for Networking to solve. The rest we can do now.

To solve #1, we need account balances for anyone with a transaction that we’re mining a block for. Let’s go ahead and calculate all possible account balances:

blockReward = 1000

balances :: Blockchain -> M.Map Account Integer
balances bc =
  let txns = toList $ mconcat $ toList bc
      debits = map (\Transaction{ _from = acc, _amount = amount} -> (acc, -amount)) txns
      credits = map (\Transaction{ _to = acc, _amount = amount} -> (acc, amount)) txns
      minings = map (\h -> (_minerAccount h, blockReward)) $ headers bc
  in M.fromListWith (+) $ debits ++ credits ++ minings

And then once we have a parent blockchain, we know how to filter out the invalid transactions:

validTransactions :: Blockchain -> [Transaction] -> [Transaction]
validTransactions bc txns =
  let accounts = balances bc
      validTxn txn = case M.lookup (_from txn) accounts of
        Nothing -> False
        Just balance -> balance >= _amount txn
  in filter validTxn txns

To solve #2, I’ll let the current miner choose however many transactions he wants to put in his block. That means I’ll put a constant globalTransactionLimit = 1000 at the top that we’ll use when mining, but we won’t verify past blocks using it.

To solve #4, we need to add a nonce field to our BlockHeader that the miner can increment until he finds a good hash. We’ll make it an arbitrarily-large integer to avoid the scenario that no nonce values yield a sufficiently-difficult hash. And since we want to adjust our difficulty so blocks take roughly the same time to mine, we’ll store a timestamp in the header.

import Data.Time.Clock.POSIX

-- Add new fields
data BlockHeader = BlockHeader {
  _miner       :: Account,
  _parentHash  :: HaskoinHash,
  _nonce       :: Integer,
  _minedAt     :: POSIXTime
  } deriving (Eq, Show)

-- Add serializers for POSIXTime
instance Binary POSIXTime where
  get = fromInteger <$> (get :: Get Integer)
  put x = put $ (round x :: Integer)

globalTransactionLimit = 1000

mineOn :: TransactionPool -> Account -> Blockchain -> IO Blockchain
mineOn pendingTransactions minerAccount parent = do
  ts <- pendingTransactions
  ts <- return $ validTransactions parent ts
  ts <- return $ take globalTransactionLimit ts
  loop ts 0
  where
    validChain bc = difficulty bc < desiredDifficulty parent
    loop ts nonce = do
      now <- getPOSIXTime
      let header = BlockHeader {
            _miner = minerAccount,
            _parentHash = hashlazy $ encode parent,
            _nonce = nonce,
            _minedAt = now
            }
          block = Block (V.fromList ts)
          candidate = block :< Node header parent
      if validChain candidate
        then return candidate
        else loop ts (nonce+1)

difficulty :: Blockchain -> Integer
difficulty = undefined

desiredDifficulty :: BlockChain -> Integer
desiredDifficulty = undefined

We enter loop and keep incrementing the counter and fetching the time until we find a candidate with the right difficulty. The actual difficulty of a Blockchain is just its hash converted to an integer:

import Crypto.Number.Serialize(os2ip)

difficulty :: Blockchain -> Integer
difficulty bc = os2ip $ (hashlazy $ encode bc :: HaskoinHash)

How do we know what the right difficulty is? To start with, we’ll calculate the average time-between-blocks for the last 100 blocks:

numBlocksToCalculateDifficulty = 100

blockTimeAverage :: BlockChain -> NominalDiffTime
blockTimeAverage bc = average $ zipWith (-) times (tail times)
  where
    times = take numBlocksToCalculateDifficulty $ map _minedAt $ headers bc

headers :: BlockChain -> [BlockHeader]
headers Genesis = []
headers (_ :< Node x next) = x : headers next

average :: (Foldable f, Num a, Fractional a, Eq a) => f a -> a
average xs = sum xs / (if d == 0 then 1 else d) where d = fromIntegral $ length xs

Let’s have a target time of 10 seconds. Suppose blockTimeAverage bc gives 2 seconds, so we want blocks to take 5 times as long: adjustmentFactor = targetTime / blockTimeAverage bc = 5. Which means we want only 1/5 of the originally-accepted blocks to be accepted.

Since hashes are uniformly-distributed, 1/5 of the original hashes are less than originalDifficulty / 5, which will be our new difficulty. That’s what Bitcoin does: difficulty = oldDifficulty * (2 weeks) / (time for past 2015 blocks).

genesisBlockDifficulty = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
targetTime = 10

-- BEWARE: O(n * k), where k = numBlocksToCalculateDifficulty
desiredDifficulty :: Blockchain -> Integer
desiredDifficulty x = round $ loop x
  where
    loop (_ :< Genesis) = genesisBlockDifficulty
    loop x@(_ :< Node _ xs) = oldDifficulty / adjustmentFactor
      where
        oldDifficulty = loop xs
        adjustmentFactor = min 4.0 $ targetTime `safeDiv` blockTimeAverage x

Here are a few recent mining times using these calculations:

> exampleChain <- testMining
> exampleChain <- mineOn (return []) 0 exampleChain -- Repeat a bunch of times
> mapM_ print $ map blockTimeAverage $ chains exampleChain
6.61261425s
6.73220925s
7.97893375s
12.96145975s
10.923974s
9.59857375s
7.1819445s
2.2767425s
3.2307515s
7.215131s
15.98277575s

They hover around 10s because targetTime = 10.

Persistence

We’ll save the blockchain on disk, and give people 3 tools:

  • A tool to mine blocks and create a new chain
  • A tool to list account balances

The first tool is the miner:

{-# LANGUAGE NoImplicitPrelude, OverloadedStrings #-}

module Haskoin.Cli.Mine where

import Haskoin.Mining
import Haskoin.Serialization
import Haskoin.Types

import Protolude
import System.Environment
import Data.Binary
import qualified Data.ByteString.Lazy as BSL
import System.Directory
import Prelude(read)

defaultChainFile = "main.chain"
defaultAccount = "10"

main :: IO ()
main = do
  args <- getArgs
  let (filename, accountS) = case args of
        [] -> (defaultChainFile, defaultAccount)
        [filename] -> (filename, defaultAccount)
        [filename, account] -> (filename, account)
        _ -> panic "Usage: mine [filename] [account]"
      swapFile = filename ++ ".tmp"
      txnPool = return []
      account = Account $ read accountS
  forever $ do
    chain <- loadOrCreate filename makeGenesis :: IO Blockchain
    newChain <- mineOn txnPool account chain
    encodeFile swapFile newChain
    copyFile swapFile filename
    print "Block mined and saved!"

loadOrCreate :: Binary a => FilePath -> (IO a) -> IO a
loadOrCreate filename init = do
  exists <- doesFileExist filename
  if exists
    then decodeFile filename
    else do
      x <- init
      encodeFile filename x
      return x

The second one prints all of the account balances

{-# LANGUAGE NoImplicitPrelude, OverloadedStrings #-}

module Haskoin.Cli.ListBalances where

import Haskoin.Mining
import Haskoin.Serialization
import Haskoin.Types

import Protolude
import System.Environment
import Data.Binary
import qualified Data.Map as M
import qualified Data.ByteString.Lazy as BSL

defaultChainFile = "main.chain"

main :: IO ()
main = do
  args <- getArgs
  let (filename) = case args of
        [] -> (defaultChainFile)
        [filename] -> (filename)
        _ -> panic "Usage: list-balances [filename]"
  chain <- decodeFile filename :: IO Blockchain
  forM_ (M.toAscList $ balances chain) $ \(account, balance) -> do
    print (account, balance)

Here’s its output:

$ stack exec list-balances
(Account 10,23000)

So I’ve apparently mined 23 blocks just testing stack exec mine.

Conclusion

We developed a simple blockchain data structure. You can browse the repository on Github.

Future Haskoin-related articles may cover

  • Using networking and concurrency primitives to set up a peer-to-peer network.
  • Securing accounts in wallets, so other people can’t transfer money out of your account.
  • Building a ‘blockchain explorer’ website
  • GPU-accelerating our hashing
  • FPGA-accelerating our hashing

Future cryptocurrency-related articles may cover:

  • You may have heard about proof-of-work and proof-of-stake. What about proof-of-proof - where the miners compete to prove novel theorems in an approriate logic?
  • Adding a Turing-complete scripting language
  • Better ways to parse command line options
  • Building a Bitcoin exchange