Writing a Formally-Verified Porn Browser in Coq and Haskell
It’s uncommon to use formal verification when developing software. Most people are unfamiliar with the tools and techniques, or assume they’re only for specialized use.
This article will show how to write a simple image browser with:
- Core data structures and operations formally verified using the Coq theorem prover.
- A Haskell web server that handles HTTP requests
- An HTML/CSS/Javascript frontend
Definitions
We’re going to make an image browser. It’s an unspoken truth that most large image databases people have are actually porn collections, so we’ll be honest with ourselves and call it a porn browser.
We’ll have Coq-verified data structures and operations, which some Haskell will use later on to respond to HTTP requests. Specifically, we need the ability to:
- Add new images to the database
- Get a sorted list of images by date added, or category
- Delete an image
- Add a category to an image
- Remove a category from an image
- Add categories
Here are the major types:
FMapAVL
and FSetAVL
are finite maps and sets implemented using AVL trees. Our database is just a map of image ids to images, a map of category ids to sets of image ids, and an increasing id column incremented after every new addition.
It looks like the Coq developers are currently refactoring the equivalent of Haskell’s Ord
typeclass, so I use Backport_OT N_as_OT
to generate the older interface from the newer one.
This should be the Haskell equivalent:
We’re eventually going to generate Haskell from our formally-verified Coq implementation. What are the generated Haskell data types?
Which prints
which are essentially simple tree types hidden behind a few aliases. We also get a bunch of functions for manipulating the trees, which I left out. Our Haskell code will be using our own custom functions that manipulate these types, which will be formally-verified.
So what are these functions anyways? I’ll spec out their types using the “Axiom” command:
This is similar to using undefined
in Haskell for placeholders. In fact, that’s pretty much what the Extraction
command generates:
In Haskell, our next step would be to jump right into the implementation. In Coq, we can think for a moment about what invariants should hold:
- A new database has no images.
count_empty_db
is the invariant here; num_images
is a helper function we’ll define later.
- Whenever we add an image to the database, the number of images should increase by 1:
- When we delete an image, either the image was in the database and the count of images decreases by 1, or it wasn’t and the count stays the same:
- When we tag or untag an image, the number of images stays the same:
I can think of others, but these 4 should be enough to demonstrate the technique.
Implementation
Here are the implementations, in order:
newDb
create_image
You can test it with the Compute
command:
which prints:
= {|
images := {|
M.this := M.Raw.Node (M.Raw.Leaf Image) 0%N
{| id := 0; filename := "testing"; timestamp := 0 |}
(M.Raw.Leaf Image) 1%Z;
M.is_bst := M.Raw.Proofs.add_bst 0%N
{| id := 0; filename := "testing"; timestamp := 0 |}
(M.Raw.Proofs.empty_bst Image) |};
indices := {|
M.this := M.Raw.Leaf Index;
M.is_bst := M.Raw.Proofs.empty_bst S.MSet.t_ |};
next_id := 1%N |}
: ImageDb
It looks scarier than it is: It’s showing us the guts of the binary search trees it makes, while Haskell uses fromList
in its Show
instance to hide that:
find_categories
In Haskell, let bindings and top-level definitions are recursive by default. In Coq, we explicitly mark functions as Fixpoint
to allow recursion.
delete_image
tag_image
untag_image
Theorems
We’re almost ready to prove some theorems. Let’s implement those helper functions we deferred:
The first theorem we can do just by calculation:
A lot of these proofs consist of a sequence of imperative mutations to a “proof context”. These mutations are called “tactics”. If you use a tool like Proof General, you can step through each tactic to see how it changes the proof context. I’ll add annotations above each tactic, showing the proof context.
We start off with the statement we’re trying to prove as our “goal”. A theorem is proved when there are no remaining goals. We reduce the lefthand side using compute
. The goal becomes 0 = 0
, which is a tautology that the taut
tactic can complete.
Not too difficult, right? The follow-up ones were involved enough that I’ll have to leave their detailed proofs on Github.
Now, you might be curious: “Coq lets us extract code to Haskell. What exactly does a theorem convert to?” Here’s your answer:
You can’t do it - Haskell’s type system isn’t powerful enough to handle theorem proving. Roughly, things that can be extracted into living, breathing computer code lie in Coq’s Set
; while the world of theorems and proofs is Coq’s Prop
. And the classic tripup for beginners is “bool vs. Prop”
A bool
is decidably true
or false
. A bool
when extracted is eventually going to live on the system heap. A bool
has a bitstring representation.
A Prop
is a statement in an intuitionist logic. In classical logic, you know for any proposition P
, that P | ~P
. You can actually add that axiom to Coq, but it’s fundamentally not constructive: Just because you’ve excluded the possibility of ~P
doesn’t mean that you can physically instantiate a bitstring representing a specific P
in memory somewhere.
Inductive types like bool
are closed. There are only 2 constructors true
and false
, and if you’ve excluded true
you know it’s false
.
Propositions are open world: You can’t conclude a positive from a bunch of negatives.
Let’s try the next one:
This theorem is actually false! All of our functions should be safe, but anyone can call mkImageDb
with a next_id
that overlaps an existing image. Then, calling create_image
would overwrite it, and num_images
wouldn’t change.
Se we need to specify that this theorem only holds for InternallyConsistent
databases, and that all of our operations preserve this property.
The only consistency check we’ll have for now is that next_id
should not exist in the images
map:
Now we’ll prove that newDb
is internally consistent, using empty_in_iff.
We import all of the propositions like empty_in_iff
relating to finite maps using the Facts
functor, applied to the FMapAVL
implementation we instantiated earlier.
That was easy because InternallyConsistent
is a property of the indices in the images
map. In newDb
, the map is empty so the statement is vacuously true.
I’ll pick one more interesting one before moving on.
Here’s the idea of the proof in words:
MF.add_in_iff
says that “If you add an element to a map, any element in the new map was either already in the map or had the same key as the one we added.”. If it was already in the map, then the InternallyConsistent
hypothesis already applies to it. Otherwise, the key we added was exactly next_id
, and the definition of InternallyConsistent
gives next_id db < succ (next_id db)
after we increment it in create_image
.
Full proofs for the theorems are in the Github repository.
Haskell Server
Coq allows us to map inductive types directly to Haskell. There are a few built-in modules for strings, and I’ll include examples for list
and option
also:
From Coq Require Import
extraction.ExtrHaskellString
extraction.ExtrHaskellNatInteger
extraction.ExtrHaskellNatNum
Extract Inductive list => "[]" ["[]" "(:)"].
Extract Inductive option => "Prelude.Maybe" ["Prelude.Just" "Prelude.Nothing"].
Extraction Language Haskell.
Extraction "Database" Database.
Once we set the extraction options and run the Extraction
command, we’ll end up with a Database.hs
file that can be imported by our Haskell code.
The routing part of the web server is here:
And here’s a fragment of Javascript for interacting with this API:
In Haskell, a let
binding creates a group of mutually-recursive functions. We can do that in Javascript with a Y combinator, or we can unroll it 4-5 times(whatever the maximum stack depth we want):
Conclusion
Hopefully this example shows that there’s nothing really stopping anyone from using Coq in their Haskell programs today. You can see the full repository on Github.
Future Coq-related articles may cover:
- An overview of the common tactics and techniques used in proofs
- Defining custom LTAC commands to automate proofs
- Writing a more robust Coq-Haskell compatibility layer.
- Proving mathematical theorems using Coq
- Implementing compilers or cryptographic primitives in Coq
- Using GHCJS on the resulting Haskell to get formally-verified Javascript
Future formal verification articles may cover:
- Using Liquid Haskell to add lightweight proofs to your existing Haskell code
- Using Idris to develop e.g. a bitcoin exchange
- Using randomized testing as a cheap substitute for a formal proof.
If you’re interested in trying Coq, I recommend starting with Benjamin Pierce’s book Software Foundations.