Haskell Typeclasses vs. C++ Classes
Classes from Object-Oriented Programming languages such as C++ allow types to declare a pre-selected list of overridable functions.
Typeclasses in Haskell are a general-purpose way to write functions whose implementations change depending on the type of a parameter.
They’re used very differently in practice, but this article will:
- Describe how vtables implement C++ classes
- Show that Haskell constructs a very similar data structure in typeclass-using code
Classes
C++ has the concept of a “class” - a record together with functions that manipulate it. The class creates a new namespace for the functions, so that different classes can have functions with the same name. Each function also takes the record as an implicit argument.
Here is an example of a class:
and here is how it might translate into C:
A C++ compiler infers the namespace from a variable’s type, while a C programmer would have to remember which variant of f
he wanted to invoke.
C++ also supports subtyping: x
is a subtype of y
if every record field and function in y
is also in x
. The narrower type x
is responsible for declaring that it is a subtype of y
, and then it doesn’t have to repeat the fields and functions it derived from y
.
Notice that all_funcs_x
was invoked on an argument of type y
even though we earlier declared it to take an x
. Conceptually, a new function y_cast_x
is created that converts the x
into a y
by copying its record fields. The C++ compiler will silently insert calls to this function to convert y
into x
by dropping extra fields; but will not convert x
into y
, since it has no way of knowing how to repopulate them.
In practice, a C++ compiler will organize the fields so that y_cast_x
is free when used with references or pointers.
Notice that f
, g
, and h
must refer to x
’s namespace. They can’t refer to y
, because the functions have no way of knowing what the “original type” of _x
was. They are only given a copy without the extraneous fields. So if y
declared its own f
, g
, and h
functions in its own y
namespace, these would be ignored and the ones in x
’s namespace called instead.
C++ also allows you to declare that certain functions should be looked up in the most-specific namespace available.
This is implemented by creating a new implicit type x_vtable
that has references to f
, g
, and h
, adding an implicit field for it to x
, and then copying these references during each call to y_cast_x
:
There is also an implicit subtyping relation between x_vtable
and y_vtable
, and the compilers arrange the fields such that yvtable_cast_xvtable
is usually free in the same manner than y_cast_x
is usually free.
The compiler can be made to output its vtables.
Besides the three functions we intended, we also get an offset-to-top parameter at offset 0 and a typeinfo pointer at offset 8. The first is used for multiple-inheritance, and the second is a pointer to a type_info
object. The vtable usually points to offset 16
to simplify code generation and not offset 0
, so negative offsets are used when accessing the first two.
Why are functions not virtual by default? Two reasons:
- The data in a C-style struct can be copied with
memcpy
and reused in other programs(or later versions of the same program). vtables are absolute pointers, so a simple bytewise copy is unusable except by the original program. - They are slightly slower, because every implicit cast might cause an extra 64-bit pointer to be written, every virtual function call adds a layer of indirection, and - most importantly - virtual function calls prevent inlining.
You can see the indirections in the disassembly.
The vtable pointer is located at offset 0
in x
, so the first mov
copies it into rax
. Then, because the vtable points at the first function f
, the call
and jmp
instructions use offsets 0
, 0x8
, 0x10
to call f
, g
, and h
.
Here’s the version using non-virtual function calls:
Those three relocations means that non-virtual function calls are actually available to the linker, which patches their addresses in at link-time. Thus, non-virtual calls always go to the same target regardless of what inputs are passed in, while virtual calls look at the argument to decide which function to call.
Typeclasses
A Haskell typeclass is a collection of related functions shared by many types. Typeclass instances give rules for finding a specific implementation for any given type. The simplest rule is that a concrete type is given a specific implementation:
In all_funcs
above, a
is a type such that the compiler knows how to find implementations for f
, g
, and h
(the TC
typeclass).
In an object-oriented language like C++, the rules for locating an implementation might involve traversing the class hierarchy upwards until you find the first defined function with the same name. Depending on the language, this could be greatly complicated by multiple inheritance, roles/traits, or adding methods to the class at runtime. Typeclasses should generalize any situation where it is possible to locate the implementation at compile-time.
Foreign C code
To start with, we need the non-virtual case. In the previous section, we showed that the virtual/non-virtual distinction can be observed by looking for relocations in the resulting object file. So we want to write Haskell that emits relocations that are resolved at link-time.
In C, every function by default is exposed to the linker and available to be called from other modules; and you have to explicitly mark functions as static
to hide them from the linker. Haskell depends very heavily on inlining, so functions are hidden by default and you have to explicitly mark linker visibility using the Foreign Function Interface.
The full disassembly generated by these function calls is a bit long, but I can go over the section immediately wrapping the f x
call. I use ====>
to mark where a block transfers control to the next.
The compiler generates 3 relocations for this expression. The latter two are the most relevant:
f - 4
: This is the location of ourf
function.ghczmprim_GHCziTuple_Z - 3
: This is a library function that generates an empty tuple.
The important point here is that the location of f
is resolved by the linker, so it is a fixed implementation comparable to the earlier C code.
Clearly C has a more efficient FFI for calling into foreign C code than Haskell does. But Haskell programmers usually use the module system to import and export code, rather than depending entirely on the linker. You also can’t export typeclass constraints via the FFI, which I need in order to claim that they are similar to C++’s vtables.
Haskell Modules
Let’s look at that again without the FFI:
The constants make it easier to locate the implementation of all_funcs
in the disassembly:
GHC inlines the 3 pointer writes, and boxes up an empty tuple to return. Thus, the choice of function is determined at compile-time(and therefore link-time), which for our purposes agrees with the non-virtual function case from C.
That FFI imports and exports are matched by the linker, and that module imports are inlined probably won’t surprise anybody. But it’s good to confirm these easy cases as a baseline before making claims about typeclasses.
Typeclasses
C++ generates all of its vtables at compile-time. Haskell seems to create its typeclass dictionaries on the heap at the first function with a typeclass constraint, and then passes a pointer to child calls that use it. This indicates to me two possible performance implications:
- Move the typeclass constraint as high up in your program’s dependency graph as possible, to avoid repeatedly creating typeclass dictionaries.
- Use 6 or fewer functions in a given typeclass, which is the number of 64-bit integer registers available in the x64 System V ABI. Functions 7 or higher require twice as many
mov
instructions.
Here’s a big typeclass to demonstrate these:
Looking at the f4 x
call the instruction I’ve marked with *****
locates its typeclass dictionary entry at [rbx+0x1f]
, with 0x1f = sizeof(pointer) * 4 - 1
. This demonstrates that most typeclass dictionaries in the above example are passed via pointer.
To see that typeclass dictionaries are constructed at the entry point, notice this TC_entry
function in GHC’s intermediate language C--
:
The caller into code with a typeclass constraint needs to provide implementations of every function in the typeclass. The typeclass dictionary is passed via the System V ABI to this entry function, which frees up the registers by allocating heap memory and copying them to it.
Haskell keeps a “heap pointer” for efficient heap allocations, just like a stack pointer is used. If the program is out of heap space, it calls into the garbage collector. Otherwise, it’s a few simple pointer copies followed by a jump into the actual typeclass-using code.
Looking at the disassembly, r12
has the heap pointer. One weird point is that heap references are often “off by 1”. The *****
entries above and below are not aligned to an 8-byte boundary. It doesn’t matter, because -0x3f + 0x1f = 0x20
, which is the 8-byte aligned offset for the 4th typeclass entry f4
.
The typeclass dictionary is very similar to the C++ vtable data structure. It has no offset-to-top member, but I suspect the TC_con_info
serves a similar purpose to the type_info
field.
I would guess that GHC generates these data structures at runtime to prevent a combinatorial explosion of pregenerated typeclass dictionaries when using more advanced typeclass rules.
Specialized Typeclasses
GHC can sometimes specialize a generic typeclass-using function. This causes the typeclass dictionary to be inlined, which removes the need for a separate entry constructor and allocation.
all_tc_x
below is a specialization of all_tc
above.
The specialized version inlines the definitions for the typeclass dictionary, which enables GHC to reduce the entire output to an empty tuple allocation. That’s just 4 instructions, ignoring the FFI wrapper code.
You can write specialized versions like above, or use the SPECIALIZE
pragma to generate them. Using a separate specialized function may make it easier to see that specialization is happening, however.
Conclusion
Both abstract classes and typeclasses carry along an implicit array of function pointers. C++ generates this array at compile-time, while Haskell generates it at runtime. Once generated, it’s passed along as a simple pointer that generic code can index into to find an overridable implementation for a function.
Future programming language implementation articles may cover:
- Multi-parameter typeclasses, typeclass hierarchies, and typeclasses over recursive higher-order types
- Dependent types
- JIT compilation
- Garbage collectors
- Type-checking