
Writing Bitcask in Hare

TL;DR Read the source code for https://git.sr.ht/~blainsmith/hare-bitcask.

I have been writing modules for Hare for a while now. It has been a lot of fun, and helping expand the open source ecosystem of modules for this language has been very fulfilling.

Up until this point I've mostly focused on small modules, but I really wanted to tackle something much more ambitious. I found that opportunity when I started my Fall 2023 semester at Georgia Tech and joined a Vertically Integrated Projects team with some undergraduate students. Since I have spent a lot of my career in gaming, I thought this was the perfect chance to spend an entire semester using a modern language to make a game. I got permission from the professor to use Hare to make a MUD/text adventure accessible online. There will be a blog post about that towards the end of the semester, but this post and the work covered below were the result of needing to save game state for players.

As work on the game itself progressed, it was finally time to decide how to save game state. At first I considered just writing a file per player to disk, based on their chosen player handle, and updating it periodically. While that would be good enough for this style of game, I wanted to try my hand at making a database from scratch since I have a lot more time this semester.

Since I didn't want to spend a lot of time designing a database itself, I wanted to find a pre-existing database that would not only fit my needs for the game, but also make for an interesting experience writing it in Hare. After searching through research papers and articles I finally found the perfect candidate: the Bitcask backend for Riak KV. The design paper was even available and covers some of the implementation details.

Bitcask API

I wanted the API to match as closely as possible to the original so what I ended up with was:

export type db = struct { ... };

export fn put(db: *db, key: str, val: []u8) (void | error) = { ... };

export fn get(db: *db, key: str) ([]u8 | void | error) = { ... };

export fn del(db: *db, key: str) (void | error) = { ... };

export fn keys(db: *db) ([]str | void | error) = { ... };

export fn merge(db: *db) (void | error) = { ... };

Most of the functions are self-explanatory. However, merge(...) is used to merge and compact all of the underlying data files on disk to clean up the history of old values and deleted keys, since the data files are append-only. Without this the files would grow indefinitely. Some implementations of Bitcask in other languages might call merge in a background job when data files reach a configurable max size. I chose to omit this functionality for now and put the responsibility on the programmer using this library to call merge(...) at whatever interval they need.
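
To make that responsibility concrete, here is a rough sketch of how I imagine a caller using the library. The module path, the open function, and the "./gamestate" directory are assumptions for illustration; check the repository for how a db is actually created.

use bitcask;
use fmt;
use strings;

export fn main() void = {
  // Hypothetical: open or create a database in a directory; the real
  // constructor in the repository may have a different name or signature.
  let db = bitcask::open("./gamestate")!;

  // Store a value under a key.
  bitcask::put(&db, "player:blain", strings::toutf8("level=3"))!;

  // Fetch it back; get returns void when the key is missing.
  match (bitcask::get(&db, "player:blain")!) {
  case let val: []u8 =>
    fmt::printfln("found {} bytes", len(val))!;
  case void =>
    fmt::println("key not found")!;
  };

  // The caller decides when to compact the append-only data files.
  bitcask::merge(&db)!;
};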

Data File Encoding/Decoding

All of the files written to disk follow the original design document, so in theory this implementation should be able to encode/decode those same files. I have not tested this and there might be some minor differences in field sizes, but that should be pretty straightforward to resolve. The endian module makes it really easy to perform this work, but I do wish there were something like binary.Read(...) and binary.Write(...) for Hare, to read/write binary data to a file handle directly instead of needing to use []u8 as intermediate data types. Maybe it will come in the future.

// Make an array of 2 bytes
let klen: [2]u8 = [0...];

// Encode the length of the key as a u16 into that array
endian::little.putu16(klen, len(key): u16);

// Write the array as a slice to the file handle
io::write(w, klen[0..])?;
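
Decoding goes the other way: read the bytes back out of the file handle into a fixed-size array, then decode the integer. This is a rough sketch assuming a reader r and the same error-propagation context as the snippet above.

// Make an array of 2 bytes to read into
let klen: [2]u8 = [0...];

// Fill the array from the file handle
io::readall(r, klen[0..])?;

// Decode the key length back out as a u16
let keylen = endian::little.getu16(klen[0..]);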

Hash Maps

I make use of a few hash maps in order to have quick lookups of keys and data files. Unfortunately, Hare does not natively have hash maps as a built-in data type like other languages do. Fortunately, it was trivial to implement one for the keys/values I needed.

// Meta contains information about the record and file id location.
type meta = struct { ... };

// keydir is a hash map of 64 buckets with a str key and a *meta value tuple.
type keydir = [64][](str, *meta);

fn keydir_get(kd: *keydir, key: str) (*meta | void) = {
  let bkt = fnv::string(key) % len(kd);
  for (let i = 0z; i < len(kd[bkt]); i += 1) {
    if (kd[bkt][i].0 == key) {
      return kd[bkt][i].1;
    };
  };
  return;
};

fn keydir_set(kd: *keydir, key: str, val: *meta) void = { ... };

fn keydir_del(kd: *keydir, key: str) void = { ... };

fn keydir_finish(kd: *keydir) void = { ... };

This gives me functionality similar to a Go map of map[string]*meta that I can use throughout Bitcask. The keydir_finish function is used to free the allocated memory of all of the items in the map, and then the map itself, since I have to allocate and deallocate my own memory. You will see the TYPE_finish(...) pattern a lot in Hare for exactly this kind of work. The basis of this implementation was a public gist that one of Hare's authors, Drew DeVault, showed me.
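
As a rough idea of what that cleanup looks like, here is a sketch of keydir_finish. It assumes each key string and each *meta were allocated when they were inserted; the exact ownership rules live in keydir_set, which is elided above.

fn keydir_finish(kd: *keydir) void = {
  for (let b = 0z; b < len(kd); b += 1) {
    for (let i = 0z; i < len(kd[b]); i += 1) {
      // Free the key string and the meta record for each entry
      free(kd[b][i].0);
      free(kd[b][i].1);
    };
    // Free the bucket's backing slice
    free(kd[b]);
  };
};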

What's next?

Now that I have a working database for my game I can move forward with the game itself, but I will certainly come back to Bitcask to add more features from the design paper.

Let me know what you think, and if you have any questions or would like to send me patches, feel free to reach out using my public inbox at https://lists.sr.ht/~blainsmith/public-inbox or email me directly.