Liam White
Instant, private search-as-you-type using JS and a binary blob

I want to make an autocomplete system for Philomena that enables private search-as-you-type. That is, it never reveals to the server what the user query is. It's obviously possible to do this by sending a list of all tag names to the client, but doing this by sending a JSON array over the wire has some distinct disadvantages:

We can do better. Let's refine our requirements a little bit:

For prefix matching, the obvious choice is a trie. Unfortunately, that doesn't really work out well for us because it requires scanning results in ways that end up creating more garbage than the next solution, which is a list of tags sorted by their name.

With a list of tags sorted by their name, we can implement the prefix match by binary searching for the first tag name "less than" the prefix and collecting results until encountering the first tag "greater than" the prefix (by numeric codepoint).

Because using JSON for this would require deserializing everything into JS objects, we will ship it as a binary blob. This means the overall structure needs to be designed a bit more carefully to accommodate this:

  1. String table
  2. Array of fixed-size tag structs, sorted by name, with a pointer into the string table
  3. Footer indicating the location of the start of the array

32-bit pointers are used for simplicity - 24-bit pointers would work as well, but are more difficult to emit and parse.

The file format looks like this, in pseudocode:

struct tag_name {
  uint8_t name_length;
  uint8_t name[];
};

struct tag {
  uint32_t tag_name_location;
  uint32_t num_uses;
};

struct footer {
  uint32_t tags_start;
  uint32_t num_tags;
};

struct autocomplete_file {
  struct tag_name names[];
  struct tag tags[];
  struct footer footer;
};

And it works!

Instant autocompletion