Arc Forum | Fast immutable data structures in JavaScript

Arc Forum

new | comments | leaders | submit

Fast immutable data structures in JavaScript

4 points by Pauan 3774 days ago | discuss

This is a follow-up to http://arclanguage.org/item?id=18936

For my job, I've spent the past few weeks working on fast immutable data structures in JavaScript.

You can find the results of my work here: https://github.com/Pauan/Immutable

Quick overview:

It's only 14.5 kb.

Dictionaries, Sorted Dictionaries, Sets, Sorted Sets, Lists, Queues, and Stacks. All immutable. All fast.

Dictionaries and Sets can have anything as keys, including mutable objects and immutable objects (Dict, Set, List, etc.)

You can also use SortedDict and SortedSet to define your own custom sorting.

Equality is well defined, and is based on egal[1]. What that means is that mutable objects are only equal if they are exactly the same object, while immutable objects are equal if they have the same value. That means this is true:

  equal(Dict({ foo: 1 }),
        Dict({ foo: 1 }));

Even though they are two different objects, they have the same keys/values, so they are treated as equal. This is the only sane default behavior for equality.

You can easily convert from JavaScript to Immutable and from Immutable to JavaScript:

  var obj  = { foo: 1 };
  var dict = fromJS(obj);
  var obj  = toJS(dict);

You can also use this to convert from one data type to another:

  var dict  = Dict({ foo: 1 });
  var list  = List(dict);
  var stack = Stack(list);

You get all the usual operations:

* has, get, set, remove, modify for Dict

* has, add, remove, union, intersect, disjoint, subtract for Set

* size, has, get, insert, remove, modify, slice, concat for List

* size, peek, push, pop, concat for Queue

* size, peek, push, pop, concat for Stack

Queue and Stack are implemented with ordinary conses. They have O(1) for most operations, and amortized O(1) for Queue.pop.

Dict and Set are implemented with plain old AVL trees, as shown by waterhouse. They have O(log2(n)) behavior for most operations.

List is the most complicated. First off, it uses an AVL tree, but rather than storing a value in each AVL node, it stores a JavaScript array of values.

I did it that way because it's faster to just make a copy of a JavaScript array, for small lists. I found through benchmarking that 125 is the optimal size, so once the JavaScript array reaches 126 values, it will split it into two AVL nodes, each with 63 values.

This means most operations are O(log2(n / 125) + 125). Since each AVL node contains an array with 125 values, there's far fewer AVL nodes, which also means the depth of the tree is lower. And the 125 operations to copy the array are extremely fast. So overall it's a substantial net gain.

Retrieval is O(log2(n / 125)). Looking up an element in a List with 1,000,000 elements only takes 13 operations.

In addition, because inserting at the end of a list is a very common operation, it uses reversed cons cells. A reversed cons cell is a cons cell that goes backwards, so that the head is the last element rather than the first:

  ; the list [1 2 3] would be represented as:
  (cons 3 (cons 2 (cons 1 nil)))

This means inserting at the end of the list is O(1). Once the cons cells reach the 125 limit, they get converted into a JavaScript array and inserted into the AVL tree. So the final result is amortized O(1).

Overall, these tricks make List very very fast. It's slower than mutable JavaScript arrays (obviously), but not by as much as you would think. Check out the "benchmarks" folder to see for yourself.

Also, there's two operations that are ridiculously fast: concat and slice.

Concatenating two Lists has O(125 + log2(n / 125) + log2(min(n / 125, m / 125))) worst-case time. To put that into perspective, concatenating two Lists which both have 1,000,000 elements in them takes only 152 operations, and 125 of those operations are extremely fast.

And slicing is fast too. It's O(log2(n / 125) + 249 + (2 * (m / 125))) worst-case time. If you have a List with 1,000,000 elements, and you want to get a slice that has 999,999 elements in it, it only takes 16,262 operations. Getting a slice with 1 element in it takes 138 operations, and 125 of those operations are extremely fast.

One improvement that I want to make is that the Set operations union, intersect, disjoint, and subtract are O(log2(n) * m). This isn't bad, per se, but I think it's possible to do better.

It's also possible to make Dict and Set faster, by using a non-AVL tree. But I believe the performance gains would be fairly minor, so that's low priority.

I'm currently in the process of changing Tab Organizer to exclusively use these immutable data structures. Overall I've been very pleased with the result: they're so fast that the vast majority of programs wouldn't even notice any speed difference between mutable/immutable.

This also opens up the possibility of using something like React[2] or Mercury[3] for the DOM.

Because I had thought immutable data structures were slow, I decided not to include them in Nulan.

But with this library, I have decided that Nulan will definitely use immutable data structures for everything.

I see very little reason to use mutability anymore, except for the absolute most performance-critical code.

* [1]: http://home.pipeline.com/~hbaker1/ObjectIdentity.html

* [2]: https://facebook.github.io/react/

* [3]: https://github.com/Raynos/mercury