2024-03-13 06:33:51 +00:00
2024-03-24 12:50:57 +00:00
2023-11-13 06:05:36 +00:00
2024-03-24 12:50:57 +00:00
2024-03-24 12:50:57 +00:00
2024-03-24 12:50:57 +00:00
2023-11-14 10:44:52 +00:00
2024-04-03 18:39:29 +01:00
2024-03-24 12:50:57 +00:00
2023-11-13 06:05:36 +00:00
2023-11-13 06:05:36 +00:00
2023-11-13 06:05:36 +00:00

Brolib-sml

Introduction

Standard ML port of this rope implementation.

This particular rope uses the balancing scheme described in the Purely Functional 1-2 Brother Trees paper authored by Ralph Hinze. It tries to keep the number of nodes to a minimum by joining the strings in adjacent leaf nodes, if joining would not be too expensive.

Usage

The two files are rope.sml and tiny_rope.sml.

rope.sml contains a rope that tracks line metadata (which has a small performance and memory penalty). This is useful if you have line-based operations in mind.

tiny_rope.sml doesn't track line metadata, and is useful when line-queries aren't needed.

Except for the line-based operations appendLine and foldLines, all functions are the same between the two (aside from verifyLines which is just for testing purposes).

Examples of usage can be found in examples.sml.

Performance

These two ropes are both quite fast.

I compared the OCaml port with the other text data structures in OCaml, and it beat those handily when processing the datasets from here which just test insertion and deletion. It was also faster at performing substrings than the others.

I don't know other Standard ML libraries to compare it to, but with MLton, this rope implementation beats the fastest ropes in Rust at insertion and deletion quite easily, never going 1 ms in the slowest dataset.

I don't know how to explain this surprising result, but most of the credit must go to the MLton compiler. This result might also be explained by some entirely untested theories that may or may not be true:

  • MLton may have optimised the data set (which is pure Standard ML)
  • These benchmarks have an unfair advantage because the datasets are cache-friendly vectors/arrays.
  • These ropes are likely slower on queries (those Rust ropes use B-Trees which are more cache-friendly).
  • The other ropes may track more metadata (like UTF-8/16/32 indices) which would add take a little more time.

Here are some numbers in nanoseconds, running on a single core with a Raspberry Pi 5 that has 8 GB of RAM:

Dataset rope.sml time tiny_rope.sml time
automerge-paper 10,018 ns 9,726 ns
rustcode 79,896 ns 74,479 ns
sveltecomponent 280,654 ns 250,744 ns
seph-blog1 703,868 ns 589,501 ns

The relevant Rust rope libraries have benchmarks here for reference.

Description
No description provided
Readme 88 MiB
Languages
Standard ML 96%
C 4%