humza/sml-projects



Humza Shahid 87d999ba84 update readme with benchmarks

2024-04-03 18:39:29 +01:00

.DS_Store

format rope.sml using smlfmt

2024-03-13 06:33:51 +00:00

.gitignore

add examples of usage

2024-03-24 12:50:57 +00:00

automerge.sml

fix balancing errors

2023-11-13 06:05:36 +00:00

bench.mlb

add examples of usage

2024-03-24 12:50:57 +00:00

examples.mlb

add examples of usage

2024-03-24 12:50:57 +00:00

examples.sml

add examples of usage

2024-03-24 12:50:57 +00:00

LICENSE

license

2023-11-14 10:44:52 +00:00

README.md

update readme with benchmarks

2024-04-03 18:39:29 +01:00

rope.sml

add examples of usage

2024-03-24 12:50:57 +00:00

rust.sml

fix balancing errors

2023-11-13 06:05:36 +00:00

seph.sml

fix balancing errors

2023-11-13 06:05:36 +00:00

svelte.sml

fix balancing errors

2023-11-13 06:05:36 +00:00

tiny_rope.sml

add higher order functions to fold through Rope and TinyRope

2024-03-24 10:06:26 +00:00

utils.sml

change utils.sml to use camelCase (except for svelte_arr/rust_arr/seph_arr/automerge_arr, because I don't want to create a diff for large files

2024-03-14 23:35:13 +00:00


Brolib-sml

Introduction

Standard ML port of this rope implementation.

This particular rope uses the balancing scheme described in Ralph Hinze's paper Purely Functional 1-2 Brother Trees. It tries to keep the number of nodes to a minimum by joining the strings in adjacent leaf nodes when joining would not be too expensive.
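The following minimal sketch makes the leaf-joining idea concrete. It rests on assumptions: the `rope` datatype, the `maxLeafLength` threshold of 1024, and the `concat`/`insert` functions are all hypothetical and are not taken from rope.sml or tiny_rope.sml. The sketch leaves out the 1-2 brother tree balancing entirely. It only shows how two adjacent short leaves can be copied into one leaf instead of gaining a new parent node.

```sml
(* Hypothetical, simplified rope used only to illustrate leaf joining.
   It is NOT the datatype from rope.sml or tiny_rope.sml, and it omits
   the 1-2 brother tree balancing entirely. *)
structure LeafJoinSketch =
struct
  datatype rope =
    Leaf of string
  | Node of int * rope * rope  (* cached total length, left, right *)

  (* Assumed threshold: leaves are merged only while they stay this short.
     The real library chooses its own limit. *)
  val maxLeafLength = 1024

  fun size (Leaf s) = String.size s
    | size (Node (n, _, _)) = n

  (* Join two ropes. Empty leaves are dropped, and two adjacent leaves whose
     combined text is short are copied into a single leaf instead of being
     given a new parent node. *)
  fun concat (Leaf "", r) = r
    | concat (l, Leaf "") = l
    | concat (Leaf a, Leaf b) =
        if String.size a + String.size b <= maxLeafLength
        then Leaf (a ^ b)
        else Node (String.size a + String.size b, Leaf a, Leaf b)
    | concat (l, r) = Node (size l + size r, l, r)

  (* Insert text t at character index i; raises Subscript if i is out of range. *)
  fun insert (Leaf s, i, t) =
        concat (concat (Leaf (String.substring (s, 0, i)), Leaf t),
                Leaf (String.extract (s, i, NONE)))
    | insert (Node (_, l, r), i, t) =
        if i <= size l
        then concat (insert (l, i, t), r)
        else concat (l, insert (r, i - size l, t))

  fun toString (Leaf s) = s
    | toString (Node (_, l, r)) = toString l ^ toString r
end
```

Copying two short strings is cheap and avoids allocating an internal node, so the tree stays smaller. Once the combined text would exceed the threshold, the sketch falls back to an ordinary node.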

Usage

The two main files are rope.sml and tiny_rope.sml.

rope.sml contains a rope that tracks line metadata (which has a small performance and memory penalty). This is useful if you have line-based operations in mind.

tiny_rope.sml doesn't track line metadata, and is useful when line queries aren't needed.

Apart from the line-based operations appendLine and foldLines, and verifyLines (which exists only for testing), the two files provide the same functions.
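The sketch below gives a rough idea of what line metadata buys. It caches a newline count in every node, which is an assumption about the representation. The names `leaf`, `node`, and `lineStart` are hypothetical and do not describe rope.sml's actual API. Counting newlines while building nodes is the kind of extra time and memory a line-tracking rope pays for. In return, a line lookup follows one path down the tree and scans only a single leaf. Balancing is omitted.

```sml
(* Hypothetical rope that caches newline counts; not rope.sml's real layout. *)
structure LineMetadataSketch =
struct
  datatype rope =
    Leaf of string * int              (* text, number of #"\n" in the text *)
  | Node of int * int * rope * rope   (* length, newline count, left, right *)

  fun countNewlines s =
    CharVector.foldl (fn (c, n) => if c = #"\n" then n + 1 else n) 0 s

  (* Smart constructors compute the cached metadata once, at build time. *)
  fun leaf s = Leaf (s, countNewlines s)

  fun size (Leaf (s, _)) = String.size s
    | size (Node (n, _, _, _)) = n

  fun lines (Leaf (_, k)) = k
    | lines (Node (_, k, _, _)) = k

  fun node (l, r) = Node (size l + size r, lines l + lines r, l, r)

  (* Index just after the n-th newline (n >= 1). Subtrees whose cached
     count is too small are skipped without looking at their text. *)
  fun afterNewline (Leaf (s, _), n) =
        let
          fun scan (i, seen) =
            if String.sub (s, i) = #"\n" then
              if seen + 1 = n then i + 1 else scan (i + 1, seen + 1)
            else scan (i + 1, seen)
        in
          scan (0, 0)
        end
    | afterNewline (Node (_, _, l, r), n) =
        if n <= lines l then afterNewline (l, n)
        else size l + afterNewline (r, n - lines l)

  (* Character index where line n starts, numbering lines from 0. *)
  fun lineStart (rope, n) =
    if n < 0 orelse n > lines rope then raise Subscript
    else if n = 0 then 0
    else afterNewline (rope, n)
end
```

For example, a rope built from the leaves "ab\ncd", "e\n" and "fg\nh" has lines starting at indices 0, 3, 7 and 10.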

Examples of usage can be found in examples.sml.

Performance

Both ropes are quite fast.

I compared the OCaml port against the other text data structures available in OCaml, and it beat them handily on the datasets from here, which only test insertion and deletion. It was also faster than the others at taking substrings.
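Insertion and deletion datasets like these are typically replayed as an ordered sequence of edits. The following minimal sketch shows such a replay loop. It assumes each patch is a `(position, deleteCount, insertText)` triple, which is a hypothetical layout; the datasets in utils.sml may be stored differently. It uses a plain string as a slow reference buffer.

```sml
(* Hypothetical trace format: each patch is (position, number of characters
   to delete, text to insert), applied in order. The layout of the datasets
   in utils.sml may differ. *)
structure TraceReplaySketch =
struct
  type patch = int * int * string

  (* Fold a trace over any text buffer that supplies insert and delete.
     Each patch deletes first, then inserts at the same position. *)
  fun replayWith {empty, insert, delete} (trace : patch vector) =
    Vector.foldl
      (fn ((pos, del, ins), buf) =>
         let
           val buf = if del > 0 then delete (buf, pos, del) else buf
         in
           if ins <> "" then insert (buf, pos, ins) else buf
         end)
      empty trace

  (* Naive reference buffer: a plain string rebuilt on every edit. *)
  fun stringDelete (s, pos, len) =
    String.substring (s, 0, pos) ^ String.extract (s, pos + len, NONE)

  fun stringInsert (s, pos, t) =
    String.substring (s, 0, pos) ^ t ^ String.extract (s, pos, NONE)

  fun replay trace =
    replayWith {empty = "", insert = stringInsert, delete = stringDelete} trace
end
```

A rope would pass its own insert and delete functions to `replayWith`. Comparing the rope's final text against the naive string result is a simple correctness check.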

I don't know of any other Standard ML rope libraries to compare it to, but when compiled with MLton, this rope implementation beats the fastest Rust ropes at insertion and deletion quite easily, never exceeding 1 ms even on the slowest dataset.

I can't fully explain this surprising result, but most of the credit must go to the MLton compiler. A few entirely untested theories, which may or may not be true, might also explain it:

MLton may have optimised the datasets (which are written in pure Standard ML).
These benchmarks have an unfair advantage because the datasets are cache-friendly vectors/arrays.
These ropes are likely slower on queries (those Rust ropes use B-trees, which are more cache-friendly).
The other ropes may track more metadata (like UTF-8/16/32 indices), which would take a little more time.

Here are some timings in nanoseconds, measured on a single core of a Raspberry Pi 5 with 8 GB of RAM:

Dataset	rope.sml (ns)	tiny_rope.sml (ns)
automerge-paper	10,018	9,726
rustcode	79,896	74,479
sveltecomponent	280,654	250,744
seph-blog1	703,868	589,501

The relevant Rust rope libraries have benchmarks here for reference.
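The following sketch shows how a single timing in nanoseconds could be taken using only the Standard ML Basis Library's `Timer` and `Time` structures. It is an illustration under assumptions, not necessarily how bench.mlb produced the numbers in the table. The `timeNs` helper is hypothetical. The example prints the size of the computed result so the work is actually used, which helps keep an optimising compiler like MLton from discarding it.

```sml
(* Timing sketch using only the Basis Library's Timer and Time structures.
   The repository's own benchmark harness may measure differently. *)
structure TimingSketch =
struct
  (* Run f once and return its result together with the elapsed
     wall-clock time in nanoseconds (a single run, no warm-up). *)
  fun timeNs (f : unit -> 'a) : 'a * LargeInt.int =
    let
      val timer = Timer.startRealTimer ()
      val result = f ()
      val elapsed = Timer.checkRealTimer timer
    in
      (result, Time.toNanoseconds elapsed)
    end
end

(* Example: time building a string, then print its length and the time,
   so the result is actually used. *)
val (text, ns) =
  TimingSketch.timeNs (fn () => String.concat (List.tabulate (10000, Int.toString)))
val () =
  print (Int.toString (String.size text) ^ " chars in " ^ LargeInt.toString ns ^ " ns\n")
```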