Add 'string-tries-sml/' from commit 'd056e08ce768e014ab409c7f63e8fd0adfc1dff2'

git-subtree-dir: string-tries-sml
git-subtree-mainline: dba78da7ec
git-subtree-split: d056e08ce7
This commit is contained in:
2026-04-24 00:34:11 +01:00
29 changed files with 934828 additions and 0 deletions

5
string-tries-sml/LICENSE Normal file
View File

@@ -0,0 +1,5 @@
Copyright (C) 2024 by Humza Shahid <humzasaur@gmail.com>
Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

150
string-tries-sml/README.md Normal file
View File

@@ -0,0 +1,150 @@
# string-trie
This repository implements a set over strings in Standard ML using a trie/prefix tree.
The signature provided is:
```
signature STRING_SET =
sig
signature STRING_SET =
sig
(* the type of tries *)
type t
(* the empty trie *)
val empty: t
(* StringSet.isEmpty trie
* returns true if the trie is empty *)
val isEmpty: t -> bool
(* StringSet.fromString "hello world"
* creates a trie containing just a string *)
val fromString: string -> t
(* StringSet.exists ("hello world", trie)
* returns true if the key was inserted into the trie *)
val exists: string * t -> bool
(* StringSet.insert ("myNewString", trie)
* inserts a new string into the trie, returning a new trie *)
val insert: string * t -> t
(* StringSet.remove ("stringToRemove", trie)
* removes the key from the trie, returning a new trie *)
val remove: string * t -> t
(* StringSet.getPrefixList ("myPrefix", trie)
* returns a list of all keys matching the specified prefix *)
val getPrefixList: string * t -> string list
(* StringSet.toList trie
* returns a list containing all keys in the trie *)
val toList: t -> string list
(* StringSet.fromList ["hello", "world"]
* returns a trie containing all keys in the string list *)
val fromList: string list -> t
(* StringSet.foldl (fn (key, acc) => String.size key + acc) 0 trie
* folds a value through the trie, from lowest to highest. *)
val foldl: (string * 'b -> 'b) -> 'b -> t -> 'b
(* StringSet.foldlWithPrefix (fn (key, acc) => String.size key + acc) 0 trie "myPrefix"
* folds a value through a subset of the trie containing the specified prefix,
* from lowest to highest. *)
val foldlWithPrefix: (string * 'b -> 'b) -> 'b -> t -> string -> 'b
(* StringSet.foldr (fn (key, acc) => String.size key + acc) 0 trie
* folds a value through the trie, from highest to lowest. *)
val foldr: (string * 'b -> 'b) -> 'b -> t -> 'b
(* StringSet.foldrWithPrefix (fn (key, acc) => String.size key + acc) 0 trie "myPrefix"
* folds a value through a subset of the trie containing the specified prefix,
* from highest to lowest. *)
val foldrWithPrefix: (string * 'b -> 'b) -> 'b -> t -> string -> 'b
end
```
The reason for implementing a new trie specialised to strings rather than using Chris Okasaki's IntMap data structure is to enable prefix searching, where it is possible to get a list of all keys matching a certain prefix.
# Benchmarks
There are a few benchmarks in the `bench` folder, comparing three operations (insertion, lookup and retrieval of keys matching a prefix).
The two data structures compared include:
- An implementation of 1-2 Brother Trees described by Ralf Hinze
- The compressed string tries implemented in this repository, not based on an existing paper
## Insertion
- `bench/insert-string-set`
- 247.5 milliseconds
- `bench/insert-bro-tree`
- 183.9 milliseconds
The `insertion` benchmarks inserting every word from `bench/words.sml` into the respective data structure, in order.
StringSet is 1.3x slower than BroTree here.
## Exists
- `bench/build-exists-string-set`
- 48 milliseconds
- `bench/build-exists-bro-tree`
- 16 milliseconds
These benchmarks involve:
- Inserting every word from `bench/words.sml` to build a data structure with keys to look for
- Then testing to see if every key from `bench/words.sml` exists in the data structure
The reported times only measure the time taken for the second bullet point; the first bullet point was already measured in the `insertion` benchmark.
StringSet is 3x slower than BroTree here.
## Get prefix list
- `bench/build-get-prefix-string-set`
- 310,000 nanoseconds
- `bench/build-get-prefix-bro-tree`
- 3,477,000 nanoseconds
These benchmarks involve:
- Inserting every word from `bench/words.sml` to build a data structure with keys to look for
- Creating a list containing every word in the data structure that starts with "a"
As with the `exists` benchmark, only the time for the second bullet point is measured.
StringSet is 11x faster than BroTree here.
This result shouldn't be a surprise.
A binary tree needs to fold over every node in the tree, checking if the keys in node starts with the prefix. That takes O(n) time.
A trie is smarter about this. It only needs to travel to a specific prefix and get the subtrie for that prefix. Then one can fold over the subtrie rather than the whole trie, which takes much less time.
## Benchmarks conclusion
The benchmarks have a clear similarity to those in [Chris Okasaki's paper on Fast Mergeable Integer Maps](https://ia600204.us.archive.org/0/items/djoyner-papers/SHA256E-s118221--efee082ebebce89bebdbc041ab9bf8cbd2bcb91e48809a204318e1a89bf15435.pdf).
- The insertion and lookup/exists operations are both faster on balanced binary trees
- The trie-specific operation (in this repository: search by prefix, in the paper: merge tries together) is much faster for tries than for binary trees.
Like the paper says, it's probably worth using a trie only if you care about using the trie-specific operation a lot.
The description of Data.IntMap for Haskell seems to disagree with the first bullet point, stating:
> my benchmarks show that it is also (much) faster on insertions and deletions when compared to a generic size-balanced map implementation (see Data.Map).
This statement surprises me. It's not the case that IntMap was faster for insertion and lookup in the aforementioned paper, and an IntMap implementation I coded in F# was also slower for these operations.
I would be interested in whether it is true for Haskell that these operations were faster. Lazy evaluation might help somehow, or the Haskell implementation might use tricks not described in the paper.
# Credits
- The tests in `tests/string-set-tests.sml` were ported from [kpol's Trie data structure in C3](https://github.com/kpol/trie).
- The words.txt dataset in `bench/words.txt` is from [this repository](https://github.com/dwyl/english-words).

View File

@@ -0,0 +1,103 @@
structure BroTree =
struct
(* implementation of 1-2 brother tree ported from:
* https://www.cl.cam.ac.uk/research/hvg/Isabelle/dist/library/HOL/HOL-Data_Structures/document.pdf
* *)
datatype bro =
N0
| N1 of bro
| N2 of bro * string * bro
| L2 of string
| N3 of bro * string * bro * string * bro
val empty = N0
fun n1 bro =
case bro of
L2 str => N2 (N0, str, N0)
| N3 (t1, a1, t2, a2, t3) => N2 (N2 (t1, a1, t2), a2, N1 t3)
| t => N1 t
fun n2Left (t1, str, t2) =
case (t1, str, t2) of
(L2 a1, a2, t) => N3 (N0, a2, N0, a2, t)
| (N3 (t1, a1, t2, a2, t3), a3, N1 t4) =>
N2 (N2 (t1, a1, t2), a2, N2 (t2, a3, t4))
| (N3 (t1, a1, t2, a2, t3), a3, t4) =>
N3 (N2 (t1, a1, t2), a2, N1 t3, a3, t4)
| (t1, a1, t2) => N2 (t1, a1, t2)
fun n2Right (t1, str, t2) =
case (t1, str, t2) of
(t1, a1, L2 a2) => N3 (t1, a2, N0, a2, N0)
| (N1 t1, a1, N3 (t2, a2, t3, a3, t4)) =>
N2 (N2 (t1, a1, t2), a2, N2 (t3, a3, t4))
| (t1, a1, N3 (t2, a2, t3, a3, t4)) =>
N3 (t1, a2, N1 t2, a2, N2 (t3, a3, t4))
| (t1, a1, t2) => N2 (t1, a1, t2)
fun ins (str, tree) =
case tree of
N0 => L2 str
| N1 t => n1 (ins (str, t))
| N2 (l, a, r) =>
if str < a then n2Left (ins (str, l), a, r)
else if str > a then n2Right (l, a, ins (str, r))
else N2 (l, a, r)
| _ => raise Match (*impossible case*)
fun insRoot tree =
case tree of
L2 str => N2 (N0, str, N0)
| N3 (t1, a1, t2, a2, t3) => N2 (N2 (t1, a2, t2), a2, N1 t3)
| tree => tree
fun insert (str, tree) =
insRoot (ins (str, tree))
fun exists (str, tree) =
case tree of
N0 => false
| N1 t => exists (str, t)
| N2 (l, k, r) =>
if str < k then exists (str, l)
else if str > k then exists (str, r)
else true
| _ => raise Match
fun foldr (f, acc, tree: bro) =
case tree of
N0 => acc
| N1 t => foldr (f, acc, t)
| N2 (l, k, r) =>
let
val acc = foldr (f, acc, r)
val acc = f (k, acc)
in
foldr (f, acc, l)
end
| _ => raise Match
fun helpStartsWith (pos, prefix, key) =
if pos = String.size prefix then
true
else
let
val prefixChr = String.sub (prefix, pos)
val keyChr = String.sub (key, pos)
in
if keyChr = prefixChr then helpStartsWith (pos + 1, prefix, key)
else false
end
fun startsWith (prefix, key) =
if String.size prefix > String.size key then false
else helpStartsWith (0, prefix, key)
fun getPrefixList (prefix, tree) =
foldr
( (fn (k, acc) => if startsWith (prefix, k) then k :: acc else acc)
, []
, tree
)
end

Binary file not shown.

View File

@@ -0,0 +1,10 @@
$(SML_LIB)/basis/basis.mlb
ann
"allowVectorExps true"
in
words.sml
end
bro-tree.sml
build-exists-bro-tree.sml

View File

@@ -0,0 +1,33 @@
structure BuildExistsBroTree =
struct
fun helpExists (pos, tree, acc) =
if pos = Vector.length WordsList.words then
acc
else
let
val word = Vector.sub (WordsList.words, pos)
val newAcc = BroTree.exists (word, tree)
val acc = newAcc orelse acc
in
helpExists (pos + 1, tree, acc)
end
fun exists (tree) = helpExists (0, tree, true)
fun main () =
let
val endTree = Vector.foldl BroTree.insert BroTree.empty WordsList.words
val startTime = Time.now ()
val wordsExist = exists endTree
val finishTime = Time.now ()
val searchDuration = Time.- (finishTime, startTime)
val searchDuration = Time.toMilliseconds searchDuration
val searchDuration = LargeInt.toString searchDuration ^ "\n"
in
print searchDuration
end
end
val _ = BuildExistsBroTree.main ()

Binary file not shown.

View File

@@ -0,0 +1,10 @@
$(SML_LIB)/basis/basis.mlb
ann
"allowVectorExps true"
in
words.sml
end
../src/string-set.sml
build-exists-string-set.sml

View File

@@ -0,0 +1,34 @@
structure BuildExistsStringSet =
struct
fun helpExists (pos, trie, acc) =
if pos = Vector.length WordsList.words then
acc
else
let
val word = Vector.sub (WordsList.words, pos)
val newAcc = StringSet.exists (word, trie)
val acc = newAcc orelse acc
in
helpExists (pos + 1, trie, acc)
end
fun exists trie = helpExists (0, trie, true)
fun main () =
let
val endTrie =
Vector.foldl StringSet.insert StringSet.empty WordsList.words
val startTime = Time.now ()
val wordsExist = exists endTrie
val finishTime = Time.now ()
val searchDuration = Time.- (finishTime, startTime)
val searchDuration = Time.toMilliseconds searchDuration
val searchDuration = LargeInt.toString searchDuration ^ "\n"
in
print searchDuration
end
end
val _ = BuildExistsStringSet.main ()

Binary file not shown.

View File

@@ -0,0 +1,10 @@
$(SML_LIB)/basis/basis.mlb
ann
"allowVectorExps true"
in
words.sml
end
bro-tree.sml
build-get-prefix-bro-tree.sml

View File

@@ -0,0 +1,20 @@
structure BuildGetPrefixBroTree =
struct
fun main () =
let
val endTrie =
Vector.foldl BroTree.insert BroTree.empty WordsList.words
val startTime = Time.now ()
val lst = BroTree.getPrefixList ("a", endTrie)
val finishTime = Time.now ()
val searchDuration = Time.- (finishTime, startTime)
val searchDuration = Time.toNanoseconds searchDuration
val searchDuration = LargeInt.toString searchDuration ^ " ns\n"
in
print searchDuration
end
end
val _ = BuildGetPrefixBroTree.main ()

Binary file not shown.

View File

@@ -0,0 +1,10 @@
$(SML_LIB)/basis/basis.mlb
ann
"allowVectorExps true"
in
words.sml
end
../src/string-set.sml
build-get-prefix-string-set.sml

View File

@@ -0,0 +1,20 @@
structure BuildGetPrefixStringSet =
struct
fun main () =
let
val endTrie =
Vector.foldl StringSet.insert StringSet.empty WordsList.words
val startTime = Time.now ()
val lst = StringSet.getPrefixList ("a", endTrie)
val finishTime = Time.now ()
val searchDuration = Time.- (finishTime, startTime)
val searchDuration = Time.toNanoseconds searchDuration
val searchDuration = LargeInt.toString searchDuration ^ " ns\n"
in
print searchDuration
end
end
val _ = BuildGetPrefixStringSet.main ()

BIN
string-tries-sml/bench/conv-words Executable file

Binary file not shown.

View File

@@ -0,0 +1,59 @@
(* generate a words.sml file with a vector of strings,
* from a line-delimited words.txt file *)
val inIo = TextIO.openIn "words.txt"
val outIO = TextIO.openOut "words.sml"
fun consWordChrs (wordChrs, acc) =
case wordChrs of
[] => acc
| _ => (String.concat wordChrs) :: acc
fun helpTokeniseLine (pos, wordChrs, line, acc) =
if pos < 0 then
consWordChrs (wordChrs, acc)
else
let
val chr = String.sub (line, pos)
(* using Char.toString is necessary because it escapes double-quotes.
* Without escaping, there's a chance we will produce an invalid .sml file
* *)
val sChr = Char.toString chr
in
if Char.isPrint chr andalso not (Char.isSpace chr) then
helpTokeniseLine (pos - 1, sChr :: wordChrs, line, acc)
else
helpTokeniseLine (pos - 1, [], line, consWordChrs (wordChrs, acc))
end
fun tokeniseLine (line, acc) =
helpTokeniseLine (String.size line - 1, [], line, acc)
fun readLines (inIo, acc) =
case TextIO.inputLine inIo of
SOME line => readLines (inIo, tokeniseLine (line, acc))
| NONE => List.rev acc
fun writeLines (outIO, lst) =
case lst of
[] => ()
| word :: tl =>
let
val isLast = tl = []
val word = if isLast then "\"" ^ word ^ "\"" else "\"" ^ word ^ "\",\n"
val _ = TextIO.output (outIO, word)
in
writeLines (outIO, tl)
end
fun main () =
let
val lst = readLines (inIo, [])
val _ = TextIO.output
(outIO, "structure WordsList = \nstruct \n val words = #[\n")
val _ = writeLines (outIO, lst)
val _ = TextIO.output (outIO, "]\n end")
in
()
end
val _ = main ()

Binary file not shown.

View File

@@ -0,0 +1,10 @@
$(SML_LIB)/basis/basis.mlb
ann
"allowVectorExps true"
in
words.sml
end
bro-tree.sml
insert-bro-tree.sml

View File

@@ -0,0 +1,11 @@
structure InsertBroTree =
struct
fun main () =
let
val endTree = Vector.foldl BroTree.insert BroTree.empty WordsList.words
in
()
end
end
val _ = InsertBroTree.main ()

Binary file not shown.

View File

@@ -0,0 +1,10 @@
$(SML_LIB)/basis/basis.mlb
ann
"allowVectorExps true"
in
words.sml
end
../src/string-set.sml
insert-string-set.sml

View File

@@ -0,0 +1,11 @@
structure InsertStringSet =
struct
fun main () =
let
val endTrie = Vector.foldl StringSet.insert StringSet.empty WordsList.words
in
()
end
end
val _ = InsertStringSet.main ()

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,805 @@
signature STRING_SET =
sig
(* the type of tries *)
type t
(* the empty trie *)
val empty: t
(* StringSet.isEmpty trie
* returns true if the trie is empty *)
val isEmpty: t -> bool
(* StringSet.fromString "hello world"
* creates a trie containing just a string *)
val fromString: string -> t
(* StringSet.exists ("hello world", trie)
* returns true if the key was inserted into the trie *)
val exists: string * t -> bool
(* StringSet.insert ("myNewString", trie)
* inserts a new string into the trie, returning a new trie *)
val insert: string * t -> t
(* StringSet.remove ("stringToRemove", trie)
* removes the key from the trie, returning a new trie *)
val remove: string * t -> t
(* StringSet.getPrefixList ("myPrefix", trie)
* returns a list of all keys matching the specified prefix *)
val getPrefixList: string * t -> string list
(* StringSet.toList trie
* returns a list containing all keys in the trie *)
val toList: t -> string list
(* StringSet.fromList ["hello", "world"]
* returns a trie containing all keys in the string list *)
val fromList: string list -> t
(* StringSet.foldl (fn (key, acc) => String.size key + acc) 0 trie
* folds a value through the trie, from lowest to highest. *)
val foldl: (string * 'b -> 'b) -> 'b -> t -> 'b
(* StringSet.foldlWithPrefix (fn (key, acc) => String.size key + acc) 0 trie "myPrefix"
* folds a value through a subset of the trie containing the specified prefix,
* from lowest to highest. *)
val foldlWithPrefix: (string * 'b -> 'b) -> 'b -> t -> string -> 'b
(* StringSet.foldr (fn (key, acc) => String.size key + acc) 0 trie
* folds a value through the trie, from highest to lowest. *)
val foldr: (string * 'b -> 'b) -> 'b -> t -> 'b
(* StringSet.foldrWithPrefix (fn (key, acc) => String.size key + acc) 0 trie "myPrefix"
* folds a value through a subset of the trie containing the specified prefix,
* from highest to lowest. *)
val foldrWithPrefix: (string * 'b -> 'b) -> 'b -> t -> string -> 'b
end
structure StringSet: STRING_SET =
struct
datatype t =
CHILDREN of {keys: string vector, children: t vector}
| FOUND_WITH_CHILDREN of {keys: string vector, children: t vector}
| FOUND
val empty =
CHILDREN {keys = Vector.fromList [], children = Vector.fromList []}
fun isEmpty trie =
case trie of
CHILDREN {keys, ...} => Vector.length keys = 0
| _ => false
fun fromString str =
if String.size str > 0 then
CHILDREN
{keys = Vector.fromList [str], children = Vector.fromList [FOUND]}
else
empty
fun helpBinSearch (findChr, keyPos, children, low, high) =
if high >= low then
let
val mid = low + ((high - low) div 2)
val midStr = Vector.sub (children, mid)
val midChr = String.sub (midStr, keyPos)
in
if midChr = findChr then
SOME mid
else if midChr < findChr then
helpBinSearch (findChr, keyPos, children, mid + 1, high)
else
helpBinSearch (findChr, keyPos, children, low, mid - 1)
end
else
NONE
fun findBinSearch (findChr, keyPos, children) =
if Vector.length children > 0 then
helpBinSearch (findChr, keyPos, children, 0, Vector.length children - 1)
else
NONE
datatype search_string_match =
NO_SEARCH_MATCH
| FULL_SEARCH_MATCH
| SEARCH_KEY_CONTAINS_TRIE_KEY
| TRIE_KEY_CONTAINS_SEARCH_KEY
fun searchKeyMatch (searchKey, trieKey, keyPos) =
if
keyPos < String.size searchKey
then
if keyPos < String.size trieKey then
let
val searchChr = String.sub (searchKey, keyPos)
val trieChr = String.sub (trieKey, keyPos)
in
if searchChr = trieChr then
searchKeyMatch (searchKey, trieKey, keyPos + 1)
else
NO_SEARCH_MATCH
end
else
SEARCH_KEY_CONTAINS_TRIE_KEY
else if
keyPos = String.size searchKey
then
if keyPos < String.size trieKey then TRIE_KEY_CONTAINS_SEARCH_KEY
else if keyPos = String.size trieKey then FULL_SEARCH_MATCH
else SEARCH_KEY_CONTAINS_TRIE_KEY
else (* implicit: keyPos > String.size searchKey *) if
keyPos <= String.size trieKey
then
TRIE_KEY_CONTAINS_SEARCH_KEY
else
NO_SEARCH_MATCH
fun isFoundNode node =
case node of
FOUND => true
| FOUND_WITH_CHILDREN _ => true
| CHILDREN _ => false
fun checkChildrenExists (searchKey, keyPos, keys, children) =
let
val findChr = String.sub (searchKey, keyPos)
in
(case findBinSearch (findChr, keyPos, keys) of
SOME idx =>
let
val trieKey = Vector.sub (keys, idx)
in
(case searchKeyMatch (searchKey, trieKey, keyPos + 1) of
NO_SEARCH_MATCH => false
| FULL_SEARCH_MATCH =>
let val trieChild = Vector.sub (children, idx)
in isFoundNode trieChild
end
| SEARCH_KEY_CONTAINS_TRIE_KEY =>
let val trieChild = Vector.sub (children, idx)
in recurseExists (searchKey, String.size trieKey, trieChild)
end
| TRIE_KEY_CONTAINS_SEARCH_KEY => false)
end
| NONE => false)
end
and recurseExists (searchKey, keyPos, trie) =
case trie of
CHILDREN {keys, children} =>
checkChildrenExists (searchKey, keyPos, keys, children)
| FOUND_WITH_CHILDREN {keys, children} =>
checkChildrenExists (searchKey, keyPos, keys, children)
| FOUND =>
(*
* This case should only occur if we recurse in a node
* when there is a partial, but not full, string match.
* This is because of the isFoundNode helper function
* which is called by the parent.
* In other words, only the parent node returns true,
* by checking if the child is a found node and has
* a full string match.
*)
false
fun exists (searchKey, trie) =
if isEmpty trie orelse String.size searchKey = 0 then false
else recurseExists (searchKey, 0, trie)
datatype prefix_result =
PREFIX_FOUND of string * t
| NO_PREFIX_FOUND
| PREFIX_MATCHES_WHOLE_TRIE
fun helpGetPrefixSubtrieChildren (prefix, keyPos, keys, children, trie) =
let
val findChr = String.sub (prefix, keyPos)
in
case findBinSearch (findChr, keyPos, keys) of
SOME idx =>
let
val trieKey = Vector.sub (keys, idx)
in
(case searchKeyMatch (prefix, trieKey, keyPos + 1) of
NO_SEARCH_MATCH => NO_PREFIX_FOUND
| SEARCH_KEY_CONTAINS_TRIE_KEY =>
let
val trieChild = Vector.sub (children, idx)
in
helpGetPrefixSubtrie (prefix, String.size trieKey, trieChild)
end
| FULL_SEARCH_MATCH =>
let val node = Vector.sub (children, idx)
in PREFIX_FOUND (prefix, node)
end
| TRIE_KEY_CONTAINS_SEARCH_KEY =>
let val node = Vector.sub (children, idx)
in PREFIX_FOUND (trieKey, node)
end)
end
| NONE => NO_PREFIX_FOUND
end
and helpGetPrefixSubtrie (prefix, keyPos, trie) =
case trie of
CHILDREN {keys, children} =>
helpGetPrefixSubtrieChildren (prefix, keyPos, keys, children, trie)
| FOUND_WITH_CHILDREN {keys, children} =>
helpGetPrefixSubtrieChildren (prefix, keyPos, keys, children, trie)
| FOUND => NO_PREFIX_FOUND
fun getPrefixSubtrie (prefix, trie) =
if String.size prefix > 0 then
if isEmpty trie then NO_PREFIX_FOUND
else helpGetPrefixSubtrie (prefix, 0, trie)
else
PREFIX_MATCHES_WHOLE_TRIE
fun helpFoldlTrieVector (f, pos, keys, children, acc) =
if pos = Vector.length children then
acc
else
let
val curChild = Vector.sub (children, pos)
val acc = helpFoldl (f, curChild, acc)
val acc =
if isFoundNode curChild then f (Vector.sub (keys, pos), acc) else acc
in
helpFoldlTrieVector (f, pos + 1, keys, children, acc)
end
and helpFoldl (f, trie, acc) =
case trie of
CHILDREN {keys, children} =>
helpFoldlTrieVector (f, 0, keys, children, acc)
| FOUND_WITH_CHILDREN {keys, children} =>
helpFoldlTrieVector (f, 0, keys, children, acc)
| FOUND => acc
fun foldl f initial trie = helpFoldl (f, trie, initial)
fun foldlWithPrefix f initial trie prefix =
case getPrefixSubtrie (prefix, trie) of
PREFIX_FOUND (prefix, subtrie) =>
let val acc = helpFoldl (f, subtrie, initial)
in if isFoundNode subtrie then f (prefix, acc) else acc
end
| NO_PREFIX_FOUND => initial
| PREFIX_MATCHES_WHOLE_TRIE => helpFoldl (f, trie, initial)
fun helpFoldrTrieVector (f, pos, keys, children, acc) =
if pos < 0 then
acc
else
let
val curChild = Vector.sub (children, pos)
val acc = helpFoldr (f, curChild, acc)
val acc =
if isFoundNode curChild then f (Vector.sub (keys, pos), acc) else acc
in
helpFoldrTrieVector (f, pos - 1, keys, children, acc)
end
and helpFoldr (f, trie, acc) =
case trie of
CHILDREN {keys, children} =>
helpFoldrTrieVector (f, Vector.length keys - 1, keys, children, acc)
| FOUND_WITH_CHILDREN {keys, children} =>
helpFoldrTrieVector (f, Vector.length keys - 1, keys, children, acc)
| FOUND => acc
fun foldr f initial trie = helpFoldr (f, trie, initial)
fun foldrWithPrefix f initial trie prefix =
case getPrefixSubtrie (prefix, trie) of
PREFIX_FOUND (prefix, subtrie) =>
let val acc = helpFoldr (f, subtrie, initial)
in if isFoundNode subtrie then f (prefix, acc) else acc
end
| NO_PREFIX_FOUND => initial
| PREFIX_MATCHES_WHOLE_TRIE => helpFoldr (f, trie, initial)
(* recurseHelpGetPrefixList and helpGetPrefixList are basically manually coded
* foldr functions over the trie, applying the accumuluator to every found
* node, from right to left.
* No need to recode it as a usage of a generic foldr function though,
* because lower dispatch cost this way. *)
fun recurseHelpGetPrefixList (pos, keys, children, acc) =
if pos < 0 then
acc
else
let
val curChild = Vector.sub (children, pos)
val acc = helpGetPrefixList (curChild, acc)
val acc =
if isFoundNode curChild then Vector.sub (keys, pos) :: acc else acc
in
recurseHelpGetPrefixList (pos - 1, keys, children, acc)
end
and helpGetPrefixList (trie, acc) =
case trie of
CHILDREN {keys, children} =>
recurseHelpGetPrefixList (Vector.length keys - 1, keys, children, acc)
| FOUND_WITH_CHILDREN {keys, children} =>
recurseHelpGetPrefixList (Vector.length keys - 1, keys, children, acc)
| FOUND => acc
fun getPrefixList (prefix, trie) =
case getPrefixSubtrie (prefix, trie) of
PREFIX_FOUND (prefix, subtrie) =>
let val lst = helpGetPrefixList (subtrie, [])
in if isFoundNode subtrie then prefix :: lst else lst
end
| NO_PREFIX_FOUND => []
| PREFIX_MATCHES_WHOLE_TRIE => helpGetPrefixList (trie, [])
fun toList trie = helpGetPrefixList (trie, [])
datatype insert_string_match =
NO_INSERT_MATCH
(* may need to split string if difference found but prefix matched *)
| DIFFERENCE_FOUND_AT of int
(* may not need to do anything if insert key matched,
* as this is a set where only strings are stored.
* however, if this is a non-found node, then I need to change
* the tag/case. *)
| FULL_INSERT_MATCH
(* if insert key contains trie key, may need to recurse down node *)
| INSERT_KEY_CONTAINS_TRIE_KEY
(* if trie key contains insert key, need to split node *)
| TRIE_KEY_CONTAINS_INSERT_KEY
fun insertKeyMatch (insertKey, trieKey, keyPos) =
if
keyPos < String.size insertKey
then
if keyPos < String.size trieKey then
let
val searchChr = String.sub (insertKey, keyPos)
val trieChr = String.sub (trieKey, keyPos)
in
if searchChr = trieChr then
insertKeyMatch (insertKey, trieKey, keyPos + 1)
else
DIFFERENCE_FOUND_AT keyPos
end
else
INSERT_KEY_CONTAINS_TRIE_KEY
else if
keyPos = String.size insertKey
then
if keyPos < String.size trieKey then TRIE_KEY_CONTAINS_INSERT_KEY
else if keyPos = String.size trieKey then FULL_INSERT_MATCH
else INSERT_KEY_CONTAINS_TRIE_KEY
else (* implicit: keyPos > String.size insertKey *) if
keyPos <= String.size trieKey
then
TRIE_KEY_CONTAINS_INSERT_KEY
else
NO_INSERT_MATCH
datatype insert_bin_search_result =
INSERT_NEW_CHILD of int
| FOUND_INSERT_POS of int
| APPEND_NEW_CHILD
fun linearSearch (findChr, keyPos, idx, children) =
if idx = Vector.length children then
APPEND_NEW_CHILD
else
let
val curStr = Vector.sub (children, idx)
val curChr = String.sub (curStr, keyPos)
in
if curChr > findChr then INSERT_NEW_CHILD idx
else linearSearch (findChr, keyPos, idx + 1, children)
end
fun helpInsertBinSearch (findChr, keyPos, children, low, high) =
let
val mid = low + ((high - low) div 2)
in
if high >= low then
let
val midStr = Vector.sub (children, mid)
val midChr = String.sub (midStr, keyPos)
in
if midChr = findChr then
FOUND_INSERT_POS mid
else if midChr < findChr then
helpInsertBinSearch (findChr, keyPos, children, mid + 1, high)
else
helpInsertBinSearch (findChr, keyPos, children, low, mid - 1)
end
else
linearSearch (findChr, keyPos, if mid >= 0 then mid else 0, children)
end
fun insertBinSearch (findChr, keyPos, children) =
helpInsertBinSearch
(findChr, keyPos, children, 0, Vector.length children - 1)
fun insertDifferenceFoundAt
( insKey
, insIdx
, splitTrieKeyStart
, trieChild
, childKeys
, childChildren
, parentKeys
, parentChildren
, parentConstructor
) =
let
(* child node should always have CHILDREN case,
* because we are splitting prefix into two,
* when neither trieKey nor insKey match the prefix. *)
val childNode = CHILDREN {keys = childKeys, children = childChildren}
val keys =
Vector.mapi
(fn (idx, key) => if idx <> insIdx then key else splitTrieKeyStart)
parentKeys
val children =
Vector.mapi (fn (idx, elt) => if idx <> insIdx then elt else childNode)
parentChildren
in
parentConstructor {keys = keys, children = children}
end
fun insertNewChild (keys, insIdx, insKey, children, constructor) =
let
val newLength = Vector.length keys + 1
val newKeys = Vector.tabulate (newLength, fn idx =>
if idx < insIdx then Vector.sub (keys, idx)
else if idx > insIdx then Vector.sub (keys, idx - 1)
else insKey)
val newChildren = Vector.tabulate (newLength, fn idx =>
if idx < insIdx then Vector.sub (children, idx)
else if idx > insIdx then Vector.sub (children, idx - 1)
else FOUND)
in
constructor {keys = newKeys, children = newChildren}
end
fun appendNewChild (keys, insKey, children, constructor) =
let
val newKeys = Vector.concat [keys, Vector.fromList [insKey]]
val newChildren = Vector.concat [children, Vector.fromList [FOUND]]
in
constructor {keys = newKeys, children = newChildren}
end
fun foundInsertPos (keys, children, keyPos, insKey, insIdx, trie, constructor) =
let
val trieKey = Vector.sub (keys, insIdx)
in
(case insertKeyMatch (insKey, trieKey, keyPos + 1) of
(* may need to split string if difference found but prefix matched *)
DIFFERENCE_FOUND_AT diffIdx =>
let
val splitTrieKeyStart = String.substring (trieKey, 0, diffIdx)
val trieChild = Vector.sub (children, insIdx)
in
if String.sub (trieKey, diffIdx) > String.sub (insKey, diffIdx) then
(* place insKey before trieKey *)
let
val childKeys = Vector.fromList [insKey, trieKey]
val childChildren = Vector.fromList [FOUND, trieChild]
in
insertDifferenceFoundAt
( insKey
, insIdx
, splitTrieKeyStart
, trieChild
, childKeys
, childChildren
, keys
, children
, constructor
)
end
else
(* place trieKey before insKey *)
let
val childKeys = Vector.fromList [trieKey, insKey]
val childChildren = Vector.fromList [trieChild, FOUND]
in
insertDifferenceFoundAt
( insKey
, insIdx
, splitTrieKeyStart
, trieChild
, childKeys
, childChildren
, keys
, children
, constructor
)
end
end
(* may not need to do anything if insert key matched,
* as this is a set where only strings are stored.
* however, if this is a non-found node, then I need to change
* the tag/case. *)
| FULL_INSERT_MATCH =>
(* in case of a full match,
* need to turn child into FOUND_WITH_CHILDREN
* or FOUND case, if it not already either
*)
(case Vector.sub (children, insIdx) of
CHILDREN {keys = childKeys, children = childChildren} =>
let
val newTrieChild =
FOUND_WITH_CHILDREN
{keys = childKeys, children = childChildren}
val newParentChildren =
Vector.mapi
(fn (childIdx, elt) =>
if insIdx <> childIdx then elt else newTrieChild)
children
in
constructor {keys = keys, children = newParentChildren}
end
| _ => trie)
(* if insert key contains trie key, need to recurse down node *)
| INSERT_KEY_CONTAINS_TRIE_KEY =>
let
val trieChild = Vector.sub (children, insIdx)
val newTrieChild =
helpInsert (insKey, String.size trieKey, trieChild)
val newChildren =
Vector.mapi
(fn (idx, elt) => if idx <> insIdx then elt else newTrieChild)
children
in
constructor {keys = keys, children = newChildren}
end
(* if trie key contains insert key, need to split node *)
| TRIE_KEY_CONTAINS_INSERT_KEY =>
let
val trieChild = Vector.sub (children, insIdx)
val newKeys =
Vector.mapi
(fn (idx, key) => if idx <> insIdx then key else insKey) keys
(* newTrieChild should always be FOUND_WITH_CHILDREN,
* because previous part matches insert key,
* and esecond part matches trieKey *)
val newTrieChild = FOUND_WITH_CHILDREN
{ keys = Vector.fromList [trieKey]
, children = Vector.fromList [trieChild]
}
val newChildren =
Vector.mapi
(fn (idx, elt) => if idx <> insIdx then elt else newTrieChild)
children
in
constructor {keys = newKeys, children = newChildren}
end
| NO_INSERT_MATCH => trie)
end
and helpInsertChildren
(insKey, keyPos, keys, children, trie, parentConstructor) =
let
val findChr = String.sub (insKey, keyPos)
in
(case insertBinSearch (findChr, keyPos, keys) of
INSERT_NEW_CHILD insIdx =>
insertNewChild (keys, insIdx, insKey, children, parentConstructor)
| FOUND_INSERT_POS insIdx =>
foundInsertPos
(keys, children, keyPos, insKey, insIdx, trie, parentConstructor)
| APPEND_NEW_CHILD =>
appendNewChild (keys, insKey, children, parentConstructor))
end
and helpInsert (insKey, keyPos, trie) : t =
case trie of
FOUND =>
if keyPos = String.size insKey then
FOUND
else
FOUND_WITH_CHILDREN
{ keys = Vector.fromList [insKey]
, children = Vector.fromList [FOUND]
}
| CHILDREN {keys, children} =>
helpInsertChildren (insKey, keyPos, keys, children, trie, CHILDREN)
| FOUND_WITH_CHILDREN {keys, children} =>
helpInsertChildren
(insKey, keyPos, keys, children, trie, FOUND_WITH_CHILDREN)
fun insert (insKey, trie) =
if String.size insKey > 0 then
if isEmpty trie then fromString insKey else helpInsert (insKey, 0, trie)
else
trie
fun helpAddList (str, acc) = insert (str, acc)
fun addList (lst, trie) =
List.foldl helpAddList trie lst
fun fromList (hd :: tl) =
let val trie = fromString hd
in addList (tl, trie)
end
| fromList ([]) = empty
datatype remove_result = UNCHANGED | MADE_EMPTY | CHANGED of t
(* should be called when there is a FULL_SEARCH_MATCH
* and child is a terminal FOUND node *)
fun removeWhenChildIsMadeEmpty
(idx, keys, children, isFoundWithChildren, parentConstructor) =
(* if child was made empty, then:
* - if the parent only has 1 child, it should be MADE_EMPTY too
* - otherwise, just remove the key and child at this idx from parent
* *)
if Vector.length keys > 1 then
if idx > 0 then
let
val newKeys = Vector.tabulate (Vector.length keys - 1, fn keyIdx =>
Vector.sub (keys, if keyIdx >= idx then keyIdx - 1 else keyIdx))
val newChildren =
Vector.tabulate (Vector.length keys - 1, fn childIdx =>
Vector.sub
(children, if childIdx >= idx then childIdx - 1 else childIdx))
val newNode =
parentConstructor {keys = newKeys, children = newChildren}
in
CHANGED newNode
end
else
(* if idx = 0, then have to slice first element off from vector *)
let
val keySlice = VectorSlice.slice (keys, 1, SOME
(Vector.length keys - 1))
val newKeys = VectorSlice.vector keySlice
val childrenSlice = VectorSlice.slice (children, 1, SOME
(Vector.length children - 1))
val newChildren = VectorSlice.vector childrenSlice
val newNode =
parentConstructor {keys = newKeys, children = newChildren}
in
CHANGED newNode
end
else (* if the caller was from the FOUND_WITH_CHILDREN case,
* then, instead of deleting this node entirely,
* we only delete this node's children, and mark this node
* as found.
* However, in general case where this is a CHILDREN node
* whose key was not inserted into the trie,
* we should fully delete this node as well. *) if isFoundWithChildren then
CHANGED FOUND
else
MADE_EMPTY
(* should be called when searchKeyMatch returns FULL_MATCH
*in helpRemove function *)
fun removeWhenFullMatch
(idx, keys, children, isFoundWithChildren, parentConstructor) =
(* matching over the child at this idx *)
case Vector.sub (children, idx) of
(* CHILDREN is a not-found case, so have to leave parent unchanged
* as there is no key to delete. *)
CHILDREN _ => UNCHANGED
(* FOUND_WITH_CHILDREN is a found case containing links to other nodes
* so we just need to change the tag from FOUND_WITH_CHILDREN to CHILDREN *)
| FOUND_WITH_CHILDREN {keys = childKeys, children = childChildren} =>
let
val newChild = CHILDREN {keys = childKeys, children = childChildren}
val newParentChildren =
Vector.mapi
(fn (mapIdx, elt) => if mapIdx <> idx then elt else newChild)
children
val newParent =
parentConstructor {keys = keys, children = newParentChildren}
in
CHANGED newParent
end
| FOUND =>
removeWhenChildIsMadeEmpty
(idx, keys, children, isFoundWithChildren, parentConstructor)
fun removeWhenSearchKeyContainsTrieKey
(childResult, idx, keys, children, isFoundWithChildren, parentConstructor) =
case childResult of
(* if result is UNCHANGED, let UNCHANGED bubble to the top.
* At the top, can return same trie given as input as there was no
* change. *)
UNCHANGED => UNCHANGED
(* if child was changed, allocate new vector where child at this idx
* is updated with newChild, and use it in parent node. *)
| CHANGED newChild =>
let
val newChildren =
Vector.mapi
(fn (childIdx, elt) => if idx <> childIdx then elt else newChild)
children
val newNode = parentConstructor {keys = keys, children = newChildren}
in
CHANGED newNode
end
| MADE_EMPTY =>
removeWhenChildIsMadeEmpty
(idx, keys, children, isFoundWithChildren, parentConstructor)
fun helpRemoveChildren
(removeKey, keyPos, keys, children, isFoundWithChildren, parentConstructor) =
let
val findChr = String.sub (removeKey, keyPos)
in
(case findBinSearch (findChr, keyPos, keys) of
SOME idx =>
let
val trieKey = Vector.sub (keys, idx)
in
(case searchKeyMatch (removeKey, trieKey, keyPos + 1) of
(* no search match means nothing to delete *)
NO_SEARCH_MATCH => UNCHANGED
| FULL_SEARCH_MATCH =>
removeWhenFullMatch
( idx
, keys
, children
, isFoundWithChildren
, parentConstructor
)
| SEARCH_KEY_CONTAINS_TRIE_KEY =>
removeWhenSearchKeyContainsTrieKey
( helpRemove
( removeKey
, String.size trieKey
, Vector.sub (children, idx)
)
, idx
, keys
, children
, isFoundWithChildren
, parentConstructor
)
| TRIE_KEY_CONTAINS_SEARCH_KEY => UNCHANGED)
end
| NONE => UNCHANGED)
end
and helpRemove (removeKey, keyPos, trie) =
case trie of
CHILDREN {keys, children} =>
helpRemoveChildren (removeKey, keyPos, keys, children, false, CHILDREN)
| FOUND_WITH_CHILDREN {keys, children} =>
helpRemoveChildren
(removeKey, keyPos, keys, children, true, FOUND_WITH_CHILDREN)
| FOUND =>
(*
* This case should only occur if we recurse in a node
* when there is a partial, but not full, string match.
* Since there was no full string match,
* key doesn't exist in trie and so we must leave it unchanged.
*)
UNCHANGED
fun remove (removeKey, trie) =
if String.size removeKey = 0 orelse isEmpty trie then
trie
else
case helpRemove (removeKey, 0, trie) of
CHANGED trie => trie
| MADE_EMPTY => empty
| UNCHANGED => trie
end

View File

@@ -0,0 +1,202 @@
(* An attempt at implementing terneary search trees,
* except that each node contains a full string instead of a char *)
structure TernaryStringSet =
struct
datatype t =
NODE of {left: t, key: string, follow: t, right: t}
(* follow is if insString contains nodeString *)
| FOUND_NODE of {left: t, key: string, follow: t, right: t}
| LEAF of string
| EMPTY
val empty = EMPTY
fun isEmpty t = t = EMPTY
fun fromString str =
if String.size str > 0 then LEAF str else EMPTY
val nodeStringContainsInsString = ~1
val insStringContainsNodeString = ~2
fun getBreakPos (pos, insString, nodeString) =
if pos = String.size insString andalso pos = String.size nodeString then
pos
else if pos = String.size insString then
(* if nodeString contains insString, return ~1 *)
nodeStringContainsInsString
else if pos = String.size nodeString then
(* if insString contains nodeString, return ~2 *)
insStringContainsNodeString
else
let
val insChr = String.sub (insString, pos)
val nodeChr = String.sub (nodeString, pos)
in
if insChr = nodeChr then
(* continue *)
getBreakPos (pos + 1, insString, nodeString)
else
(* return break position *)
pos
end
fun helpIns (pos, insString, t) =
case t of
NODE {left, key = nodeString, follow, right} =>
let
val breakPos = getBreakPos (pos, insString, nodeString)
in
if breakPos = String.size insString then
(* change node tag *)
FOUND_NODE
{left = left, key = nodeString, follow = follow, right = right}
else if breakPos = nodeStringContainsInsString then
(* add new node at current position,
* making current node a child of the new node *)
FOUND_NODE
{left = EMPTY, key = insString, follow = t, right = EMPTY}
else if breakPos = insStringContainsNodeString then
(* follow *)
let
val follow = helpIns (breakPos, insString, follow)
in
NODE
{left = left, key = nodeString, follow = follow, right = right}
end
else
(* we have a difference: do we want to go left or right? *)
let
val insChr = String.sub (insString, breakPos)
val nodeChr = String.sub (nodeString, breakPos)
in
if insChr < nodeChr then
let
val left = helpIns (breakPos, insString, left)
in
NODE
{ left = left
, key = nodeString
, follow = follow
, right = right
}
end
else
let
val right = helpIns (breakPos, insString, right)
in
NODE
{ left = left
, key = nodeString
, follow = follow
, right = right
}
end
end
end
| FOUND_NODE {left, key = nodeString, follow, right} =>
let
val breakPos = getBreakPos (pos, insString, nodeString)
in
if breakPos = String.size insString then
(* return original tree as nothing to do, since key already exists *)
t
else if breakPos = nodeStringContainsInsString then
(* add new node at current position,
* making current node a child of the new node *)
FOUND_NODE
{left = EMPTY, key = insString, follow = t, right = EMPTY}
else if breakPos = insStringContainsNodeString then
(* follow *)
let
val follow = helpIns (breakPos, insString, follow)
in
FOUND_NODE
{left = left, key = nodeString, follow = follow, right = right}
end
else
(* we have a difference: do we want to go left or right? *)
let
val insChr = String.sub (insString, breakPos)
val nodeChr = String.sub (nodeString, breakPos)
in
if insChr < nodeChr then
let
val left = helpIns (breakPos, insString, left)
in
FOUND_NODE
{ left = left
, key = nodeString
, follow = follow
, right = right
}
end
else
let
val right = helpIns (breakPos, insString, right)
in
FOUND_NODE
{ left = left
, key = nodeString
, follow = follow
, right = right
}
end
end
end
| EMPTY => LEAF insString
| LEAF nodeString =>
let
val breakPos = getBreakPos (pos, insString, nodeString)
in
if breakPos = String.size insString then
(* no change as user tried to insert string that already exists *)
t
else if breakPos = nodeStringContainsInsString then
(* add new node at current position,
* making current node a child of the new node *)
FOUND_NODE
{left = EMPTY, key = insString, follow = t, right = EMPTY}
else if breakPos = insStringContainsNodeString then
(* transform this node to a FOUND_NODE,
* and add a new LEAF at follow position *)
let
val newLeaf = LEAF insString
in
FOUND_NODE
{ key = nodeString
, follow = newLeaf
, left = EMPTY
, right = EMPTY
}
end
else
(* we have a difference: break string and add a new NODE.
* The way we insert has implications for searching.
* Search should descend down on all three children
* if searchKey contains nodeKey.
* *)
let
val breakString = String.substring (insString, 0, breakPos)
val insChr = String.sub (insString, breakPos)
val nodeChr = String.sub (nodeString, breakPos)
in
if insChr < nodeChr then
(* insString on left *)
NODE
{ left = LEAF insString
, right = t
, key = breakString
, follow = EMPTY
}
else
(* insString on right *)
NODE
{ left = t
, right = LEAF insString
, key = breakString
, follow = EMPTY
}
end
end
end

View File

@@ -0,0 +1,4 @@
$(SML_LIB)/basis/basis.mlb
../src/string-set.sml
string-set-tests.sml

View File

@@ -0,0 +1,209 @@
structure StringSetTests =
struct
fun assertTrue (isTrue, msg) =
if isTrue then ()
else
(print (msg ^ "\n"); raise Empty)
fun assertFalse (isFalse, msg) = assertTrue (not isFalse, msg)
(* below tests ported from https://github.com/kpol/trie/blob/master/src/KTrie.Tests/TrieTests.cs *)
fun testExists () =
let
val trie = StringSet.fromList ["abc", "abde"]
val _ = assertTrue (StringSet.exists ("abc", trie), "abc should exist")
val _ = assertTrue (StringSet.exists ("abde", trie), "abde should exist")
val _ = assertFalse (StringSet.exists ("a", trie), "a should not exist")
val _ = assertFalse (StringSet.exists ("ab", trie), "ab should not exist")
val _ = assertFalse (StringSet.exists ("abcd", trie), "abcd should not exist")
val _ = assertFalse (StringSet.exists ("abce", trie), "abce should not exist")
val _ = assertFalse (StringSet.exists ("x", trie), "x should not exist")
in
print "StringSet.exists passed all tests\n"
end
fun testGetPrefixList () =
let
val trie = StringSet.fromList ["abc", "abde", "abx", "abxx"]
val aMatches = StringSet.getPrefixList ("a", trie) = ["abc", "abde", "abx", "abxx"]
val _ = assertTrue (aMatches, "a matches")
val xMatches = StringSet.getPrefixList ("x", trie) = []
val _ = assertTrue (xMatches, "x matches")
val abMatches = StringSet.getPrefixList ("ab", trie) = ["abc", "abde", "abx", "abxx"]
val _ = assertTrue (abMatches, "ab matches")
val abcMatches = StringSet.getPrefixList ("abc", trie) = ["abc"]
val _ = assertTrue (abcMatches, "abc matches")
val abccMatches = StringSet.getPrefixList ("abcc", trie) = []
val _ = assertTrue (abccMatches, "abcc matches")
val abxMatches = StringSet.getPrefixList ("abx", trie) = ["abx", "abxx"]
val _ = assertTrue (abxMatches, "abx matches")
in
print "StringSet.getPrefixList passed all tests\n"
end
fun remove1 () =
let
val trie = StringSet.fromList ["a", "ab", "abc"]
val _ = assertTrue (StringSet.exists ("a", trie), "a exists before remove1")
val trie = StringSet.remove ("a", trie)
val _ = assertFalse (StringSet.exists ("a", trie), "a does not exist after remove1")
val _ = assertTrue (StringSet.exists ("ab", trie), "ab still exists after remove1")
val _ = assertTrue (StringSet.exists ("abc", trie), "abc still exists after remove1")
in
print "StringSet.remove: passed remove1\n"
end
fun remove2 () =
let
val trie = StringSet.fromList ["a", "ab", "abc", "abd"]
val _ = assertTrue (StringSet.exists ("ab", trie), "ab exists before remove2")
val trie = StringSet.remove ("ab", trie)
val _ = assertFalse (StringSet.exists ("ab", trie), "ab no longer exists after remove2")
val _ = assertTrue (StringSet.exists ("a", trie), "remove2 contains a")
val _ = assertTrue (StringSet.exists ("abc", trie), "remove2 contains abc")
val _ = assertTrue (StringSet.exists ("abd", trie), "remove2 contains abd")
in
print "StringSet.remove: passed remove2\n"
end
fun remove3 () =
let
val trie = StringSet.fromList ["abc"]
val _ = assertTrue (StringSet.exists ("abc", trie), "abc exists before remove3")
val trie = StringSet.remove ("abc", trie)
val _ = assertFalse (StringSet.exists ("abc", trie), "abc no longer exists after remove3")
val _ = assertTrue (StringSet.isEmpty trie, "trie is empty after remove3")
val trie = StringSet.insert ("abc", trie)
val _ = assertTrue (StringSet.exists ("abc", trie), "abc exists after insertion in remove3")
in
print "StringSet.remove: passed remove3\n"
end
fun remove4 () =
let
val trie = StringSet.fromList ["abc", "abcd"]
val _ = assertTrue (StringSet.exists ("abc", trie), "abc exists before remove4")
val trie = StringSet.remove ("abc", trie)
val _ = assertTrue (StringSet.exists ("abcd", trie), "abcd still exists after removing abc in remove4")
val _ = assertFalse (StringSet.exists ("abc", trie), "abc doesn't exist after remove in remove4")
in
print "StringSet.remove: passed remove4\n"
end
fun remove5 () =
let
val trie = StringSet.fromList ["abc", "ab", "ade", "abcde", "x"]
val trie2 = StringSet.remove ("xy", trie)
val _ = assertTrue (trie = trie2, "removing key (xy) which doesn't exist in trie returns same trie")
val trie3 = StringSet.remove ("abcd", trie)
val _ = assertTrue (trie = trie3, "removing key (abcd) which doesn't exist in trie returns same trie")
val _ = assertTrue (StringSet.exists ("abcde", trie), "abcde exists before remove in remove5")
val trie = StringSet.remove ("abcde", trie)
val _ = assertFalse (StringSet.exists ("abcde", trie), "abcde does not exist after remove in remove5")
val _ = assertTrue (StringSet.exists ("x", trie), "x exists before remove in remove5")
val trie = StringSet.remove ("x", trie)
val _ = assertFalse (StringSet.exists ("x", trie), "x does not exist after remove in remove5")
val _ = assertTrue (StringSet.exists ("abc", trie), "abc exists before remove in remove5")
val trie = StringSet.remove ("abc", trie)
val _ = assertFalse (StringSet.exists ("abc", trie), "abc does not exist after remove in remove5")
val _ = assertTrue (StringSet.exists ("ab", trie), "trie still contains ab after removals in remove5")
val _ = assertTrue (StringSet.exists ("ade", trie), "trie still contains ade after removals in remove5")
in
print "StringSet.remove: passed remove5\n"
end
fun insert1 () =
let
val trie = StringSet.empty
val _ = assertFalse (StringSet.exists ("abc", trie), "abc does not exist before it is added to trie")
val trie = StringSet.insert ("abc", trie)
val _ = assertTrue (StringSet.exists ("abc", trie), "abc exists after being addedd to trie")
in
print "StringSet.insert: passed insert1\n"
end
fun insert2 () =
let
val trie = StringSet.fromList ["abc"]
val trie2 = StringSet.insert ("abc", trie)
val _ = assertTrue (trie = trie2, "trie handles duplicate insertion by returning same trie")
in
print "StringSet.insert: passed insert2\n"
end
fun insert3 () =
let
val trie = StringSet.empty
val _ = assertFalse (StringSet.exists ("abcd", trie), "empty trie does not contain abcd")
val trie = StringSet.insert ("abcd", trie)
val _ = assertTrue (StringSet.exists ("abcd", trie), "abcd exists after inserting it into empty trie")
val _ = assertFalse (StringSet.exists ("abc", trie), "abc does not exist before inserting it")
val trie = StringSet.insert ("abc", trie)
val _ = assertTrue (StringSet.exists ("abc", trie), "abc exists after inserting it into trie")
val trie = StringSet.remove ("abc", trie)
val _ = assertFalse (StringSet.exists ("abc", trie), "abc does not exist after removing it from trie")
in
print "StringSet.insert: passed insert3\n"
end
fun getPrefixList () =
let
val trie = StringSet.fromList ["z", "xc", "x", "zzz", "abc", "abcde"]
val lst = StringSet.getPrefixList ("ab", trie)
val _ = assertTrue (lst = ["abc", "abcde"], "prefix list contains abc and abcde when prefix is 'ab'")
in
print "StringSet.getPrefixList: passed\n"
end
fun run () =
let
val _ = testExists ()
val _ = testGetPrefixList ()
val _ = remove1 ()
val _ = remove2 ()
val _ = remove3 ()
val _ = remove4 ()
val _ = remove5 ()
val _ = insert1 ()
val _ = insert2 ()
val _ = insert3 ()
val _ = getPrefixList ()
in
()
end
end
val _ = StringSetTests.run ()