diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..ccb043b
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,5 @@
+Copyright (C) 2024 by Humza Shahid
+
+Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.
+
+THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..530f18a
--- /dev/null
+++ b/README.md
@@ -0,0 +1,53 @@
+# string-trie
+
+This repository implements a set over strings in Standard ML using a trie/prefix tree.
+
+The signature provided is:
+
+```
+signature STRING_SET =
+sig
+  (* the type of tries *)
+  type t
+
+  (* the empty trie *)
+  val empty: t
+
+  (* returns true if the trie is empty *)
+  val isEmpty: t -> bool
+
+  (* creates a trie containing just a string *)
+  val fromString: string -> t
+
+  (* returns true if the key was inserted into the trie *)
+  val exists: string * t -> bool
+
+  (* inserts a new string into the trie, returning a new trie *)
+  val insert: string * t -> t
+
+  (* removes the key from the trie, returning a new trie *)
+  val remove: string * t -> t
+
+  (* returns a list of all keys matching the specified prefix *)
+  val getPrefixList: string * t -> string list
+
+  (* returns a list containing all keys in the trie *)
+  val toList: t -> string list
+
+  (* returns a trie containing all keys in the string list *)
+  val fromList: string list -> t
+end
+```
+
+The reason for implementing a new trie specialised to strings rather than using Chris Okasaki's IntMap data
structure is to enable prefix searching, where it is possible to get a list of all keys matching a certain prefix.
+
+# To-do
+
+- [ ] Add `foldl`, `foldr`, `foldlWithPrefix`, `foldrWithPrefix` functions to string set
+- [ ] Benchmarks (possibly comparing to a set of strings in a balanced binary tree)
+- [ ] Use unrolled linked list with zipper in trie, to limit size of vector as allocating large vectors repeatedly is expensive
+- [ ] Implement StringMap, containing both keys and values
+
+# Credits
+
+The tests in `tests/string-set-tests.sml` were ported from [kpol's Trie data structure in C#](https://github.com/kpol/trie); the files in the `src/` directory were not.
diff --git a/src/string-set.sml b/src/string-set.sml
index 8c3be90..3d31ea1 100644
--- a/src/string-set.sml
+++ b/src/string-set.sml
@@ -1,24 +1,34 @@
 signature STRING_SET =
 sig
+  (* the type of tries *)
   type t
 
+  (* the empty trie *)
   val empty: t
 
+  (* returns true if the trie is empty *)
   val isEmpty: t -> bool
 
+  (* creates a trie containing just a string *)
   val fromString: string -> t
 
+  (* returns true if the key was inserted into the trie *)
   val exists: string * t -> bool
 
-  val getPrefixList: string * t -> string list
-
+  (* inserts a new string into the trie, returning a new trie *)
   val insert: string * t -> t
 
+  (* removes the key from the trie, returning a new trie *)
   val remove: string * t -> t
 
-  val fromList: string list -> t
+  (* returns a list of all keys matching the specified prefix *)
+  val getPrefixList: string * t -> string list
 
+  (* returns a list containing all keys in the trie *)
   val toList: t -> string list
+
+  (* returns a trie containing all keys in the string list *)
+  val fromList: string list -> t
 end
 
 structure StringSet: STRING_SET =
@@ -558,13 +568,6 @@ struct
     end
     | fromList ([]) = empty
 
-  (*
-   * todo:
-   * - Add removal functionality to remove a key from the list,
-   *   or to mark it is non-found if the key is a prefix
-   *   of other children.
-   *)
-
   datatype remove_result = UNCHANGED | MADE_EMPTY