sml-projects

Author	SHA1	Message	Date
Humza Shahid	060df2745a	fix bugs: only wildcard and character-class-negation should check to see if curChr is an endmarker	2025-10-07 14:30:23 +01:00
Humza Shahid	c62e234d00	change dfa-gen to a functor, and use functor to instantiate different structures	2025-10-07 14:05:45 +01:00
Humza Shahid	075fec02be	handle edge case in char range: escaped char followed by another escaped char	2025-10-07 12:28:14 +01:00
Humza Shahid	4dfee016eb	handle edge case in char-range: in a range like a-z, the second character may be an escape sequence, and we need to handle that case if so	2025-10-07 12:13:41 +01:00
Humza Shahid	44c2fbb3c7	handle character ranges like a-z in character class and negated character class	2025-10-07 09:48:10 +01:00
Humza Shahid	3d4dbdda69	done adding functionality for parsing character classes	2025-10-07 09:31:24 +01:00
Humza Shahid	8eed2ef51a	add support for [^negated_character_classes], although we don't parse them yet	2025-10-07 08:57:01 +01:00
Humza Shahid	d6142285da	add handling for [character class] type (but note that we don't parse a character class yet)	2025-10-07 08:51:46 +01:00
Humza Shahid	56658a4a70	only convert char to int in dfa-gen.sml's 'convertChar' loop	2025-10-07 08:43:15 +01:00
Humza Shahid	ad92dadd34	pull in new version of brolib-sml, which handles edge case for LineGap.delete	2025-10-06 22:58:09 +01:00
Humza Shahid	71786a494c	fix minor bug with escape sequences: we should pattern match on an unescaped char, and we should return an escaped char. For example, it makes sense to pattern match on plain unescaped #"n" and return #"\n". This is because user inputs escape-chars as a two-char sequence, prepended by a backslash \ character	2025-10-06 21:58:50 +01:00
Humza Shahid	cc5c0bf95c	implement escape sequences for regex	2025-10-06 21:44:57 +01:00
Humza Shahid	0bcb3e1dfc	done implementing '+' and '?' regex operators	2025-10-06 20:41:52 +01:00
Humza Shahid	dcd930f855	begin expanding ? and + regex symbols, which we can represent using a combination of the others	2025-10-06 17:08:57 +01:00
Humza Shahid	9dd44c8eca	fix implementation of ZERO_OR_MORE (Kleene star) in dfa-gen.sml	2025-10-06 14:45:28 +01:00
Humza Shahid	2779b61c1f	amendment to 'lastpos' function: if right child is not nullable, then get lastpos of right child, or else get union of them both	2025-10-06 13:50:34 +01:00
Humza Shahid	e05c690548	fix bug with implementation of wildcard: we don't want to match a wildcard if the character we are getting follow-positions for has an ASCII code of 0, because we are using that as an endmarker	2025-10-06 12:12:23 +01:00
Humza Shahid	ea01f1689c	fix bug in search-list.sml: when we find a match, we should start 1 idx after the end position of the match	2025-10-06 11:58:03 +01:00
Humza Shahid	cca2602429	fix bug in implementation of DFA algorithm: we need to add an end marker, and this will be used to tell us whether we have reached the final state in the DFA	2025-10-06 11:49:10 +01:00
Humza Shahid	3f30d49420	progress using dfa for searching	2025-10-06 09:55:05 +01:00
Humza Shahid	626aa0a860	add utility functions for using generated dfa	2025-10-06 09:06:04 +01:00
Humza Shahid	f554c0db29	change 'dtran' set to only contain integers indicating the index from dstates to transition to on char	2025-10-06 08:21:04 +01:00
Humza Shahid	a3287e71b9	take care of todo note addressing efficiency: don't update dtran vector on each 'convertChar' loop, but accumulate set and then append set to end of dtran at end of 'convertChar' loop	2025-10-06 08:11:30 +01:00
Humza Shahid	6ae38189cf	previously, dtran was a {states: int list, transitions: set} record, but because the states are the exact same as the information in dstates (at same position too), we changed dtran to contain only the transitions	2025-10-06 07:53:05 +01:00
Humza Shahid	c995d3cdf7	if we encounter an empty state when getting follow positions, skip to next char	2025-10-06 07:44:46 +01:00
Humza Shahid	303bcdf23d	fix type errors	2025-10-05 20:27:48 +01:00
Humza Shahid	988ef22e75	first pass implementing 'convertChar' function	2025-10-05 20:19:26 +01:00
Humza Shahid	ecdf642f13	progress with 'get-follow-positions-of-each-char' loop	2025-10-05 15:31:11 +01:00
Humza Shahid	01fed05c87	remove functions which will soon be dead code, and cause code which uses them to be stubbed out	2025-10-05 14:45:36 +01:00
Humza Shahid	d3795c771a	implement a function which descends down to a particular position, and then computes followpos: there were previously two separate functions performing these two tasks	2025-10-05 12:04:20 +01:00
Humza Shahid	7e2021be24	tiny changes to dfa-gen.sml to make it more presentable when asking for advice	2025-10-03 07:29:28 +01:00
Humza Shahid	0696d7ed52	make 'dstates' in 'Nfa.ToDfa.convert' function a vector, rather than a list, and make sure we append to the end each time we add	2025-10-03 05:54:48 +01:00
Humza Shahid	2de40a09c7	add code to get all transitions in DFA	2025-10-03 05:17:13 +01:00
Humza Shahid	ff80db1176	progress implementing conversion of regex to a DFA	2025-10-02 13:54:59 +01:00
Humza Shahid	1c107b0d72	add function to get path to a particular position, for the sake of finding followpos of a particular node	2025-10-02 05:00:23 +01:00
Humza Shahid	dfb9153896	annotate CONCAT and ALTERNATION nodes with max states of left and right position during parsing. This makes it easier to find a given state.	2025-10-02 04:34:16 +01:00
Humza Shahid	b9c20c43aa	fix parse error with stateNum numbering: should only increment stateNum in computeAtom function, and never anywhere else	2025-10-01 14:19:27 +01:00
Humza Shahid	b3f56dfaff	add implementation for followpos	2025-10-01 14:10:40 +01:00
Humza Shahid	61f839641f	fix implementation of 'lastpos', which should return the lastpos of the right child in a CONCAT node, if the right child is not nullable, or else should return the union of lastpos for the left and right child both	2025-10-01 14:06:41 +01:00
Humza Shahid	dddb459d93	remove todo note which has become outdated as of the previous commit	2025-10-01 13:49:02 +01:00
Humza Shahid	169dcb5bf2	fix regex parsing by not considering grouping parens as an operator	2025-10-01 13:48:18 +01:00
Humza Shahid	6a98cddebe	added functions to compute firstpos, lastpos and nullable	2025-10-01 12:36:26 +01:00
Humza Shahid	7347437f17	change representation of alternation nodes and concatenation nodes to use tuples instead of lists, as the conventional algorithms use this representation	2025-10-01 12:17:35 +01:00
Humza Shahid	fd0ce5b22a	add function to compute if a given node is nullable	2025-10-01 11:52:45 +01:00
Humza Shahid	9584bca7ee	when parsing NFA, label position of leaves (each leaf is either a CHAR_LITERAL or a WILDCARD)	2025-10-01 11:23:41 +01:00
Humza Shahid	31f70a6748	remove nfa-matching code for the moment, and parse a simple regex tree without state information	2025-10-01 10:48:45 +01:00
Humza Shahid	774dba5c19	find bug and comment on it. We currently assume the first character in an NFA string is a CHAR_LITERAL, but it can be anything else, including a WILDCARD operator; we have to check what the chr is and decide. We probably want to take care of this later, so added a todo-note.	2025-09-30 14:25:45 +01:00
Humza Shahid	934fa729a9	parse and interpret wildcard character which is a dot .	2025-09-30 14:10:49 +01:00
Humza Shahid	b52b5ff28c	parse wildcard . character for NFA too	2025-09-30 14:05:39 +01:00
Humza Shahid	5fa784b4c6	refactor nfa.sml so that lists in CONCAT and ALTERNATION cases don't need the state to be tupled with the regex	2025-09-30 13:52:35 +01:00

895 Commits