784 Commits

SHA1 Message Date
cc5c0bf95c implement escape sequences for regex 2025-10-06 21:44:57 +01:00
0bcb3e1dfc done implementing '+' and '?' regex operators 2025-10-06 20:41:52 +01:00
dcd930f855 begin expanding ? and + regex symbols, which we can represent using a combination of the others 2025-10-06 17:08:57 +01:00
9dd44c8eca fix implementation of ZERO_OR_MORE (Kleene star) in dfa-gen.sml 2025-10-06 14:45:28 +01:00
2779b61c1f amendment to 'lastpos' function: if right child is not nullable, then get lastpos of right child, or else get union of them both 2025-10-06 13:50:34 +01:00
e05c690548 fix bug with implementation of wildcard: we don't want to match a wildcard if the character we are getting follow-positions for has an ASCII code of 0, because we are using that as an endmarker 2025-10-06 12:12:23 +01:00
ea01f1689c fix bug in search-list.sml: when we find a match, we should start 1 idx after the end position of the match 2025-10-06 11:58:03 +01:00
cca2602429 fix bug in implementation of DFA algorithm: we need to add an end marker, and this will be used to tell us whether we have reached the final state in the DFA 2025-10-06 11:49:10 +01:00
3f30d49420 progress using dfa for searching 2025-10-06 09:55:05 +01:00
626aa0a860 add utility functions for using generated dfa 2025-10-06 09:06:04 +01:00
f554c0db29 change 'dtran' set to only contain integers indicating the index from dstates to transition to on char 2025-10-06 08:21:04 +01:00
a3287e71b9 take care of todo note addressing efficiency: don't update dtran vector on each 'convertChar' loop, but accumulate set and then append set to end of dtran at end of 'convertChar' loop 2025-10-06 08:11:30 +01:00
6ae38189cf previously, dtran was a {states: int list, transitions: set} record, but because the states are the exact same as the information in dstates (at same position too), we changed dtran to contain only the transitions 2025-10-06 07:53:05 +01:00
c995d3cdf7 if we encounter an empty state when getting follow positions, skip to next char 2025-10-06 07:44:46 +01:00
303bcdf23d fix type errors 2025-10-05 20:27:48 +01:00
988ef22e75 first pass implementing 'convertChar' function 2025-10-05 20:19:26 +01:00
ecdf642f13 progress with 'get-follow-positions-of-each-char' loop 2025-10-05 15:31:11 +01:00
01fed05c87 remove functions which will soon be dead code, and cause code which uses them to be stubbed out 2025-10-05 14:45:36 +01:00
d3795c771a implement a function which descends down to a particular position, and then computes followpos: there were previously two separate functions performing these two tasks 2025-10-05 12:04:20 +01:00
7e2021be24 tiny changes to dfa-gen.sml to make it more presentable when asking for advice 2025-10-03 07:29:28 +01:00
0696d7ed52 make 'dstates' in 'Nfa.ToDfa.convert' function a vector, rather than a list, and make sure we append to the end each time we add 2025-10-03 05:54:48 +01:00
2de40a09c7 add code to get all transitions in DFA 2025-10-03 05:17:13 +01:00
ff80db1176 progress implementing conversion of regex to a DFA 2025-10-02 13:54:59 +01:00
1c107b0d72 add function to get path to a particular position, for the sake of finding followpos of a particular node 2025-10-02 05:00:23 +01:00
dfb9153896 annotate CONCAT and ALTERNATION nodes with max states of left and right position during parsing. This makes it easier to find a given state. 2025-10-02 04:34:16 +01:00
b9c20c43aa fix parse error with stateNum numbering: should only increment stateNum in computeAtom function, and never anywhere else 2025-10-01 14:19:27 +01:00
b3f56dfaff add implementation for followpos 2025-10-01 14:10:40 +01:00
61f839641f fix implementation of 'lastpos', which should return the lastpos of the right child in a CONCAT node, if the right child is not nullable, or else should return the union of lastpos for the left and right child both 2025-10-01 14:06:41 +01:00
dddb459d93 remove todo note which has become outdated as of the previous commit 2025-10-01 13:49:02 +01:00
169dcb5bf2 fix regex parsing by not considering grouping parens as an operator 2025-10-01 13:48:18 +01:00
6a98cddebe added functions to compute firstpos, lastpos and nullable 2025-10-01 12:36:26 +01:00
7347437f17 change representation of alternation nodes and concatenation nodes to use tuples instead of lists, as the conventional algorithms use this representation 2025-10-01 12:17:35 +01:00
fd0ce5b22a add function to compute if a given node is nullable 2025-10-01 11:52:45 +01:00
9584bca7ee when parsing NFA, label position of leaves (each leaf is either a CHAR_LITERAL or a WILDCARD) 2025-10-01 11:23:41 +01:00
31f70a6748 remove nfa-matching code for the moment, and parse a simple regex tree without state information 2025-10-01 10:48:45 +01:00
774dba5c19 find bug and comment on it. We currently assume the first character in an NFA string is a CHAR_LITERAL, but it can be anything else, including a WILDCARD operator; we have to check what the chr is and decide. We probably want to take care of this later, so added a todo-note. 2025-09-30 14:25:45 +01:00
934fa729a9 parse and interpret wildcard character which is a dot . 2025-09-30 14:10:49 +01:00
b52b5ff28c parse wildcard . character for NFA too 2025-09-30 14:05:39 +01:00
5fa784b4c6 refactor nfa.sml so that lists in CONCAT and ALTERNATION cases don't need the state to be tupled with the regex 2025-09-30 13:52:35 +01:00
45fbd85183 move buffer around when calling 'SearchList.buildRange' 2025-09-30 05:40:57 +01:00
e03eecf940 use LineGap.sub instead of LineGap.substring, as the former function is now fixed 2025-09-30 05:30:11 +01:00
b35d045a09 fix bugs in implementation for 'Nfa.getMatchesInRange' 2025-09-29 22:57:19 +01:00
d37e510b24 progress fixing backtracking 2025-09-29 21:29:03 +01:00
64c16a7c25 fix bug with shadowing 'finishIdx' value, when we still wanted access to both the previous and the new 'finishIdx' 2025-09-29 21:21:06 +01:00
df78e20cb7 fix bug in 'Nfa.getMatches' loop function: when we find that this state is valid, continue loop from 'finishIdx + 1'. 2025-09-29 21:07:02 +01:00
665497cf46 fix all remaining type errors 2025-09-29 15:06:33 +01:00
fd321c2f14 fix some type errors 2025-09-29 15:02:40 +01:00
8f49cdca13 fix type errors in normal-mode-text-builder.sml 2025-09-29 14:55:20 +01:00
d44799a794 fix some type errors in the code 2025-09-29 14:49:50 +01:00
8ba16daf7a add function to persistent-vector.sml to check if we are in a specific range 2025-09-29 14:29:43 +01:00