f085860f20
begin making changes to return a parse error if regex string contains an end marker
Humza Shahid2025-10-07 14:36:35 +01:00
060df2745a
fix bugs: only wildcard and character-class-negation should check to see if curChr is an endmarker
Humza Shahid2025-10-07 14:30:23 +01:00
c62e234d00
change dfa-gen to a functor, and use functor to instantiate different structures
Humza Shahid2025-10-07 14:05:45 +01:00
075fec02be
handle edge case in char range: escaped char followed by another escaped char
Humza Shahid2025-10-07 12:28:14 +01:00
4dfee016eb
handle edge case in char-range: in a range like a-z, the second character may be an escape sequence, and we need to handle that case if so
Humza Shahid2025-10-07 12:13:41 +01:00
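The range handling described in the two commits above might look roughly like the sketch below. It is only an illustration: `unescape`, `readChar`, `range` and `expand` are hypothetical names, the escape set is assumed, and none of this is taken from the repository. The point is that both endpoints of a range go through the same escape-aware reader, so `a-z`, `a-\n` and `\t-\n` share one code path.

```sml
(* Map the char after a backslash to the char it denotes (assumed escape set). *)
fun unescape #"n" = #"\n"
  | unescape #"t" = #"\t"
  | unescape c = c

(* Read one class member from a char list, resolving an escape if present. *)
fun readChar (#"\\" :: c :: rest) = SOME (unescape c, rest)
  | readChar (#"\\" :: []) = NONE
  | readChar (c :: rest) = SOME (c, rest)
  | readChar [] = NONE

(* Inclusive list of chars from lo to hi; the caller ensures lo <= hi. *)
fun range (lo, hi) =
  List.tabulate
    (Char.ord hi - Char.ord lo + 1, fn i => Char.chr (Char.ord lo + i))

(* Expand the text between the brackets into the chars it denotes.
 * Returns NONE on a dangling backslash, a trailing '-', or a reversed range. *)
fun expand chars =
  case readChar chars of
    NONE => if null chars then SOME [] else NONE
  | SOME (lo, #"-" :: rest) =>
      (case readChar rest of
         SOME (hi, rest') =>
           if lo <= hi then
             Option.map (fn cs => range (lo, hi) @ cs) (expand rest')
           else
             NONE
       | NONE => NONE)
  | SOME (c, rest) => Option.map (fn cs => c :: cs) (expand rest)
```

For example, `expand (String.explode "a-c\\n")` reads the range `a-c` followed by an escaped newline, and `expand (String.explode "\\t-\\n")` reads a range whose two endpoints are both escape sequences.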
44c2fbb3c7
handle character ranges like a-z in character class and negated character class
Humza Shahid2025-10-07 09:48:10 +01:00
3d4dbdda69
done adding functionality for parsing character classes
Humza Shahid2025-10-07 09:31:24 +01:00
8eed2ef51a
add support for [^negated_character_classes], although we don't parse them yet
Humza Shahid2025-10-07 08:57:01 +01:00
d6142285da
add handling for [character class] type (but note that we don't parse a character class yet)
Humza Shahid2025-10-07 08:51:46 +01:00
56658a4a70
only convert char to int in dfa-gen.sml's 'convertChar' loop
Humza Shahid2025-10-07 08:43:15 +01:00
ad92dadd34
pull in new version of brolib-sml, which handles edge case for LineGap.delete
Humza Shahid2025-10-06 22:58:09 +01:00
56a469e578
handle edge case in line_gap.sml when deleting to the left: we sometimes need to delete to the end of the string, so add a branch handling that case
Humza Shahid2025-10-06 22:56:18 +01:00
71786a494c
fix minor bug with escape sequences: we should pattern match on an unescaped char, and we should return an escaped char. For example, it makes sense to pattern match on plain unescaped #"n" and return #"\n". This is because the user inputs escape chars as a two-char sequence, prefixed by a backslash \ character
Humza Shahid2025-10-06 21:58:50 +01:00
dcd930f855
begin expanding ? and + regex symbols, which we can represent using a combination of the others
Humza Shahid2025-10-06 17:08:57 +01:00
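One way to read "a combination of the others" is the usual pair of rewrites: `r?` becomes `(r|ε)` and `r+` becomes `rr*`. The sketch below assumes a hypothetical `regex` datatype with an `EPSILON` constructor; the constructors in dfa-gen.sml may differ, and `zeroOrOne` and `oneOrMore` are illustrative names.

```sml
(* Hypothetical AST; the real constructors in the repository may differ. *)
datatype regex =
    EPSILON
  | CHAR_LITERAL of char
  | CONCAT of regex * regex
  | ALTERNATION of regex * regex
  | ZERO_OR_MORE of regex

(* r? matches either r or the empty string. *)
fun zeroOrOne r = ALTERNATION (r, EPSILON)

(* r+ matches one r followed by any number of further r's. *)
fun oneOrMore r = CONCAT (r, ZERO_OR_MORE r)

val aPlus = oneOrMore (CHAR_LITERAL #"a")
val aOpt = zeroOrOne (CHAR_LITERAL #"a")
```

Here `aPlus` is `CONCAT (CHAR_LITERAL #"a", ZERO_OR_MORE (CHAR_LITERAL #"a"))`. Note that `oneOrMore` duplicates the subtree, so if leaves carry position numbers each copy would need its own positions.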
9dd44c8eca
fix implementation of ZERO_OR_MORE (Kleene star) in dfa-gen.sml
Humza Shahid2025-10-06 14:45:28 +01:00
2779b61c1f
amendment to 'lastpos' function: if right child is not nullable, then get lastpos of right child, or else get union of them both
Humza Shahid2025-10-06 13:50:34 +01:00
e05c690548
fix bug with implementation of wildcard: we don't want to match a wildcard if the character we are getting follow-positions for has an ASCII code of 0, because we are using that as an endmarker
Humza Shahid2025-10-06 12:12:23 +01:00
ea01f1689c
fix bug in search-list.sml: when we find a match, we should start 1 idx after the end position of the match
Humza Shahid2025-10-06 11:58:03 +01:00
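The fix above amounts to resuming the scan one index past the end of each match. A minimal sketch, assuming a hypothetical `matchAt` function that returns the inclusive end index of a non-empty match starting at a given index; `findAll` is not the search-list.sml code.

```sml
(* matchAt idx returns SOME finishIdx (inclusive) when a match starts at idx.
 * Assumes matches are non-empty, i.e. finishIdx >= idx. *)
fun findAll (matchAt : int -> int option, len : int) =
  let
    fun loop (idx, acc) =
      if idx >= len then
        List.rev acc
      else
        case matchAt idx of
          (* Found a match: record it and resume 1 idx after its end. *)
          SOME finishIdx => loop (finishIdx + 1, (idx, finishIdx) :: acc)
          (* No match here: try the next starting index. *)
        | NONE => loop (idx + 1, acc)
  in
    loop (0, [])
  end

(* Toy matcher for the pattern "aa" over the four-char string "aaaa". *)
fun matchAA idx = if idx + 1 < 4 then SOME (idx + 1) else NONE

val matches = findAll (matchAA, 4)
```

With this toy matcher, `matches` is `[(0, 1), (2, 3)]`: resuming at `finishIdx + 1` yields non-overlapping matches rather than also reporting `(1, 2)`.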
cca2602429
fix bug in implementation of DFA algorithm: we need to add an end marker, and this will be used to tell us whether we have reached the final state in the DFA
Humza Shahid2025-10-06 11:49:10 +01:00
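A rough sketch of the end-marker idea follows, assuming DFA states are sets of leaf positions, kept here as plain int lists. The datatype and the names `augment` and `isFinal` are hypothetical. A later commit in this log mentions ASCII code 0 being used as the end marker, which is what `endMarker` assumes.

```sml
(* Hypothetical AST: each leaf carries its position number. *)
datatype regex =
    CHAR_LITERAL of char * int
  | CONCAT of regex * regex

(* Assumed end marker: the char with ASCII code 0. *)
val endMarker = Char.chr 0

(* Augment the parsed regex r by concatenating an end-marker leaf,
 * giving that leaf the next free position number. *)
fun augment (r, nextPos) = CONCAT (r, CHAR_LITERAL (endMarker, nextPos))

(* A DFA state is final exactly when it contains the end marker's position. *)
fun isFinal (endPos, state : int list) =
  List.exists (fn p => p = endPos) state

val ab = CONCAT (CHAR_LITERAL (#"a", 1), CHAR_LITERAL (#"b", 2))
val augmented = augment (ab, 3)
```

With the end marker at position 3, `isFinal (3, [1, 3])` is `true` and `isFinal (3, [1, 2])` is `false`.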
626aa0a860
add utility functions for using generated dfa
Humza Shahid2025-10-06 09:06:04 +01:00
f554c0db29
change 'dtran' set to only contain integers indicating the index from dstates to transition to on char
Humza Shahid2025-10-06 08:21:04 +01:00
a3287e71b9
take care of todo note addressing efficiency: don't update dtran vector on each 'convertChar' loop, but accumulate set and then append set to end of dtran at end of 'convertChar' loop
Humza Shahid2025-10-06 08:11:30 +01:00
6ae38189cf
previously, dtran was a {states: int list, transitions: set} record, but because the states are the exact same as the information in dstates (at same position too), we changed dtran to contain only the transitions
Humza Shahid2025-10-06 07:53:05 +01:00
c995d3cdf7
if we encounter an empty state when getting follow positions, skip to next char
Humza Shahid2025-10-06 07:44:46 +01:00
988ef22e75
first pass implementing 'convertChar' function
Humza Shahid2025-10-05 20:19:26 +01:00
ecdf642f13
progress with 'get-follow-positions-of-each-char' loop
Humza Shahid2025-10-05 15:31:11 +01:00
01fed05c87
remove functions which will soon be dead code, and stub out the code which uses them
Humza Shahid2025-10-05 14:45:36 +01:00
d3795c771a
implement a function which descends down to a particular position, and then computes followpos: there were previously two separate functions performing these two tasks
Humza Shahid2025-10-05 12:04:20 +01:00
7e2021be24
tiny changes to dfa-gen.sml to make it more presentable when asking for advice
Humza Shahid2025-10-03 07:29:28 +01:00
0696d7ed52
make 'dstates' in 'Nfa.ToDfa.convert' function a vector, rather than a list, and make sure we append to the end each time we add
Humza Shahid2025-10-03 05:54:48 +01:00
2de40a09c7
add code to get all transitions in DFA
Humza Shahid2025-10-03 05:17:13 +01:00
ff80db1176
progress implementing conversion of regex to a DFA
Humza Shahid2025-10-02 13:54:59 +01:00
1c107b0d72
add function to get path to a particular position, for the sake of finding followpos of a particular node
Humza Shahid2025-10-02 05:00:23 +01:00
dfb9153896
annotate CONCAT and ALTERNATION nodes with max states of left and right position during parsing. This makes it easier to find a given state.
Humza Shahid2025-10-02 04:34:16 +01:00
b9c20c43aa
fix parse error with stateNum numbering: should only increment stateNum in computeAtom function, and never anywhere else
Humza Shahid2025-10-01 14:19:27 +01:00
61f839641f
fix implementation of 'lastpos', which should return the lastpos of the right child in a CONCAT node, if the right child is not nullable, or else should return the union of lastpos for the left and right child both
Humza Shahid2025-10-01 14:06:41 +01:00
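The `lastpos` rule stated in the commit above can be sketched as follows. The datatype is a hypothetical simplification in which each leaf carries its position number, and positions are kept in plain lists. Because every leaf position is unique and each leaf occurs once in the tree, list append stands in for set union.

```sml
(* Hypothetical AST: each leaf carries its position number. *)
datatype regex =
    CHAR_LITERAL of char * int
  | CONCAT of regex * regex
  | ALTERNATION of regex * regex
  | ZERO_OR_MORE of regex

(* A node is nullable if it can match the empty string. *)
fun nullable (CHAR_LITERAL _) = false
  | nullable (CONCAT (l, r)) = nullable l andalso nullable r
  | nullable (ALTERNATION (l, r)) = nullable l orelse nullable r
  | nullable (ZERO_OR_MORE _) = true

(* Positions that can match the last char of a string generated by the node. *)
fun lastpos (CHAR_LITERAL (_, pos)) = [pos]
  | lastpos (CONCAT (l, r)) =
      (* Right child not nullable: only its last positions count.
       * Otherwise the string may end inside the left child too. *)
      if nullable r then lastpos l @ lastpos r else lastpos r
  | lastpos (ALTERNATION (l, r)) = lastpos l @ lastpos r
  | lastpos (ZERO_OR_MORE r) = lastpos r

val a = CHAR_LITERAL (#"a", 1)
val b = CHAR_LITERAL (#"b", 2)
```

For `ab`, `lastpos (CONCAT (a, b))` is `[2]`; for `ab*`, where the right child is nullable, `lastpos (CONCAT (a, ZERO_OR_MORE b))` is `[1, 2]`.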
dddb459d93
remove todo note which has become outdated as of the previous commit
Humza Shahid2025-10-01 13:49:02 +01:00
169dcb5bf2
fix regex parsing by not considering grouping parens as an operator
Humza Shahid2025-10-01 13:48:18 +01:00
6a98cddebe
added functions to compute firstpos, lastpos and nullable
Humza Shahid2025-10-01 12:36:26 +01:00
7347437f17
change representation of alternation nodes and concatenation nodes to use tuples instead of lists, as the conventional algorithms use this representation
Humza Shahid2025-10-01 12:17:35 +01:00
fd0ce5b22a
add function to compute if a given node is nullable
Humza Shahid2025-10-01 11:52:45 +01:00
9584bca7ee
when parsing NFA, label position of leaves (each leaf is either a CHAR_LITERAL or a WILDCARD)
Humza Shahid2025-10-01 11:23:41 +01:00
31f70a6748
remove nfa-matching code for the moment, and parse a simple regex tree without state information
Humza Shahid2025-10-01 10:48:45 +01:00
774dba5c19
find bug and comment on it. We currently assume the first character in an NFA string is a CHAR_LITERAL, but it can be anything else, including a WILDCARD operator; we have to check what the chr is and decide. We probably want to take care of this later, so added a todo-note.
Humza Shahid2025-09-30 14:25:45 +01:00
934fa729a9
parse and interpret wildcard character which is a dot .
Humza Shahid2025-09-30 14:10:49 +01:00
b52b5ff28c
parse wildcard . character for NFA too
Humza Shahid2025-09-30 14:05:39 +01:00
5fa784b4c6
refactor nfa.sml so that lists in CONCAT and ALTERNATION cases don't need the state to be tupled with the regex
Humza Shahid2025-09-30 13:52:35 +01:00
45fbd85183
move buffer around when calling 'SearchList.buildRange'
Humza Shahid2025-09-30 05:40:57 +01:00
e03eecf940
use LineGap.sub instead of LineGap.substring, as the former function is now fixed
Humza Shahid2025-09-30 05:30:11 +01:00
265e6e1a90
fix bugs in 'LineGap.subRight' (we were not passing nextIdx in recursion properly)
Humza Shahid2025-09-30 05:23:31 +01:00
b35d045a09
fix bugs in implementation for 'Nfa.getMatchesInRange'
Humza Shahid2025-09-29 22:57:19 +01:00
14bb447289
fix known errors in LineGap.sub function
Humza Shahid2025-09-29 22:29:28 +01:00
863b4ba47b
do not require pattern matching head when in subRight/subLeft loop, but only require that in some cases
Humza Shahid2025-09-29 22:13:03 +01:00
6de33a65c2
fix minor type error introduced in line_gap.sml in last commit (was returning an integer instead of a char)
Humza Shahid2025-09-29 22:03:36 +01:00
f4422cc36c
add function to line_gap.sml to retrieve a single specific char
Humza Shahid2025-09-29 21:56:39 +01:00
64c16a7c25
fix bug with shadowing 'finishIdx' value, when we still wanted access to both the previous and the new 'finishIdx'
Humza Shahid2025-09-29 21:21:06 +01:00
df78e20cb7
fix bug in 'Nfa.getMatches' loop function: when we find that this state is valid, continue loop from 'finishIdx + 1'.
Humza Shahid2025-09-29 21:07:02 +01:00
8ba16daf7a
add function to persistent-vector.sml to check if we are in a specific range
Humza Shahid2025-09-29 14:29:43 +01:00
13ccdbb202
return PersistentVector.t when building search-list/executing nfa, because we don't want to use a simple flat vector for the search list now
Humza Shahid2025-09-29 14:02:07 +01:00
6d2b43606f
when parsing a string into an NFA, return an option type if the syntax is invalid
Humza Shahid2025-09-29 13:34:55 +01:00
7dc94632d6
fix backtracking bug in 'Nfa.getMatchesInRange' (we were passing the wrong value instead of 'strIdx' in the recursive call to the loop function)
Humza Shahid2025-09-29 13:13:14 +01:00
b6720ed5f1
first pass of 'get matches in range from nfa' functionality
Humza Shahid2025-09-29 12:18:45 +01:00
8d29bfab78
adjust nfa to return all matches in string, instead of just testing for one match and then returning true
Humza Shahid2025-09-29 10:28:03 +01:00
6b7485f753
change NFA interpreter slightly so that, if we see that a match is invalid at some place, we check in the next place to see if it is valid later in the string
Humza Shahid2025-09-29 02:00:04 +01:00
f8b707de20
interpret concatenation and alternation in nfa
Humza Shahid2025-09-29 01:45:28 +01:00
e01712a065
progress interpreting alternation in nfa
Humza Shahid2025-09-29 01:06:15 +01:00
5234338e25
small change similar to previous commit: in search-list.sml's 'backtrackFull' function, always check if the position is at the correct string before checking if we are at the place where the search should continue
Humza Shahid2025-09-27 14:47:24 +01:00
d01a1367ae
add test for 'dw' case: when we use 'dw' on last word in buffer, and there is no newline after last word, we delete last word fully
Humza Shahid2025-09-27 13:09:18 +01:00
d9380bcb64
pass regression test by modifying 'SearchList.backtrackRange' function. The modification that worked was swapping two if-statements around: first we check if the string position is 0 (and loop to check the previous string if so); in the else case, we check if the searchPos <= 1 (which signals for us to exit backtracking). Swapping the order of the if-statements means that, when we exit the loop, we always exit with string that is at this position.
Humza Shahid2025-09-27 12:40:28 +01:00
39db9c652e
add new test where we receive an exception when deleting while there is a search
Humza Shahid2025-09-27 12:31:29 +01:00
cd31bdd0d5
add tests for 'dW' motion, which are same as tests for 'dw' motion but testing for WORD instead of word where possible
Humza Shahid2025-09-27 07:14:26 +01:00
88a1489a54
pass failing test case for 'dw'. When we delete to the end of the file and the position the cursor was previously at no longer exists, move cursor to last valid character in file.
Humza Shahid2025-09-26 07:46:42 +01:00
9e0f62d142
add another test for 'dw' motion when deleting in the second of three words (this one passes)
Humza Shahid2025-09-26 07:32:35 +01:00
b31d7650a8
change the way we calculate the newCursorIdx when we delete using the 'dk' motion while on the last line. We go to the buffer's last line and find the first column. This more directly expresses what we want and now passes the failing unit test for 'dk'.
Humza Shahid2025-09-25 14:59:29 +01:00
05abecc70d
pass a failing test for 'dk' motion by decrementing newCursorLineNumber if the end of the deletion range is on a newline, and it is also the last char in the buffer
Humza Shahid2025-09-25 14:44:39 +01:00
1494d5c356
add two new unit tests for 'dk' motion
Humza Shahid2025-09-25 13:58:56 +01:00
bf55373f6d
fix 'MakeNormalDelete.deleteLineBack' test, which failed because we were using the wrong way to check if the cursor is currently at the start of the line. We checked '(endOfLine = cursorIdx)', which works in most cases to verify that the current position is a newline, but fails when the cursor is at the last non-newline character of the line. This is fixed by being more precise and calling 'Cursor.isCursorAtStartOfLine' to check directly if the cursor is currently at the start of the line.
Humza Shahid2025-09-25 10:49:20 +01:00