Commit Graph

  • f085860f20 begin making changes to return a parse error if regex string contains an end marker Humza Shahid 2025-10-07 14:36:35 +01:00
  • 060df2745a fix bugs: only wildcard and character-class-negation should check to see if curChr is an endmarker Humza Shahid 2025-10-07 14:30:23 +01:00
  • c62e234d00 change dfa-gen to a functor, and use functor to instantiate different structures Humza Shahid 2025-10-07 14:05:45 +01:00
  • 075fec02be handle edge case in char range: escaped char followed by another escaped char Humza Shahid 2025-10-07 12:28:14 +01:00
  • 4dfee016eb handle edge case in char-range: in a range like a-z, the second character may be an escape sequence, and we need to handle that case if so Humza Shahid 2025-10-07 12:13:41 +01:00
  • 44c2fbb3c7 handle character ranges like a-z in character class and negated character class Humza Shahid 2025-10-07 09:48:10 +01:00
  • 3d4dbdda69 done adding functionality for parsing character classes Humza Shahid 2025-10-07 09:31:24 +01:00
  • 8eed2ef51a add support for [^negated_character_classes], although we don't parse them yet Humza Shahid 2025-10-07 08:57:01 +01:00
  • d6142285da add handling for [character class] type (but note that we don't parse a character class yet) Humza Shahid 2025-10-07 08:51:46 +01:00
  • 56658a4a70 only convert char to int in dfa-gen.sml's 'convertChar' loop Humza Shahid 2025-10-07 08:43:15 +01:00
  • ad92dadd34 pull in new version of brolib-sml, which handles edge case for LineGap.delete Humza Shahid 2025-10-06 22:58:09 +01:00
  • 56a469e578 handle edge case in line_gap.sml when deleting to the left: we sometimes need to delete to the end of the string, so add a branch handling that case Humza Shahid 2025-10-06 22:56:18 +01:00
  • 71786a494c fix minor bug with escape sequences: we should pattern match on an unescaped char, and we should return an escaped char. For example, it makes sense to pattern match on plain unescaped "n" and return "\n". This is because user inputs escape-chars as a two-char sequence, prepended by a backslash \ character Humza Shahid 2025-10-06 21:58:50 +01:00
  • cc5c0bf95c implement escape sequences for regex Humza Shahid 2025-10-06 21:44:57 +01:00
  • 0bcb3e1dfc done implementing '+' and '?' regex operators Humza Shahid 2025-10-06 20:41:52 +01:00
  • dcd930f855 begin expanding ? and + regex symbols, which we can represent using a combination of the others Humza Shahid 2025-10-06 17:08:57 +01:00
  • 9dd44c8eca fix implementation of ZERO_OR_MORE (Kleene star) in dfa-gen.sml Humza Shahid 2025-10-06 14:45:28 +01:00
  • 2779b61c1f amendment to 'lastpos' function: if right child is not nullable, then get lastpos of right child, or else get union of them both Humza Shahid 2025-10-06 13:50:34 +01:00
  • e05c690548 fix bug with implementation of wildcard: we don't want to match a wildcard if the character we are getting follow-positions for has an ASCII code of 0, because we are using that as an endmarker Humza Shahid 2025-10-06 12:12:23 +01:00
  • ea01f1689c fix bug in search-list.sml: when we find a match, we should start 1 idx after the end position of the match Humza Shahid 2025-10-06 11:58:03 +01:00
  • cca2602429 fix bug in implementation of DFA algorithm: we need to add an end marker, and this will be used to tell us whether we have reached the final state in the DFA Humza Shahid 2025-10-06 11:49:10 +01:00
  • 3f30d49420 progress using dfa for searching Humza Shahid 2025-10-06 09:55:05 +01:00
  • 626aa0a860 add utility functions for using generated dfa Humza Shahid 2025-10-06 09:06:04 +01:00
  • f554c0db29 change 'dtran' set to only contain integers indicating the index from dstates to transition to on char Humza Shahid 2025-10-06 08:21:04 +01:00
  • a3287e71b9 take care of todo note addressing efficiency: don't update dtran vector on each 'convertChar' loop, but accumulate set and then append set to end of dtran at end of 'convertChar' loop Humza Shahid 2025-10-06 08:11:30 +01:00
  • 6ae38189cf previously, dtran was a {states: int list, transitions: set} record, but because the states are the exact same as the information in dstates (at same position too), we changed dtran to contain only the transitions Humza Shahid 2025-10-06 07:53:05 +01:00
  • c995d3cdf7 if we encounter an empty state when getting follow positions, skip to next char Humza Shahid 2025-10-06 07:44:46 +01:00
  • 303bcdf23d fix type errors Humza Shahid 2025-10-05 20:27:48 +01:00
  • 988ef22e75 first pass implementing 'convertChar' function Humza Shahid 2025-10-05 20:19:26 +01:00
  • ecdf642f13 progress with 'get-follow-positions-of-each-char' loop Humza Shahid 2025-10-05 15:31:11 +01:00
  • 01fed05c87 remove functions which will soon be dead code, and cause code which uses them to be stubbed out Humza Shahid 2025-10-05 14:45:36 +01:00
  • d3795c771a implement a function which descends down to a particular position, and then computes followpos: there were previously two separate functions performing these two tasks Humza Shahid 2025-10-05 12:04:20 +01:00
  • 7e2021be24 tiny changes to dfa-gen.sml to make it more presentable when asking for advice Humza Shahid 2025-10-03 07:29:28 +01:00
  • 0696d7ed52 make 'dstates' in 'Nfa.ToDfa.convert' function a vector, rather than a list, and make sure we append to the end each time we add Humza Shahid 2025-10-03 05:54:48 +01:00
  • 2de40a09c7 add code to get all transitions in DFA Humza Shahid 2025-10-03 05:17:13 +01:00
  • ff80db1176 progress implementing conversion of regex to a DFA Humza Shahid 2025-10-02 13:54:59 +01:00
  • 1c107b0d72 add function to get path to a particular position, for the sake of finding followpos of a particular node Humza Shahid 2025-10-02 05:00:23 +01:00
  • dfb9153896 annotate CONCAT and ALTERNATION nodes with max states of left and right position during parsing. This makes it easier to find a given state. Humza Shahid 2025-10-02 04:34:16 +01:00
  • b9c20c43aa fix parse error with stateNum numbering: should only increment stateNum in computeAtom function, and never anywhere else Humza Shahid 2025-10-01 14:19:27 +01:00
  • b3f56dfaff add implementation for followpos Humza Shahid 2025-10-01 14:10:40 +01:00
  • 61f839641f fix implementation of 'lastpos', which should return the lastpos of the right child in a CONCAT node, if the right child is not nullable, or else should return the union of lastpos for the left and right child both Humza Shahid 2025-10-01 14:06:41 +01:00
  • dddb459d93 remove todo note which has become outdated as of the previous commit Humza Shahid 2025-10-01 13:49:02 +01:00
  • 169dcb5bf2 fix regex parsing by not considering grouping parens as an operator Humza Shahid 2025-10-01 13:48:18 +01:00
  • 6a98cddebe added functions to compute firstpos, lastpos and nullable Humza Shahid 2025-10-01 12:36:26 +01:00
  • 7347437f17 change representation of alternation nodes and concatenation nodes to use tuples instead of lists, as the conventional algorithms use this representation Humza Shahid 2025-10-01 12:17:35 +01:00
  • fd0ce5b22a add function to compute if a given node is nullable Humza Shahid 2025-10-01 11:52:45 +01:00
  • 9584bca7ee when parsing NFA, label position of leaves (each leaf is either a CHAR_LITERAL or a WILDCARD) Humza Shahid 2025-10-01 11:23:41 +01:00
  • 31f70a6748 remove nfa-matching code for the moment, and parse a simple regex tree without state information Humza Shahid 2025-10-01 10:48:45 +01:00
  • 774dba5c19 find bug and comment on it. We currently assume the first character in an NFA string is a CHAR_LITERAL, but it can be anything else, including a WILDCARD operator; we have to check what the chr is and decide. We probably want to take care of this later, so added a todo-note. Humza Shahid 2025-09-30 14:25:45 +01:00
  • 934fa729a9 parse and interpret wildcard character which is a dot . Humza Shahid 2025-09-30 14:10:49 +01:00
  • b52b5ff28c parse wildcard . character for NFA too Humza Shahid 2025-09-30 14:05:39 +01:00
  • 5fa784b4c6 refactor nfa.sml so that lists in CONCAT and ALTERNATION cases don't need the state to be tupled with the regex Humza Shahid 2025-09-30 13:52:35 +01:00
  • 45fbd85183 move buffer around when calling 'SearchList.buildRange' Humza Shahid 2025-09-30 05:40:57 +01:00
  • e03eecf940 use LineGap.sub instead of LineGap.substring, as the former function is now fixed Humza Shahid 2025-09-30 05:30:11 +01:00
  • 265e6e1a90 fix bugs in 'LineGap.subRight' (we were not passing nextIdx in recursion properly) Humza Shahid 2025-09-30 05:23:31 +01:00
  • b35d045a09 fix bugs in implementation for 'Nfa.getMatchesInRange' Humza Shahid 2025-09-29 22:57:19 +01:00
  • 14bb447289 fix known errors in LineGap.sub function Humza Shahid 2025-09-29 22:29:28 +01:00
  • 863b4ba47b do not require pattern matching head when in subRight/subLeft loop, but only require that in some cases Humza Shahid 2025-09-29 22:13:03 +01:00
  • 6de33a65c2 fix minor type error introduced in line_gap.sml in last commit (was returning an integer instead of a char) Humza Shahid 2025-09-29 22:03:36 +01:00
  • f4422cc36c add function to line_gap.sml to retrieve a single specific char Humza Shahid 2025-09-29 21:56:39 +01:00
  • d37e510b24 progress fixing backtracking Humza Shahid 2025-09-29 21:29:03 +01:00
  • 64c16a7c25 fix bug with shadowing 'finishIdx' value, when we still wanted access to both the previous and the new 'finishIdx' Humza Shahid 2025-09-29 21:21:06 +01:00
  • df78e20cb7 fix bug in 'Nfa.getMatches' loop function: when we find that this state is valid, continue loop from 'finishIdx + 1'. Humza Shahid 2025-09-29 21:07:02 +01:00
  • 665497cf46 fix all remaining type errors Humza Shahid 2025-09-29 15:06:33 +01:00
  • fd321c2f14 fix some type errors Humza Shahid 2025-09-29 15:02:40 +01:00
  • 8f49cdca13 fix type errors in normal-mode-text-builder.sml Humza Shahid 2025-09-29 14:55:20 +01:00
  • d44799a794 fix some type errors in the code Humza Shahid 2025-09-29 14:49:50 +01:00
  • 8ba16daf7a add function to persistent-vector.sml to check if we are in a specific range Humza Shahid 2025-09-29 14:29:43 +01:00
  • 13ccdbb202 return PersistentVector.t when building search-list/executing nfa, because we don't want to use a simple flat vector for the search list now Humza Shahid 2025-09-29 14:02:07 +01:00
  • 6d2b43606f when parsing a string into an NFA, return an option type if the syntax is invalid Humza Shahid 2025-09-29 13:34:55 +01:00
  • 7dc94632d6 fix backtracking bug in 'Nfa.getMatchesInRange' (we were passing the wrong value instead of 'strIdx' in the recursive call to the loop function) Humza Shahid 2025-09-29 13:13:14 +01:00
  • b6720ed5f1 first pass of 'get matches in range from nfa' functionality Humza Shahid 2025-09-29 12:18:45 +01:00
  • 8d29bfab78 adjust nfa to return all matches in string, instead of just testing for one match and then returning true Humza Shahid 2025-09-29 10:28:03 +01:00
  • f52a8306ea add comments to ongoing NFA implementation Humza Shahid 2025-09-29 08:33:10 +01:00
  • 6b7485f753 change NFA interpreter slightly so that, if we see that a match is invalid at some place, we check in the next place to see if it is valid later in the string Humza Shahid 2025-09-29 02:00:04 +01:00
  • f8b707de20 interpret concatenation and alternation in nfa Humza Shahid 2025-09-29 01:45:28 +01:00
  • e01712a065 progress interpreting alternation in nfa Humza Shahid 2025-09-29 01:06:15 +01:00
  • d9720c5643 begin adding interpretation for NFA Humza Shahid 2025-09-29 00:46:05 +01:00
  • d75b1a18ff flatten repeated concatenations and alternations into a single list when possible Humza Shahid 2025-09-28 22:23:48 +01:00
  • 032ca56bbf add initial implementation of compiling a regex string to an NFA Humza Shahid 2025-09-28 22:01:44 +01:00
  • 64678bf68e add tests for 'dE' motion Humza Shahid 2025-09-27 15:40:26 +01:00
  • 5234338e25 small change similar to previous commit: in search-list.sml's 'backtrackFull' function, always check if the position is at the correct string before checking if we are at the place where the search should continue Humza Shahid 2025-09-27 14:47:24 +01:00
  • d01a1367ae add test for 'dw' case: when we use 'dw' on last word in buffer, and there is no newline after last word, we delete last word fully Humza Shahid 2025-09-27 13:09:18 +01:00
  • d9380bcb64 pass regression test by modifying 'SearchList.backtrackRange' function. The modification that worked was swapping two if-statements around: first we check if the string position is 0 (and loop to check the previous string if so); in the else case, we check if the searchPos <= 1 (which signals for us to exit backtracking). Swapping the order of the if-statements means that, when we exit the loop, we always exit with string that is at this position. Humza Shahid 2025-09-27 12:40:28 +01:00
  • 39db9c652e add new test where we receive an exception when deleting while there is a search Humza Shahid 2025-09-27 12:31:29 +01:00
  • 0b490b00bb add tests for 'de' motion Humza Shahid 2025-09-27 10:02:05 +01:00
  • 8ad5cc77c3 change colour of text in search bar as well Humza Shahid 2025-09-27 08:35:52 +01:00
  • 5e9872e4d6 better visual positioning for cursor Humza Shahid 2025-09-27 08:14:22 +01:00
  • 2c388899ca use different colours for program Humza Shahid 2025-09-27 08:00:21 +01:00
  • cd31bdd0d5 add tests for 'dW' motion, which are same as tests for 'dw' motion but testing for WORD instead of word where possible Humza Shahid 2025-09-27 07:14:26 +01:00
  • 074ba2bcde done adding tests for 'dw' motion Humza Shahid 2025-09-26 08:21:54 +01:00
  • 5e1e66a977 add another test for 'dw' motion Humza Shahid 2025-09-26 07:58:23 +01:00
  • 88a1489a54 pass failing test case for 'dw'. When we delete to the end of the file and the position the cursor was previously at no longer exists, move cursor to last valid character in file. Humza Shahid 2025-09-26 07:46:42 +01:00
  • 9e0f62d142 add another test for 'dw' motion when deleting in the second of three words (this one passes) Humza Shahid 2025-09-26 07:32:35 +01:00
  • 5503b8ebda add failing test for 'dw' motion Humza Shahid 2025-09-26 05:27:48 +01:00
  • b31d7650a8 change the way we calculate the newCursorIdx when we delete using the 'dk' motion while on the last line. We go to the buffer's last line and find the first column. This more directly expresses what we want and now passes the failing unit test for 'dk'. Humza Shahid 2025-09-25 14:59:29 +01:00
  • 05abecc70d pass a failing test for 'dk' motion by decrementing newCursorLineNumber if the end of the deletion range is on a newline, and it is also the last char in the buffer Humza Shahid 2025-09-25 14:44:39 +01:00
  • 1494d5c356 add two new unit tests for 'dk' motion Humza Shahid 2025-09-25 13:58:56 +01:00
  • 6b0149162f a bit of formatting Humza Shahid 2025-09-25 10:50:02 +01:00
  • bf55373f6d fix 'MakeNormalDelete.deleteLineBack' test, which failed because we were using the wrong way to check if cursor is currently at the start of the line. We checked '(endOfLine = cursorIdx)', which works in most cases to verify that the current position is a newline, but fails when the cursor is at the last non-newline character of the line. This is fixed by being more precise and calling 'Cursor.isCursorAtStartOfLine' to check directly if the cursor is currently at the start of the line. Humza Shahid 2025-09-25 10:49:20 +01:00