95 Commits

SHA1 Message Date
df7669b065 progress in changing functions to use 'PersistentVector.delete' so that the search list is updated incrementally rather than rebuilt from scratch after each deletion 2026-02-06 08:52:11 +00:00
c6dee6e9f9 implement function that deletes from both LineGap.t and SearchList, maintaining an exact match between both 2026-01-18 09:59:00 +00:00
111e0cf66d remove usage of concurrent ml, deciding that we prefer to run everything in the main thread instead 2025-10-17 23:08:16 +01:00
22a8b807d2 handle edge case when building dfa from a string, where an exception was raised when our search regex contains an alternation whose second branch is a substring of its first branch, and add a test to make sure that it passes 2025-10-14 02:24:45 +01:00
ca2c2f438c when adding to followset in ONE_OR_MORE case, make sure we add the child to the followset as well 2025-10-12 00:32:55 +01:00
ce3470e612 fix bug in regex-test: dfa-gen.sml should add the position of the endMarker to the followSet as well 2025-10-12 00:22:14 +01:00
7f1f1f7bdc at end of char loop, track if length of dstate changed. If it did not, that means that we have encountered a loop that is at the end; thus, we should add the endMarker 2025-10-11 13:39:28 +01:00
b2931753d0 make dfa-gen.sml compile again, with parity before reimplementing it 2025-10-11 13:23:44 +01:00
96f0afc2b2 attempt at fixing dfa-gen to convert properly 2025-10-11 11:32:30 +01:00
a44afca40b checkpoint for reimplementing dfa-gen.sml 2025-10-10 11:54:34 +01:00
5a43954aef checkpoint 2025-10-10 04:59:32 +01:00
244d0ce26d begin attempt to compute followpos properly 2025-10-10 04:44:18 +01:00
bdfca17b5a implement function to insert a list at a pos 2025-10-10 04:00:34 +01:00
58c3e65fdd add list of follows to leaves in regex parse tree (only changed data type; need to populate follows list later) 2025-10-10 03:49:09 +01:00
108a30ea79 add utility function to insert from a list into a set 2025-10-10 03:29:52 +01:00
88eb30dbf2 done caching firstpos and lastpos, and using the cached data 2025-10-10 01:56:54 +01:00
6e646bdffa begin computing firstpos and lastpos during parsing 2025-10-10 01:43:24 +01:00
3197315478 fix failing tests for escaping regex metacharacters 2025-10-09 06:22:21 +01:00
a5fec6f1a2 add tests for escape sequences 2025-10-09 06:06:07 +01:00
250ae239be begin adding tests for regex 2025-10-09 05:34:32 +01:00
0de7a9278a progress implementing help-prev-match for vector 2025-10-08 10:27:19 +01:00
3b823d7ae6 delete 'nextMatch' function in search-list.sml, and refactor other code to use alternative function 2025-10-08 08:16:20 +01:00
108e021fdb log an exception if search-thread encounters a failure 2025-10-08 06:51:52 +01:00
3c2e5812cd reimplement function to search through text from scratch 2025-10-08 06:35:49 +01:00
06106f5de8 remove 'searchString' field from app_type, because the same role is fulfilled by new 'dfa' field 2025-10-08 05:40:29 +01:00
8857f49537 pass DFA to 'SearchList.buildRange' function, so that we don't need to parse search string into DFA each time 2025-10-08 05:20:33 +01:00
7a72bc2ed1 done with allowing different types of endMarkers 2025-10-07 14:44:40 +01:00
f085860f20 begin making changes to return a parse error if regex string contains an end marker 2025-10-07 14:36:35 +01:00
060df2745a fix bugs: only wildcard and character-class-negation should check to see if curChr is an endmarker 2025-10-07 14:30:23 +01:00
c62e234d00 change dfa-gen to a functor, and use functor to instantiate different structures 2025-10-07 14:05:45 +01:00
075fec02be handle edge case in char range: escaped char followed by another escaped char 2025-10-07 12:28:14 +01:00
4dfee016eb handle edge case in char-range: in a range like a-z, the second character may be an escape sequence, and we need to handle that case if so 2025-10-07 12:13:41 +01:00
44c2fbb3c7 handle character ranges like a-z in character class and negated character class 2025-10-07 09:48:10 +01:00
3d4dbdda69 done adding functionality for parsing character classes 2025-10-07 09:31:24 +01:00
8eed2ef51a add support for [^negated_character_classes], although we don't parse them yet 2025-10-07 08:57:01 +01:00
d6142285da add handling for [character class] type (but note that we don't parse a character class yet) 2025-10-07 08:51:46 +01:00
56658a4a70 only convert char to int in dfa-gen.sml's 'convertChar' loop 2025-10-07 08:43:15 +01:00
71786a494c fix minor bug with escape sequences: we should pattern match on an unescaped char, and we should return an escaped char. For example, it makes sense to pattern match on plain unescaped "n" and return "\n". This is because the user enters escape chars as a two-char sequence prefixed by a backslash (\) character 2025-10-06 21:58:50 +01:00
cc5c0bf95c implement escape sequences for regex 2025-10-06 21:44:57 +01:00
0bcb3e1dfc done implementing '+' and '?' regex operators 2025-10-06 20:41:52 +01:00
dcd930f855 begin expanding ? and + regex symbols, which we can represent using a combination of the others 2025-10-06 17:08:57 +01:00
9dd44c8eca fix implementation of ZERO_OR_MORE (Kleene star) in dfa-gen.sml 2025-10-06 14:45:28 +01:00
2779b61c1f amendment to 'lastpos' function: if right child is not nullable, then get lastpos of right child, or else get the union of both children's lastpos 2025-10-06 13:50:34 +01:00
e05c690548 fix bug with implementation of wildcard: we don't want to match a wildcard if the character we are getting follow-positions for has an ASCII code of 0, because we are using that as an endmarker 2025-10-06 12:12:23 +01:00
ea01f1689c fix bug in search-list.sml: when we find a match, we should start 1 idx after the end position of the match 2025-10-06 11:58:03 +01:00
cca2602429 fix bug in implementation of DFA algorithm: we need to add an end marker, and this will be used to tell us whether we have reached the final state in the DFA 2025-10-06 11:49:10 +01:00
3f30d49420 progress using dfa for searching 2025-10-06 09:55:05 +01:00
626aa0a860 add utility functions for using generated dfa 2025-10-06 09:06:04 +01:00
f554c0db29 change 'dtran' set to only contain integers indicating the index from dstates to transition to on char 2025-10-06 08:21:04 +01:00
a3287e71b9 take care of todo note addressing efficiency: don't update dtran vector on each 'convertChar' loop, but accumulate set and then append set to end of dtran at end of 'convertChar' loop 2025-10-06 08:11:30 +01:00