|
|
111e0cf66d
|
remove usage of Concurrent ML, deciding that we prefer to run everything in the main thread instead
|
2025-10-17 23:08:16 +01:00 |
|
|
|
22a8b807d2
|
handle edge case when building dfa from a string, where an exception was raised when our search regex contains an alternation where the second alternation is a substring of the first alternation, and add a test for it to make sure that it passes
|
2025-10-14 02:24:45 +01:00 |
|
|
|
ca2c2f438c
|
when adding to followset in ONE_OR_MORE case, make sure we add the child to the followset as well
|
2025-10-12 00:32:55 +01:00 |
|
|
|
ce3470e612
|
fix bug in regex-test: dfa-gen.sml should add the position of the endMarker to the followSet as well
|
2025-10-12 00:22:14 +01:00 |
|
|
|
7f1f1f7bdc
|
at end of char loop, track if length of dstate changed. If it did not, that means that we have encountered a loop that is at the end; thus, we should add the endMarker
|
2025-10-11 13:39:28 +01:00 |
|
|
|
b2931753d0
|
make dfa-gen.sml compile again, with parity before reimplementing it
|
2025-10-11 13:23:44 +01:00 |
|
|
|
96f0afc2b2
|
attempt at fixing dfa-gen to convert properly
|
2025-10-11 11:32:30 +01:00 |
|
|
|
a44afca40b
|
checkpoint for reimplementing dfa-gen.sml
|
2025-10-10 11:54:34 +01:00 |
|
|
|
5a43954aef
|
checkpoint
|
2025-10-10 04:59:32 +01:00 |
|
|
|
244d0ce26d
|
begin attempt to compute followpos properly
|
2025-10-10 04:44:18 +01:00 |
|
|
|
bdfca17b5a
|
implement function to insert a list to a pos
|
2025-10-10 04:00:34 +01:00 |
|
|
|
58c3e65fdd
|
add list of follows to leaves in regex parse tree (only changed data type; need to populate follows list later)
|
2025-10-10 03:49:09 +01:00 |
|
|
|
108a30ea79
|
add utility function to insert from a list into a set
|
2025-10-10 03:29:52 +01:00 |
|
|
|
88eb30dbf2
|
done caching firstpos and lastpos, and using the cached data
|
2025-10-10 01:56:54 +01:00 |
|
|
|
6e646bdffa
|
begin computing firstpos and lastpos during parsing
|
2025-10-10 01:43:24 +01:00 |
|
|
|
3197315478
|
fix failing tests for escaping regex metacharacters
|
2025-10-09 06:22:21 +01:00 |
|
|
|
a5fec6f1a2
|
add tests for escape sequences
|
2025-10-09 06:06:07 +01:00 |
|
|
|
250ae239be
|
begin adding tests for regex
|
2025-10-09 05:34:32 +01:00 |
|
|
|
0de7a9278a
|
progress implementing help-prev-match for vector
|
2025-10-08 10:27:19 +01:00 |
|
|
|
3b823d7ae6
|
delete 'nextMatch' function in search-list.sml, and refactor other code to use alternative function
|
2025-10-08 08:16:20 +01:00 |
|
|
|
108e021fdb
|
log an exception if search-thread encounters a failure
|
2025-10-08 06:51:52 +01:00 |
|
|
|
3c2e5812cd
|
reimplement function to search through text from scratch
|
2025-10-08 06:35:49 +01:00 |
|
|
|
06106f5de8
|
remove 'searchString' field from app_type, because the same role is fulfilled by new 'dfa' field
|
2025-10-08 05:40:29 +01:00 |
|
|
|
8857f49537
|
pass DFA to 'SearchList.buildRange' function, so that we don't need to parse search string into DFA each time
|
2025-10-08 05:20:33 +01:00 |
|
|
|
7a72bc2ed1
|
done with allowing different types of endMarkers
|
2025-10-07 14:44:40 +01:00 |
|
|
|
f085860f20
|
begin making changes to return a parse error if regex string contains an end marker
|
2025-10-07 14:36:35 +01:00 |
|
|
|
060df2745a
|
fix bugs: only wildcard and character-class-negation should check to see if curChr is an endmarker
|
2025-10-07 14:30:23 +01:00 |
|
|
|
c62e234d00
|
change dfa-gen to a functor, and use functor to instantiate different structures
|
2025-10-07 14:05:45 +01:00 |
|
|
|
075fec02be
|
handle edge case in char range: escaped char followed by another escaped char
|
2025-10-07 12:28:14 +01:00 |
|
|
|
4dfee016eb
|
handle edge case in char-range: in a range like a-z, the second character may be an escape sequence, and we need to handle that case if so
|
2025-10-07 12:13:41 +01:00 |
|
|
|
44c2fbb3c7
|
handle character ranges like a-z in character class and negated character class
|
2025-10-07 09:48:10 +01:00 |
|
|
|
3d4dbdda69
|
done adding functionality for parsing character classes
|
2025-10-07 09:31:24 +01:00 |
|
|
|
8eed2ef51a
|
add support for [^negated_character_classes], although we don't parse them yet
|
2025-10-07 08:57:01 +01:00 |
|
|
|
d6142285da
|
add handling for [character class] type (but note that we don't parse a character class yet)
|
2025-10-07 08:51:46 +01:00 |
|
|
|
56658a4a70
|
only convert char to int in dfa-gen.sml's 'convertChar' loop
|
2025-10-07 08:43:15 +01:00 |
|
|
|
71786a494c
|
fix minor bug with escape sequences: we should pattern match on an unescaped char, and we should return an escaped char. For example, it makes sense to pattern match on plain unescaped "n" and return "\n". This is because user inputs escape-chars as a two-char sequence, prepended by a backslash \ character
|
2025-10-06 21:58:50 +01:00 |
|
|
|
cc5c0bf95c
|
implement escape sequences for regex
|
2025-10-06 21:44:57 +01:00 |
|
|
|
0bcb3e1dfc
|
done implementing '+' and '?' regex operators
|
2025-10-06 20:41:52 +01:00 |
|
|
|
dcd930f855
|
begin expanding ? and + regex symbols, which we can represent using a combination of the others
|
2025-10-06 17:08:57 +01:00 |
|
|
|
9dd44c8eca
|
fix implementation of ZERO_OR_MORE (Kleene star) in dfa-gen.sml
|
2025-10-06 14:45:28 +01:00 |
|
|
|
2779b61c1f
|
amendment to 'lastpos' function: if right child is not nullable, then get lastpos of right child, or else get union of them both
|
2025-10-06 13:50:34 +01:00 |
|
|
|
e05c690548
|
fix bug with implementation of wildcard: we don't want to match a wildcard if the character we are getting follow-positions for has an ASCII code of 0, because we are using that as an endmarker
|
2025-10-06 12:12:23 +01:00 |
|
|
|
ea01f1689c
|
fix bug in search-list.sml: when we find a match, we should start 1 idx after the end position of the match
|
2025-10-06 11:58:03 +01:00 |
|
|
|
cca2602429
|
fix bug in implementation of DFA algorithm: we need to add an end marker, and this will be used to tell us whether we have reached the final state in the DFA
|
2025-10-06 11:49:10 +01:00 |
|
|
|
3f30d49420
|
progress using dfa for searching
|
2025-10-06 09:55:05 +01:00 |
|
|
|
626aa0a860
|
add utility functions for using generated dfa
|
2025-10-06 09:06:04 +01:00 |
|
|
|
f554c0db29
|
change 'dtran' set to only contain integers indicating the index from dstates to transition to on char
|
2025-10-06 08:21:04 +01:00 |
|
|
|
a3287e71b9
|
take care of todo note addressing efficiency: don't update dtran vector on each 'convertChar' loop, but accumulate set and then append set to end of dtran at end of 'convertChar' loop
|
2025-10-06 08:11:30 +01:00 |
|
|
|
6ae38189cf
|
previously, dtran was a {states: int list, transitions: set} record, but because the states are the exact same as the information in dstates (at same position too), we changed dtran to contain only the transitions
|
2025-10-06 07:53:05 +01:00 |
|
|
|
c995d3cdf7
|
if we encounter an empty state when getting follow positions, skip to next char
|
2025-10-06 07:44:46 +01:00 |
|