Finite State Automata and Lexing-Parsing
Finite State Machines & lexing-parsing
My descriptor language parser applies top-down recursive descent using a custom implementation of regular expression.
Patterns can be specified with a subset of the Basic Regular Expression language (BRE) and are adapted to a node representation.
var nfa = new Fsa(); nfa.Build("(a|b|c){1,3}", (int)Token1); nfa.Build("int|long", (int)Token2); nfa.Build("[A-Za-z0-9]+", (int)Token3); ...
Furthermore, several Nondeterministic Finite Automata (NFAs) can be formed from several regex expressions to match a corresponding set of tokens in a language.
All possible tokens which can be matched by the state machine are knit into one NFA, which can be used to match tokens as-is (using non-deterministic breadth-first evaluation of the NFA)
var (token, match) = nfa.Search(source, offset);
or further optimized using DFA conversion and equivalence partioning.
var dfa = nfa.ConvertToDfa().MinimizeDfa(); var (token, match) = dfa.Search(source, offset);
Usage of the machine will resemble that of tools such as (F)lex. Matched tokens can be passed off to a parsing algorithm (e.g., LALR(1)), or a developer may opt to implement recursive descent parsing by manually analyzing tokens.
Perhaps the token "if" is found, so the following code is invoked to match a statement in source:
var stream = new TokenStream() { Grammar = dfa, Source = inputSource }; stream.Seek(startOffset); var ifDto = new IfDto(); // "if" "(" {predicate expression} ")" if (stream.Poll() != (int)Token.If) { throw new Exception("Expected 'if'"); } if (stream.Poll() != (int)Token.LParens) { throw new Exception("Expected '('"); } ifDto.Predicate = ExpressionGrammar.Match(stream); if (stream.Poll() != (int)RParens) { throw new Exception("Expected ')'"); } // "{" {statement} [, ...] "}" | {statement} ifDto.Body = StatementGrammar.MatchBody(stream); // ... [ "else" {failure body} ]
More grammar can follow at any level of complexity supported by the chosen parsing methodology.