Zach Goethel

Finite State Automata and Lexing-Parsing

Finite State Machines & lexing-parsing


(above: construction and minimization of a finite state automaton; View Full Size )

My descriptor language parser applies top-down recursive descent using a custom implementation of regular expression.

Patterns can be specified with a subset of the Basic Regular Expression language (BRE) and are adapted to a node representation.

var nfa = new Fsa();
nfa.Build("(a|b|c){1,3}", (int)Token1);
nfa.Build("int|long",     (int)Token2);
nfa.Build("[A-Za-z0-9]+", (int)Token3);
...

Furthermore, several Nondeterministic Finite Automata (NFAs) can be formed from several regex expressions to match a corresponding set of tokens in a language.

All possible tokens which can be matched by the state machine are knit into one NFA, which can be used to match tokens as-is (using non-deterministic breadth-first evaluation of the NFA)

var (token, match) = nfa.Search(source, offset);

or further optimized using DFA conversion and equivalence partioning.

var dfa = nfa.ConvertToDfa().MinimizeDfa();
var (token, match) = dfa.Search(source, offset);

Usage of the machine will resemble that of tools such as (F)lex. Matched tokens can be passed off to a parsing algorithm (e.g., LALR(1)), or a developer may opt to implement recursive descent parsing by manually analyzing tokens.

Perhaps the token "if" is found, so the following code is invoked to match a statement in source:

Fsa.*.cs—Implementation of regular expression, minimization, and lexing
TokenStream.cs—Wrapper for searching an input source text token-by-token

var stream = new TokenStream()
{
	Grammar = dfa,
	Source = inputSource
};
stream.Seek(startOffset);
var ifDto = new IfDto();

// "if" "(" {predicate expression} ")"
if (stream.Poll() != (int)Token.If)
{
	throw new Exception("Expected 'if'");
}
if (stream.Poll() != (int)Token.LParens)
{
	throw new Exception("Expected '('");
}
ifDto.Predicate = ExpressionGrammar.Match(stream);
if (stream.Poll() != (int)RParens)
{
	throw new Exception("Expected ')'");
}

// "{" {statement} [, ...] "}" | {statement}
ifDto.Body = StatementGrammar.MatchBody(stream);

// ... [ "else" {failure body} ]

More grammar can follow at any level of complexity supported by the chosen parsing methodology.

Created by Zach Goethel. Have a wonderful day!