re_graph.pl - Graph regular expression
re_graph.pl [-d] [-f re_file] [-P] [regular expression] [regular expression] [...]
The re_graph.pl program graphs regular expressions. The guts of the regular expression engine is a simple state machine. The various states and operations in the regular expression parser can be displayed using a surprisingly simple diagram.
A few notes on what you are looking at:
The nodes Start and Stop denote the beginning and end of the regular expression.
The solid squares denote atoms. Lines indicate the next state. When a line splits, the state machine will take the top line first. If it's path is blocked it will backup and take the next lower line. This is repeated until it finds a path to the end or all paths are exhausted.
Brown boxes indicate a grouping operation, i.e. ().
Green boxes indicate a zero with test. The state machine will perform the test inside the box before moving ahead.
For more information, see the tutorial below.
Note: If no regular expressions are specified, a list consisting of the items in the tutorial is loaded.
When pressed it creates a file called tmp.ps which should contain a postscript version of your graph. (It has a few problems that prevent the output from being what you want.)
This tutorial shows what happens when a set of sample regular expressions are graphed. This set of regular expressions closely follows the Chapter 4 of ``Perl for C Programmers'' by Steve Oualline.
The set of regular expressions used for this tutorial is:
test ^Test ^ *# ^[ \t]*# ^\s*# ([^#]*)(#.*) a|b ^(([^#"]*|".*")*)(#.*) ^((?:[^#"]*|".*?")*)(#.*) ^((?:[^#"]*|".*?(?<!\\)")*)(#.*)
Let's take a look at the graphs produced by these expressions.
The ``Start'' node indicates the start of the regular expression.
The ``EXACT <test>'' node tells the engine that the text must match the text ``test'' exactly.
The ``END'' node indicates the end of the regular expression. If you reach the ``END'' node, a successful match was found.
Flow is a straight line from ``Start'', through the ``EXACT'' check, to end.
The way the state machine works it that it starts at ``Start'' and works it's way through the nodes. You'll notice that between ``BOL'' and ``EXACT < >'' there's a fork in the road.
The state machine will take the top branch if possible. So if the next character is a space, the system will take the top branch and match the ``EXACT < >'' node. If not, the bottom branch is taken and we wind up at the ``EXACT <#>'' node.
If there's no path to the ``END'', then we don't have a match.
The other change is that we use the expression [^#], which matches anything except a hash mark. Perl changes this to a ``ANYOF'' clauses which matches all characters except the single one we don't want.
Note: This ANYOF node overflows the size of the box. This is a know bug.
This expression adds nested grouping, and some additional stuff that we've seen before.
(In future versions of this graphing tool we will graph the invisible group operator. We just did figure out how to do it yet.)
Also notice the use of the ``*?'' operator. Remember when going through the nodes, when a branch is encountered, the system will try and take the top one first.
Basically the clause in question looks for a double quoted string (``xxx''), but will ignore a double quote it's escaped (``xxx\''yyy``).
This will not graph all the regular expressions. Some of the more advanced features of the engine are just not handled.
We currently ``graph'' the ``group, no $1'' (?:..) operator by displaying nothing. A box should be put around the expression.
The boxes drawn by the program are a fixed with not related to the size of the text they contain. Text can easily overflow the box.
The system is UNIX/Linux specific. This is caused by only one small section of code should anyone want to port this to a braindamaged operating system.
Better use of color can be made. Specifically all the nodes do not have to be green. Come to think of it they call don't have to be rectangles either.
Sometimes the lines connecting one section to another take some strange twists.
Horizontal scrolling of the regular expression could be better.
I would like to connect this to the execution of the regular expression engine so we can see the thing in action.
Licensed under the GPL. (See the end of the source file for a copy).
Steve Oualline (oualline@www.oualline.com)