If you want really to understand how the GNU Pascal language front-end works internally and perhaps want to improve the compiler, it is important that you understand GPC's internal data structures.
The data structure used by the language front-end to hold all information about your Pascal program are the so-called “tree nodes”. (Well, it needn't be Pascal source – tree nodes are language independent.) The tree nodes are kind of objects, connected to each other via pointers. Since the GNU compiler is written in C and was created at a time where nobody really thought about object-oriented programming languages yet, a lot of effort has been taken to create these “objects” in C.
Here is an extract from the “object hierarchy”. Omissions are
marked with “...”; nodes in parentheses are “abstract”: They
are never instantiated and aren't really defined. They only appear
here to clarify the structure of the tree node hierarchy. The
complete list is in ../tree.def
; additional information can
be found in ../tree.h
.
(tree_node) | |--- ERROR_MARK { enables GPC to continue after an error } | |--- (identifier) | | | |--- IDENTIFIER_NODE | | | \--- OP_IDENTIFIER | |--- TREE_LIST { a list of nodes, also used as a | general-purpose "container object" } | |--- TREE_VEC | |--- BLOCK | |--- (type) { information about types } | | | |--- VOID_TYPE | | | |--- INTEGER_TYPE | ... | | | |--- RECORD_TYPE | | | |--- FUNCTION_TYPE | | | \--- LANG_TYPE { for language-specific extensions } | |--- INTEGER_CST { an integer constant } | |--- REAL_CST | |--- STRING_CST | |--- COMPLEX_CST | |--- (declaration) | | | |--- FUNCTION_DECL | ... | | | |--- TYPE_DECL | | | \--- VAR_DECL | |--- (reference) | | | |--- COMPONENT_REF | ... | | | \--- ARRAY_REF | |--- CONSTRUCTOR | \--- (expression) | |--- MODIFY_EXPR { assignment } | |--- PLUS_EXPR { addition } ... | |--- CALL_EXPR { procedure/function call } | |--- GOTO_EXPR | \--- LOOP_EXPR { for all loops }
Most of these tree nodes – struct variables in fact – contain
pointers to other tree nodes. A TREE_LIST
for instance has a
TREE_VALUE
and a TREE_PURPOSE
slot which can contain
arbitrary data; a third pointer TREE_CHAIN
points to the next
TREE_LIST
node and thus allows us to create linked lists of
tree nodes.
One example: When GPC reads the list of identifiers in a variable declaration
var Foo, Bar, Baz: Integer;
the parser creates a chain of TREE_LIST
s whose
TREE_VALUE
s hold IDENTIFIER_NODE
s for the identifiers
Foo
, Bar
, and Baz
. The function
declare_variables()
(declared in declarations.c
) gets
this tree list as a parameter, does some magic, and finally passes a
chain of VAR_DECL
nodes to the back-end.
The VAR_DECL
nodes in turn have a pointer TREE_TYPE
which holds a _TYPE
node – an INTEGER_TYPE
node in
the example above. Having this, GPC can do type-checking when a
variable is referenced.
For another example, let's look at the following statement:
Baz := Foo + Bar;
Here the parser creates a MODIFY_EXPR
tree node. This node
has two pointers, TREE_OPERAND[0]
which holds a
representation of Baz
, a VAR_DECL
node, and
TREE_OPERAND[1]
which holds a representation of the sum
Foo + Bar
. The sum in turn is represented as a
PLUS_EXPR
tree node whose TREE_OPERAND[0]
is the
VAR_DECL
node Foo
, and whose TREE_OPERAND[1]
is
the VAR_DECL
node Bar
. Passing this (the
MODIFY_EXPR
node) to the back-end results in assembler code
for the assignment.
If you want to have a closer look at these tree nodes, write a line
{$debug-tree FooBar}
into your program with FooBar
being some identifier in your program. This tells GPC to output the
contents of the IDENTIFIER_NODE
to the standard error file
handle in human-readable form.
While hacking and debugging GPC, you will also wish to have a look
at these tree nodes in other cases. Use the debug_tree()
function to do so. (In fact {$debug-tree FooBar}
does
nothing else than to debug_tree()
the
IDENTIFIER_NODE
of the Foobar
identifier node – note
the capitalization of the first character in the internal
representation.)