So we've seen that the job of the parsing stage is to reduce a program to a tree structure, and each node of the tree represents an operation. In this chapter, we'll look more closely at those operations: what they are, how they're coded, and how they fit together.
Just AVs and HVs are "extensions" of the basic SV structure, there are a number of different "flavours" of ops, built on a basic OP structure; you can find this structure defined as BASEOP in op.h:
OP* op_next; OP* op_sibling; OP* (CPERLscope(*op_ppaddr))(pTHX); PADOFFSET op_targ; OPCODE op_type; U16 op_seq; U8 op_flags; U8 op_private;
Some of these fields are easy to explain, so we'll deal with them now.
The op_next field is a pointer to the next op which needs to be executed. We'll see later, in Section 6.1.3, how the "thread of execution" is derived from the tree.
op_ppaddr is the address of the C function which carries out this particular operation. It's stored here so that our main execution code can simply dereference the function pointer and jump to it, instead of having to perform a lookup.
Each unique operation has a different number; this can be found in the enum in opnames.h:
typedef enum opcode { OP_NULL, /* 0 */ OP_STUB, /* 1 */ OP_SCALAR, /* 2 */ OP_PUSHMARK, /* 3 */ OP_WANTARRAY, /* 4 */ OP_CONST, /* 5 */ OP_GVSV, /* 6 */ OP_GV, /* 7 */ ... };The number of the operation to perform is stored in the op_type field. We'll examine some of the more interesting operations in Section 6.1.1.
op_flags is a set of flags generic to all ops; op_private stores flags which are specific to the type of op. For instance, the repeat op which implements the x operator has the flag OPpREPEAT_DOLIST set when it's repeating a list rather than a string. This flag only makes sense for that particular operation, so is stored in op_private. Private flags have the OPp prefix, and public flags begin with OPf.
op_seq is a sequence number allocated by the optimizer. It allows for, for instance, correct scoping of lexical variables by storing the sequence numbers of the beginning and end of scope operations inside the pad.
As for the remaining fields, we'll examine op_sibling in Section 6.1.2 and op_targ in Section 6.4
Perl has currently 351 different operations, implementing all the built-in functions and operators, as well as the more structural operations required internally - entering and leaving a scope, compiling regular expressions and so on.
The array PL_op_desc in opcode.h describes each operation: it may be easier to follow the data from which this table is generated, at the end of opcode.pl. We'll take a longer look at that file later on in this chapter.
Many of the operators are familiar from Perl-space, such as concat and splice, but some are used purely internally: for instance, one of the most common, gvsv fetches a scalar variable; enter and leave are block control operators, and so on.
There are a number of different "flavours" of op structure, related to the arguments of an operator and how it fits together with other ops in the op tree. For instance, scalar is a unary operator, a UNOP. This extends the basic op structure above with a link to the argument:
struct unop { BASEOP OP * op_first; };Binary operators, such as i_add, (integer addition) have both a first and a last:
struct binop { BASEOP OP * op_first; OP * op_last; };
List operators are more interesting; they too have a first and a last, but they also have some ops in the middle, too. This is where op_sibling above comes in; it connects ops "sibling" ops on the same level in a list. For instance, look at the following code and the graph of its op tree:
open FILE, "foo"; print FILE "hi\n"; close FILE;
The dashed lines represent op_sibling connections. The root operator of every program is the list operator leave, and its children are the statements in the program, separated by nextstate (next statement) operators. open is also a list operator, as is print. The first child of print is pushmark, which puts a mark on the stack (see Section 6.2.1) so that Perl knows how many arguments on the stack belong to print. The rv2gv turns a reference to the filehandle FILE into a GV, so that print can print to it, and the final child is the constant "hi\n".
Some operators hold information about the program; these are COPs, or "code operators". Their definition is in cop.h:
struct cop { BASEOP char * cop_label; /* label for this construct */ #ifdef USE_ITHREADS char * cop_stashpv; /* package line was compiled in */ char * cop_file; /* file name the following line # is from */ #else HV * cop_stash; /* package line was compiled in */ GV * cop_filegv; /* file the following line # is from */ #endif U32 cop_seq; /* parse sequence number */ I32 cop_arybase; /* array base this line was compiled with */ line_t cop_line; /* line # of this command */ SV * cop_warnings; /* lexical warnings bitmask */ SV * cop_io; /* lexical IO defaults */ };COPs are inserted between every statement; they contain the label (for goto, next and so on) of the statement, the file name, package and line number of the statement and lexical hints such as the current value of $[, warnings and IO settings. Note that this doesn't contain the current CV or the padlist - these are kept on a special stack called the "context stack".
The final type of op is the null op: any op with type zero means that a previous op has been optimized away; we'll look at how this is done later in this chapter, but for now, you should skip over the null op when you see it in op trees.
We've so far seen a little of how the op tree is connected together with op_first, op_last, op_sibling, and so on. Now we'll look at how the tree gets manufactured, as how it gets executed.
After our investigation of the parser in the previous chapter, it should now be straightforward to see how the op tree is created. The parser calls routines in op.c which create the op structures, passing ops further "down" the parse tree as arguments. This threads together a tree as shown in the diagram above. For comparison, here is the what the example in that chapter ($a = $b + $c) really looks like as an op tree:
The second thread through the op tree, indicated by the dotted line in our diagrams, is the execution order. This is the order in which Perl must actually perform the operations in order to run the program. The main loop of Perl is very, very simple, and you can see it in run.c:
while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) { PERL_ASYNC_CHECK(); }That's it. That's all the Perl interpreter is. PL_op represents the op that's currently being executed. Perl calls the function pointer for that op and expects another op to be returned; this return value is then set to PL_op, which is executed in turn. Since everything apart from conditional operators (for obvious reasons) just return PL_op->op_next, the execution order through a program can be found by chasing the trail of op_next pointers from the start node to the root.
We can trace the execution order in several ways: if Perl is built with debugging, then we can say
perl -Dt -e 'open ...'
Alternatively, and perhaps more simply, the compiler module B::Terse (see Chapter 7) has an option to print the execution order, -exec. For instance, in our "open-print-close" example above, the execution order is:
% perl -MO=Terse,-exec -e 'open FILE, "foo"; ...' OP (0x8111510) enter COP (0x81121c8) nextstate OP (0x8186f30) pushmark SVOP (0x8186fe0) gv GV (0x8111bd8) *FILE SVOP (0x8186f10) const PV (0x810dd98) "foo" LISTOP (0x810a170) open [1] COP (0x81114d0) nextstate OP (0x81114b0) pushmark SVOP (0x8118318) gv GV (0x8111bd8) *FILE UNOP (0x8111468) rv2gv SVOP (0x8111448) const PV (0x8111bfc) "hi\n" LISTOP (0x8111488) print COP (0x8111fe0) nextstate SVOP (0x8111fc0) gv GV (0x8111bd8) *FILE UNOP (0x8111fa0) close LISTOP (0x8111420) leave [1]This program, just like every other program, starts with the enter and nextstate ops to enter a scope and begin a new statement respectively. Then a mark is placed on the argument stack: marks represent the start of a set of arguments, and a list operator can retrieve all the arguments by pushing values off the stack until it finds a mark. Hence, we're notifying Perl of the beginning of the arguments to the open operator.
The arguments in this case are merely the file handle to be opened and the file name; after operators put these two arguments on the stack, open can be called. This is the end of the first statement.
Next, the arguments to print begin. This is slightly more tricky, because while open can only take a true filehandle, print may take any sort of reference. Hence, gv returns the GV and then this is turned into the appropriate filehandle type by the rv2gv operator. After the filehandle come the arguments to be printed; in this case, a constant ("hi\n"). Now all the arguments have been placed on the stack, print can be called. This is the end of the second statement.
Finally, a filehandle is put on the stack and closed. Note that at this point, the connections between the operators - unary, binary, etc. - are not important; all manipulation of values comes not by looking at the children of the operators but by looking at the stack. The types of op are important for the construction of the tree in "tree order", but the stack and the op_next pointers are the only important things for the execution of the tree in execution order.
How is the execution order determined? The function linklist in op.c takes care of threading the op_next pointers in prefix order. It does so by recursively applying the following rule:
If there is a child for the current operator, visit the child first, then its siblings, then the current op.