4.3. More Complex Types

Sometimes the information provided in an ordinary SV, HV or AV isn't enough for what Perl needs to do. For instance, how does one represent objects? What about tied variables? In this section, we'll look at some of the complications of the basic SV types.

The entirety of this section should be considered advanced material; it will not be covered in the course. Readers following the course should skip to the next section, Section 4.4, and study this in their own time.

4.3.1. Objects

Objects are represented relatively simply. As we know from ordinary Perl programming, an object is a reference to some data which happens to know which package it's in. In the definitions of AVs and HVs above, we saw the line

    HV*     xmg_stash;  /* class package */
	
As we'll see in Section 4.3.4, packages are known as "stashes" internally and are represented by hashes. The xmg_stash field in AVs and HVs is used to store a pointer to the stash which "owns" the value.

Hence, in the case of an object which is an array reference, the dump looks like this:

% perl -MDevel::Peek -e '$a=bless [1,2]; Dump($a)'                                                          (1)
SV = RV(0x81586d4) at 0x815b7a0
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x8151b0c
  SV = PVAV(0x8153074) at 0x8151b0c
    REFCNT = 1
    FLAGS = (OBJECT)                                       (2)
    IV = 0
    NV = 0
    STASH = 0x8151a34	"main"                               (3)
    ARRAY = 0x815fcf8
    FILL = 1
    MAX = 1
    ARYLEN = 0x0
    FLAGS = (REAL)
    Elt No. 0
    SV = IV(0x815833c) at 0x8151bc0
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1
    Elt No. 1
    SV = IV(0x8158340) at 0x8151c44
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 2
	
(1)
We create an array reference and bless it into the main package.
(2)
The OBJECT flag is turned on to signify that this SV is an object.
(3)
And now we have a pointer to the appropriate stash in the STASH field.
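The two changes that bless makes, setting the OBJECT flag and filling in the stash pointer, can be modelled in a few lines of stand-alone C. This is only a sketch: the toy_ names, the flag value and the cut-down structures are invented for illustration and are not Perl's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for Perl's structures; every toy_ name is hypothetical. */
#define TOY_SVs_OBJECT 0x1u      /* invented flag bit, not Perl's real value */

typedef struct toy_stash {
    const char *name;            /* package name, e.g. "main" */
} toy_stash;

typedef struct toy_av {
    unsigned   flags;
    toy_stash *stash;            /* plays the role of xmg_stash */
} toy_av;

/* "Bless" the value: remember the owning package and flag it as an object. */
void toy_bless(toy_av *av, toy_stash *stash)
{
    assert(av != NULL && stash != NULL);
    av->stash  = stash;
    av->flags |= TOY_SVs_OBJECT;
}

/* Return the class name of an object, or NULL for an unblessed value. */
const char *toy_class_of(const toy_av *av)
{
    assert(av != NULL);
    return (av->flags & TOY_SVs_OBJECT) ? av->stash->name : NULL;
}
```

In this model, asking for the class of a value is nothing more than checking one flag and following one pointer, which is why objects cost so little on top of an ordinary AV or HV.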

4.3.2. Magic

This works for AVs and HVs, which have a STASH field, but what about ordinary scalars? There is an additional, more complex type of scalar which can hold stash information and also lets us hang additional, miscellaneous information onto the SV. This miscellaneous information is called "magic" (partly because it allows for clever things to happen, and partly because nobody really knows how it works), and the complex SV structure is a PVMG. We can create a PVMG by blessing a scalar reference:

% perl -MDevel::Peek -le '$b="hi";$a=bless \$b, main; print Dump($a)'
SV = RV(0x8106ca4) at 0x810586c
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x81058c0
  SV = PVMG(0x810e628) at 0x81058c0
    REFCNT = 2
    FLAGS = (OBJECT,POK,pPOK)
    IV = 0
    NV = 0
    PV = 0x80ff698 "hi"\0
    CUR = 2
    LEN = 3
    STASH = 0x80f1388	"main"

As you can see, this is similar to the PVNV structure we saw in Section 4.1.5, with the addition of the STASH field. There's also another field, which we can see if we look at the definition of xpvmg:

struct xpvmg {
    char *  xpv_pv;     /* pointer to malloced string */
    STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
    STRLEN  xpv_len;    /* allocated size */
    IV      xiv_iv;     /* integer value or pv offset */
    NV      xnv_nv;     /* numeric value, if any */
    MAGIC*  xmg_magic;  /* linked list of magicalness */
    HV*     xmg_stash;  /* class package */
};

The xmg_magic field provides us with somewhere to put a magic structure. What's a magic structure, then? For this, we need to look in mg.h:

struct magic {
    MAGIC*  mg_moremagic;                                  (1)
    MGVTBL* mg_virtual; /* pointer to magic functions */   (2)
    U16     mg_private;                                    (3)
    char    mg_type;                                       (4)
    U8      mg_flags;                                      (5)
    SV*     mg_obj;                                        (6)
    char*   mg_ptr;                                        (7)
    I32     mg_len;                                        (8)
};
(1)
First, we have a link to another magic structure: this creates a linked list, allowing us to hang multiple pieces of magic off a single SV.
(2)
The magic virtual table is a list of functions which should be called to perform particular operations on behalf of the SV. For instance, a tied variable will automagically call the C function magic_getpack when its value is being retrieved. (This function will, in turn, call the FETCH method on the appropriate object.)

The magic virtual tables are provided by Perl - they're in perl.h and all begin with PL_vtbl_. For instance, the virtual table for %ENV is PL_vtbl_env, and the table for individual elements of the %ENV hash is PL_vtbl_envelem.

In theory, you can create your own virtual tables by providing functions to fill the mgvtbl struct in mg.h, to allow for really bizarre behaviour to be triggered by accesses to your SVs. In practice, nobody really does that, although it's conceivable that you can improve the speed of pure-C tied variables that way. See also the discussion of "U" magic in Section 4.3.3 below.

(3)
This is a storage area for data private to this piece of magic. The Perl core doesn't use this, but you can if you're building your own magic types. For instance, you can use it as a "signature" to ensure that this magic was created by your extension, not by some other module.
(4)
Magic comes in a number of varieties: as well as providing for tied variables, magic propagates taintedness, makes special variables such as %ENV and %SIG work, and allows for special things to happen when expressions like substr($a,0,10) or $#array are assigned to.

Each of these different types of magic has a different "code letter" - the letters in use are listed in perlguts.

(5)
There are only four flags in use for magic; the most important is MGf_REFCOUNTED, which is set if mg_obj had its reference count increased when it was added to the magic structure.
(6)
This is another storage area; it's normally used to point to the object of a tied variable, so that tied functions can be located.
(7)
The pointer field is set when you add magic to an SV with the sv_magic function (see below). You can put anything you like here, but it's typically the name of the variable. Built-in magical virtual table functions such as magic_get check this to process Perl's special variables.
(8)
This is the length of the string in mg_ptr.

What happens when the value of an SV with magic is retrieved? Firstly, a function should call SvGETMAGIC(sv) to cause any magic to be performed. This in turn calls mg_get which walks over the linked list of magic. For each piece of magic, it looks in the magic virtual table, and calls the magical "get" function if there is one.

Let's assume that we're dealing with one of Perl's special variables, which has only one piece of magic, "\0" magic. The appropriate magic virtual table for "\0" magic is PL_vtbl_sv, which is defined in perl.h as follows:

EXT MGVTBL PL_vtbl_sv = {MEMBER_TO_FPTR(Perl_magic_get),
                         MEMBER_TO_FPTR(Perl_magic_set),
                         MEMBER_TO_FPTR(Perl_magic_len),
                         0,      0};
	
Magic virtual tables have five elements, as seen in mg.h:

struct mgvtbl {
    int     (CPERLscope(*svt_get))  (pTHX_ SV *sv, MAGIC* mg);
    int     (CPERLscope(*svt_set))  (pTHX_ SV *sv, MAGIC* mg);
    U32     (CPERLscope(*svt_len))  (pTHX_ SV *sv, MAGIC* mg);
    int     (CPERLscope(*svt_clear))(pTHX_ SV *sv, MAGIC* mg);
    int     (CPERLscope(*svt_free)) (pTHX_ SV *sv, MAGIC* mg);
};

So the above virtual table means "call Perl_magic_get when we want to get the value of this SV; call Perl_magic_set when we want to set it; call Perl_magic_len when we want to find its length; do nothing if we want to clear it or when it is freed from memory."

In this case, we are getting the value, so magic_get is called. [1] This function looks at the value of mg_ptr, which, as noted above, is often the name of the variable. Depending on the name of the variable, it determines what to do: for instance, if mg_ptr is "!", then the current value of the C variable errno is retrieved.

A similar process is performed by SvSETMAGIC(sv) to call functions that need to be called when the value of an SV changes.
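The walk over the magic list can be sketched in stand-alone C. This is a simplified model rather than Perl's mg_get: the toy_ types are invented, the virtual table has only two of the five slots, and the "get" function refreshes the value from an ordinary C variable instead of from something like errno.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_sv    toy_sv;
typedef struct toy_magic toy_magic;

/* Cut-down virtual table: only the "get" and "set" slots are modelled. */
typedef struct toy_vtbl {
    int (*get)(toy_sv *sv, toy_magic *mg);
    int (*set)(toy_sv *sv, toy_magic *mg);
} toy_vtbl;

struct toy_magic {
    toy_magic      *moremagic;   /* next piece of magic, or NULL */
    const toy_vtbl *vtbl;        /* may be NULL: magic with no functions */
    const char     *ptr;         /* e.g. the variable's name */
};

struct toy_sv {
    int        value;
    toy_magic *magic;            /* head of the list, like xmg_magic */
};

/* Walk the list and call every "get" function that exists.
   Returns the number of functions called. */
int toy_mg_get(toy_sv *sv)
{
    int called = 0;
    toy_magic *mg;

    assert(sv != NULL);
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic) {
        if (mg->vtbl != NULL && mg->vtbl->get != NULL) {
            mg->vtbl->get(sv, mg);
            called++;
        }
    }
    return called;
}

/* Example "get" function: refresh the SV from an external C variable. */
int toy_external_value = 42;

int toy_get_external(toy_sv *sv, toy_magic *mg)
{
    (void)mg;                    /* a real function might inspect mg->ptr */
    sv->value = toy_external_value;
    return 0;
}
```

The point to notice is that the SV's stored value is just a cache: every retrieval goes through the list first, so the value seen by the caller is whatever the "get" functions left behind.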

4.3.3. Tied Variables

Tied arrays and hashes are implemented by adding type "P" magic to their AVs and HVs; individual elements of the arrays and hashes have "p" magic. Tied scalars and filehandles have type "q" magic. The virtual table for "p" magic scalars, for instance, looks like this:

EXT MGVTBL PL_vtbl_packelem =   {MEMBER_TO_FPTR(Perl_magic_getpack),
                                 MEMBER_TO_FPTR(Perl_magic_setpack),
                                 0,  
                                 MEMBER_TO_FPTR(Perl_magic_clearpack),
                                 0};

That's to say, the function magic_getpack is called when the value of an element of a tied array or hash is retrieved. This function in turn performs a FETCH method call on the object stored in mg_obj.

We can invent our own "pseudo-tied" variables, using the user-defined "U" magic. "U" magic only works on scalars, and allows us to call a function when the value of the scalar is got or set. The virtual table for "U" magic scalars is as follows:

EXT MGVTBL PL_vtbl_uvar =   {MEMBER_TO_FPTR(Perl_magic_getuvar),
                             MEMBER_TO_FPTR(Perl_magic_setuvar),
                             0,  0,  0};

As you should by now expect, these functions are called when the value of the scalar is accessed. They in turn call our user-defined functions. But how do we tell them what our functions are? In this case, we pass a pointer to a special structure in the mg_ptr field; the structure is defined in perl.h, and looks like this:

struct ufuncs {
    I32 (*uf_val)(IV, SV*);
    I32 (*uf_set)(IV, SV*);
    IV uf_index;
};
Here are our two function pointers: uf_val is called with the value of uf_index and the scalar when the value is sought, and uf_set is called with the same parameters when it is set.

Hence, the following code allows us to emulate $!:

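I32 get_errno(IV index, SV* sv) {
    sv_setiv(sv, errno);  /* copy the C errno into the scalar */
    return 0;             /* the function is declared to return I32 */
}

I32 set_errno(IV index, SV* sv) {
    errno = SvIV(sv); /* Some Cs don't like us setting errno, but hey */
    return 0;
}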

struct ufuncs uf;

/* This is XS code */

void
magicify(sv)
    SV *sv;
CODE:
    uf.uf_val = &get_errno;
    uf.uf_set = &set_errno;
    uf.uf_index = 0;
    sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf));

If you need any more flexibility than that, it's time to look into "~" magic.
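To see what uf_index is for, here is a stand-alone model of the "U" magic dispatch. It is a sketch under stated assumptions: the toy_ names are invented, long stands in for both I32 and IV, and using the index to pick a slot in an array is just one plausible use, not something Perl prescribes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_sv { long value; } toy_sv;

/* Same shape as struct ufuncs, with long standing in for I32 and IV. */
struct toy_ufuncs {
    long (*uf_val)(long index, toy_sv *sv);
    long (*uf_set)(long index, toy_sv *sv);
    long uf_index;
};

/* What the "get" side of "U" magic boils down to in this model. */
long toy_getuvar(toy_sv *sv, const struct toy_ufuncs *uf)
{
    if (uf != NULL && uf->uf_val != NULL)
        return uf->uf_val(uf->uf_index, sv);
    return 0;
}

/* And the "set" side. */
long toy_setuvar(toy_sv *sv, const struct toy_ufuncs *uf)
{
    if (uf != NULL && uf->uf_set != NULL)
        return uf->uf_set(uf->uf_index, sv);
    return 0;
}

/* One pair of callbacks serving two variables, told apart by the index. */
long toy_slots[2] = { 10, 20 };

long toy_slot_val(long index, toy_sv *sv)
{
    sv->value = toy_slots[index];
    return 0;
}

long toy_slot_set(long index, toy_sv *sv)
{
    toy_slots[index] = sv->value;
    return 0;
}
```

With uf_index set to 1, a read of the scalar fetches toy_slots[1] and an assignment writes it back; a second scalar with uf_index 0 could share the same two functions.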

4.3.4. Globs and Stashes

SVs that represent variables are kept in the symbol table; as you'll know from your Perl programming, the symbol table starts at %main:: and is an ordinary Perl hash, with the package and variable names as hash keys. But what are the hash values? Let's have a look:

% perl -le '$a=5; print ${main::}{a}'
*main::a
Well, that doesn't tell us very much - at first sight it just looks like an ordinary string. But if we use Devel::Peek on it, we find it's actually something else - a glob, or GV:

% perl -MDevel::Peek -e '$a=5; Dump ${main::}{a}'
SV = PVGV(0x80fe3e0) at 0x80fb3ec
  REFCNT = 2
  FLAGS = (GMG,SMG)                                        (1)
  IV = 0
  NV = 0
  MAGIC = 0x80fea50
    MG_VIRTUAL = &PL_vtbl_glob                             (1)
    MG_TYPE = '*'
    MG_OBJ = 0x80fb3ec                                     (2)
    MG_LEN = 1
    MG_PTR = 0x81081d8 "a"
  NAME = "a"                                               (3)
  NAMELEN = 1
  GvSTASH = 0x80f1388	"main"                               (4)
  GP = 0x80ff2b0                                           (5)
    SV = 0x810592c                                         (6)
    REFCNT = 1                                             (7)
    IO = 0x0                                               (8)
    FORM = 0x0                                             (8)
    AV = 0x0                                               (8)
    HV = 0x0                                               (8)
    CV = 0x0                                               (8)
    CVGEN = 0x0                                            (9)
    GPFLAGS = 0x0                                          (10)
    LINE = 1
    FILE = "-e"
    FLAGS = 0x0
    EGV = 0x80fb3ec	"a"
	
(1)
Globs have get and set magic to handle glob aliasing as well as the conversion to strings we saw above.
(2)
The glob's magic object points back to the GV itself, so that the magic functions can easily access it.
(3)
The "name" is simply the variable's unqualified name; this is combined with the "stash" below to make up the "full name".
(4)
The stash itself is a pointer to the hash in which this glob is contained.
(5)
This structure, a GP structure, actually holds the symbol table entry. It's separated out so that, in the case of aliased globs, multiple GVs can point to the same GP.
(6)
As we know, globs have several different "slots", for scalars, arrays, hashes and so on. This is the scalar slot, which is a pointer to an SV.
(7)
The GP is refcounted because we need to know how many GVs point to it, so it can be safely destroyed when no longer needed.
(8)
The other slots are a filehandle, a form, an array, a hash and a code value. (see Section 4.3.5)
(9)
This stores the "age" of the code value. Every time a subroutine is defined, Perl increments the variable PL_sub_generation. This can be used as a way of checking the method cache: if the current value of PL_sub_generation is equal to the one stored in a GP, this GP is still valid.
(10)
The GP's flags are currently unused.
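The reason for splitting the GP out of the GV, and for refcounting it, is easier to see in a small model. The following stand-alone C is a sketch only: the toy_ structures are invented, an int stands in for the scalar slot, and toy_gv_alias merely approximates what a glob assignment does.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_gp {
    int refcnt;                  /* how many GVs point here */
    int scalar_slot;             /* stands in for the SV pointer */
} toy_gp;

typedef struct toy_gv {
    const char *name;
    toy_gp     *gp;
} toy_gv;

int toy_gp_live = 0;             /* number of GPs currently allocated */

/* Give a GV a fresh GP of its own. Returns 0 on success, -1 on failure. */
int toy_gv_init(toy_gv *gv, const char *name)
{
    toy_gp *gp = calloc(1, sizeof *gp);
    if (gp == NULL)
        return -1;
    gp->refcnt = 1;
    gv->name = name;
    gv->gp = gp;
    toy_gp_live++;
    return 0;
}

/* Drop one reference; free the GP when the last GV lets go. */
void toy_gp_release(toy_gv *gv)
{
    if (gv->gp != NULL && --gv->gp->refcnt == 0) {
        free(gv->gp);
        toy_gp_live--;
    }
    gv->gp = NULL;
}

/* Alias dst to src: both GVs end up sharing src's GP. */
void toy_gv_alias(toy_gv *dst, toy_gv *src)
{
    if (dst->gp == src->gp)
        return;
    src->gp->refcnt++;
    toy_gp_release(dst);
    dst->gp = src->gp;
}
```

After aliasing, a change made through one name is visible through the other because there is only one GP, and the GP survives until both GVs have released it.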

Symbol tables are considered some of the hairiest voodoo in the Perl internals.

From C, the variable PL_defstash is the HV representing the main:: stash; PL_curstash contains the current package's stash.

4.3.5. Code Values

The final data type we will examine is the CV, a code value used for storing subroutines. Both Perl and XSUB subroutines are stored in CVs, and blocks are also stored in CVs. The CV structure can be found in cv.h:

struct xpvcv {
    char *  xpv_pv;     /* pointer to malloced string */
    STRLEN  xpv_cur;    /* length of xp_pv as a C string */
    STRLEN  xpv_len;    /* allocated size */
    IV      xof_off;    /* integer value */
    NV      xnv_nv;     /* numeric value, if any */
    MAGIC*  xmg_magic;  /* magic for scalar array */
    HV*     xmg_stash;  /* class package */

    HV *    xcv_stash;                                     (1)
    OP *    xcv_start;                                     (2)
    OP *    xcv_root;                                      (2)
    void    (*xcv_xsub) (pTHXo_ CV*);                      (3)
    ANY     xcv_xsubany;                                   (4)
    GV *    xcv_gv;                                        (5)
    char *  xcv_file;                                      (6)
    long    xcv_depth;  /* >= 2 indicates recursive call */(7)
    AV *    xcv_padlist;                                   (8)
    CV *    xcv_outside;                                   (9)
#ifdef USE_THREADS
    perl_mutex *xcv_mutexp;                                (10)
    struct perl_thread *xcv_owner;  /* current owner thread */(10)
#endif /* USE_THREADS */
    cv_flags_t  xcv_flags;                                 (10)
};
(1)
Although it might look like this provides the CV's stash, it is important to note that this is a pointer to the stash in which the CV was compiled; for instance, given
package First;
sub Second::mysub { ...}
then xcv_stash points to First::. This is why, for instance,
package One;
$x = "In One";
package Two;
$x = "In Two";
sub One::test { print $x }
package main;
One::test();
will print "In Two".
(2)
For a subroutine defined in Perl, these two pointers hold the start and the root of the compiled op tree; this will be discussed further in Chapter 6.
(3)
For an XSUB, on the other hand, this field contains a function pointer pointing to the C function implementing the subroutine.
(4)
This is how constant subroutines are implemented: Perl can arrange for the SV representing the constant to be returned by a constant XS routine, which is hung here.
(5)
This simply holds a pointer to the glob by which the subroutine was defined.
(6)
This stores the name of the file in which the subroutine was defined. For an XSUB, this will be the .c file rather than the .xs file.
(7)
This is a counter which is incremented each time the subroutine is entered and decremented when it is left; this allows Perl to keep track of recursive calls to a subroutine.
(8)
xcv_padlist, the pad list, contains the lexical variables declared in a subroutine or code block; it is explained below.
(9)
Consider the following code:
{
   my $x = 0;
   sub counter { return ++$x; }
}
When inside counter, where does Perl "get" the SV $x from? It's not a global, so it doesn't live in a stash. It's not declared in counter, so it doesn't belong in counter's pad list. It actually belongs to the pad list for the CV "outside" of counter. To enable Perl to get at these variables and also at lexicals used in closures, each CV contains a pointer to the CV of the enclosing scope.
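The search through enclosing CVs can be modelled as a walk along the "outside" pointers. This is a toy sketch with invented toy_ names: each CV here keeps its lexicals in small fixed arrays, and names are compared as strings on every lookup. Perl itself resolves names when the subroutine is compiled; the model only shows the direction of the search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LEXICALS 4

typedef struct toy_cv {
    const char    *names[TOY_MAX_LEXICALS];  /* lexicals declared here */
    int            values[TOY_MAX_LEXICALS];
    int            count;                    /* how many are in use */
    struct toy_cv *outside;                  /* like xcv_outside */
} toy_cv;

/* Find a lexical by name, starting in cv and moving outwards.
   Returns a pointer to its value, or NULL if no enclosing CV declares it. */
int *toy_find_lexical(toy_cv *cv, const char *name)
{
    for (; cv != NULL; cv = cv->outside) {
        int i;
        for (i = 0; i < cv->count; i++) {
            if (strcmp(cv->names[i], name) == 0)
                return &cv->values[i];
        }
    }
    return NULL;
}
```

For the counter example, the CV for counter would have no lexicals of its own and an outside pointer to the enclosing CV, so a lookup of "$x" starting in counter ends up in the enclosing CV's storage.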

4.3.6. Lexical Variables

Global variables live, as we've seen, in symbol tables or "stashes". Lexical variables, on the other hand, are tied to blocks rather than packages, and so are stored inside the CV representing their enclosing block.

As mentioned briefly above, the xcv_padlist element holds a pointer to an AV. This array, the padlist, contains the names and values of lexicals in the current code block. Again, a diagram is the best way to demonstrate this:

The first element of the padlist - called the "padname" - is an array containing the names of the variables, and the other elements are lists of the current values of those variables. Why do we have several lists of current values? Because a CV may be entered several times - for instance, when a subroutine recurses. Having, essentially, a stack of frames ensures that we can restore the previous values when a recursive call ends. Hence, the current values of lexical variables are stored in the last element on the padlist.
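The layout just described, one array of names followed by one array of values per level of recursion, can be modelled in stand-alone C. This is a sketch under assumptions: the toy_ names are invented, the values are plain ints rather than SVs, and the maximum depth is fixed where Perl grows the padlist as needed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NVARS     2
#define TOY_MAX_DEPTH 8

typedef struct toy_padlist {
    const char *padname[TOY_NVARS];             /* element 0: the names */
    int         pads[TOY_MAX_DEPTH][TOY_NVARS]; /* later elements: values */
    int         depth;                          /* like xcv_depth */
} toy_padlist;

/* Enter the sub: a fresh, zeroed pad becomes the current one.
   Returns the new current pad, or NULL if the model's depth limit is hit. */
int *toy_enter(toy_padlist *pl)
{
    int i;

    assert(pl != NULL);
    if (pl->depth >= TOY_MAX_DEPTH)
        return NULL;
    for (i = 0; i < TOY_NVARS; i++)
        pl->pads[pl->depth][i] = 0;
    pl->depth++;
    return pl->pads[pl->depth - 1];
}

/* Leave the sub: the caller's pad, if any, is current again. */
int *toy_leave(toy_padlist *pl)
{
    assert(pl != NULL && pl->depth > 0);
    pl->depth--;
    return pl->depth > 0 ? pl->pads[pl->depth - 1] : NULL;
}
```

Because each call gets its own row of values, whatever the inner call stores in its first variable leaves the outer call's copy untouched, and that copy is current again as soon as the inner call returns.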

From inside perl, you can get at the current pad as PL_curpad. Note that this is the pad itself, not the padlist. To get the padlist, you need to perform some awkwardness:

I32 cxix    = dopoptosub(cxstack_ix); /* cxstack_ix is a macro */
AV* padlist = cxix >= 0 ? CvPADLIST(cxstack[cxix].blk_sub.cv) : PL_comppadlist;

We'll visit pads again when we look at operator targets in Section 6.4.

Notes

[1]

We'll see later that Perl uses the Perl_ prefix internally for function names, but that prefix can be omitted inside the Perl core. Hence, we'll call Perl_magic_get "magic_get".