Sometimes the information provided in an ordinary SV, HV or AV isn't enough for what Perl needs to do. For instance, how does one represent objects? What about tied variables? In this section, we'll look at some of the complications of the basic SV types.
The entirety of this section should be considered advanced material; it will not be covered in the course. Readers following the course should skip to the next section, Section 4.4, and study this in their own time.
Objects are represented relatively simply. As we know from ordinary Perl programming, an object is a reference to some data which happens to know which package it's in. In the definitions of AVs and HVs above, we saw the line
    HV* xmg_stash; /* class package */
As we'll see in Section 4.3.4, packages are known as "stashes" internally and are represented by hashes. The xmg_stash field in AVs and HVs is used to store a pointer to the stash which "owns" the value.
Hence, in the case of an object which is an array reference, the dump looks like this:
    % perl -MDevel::Peek -e '$a=bless [1,2]; Dump($a)'
    SV = RV(0x81586d4) at 0x815b7a0
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x8151b0c
      SV = PVAV(0x8153074) at 0x8151b0c
        REFCNT = 1
        FLAGS = (OBJECT)
        IV = 0
        NV = 0
        STASH = 0x8151a34 "main"
        ARRAY = 0x815fcf8
        FILL = 1
        MAX = 1
        ARYLEN = 0x0
        FLAGS = (REAL)
        Elt No. 0
        SV = IV(0x815833c) at 0x8151bc0
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 1
        Elt No. 1
        SV = IV(0x8158340) at 0x8151c44
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 2
This works for AVs and HVs, which have a STASH field, but what about ordinary scalars? There is an additional, more complex type of scalar, which can hold stash information and also lets us hang additional, miscellaneous information onto the SV. This miscellaneous information is called "magic" (partially because it allows for clever things to happen, and partially because nobody really knows how it works), and the complex SV structure is a PVMG. We can create a PVMG by blessing a scalar reference:
    % perl -MDevel::Peek -le '$b="hi";$a=bless \$b, main; print Dump($a)'
    SV = RV(0x8106ca4) at 0x810586c
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x81058c0
      SV = PVMG(0x810e628) at 0x81058c0
        REFCNT = 2
        FLAGS = (OBJECT,POK,pPOK)
        IV = 0
        NV = 0
        PV = 0x80ff698 "hi"\0
        CUR = 2
        LEN = 3
        STASH = 0x80f1388 "main"
As you can see, this is similar to the PVNV structure we saw in Section 4.1.5, with the addition of the STASH field. There's also another field, which we can see if we look at the definition of xpvmg:
    struct xpvmg {
        char *      xpv_pv;     /* pointer to malloced string */
        STRLEN      xpv_cur;    /* length of xpv_pv as a C string */
        STRLEN      xpv_len;    /* allocated size */
        IV          xiv_iv;     /* integer value or pv offset */
        NV          xnv_nv;     /* numeric value, if any */
        MAGIC*      xmg_magic;  /* linked list of magicalness */
        HV*         xmg_stash;  /* class package */
    };
The xmg_magic field provides us with somewhere to put a magic structure. What's a magic structure, then? For this, we need to look in mg.h:
    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;  /* pointer to magic functions */
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        SV*         mg_obj;
        char*       mg_ptr;
        I32         mg_len;
    };
The magic virtual tables are provided by Perl - they're in perl.h and all begin PL_vtbl_. For instance, the virtual table for %ENV is PL_vtbl_env, and the table for individual elements of the %ENV hash is PL_vtbl_envelem.
In theory, you can create your own virtual tables by providing functions to fill the mgvtbl struct in mg.h, to allow for really bizarre behaviour to be triggered by accesses to your SVs. In practice, nobody really does that, although it's conceivable that you can improve the speed of pure-C tied variables that way. See also the discussion of "U" magic in Section 4.3.3 below.
Each of these different types of magic has a different "code letter" - the letters in use are shown in perlguts.
What happens when the value of an SV with magic is retrieved? Firstly, a function should call SvGETMAGIC(sv) to cause any magic to be performed. This in turn calls mg_get which walks over the linked list of magic. For each piece of magic, it looks in the magic virtual table, and calls the magical "get" function if there is one.
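The walk over the magic chain described above can be sketched in plain C. This is a toy model only: ToySV, ToyMagic, ToyVtbl and toy_mg_get are hypothetical stand-ins for Perl's SV, MAGIC, MGVTBL and mg_get, and the return value (a count of the hooks that fired) is an invention of this sketch to make the behaviour easy to observe.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for Perl's SV, MAGIC and MGVTBL; NOT the real definitions. */
typedef struct toy_sv ToySV;
typedef struct toy_magic ToyMagic;

typedef struct {
    int (*get)(ToySV *sv, ToyMagic *mg);   /* models svt_get */
    int (*set)(ToySV *sv, ToyMagic *mg);   /* models svt_set */
} ToyVtbl;

struct toy_magic {
    ToyMagic      *moremagic;  /* next piece of magic (models mg_moremagic) */
    const ToyVtbl *virtual_;   /* models mg_virtual */
    char           type;       /* models mg_type, the "code letter" */
};

struct toy_sv {
    long      iv;     /* the scalar's integer value */
    ToyMagic *magic;  /* head of the magic chain (models xmg_magic) */
};

/* Models mg_get: visit each piece of magic and call its "get" hook,
 * if it has one. Returns how many hooks were called (sketch-only). */
int toy_mg_get(ToySV *sv)
{
    int called = 0;
    assert(sv != NULL);
    for (ToyMagic *mg = sv->magic; mg != NULL; mg = mg->moremagic) {
        if (mg->virtual_ != NULL && mg->virtual_->get != NULL) {
            mg->virtual_->get(sv, mg);
            called++;
        }
    }
    return called;
}

/* A sample "get" hook: refreshes the value before the caller reads it. */
int toy_get_answer(ToySV *sv, ToyMagic *mg)
{
    (void)mg;       /* this hook does not need the magic entry itself */
    sv->iv = 42;
    return 0;
}
```

A caller would build a chain of ToyMagic entries, hang it on a ToySV, and call toy_mg_get(&sv) before reading sv.iv; entries whose table has no get hook are simply skipped.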
Let's assume that we're dealing with one of Perl's special variables, which has only one piece of magic, "\0" magic. The appropriate magic virtual table for "\0" magic is PL_vtbl_sv, which is defined in perl.h as follows:
    EXT MGVTBL PL_vtbl_sv = {MEMBER_TO_FPTR(Perl_magic_get),
                             MEMBER_TO_FPTR(Perl_magic_set),
                             MEMBER_TO_FPTR(Perl_magic_len),
                             0, 0};
Magic virtual tables have five elements, as seen in mg.h:
    struct mgvtbl {
        int (CPERLscope(*svt_get))  (pTHX_ SV *sv, MAGIC* mg);
        int (CPERLscope(*svt_set))  (pTHX_ SV *sv, MAGIC* mg);
        U32 (CPERLscope(*svt_len))  (pTHX_ SV *sv, MAGIC* mg);
        int (CPERLscope(*svt_clear))(pTHX_ SV *sv, MAGIC* mg);
        int (CPERLscope(*svt_free)) (pTHX_ SV *sv, MAGIC* mg);
    };
So the above virtual table means "call Perl_magic_get when we want to get the value of this SV; call Perl_magic_set when we want to set it; call Perl_magic_len when we want to find its length; do nothing if we want to clear it or when it is freed from memory."
In this case, we are getting the value, so magic_get is called. [1] This function looks at the value of mg_ptr, which, as noted above, is often the name of the variable. Depending on the name of the variable, it determines what to do: for instance, if mg_ptr is "!", then the current value of the C variable errno is retrieved.
A similar process is performed by SvSETMAGIC(sv) to call functions that need to be called when the value of an SV changes.
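The name-based dispatch that magic_get performs can be illustrated with a small stand-alone function. This is a simplified sketch, not the real Perl_magic_get: toy_magic_get is a hypothetical name, it takes the variable name directly instead of a MAGIC structure, and it handles only the "!" case mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a "get" function that branches on the variable name,
 * which in Perl would be found in mg_ptr.
 * Returns 1 if the name was handled and *value was filled in, else 0. */
int toy_magic_get(const char *name, long *value)
{
    assert(name != NULL && value != NULL);
    if (strcmp(name, "!") == 0) {   /* $! mirrors the C variable errno */
        *value = errno;
        return 1;
    }
    return 0;   /* not a variable this sketch knows about */
}
```

The point of the design is that one function serves every special variable: the virtual table is shared, and the name stored with the magic selects the behaviour.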
Tied arrays and hashes are implemented by adding type "P" magic to their AVs and HVs; individual elements of the arrays and hashes have "p" magic. Tied scalars and filehandles have type "q" magic. The virtual table for "p" magic scalars, for instance, looks like this:
    EXT MGVTBL PL_vtbl_packelem = {MEMBER_TO_FPTR(Perl_magic_getpack),
                                   MEMBER_TO_FPTR(Perl_magic_setpack),
                                   0,
                                   MEMBER_TO_FPTR(Perl_magic_clearpack),
                                   0};
That's to say, the function magic_getpack is called when the value of an element of a tied array or hash is retrieved. This function in turn performs a FETCH method call on the object stored in mg_obj.
We can invent our own "pseudo-tied" variables, using the user-defined "U" magic. "U" magic only works on scalars, and allows us to call a function when the value of the scalar is got or set. The virtual table for "U" magic scalars is as follows:
    EXT MGVTBL PL_vtbl_uvar = {MEMBER_TO_FPTR(Perl_magic_getuvar),
                               MEMBER_TO_FPTR(Perl_magic_setuvar),
                               0, 0, 0};
As you should by now expect, these functions are called when the value of the scalar is accessed. They in turn call our user-defined functions. But how do we tell them what our functions are? In this case, we pass a pointer to a special structure in the mg_ptr field; the structure is defined in perl.h, and looks like this:
    struct ufuncs {
        I32 (*uf_val)(IV, SV*);
        I32 (*uf_set)(IV, SV*);
        IV uf_index;
    };
Here are our two function pointers: uf_val is called with the value of uf_index and the scalar when the value is sought, and uf_set is called with the same parameters when it is set.
Hence, the following code allows us to emulate $!:
    I32 get_errno(IV index, SV* sv) {
        sv_setiv(sv, errno);
        return 0;
    }

    I32 set_errno(IV index, SV* sv) {
        errno = SvIV(sv); /* Some Cs don't like us setting errno, but hey */
        return 0;
    }

    struct ufuncs uf;

    /* This is XS code */
    void
    magicify(sv)
        SV *sv;
    CODE:
        uf.uf_val   = &get_errno;
        uf.uf_set   = &set_errno;
        uf.uf_index = 0;
        sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf));
If you need any more flexibility than that, it's time to look into "~" magic.
SVs that represent variables are kept in the symbol table; as you'll know from your Perl programming, the symbol table starts at %main:: and is an ordinary Perl hash, with the package and variable names as hash keys. But what are the hash values? Let's have a look:
    % perl -le '$a=5; print ${main::}{a}'
    *main::a
Well, that doesn't tell us very much - at first sight it just looks like an ordinary string. But if we use Devel::Peek on it, we find it's actually something else - a glob, or GV:
    % perl -MDevel::Peek -e '$a=5; Dump ${main::}{a}'
    SV = PVGV(0x80fe3e0) at 0x80fb3ec
      REFCNT = 2
      FLAGS = (GMG,SMG)
      IV = 0
      NV = 0
      MAGIC = 0x80fea50
        MG_VIRTUAL = &PL_vtbl_glob
        MG_TYPE = '*'
        MG_OBJ = 0x80fb3ec
        MG_LEN = 1
        MG_PTR = 0x81081d8 "a"
      NAME = "a"
      NAMELEN = 1
      GvSTASH = 0x80f1388 "main"
      GP = 0x80ff2b0
        SV = 0x810592c
        REFCNT = 1
        IO = 0x0
        FORM = 0x0
        AV = 0x0
        HV = 0x0
        CV = 0x0
        CVGEN = 0x0
        GPFLAGS = 0x0
        LINE = 1
        FILE = "-e"
        FLAGS = 0x0
        EGV = 0x80fb3ec "a"
Symbol tables are considered some of the hairiest voodoo in the Perl internals.
From C, the variable PL_defstash is the HV representing the main:: stash; PL_curstash contains the current package's stash.
The final data type we will examine is the CV, a code value used for storing subroutines. Both Perl and XSUB subroutines are stored in CVs, and blocks are also stored in CVs. The CV structure can be found in cv.h:
    struct xpvcv {
        char *      xpv_pv;         /* pointer to malloced string */
        STRLEN      xpv_cur;        /* length of xp_pv as a C string */
        STRLEN      xpv_len;        /* allocated size */
        IV          xof_off;        /* integer value */
        NV          xnv_nv;         /* numeric value, if any */
        MAGIC*      xmg_magic;      /* magic for scalar array */
        HV*         xmg_stash;      /* class package */

        HV *        xcv_stash;
        OP *        xcv_start;
        OP *        xcv_root;
        void        (*xcv_xsub) (pTHXo_ CV*);
        ANY         xcv_xsubany;
        GV *        xcv_gv;
        char *      xcv_file;
        long        xcv_depth;      /* >= 2 indicates recursive call */
        AV *        xcv_padlist;
        CV *        xcv_outside;
    #ifdef USE_THREADS
        perl_mutex *xcv_mutexp;
        struct perl_thread *xcv_owner;  /* current owner thread */
    #endif /* USE_THREADS */
        cv_flags_t  xcv_flags;
    };
If a subroutine is defined like this:
    package First;
    sub Second::mysub { ... }
then xcv_stash points to First::. This is why, for instance,
    package One;
    $x = "In One";
    package Two;
    $x = "In Two";
    sub One::test { print $x }
    package main;
    One::test();
will print "In Two".
    {
        my $x = 0;
        sub counter { return ++$x; }
    }
When inside counter, where does Perl "get" the SV $x from? It's not a global, so it doesn't live in a stash. It's not declared in counter, so it doesn't belong in counter's pad list. It actually belongs to the pad list for the CV "outside" of counter. To enable Perl to get at these variables and also at lexicals used in closures, each CV contains a pointer to the CV of the enclosing scope.
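The idea of following a pointer to the enclosing CV can be modelled in a few lines of C. This is only a sketch of the lookup chain: ToyCV and toy_find_lexical are hypothetical names, the pad is reduced to a small fixed array, and nothing here reflects when or how Perl itself resolves the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LEXICALS 4

/* Toy CV: a tiny pad plus a link to the enclosing CV. */
typedef struct toy_cv ToyCV;
struct toy_cv {
    const char *names[TOY_MAX_LEXICALS];   /* names of this CV's lexicals */
    long        values[TOY_MAX_LEXICALS];  /* their current values */
    size_t      count;                     /* how many slots are in use */
    ToyCV      *outside;                   /* models xcv_outside */
};

/* Look for a lexical in cv's own pad, then in each enclosing CV in turn.
 * Returns a pointer to the value slot, or NULL if no scope declares it. */
long *toy_find_lexical(ToyCV *cv, const char *name)
{
    assert(name != NULL);
    for (; cv != NULL; cv = cv->outside) {
        for (size_t i = 0; i < cv->count; i++) {
            if (strcmp(cv->names[i], name) == 0)
                return &cv->values[i];
        }
    }
    return NULL;
}
```

In terms of the counter example, a ToyCV for counter would have an empty pad and an outside pointer to the CV for the enclosing block, so a search for "$x" falls through to the outer pad and returns the slot stored there.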
Global variables live, as we've seen, in symbol tables or "stashes". Lexical variables, on the other hand, are tied to blocks rather than packages, and so are stored inside the CV representing their enclosing block.
As mentioned briefly above, the xcv_padlist element holds a pointer to an AV. This array, the padlist, contains the names and values of lexicals in the current code block. Again, a diagram is the best way to demonstrate this:
The first element of the padlist - called the "padname" - is an array containing the names of the variables, and the other elements are lists of the current values of those variables. Why do we have several lists of current values? Because a CV may be entered several times - for instance, when a subroutine recurses. Having, essentially, a stack of frames ensures that we can restore the previous values when a recursive call ends. Hence, the current values of lexical variables are stored in the last element on the padlist.
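The layout just described, one array of names followed by one array of values per active call, can be sketched as a small C structure. Everything here is a toy under stated assumptions: ToyPadlist, toy_pad_push and toy_pad_pop are hypothetical names, the sizes are fixed for simplicity, and the push/pop behaviour models only the "stack of frames" idea from the text rather than Perl's actual pad management.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAD_SLOTS 2   /* lexicals per pad (fixed in this sketch) */
#define TOY_MAX_DEPTH 8   /* deepest recursion this sketch allows */

typedef struct {
    const char *names[TOY_PAD_SLOTS];                 /* element 0: the names */
    long        frames[TOY_MAX_DEPTH][TOY_PAD_SLOTS]; /* later elements: one pad per call */
    size_t      depth;                                /* number of active frames */
} ToyPadlist;

/* Entering the CV: add a fresh, zeroed frame and return it as the
 * current pad. Returns NULL if the toy depth limit is reached. */
long *toy_pad_push(ToyPadlist *pl)
{
    assert(pl != NULL);
    if (pl->depth == TOY_MAX_DEPTH)
        return NULL;
    long *pad = pl->frames[pl->depth++];
    for (size_t i = 0; i < TOY_PAD_SLOTS; i++)
        pad[i] = 0;
    return pad;
}

/* Leaving the CV: drop the newest frame. Returns the frame that is
 * current again, or NULL if no call is active any more. */
long *toy_pad_pop(ToyPadlist *pl)
{
    assert(pl != NULL && pl->depth > 0);
    pl->depth--;
    return pl->depth > 0 ? pl->frames[pl->depth - 1] : NULL;
}
```

Pushing twice and then popping once leaves the first frame's values untouched, which is the property that lets a recursive call return to its caller's lexicals.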
From inside perl, you can get at the current pad as PL_curpad. Note that this is the pad itself, not the padlist. To get the padlist, you need to perform some awkwardness:
    I32 cxix = dopoptosub(cxstack_ix); /* cxstack_ix is a macro */
    AV* padlist = cxix ? CvPADLIST(cxstack[cxix].blk_sub.cv) : PL_comppadlist;
We'll visit pads again when we look at operator targets in Section 6.4.
[1] We'll see later that Perl uses the Perl_ prefix internally for function names, but that prefix can be omitted inside the Perl core. Hence, we'll call Perl_magic_get "magic_get".