Chapter 4. Internal Variables

Perl's variables are a lot more flexible than C's: C is a strongly typed language, whereas Perl is weakly typed. This means that a Perl variable may be used as a string, an integer or a floating point value, at will.

Hence, when we're representing values inside Perl, we need to implement some special types. This chapter will examine how scalars, arrays and hashes are represented inside the interpreter.

4.1. Basic SVs

SV stands for Scalar Value, and it's the basic form of representing a scalar. There are several different types of SV, but all of them have certain features in common.

4.1.1. Basics of an SV

Let's take a look at the definition of the SV type, in sv.h in the Perl core:

struct STRUCT_SV {
    void*   sv_any;     /* pointer to something */
    U32     sv_refcnt;  /* how many references to us */
    U32     sv_flags;   /* what we are */
};

Every scalar, array and hash that Perl knows about has these three fields: "something", a reference count, and a set of flags. Let's examine these separately:

4.1.1.1. sv_any

This field allows us to connect another structure to the SV. This is the mechanism by which we can change between representing an integer, a string, and so on. The function inside the Perl core which does the change is called sv_upgrade.

As its name implies, upgrading is a one-way process; there is no corresponding sv_downgrade. This is for efficiency: we don't want to switch types every time an SV is used in a different context, first as a number, then as a string, then as a number again, and so on.

Hence the structures we will meet get progressively more complex, building on each other: we will see an integer type, a string type, and then a type which can hold both a string and an integer, and so on.
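The one-way rule can be pictured with a small standalone C model. It is a sketch, not Perl's implementation: the names `model_sv`, `model_upgrade`, `model_type_name` and the `SLOT_*` bits are hypothetical, and the mapping from slots to type names is a simplification in which any mix other than string-plus-integer lands in the three-slot PVNV type.

```c
/* Hypothetical slot bits: which value slots an SV's body provides. */
enum { SLOT_IV = 1u << 0, SLOT_NV = 1u << 1, SLOT_PV = 1u << 2 };

typedef struct {
    unsigned slots;   /* which slots the attached body currently has */
} model_sv;

/* Make sure the body has at least the requested slots.
   Slots are only ever added, never removed: there is no downgrade. */
void model_upgrade(model_sv *sv, unsigned needed)
{
    sv->slots |= needed;
}

/* Name the (simplified) body type implied by the available slots. */
const char *model_type_name(const model_sv *sv)
{
    switch (sv->slots) {
    case 0:                 return "NULL";
    case SLOT_IV:           return "IV";
    case SLOT_NV:           return "NV";
    case SLOT_PV:           return "PV";
    case SLOT_PV | SLOT_IV: return "PVIV";
    default:                return "PVNV";  /* simplification: any other mix */
    }
}
```

Upgrading an IV for string use yields a PVIV; asking for the integer slot again afterwards changes nothing, since slots are only ever added.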

4.1.1.2. Reference Counts

Perl uses reference counts to determine when values are no longer used. For instance, consider the following two pieces of code:

{
  my $a;
  $a = 3;
}
Here, the integer value 3, an SV, is assigned to a variable. Remember that variables are simply names for values: if we look up $a, we find the value 3. Hence, $a refers to the value. At this point, the value has a reference count of 1.

At the closing brace, the variable $a goes out of scope; that is to say, the name is destroyed, and the reference to the value 3 is broken. The value's reference count therefore decreases, becoming zero.

Once an SV has a reference count of zero, it is no longer in use and its memory can be freed.

Now our second piece of code:

my $b;
{
  my $a;
  $a = 3;
  $b = \$a;
}
In this case, once we assign a reference to the value into $b, the reference count of our value (the integer 3) increases to 2, as now two variables are able to reach the value.

When the scope ends, the value's reference count decreases as before, because $a no longer refers to it. However, even though one name is destroyed, the reference held in $b still points to the value, so the resulting reference count is now 1.

Once the variable $b goes out of scope, or a different value is assigned to it, the reference count will fall to zero and the SV will be freed.
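The bookkeeping in these two examples can be modelled in a few lines of C. In this sketch `rc_value`, `rc_new`, `rc_inc` and `rc_dec` are hypothetical names standing in for Perl's SvREFCNT_inc and SvREFCNT_dec macros, and the model only records that a value would be freed rather than releasing any memory.

```c
#include <stdbool.h>

typedef struct {
    unsigned refcnt;  /* how many names or references reach this value */
    bool freed;       /* set once the count drops to zero */
} rc_value;

/* A new value starts with one reference: the name it was assigned to. */
rc_value rc_new(void)
{
    rc_value v = { 1, false };
    return v;
}

/* Another name or reference can now reach the value. */
void rc_inc(rc_value *v)
{
    v->refcnt++;
}

/* Drop one reference; at zero the value is no longer in use. */
void rc_dec(rc_value *v)
{
    if (v->refcnt > 0 && --v->refcnt == 0)
        v->freed = true;   /* real Perl would free the SV here */
}
```

Replaying the second example: rc_new() for `$a = 3`, rc_inc() for `$b = \$a`, one rc_dec() when `$a` goes out of scope (count 1), and a final rc_dec() when `$b` goes away (count 0, freed).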

4.1.1.3. Flags

The final field in the SV structure is a flag field. The most important flags are stored in the bottom eight bits, which hold the SV's type: that is, the type of structure attached to the sv_any field.

The second most important flags are those which tell us how much of the information in the structure is relevant. For instance, we previously mentioned that one of the structures can hold both an integer and a string. We could also say that it has an integer "slot" and a string "slot". However, if we alter the value in the integer slot, Perl does not change the value in the string slot; it simply unsets the flag which says that we may use the contents of that slot:

$a = 3;            # Type: Integer            | Flags: Can use integer
... if $a eq "3";  # Type: Integer and String | Flags: Can use integer,
                                              |        can use string
$a++;              # Type: Integer and String | Flags: Can use integer
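As a sketch of how a single flags word can carry both the type and the "can use this slot" bits, consider the model below. It assumes, as a simplification, that the type code sits in the low byte and that IOK and POK occupy two higher bits; the names and bit positions (`TYPE_MASK`, `F_IOK`, `F_POK`, `T_PVIV`) are hypothetical, and the real values live in sv.h.

```c
#include <stdint.h>

#define TYPE_MASK 0xffu        /* low byte: which body type is attached */
#define F_IOK     (1u << 8)    /* integer slot may be used (hypothetical bit) */
#define F_POK     (1u << 9)    /* string slot may be used (hypothetical bit) */

enum { T_IV = 1, T_PVIV = 5 }; /* hypothetical type codes */

/* Extract the body type from the flags word. */
unsigned flag_type(uint32_t flags)
{
    return flags & TYPE_MASK;
}

/* Writing the integer slot: the string slot keeps its bytes,
   but POK is cleared so nobody trusts them any more. */
uint32_t flag_after_int_write(uint32_t flags)
{
    return (flags | F_IOK) & ~F_POK;
}
```

Note that the type code is untouched by the write: the SV stays a PVIV, only its "usable" bits change.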
	  

We'll see more detailed examples of this later on. First, though, let's examine the various types that can be stored in an SV.

4.1.2. References

A reference, or RV, is simply a C pointer to another SV, as its definition shows:

struct xrv {
    SV *    xrv_rv;     /* pointer to another SV */
};

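Hence, leaving aside reference counts and flags, the Perl statement $a = \$b is roughly equivalent to the C statements:

sv_upgrade(a, SVt_RV);          /* Make sure a is an RV */
((XRV*)a->sv_any)->xrv_rv = b;  /* sv_any is a void*, so cast it first */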

However, the SV fields are hidden behind macros, so an XS programmer or porter would write the above as:

sv_upgrade(a, SVt_RV); /* Make sure a is an RV */
SvRV(a) = b;
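The C above only stores the pointer. A complete assignment must also account for the new reference, because (as in Section 4.1.1.2) taking a reference raises the referent's count, and it must mark the SV as holding a reference. The standalone sketch below models both steps; `ref_sv`, `make_ref` and the `ROK` bit are hypothetical stand-ins for what Perl does with SvRV, SvREFCNT_inc and SvROK_on, and freeing at zero is left out.

```c
#include <stddef.h>

#define ROK 1u  /* hypothetical flag: this SV holds a reference */

typedef struct ref_sv {
    unsigned refcnt;       /* references to this SV */
    unsigned flags;
    struct ref_sv *rv;     /* target when ROK is set (models xrv_rv) */
} ref_sv;

/* Make a refer to b, modelling $a = \$b. */
void make_ref(ref_sv *a, ref_sv *b)
{
    if ((a->flags & ROK) && a->rv != NULL)
        a->rv->refcnt--;   /* a no longer refers to its old target */
    a->rv = b;
    b->refcnt++;           /* b is now also reachable through a */
    a->flags |= ROK;
}
```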

4.1.3. Integers

Perl's integer type is not necessarily a C int; it's called an IV, or Integer Value. The difference is that an IV is guaranteed to be large enough to hold a pointer.

Perl uses the macros PTR2IV and INT2PTR to convert between pointers and IVs. The size guarantee means that, for instance, the following code will produce an IV:

$a = \1;
$a--;    # Reference (pointer) converted to an integer
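The pointer-to-integer conversions can be imitated in standard C with `intptr_t`, an optional (but almost universally provided) integer type able to hold any `void *`. The macros below are hypothetical look-alikes rather than Perl's own definitions, and `decrement_address` mimics what `$a--` does to a reference.

```c
#include <stdint.h>

/* Hypothetical stand-ins for Perl's conversion macros. */
#define MODEL_PTR2IV(p)      ((intptr_t)(const void *)(p))
#define MODEL_INT2PTR(t, i)  ((t)(void *)(intptr_t)(i))

/* Numify a "reference" the way $a-- does:
   take the address as an integer, then subtract one. */
intptr_t decrement_address(const void *p)
{
    return MODEL_PTR2IV(p) - 1;
}
```

Because the integer type is wide enough, converting back with MODEL_INT2PTR recovers the original pointer unchanged.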
	    

Let's now have a look at an SV structure containing an IV: the SvIV structure. The core module Devel::Peek allows us to examine a value from the C perspective:

% perl -MDevel::Peek -le '$a=10; Dump($a)'
SV = IV(0x81559b0) at 0x81584f0                            (1)
  REFCNT = 1                                               (2)
  FLAGS = (IOK,pIOK)                                       (3)
  IV = 10                                                  (4)

	  
(1)
The first line tells us that this SV is of type SvIV. The SV itself has a memory location of 0x81584f0, and its sv_any points to an IV at memory location 0x81559b0.
(2)
The value has only one reference to it at the moment: the fact that it is stored in $a.
(3)
Devel::Peek converts the flags from a simple integer to a symbolic form: it tells us that the IOK and pIOK flags are set. IOK means that the value in the IV slot is OK to be used.

What about pIOK? pIOK means that the IV slot holds the underlying ("p" for "private") data. If, for instance, the SV is tied, then we may not use the "10" that is in the IV slot; we must call the appropriate FETCH routine to get the value, so IOK is not set. The "10", however, is private data, only available to the tying mechanism, so pIOK is set.

(4)
This shows the IV slot with its value, the "10" which we assigned to $a's SV.

There's also a sub-type of IVs called UVs; these are the unsigned counterparts of IVs, and Perl uses them for non-negative integers too large to fit in an IV. The flag IsUV is used to signal that a value in an IV slot is actually an unsigned value.
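A sketch of the IV-or-UV decision, assuming 64-bit integers: a non-negative quantity that fits in the signed type is stored as an ordinary IV, and only a larger one is stored unsigned with IsUV turned on. The `num_sv` struct and `store_unsigned` function are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    union {
        int64_t  iv;   /* signed view of the slot */
        uint64_t uv;   /* unsigned view of the same slot */
    } slot;
    bool is_uv;        /* models the IsUV flag */
} num_sv;

/* Store an unsigned quantity, flagging it as a UV only when
   the signed type cannot represent it. */
void store_unsigned(num_sv *sv, uint64_t value)
{
    if (value <= (uint64_t)INT64_MAX) {
        sv->slot.iv = (int64_t)value;
        sv->is_uv = false;
    } else {
        sv->slot.uv = value;
        sv->is_uv = true;
    }
}
```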

4.1.4. Strings

The next class we'll look at is strings. We can't call them "String Values", because the SV abbreviation is already taken; instead, remembering that a string is a pointer to an array of characters, and that the entry in the string slot is going to be that pointer, we call strings "PVs": Pointer Values.

It's here that we start to see combination types: as well as the SvPV type, we have an SvPVIV, which has both string and integer slots.

Before we get into that, though, let us examine the SvPV structure, again from sv.h:

struct xpv {
    char *  xpv_pv;     /* pointer to malloced string */
    STRLEN  xpv_cur;    /* length of xpv_pv as a C string */
    STRLEN  xpv_len;    /* allocated size */
};
	

A C character array has a fixed size, but Perl must dynamically resize its strings whenever the data going into the string exceeds the currently allocated size. Hence, Perl holds both the length of the current contents and the maximum length available before a resize must occur. Just as SVs are only ever upgraded, the memory allocated for a string only increases, as the following example shows:

% perl -MDevel::Peek -le '$a="abc"; Dump($a);print; 
$a="abcde"; Dump($a);print; $a="a"; Dump($a)'
SV = PV(0x814ee44) at 0x8158520                            (1)
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x815c548 "abc"\0                                   (2)
  CUR = 3                                                  (3)
  LEN = 4                                                  (4)

SV = PV(0x814ee44) at 0x8158520                            (5)
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x815c548 "abcde"\0
  CUR = 5
  LEN = 6

SV = PV(0x814ee44) at 0x8158520                            (6)
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x815c548 "a"\0
  CUR = 1
  LEN = 6
(1)
This time, we have an SV whose sv_any points to an SvPV structure at address 0x814ee44.
(2)
The PV slot holds a pointer to the string itself, which lives at address 0x815c548 and contains the text "abc". As this is an ordinary C string, it's terminated with a null character.
(3)
SvCUR is the length of the string, as would be returned by strlen. In this case it is 3: the null terminator is not counted.
(4)
However, it is counted for the purposes of allocation: we have allocated 4 bytes to store the string, as reflected by SvLEN.
(5)
So what happens if we lengthen the string? As the new length is more than the available space, we need to extend the string.

The macro SvGROW is responsible for extending strings to a specified length. It's defined in terms of the function sv_grow which takes care of memory reallocation:

#define SvGROW(sv,len) (SvLEN(sv) < (len) ? sv_grow(sv,len) : \
                        SvPVX(sv))

After growing the string to accommodate the new value, Perl assigns the value and updates the CUR and LEN information. As you can see, the SV and the SvPV structures stay at the same address, and, in this case, the string pointer itself has also remained at the same address.
(6)
And what if we shrink the string? Perl does not give up any memory: you can see that LEN is the same as it was before. Perl does this for efficiency: if it reallocated storage every time a string changed length, it would spend most of its time in memory management!
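The CUR/LEN behaviour traced above can be reproduced with a small standalone model. The names (`pv_str`, `pv_grow`, `pv_set`) are hypothetical; real Perl uses SvGROW and sv_grow and may round allocations up, whereas this sketch allocates exactly CUR plus one byte for the terminator so that its numbers match the Devel::Peek output.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *pv;   /* malloced buffer */
    size_t cur;  /* length of the current contents, excluding the NUL */
    size_t len;  /* bytes allocated */
} pv_str;

/* Like SvGROW: only reallocate when the buffer is too small. */
char *pv_grow(pv_str *s, size_t needed)
{
    if (s->len < needed) {
        char *p = realloc(s->pv, needed);
        if (p == NULL)
            return NULL;       /* keep the old buffer on failure */
        s->pv = p;
        s->len = needed;
    }
    return s->pv;
}

/* Assign a C string, growing but never shrinking the allocation. */
int pv_set(pv_str *s, const char *text)
{
    size_t n = strlen(text);
    if (pv_grow(s, n + 1) == NULL)
        return -1;
    memcpy(s->pv, text, n + 1);  /* copy including the terminator */
    s->cur = n;
    return 0;
}
```

Assigning "abc", "abcde" and "a" in turn gives CUR/LEN of 3/4, 5/6 and 1/6, the same sequence as the dumps above.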

Now let's see what happens if we use a value as both a number and a string, taking the example in Section 4.1.1.3:

% perl -Ilib -MDevel::Peek -le '$a=3; Dump($a);print; 
$a eq "3"; Dump($a);print; $a++; Dump($a)'
SV = IV(0x81559d8) at 0x8158518
  REFCNT = 1
  FLAGS = (IOK,pIOK)
  IV = 3

SV = PVIV(0x814f278) at 0x8158518                          (1)
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 3
  PV = 0x8160350 "3"\0
  CUR = 1
  LEN = 2

SV = PVIV(0x814f278) at 0x8158518                          (2)
  REFCNT = 1
  FLAGS = (IOK,pIOK)
  IV = 4
  PV = 0x8160350 "3"\0
  CUR = 1
  LEN = 2
(1)
In order to perform the string comparison, Perl needs a string value. It calls SvPV, the ordinary macro for getting the string value from an SV. SvPV notices that we don't have a valid PV slot, so it upgrades the SV to an SvPVIV. It also converts the number 3 to its string representation, "3", and sets CUR and LEN appropriately. Because the values in both the IV and PV slots are now available for use, both the IOK and POK flags are turned on.
(2)
When we change the integer value of the SV by incrementing it by one, Perl updates the value in the IV slot. Since the value in the PV slot is invalidated, the POK flag is turned off. Perl does not remove the value from the PV slot, nor does it downgrade to an SvIV because we may use the SV as a string again at a later time.
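The upgrade-and-cache behaviour in callouts (1) and (2) can be sketched as follows. This is a hypothetical model rather than Perl's own sv_2pv: a fixed-size buffer stands in for the malloced PV, and only the IOK and POK flags are tracked.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    long iv;
    char pv[32];     /* fixed buffer standing in for the malloced PV */
    bool iok, pok;   /* "may use the IV slot" / "may use the PV slot" */
} pviv;

/* Like SvPV: reuse the cached string if POK, otherwise build it
   from the integer slot and mark it valid. */
const char *pviv_string(pviv *sv)
{
    if (sv->pok)
        return sv->pv;               /* cached string is still valid */
    if (!sv->iok)
        return "";                   /* nothing usable: treat as empty */
    snprintf(sv->pv, sizeof sv->pv, "%ld", sv->iv);
    sv->pok = true;
    return sv->pv;
}

/* Like $a++: update the integer slot and invalidate the string
   slot, leaving its stale bytes in place. */
void pviv_increment(pviv *sv)
{
    sv->iv++;
    sv->iok = true;
    sv->pok = false;
}
```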

There's one slight twist here: if you ask Perl to remove some characters from the beginning of the string, it performs a (rather ugly) optimization called "The Offset Hack". It stores the number of characters to remove (the offset) in the IV slot and turns on the OOK (offset OK) flag. The PV pointer is advanced by the offset, and the CUR and LEN fields are decreased by the same amount. As far as C is concerned, the string starts at the new position; the real start of the string matters only when the memory is being released.
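A minimal model of the offset hack follows. The names are hypothetical, and for clarity the offset gets its own field instead of sharing the IV slot as it does in Perl; what it shows is that trimming the front only moves the pointer, and that the original start is recovered from the offset when the buffer is freed.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *pv;      /* visible start of the string */
    size_t cur;     /* visible length */
    size_t len;     /* visible allocated size */
    size_t offset;  /* bytes skipped at the front (Perl uses the IV slot) */
    bool   ook;     /* models the OOK flag */
} oo_str;

/* Copy text into a fresh buffer sized exactly for it. */
bool oo_init(oo_str *s, const char *text)
{
    size_t n = strlen(text);
    s->pv = malloc(n + 1);
    if (s->pv == NULL)
        return false;
    memcpy(s->pv, text, n + 1);
    s->cur = n;
    s->len = n + 1;
    s->offset = 0;
    s->ook = false;
    return true;
}

/* Drop n characters from the front without copying anything. */
void oo_chop_front(oo_str *s, size_t n)
{
    if (n > s->cur)
        n = s->cur;
    s->pv     += n;
    s->cur    -= n;
    s->len    -= n;
    s->offset += n;
    s->ook     = true;
}

/* Only at release time does the real start of the buffer matter. */
void oo_free(oo_str *s)
{
    free(s->pv - s->offset);
    s->pv = NULL;
    s->cur = s->len = s->offset = 0;
    s->ook = false;
}
```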

4.1.5. Floating point numbers

Finally, we have floating point types, or NVs: Numeric Values. Like IVs, NVs are guaranteed to be able to hold a pointer. The SvNV structure is very like the corresponding SvIV:

% perl -MDevel::Peek -le '$a=0.5; Dump($a);'
SV = NV(0x815d058) at 0x81584e8
  REFCNT = 1
  FLAGS = (NOK,pNOK)
  NV = 0.5

However, the combined structure, SvPVNV, has slots for floats, integers and strings:

% perl -MDevel::Peek -le '$a="1"; $a+=0.5; Dump($a);'
SV = PVNV(0x814f9c0) at 0x81584f0
  REFCNT = 1
  FLAGS = (NOK,pNOK)
  IV = 0
  NV = 1.5
  PV = 0x815b5c0 "1"\0
  CUR = 1
  LEN = 2
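The last dump can be read as a three-slot structure in which only the NV is currently trusted. The sketch below models `$a = "1"; $a += 0.5` with hypothetical names (`pvnv`, `pvnv_number`, `pvnv_add`): the string is numified with strtod, the sum goes into the NV slot, and the IOK and POK flags end up off while the stale "1" stays in the buffer.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    long        iv;
    double      nv;
    const char *pv;
    bool        iok, nok, pok;
} pvnv;

/* Numeric value of the SV: prefer the NV, then the IV,
   then parse the string. */
double pvnv_number(const pvnv *sv)
{
    if (sv->nok) return sv->nv;
    if (sv->iok) return (double)sv->iv;
    if (sv->pok) return strtod(sv->pv, NULL);
    return 0.0;
}

/* Model $a += x: store the sum as a float and trust only the NV slot. */
void pvnv_add(pvnv *sv, double x)
{
    sv->nv  = pvnv_number(sv) + x;
    sv->nok = true;
    sv->iok = false;
    sv->pok = false;   /* the old string stays in pv, but is stale */
}
```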