2012/04/07

use Perl; Guide to references Part 2

This is part two in my five part series on Perl references. In Part 1, we went through the basics: how to take references to items and access the items through their references. In this episode, we'll explain some of the differences and benefits of sending references into subroutines, as opposed to the list-type data variables themselves. It's divided into three sub-sections: references as subroutine parameters, named parameters and anonymous data.

  • Part 1 - The basics
  • Part 2 - References as subroutine parameters (this document)
  • Part 3 - Nested data structures
  • Part 4 - Code references
  • Part 5 - Concepts put to use

This episode assumes that you have at least a minimal understanding of how subroutines (functions) work in Perl: both how to send data into a function, and the standard methods of accessing the data once the function has accepted it. As before, I urge you to leave corrections, criticisms, improvements, questions and requests for further clarity in the comments section below, or in an email.

From this point forward, I will often substitute certain terms with abbreviations: ref for reference, deref for dereference, aref for array reference, href for hash reference and sub or function for subroutine.

REFERENCES AS SUBROUTINE PARAMETERS

Let's start off this section with a sample piece of code:

my @a = ( 1, 2, 3 );
my %h = ( a => 10, b => 20, c => 30 );

hello( @a, %h );

sub hello {

    my @array = shift;
    my %hash  = shift;

    # do stuff
}

As it appears, you are calling the hello() function with two parameters: an array as parameter one, and a hash as parameter two. We then proceed to take the parameters and assign them accordingly. However, in Perl, this does not work as you may think. Perl doesn't keep the parameters as separate parts. Instead, it flattens all the parameters together into a single list. In the case above, if we printed the parameter list before we took anything from it, it would appear as one long list of individual items (the hash's key/value pairs come out in no particular order):

1 2 3 c 30 a 10 b 20 

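So in the above code, @array would contain only 1, while 2 would be forced into %hash as a lone key with an undefined value (with warnings enabled, Perl complains about an odd number of elements in the hash assignment). The rest of the flattened parameters (which are essentially one long list of scalar values) remain unused.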
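A quick way to see the flattening for yourself is to count what actually arrives in @_. This is only a minimal sketch; count_params() is a hypothetical helper I made up for illustration:

```perl
use strict;
use warnings;

my @a = ( 1, 2, 3 );
my %h = ( a => 10, b => 20, c => 30 );

# hypothetical helper: reports how many scalars arrived in @_
sub count_params {
    return scalar @_;
}

# three array elements plus three key/value pairs (six scalars)
print count_params( @a, %h ), "\n";    # prints 9
```

The order of the hash's pairs may differ from run to run, but the count is always nine: the function never sees "an array and a hash", only a flat list of scalars.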

Because refs are simple individual scalars that only point to a data structure, we can pass the ref in as opposed to the list of the data structure's contents.

my @a = ( 1, 2, 3 );
my %h = ( a => 10, b => 20, c => 30 );

my $aref = \@a;
my $href = \%h;

hello( $aref, $href );

sub hello {

    my $aref_param = shift;
    my $href_param = shift;
}

In the first example, we thought we were passing in two parameters, but Perl took the values from our parameters and merged them into one long list. By passing refs, our sub receives only two parameters as intended, and we can easily differentiate our array data and our hash data. This is termed "passing by reference", and it is the most common method to pass parameters to a function when the function needs more than just a few scalar values. We can now work on the refs within the sub the same way we were doing in Part 1.
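As a rough sketch of what "working on the refs within the sub" might look like, here is a hypothetical summarize() function (the name and the output format are my own invention) that dereferences both parameters:

```perl
use strict;
use warnings;

my @a = ( 1, 2, 3 );
my %h = ( a => 10, b => 20, c => 30 );

# hypothetical helper: builds a one-line summary from an aref and an href
sub summarize {
    my $aref = shift;
    my $href = shift;

    my $count = scalar @{ $aref };       # deref the whole array, count it
    my $first = $aref->[ 0 ];            # deref a single element
    my @keys  = sort keys %{ $href };    # deref the whole hash, sort its keys

    return "$count items, first is $first, keys: @keys";
}

print summarize( \@a, \%h ), "\n";    # prints: 3 items, first is 1, keys: a b c
```

Because each ref is a single scalar, the two shift calls pick up exactly the array and the hash we intended.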

When passing by reference, any change the sub makes to the data the ref points to is made to the original data, and it persists after the subroutine returns. By contrast, when you pass a list directly and assign the parameters to lexical variables inside the sub (with shift, or with my ( ... ) = @_), those variables hold *copies*, so modifying them leaves the caller's data untouched. If you need to keep your original data intact while still passing a ref, dereference it into a new variable within the function to make a copy, then return either the copy or a reference to the copy:

use feature 'say';    # say needs Perl 5.10 or later

my @a = ( 1, 2, 3 );

my $aref = \@a;

my @b = hello( $aref );

say "Original array:";
for my $x ( @a ){
    print "$x ";
}

say "\nReturned copy:";
for my $y ( @b ){
    print "$y ";
}

sub hello {

    my $aref = shift;
    
    # make a copy of the referenced array
    my @array = @{ $aref };

    $array[ 0 ] = 99;

    return @array;
}

Output:

Original array:
1 2 3 
Returned copy:
99 2 3
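For contrast, here is a minimal sketch of the opposite case, where the sub works directly through the ref and therefore changes the caller's array. The double_in_place() function is a hypothetical example, not part of the code above:

```perl
use strict;
use warnings;

# hypothetical helper: doubles every element of the referenced array
sub double_in_place {
    my $aref = shift;

    # no copy is made here; we modify the original elements through the ref
    $_ *= 2 for @{ $aref };

    return;
}

my @nums = ( 1, 2, 3 );

double_in_place( \@nums );

print "@nums\n";    # prints: 2 4 6
```

Nothing is returned, yet @nums has changed, because the function operated on the original array rather than on a copy.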

Although we've now modified our code so that we can take data structures as a parameter via their refs, we're still using "positional" function arguments, meaning that the parameters must be sent into the function in a specified order. Here's a brief code snippet of a similar example:

sub goodbye {
    my $mandatory_param_aref = shift;
    my $optional_param_aref  = shift;
}

# call it like this

goodbye( $aref1, $aref2 );

Now, what happens if we want to modify the code to accept a second optional argument?

sub goodbye {
    my $mandatory_param_aref = shift;
    my $optional_param_aref  = shift;
    my $second_optional_aref = shift;
}

# call it like this

goodbye( $aref1, $aref2, $aref3 );

No problem. However, what happens if you don't want to use the first optional parameter? You can't just do this:

goodbye( $aref1, $aref3 );

That fails because the function would shift $aref3 off as the first optional parameter, potentially causing all kinds of grief. You could instead send in undef in the optional positions that you don't want to supply data for, so that the second optional parameter is assigned to the correct variable within the function:

goodbye( $aref1, undef, $aref3 );

But how about in a case with five optional parameters where you only want to supply the third and fifth?

goodbye( $param1, undef, undef, $param4, undef, $param6 );

Not only is that unsightly, but it is also very fragile code. You can see that it wouldn't be hard to put one of those in the wrong position. There is a solution though.

NAMED PARAMETERS USING HASH REFERENCES

my %data = (
    user => 'stevieb',
    year => 2012,
);

my $data_ref = \%data;

user_of_the_year( $data_ref );

sub user_of_the_year {
    my $p = shift;

    my $user = $p->{ user };
    my $year = $p->{ year };

    say "Our luser of $year is $user";
}

We created a hash with the data we want to send in to our function, then we take a reference to that hash. The hash reference is what we send into the function. Inside the function, we shift off the only parameter we received (the href), and proceed to extract the values and assign them to lexical variables through the ref using the deref operator ->.

A few things to note here. First, the positional problem is gone. The function will only ever accept a single parameter: the href. Also, if the function has optional parameters, there's no undef trickery to reposition the remaining parameters. Simply omit the named key from the hash.

In the above function definition, it isn't mandatory to dereference the hash and extract its values to scalars right away. The last line could just as easily have been written like this:

say "Our luser of $p->{ year } is $p->{ user }";

However, I personally opt to extract immediately, so that I can very quickly see what the function expects the data to look like without having to wade through the function code. Extracting in one place also makes it very easy to visually verify that the usage examples in your POD are accurate.
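To illustrate the earlier point about optional named parameters, here is a small sketch. The greet() function and its 'greeting' key are hypothetical, and falling back to a default with exists is just one way of handling a missing key:

```perl
use strict;
use warnings;

# hypothetical function: 'user' is mandatory, 'greeting' is optional
sub greet {
    my $p = shift;

    my $user = $p->{ user };

    # fall back to a default when the caller omitted the optional key
    my $greeting = exists $p->{ greeting } ? $p->{ greeting } : 'Hello';

    return "$greeting, $user";
}

my %with    = ( user => 'stevieb', greeting => 'Welcome' );
my %without = ( user => 'stevieb' );

print greet( \%with ),    "\n";    # prints: Welcome, stevieb
print greet( \%without ), "\n";    # prints: Hello, stevieb
```

The caller who doesn't care about the greeting simply leaves the key out; no undef placeholders are needed, and the order of the keys doesn't matter.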

ANONYMOUS DATA

Often it is the case that you need to make a data structure on the fly, but don't need to assign a temporary name to it. We can skip steps by using references.

Instead of this two step process:

my %h = ( a => 1, b => 2 );
my $href = \%h;

We can take a reference directly from an unnamed (anonymous) hash:

my $href = { a => 1, b => 2 };

So, to create an href to an anonymous hash, we surround the data with braces instead of parens. These are the same braces that are used to access a hash element by its key. Arrays are similar, but they use square brackets, the same ones used to access an array element:

my $aref = [ 1, 2, 3 ];
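Refs to anonymous data behave exactly like refs taken from named variables. A brief sketch, using the same two refs as above:

```perl
use strict;
use warnings;

my $href = { a => 1, b => 2 };
my $aref = [ 1, 2, 3 ];

print $href->{ b }, "\n";         # prints 2
print $aref->[ 0 ], "\n";         # prints 1
print scalar @{ $aref }, "\n";    # prints 3 (the number of elements)
```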

In the user_of_the_year() example above, I created the hash, took a ref to the hash, and passed the ref into the function as a parameter. Using anonymous data, I can skip creating the hash and taking a ref to it by inserting the ref to the anonymous data right within the function call:

user_of_the_year( { user => 'stevieb', year => 2012 } );

Or for more complex function calls with named parameters, you can put it on multiple lines:

user_of_the_year({
    user   => 'stevieb',
    year   => 2012,
    score  => 199,
    awards => 3,
});

Thank you for reading. Again, if you have any improvements or questions, leave me comments or send me an email.
