2012/04/07

use Perl; Guide to references Part 3

This is Part 3 of my five-part guide to references. In Part 1 we learnt the basic syntax for using references, in Part 2 we saw how to use references in subroutine calls, and in this episode we'll focus solely on nested data structures.

  • Part 1 - The basics
  • Part 2 - References as subroutine parameters
  • Part 3 - Nested data structures (this document)
  • Part 4 - Code references
  • Part 5 - Concepts put to use

At this point, it is rather imperative that you have a firm grasp on both the concepts and the syntax for creating, dereferencing and otherwise using references. If you are unfamiliar with any of these, I recommend you see Part 1.

As with the other parts in the series, I ask that you leave corrections, criticisms, improvements, additions, questions and requests for further clarity in the comments section below, or in an email.

NESTED DATA STRUCTURES

The two most elementary complex data structures are an array of arrays (AoA) and a hash of hashes (HoH). An AoA is simply an array where each element contains a reference to another array. Here's an example based on some of the concepts we've already learnt:

my @a;
my @a_0 = ( 1, 2, 3 );
my @a_1 = ( 4, 5, 6 );
my @a_2 = ( 7, 8, 9 );

$a[0] = \@a_0;
$a[1] = \@a_1;
$a[2] = \@a_2;

Using Data::Dumper, we see the contents of @a as follows (I've inserted the comments for clarity):

$VAR1 = [ # the top @a array
          [ # $a[0]
            1,
            2,
            3
          ],
          [ # $a[1]
            4,
            5,
            6
          ],
          [ # $a[2]
            7,
            8,
            9
          ]
        ];
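
For reference, a dump like the one above can be produced with the core Data::Dumper module. This is only a minimal sketch that rebuilds the same @a array. The $dump variable is my own name, used for illustration. Passing a reference (\@a) rather than the array itself is what keeps the whole structure under a single $VAR1.

```perl
use strict;
use warnings;
use Data::Dumper;

my @a_0 = ( 1, 2, 3 );
my @a_1 = ( 4, 5, 6 );
my @a_2 = ( 7, 8, 9 );

# same AoA as above, assigned in one statement
my @a = ( \@a_0, \@a_1, \@a_2 );

# Dumper() returns a string; passing a reference dumps the whole
# structure as one $VAR1 instead of one $VARn per element
my $dump = Dumper( \@a );

print $dump;
```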

AoAs are good for storing multiple lists of data where the items will always retain their order. To access individual elements of the nested arrays, we need the -> deref operator again:

my $x = $a[0]->[0]; # value is 1

Note the positioning. We access the first element of @a as normal, but since $a[0] is a reference to another array, we must dereference here. Again:

my $y = $a[2]->[2]; # value is 9

Still using the above AoA structure, here's how to loop over each aref within the array. Note in the nested for() loop we see the @{} dereference operators again to access the data that each aref points to:

use feature 'say'; # say() needs Perl 5.10 or newer

my $x = 0;

for my $aref ( @a ){

    say "in top level of a, elem $x";
    $x++;

    my $y = 0;

    for my $aref_elem ( @{ $aref } ){

        say "in second level elem $y, elem is: $aref_elem";
        $y++;
    }
}

Output:

in top level of a, elem 0
in second level elem 0, elem is: 1
in second level elem 1, elem is: 2
in second level elem 2, elem is: 3
in top level of a, elem 1
in second level elem 0, elem is: 4
in second level elem 1, elem is: 5
in second level elem 2, elem is: 6
in top level of a, elem 2
in second level elem 0, elem is: 7
in second level elem 1, elem is: 8
in second level elem 2, elem is: 9

You can compare that output to the loop itself, and also to the Data::Dumper output above to get a better idea of the nested structure.
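
If you'd rather have Perl supply the positions than maintain counters by hand, one alternative is to loop over the indexes. This is a rough sketch, not the only way to do it. It builds the same data with anonymous array constructors ([ ]), and the names @lines, $i and $j are just ones I picked for the example.

```perl
use strict;
use warnings;
use feature 'say';

# same data as above, built with anonymous array constructors
my @a = (
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ],
);

my @lines;

# $#a is the last index of @a; $#{ $a[$i] } is the last index
# of the array that $a[$i] points to
for my $i ( 0 .. $#a ){
    for my $j ( 0 .. $#{ $a[$i] } ){
        push @lines, sprintf( 'row %d, col %d: %d', $i, $j, $a[$i]->[$j] );
    }
}

say for @lines;
```

The first line printed is "row 0, col 0: 1" and the last is "row 2, col 2: 9".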

More interesting and (imho) far more useful than the AoA is the HoH. Here's where significant usefulness begins.

my %person; # top level hash container

my %clothes  = ( shirt => 'red', pants => 'black', );
my %schedule = ( work => '0800', home => '0500', sleep => '2300', );
my %skills   = ( programming => 'poor', social => 'good' );

$person{ clothes  } = \%clothes;
$person{ schedule } = \%schedule;
$person{ skills }   = \%skills;

The Dumper output for a HoH looks much more interesting, and is easier to follow, than that of the AoA:

$VAR1 = { # %person

          'skills' => {
                        'programming' => 'poor',
                        'social' => 'good'
                      },
          'clothes' => {
                         'pants' => 'black',
                         'shirt' => 'red'
                       },
          'schedule' => {
                          'work' => '0800',
                          'home' => '0500',
                          'sleep' => '2300'
                        }
        };

Here are a few examples of how to use the data:

# get the person's shirt

my $shirt_colour = $person{ clothes }->{ shirt }; # red

# change the person's shirt

$person{ clothes }->{ shirt } = 'black';

# list the person's skills

say "Person has the following skills: ";

for my $skill ( keys %{ $person{ skills } } ){
    print "$skill ";
}
print "\n";

# list each skill with the ability to perform the skill

say "Person's abilities:";

while ( my ( $skill, $ability ) = each %{ $person{ skills } } ){

    print "$skill is $ability\n";
}
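
The same idea extends to walking the whole structure with two nested loops. Here's a small sketch, assuming the %person data from above (rebuilt here with anonymous hashes so the snippet runs on its own). @report, $category, $item and $value are illustrative names, and I sort the keys only so the output is predictable.

```perl
use strict;
use warnings;
use feature 'say';

my %person = (
    clothes  => { shirt => 'red', pants => 'black' },
    schedule => { work => '0800', home => '0500', sleep => '2300' },
    skills   => { programming => 'poor', social => 'good' },
);

my @report;

for my $category ( sort keys %person ){

    # $person{ $category } is an href; %{ } turns it back into a hash
    for my $item ( sort keys %{ $person{ $category } } ){

        my $value = $person{ $category }->{ $item };
        push @report, "$category/$item = $value";
    }
}

say for @report;
```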

When dealing with a simple HoH, the deref operator (->) between the two subscripts is not required. Because a hash can never directly contain another hash, only a reference to one, it is not ambiguous to type $person{ clothes }{ shirt }; Perl knows that the value stored under the first key must be a reference to another hash. Where the -> is required is when the top level of the structure is a reference itself:

# create hrefs to anonymous hash

my $inner_1 = { a => 1, b => 2 };
my $inner_2 = { z => 26, y => 25 };

# add hrefs to hash

my %h = ( ref_1 => $inner_1, ref_2 => $inner_2 );

# take a ref to the %h hash

my $href = \%h;

# because $href is now a reference itself, we MUST use the dereference operator

say $href->{ ref_2 }{ z }; # prints 26
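
To see the different spellings side by side, here's a short sketch using the same kind of structure. The variable names ($direct, $explicit, $via_ref, $block) are mine. All four lines should fetch the same value.

```perl
use strict;
use warnings;
use feature 'say';

my %h = (
    ref_1 => { a => 1,  b => 2 },
    ref_2 => { z => 26, y => 25 },
);

my $href = \%h;

my $direct   = $h{ ref_2 }{ z };         # named hash: no arrow needed
my $explicit = $h{ ref_2 }->{ z };       # same thing, arrow spelled out
my $via_ref  = $href->{ ref_2 }{ z };    # arrow required right after $href
my $block    = ${ $href }{ ref_2 }{ z }; # block form instead of the arrow

# $href{ ref_2 } would not compile under strict: it refers to a
# hash named %href, which does not exist

say "$direct $explicit $via_ref $block"; # prints: 26 26 26 26
```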

What if you wanted to keep track of all the classes in a school, and for each class, keep a list of all the student names? A HoH isn't needed, because all we want are the student names. The student names don't need a value. In this case, we would use a hash of arrays, or HoA:

# define the classrooms

my @room_1 = qw( steve mike dawn megan );
my @room_2 = qw( chris alexa melissa dave );
my @room_3 = qw( brittany hakim francois );

# declare the school. we'll declare it as a scalar
# because we're going to use an anonymous hash

my $school; # will become an href

# add the classrooms to the school

$school->{ room1 } = \@room_1;
$school->{ room2 } = \@room_2;
$school->{ room3 } = \@room_3;

# who's in room 2?

for my $student ( @{ $school->{ room2 } } ){
    say $student;
}

# output:
chris
alexa
melissa
dave

Notice the use of the array deref operator @{} in the for line. Things are starting to look a little more complex. Because $school->{ room2 } contains a reference to an array, we must dereference the entire thing. Dereferencing an array within a hash like this is where I see the most difficulty for programmers who are just starting to grasp refs. Misunderstanding what is actually happening here leads programmers to make mistakes that generate output such as the following:

Not dereferencing the array ref prior to printing it:

ARRAY(0x8fba97c) 

Not using -> to dereference $school to reach the anonymous hash it points to (for example, writing $school{ room2 } instead of $school->{ room2 }). An error like the following is a loud warning that you forgot to dereference the scalar $school: Perl went looking for a hash named %school, and there isn't one. Indeed, $school points to an unnamed (anonymous) hash:

Global symbol "%school" requires explicit package name at ./hoa.pl line 29.
Execution of ./hoa.pl aborted due to compilation errors.

Forgetting to dereference the array ref prior to pushing a new value onto it:

Type of arg 1 to push must be array (not hash element) at ./hoa.pl line 31, near "'jeremy';"
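
Here's a hedged sketch of the three mistakes next to their fixes. The cut-down $school and the names $oops, $names and $room2 are only for illustration. The broken lines that won't compile are left as comments so the snippet still runs.

```perl
use strict;
use warnings;
use feature 'say';

my $school = {
    room2 => [ qw( chris alexa melissa dave ) ],
    room3 => [ qw( brittany hakim francois ) ],
};

# mistake 1: using the aref as if it were the data
my $oops  = '' . $school->{ room2 };            # something like ARRAY(0x8fba97c)

# fix: dereference with @{ } first
my $names = join ' ', @{ $school->{ room2 } };  # chris alexa melissa dave

# mistake 2 (won't compile under strict): $school{ room2 }
# fix: go through the reference with ->
my $room2 = $school->{ room2 };

# mistake 3: handing push the hash element instead of an array
#     push $school->{ room3 }, 'jeremy';
# fix: dereference the aref so push sees a real array
push @{ $school->{ room3 } }, 'jeremy';

say $oops;
say $names;
say "room3 now has ", scalar @{ $school->{ room3 } }, " students";
```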

Let's go back to school. Room 3 just got a new student. Let's add him to the roster.

# with push

push @{ $school->{ room3 } }, 'jeremy'; 

# or directly to the element, if we already know its position

$school->{ room3 }[3] = 'jeremy';
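
A couple of related tricks, sketched under the assumption that $school looks like the one above. Evaluating the dereferenced array in scalar context gives you the head count, and pushing onto a key that doesn't exist yet creates the array for you (Perl calls this autovivification). The room4 key and the student 'sam' are made up for this example.

```perl
use strict;
use warnings;
use feature 'say';

my $school = { room3 => [ qw( brittany hakim francois ) ] };

push @{ $school->{ room3 } }, 'jeremy';

my $count  = scalar @{ $school->{ room3 } };  # 4 students now
my $newest = $school->{ room3 }[-1];          # last element: jeremy

# room4 does not exist yet; push creates the anonymous array on demand
push @{ $school->{ room4 } }, 'sam';

say "room3 has $count students, the newest is $newest";
say 'room4: ', join ' ', @{ $school->{ room4 } };
```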

Let's print out all the classes.

# get the keys by dereferencing $school

for my $room_name ( keys %{ $school } ){
    
    say "Students in $room_name: ";
    print "    ";

    # get each student name from each class by
    # dereferencing each class aref

    for my $student ( @{ $school->{ $room_name } } ){
        print "$student ";
    }
    print "\n";
}

Output:

Students in room3: 
    brittany hakim francois jeremy 
Students in room1: 
    steve mike dawn megan 
Students in room2: 
    chris alexa melissa dave 

Notice that the names from the room arrays are still in their original order, but the classrooms are not. Arrays keep their elements in the order in which you assign them; hashes make no promise about the order of their keys, so it can look random. To ensure the rooms are listed in order in this case, we simply add sort() to the for() line:

for my $room_name ( sort keys %{ $school } ){
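
Put together, the sorted version of the loop might look something like this. It's a compact sketch rather than a drop-in replacement, and @lines and @students are names I've introduced for the example.

```perl
use strict;
use warnings;
use feature 'say';

my $school = {
    room1 => [ qw( steve mike dawn megan ) ],
    room2 => [ qw( chris alexa melissa dave ) ],
    room3 => [ qw( brittany hakim francois jeremy ) ],
};

my @lines;

for my $room_name ( sort keys %{ $school } ){

    # copy the students out of the aref into a plain array
    my @students = @{ $school->{ $room_name } };

    push @lines, "Students in $room_name: @students";
}

say for @lines;
```

Keep in mind that sort() compares strings by default, so a key like room10 would sort before room2.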

A side note on dereferencing nested structures. The following are equivalent:

my $x = $href->{ aref }->[0];
my $x = $href->{ aref }[0];

In other words, the -> is optional between adjacent subscripts (any [ ] or { } that directly follows another). You only need it explicitly after a scalar that holds a reference, such as $href here. This works because anything nested inside an array or hash can only be a reference, so a second subscript is never ambiguous, and Perl knows this.

There is no limit to the depths and complexity you can conceive with these nested data structures thanks to references. Almost all objects in Object Oriented Programming in Perl use storage mechanisms just like this.
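
To give a feel for going one level deeper, here's a sketch of a hash of arrays of hashes. The grades are invented purely for this example, and $first_name, $same_name and @honour_roll are illustrative names.

```perl
use strict;
use warnings;
use feature 'say';

# hash (rooms) of arrays (students) of hashes (student details)
my $school = {
    room1 => [
        { name => 'steve', grade => 'B' },
        { name => 'dawn',  grade => 'A' },
    ],
    room2 => [
        { name => 'chris', grade => 'C' },
    ],
};

# the arrow is only needed after $school; both lines fetch 'steve'
my $first_name = $school->{ room1 }[0]{ name };
my $same_name  = $school->{ room1 }->[0]->{ name };

my @honour_roll;

for my $room ( sort keys %{ $school } ){

    # each $student is an href to one student's details
    for my $student ( @{ $school->{ $room } } ){
        push @honour_roll, $student->{ name } if $student->{ grade } eq 'A';
    }
}

say "First student in room1: $first_name";
say "Honour roll: @honour_roll";
```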

Thanks for reading part three of my series. In part four, we'll focus on subroutine references (coderefs) and dispatch tables. Then we'll build a menu system, using all of the concepts we've learnt, that you can incorporate into your own programs. Once again, please leave feedback in the comments, or send me an email.
