PDL in the real world

PDL is a great module for matrix and linear algebra computation. It is also quite fast and makes it possible to write Perl software with clean implementations of matrix computations. On CPAN, you can find many tutorials covering the different functions and parameters of PDL. However, most of the examples on CPAN begin by creating a piddle like this:

 $a = pdl [1..10];             # 1D array
 $a = pdl ([1..10]);           # 1D array
 $a = pdl (1,2,3,4);           # Ditto
 $b = pdl [[1,2,3],[4,5,6]];   # 2D 3x2 array
 $b = pdl 42;                  # 0-dimensional scalar
 $c = pdl $a;                  # Make a new copy
 $a = pdl([1,2,3],[4,5,6]);    # 2D
 $a = pdl([[1,2,3],[4,5,6]]);  # 2D

These examples are fine for learning the basics of PDL, but in the real world we rarely work with such data! The first question you will face when you start using PDL is "how do I put real data into a piddle?". The key is to work with array references. In this first example, you want to create a 1D array from the data in the list @T:

my $_t = [ @T ];
my $p = pdl ( [@$_t] );

If you want to create a 2D array, the code will be similar:

my $_t1 = [ @T1 ];
my $_t2 = [ @T2 ];
my $p = pdl ( [@$_t1] , [@$_t2] );

Alternatively, you can build one piddle per row and join them with cat:

my $p1 = pdl ( [@$_t1] );
my $p2 = pdl ( [@$_t2] );
my $matrix = cat $p1,$p2;
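
To check what these constructors actually produce, here is a minimal sketch. The sample values are made up, and it assumes that pdl() copies the values it is given (which is my understanding of its behaviour), so a plain reference such as \@T1 should be enough and the extra [ @T ] copy is optional.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical sample data standing in for @T1 and @T2.
my @T1 = (1.5, 2.5, 3.5);
my @T2 = (4.0, 5.0, 6.0);

my $p1     = pdl(\@T1);            # 1D piddle with 3 elements
my $direct = pdl(\@T1, \@T2);      # 2D piddle built in one call
my $joined = cat($p1, pdl(\@T2));  # 2D piddle built from two 1D piddles

# dims() lists the size of each dimension, columns first, then rows.
print "1D dims:     ", join('x', $p1->dims),     "\n";
print "direct dims: ", join('x', $direct->dims), "\n";
print "cat dims:    ", join('x', $joined->dims), "\n";

# at(column, row) reads one element back as a plain Perl number.
print "column 2 of row 1: ", $direct->at(2, 1), "\n";
```

Both 2D piddles should report the same dimensions. Note that PDL counts the columns first, so two rows of three values give dimensions 3 and 2.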

Now we can create piddles from real data, but these solutions still rely on "manual" piddle initialization. Most of the time, data are read from a file (or an SQL table) and stored in a hash (or an array), and the dimensions (or the number of rows) of the array are only known during the analysis. If you want to dynamically create a 2D array with several rows, one solution is to create a 1D array first and then add the other rows to it.

# Build the initial 1D piddle from an array reference.
sub create_matrix {
	my ($g1) = @_;
	my $matrix = pdl ( [@$g1] );
	return $matrix;
}

# Append one row (an array reference) along dimension 1.
sub update_matrix {
	my ($matrix,$g_new) = @_;
	my $p_new = pdl ( [@$g_new] );
	$matrix = $matrix->glue(1,$p_new);
	return $matrix;
}
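
To see what glue does to the shape of the piddle, here is a small sketch with made-up rows of three values each. It calls glue directly rather than going through the two functions, so that it runs on its own.

```perl
use strict;
use warnings;
use PDL;

# Made-up rows of three values each.
my $matrix = pdl([1, 2, 3]);                 # starts as a 1D piddle
$matrix = $matrix->glue(1, pdl([4, 5, 6]));  # now 2D: 3 columns, 2 rows
$matrix = $matrix->glue(1, pdl([7, 8, 9]));  # 3 columns, 3 rows

print "dims: ", join('x', $matrix->dims), "\n";
print $matrix;
```

Each call to glue returns a new piddle with one more row, which is why the result has to be assigned back to $matrix.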

Now we can put create_matrix and update_matrix to work. Let us assume that your data are stored in a tab-separated file with this structure:

ID1	value1	value2	...	valueN
ID2	value1	value2	...	valueN
IDn	value1	value2	...	valueN

Your code could look like this:

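# Three-argument open with a lexical filehandle; $! explains any failure.
open (my $in, '<', $file) or die "cannot open $file: $!\n";

my %Data = ();

while (my $line = <$in>) {
	$line =~ s/\s+$//;           # strip the newline and trailing whitespace
	my @T = split (/\t/,$line);
	my $id = shift (@T);         # the first column is the row ID
	$Data{$id} = [ @T ];         # the remaining columns are the values
}

close ($in);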

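my $matrix = undef;

# Sort the IDs so that the row order is reproducible (hash order is not).
foreach my $id (sort keys %Data) {
	my $_t = $Data{$id};
	if (!defined $matrix) {
		# First row: create the initial 1D piddle.
		$matrix = create_matrix($_t);
	}
	else {
		# Later rows: glue each one onto the existing piddle.
		$matrix = update_matrix ($matrix,$_t);
	}
}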
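
When all the rows are already sitting in the hash, another option is to build the 2D piddle in a single call instead of growing it row by row. The sketch below is only an illustration: the %Data content is invented, and the @ids array is a helper I introduce to remember which row belongs to which ID.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical stand-in for the %Data hash filled by the reading loop.
my %Data = (
    ID1 => [1, 2, 3],
    ID2 => [4, 5, 6],
    ID3 => [7, 8, 9],
);

# Fix the row order once, so that row $i of the piddle belongs to $ids[$i].
my @ids = sort keys %Data;

# pdl() accepts a reference to an array of array references.
my $matrix = pdl([ map { $Data{$_} } @ids ]);

print "dims: ", join('x', $matrix->dims), "\n";
for my $i (0 .. $#ids) {
    print "$ids[$i] starts with ", $matrix->at(0, $i), "\n";
}
```

This avoids building a new piddle for every row, which may matter for large files. The row-by-row approach remains useful when the rows arrive one at a time.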

The PDL documentation on CPAN describes many other functions for manipulating piddles (xchg, for instance), but I hope this tutorial has given you a quick start with piddles.