PERL

PERL  Chromosome::Map module

A couple of days ago, I released a Perl module on CPAN that generates PNG images of chromosomal maps. You can add several different tracks to the image (chromosomal features, markers, genes, QTL intervals, etc.). Unlike existing modules, which can draw complex images from BioPerl objects, this module is easy to use and to integrate into your own scripts. Furthermore, it was designed only to handle genetic and chromosomal maps, so you don’t have to read all the BioPerl documentation and wade through a bunch of complex options!

This module is available through the CPAN installer or on the CPAN website. More “pleasant” documentation, examples, and the color codes used in the module can be found here. I hope you will find this module useful.

PERL  PDL in the real world

PDL is a great module for matrix and linear algebra computations. It is also quite fast and lets you write Perl software with clean implementations of matrix operations. On CPAN, you can find many tutorials on the different functions and parameters of PDL. However, most of the examples on CPAN begin by creating a piddle like this:

 $a = pdl [1..10];             # 1D array
 $a = pdl ([1..10]);           # 1D array
 $a = pdl (1,2,3,4);           # Ditto
 $b = pdl [[1,2,3],[4,5,6]];   # 2D 3x2 array
 $b = pdl 42;                  # 0-dimensional scalar
 $c = pdl $a;                  # Make a new copy
 $a = pdl([1,2,3],[4,5,6]);    # 2D
 $a = pdl([[1,2,3],[4,5,6]]);  # 2D

That is fine for learning the basics of PDL, but in the real world we rarely work with such data! The first question you will face when you start using PDL is “how do I put real data into a piddle?” The key is to work with array references. In this first example, you want to create a 1D piddle from the data in the list @T:

use PDL;
my $_t = [ @T ];              # array reference to the data in @T
my $p = pdl ( [@$_t] );       # 1D piddle

If you want to create a 2D array, the code will be similar:

my $_t1 = [ @T1 ];
my $_t2 = [ @T2 ];
my $p = pdl ( [@$_t1] , [@$_t2] );

Alternatively, you can also use the following instructions:

my $p1 = pdl ( [@$_t1] );
my $p2 = pdl ( [@$_t2] );
my $matrix = cat $p1,$p2;
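To check what these constructions actually produce, here is a small self-contained sketch, assuming PDL is installed. The values in @T, @T1 and @T2 are made up for illustration. It also shows that pdl() accepts a reference to the array itself, so the intermediate copy is optional, and that both 2D approaches give the same 3x2 piddle.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical data standing in for @T, @T1 and @T2 from the text.
my @T  = (0.5, 1.2, 3.3);
my @T1 = (1, 2, 3);
my @T2 = (4, 5, 6);

# 1D: pdl() also accepts a reference to the array itself.
my $p = pdl(\@T);

# 2D, approach 1: pass one array reference per row to pdl().
my $m1 = pdl([@T1], [@T2]);

# 2D, approach 2: build two 1D piddles and stack them with cat().
my $m2 = cat(pdl([@T1]), pdl([@T2]));

# Dimension 0 runs along a row, dimension 1 indexes the rows,
# so both matrices are 3x2.
print "p dims:  ", join(",", $p->dims),  "\n";
print "m1 dims: ", join(",", $m1->dims), "\n";
print "m2 dims: ", join(",", $m2->dims), "\n";
print "same values: ", (($m1 == $m2)->all ? "yes" : "no"), "\n";
```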

Now we can create piddles with real data, but these solutions still rely on “manual” piddle initialization. Most of the time, data are read from a file (or an SQL table) and stored in a hash (or an array), and the number of rows is only known during the analysis… If you want to create a 2D piddle with several rows dynamically, one solution is to create a 1D piddle first and then append the other rows to it:

sub create_matrix ( $ )
{
	my ($g1) = @_;
	my $matrix = pdl ( [@$g1] );
	return $matrix;
}

sub update_matrix ( $ $ )
{
	my ($matrix,$g_new) = @_;
	my $p_new = pdl ( [@$g_new] );
	$matrix = $matrix->glue(1,$p_new);
	return $matrix;
}
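Since glue returns a new piddle holding all the rows gathered so far rather than growing the existing one in place, appending rows one at a time does more and more copying as the matrix grows. A common alternative, sketched below under the assumption that all rows have the same length, is to collect the rows as array references first and call pdl once. build_matrix is a hypothetical helper name, not part of PDL, and the rows are made-up data.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical alternative to create_matrix/update_matrix:
# collect the rows as array references first, then build the
# piddle with a single pdl() call instead of gluing row by row.
sub build_matrix {
    my (@rows) = @_;          # each element is an array reference
    return pdl(\@rows);       # array ref of array refs -> 2D piddle
}

# Hypothetical rows, e.g. read from a file.
my @rows   = ([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]);
my $matrix = build_matrix(@rows);

# 3 values per row (dimension 0), 4 rows (dimension 1).
print "dims: ", join(",", $matrix->dims), "\n";
print "first value of the second row: ", $matrix->at(0, 1), "\n";
```

In a file-reading loop, you would push each array reference onto @rows and call build_matrix once after the loop.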

Now we can use these functions. Let’s assume that your data are stored in a tab-delimited file with this structure:

ID1	value1	value2	...	valueN
ID2	value1	value2	...	valueN
.
.
.
IDn	value1	value2	...	valueN

Your code could look like this:

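open (my $in, '<', $file) || die "cannot open $file: $!\n";

my %Data = ();

while (my $line = <$in>)
{
	$line =~ s/\s+$//;              # remove the trailing newline and spaces
	my @T = split (/\t/,$line);     # tab-separated fields
	my $id = shift (@T);            # the first column is the ID
	$Data{$id} = [ @T ];            # store the values as an array reference
}

close ($in);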

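my $matrix = undef;

# Hash keys come back in no particular order; sort them so that
# the rows of the matrix are in a predictable order.
foreach my $id (sort keys %Data)
{
	my $_t = $Data{$id};
	if (!defined $matrix)
	{
		# first row: create the matrix
		$matrix = create_matrix($_t);
	}
	else
	{
		# following rows: append them below the existing ones
		$matrix = update_matrix ($matrix,$_t);
	}
}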

PDL provides many other functions to manipulate piddles (xchg, for instance; see its documentation on CPAN), but I hope this tutorial has given you a quick start with piddles.
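As a short illustration of xchg: it swaps two dimensions, so on a 2D piddle xchg(0,1) acts as a transpose. This is a minimal sketch with made-up values. Note that xchg (from PDL::Slices) returns a view linked to the original data, so call ->copy on the result if you need an independent matrix.

```perl
use strict;
use warnings;
use PDL;

# A 3x2 piddle: 3 values per row, 2 rows.
my $m = pdl([1, 2, 3], [4, 5, 6]);

# xchg(0,1) swaps dimensions 0 and 1, i.e. transposes a 2D piddle.
my $t = $m->xchg(0, 1);

print "original dims:   ", join(",", $m->dims), "\n";
print "transposed dims: ", join(",", $t->dims), "\n";
print $t, "\n";
```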

PERL  PERL modules

A reminder post: a quick script that displays the modules installed on your system and their version numbers… Always useful…

#!/usr/bin/perl -w
use strict;
use ExtUtils::Installed;

my $instmod = ExtUtils::Installed->new();

foreach my $module ($instmod->modules())
{
	my $version = $instmod->version($module) || "unknown version";
	print "$module -- $version\n";
}
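ExtUtils::Installed builds its list from the .packlist files written when modules are installed, so it can miss modules installed without one. When you only care about a few specific modules, a sketch like the following loads each one and asks for its version through the standard VERSION class method. module_version is a hypothetical helper, and the module names are only examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the version of an installed module,
# "unknown version" if it defines none, or undef if it cannot be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return undef unless eval { require $file; 1 };
    my $version = $module->VERSION;            # UNIVERSAL::VERSION
    return defined $version ? $version : "unknown version";
}

# Example module names; replace them with the ones you care about.
foreach my $module ('List::Util', 'Data::Dumper', 'No::Such::Module') {
    my $version = module_version($module);
    print "$module -- ", (defined $version ? $version : "not installed"), "\n";
}
```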

PERL  Smart sorting

Sorting mixed alphanumeric and numeric data with Perl’s sort function will not give you the results you expect. The numeric operator <=> does not work on letters, so you have to compare the data as strings with cmp… but then your numbers are sorted in this order: 1, 10, 11, 2… which isn’t pretty. Instead, you can use this function:

sub smart_sort
{
	my ($x, $y) = @_;    # avoid "my $a"/"my $b", which shadow sort's variables
	my $x_num = ($x =~ /^\d+$/);
	my $y_num = ($y =~ /^\d+$/);
	# Numerical sort when both values are integers
	return ($x <=> $y) if ($x_num && $y_num);
	# Numbers always come before non-numeric values
	return -1 if $x_num;
	return 1  if $y_num;
	# Alpha sort for everything else
	return ($x cmp $y);
}

Then, in your code, call this function in this way:

foreach my $data (sort {&smart_sort($a,$b)} keys %HASH)
{
	...
}

Now, your mixed data will be sorted as 1, 2, … A, B, C…
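smart_sort still compares a value such as chr10 as a plain string, so chr10 sorts before chr2. For names that mix letters and numbers, like chromosome names, one option is a “natural” comparison that splits each value into runs of digits and non-digits. The sketch below is a variant written for illustration, not the smart_sort function above; natural_cmp is a hypothetical name and the names in @names are made up.

```perl
use strict;
use warnings;

# Hypothetical "natural" comparison: split each string into runs of
# digits and non-digits, compare digit runs numerically and the
# other runs as strings.
sub natural_cmp {
    my ($x, $y) = @_;
    my @x = $x =~ /(\d+|\D+)/g;
    my @y = $y =~ /(\d+|\D+)/g;
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $c = ($p =~ /^\d/ && $q =~ /^\d/) ? ($p <=> $q) : ($p cmp $q);
        return $c if $c;
    }
    # The value with fewer runs sorts first; fall back to plain cmp
    # for ties such as "chr01" vs "chr1".
    return (@x <=> @y) || ($x cmp $y);
}

my @names  = ('chr10', 'chr2', 'chrX', 'chr1', '3', 'chr22');
my @sorted = sort { natural_cmp($a, $b) } @names;
print "@sorted\n";    # prints: 3 chr1 chr2 chr10 chr22 chrX
```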