December 2009

software  G3C

What is G3C?

We have developed G3C (Get all Co-annotated Co-located Clusters), a software package written in Perl. It is designed to identify, on a genome-wide scale, all groups of co-located genes that share similar GO annotations. In short, within a genomic window, it uses the hypergeometric distribution to compute the p-value of the existence of a cluster of co-annotated genes for a given similarity group of GO terms.
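To make the statistic concrete, here is a minimal sketch of an upper-tail hypergeometric p-value in Perl. It is not G3C's code: the notation is an assumption for illustration (N genes in the background, K of them carrying the GO annotation group, a window of n genes containing k annotated ones), and the p-value is P(X >= k) = sum over i from k to min(K, n) of C(K, i) C(N-K, n-i) / C(N, n). Binomial coefficients are computed in log space so that genome-sized N does not overflow; the helper names are hypothetical.

```perl
use strict;
use warnings;

# Cache of log(n!) values, grown on demand by summing log(i).
my @LOG_FACT = (0);

sub log_fact {
    my ($n) = @_;
    for my $i (scalar(@LOG_FACT) .. $n) {
        $LOG_FACT[$i] = $LOG_FACT[$i - 1] + log($i);
    }
    return $LOG_FACT[$n];
}

# Log of the binomial coefficient C(n, r).
sub log_choose {
    my ($n, $r) = @_;
    return log_fact($n) - log_fact($r) - log_fact($n - $r);
}

# Upper-tail hypergeometric p-value P(X >= k), where
#   N = genes in the background, K = genes carrying the GO annotation,
#   n = genes in the window,     k = annotated genes in the window.
sub hypergeom_upper_tail {
    my ($N, $K, $n, $k) = @_;
    my $max = $K < $n ? $K : $n;
    my $log_total = log_choose($N, $n);
    my $p = 0;
    for my $i ($k .. $max) {
        # Skip impossible draws: more non-annotated genes than exist
        next if $n - $i > $N - $K;
        $p += exp(log_choose($K, $i) + log_choose($N - $K, $n - $i) - $log_total);
    }
    return $p < 1 ? $p : 1;    # guard against rounding slightly above 1
}

# Illustrative numbers only: 20000 genes, 150 annotated, 6 of them in a 20-gene window
printf "p = %.3g\n", hypergeom_upper_tail(20000, 150, 20, 6);
```

How G3C actually defines the background, the window and the similarity groups is not described in this post, so treat these parameters as placeholders.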

How does it work?

You need a working Perl environment to run the software (the required modules are listed in the online documentation). You will also need an SQL database to store the data (pre-processing scripts are included in the package).


The G3C package will soon be available for download.

Unix  XML::SAX conflict between Debian & CPAN

If you previously installed XML::SAX via CPAN and then try to install a Debian package that requires this module, you will probably get this error:

Fatal: cannot call XML::SAX->save_parsers_debian().
You probably have a locally installed XML::SAX module.
See /usr/share/doc/libxml-sax-perl/README.Debian.gz
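Before deleting anything, you may want to confirm which copy of XML::SAX Perl actually loads. Perl records in %INC the file each loaded module came from. On Debian, CPAN installs typically land under /usr/local, while packaged modules live under /usr/share/perl5 or /usr/lib. The following is a minimal sketch; module_path is a hypothetical helper name, and the script defaults to XML::SAX when no module is given on the command line.

```perl
use strict;
use warnings;

# Load a module and return the path of the file Perl actually used,
# as recorded in %INC, or undef if the module cannot be loaded.
sub module_path {
    my ($module) = @_;
    die "bad module name: $module\n" unless $module =~ /^\w+(?:::\w+)*$/;
    (my $file = "$module.pm") =~ s{::}{/}g;    # XML::SAX -> XML/SAX.pm
    eval { require $file; 1 } or return undef;
    return $INC{$file};
}

my $module = shift(@ARGV) // 'XML::SAX';
my $path = module_path($module);
print defined $path ? "$module loaded from $path\n" : "$module not found\n";
```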

For further details on this error, see the corresponding Debian bug report. The solution is to remove the CPAN-installed XML::SAX modules, which can be done with the following script (saved as rm_perl_mod.pl):

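#!/usr/bin/perl -w
# rm_perl_mod.pl: remove a module installed from CPAN,
# using the .packlist written when it was installed
use strict;
use ExtUtils::Packlist;
use ExtUtils::Installed;

$ARGV[0] or die "Usage: $0 Module::Name\n";
my $mod = $ARGV[0];
my $inst = ExtUtils::Installed->new();

# Delete every file recorded in the module's .packlist
foreach my $item (sort($inst->files($mod))) {
	print "removing $item\n";
	unlink $item or warn "cannot remove $item: $!\n";
}

# Finally, delete the .packlist itself
my $packfile = $inst->packlist($mod)->packlist_file();
print "removing $packfile\n";
unlink $packfile or warn "cannot remove $packfile: $!\n";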

Then run the following commands as root:

chmod u+x rm_perl_mod.pl
./rm_perl_mod.pl XML::SAX
apt-get upgrade

It works for me…

PERL  PDL in the real world

PDL is a great module for matrix and linear-algebra computation. It is also quite fast and lets you write Perl software with clean matrix code. CPAN offers many tutorials on PDL's functions and parameters. However, most of the CPAN examples start by creating a piddle like this:

 $a = pdl [1..10];             # 1D array
 $a = pdl ([1..10]);           # 1D array
 $a = pdl (1,2,3,4);           # Ditto
 $b = pdl [[1,2,3],[4,5,6]];   # 2D 3x2 array
 $b = pdl 42;                  # 0-dimensional scalar
 $c = pdl $a;                  # Make a new copy
 $a = pdl([1,2,3],[4,5,6]);    # 2D
 $a = pdl([[1,2,3],[4,5,6]]);  # 2D

That's fine for learning the basics of PDL, but in the real world we rarely work with such data! The first question you will face when you start using PDL is « how do I put real data into a piddle? ». The key is to work with array references. In this first example, you want to create a 1D array from the data in the list @T:

my $_t = [ @T ];
my $p = pdl ( [@$_t] );

If you want to create a 2D array, the code will be similar:

my $_t1 = [ @T1 ];
my $_t2 = [ @T2 ];
my $p = pdl ( [@$_t1] , [@$_t2] );

Alternatively, you can also use the following instructions:

my $p1 = pdl ( [@$_t1] );
my $p2 = pdl ( [@$_t2] );
my $matrix = cat $p1,$p2;

Now we can create piddles from real data, but these solutions still rely on « manual » piddle initialization. Most of the time, data are read from a file (or an SQL table) and stored in a hash (or an array), and the dimensions of the matrix (its number of rows) are only known at run time… If you want to build a 2D array with several rows dynamically, one solution is to create a 1D array first and then append the other rows to it.

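use PDL;

sub create_matrix ($) {
	# Turn the first row (an array reference) into a piddle
	my ($g1) = @_;
	my $matrix = pdl ( [@$g1] );
	return $matrix;
}

sub update_matrix ($$) {
	# Append a new row (an array reference) along dimension 1
	my ($matrix,$g_new) = @_;
	my $p_new = pdl ( [@$g_new] );
	$matrix = $matrix->glue(1,$p_new);
	return $matrix;
}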

Now we can use these functions. Let's assume that your data are stored in a tab-delimited file with this structure:

ID1	value1	value2	...	valueN
ID2	value1	value2	...	valueN
IDn	value1	value2	...	valueN

Your code could look like this:

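open (my $in, '<', $file) || die "cannot open $file: $!\n";

my %Data = ();

# Store each row in %Data: ID => reference to the list of values
while (my $line = <$in>) {
	$line =~ s/\s+$//;
	my @T = split (/\t/,$line);
	my $id = shift (@T);
	$Data{$id} = [ @T ];
}

close ($in);

my $first_row = 1;
my $matrix = undef;

# Sort the IDs so that the row order is reproducible
# (keys returns hash entries in an unpredictable order)
foreach my $id (sort keys %Data) {
	my $_t = $Data{$id};
	if ($first_row) {
		# The first row creates the matrix...
		$first_row = 0;
		$matrix = create_matrix($_t);
	}
	else {
		# ...and every other row is appended to it
		$matrix = update_matrix ($matrix,$_t);
	}
}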
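Two things are worth keeping in mind with this approach: glue returns a new piddle, so gluing rows one by one likely copies the whole matrix at each step, and a piddle needs every row to have the same length. A possible alternative, sketched below under the assumption that the file is well-formed tab-delimited text, is to read the rows into a plain array of array references (keeping the IDs in file order), check that all rows have the same number of values, and build the piddle in a single call. The function name read_table is hypothetical, and it takes an already opened filehandle.

```perl
use strict;
use warnings;

# Read "ID<TAB>v1<TAB>...<TAB>vN" lines from an open filehandle.
# Returns two array refs: the IDs in file order and the matching rows,
# so that row $i of a matrix built from $rows corresponds to $ids->[$i].
# Dies if the rows do not all have the same number of values.
sub read_table {
    my ($fh) = @_;
    my (@ids, @rows);
    my $ncol;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;
        next if $line eq '';                  # skip blank lines
        my ($id, @values) = split /\t/, $line;
        $ncol //= scalar @values;             # first row sets the width
        die "Line $.: expected $ncol values, got " . scalar(@values) . "\n"
            if @values != $ncol;
        push @ids,  $id;
        push @rows, \@values;
    }
    return (\@ids, \@rows);
}
```

With PDL loaded, my ($ids, $rows) = read_table($in); followed by my $matrix = pdl($rows); gives a 2D piddle whose row $i belongs to $ids->[$i], since pdl accepts a reference to an array of array references.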

PDL offers many other functions to manipulate piddles (xchg, for instance; see the documentation on CPAN), but I hope this tutorial has given you a quick start with piddles.

PERL  PERL modules

A reminder post: a quick script that displays the modules installed on your system and their version numbers… Always useful…

#!/usr/bin/perl -w
use strict;
use ExtUtils::Installed;

# ExtUtils::Installed collects the modules that have a .packlist
my $instmod = ExtUtils::Installed->new();

foreach my $module ($instmod->modules()) {
	my $version = $instmod->version($module) || "unknown version";
	print "$module -- $version\n";
}
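ExtUtils::Installed only knows about modules that have a .packlist file, which is usually the case for CPAN installs; modules installed from distribution packages (such as Debian's lib*-perl packages) normally ship without one and will not appear in the list. To check a given module regardless of how it was installed, you could load it and ask for its version. A minimal sketch, where module_version is a hypothetical helper and the module names come from the command line:

```perl
use strict;
use warnings;

# Return the version of a module, "unknown version" if it loads but
# declares no $VERSION, or undef if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    return undef unless $module =~ /^\w+(?:::\w+)*$/;   # refuse odd input before eval
    return undef unless eval "require $module; 1";
    return $module->VERSION // 'unknown version';
}

for my $module (@ARGV) {
    my $version = module_version($module);
    print "$module -- ", (defined $version ? $version : 'not installed'), "\n";
}
```

For example, running it as perl modver.pl XML::SAX PDL (the script name is arbitrary) prints one line per module.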

Unix  The beauty of screen

How do you launch analyses in the background on a *nix system? You can append « & » to your command, but the screen output of your analyses will be messy and you cannot close your session. You can also use the batch command, but you will lose the screen output and have no way to interact with your software. In my opinion, the best alternative is the screen command, which is detailed in this article.

To run your favorite CPU-intensive software, start a new screen session with the following command:

screen

You will then get a fresh shell session in which you can launch your analysis. Once it is running, type « Ctrl-A Ctrl-D » to detach the screen: you will return to the shell session from which you started screen. You can now log out.

Later, to reconnect to your « screen » session, first list the existing sessions:

screen -list
There are screens on:
3199.pts-1.server (Detached)
3248.pts-1.server (Detached)
2 Sockets in /var/run/screen/S-user

Then, to reattach the session « 3199.pts-1.server », just type:

screen -r 3199.pts-1.server

and that’s it! When your job is done, you can leave your screen session by typing « Ctrl-D »… Don’t forget to read the man page for further details on this « magic » command.

PERL  Smart sorting

Sorting mixed alphanumeric and numeric data in Perl with the sort function does not give the results you would expect. A numeric comparison (<=>) cannot handle the alphabetic values, so you have to compare the data as strings with cmp (which is also what sort does by default)… But then your numbers will be sorted in this order: 1, 10, 11, 2… which isn't pretty. Instead, you could use this function:

sub smart_sort ($$) {
	# Use names other than $a/$b to avoid masking sort's globals
	my ($x,$y) = @_;
	my $x_num = ($x =~ /^\d+$/);
	my $y_num = ($y =~ /^\d+$/);
	# Numerical sort when both values are integers
	return ($x <=> $y) if ($x_num && $y_num);
	# Numbers come before alphanumeric strings
	return -1 if $x_num;
	return 1 if $y_num;
	# Alpha sort for everything else
	return ($x cmp $y);
}

Then, in your code, call this function in this way:

foreach my $data (sort { smart_sort($a,$b) } keys %HASH) {
	# process $data here
}

Now, your mixed data will be sorted as 1, 2, … A, B, C…
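The function above treats any value that mixes letters and digits (chromosome names, for instance) as a plain string, so chr10 still sorts before chr2. If you need that case too, a common alternative is a « natural » comparison that splits each value into runs of digits and non-digits, compares digit runs numerically and the other runs with cmp. This is a sketch of that technique rather than a drop-in replacement; natural_cmp is a hypothetical name.

```perl
use strict;
use warnings;

# Natural-order comparison: split each string into runs of digits and
# non-digits, compare digit runs numerically and other runs as strings.
sub natural_cmp {
    my ($x, $y) = @_;
    my @xs = $x =~ /(\d+|\D+)/g;
    my @ys = $y =~ /(\d+|\D+)/g;
    while (@xs && @ys) {
        my ($p, $q) = (shift @xs, shift @ys);
        my $c = ($p =~ /^\d/ && $q =~ /^\d/) ? $p <=> $q : $p cmp $q;
        return $c if $c;
    }
    # Shorter value first, then a plain string comparison to break ties
    return @xs <=> @ys || $x cmp $y;
}

my @sorted = sort { natural_cmp($a, $b) } qw(chr10 chr2 chrX chr1 10 2 B A);
print "@sorted\n";    # prints: 2 10 A B chr1 chr2 chr10 chrX
```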