
Perl: Flush file output

If you write something to a file (e.g. a log file), Perl buffers the output and doesn't write it to the file immediately. In most cases, this behaviour is the best choice, since writing to disk on every print would slow down your program. But it is not what you want for a log file: if your program stops unexpectedly, the file may be left empty or incomplete. In this case, use binmode to give your filehandle the :unix layer, which is unbuffered:

open(my $log, '>', $filename) or die "cannot create $filename: $!\n";
binmode($log, ':unix');

Your Perl program will then write to the log file immediately.
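For comparison, here is a minimal, self-contained sketch that writes one line through the :unix layer and one line through a handle with autoflush enabled. The autoflush method from IO::Handle is a common alternative: the handle stays buffered, but it is flushed after every print. The file names are hypothetical. Neither handle is closed when the sizes are printed, so a non-zero size shows the data has already reached the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;    # provides the autoflush() method on filehandles

# Hypothetical file names, used only for this sketch.
my $unix_log  = 'example_unix.log';
my $flush_log = 'example_flush.log';

# 1) The approach described above: push the unbuffered :unix layer.
open(my $log1, '>', $unix_log) or die "cannot create $unix_log: $!\n";
binmode($log1, ':unix');
print $log1 "step 1 done\n";

# 2) Alternative: keep the buffered layer, but flush after every print.
open(my $log2, '>', $flush_log) or die "cannot create $flush_log: $!\n";
$log2->autoflush(1);
print $log2 "step 1 done\n";

# Neither handle has been closed, yet both lines are already in the files.
for my $file ($unix_log, $flush_log) {
    printf "%s: %d bytes\n", $file, -s $file;
}
```

Setting `$| = 1` has the same effect as `autoflush(1)`, but only for the currently selected handle (STDOUT by default).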

Perl: BioMart and error 500…

One great thing about BioMart is that you can export your query as Perl code. You set up your query once, and then you can run your Perl script to update your databases, for example (after installing the BioMart API, which is not easy, especially the registry files). It sounds great, but in practice it doesn't work very well for big queries: bandwidth is very slow, and most of the time you get an error 500 (read timeout)… So you relaunch your script, again and again… After a while, trust me, you will get upset with BioMart!

So I searched the BioMart help and found MartService. As the help is "missing", I tried the example on the website. In my opinion, it is neither particularly clear (what does the POST example mean?) nor working (the wget command returned an error). So I tried different things. First, I downloaded the XML file of my query (XML button at the top right). By the way, the "Unique Results only" option didn't work in the XML file: whether or not the option was selected, the XML file always contained uniqueRows = '0' (don't forget to change it to '1', or your files will be very, very big). After browsing the website for a while, I copied and pasted the content of the file WebExample.pl:

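# an example script demonstrating the use of BioMart webservice
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Headers;

@ARGV == 1 or die("\nUsage: perl webExample.pl Query.xml\n\n");
open(my $fh, '<', $ARGV[0]) or die("cannot read $ARGV[0]: $!\n");

# read the whole XML query exported from the BioMart web interface
my $xml;
while (<$fh>) {
    $xml .= $_;
}
close($fh);

# MartService URL from the original example; use the martservice URL of the mart you query
my $path = "http://www.biomart.org/biomart/martservice?";
my $request = HTTP::Request->new("POST", $path, HTTP::Headers->new(), 'query=' . $xml . "\n");
my $ua = LWP::UserAgent->new;

# stream the results: the callback prints each chunk as soon as it arrives
# (1000 is only a hint for the chunk size)
my $response = $ua->request($request,
    sub {
        my ($data, $res) = @_;
        print $data;
    }, 1000);

# check the final response so that HTTP errors (such as the 500) are reported
warn("Problems with the web server: " . $response->status_line . "\n")
    unless $response->is_success;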

Then, I used the following command (as indicated in the example):

perl WebExample.pl myfile.xml

And… it worked: data were flowing into my terminal! Wonderful! In short, if you just want to update your data automatically, don't bother installing the BioMart API and configuring its registry files (which is far from trivial): use the XML query with the LWP script. It's easier and faster, and in my experience you won't get the error 500 read timeout anymore.
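Since the exported XML always contains uniqueRows = '0', the change to '1' can be scripted before each run. This is a minimal sketch, assuming the attribute appears in the file as uniqueRows = '0' or uniqueRows="0" (with or without spaces around the equals sign). The helper force_unique_rows and the script name fix_unique.pl are hypothetical; they are not part of BioMart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite uniqueRows = '0' (or "0") to '1' in a BioMart XML query.
sub force_unique_rows {
    my ($xml) = @_;
    # $1 keeps "uniqueRows =" with its original spacing, $2 keeps the quote character
    $xml =~ s/(uniqueRows\s*=\s*)(["'])0\2/${1}${2}1${2}/g;
    return $xml;
}

# Usage: perl fix_unique.pl Query.xml   (the file is rewritten in place)
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $in, '<', $file) or die "cannot read $file: $!\n";
    my $xml = do { local $/; <$in> };    # slurp the whole file
    close($in);

    open(my $out, '>', $file) or die "cannot write $file: $!\n";
    print $out force_unique_rows($xml);
    close($out);
}
```

Running it once on the exported file (perl fix_unique.pl myfile.xml) before passing it to WebExample.pl avoids editing the XML by hand.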

Perl: Smart sorting

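Sorting mixed alphanumeric and numeric data with Perl's sort function will not give you the results you expect. The numeric operator <=> cannot compare letters (non-numeric strings such as "A" are all treated as 0), so you have to compare the data as strings with cmp, which is what sort does by default. But then your numbers will be sorted in this order: 1, 10, 11… which isn't pretty. Instead, you can use this function, which compares integers numerically and everything else as strings: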

sub smart_sort ($$) {
	my ($x, $y) = @_;
	# Numerical sort when both values are integers
	return ($x <=> $y) if (($x =~ /^\d+$/) && ($y =~ /^\d+$/));
	# Integers come before any other string
	return -1 if ($x =~ /^\d+$/);
	return 1 if ($y =~ /^\d+$/);
	# Alpha sort for everything else
	return ($x cmp $y);
}

Then, in your code, call this function from sort:

foreach my $data (sort { smart_sort($a, $b) } keys %HASH) {
	print "$data\n";
}

Now, your mixed data will be sorted as 1, 2, … A, B, C…
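To see the difference, the sketch below sorts a small mixed list with the default sort and with a comparator that follows the same idea as smart_sort (repeated here so the snippet runs on its own). It assumes the numbers are plain non-negative integers; anything else, such as "1.5" or "-3", is compared as a string. Thanks to the ($$) prototype, the function can also be passed to sort by name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Integers first (compared numerically), then everything else (compared with cmp).
sub smart_sort ($$) {
    my ($x, $y) = @_;
    my $x_num = $x =~ /^\d+$/;
    my $y_num = $y =~ /^\d+$/;
    return $x <=> $y if $x_num && $y_num;    # both integers
    return -1 if $x_num;                     # integers before other strings
    return 1  if $y_num;
    return $x cmp $y;                        # both non-numeric
}

my @keys = qw(B 10 A 2 1);

my @default = sort @keys;               # plain string comparison
my @smart   = sort smart_sort @keys;    # ($$) prototype: the pair is passed in @_

print "default: @default\n";    # default: 1 10 2 A B
print "smart:   @smart\n";      # smart:   1 2 10 A B
```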