As mentioned in a previous post, I often need to compute basic statistics on my datasets. In another post, I benchmarked several Perl modules for simple statistics (so now you know which modules to use in your scripts). To characterize my datasets, I also run some basic statistical tests, such as the t test. Initially, I wrote the values to files, loaded them into R and ran a t test with the t.test() function. But if you have to perform many tests, this method quickly becomes tedious, so I set out to find a Perl module that does the same thing. One thing I really love about Perl and CPAN is that you will always find a module for your needs! And guess what? I found the Statistics::TTest module…

And the second great thing is… that this module is really simple to use. Just type:

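use Statistics::TTest;
my @Data1 = ( array of values );       # first sample
my @Data2 = ( array of values );       # second sample
my $ttest = Statistics::TTest->new;
$ttest->set_significance(95);          # 95 % confidence level
$ttest->load_data(\@Data1, \@Data2);   # both samples are passed by reference
$ttest->output_t_test();               # print the full report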

It performs a t test between your two datasets (in @Data1 and @Data2), and you get output like this on screen:

*****************************************************
Summary from the observed values of the sample 1:
sample size= 547 , degree of freedom=546
mean=0.0475802079237217 , variance=0.353389091675633
standard deviation=0.594465383075948 , standard error=0.0254175043571037
the estimate of the mean is 0.0475802079237217 +/- 0.0499276038086589
or (-0.00234739588493724 to 0.0975078117323805 ) with 95 % of confidence
t-statistic=T=1.87194648440864 , Prob >|T|=0.0617439999999999
*****************************************************
Summary from the observed values of the sample 2:
sample size= 17 , degree of freedom=16
mean=0.309487901019442 , variance=0.370557765440336
standard deviation=0.608734560740834 , standard error=0.147639817170496
the estimate of the mean is 0.309487901019442 +/- 0.312981648419734
or (-0.00349374740029174 to 0.622469549439176 ) with 95 % of confidence
t-statistic=T=2.09623600835297 , Prob >|T|=0.052318
*****************************************************
Comparison of these 2 independent samples.
F-statistic=1.04858291941977 , cutoff F-statistic=1.8274 with alpha level=0.05 and df =(16,546)
equal variance assumption is accepted(not rejected) since F-statistic < cutoff F-statistic
degree of freedom=562 , t-statistic=T=1.78772255292605 Prob >|T|=0.07436
the null hypothesis (the 2 samples have the same mean) is not rejected since the alpha level is 0.05
difference of the mean=-0.26190769309572, standard error=0.146503545903722
the estimate of the difference of the mean is -0.26190769309572 +/- 0.287762264864091
or (-0.549669957959811 to 0.0258545717683704 ) with 95 % of confidence

If you are a bit overwhelmed by the amount of data printed on screen, you can also access specific values directly. For instance, to print only the t value, the degrees of freedom and the p-value, you can use the following methods (more are listed in the module’s CPAN documentation):

use Statistics::TTest;
my @Data1 = ( array of values );       # first sample
my @Data2 = ( array of values );       # second sample
my $ttest = Statistics::TTest->new;
$ttest->set_significance(95);
$ttest->load_data(\@Data1, \@Data2);
my $t = $ttest->t_statistic;           # t value
my $df = $ttest->df;                   # degrees of freedom
my $prob = $ttest->{t_prob};           # p-value, read from the object hash
my $test = $ttest->null_hypothesis();  # "rejected" or "not rejected"
print "t=$t (df = $df) - p-value = $prob\nNull hypothesis is $test\n";

That gives you output like this:

t=1.78772255292605 (df = 562) - p-value = 0.07436
Null hypothesis is not rejected
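
Since the whole point of staying in Perl is to run many tests without going back and forth to R, the accessors above can be wrapped in a loop. The following is only a minimal sketch: the labels and numbers in @pairs are made-up illustrative data, and a fresh Statistics::TTest object is created for each comparison as a cautious assumption rather than a documented requirement of the module.

```perl
use strict;
use warnings;
use Statistics::TTest;

# Illustrative data only: each entry is [label, sample 1, sample 2].
my @pairs = (
    [ 'pair_A', [ 1, 2, 3, 4, 5 ],          [ 2, 4, 6, 8, 10 ] ],
    [ 'pair_B', [ 10, 12, 11, 13, 12, 11 ], [ 11, 13, 12, 14, 13, 12 ] ],
);

my @results;
for my $pair (@pairs) {
    my ($label, $s1, $s2) = @$pair;

    # One object per comparison, so no state is shared between tests.
    my $ttest = Statistics::TTest->new;
    $ttest->set_significance(95);
    $ttest->load_data($s1, $s2);    # both samples are array references

    push @results, {
        label   => $label,
        t       => $ttest->t_statistic,
        df      => $ttest->df,
        p       => $ttest->{t_prob},
        verdict => $ttest->null_hypothesis(),
    };
}

# One summary line per comparison.
printf "%s: t=%.4f (df = %s) - p-value = %s - null hypothesis %s\n",
    @{$_}{qw(label t df p verdict)} for @results;
```

Keeping the results in an array of hashes makes it easy to sort or filter them afterwards, for instance to list only the comparisons where the null hypothesis is rejected.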

Okay, running t tests in Perl is nice, but someone could argue that the built-in function in R is quite good, too… And it’s not so difficult to export data to files… Yep, you’re right… But this module provides one (really) nice feature: it runs an F-test to check the equal-variance assumption before computing the t test (you know, that’s what the var.equal argument of t.test() controls in R, where it is set to FALSE by default). The calculation method then differs according to the F-test result…
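
To make the last point concrete, here is a rough sketch of the two textbook calculations that such a switch usually selects between: the pooled-variance (Student) formula when equal variances are assumed, and the Welch formula with Satterthwaite degrees of freedom otherwise. This is not the module’s own code; the function names mean_var and two_sample_t, the $equal_var flag (standing in for the F-test result) and the sample values are all hypothetical and only there for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Size, mean and unbiased sample variance of an array reference.
sub mean_var {
    my ($x)  = @_;
    my $n    = @$x;
    my $mean = sum(@$x) / $n;
    my $var  = sum(map { ($_ - $mean) ** 2 } @$x) / ($n - 1);
    return ($n, $mean, $var);
}

# Two-sample t statistic and degrees of freedom.
# $equal_var true  -> pooled-variance (Student) formula
# $equal_var false -> Welch formula with Satterthwaite degrees of freedom
sub two_sample_t {
    my ($x, $y, $equal_var) = @_;
    my ($n1, $m1, $v1) = mean_var($x);
    my ($n2, $m2, $v2) = mean_var($y);

    if ($equal_var) {
        my $df  = $n1 + $n2 - 2;
        my $sp2 = (($n1 - 1) * $v1 + ($n2 - 1) * $v2) / $df;   # pooled variance
        my $se  = sqrt($sp2 * (1 / $n1 + 1 / $n2));
        return (($m1 - $m2) / $se, $df);
    }

    my ($q1, $q2) = ($v1 / $n1, $v2 / $n2);
    my $se = sqrt($q1 + $q2);
    my $df = ($q1 + $q2) ** 2
           / ($q1 ** 2 / ($n1 - 1) + $q2 ** 2 / ($n2 - 1));
    return (($m1 - $m2) / $se, $df);
}

# Illustrative data only.
my @x = (1, 2, 3, 4, 5);
my @y = (2, 4, 6, 8, 10);

my ($t_pooled, $df_pooled) = two_sample_t(\@x, \@y, 1);
my ($t_welch,  $df_welch)  = two_sample_t(\@x, \@y, 0);

printf "pooled: t=%.4f, df=%.4f\n", $t_pooled, $df_pooled;
printf "Welch:  t=%.4f, df=%.4f\n", $t_welch,  $df_welch;
```

With equal sample sizes, as here, both formulas give the same t value and only the degrees of freedom differ, which is enough to change the p-value.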

So, finally, this module is fast enough, easy to use, and doesn’t require exporting data from your Perl scripts.