R: Histograms with multiple data series

To be honest, I have to admit that I don’t like the R software. But I also have to admit that it can be helpful! I’m a complete beginner with this language, and I don’t like it because I couldn’t find a nice, simple tutorial on it. I’m definitely lost with the R data formats. I can’t understand the differences between vector, list, factor, array, matrix and data frame (and I haven’t found a good description of these fundamental concepts). Moreover, I don’t like (that’s a euphemism) the way R manages data:

  • try to load a file with alphanumerical values and R will treat them as factors (even if I don’t want my data as factors),
  • load your single-column data file with the read.table function and ask R for a mean and a histogram: the mean calculation will be OK, but for the hist function you have to transpose your data,
  • load your single-column data file with read.table again and try to compute a mean: R will return a ‘NaN’ value (why not, maybe my data are too close to zero or to infinity). But invoke the summary function on the same object and you will get a result for the mean value (uuuh?),
  • again: a single-column data file (I’m too much of a beginner with R to try to handle more complex data files ;-) !) loaded in R with read.table. This time the mean calculation is OK, but invoke the median function and you will get an error message telling you that your data are corrupted (???)… Wave your magic wand, transpose your data (again) and try the median function again. Wonderful, it works!!! What’s wrong with R? It can compute a mean value with my data but needs a transpose of the same data to compute a median? It seems a bit illogical…
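A likely explanation for several of these surprises: read.table always returns a data frame, even for a single-column file, while functions such as median and hist expect a plain numeric vector. The sketch below is only an illustration under that assumption. The temporary files and their values are made up, and the stringsAsFactors = FALSE argument asks read.table to keep text columns as character strings instead of factors.

```r
# Stand-in for a single-column data file (hypothetical values)
tmp <- tempfile()
writeLines(c("1.5", "2.0", "4.5", "8.0"), tmp)

df <- read.table(tmp, header = FALSE)
class(df)        # "data.frame": a table with one column, named V1 by default

# Pull the column out as a plain numeric vector
# (df[, 1] and df$V1 give the same result here)
x <- df[[1]]
class(x)         # "numeric"

# These all accept the vector directly, no transpose needed
mean(x)
median(x)
hist(x)

# Stand-in for a file with a text column (hypothetical values)
tmp2 <- tempfile()
writeLines(c("a 1", "b 2"), tmp2)

# Keep the text column as character strings rather than factors
txt <- read.table(tmp2, header = FALSE, stringsAsFactors = FALSE)
class(txt$V1)    # "character"
```

Transposing with t() also happens to work because it turns the data frame into a numeric matrix, but extracting the column is the more direct route.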

I have dozens of examples like these! Okay, I’m a beginner with R and I’m definitely not good with this software! But, to my mind, the R way of programming is the opposite of good coding practice (and it’s a Perl coder who says that!):

  • Don’t modify my data
  • Let me handle the format of my data
  • If a data format is good for one task, it should be okay for a similar one

Anyway, I have to admit that I’m being a bit dishonest about R, and that I also need this software for some tasks that are easier with R than with Perl (remember one of the qualities of a Perl coder: laziness ;-) !). Initially, the purpose of this article was to talk about drawing histograms with multiple data series in R, so let’s do it!

I’ve found two ways to draw such histograms with R: the first is explained here, and for the second… I asked my PhD student whether she knew how to do it with R (!)

For the multhist function, I spent some time getting my data into the right format! I’ve always loved tutorials that explain functions with random data! Who analyses random data? Anyway, I finally succeeded (great!):

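library(plotrix)  # provides multhist()
setwd("/mypath/to/mydata")

# read.table() returns a data frame, even for a single-column file
data1 <- read.table("data1", header = FALSE)
data2 <- read.table("data2", header = FALSE)
data3 <- read.table("data3", header = FALSE)
data4 <- read.table("data4", header = FALSE)

# keep only the first column of each data frame, as a plain vector
data1 <- data1[, 1]
data2 <- data2[, 1]
data3 <- data3[, 1]
data4 <- data4[, 1]

# multhist() takes a list of numeric vectors, one per data series
l <- list(data1, data2, data3, data4)
multhist(l, freq = FALSE)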

and here’s the histogram!

[Figure: hist_multhist]
But, despite the “additional arguments to hist or barplot” mentioned in the arguments section of the man page for the multhist function, I can neither change the breaks of the histograms (breaks=brk, where brk is a vector (?) of break values) nor add a legend to the graph…
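One possible workaround, sketched here in base R without plotrix: compute the counts per bin with hist(plot = FALSE) on a shared set of breaks, then draw them side by side with barplot, which accepts a legend. This is a different route from the multhist call above, not a fix for it. The two sample vectors, their names and the break points are made up for illustration, and the bars show raw counts rather than densities.

```r
# Hypothetical data: two small numeric vectors standing in for the data series
series <- list(a = c(1.2, 2.5, 2.7, 3.1, 4.8),
               b = c(0.4, 1.9, 3.3, 3.6, 4.1))

# Break points shared by all series (an assumption: adapt to your data range)
brk <- seq(0, 5, by = 1)

# hist(plot = FALSE) only computes the histogram; keep the counts per bin
counts <- sapply(series, function(x) hist(x, breaks = brk, plot = FALSE)$counts)

# counts has one row per bin and one column per series;
# barplot() wants one row per series, hence the t()
barplot(t(counts), beside = TRUE,
        names.arg = paste(head(brk, -1), tail(brk, -1), sep = "-"),
        legend.text = names(series))
```

Every value must fall inside the range covered by brk, otherwise hist stops with an error.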