Tag: Genetic

Unix  Code more shell scripts (Part. II)

With NGS data, we have to handle very (very, very) large datasets. Whatever the programming language used (Perl, Python, C…), the open/read part of a program is (nearly always) slow on files of this size. The debugging process can therefore be a real pain because of the time needed to read the file. Using the Unix system tools is one way to cut / merge / debug / analyze the content of these files…

Wanna split?

One column contains data in the format XXXX_YYYY and you want XXXX and YYYY in two different columns? Easy, use (again) the tr command:

tr '_' '\t' < file > new_file

Faster than a Perl program!
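One caveat: tr replaces every underscore in the file, not only those in the column you care about. If other columns also contain underscores, a minimal sketch with awk can restrict the split to a single column. The file names, the sample data and the choice of column 2 below are assumptions made for illustration only.

```shell
# Build a tiny tab-separated sample (hypothetical data).
printf 'id_1\tXXXX_YYYY\nid_2\tAAAA_BBBB\n' > sample.tsv

# Replace the first '_' of column 2 with a tab; the underscores in
# column 1 are left untouched.
awk 'BEGIN { FS = OFS = "\t" } { sub(/_/, "\t", $2); print }' sample.tsv > sample_split.tsv

cat sample_split.tsv
```

This is slower than tr on very large files, so the plain tr command remains the better choice when the separator character appears nowhere else.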

What is my header content?

Somebody has sent you a large file with many (many) columns and you want to know what the header contains (to extract the right columns with the cut command, for instance)? Whatever the separator used (tab, comma, etc.), reading and counting columns on screen is a pain… First, extract the first line with:

head -1 file > header_content

Then, convert your column separator into a newline character (here for a TSV file):

tr '\t' '\n' < header_content > list_header

Finally, to display the header, type:

cat -n list_header

(the -n option will display the line number, easier for the cut command)
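The three steps can also be chained with pipes, which avoids the intermediate files. The sketch below assumes a small tab-separated file named example.tsv (a made-up name with made-up content) and then uses the column numbers to extract two columns with cut.

```shell
# Hypothetical TSV file with a header line and one data line.
printf 'sample\tchr\tpos\tgenotype\n1\tchr1\t100\tAA\n' > example.tsv

# First line -> one column name per line -> numbered list.
# 'head -n 1' is the POSIX spelling of 'head -1'.
head -n 1 example.tsv | tr '\t' '\n' | cat -n > numbered_header.txt
cat numbered_header.txt

# The numbers shown are the field numbers expected by cut:
# here, keep columns 2 (chr) and 3 (pos).
cut -f 2,3 example.tsv > subset.tsv
```

For a comma-separated file, replace '\t' with ',' in the tr command and add -d ',' to the cut command.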

How many [XXXX] in my file?

Let’s end with a short introduction to the grep command (I will prepare a more substantial post devoted to grep next time). The ‘grep’ command is powerful, no doubt about that, but not so easy the first time! Basically, the grep syntax is:

grep [options] my_pattern file

One useful option is the -w parameter (for ‘word’). If you only want to ‘grep’ a pattern as a whole word (e.g. chr and not chromosome), you should use this parameter.
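To answer the "how many" question, the -c option can be combined with -w. Here is a small sketch on a made-up file (regions.txt and its content are assumptions). Keep in mind that -c counts matching lines, not individual occurrences.

```shell
# Hypothetical file: two lines with the word 'chr', one with
# 'chromosome' and one with 'chr1'.
printf 'chr\t100\nchromosome\t200\nchr\t300\nchr1\t400\n' > regions.txt

# Without -w, every line containing 'chr' anywhere is counted.
all=$(grep -c 'chr' regions.txt)

# With -w, 'chr' must stand alone as a word, so 'chromosome'
# and 'chr1' are no longer counted.
words=$(grep -c -w 'chr' regions.txt)

echo "lines containing chr: $all"
echo "lines with chr as a whole word: $words"
```

On this sample the first count is 4 and the second is 2.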

software  AnnotQTL accepted for publication in the NAR 2011 webserver edition

NAR has just let us know that our AnnotQTL manuscript has been accepted for publication. The manuscript will be online in 2-3 weeks! Since the first release of AnnotQTL, we have added several ‘missing’ features:

  • The assembly version and the date of the request are now inserted into the exported text file (TSV or XML).
  • You can now run multiple analyses of several QTL regions (via a specific form in the ‘submit a request’ section). Of course, in this ‘multiple analyses’ mode, the highlight features still work (but for a single biological function, e.g. a reproduction trait, applied to all the QTL regions).

AnnotQTL can be found at http://annotqtl.genouest.org. We have also decided to add new features in the near future (based on the referee’s comments), such as defining a genomic region from the STS markers surrounding a QTL region. A contact form is available on the official AnnotQTL website; you can leave a comment or ask for new species. The article is available here.

software  AnnotQTL website has been released!

Recently, we released a new website: AnnotQTL, a web tool designed to gather the functional annotation from different prominent websites while limiting the redundancy of information.

The last steps of genetic mapping research programs require a close analysis of several QTL regions to select candidate genes for further studies. Despite the existence of several websites (NCBI genome browser, Ensembl Browser and UCSC Genome browser) and web tools (Biomart, Galaxy) to achieve this task, the selection of candidate genes is still laborious. Indeed, the information available on the prominent websites differs slightly in terms of gene prediction and functional annotation, and some other websites provide extra information that researchers may want to use. Merging and comparing this information can be done manually for one QTL containing a few genes, but it is hardly feasible for many different QTL regions and dozens of genes. Here we propose a web tool that, for the region of interest, merges the lists of genes available in NCBI and Ensembl, removes redundancy, adds the functional annotation from different prominent websites and highlights the genes whose functional annotation fits the biological function or disease of interest. This tool is dedicated to sequenced livestock species (cattle, pig, chicken, and horse) and the dog, as they have been extensively studied (more than 8000 QTL have been detected).

The AnnotQTL server can be found here: http://annotqtl.genouest.org/

Blablabla  Winner!

Just a short post to let you know that our PhD students won two awards at the Rennes festival of (very) short popular science films (jury award and audience award). If you understand French, have a look at this video dealing with the concept of QTL and linkage analysis. I’m very proud of them! Nice work, folks! Congratulations to Yuna, Marion, Xiao and Yvan!


La poule et la truite

PERL  Chromosome::Map module

A couple of days ago, I released a Perl module on CPAN that can generate PNG images of chromosomal maps. You can add several different tracks to the image (chromosomal features, markers, genes, QTL intervals, etc.). Unlike existing modules, which can draw complex images from BioPerl objects, this module is quite easy to use and to integrate into your own script. Furthermore, this module was designed only to handle genetic and chromosomal maps, so you don’t have to read all the BioPerl documentation and a bunch of complex options!

This module is available through the CPAN installer or on the CPAN website. More “pleasant” documentation, examples and the color codes used in the module can be found here. I hope this module will help you.

software  MarkerSet

What is MarkerSet?

MarkerSet is a command-line tool designed for the selection of marker panel(s) according to genomic location and known informativity in experimental crosses (e.g. for a genome-wide scan). MarkerSet can be used with either SNP or microsatellite markers, because it relies only on informativity data. Basically, the algorithm selects the most informative markers in two windows separated by a constant gap and sliding along the genome. MarkerSet automatically defines the window size and the gap distance between the two windows depending on the desired number of markers and the genome size. MarkerSet has several options to optimize marker selection, with the possibility of giving more weight to the distances between markers or to marker informativity. When several experimental designs are available, it is possible to compare the quality of the marker sets obtained by selecting markers perfectly fitted to each experimental design (monodesign selection) or by selecting a set of markers common to all designs (multidesign selection).

MarkerSet is designed to be compatible with any kind of markers and species.

What do I need to run MarkerSet?

MarkerSet needs a working Perl environment with the POSIX module available (typically a *nix system; it may work on Mac OS X). The MarkerSet.pl and config.pm files must be installed in the same directory.

The input file must be a plain tab-delimited text file with the following information (a header line is required):

Markers name – chromosome – genomic location – informativity values for cross 1 (typically the number of heterozygote founders) – informativity values for cross 2 …
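As an illustration of this layout, the sketch below writes a tiny tab-delimited file and checks that every line has the same number of columns as the header, with at least four columns. The file name, the header labels and all values are invented for the example; the README in the package remains the reference for the exact format MarkerSet expects.

```shell
# Hypothetical input: header, then one marker per line
# (name, chromosome, location, informativity for cross 1 and cross 2).
printf 'marker\tchromosome\tposition\tcross1\tcross2\n' > markers_example.tsv
printf 'M001\t1\t1500000\t3\t2\n' >> markers_example.tsv
printf 'M002\t1\t4200000\t1\t4\n' >> markers_example.tsv

# Report any line whose field count differs from the header's or is
# below 4; awk exits with status 1 if at least one such line is found.
if awk -F '\t' '
    NR == 1 { n = NF }
    NF != n || NF < 4 { bad++; print "line " NR ": " NF " fields" }
    END { exit (bad > 0) }
' markers_example.tsv; then
    check_result="consistent"
else
    check_result="inconsistent"
fi

echo "markers_example.tsv layout: $check_result"
```

A check like this catches files saved with spaces instead of tabs or with truncated lines before they reach the program.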

The download package comes with the core files (Perl program and config file), a README file explaining the program parameters, example input files, and a verbose log file for a better understanding of the algorithm.


To download the markerset package, click on the following link: markerset_package.tar.gz

The article about this software can be found in BMC Research Notes, and the INSTALL and USAGE documentation can be found here.