Generalities and file formats

How to run GenRGenS

This section is dedicated to the installation and usage of GenRGenS v2.0.
GenRGenS performs random generation from a given sequence model. Models are passed to the main generation engine through description files, which are text files following a specific syntax. GenRGenS currently supports four classes of models. Each class has its own description file syntax, although all of them share certain properties.

Downloading and installing GenRGenS

Foreword: The current implementation of GenRGenS uses Java, so you will need to download and install a Java runtime environment (or virtual machine) more recent than version 1.1.2, freely downloadable from http://java.sun.com.

The latest versions of GenRGenS (binaries and sources) can be found at the following URL:
http://www.lri.fr/bio/GenRGenS
First download the bundle file found in the download section of the site. If needed, uncompress the archive into a root directory of your choice, using free or shareware tools like 7zip (.ZIP files), the jar tool bundled with Java (.JAR files) or GNU tar (.TAR files). We will assume in the next sections that the chosen directory is GenRGenSDir.

Command-line version

After decompressing the archive, open a shell, move to GenRGenSDir and invoke the Java virtual machine with the following command:
java -cp . GenRGenS.GenRGenS [options] [-nb k] -size n DescriptionFile
where:
- k is the number of sequences to be generated by GenRGenS. Defaults to 1.
- n is an indicative length for the generated sequences. Depending on the class of model, it is either the exact length, an upper bound, or simply ignored. Required.
- DescriptionFile is the path to a description file describing the random sequence model. Required.
Specific options may be available for some classes of models and are detailed later in this manual.
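
As an illustration of how the arguments above fit together, here is a minimal Python sketch that assembles the command line for a scripted run. The helper name build_genrgens_command and the file name model.txt are hypothetical; only the -nb and -size arguments described above are used, and any model-specific options are passed through unchanged.

```python
def build_genrgens_command(description_file, size, nb=1, options=()):
    """Assemble the argument list for the command-line version of GenRGenS.

    description_file: path to the description file (required)
    size: indicative length n of the generated sequences (required)
    nb: number k of sequences to generate (GenRGenS defaults to 1)
    options: extra model-specific options, passed through unchanged
    """
    if nb < 1:
        raise ValueError("nb must be at least 1")
    if size < 0:
        raise ValueError("size must not be negative")
    command = ["java", "-cp", ".", "GenRGenS.GenRGenS"]
    command.extend(options)
    command.extend(["-nb", str(nb)])
    command.extend(["-size", str(size)])
    command.append(description_file)
    return command


# model.txt is a hypothetical description file name.
command = build_genrgens_command("model.txt", size=100, nb=10)
print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command, cwd=...) with cwd set to GenRGenSDir, since the -cp . argument assumes the command is run from that directory.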

Graphical User Interface

There are two ways to run the interactive version of GenRGenS: one goes through the command line, the other relies on the file associations supported by certain platforms:
- From a shell: Move to GenRGenSDir and, at the prompt, enter one of the following commands:
  java -cp . GenRGenS.GenRGenS
  java -jar GenRGenSvX.X.JAR
- From a graphical environment: Some operating systems maintain associations between files and eligible executables, based on file name suffixes or file header analysis. On these platforms, a simple double-click on the JAR archive will execute the main application.
Figure 2.1: Screenshot of the main GUI
Once the GenRGenS UI is running, a typical random generation scenario would be:
Figure 2.2: Defining the generation parameters
The Generation Configuration window defines a few additional parameters:

Description files

Description files describe random models to the generation engine of GenRGenS. They are composed of clauses, each defining a parameter in the random model. The syntax of description files will be detailed below, along with the most common clauses.

Main structure

GenRGenS description files are sequences of clauses. All clauses follow the pattern Param_Name = Param_Value, where Param_Name is the name of the parameter being defined and Param_Value its value. The available parameters are specific to a given random model, although some parameters are shared by most if not all types of description files. Clauses must appear in an order that depends on the type of generation, otherwise they will be rejected by the main engine of GenRGenS. A clause can be optional: if omitted, its associated parameter defaults to a value given in the part of this document describing the random model.

Common clauses

The TYPE clause

TYPE = {MARKOV,GRAMMAR,RATEXP,MASTER}
The TYPE clause is the first clause of any description file. It defines the type of random model to be used for generation. Currently supported values for TYPE are listed below:
TYPE      Random model description
MARKOV    Markovian random generation.
GRAMMAR   Random generation based on context-free grammars.
MASTER    Random generation of hierarchical sequences.
RATEXP    Prosite patterns and rational expressions.

The ALIAS clause

ALIASES = s_1=id_1 s_2=id_2 ...
- s_i is a symbol used for random generation
- id_i is a new representation for this symbol
Another common clause is the ALIAS clause. It causes GenRGenS to replace each symbol on the left-hand side of an equality with the representation on its right-hand side, once the generation has been performed. This clause simplifies the writing of large random models while keeping the output explicit: the whole model can be written using single letters, which are replaced by more explicit symbols afterward. See chapter 3 for advanced use of this clause.
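
The effect of this clause can be pictured with a short Python sketch. This is only an illustration of post-generation substitution, not the code used by GenRGenS; the function apply_aliases, the letters and the replacement names are all hypothetical.

```python
def apply_aliases(symbols, aliases):
    """Replace each generated symbol by its alias, if one is defined.

    symbols: sequence of symbols as emitted by the generator
    aliases: mapping from a generation symbol s_i to its representation id_i
    Symbols without an alias are left unchanged.
    """
    return [aliases.get(symbol, symbol) for symbol in symbols]


# Hypothetical model written with single letters for brevity.
generated = ["a", "b", "a", "c"]
aliases = {"a": "Helix", "b": "Loop"}

rewritten = apply_aliases(generated, aliases)
print(rewritten)
```

Because the substitution happens only after generation, the model itself never sees the longer names, which is what keeps large models short to write.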

Simple example

For instance, Figure 2.3 shows a toy example of a description file for a simple Markovian model:
Figure 2.3: A simple Markovian description file

Clause 1 defines the class of random model to be used for random generation. Here, a Markovian model is chosen, which requires the definition of several parameters whose roles are explained below.

Clause 2 defines the order of the Markovian model. A value of 0 stands for a Bernoulli model, i.e. the probability of emitting a letter does not depend on the letters previously emitted. Further details and definitions can be found in chapter 3.

Between clause 2 and clause 3, an optional clause PHASE = int is omitted. The default value 1 is then used for the number of phases, thus defining a homogeneous Markovian model.

Clause 3 defines the emission probabilities for the various symbols. Instead of asking the user to provide the probabilities of the different k-mers, GenRGenS computes them from given numbers of occurrences. This approach is well suited to the use of a Markovian profile built from a real sequence. Here, the symbols are the DNA bases A, C, G and T, and adenine (A) will be emitted with probability 33/(33+20+15+32) = 0.33.
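
The normalization described for clause 3 can be checked with a few lines of Python. This is a minimal sketch of an order 0 (Bernoulli) model, not the GenRGenS implementation: the occurrence counts are read off the denominator of the fraction above, assuming they are listed in the order A, C, G, T, and the function names are hypothetical.

```python
import random

# Occurrence counts, assumed to be in the order A, C, G, T.
counts = {"A": 33, "C": 20, "G": 15, "T": 32}


def emission_probabilities(counts):
    """Turn occurrence counts into emission probabilities."""
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}


def sample_bernoulli(counts, length, rng):
    """Draw `length` symbols independently, weighted by their counts."""
    symbols = list(counts)
    weights = [counts[symbol] for symbol in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=length))


probabilities = emission_probabilities(counts)
print(probabilities["A"])  # 33 / 100

# A seeded generator makes the drawn sequence reproducible.
sequence = sample_bernoulli(counts, 20, random.Random(0))
print(sequence)
```

Each letter is drawn independently of the previous ones, which is exactly what an order of 0 means.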

Clause 4 performs post-generation rewriting of the sequence. Here, it allows generation of RNA sequences from a DNA-dedicated Markovian model.
Yann Ponty 2007-04-19