Next: The MARKOV package : Up: GenRGenS v2.0 User Manual Previous: Introduction Contents

Generalities and file formats

How to run GenRGenS

This section is dedicated to the installation and usage of GenRGenS v2.0.
GenRGenS performs random generation from a given sequence model. Models are passed to the main generation engine through description files, which are text files meeting certain syntax criteria. GenRGenS currently supports four classes of models. Each class has its own description file syntax, although all of them share certain properties.

Downloading and installing GenRGenS

Foreword: The current implementation of GenRGenS uses Java, so you will need to download and install a Java runtime environment (or virtual machine) more recent than version 1.1.2, freely downloadable from http://java.sun.com.

The latest versions of GenRGenS (binaries and sources) can be found at the following URL:

http://www.lri.fr/bio/GenRGenS

First download the bundle file from the download section of the site. If needed, uncompress the archive into a root directory of your choice, using free or shareware tools such as 7zip (.ZIP files), the jar tool bundled with Java (.JAR files) or GNU tar (.TAR files). The following sections assume that the chosen directory is GenRGenSDir.

Command-line version

After decompressing the archive, open a shell, move to GenRGenSDir and invoke the Java virtual machine with the following command:

`java -cp . GenRGenS.GenRGenS [options] [-nb <number>] -size <length> DescriptionFile`

where:

- `-nb`: the number of sequences to be generated by GenRGenS. Optional; a default value is used when omitted.
- `-size`: an indicative length for the generated sequences. Depending on the class of model and the type of generation, it is either the exact length, an upper bound, or ignored. Required.
- `DescriptionFile`: the path to a description file describing the random sequence model. Required.

Specific options may be available for some classes of models and are detailed later in this manual.
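As an illustration of the argument layout described above, the following Java sketch assembles the command line as a list of strings. It only uses the `-nb` and `-size` options named in this section; the class name `GenRGenSCommand`, the method `build` and the file name `model.txt` are hypothetical, and the code assumes a recent Java version (8 or later) rather than the minimal one required by GenRGenS itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GenRGenS: assembles the command line
// "java -cp . GenRGenS.GenRGenS [-nb <number>] -size <length> DescriptionFile".
class GenRGenSCommand {

    /**
     * @param count           number of sequences, or null to omit -nb
     * @param size            indicative sequence length (required)
     * @param descriptionFile path to the description file (required)
     */
    static List<String> build(Integer count, int size, String descriptionFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(".");
        cmd.add("GenRGenS.GenRGenS");
        if (count != null) {          // -nb is optional
            cmd.add("-nb");
            cmd.add(String.valueOf(count));
        }
        cmd.add("-size");             // -size is required
        cmd.add(String.valueOf(size));
        cmd.add(descriptionFile);     // the description file comes last
        return cmd;
    }

    public static void main(String[] args) {
        // "model.txt" is a placeholder file name.
        System.out.println(String.join(" ", build(10, 100, "model.txt")));
    }
}
```

The resulting list could be handed to `new ProcessBuilder(cmd)` with GenRGenSDir as the working directory, which avoids quoting problems when the description file path contains spaces.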

Graphical User Interface

There are two ways to run the interactive version of GenRGenS: one goes through the command line, the other uses the file associations supported by certain platforms:

- From Shell: Move to GenRGenSDir and, at the prompt, enter one of the following commands:

	`java -cp . GenRGenS.GenRGenS`
or	`java -jar GenRGenSvX.X.JAR`

- From a graphical environment: Some operating systems maintain associations between files and eligible executables, based on file name suffixes or file header analysis. On these platforms, a simple double-click on the JAR archive launches the main application.

**Figure 2.1:** Screenshot of the main GUI

Once the GenRGenS UI is running, a typical random generation scenario would be:

Open a description file, using the 'File/Open Description File...' or 'File/Reopen' menu items or by clicking on button A.
Define some required or optional parameters, such as the sequence length or the number of sequences, inside the Generation Configuration window opened from the 'Utilities/GenRGenS' menu item or by clicking on button C.
Validate generation. Random sequences are generated and displayed in text buffer D. Potential errors or warnings are shown in text buffer E.
Save the generated sequences to disk, using 'Buffers/Save Output Buffer Content...' or by clicking on button B.

**Figure 2.2:** Defining the generation parameters

The Generation Configuration window defines a few additional parameters:

Box 1: The size of the emitted sequences, or an upper bound, depending on the type of generation.
Box 2: The number of generated sequences.
Checkbox 3: Whether or not to display information about the generation during the process.
Checkbox 4: Toggles the display of generated sequences on and off. Useful when large numbers of sequences are required.
Messagebox 5: Selects the output file.
Panel 6: Displays a number of generator-specific options. Here, for a Markovian generator, it is possible to provide the size parameter (Box 1) as an upper bound for the sequence size. The sequence is then generated letter by letter until either a dead end is encountered, i.e. the model offers no continuation for the sequence, or the provided size is reached.
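The upper-bound behaviour described for Panel 6 can be sketched as follows. This is not the GenRGenS implementation: it is a minimal illustration that assumes an order-1 model stored as successor counts, and all names (`BoundedMarkovSketch`, `generate`, `successors`) are hypothetical. It also assumes `maxSize` is at least 1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Illustrative sketch only: letter-by-letter generation that stops either
// when the size bound is reached or when the model has no continuation.
class BoundedMarkovSketch {

    /**
     * @param successors for each letter, the occurrence counts of the letters
     *                   that may follow it (assumed order-1 model)
     * @param start      first letter of the sequence
     * @param maxSize    upper bound on the sequence length (assumed >= 1)
     */
    static String generate(Map<Character, Map<Character, Integer>> successors,
                           char start, int maxSize, Random rng) {
        StringBuilder seq = new StringBuilder().append(start);
        while (seq.length() < maxSize) {
            Map<Character, Integer> next = successors.get(seq.charAt(seq.length() - 1));
            if (next == null || next.isEmpty()) {
                break; // dead end: no letter can follow the last one
            }
            int total = 0;
            for (int count : next.values()) {
                total += count;
            }
            if (total <= 0) {
                break; // all counts are zero: also a dead end
            }
            // Pick a successor with probability proportional to its count.
            int pick = rng.nextInt(total);
            for (Map.Entry<Character, Integer> e : next.entrySet()) {
                pick -= e.getValue();
                if (pick < 0) {
                    seq.append(e.getKey());
                    break;
                }
            }
        }
        return seq.toString();
    }

    public static void main(String[] args) {
        Map<Character, Map<Character, Integer>> model = new HashMap<>();
        Map<Character, Integer> afterA = new HashMap<>();
        afterA.put('B', 1);   // A is always followed by B
        model.put('A', afterA);
        // B has no successor, so generation stops early despite the bound of 10.
        System.out.println(generate(model, 'A', 10, new Random())); // prints AB
    }
}
```

In this toy model the bound of 10 is never reached, because `B` has no successor; with a model that always offers a continuation, the returned sequence would have exactly `maxSize` letters.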

Description files

Description files describe random models to the generation engine of GenRGenS. They are composed of clauses, each defining a parameter in the random model. The syntax of description files will be detailed below, along with the most common clauses.

Main structure

GenRGenS description files are sequences of clauses. All clauses follow the pattern Param_Name = Param_Value, where Param_Name is the name of the parameter being defined and Param_Value its value. The available parameters are specific to a given random model, although some are shared by most if not all description file types. Clauses must be ordered in a generation-specific way, otherwise they will be rejected by the main engine of GenRGenS. A clause can be optional: if omitted, its associated parameter defaults to a value given in the part of this document describing the random model.
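To make the Param_Name = Param_Value pattern concrete, here is a small Java sketch that splits clauses at their first `=` sign and keeps them in file order. It is not the GenRGenS parser: it assumes one clause per line, which this manual does not state, and the names `ClauseSketch` and `parse` as well as the sample text are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only, assuming one "Param_Name = Param_Value" clause
// per line; the real description file grammar may differ.
class ClauseSketch {

    /** Returns the clauses in the order they appear in the text. */
    static Map<String, String> parse(String text) {
        Map<String, String> clauses = new LinkedHashMap<>(); // preserves clause order
        for (String line : text.split("\\R")) {
            int eq = line.indexOf('=');   // only the FIRST '=' separates name and value
            if (eq < 0) {
                continue;                 // not a clause, skip
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            clauses.put(name, value);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Sample text made up for this sketch.
        String text = "TYPE = MARKOV\nALIASES = T = U";
        for (Map.Entry<String, String> clause : parse(text).entrySet()) {
            System.out.println(clause.getKey() + " -> " + clause.getValue());
        }
    }
}
```

Splitting at the first `=` only matters for values that themselves contain equalities, and an insertion-ordered map is used because clause order is significant.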

Common clauses

The `TYPE` clause

TYPE = {MARKOV,GRAMMAR,RATEXP,MASTER}

The TYPE clause is the first clause of any description file. It defines the type of random model to be used for generation. The currently supported values for TYPE are listed below:

| `TYPE` | Random model description |
|---|---|
| `MARKOV` | Markovian random generation. |
| `GRAMMAR` | Random generation based on context-free grammars. |
| `MASTER` | Random generation of hierarchical sequences. |
| `RATEXP` | Prosite patterns and rational expressions. |

The `ALIAS` clause

ALIASES = $a_1$ = $b_1$ $a_2$ = $b_2$ ...

where:

- $a_i$ is a symbol used for random generation
- $b_i$ is a new representation for this symbol

Another common clause is the ALIAS clause. Once generation has been performed, it causes GenRGenS to replace the left-hand side of each equality with its right-hand side. This clause simplifies the writing of large random models while keeping the output explicit: the whole model can be written using single letters, which are replaced by more explicit symbols afterward. See chapter 3 for advanced use of this clause.
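The post-generation substitution can be pictured with the following Java sketch. It is a simplified illustration, not the GenRGenS code: it assumes that every generation symbol is a single character, and the names `AliasSketch` and `applyAliases` as well as the sample aliases are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: rewrites a generated sequence by replacing each
// symbol with its alias, assuming single-character symbols.
class AliasSketch {

    /** Symbols without an alias are copied unchanged. */
    static String applyAliases(String sequence, Map<Character, String> aliases) {
        StringBuilder out = new StringBuilder();
        for (char symbol : sequence.toCharArray()) {
            String alias = aliases.get(symbol);
            out.append(alias != null ? alias : String.valueOf(symbol));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up aliases: the model is written with short letters,
        // the output uses more explicit names.
        Map<Character, String> aliases = new HashMap<>();
        aliases.put('H', "Helix");
        aliases.put('L', "Loop");
        System.out.println(applyAliases("HLH", aliases)); // prints HelixLoopHelix
    }
}
```

Because the rewriting happens after generation, it does not affect the probabilities of the model, only the way its output is displayed.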

Simple example

**Figure 2.3:** A simple Markovian description file

For instance, Figure 2.3 gives a toy example of a description file for a simple Markovian model:

Clause 1 defines the class of random model to be used for random generation. Here, a Markovian model is chosen, which requires the definition of various parameters whose roles are explained below.

Clause 2 defines the order of the Markovian model. A value of 0 stands for a Bernoulli model, i.e. the probability of emitting a letter does not depend on the letters previously emitted. Further details and definitions can be found in chapter 3.

Between clause 2 and clause 3, an optional clause PHASE = int is omitted. The default value is then used for the number of phases, thus defining a homogeneous Markovian model.

Clause 3 defines the emission probabilities for the various symbols. Instead of asking the user to provide the probabilities for the different k-mers, we preferred to compute the probabilities from given numbers of occurrences. This approach is well suited to the use of a Markovian profile built from a real sequence. Here, we are using the DNA bases A, C, G and T, and adenine (A) will be emitted with a $\frac{33}{33+20+15+32} = 0.33$ probability.
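The conversion from occurrence counts to emission probabilities for an order-0 model can be sketched in Java as below. The count 33 for A comes from the text above; the assignment of the remaining counts 20, 15 and 32 to C, G and T in that order is an assumption made for this example, and the names `EmissionSketch` and `probabilities` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: turns occurrence counts into emission
// probabilities by dividing each count by the total.
class EmissionSketch {

    static Map<Character, Double> probabilities(Map<Character, Integer> counts) {
        long total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("at least one positive count is required");
        }
        Map<Character, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            probs.put(e.getKey(), e.getValue() / (double) total);
        }
        return probs;
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        counts.put('A', 33);
        counts.put('C', 20); // assumed: which base gets 20, 15 or 32
        counts.put('G', 15); // is not stated in the text
        counts.put('T', 32);
        // A gets 33 / (33 + 20 + 15 + 32) = 0.33, as in the text.
        System.out.println(probabilities(counts).get('A'));
    }
}
```

Working from counts means a profile can be taken directly from letter frequencies observed in a real sequence, without normalising them by hand.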

Clause 4 performs post-generation rewriting of the sequence. Here, it allows generation of RNA sequences from a DNA-dedicated Markovian model.

Yann Ponty 2007-04-19