
RNA Data Analysis and Research
Introduction to the RSmatch Software
Many ribonucleic acids (RNAs), including non-coding RNAs and cis elements in mRNAs, play important roles in gene regulation. Some of their functions are attributable to the structures they adopt, which are also called RNA motifs. Like sequence elements, RNA structure elements can be identified by comparing RNAs containing similar structures. The RSmatch package is designed to provide a lightweight approach to comparing RNA structures, thereby uncovering functional structure elements. Compared with other tools for RNA structure comparison, RSmatch is fast: its running time is quadratic in the sizes of the two given structures.
RSmatch uses two scoring schemes: a position-independent scheme and a position-dependent scheme. The position-independent scheme entails two scoring matrices, one for single-stranded regions and the other for double-stranded regions. This scoring scheme is used in pair-wise comparisons and database searches. The position-dependent scheme, also known as a profile, scores individual structure positions and is used by the multiple structure alignment and iterative database search functions. RSmatch provides both global and local alignment options, although the latter is more useful in most cases. In addition, RSmatch can take pattern-based structures as input. Please check the following publication for details:
Liu, J., Wang, J.T., Hu, J., and Tian, B. A method for aligning RNA secondary structures and its application to RNA motif detection. BMC Bioinformatics 2005, 6:89.
In the current version (2.0), RSmatch provides the following functions:
(1) Pair-wise comparison & database search
(2) Multiple structure alignment with an extended mode to compute common structure (NEW)
(3) Iterative database search (NEW)
It also provides the following utilities:
(1) Constrained alignment of RNA structures (NEW). This can be done in two ways: (a) using phylogenetic information to derive the information content at each position of the RNA structure; (b) user-defined specification of the conserved region.
(2) Slide folding
These functions are described in detail below:
The pair-wise comparison and database search function finds the RNA structures in the database that locally or globally match a given query structure. It can also be used to detect motif occurrences in an RNA structure database when the query structure is a known motif with a defined pattern.
For multiple structure alignment, RSmatch constructs a multiple alignment for a given set of RNA structures by progressively expanding the alignment one structure at a time. This function is useful when a small set of RNAs is functionally related by a shared motif. It has been enhanced to compute the common structure for a group of related RNA sequences.
For iterative database search, RSmatch repeatedly conducts database searches using a position-specific scoring matrix and updates the matrix with the latest result. This function can be much more sensitive than a regular database search, but at the cost of longer computing time.
Please contact jason.t.wang@njit.edu for comments/suggestions/queries.
RADAR fast download version for UNIX/Windows
Then, download the following files and place them in the directory containing the RADAR.jar file:
Note: The above files do not depend on the Vienna RNA package and hence require RNA secondary structures as input.
The following document provides instructions for executing the jar file:
RADAR complete version for UNIX/Windows:
For UNIX
Note: This version is for UNIX only, since the Vienna RNA package, which we use to fold the RNA sequences, is available only for UNIX platforms.
Bug reports
6/18/2007: Fixed a bug so that the program detects when it enters an infinite loop while aligning two structures and, if so, proceeds to align the next structure instead of terminating the process.
6/30/2007: Changed the formula that calculates the bonus given in binary (0/1) constrained alignment
from:
2 - (length of conserved region)/(total length)
to:
1 + (length of conserved region)/(total length).
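The effect of this change can be checked with a little arithmetic. The sketch below simply evaluates both expressions as written in the bug report; the function name and its arguments are hypothetical, and how RSmatch applies the resulting bonus inside the alignment score is not described here, so that part is left out.

```python
def constrained_bonus(conserved_length: int, total_length: int,
                      revised: bool = True) -> float:
    """Evaluate the bonus expression quoted in the 6/30/2007 entry.

    revised=True  -> 1 + conserved/total  (formula after the change)
    revised=False -> 2 - conserved/total  (formula before the change)
    The function name and signature are illustrative, not RSmatch's own.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    if not 0 <= conserved_length <= total_length:
        raise ValueError("conserved_length must lie in [0, total_length]")
    ratio = conserved_length / total_length
    return 1 + ratio if revised else 2 - ratio


if __name__ == "__main__":
    # Compare both formulas for a few conserved-region lengths.
    for conserved in (0, 25, 50, 100):
        old = constrained_bonus(conserved, 100, revised=False)
        new = constrained_bonus(conserved, 100, revised=True)
        print(conserved, old, new)
```

Both expressions stay between 1 and 2, but the earlier one shrinks as the conserved region grows, whereas the revised one grows with it.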
Version 2.0 of RSmatch has also been implemented as a web-based tool, RADAR, which can be accessed freely at http://aria.njit.edu/biodata/rna/RSmatch/server.htm.
Installation instructions for UNIX version of RSmatch2.0:
[A] Install RSmatch 2.0
If the input data are RNA sequences in the FASTA format, follow these instructions to install RSmatch2.0 and Vienna RNA package v1.4.
[B] Install Vienna RNA v1.4 & RSmatch2.0
Installation instructions for Windows version of RSmatch2.0:
[A] Install RSmatch 2.0
Note: The Java CLASSPATH variable must be set so that the current directory is searched.
Older versions:
>NM_003234:3394-3493 Homo sapiens transferrin receptor (p90, CD71) (TFRC), mRNA
GCTTTCTGTCCTTTTGGCACTGAGATATTTATTGTTTATTTATCAGTGACAGAGTTCACTATAAATGGTGTTTTTTTAATAGAATATAATTATCGGAAGC
((((((.((((....)).))...((((.........(((((((.(((((......))))))))))))(((((((......)))))))...))))))))))
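A structure record like the sample above can be sanity-checked before it is passed to an alignment tool. The following is a minimal sketch, assuming a record consists of a ">" header, a sequence, and a dot-bracket structure of the same length on separate lines; the function name and the toy record are hypothetical and are not part of RSmatch.

```python
def parse_structure_record(record: str):
    """Parse a header/sequence/dot-bracket record and list its base pairs.

    Assumes three non-empty lines: '>' header, sequence, structure.
    Returns (header, sequence, structure, pairs), where pairs holds
    0-based (i, j) positions of matching '(' and ')' characters.
    """
    lines = [ln.strip() for ln in record.strip().splitlines() if ln.strip()]
    if len(lines) != 3 or not lines[0].startswith(">"):
        raise ValueError("expected a '>' header, a sequence and a structure line")
    header, sequence, structure = lines[0][1:], lines[1], lines[2]
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")

    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)               # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return header, sequence, structure, sorted(pairs)


if __name__ == "__main__":
    toy = ">toy_hairpin\nGGGAAACCC\n(((...)))\n"   # hypothetical record
    print(parse_structure_record(toy))
```

A check of this kind catches truncated structure lines and unbalanced parentheses early, before a database search is started.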
The second type is the FASTA format for RNA sequences. For the sequence data, RSmatch2.0 will automatically invoke Vienna RNA v1.4 to fold the sequences into structures and then align the structures. A sample sequence in the FASTA format looks like this:
>NM_003234:3394-3493 Homo sapiens transferrin receptor (p90, CD71) (TFRC), mRNA
GCTTTCTGTCCTTTTGGCACTGAGATATTTATTGTTTATTTATCAGTGACAGAGTTCACTATAAATGGTGTTTTTTTAATAGAATATAATTATCGGAAGC
[B] Specification of conserved region:
a. Use of phylogenetic information
The idea is as follows: if a set of very closely related RNA sequences is available for the RNA sequence to be used as the query, compute a multiple sequence alignment of these sequences using any of the several tools available for this purpose. The resulting alignment is used to derive the information content at each position of the RNA sequence. This information content is a value between 0 and 1 that indicates the degree of conservation at that position. RSmatch 2.0 provides the utility “cal_cons” for this purpose.
The utility takes a multiple sequence alignment in the form of a single block as input and outputs conservation factors.
Example:
Input : NM_000146_for_cons
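To give a feel for what a per-position conservation value between 0 and 1 looks like, here is a minimal sketch. It is not the formula used by cal_cons, which is not given here; as a stand-in it scores each alignment column by the fraction of sequences sharing the most common non-gap residue. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Return one value in [0, 1] per alignment column.

    Stand-in measure (an assumption, not cal_cons's method): the share of
    sequences carrying the most frequent non-gap residue in the column.
    A column consisting only of gaps scores 0.0.
    """
    if not aligned_seqs:
        raise ValueError("alignment is empty")
    width = len(aligned_seqs[0])
    if any(len(s) != width for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    factors = []
    for col in range(width):
        residues = [s[col].upper() for s in aligned_seqs if s[col] != "-"]
        if residues:
            top_count = Counter(residues).most_common(1)[0][1]
            factors.append(top_count / len(aligned_seqs))
        else:
            factors.append(0.0)
    return factors


if __name__ == "__main__":
    toy_alignment = ["ACGU", "ACGA", "AC-U"]   # hypothetical single block
    print(column_conservation(toy_alignment))
```

In the toy alignment the first two columns are fully conserved, while the last two columns are shared by two of the three sequences.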
b. User-defined specification of conserved region
This is a simple 0/1 conservation scheme. The user indicates each position that should be treated as conserved by placing a “*” below it.
Example: tab4_query.str
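The "*" marking can be read as a 0/1 vector over the query positions. The sketch below assumes the mask is a single text line placed under the sequence, with "*" under conserved positions and spaces elsewhere; the exact layout of the example file is not shown here, so this layout, the function name, and the toy input are all assumptions.

```python
def mask_to_flags(sequence: str, mask_line: str):
    """Convert a '*' mask line into a list of 0/1 flags, one per position.

    Assumed layout: the mask sits directly below the sequence; '*' marks a
    conserved position and any other character (usually a space) does not.
    A mask shorter than the sequence is padded with unconserved positions.
    """
    mask = mask_line.rstrip("\n")
    if len(mask) > len(sequence):
        raise ValueError("mask is longer than the sequence")
    mask = mask.ljust(len(sequence))
    return [1 if ch == "*" else 0 for ch in mask]


if __name__ == "__main__":
    seq = "GGGAAACCC"      # hypothetical query sequence
    mask = "   ***"        # loop positions marked as conserved
    flags = mask_to_flags(seq, mask)
    print(flags, "conserved length:", sum(flags))
```

Summing the flags gives the length of the conserved region, and the sequence length gives the total length.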
[C] Output:
The output of RSmatch2.0 gives detailed alignment information. The Stockholm format is adopted to display the output of multiple structure alignment.
You can find the general syntax of the command by typing ./RSmatch2.0.
The general syntax is as follows:
RSmatch2.0 [options]
General options:
  -p [ pmatch | imatch | mmatch ]  choose a program from:
        pmatch: pair-wise comparison & database search;
        imatch: iterative database search;
        mmatch: multiple RNA structure alignment with an option for finding common structure;
  -u [ slide | cal_cons ]  choose a utility from:
        slide: slide fold RNA sequences
        cal_cons: calculate conservation factors for the multiple alignment
  -D <database>  FASTA-formatted sequence database.
  -d <database>  secondary structure database.
  -g <penalty>  gap penalty (default -6).
  -o <output>  file to receive output, default to 'result.out'.
  -r <range>  range of folding free energy (kcal/mol), used to select alternative RNA structures; default is 0.
  -S <ratio>  sliding step length, expressed as a ratio of <W_length>; default is 0.5.
  -W <W_length>  sliding window size; default is 100 nt.
  -z F  turn off the slide folding
Options for the utility 'cal_cons':
  -M <multiple seq. alignment file>  the file containing the multiple seq. alignment
Options for 'pmatch', 'imatch':
  -n <topN>  output top 'topN' hits.
  -Q <query>  query sequence in FASTA format.
  -q <query>  query secondary structure.
Options for 'pmatch':
  -s <score_matrix>  file containing position independent score matrices; default is 'scoreMat.structure'.
  -G <alignment type>
        T: global alignment
        F: local alignment
        default: F
  -m <query type>  query type:
        0: real structure without IUB code
        1: pattern structure containing IUB code
        default: 0
  -c <conservation factors file>  file that contains the conservation factors
Options for 'mmatch':
  'mmatch' accepts a dataset of RNA structures except when the following option is selected:
  -A <enable prediction of common structure>
        T: enables prediction of common structure (input for this has to be a dataset of RNA sequences)
Options for 'imatch':
  -R <repeat>  number of iterations
Options for 'ecompare':
  -F <factor>  the window-size decreasing rate. A series of window sizes are generated for folding sequences.
        The default <factor> is the ratio of two contiguous window sizes.
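The interplay of the -W and -S options can be pictured with a short calculation. The sketch below assumes the step is the window size multiplied by the ratio, rounded down, and that the last window is clipped at the end of the sequence; how RSmatch itself treats a final partial window is not stated here, so that behaviour and the function name are assumptions.

```python
def sliding_windows(seq_len: int, window: int = 100, step_ratio: float = 0.5):
    """List 0-based, end-exclusive (start, end) windows over a sequence.

    window     -- corresponds to -W (default 100 nt)
    step_ratio -- corresponds to -S (default 0.5 of the window size)
    The final window is clipped at the sequence end (an assumption).
    """
    if seq_len <= 0 or window <= 0 or step_ratio <= 0:
        raise ValueError("seq_len, window and step_ratio must be positive")
    step = max(1, int(window * step_ratio))
    windows = []
    start = 0
    while True:
        end = min(start + window, seq_len)
        windows.append((start, end))
        if end >= seq_len:
            break
        start += step
    return windows


if __name__ == "__main__":
    # With the defaults, a 250-nt sequence is covered by 100-nt windows
    # that advance 50 nt at a time.
    print(sliding_windows(250))
```

With the default values, consecutive windows overlap by half their length, so a structure element shorter than 50 nt always falls entirely inside at least one window.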
Examples:
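As an illustration only, and not taken from the original documentation, the sketch below assembles a 'pmatch' command line from the options listed above (-p, -d, -q, -n, -o, -G). The helper name, the file names, and the option order are hypothetical; whether RSmatch needs further options for a particular run is not covered here.

```python
def build_pmatch_command(database, query, top_n=None, output=None,
                         global_alignment=False, executable="./RSmatch2.0"):
    """Assemble an argument list for a pair-wise comparison / database search.

    Only options documented in the syntax listing are used. The list can be
    handed to subprocess.run(); nothing is executed by this function.
    """
    cmd = [executable, "-p", "pmatch", "-d", database, "-q", query]
    if top_n is not None:
        cmd += ["-n", str(top_n)]           # report the top N hits
    if output is not None:
        cmd += ["-o", output]               # otherwise the default file is used
    if global_alignment:
        cmd += ["-G", "T"]                  # local alignment (F) is the default
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, shown purely for illustration.
    cmd = build_pmatch_command("structures.db", "query.str", top_n=10)
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.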
You can find the general syntax of the command by typing java RSmatch.
The general syntax is as follows:
RSmatch2.0 [options]
General options:
  -p [ pmatch | imatch | mmatch ]  choose a program from:
        pmatch: pair-wise comparison & database search;
        imatch: iterative database search;
        mmatch: multiple RNA structure alignment with an option for finding common structure;
  -d <database>  secondary structure database.
  -g <penalty>  gap penalty (default -6).
  -o <output>  file to receive output, default to 'result.txt'.
Options for 'pmatch', 'imatch':
  -n <topN>  output top 'topN' hits.
  -q <query>  query secondary structure.
Options for 'pmatch':
  -s <score_matrix>  file containing position independent score matrices; default is 'scoreMat.structure'.
  -G <alignment type>
        T: global alignment
        F: local alignment
        default: F
  -m <query type>  query type:
        0: real structure without IUB code
        1: pattern structure containing IUB code
        default: 0
Options for 'imatch':
  -R <repeat>  number of iterations (default: 5)
Note: The Windows version of RSmatch 2.0 does not accept RNA sequences as input.
Examples:
For any suggestions, comments or queries about this website, please contact jason.t.wang@njit.edu.