SPARCS: Structural Profile Assignment of RNA Coding Sequences

What is SPARCS? SPARCS is a pipeline that identifies structured and unstructured regions of an RNA coding sequence by comparing the biological sequence with random sequences. Two types of random sequences are generated. The first uses dinucleotide shuffling, which preserves both the amino acid sequence and the dinucleotide statistics. The second preserves only the amino acid sequence. Sampling with these two methods yields accurate z-score estimates and thus reveals the structured, unstructured, and disordered regions.
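
As an illustration of the comparison described above, here is a minimal sketch of a z-score for one window. The function name `z_score` and the use of the sample standard deviation are assumptions made for this example, not necessarily what SPARCS does internally.

```python
from statistics import mean, stdev


def z_score(observed, random_values):
    """Z-score of a value from the biological sequence against the
    same metric computed on random sequences (hypothetical helper)."""
    mu = mean(random_values)
    sigma = stdev(random_values)  # sample standard deviation (assumption)
    if sigma == 0:
        # All random sequences gave the same value; no spread to compare against.
        return 0.0
    return (observed - mu) / sigma
```

A positive result means the biological window scores higher on the metric than the random sequences do on average, and a negative result means it scores lower.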

What are the minimal and maximal input sizes? Currently, the program accepts sequences of 200 to 2000 nucleotides. Longer sequences can be run upon request or by downloading the source code.

What properties are evaluated at each window? Currently, we analyze two metrics: base pairing probability and base pairing entropy. Base pairing probability is simply the sum of all base pairing probabilities in the current window. Base pairing entropy indicates whether the biological sequence is more chaotic than the random sequences. A structured region is defined as the intersection of a high base pairing probability region and a low base pairing entropy region. Conversely, an unstructured region is defined as the intersection of a low base pairing probability region and a high base pairing entropy region. In addition, we report disordered regions, which are the intersection of a high base pairing probability region and a high base pairing entropy region. Each base pairing probability region and base pairing entropy region is determined using a z-score threshold.
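
The three region definitions above can be sketched as a small decision function. This is only an illustration: the name `classify` is hypothetical, and treating "high" as a z-score above the threshold and "low" as a z-score below the negative threshold is an assumption, not a documented detail of SPARCS.

```python
def classify(z_prob, z_entropy, threshold=0.2):
    """Label one position from its two z-scores (hypothetical helper).

    Assumption: "high" means z > threshold and "low" means z < -threshold.
    """
    high_prob = z_prob > threshold
    low_prob = z_prob < -threshold
    high_entropy = z_entropy > threshold
    low_entropy = z_entropy < -threshold

    if high_prob and low_entropy:
        return "structured"
    if low_prob and high_entropy:
        return "unstructured"
    if high_prob and high_entropy:
        return "disordered"
    return None  # none of the three definitions applies
```

Positions where neither metric passes its threshold, or where both are low, get no label in this sketch.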

What are the input parameters? For now, we accept a plain DNA/RNA sequence or a FASTA file as input. In addition, the user can define the z-score threshold for each of the metrics mentioned above. The higher the z-score, the more the true sequence differs from the random sequences. The default thresholds are set to 0.2. Users can also set the minimum length of all three region types (structured, unstructured, and disordered). The default is 10 nucleotides.
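
To show how a minimum region length could be applied, the sketch below scans a list of per-position labels and keeps only runs that are long enough. The function name `find_regions`, the label strings, and the 1-based inclusive coordinates are assumptions made for this example.

```python
from itertools import groupby


def find_regions(labels, target, min_length=10):
    """Return [start, end] pairs (1-based, inclusive; an assumption) for
    every run of `target` in `labels` that spans at least `min_length`
    positions (hypothetical helper)."""
    regions = []
    position = 1
    for label, run in groupby(labels):
        run_length = len(list(run))
        if label == target and run_length >= min_length:
            regions.append([position, position + run_length - 1])
        position += run_length
    return regions
```

For example, a run of 12 "structured" positions starting at nucleotide 4 would be reported as [4, 15], while a run of only 4 positions would be dropped under the default minimum of 10.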

How to interpret the z-score? Every nucleotide position is associated with a z-score. This z-score is obtained by averaging the z-scores of all windows that contain this nucleotide. The averaged z-scores make up the structural profile of the nucleic acid. The dotted line of the corresponding color on the graph marks the z-score cutoff. If the z-score is above the line, the corresponding prediction at that position of the biological sequence is significantly stronger than in the random sequences.
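
The per-nucleotide averaging described above can be sketched as follows. The function name `per_nucleotide_z`, the window size, and the step between windows are placeholders chosen for illustration; the actual values used by SPARCS are not stated here.

```python
def per_nucleotide_z(window_z, seq_len, window_size, step=1):
    """Average the z-scores of all windows covering each position
    (hypothetical helper; positions are 0-based in this sketch).

    window_z[i] is the z-score of the window starting at i * step.
    """
    totals = [0.0] * seq_len
    counts = [0] * seq_len
    for i, z in enumerate(window_z):
        start = i * step
        for pos in range(start, min(start + window_size, seq_len)):
            totals[pos] += z
            counts[pos] += 1
    # Positions not covered by any window have no z-score.
    return [t / c if c else None for t, c in zip(totals, counts)]
```

With two windows of size 3 over a 4-nucleotide sequence, the two middle positions are covered by both windows and receive the mean of the two window z-scores, while each end position keeps the z-score of its single window.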

What is the output and how to interpret it? The program outputs two z-score plots, which show the structural profiles for the dinucleotide and uniform models. A sample output can be found on the Sample output page. Every position in the biological sequence is mapped to its corresponding z-score. In addition, a list of structured, unstructured, and disordered regions (e.g., [770,798] denotes nucleotides 770 to 798) is output for both the dinucleotide shuffling method and the method that preserves only the amino acid sequence.

Can I run SPARCS myself? Yes. Both the sequence generator, written in C++, and the full program, written in Python, can be downloaded from the main page. Feel free to contact us if you need more functionality or want to report a bug.
