CPGAVAS2

WARNINGS and NOTES

Many tools and protocols exist for SNP discovery and RNA editing site prediction. This module is intended only for exploratory analysis of plastid sequence diversity by experimental biologists. The current analysis pipelines will not satisfy the needs of every type of experiment.
The exact results will depend on the experimental design, such as the sample type, the type of sequencing library constructed, and the size of the data.
A data preprocessing tool, "prea", has been released on GitHub. It allows users to filter their reads against a reference sequence. The filtered reads can then be uploaded to our server for analysis.
We have released demo code for the two pipelines on GitHub. Interested users can use it as an example for setting up the pipelines locally.

Set Up and Run the Analysis

1. Select the analysis pipeline:

Analysis pipeline

2. Upload your FASTA, GFF (optional) and FASTQ files in one "tar.gz" file:

These files should be archived and compressed using "tar" on Linux or 7-Zip on Windows.
Here are sample sets for RNA editing site discovery: sample set 1 and sample set 2, containing reads enriched for ndhB genes without and with the GFF file, respectively.
Here is a sample set for SNP discovery: sample set 3.
For Mac users, the system might need to be adjusted before you can use the sample set. Specific instructions are provided in the HELP document.
Please make sure you specify the correct names for the files in your dataset in the next section.
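
For users who prefer to build the archive from a script, the packaging step described above can be sketched in Python with the standard "tarfile" module. This is only an illustration: the function name and the file names in the usage note are hypothetical, and storing the files at the top level of the archive (with no directory prefix) is an assumption, not a documented requirement of the server.

```python
import tarfile
from pathlib import Path


def build_archive(archive_path, file_paths):
    """Pack the given files into one gzip-compressed tar archive.

    Assumption: files are stored flat (base name only), so the names typed
    into the form match the member names inside the archive.
    Returns the list of member names for a quick visual check.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in file_paths:
            path = Path(path)
            tar.add(path, arcname=path.name)  # drop any directory prefix

    # Re-open the archive to confirm what was actually written.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

A call such as build_archive("dataset.tar.gz", ["reference.fasta", "reads_1.fq", "reads_2.fq"]) returns the member names in the order they were added; these are the names to enter in the next section.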

3. Specify the names of the files in your uploaded "tar.gz" file:

(Required) Name of the FASTA file representing the reference genome	-
(Optional) Name of the GFF file representing the reference annotation. WARNING: if the sequence IDs in the FASTA and GFF files differ, the sequence ID in the FASTA file will be replaced with the one in the GFF file.	-
(Required) Names of the FASTQ files. The two files of a read pair should be separated by a space, as shown in the example. Multiple files should follow the format "reads1[,reads2,...] [reads1[,reads2,...]]", as described in the tophat2 manual.	-
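
Before submitting, it may help to check that the FASTQ entry matches the "reads1[,reads2,...] [reads1[,reads2,...]]" format. The following Python sketch splits such a string into its comma-separated lists. The function name is hypothetical, and the rule that both lists of a paired-end entry must contain the same number of files is an assumption based on how paired lists are normally matched up; it is not a description of the server's own validation.

```python
def parse_reads_spec(spec):
    """Split a reads entry into one or two lists of FASTQ file names.

    A space separates the two mates of paired-end data; commas separate
    multiple files within each list.
    """
    groups = spec.split()
    if not 1 <= len(groups) <= 2:
        raise ValueError("expected one list (single-end) or two lists (paired-end)")

    parsed = [group.split(",") for group in groups]

    # A stray comma (e.g. "a.fq,,b.fq") would produce an empty name.
    if any(name == "" for group in parsed for name in group):
        raise ValueError("empty file name in reads entry")

    # Assumption: paired lists must line up file by file.
    if len(parsed) == 2 and len(parsed[0]) != len(parsed[1]):
        raise ValueError("paired lists contain different numbers of files")

    return parsed
```

For example, parse_reads_spec("a_1.fq,b_1.fq a_2.fq,b_2.fq") yields two lists, the first holding the forward-read files and the second the matching reverse-read files.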

(Optional) 4. Upload your parameter file for RNA editing site prediction:

A sample parameter file is provided here.

(Optional) 5. Other information:

Enter your email address to receive a notification when the job is complete.

Last updated: March 31st, 2019.
For questions and comments, please send an email to cliu@implad.ac.cn.

Center for Bioinformatics
Institute of Medicinal Plant Development
Peking Union Medical College
Chinese Academy of Medical Sciences
Address: No. 151, Malianwa North Road, Haidian District, Beijing 100093, P.R. China