Oral Presentation Science Protecting Plant Health 2017

A new bioinformatics pipeline for the rapid detection of plant viruses using next generation sequencing data (4366)

Amanda Baizan-Edge 1,2, Sue Jones 2, Stuart MacFarlane 3, Lesley Torrance 1,3
  1. School of Biology, University of St Andrews, St Andrews, United Kingdom
  2. Information and Computational Science Group, James Hutton Institute, Dundee, United Kingdom
  3. Cell and Molecular Science Group, James Hutton Institute, Dundee, United Kingdom

Rapid detection of viruses in plants is essential to avoid the large negative economic impact caused by these pathogens. Currently, common methods to identify viruses rely on biological indicators and molecular assays, which are limited to the single virus or group of viruses they are designed to detect. Because of this, next generation sequencing (NGS) techniques, such as smallRNA- and RNA-seq, are being rapidly adopted in the field. These techniques allow the unbiased detection of multiple viruses and the identification of novel viruses, an increasingly important issue in viral diagnostics. This is particularly true in the prevention of emerging viruses at borders, as post-entry quarantine systems are being put under pressure to detect new pathogens because of increases in global trade and movement.

However, using NGS methods for viral diagnosis requires lengthy and robust bioinformatics analysis. For this reason, the last few years have seen an increase in bioinformatics tools designed for viral diagnosis from mixed (host and pathogen) samples. Most of these, however, are designed for clinical samples, which renders them suboptimal for the identification of viruses in plants. In addition, many tools rely on alignment methods for de novo assembly and for mapping short reads to a reference, which (in addition to being slow) can lead to related viral sequences going undetected because of the high mutation rates of viral genomes.

To address these limitations, we have developed a new bioinformatics pipeline that takes mixed raw smallRNA- or RNA-seq reads from plant samples and produces a viral index using k-mer profiles. This method divides sequences into k-mers of specific lengths and uses exact matching. The pipeline has been developed on Galaxy, an open platform for intensive data analysis, so analyses can be run without command-line input, which makes the pipeline accessible to all researchers.
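
The general idea of exact k-mer matching can be illustrated with a minimal Python sketch. This is not the pipeline described in this abstract: the function names (`kmers`, `build_index`, `count_hits`), the toy sequences, and the choice of k = 4 are all assumptions made purely for illustration, and a real viral index would involve normalisation and filtering steps not shown here.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def count_hits(reads, index, k):
    """Count, per reference, the read k-mers that match the index exactly."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            # .get avoids adding unseen k-mers to the defaultdict
            for name in index.get(kmer, ()):
                hits[name] += 1
    return hits


# Toy data (hypothetical sequences, not real viral genomes)
references = {"virus_A": "ACGTACGTGA", "virus_B": "TTGCAAGCTT"}
reads = ["CGTACG", "GCAAGC", "GGGGGG"]

index = build_index(references, k=4)
hits = count_hits(reads, index, k=4)
print(dict(hits))  # {'virus_A': 3, 'virus_B': 3}
```

Because matching is a dictionary lookup rather than an alignment, each read k-mer is checked in constant time on average. In this sketch the third read shares no k-mer with either reference and therefore contributes no hits.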