GAGET: Genome Assembly Graph Evaluation Tool – GUI and Accuracy

Professor

Marco Santambrogio (web, mail)

Project contact

Guido Walter Di Donato (mail)

Research area

Computer systems architectures

Keywords (max 3, comma-separated)

Graph, Genomics

Description (max 500 characters)

Graph-based data structures provide a natural mechanism for compactly representing related genomic sequences, and the variations among them, as alternative paths in a directed graph. Consequently, many genome assembly tools now use internal graph representations and can output the assembly graph in various formats. However, most genome assembly projects still focus on “classic” contigs and scaffolds rather than assembly graphs, because proper tools for analyzing such graphs and assessing their quality are lacking. In this context, we are developing GAGET, a tool for evaluating genome assembly graphs based on the alignment of reference sequences to the graphs themselves.
Currently, GAGET computes a set of quality metrics adapted from the sequence domain to the graph domain (e.g. N50, NG50, GC content) and outputs a report with plots describing the results. The aim of this project is to develop an interactive Graphical User Interface (GUI) for navigating the assembly graph and the reference genome and for visualizing the computed metrics. An additional goal is to improve the accuracy of the current algorithm that selects the best set of compatible local alignments between the reference and the assembly graph, in order to reconstruct the path in the graph that best represents the reference sequence.
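
For readers unfamiliar with the metrics named above, the following is a minimal sketch of their standard sequence-domain definitions, assuming the input is a plain list of sequence lengths or a nucleotide string. It does not reproduce GAGET's graph-domain adaptation, which is not described here, and the function names `n50` and `gc_content` are hypothetical, chosen for illustration only.

```python
def n50(lengths, genome_size=None):
    """Return the N50 of `lengths`, or the NG50 when `genome_size` is given.

    N50 is the length L such that sequences of length >= L cover at least
    half of the total assembly length. NG50 uses half of the (estimated)
    genome size as the target instead of half of the assembly length.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # the sequences cover less than half of the target size


def gc_content(sequence):
    """Return the fraction of G and C among the unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


if __name__ == "__main__":
    contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up example lengths
    print("N50:", n50(contig_lengths))
    print("NG50:", n50(contig_lengths, genome_size=100))
    print("GC:", gc_content("ACGTNN"))
```

Ambiguous bases such as N are excluded from the GC denominator in this sketch; that is one common convention, not necessarily the one GAGET uses.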
