ALGORITHMS AND DATA STRUCTURES TO DETECT ONCOVIRUSES IN HUMAN CANCER USING NEXT GENERATION SEQUENCING DATA

Date

2012-12

Abstract

Evidence suggests that human cancer can be induced by viruses. One way to test this hypothesis is to look for viral sequences in the human cancer genome. Next-generation sequencing (NGS) technology can sequence a whole human genome in a short period of time, which opens the door to a systematic analysis of the human genome and a thorough search for oncogenic viral sequences in cancer. However, the huge number of sequencing reads generated by NGS poses a great computational challenge for data analysis in terms of both computing speed and memory usage. Data structures such as hashes and trees are widely used to improve the performance of computing algorithms. Here, I describe implementations of both data structures developed in our center and compare their performance. The hash outperformed the tree when mapping reads to a small reference sequence database. Subsequently, real human cancer data were analyzed using the hash-based mapper, and different oncoviral sequences were found in different cancers.
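
To illustrate the general idea of hash-based read mapping against a small reference database, the following is a minimal Python sketch. It is not the mapper developed in this work. The function names (build_kmer_index, map_read), the seed length k, and the toy reference sequences are all hypothetical, and the sketch assumes exact matching on the forward strand only, with no handling of mismatches, indels, or reverse complements.

```python
from collections import defaultdict


def build_kmer_index(references, k):
    """Hash every k-mer to the (reference name, offset) positions where it occurs."""
    index = defaultdict(list)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def map_read(read, index, references, k):
    """Return sorted (name, offset) positions where the whole read matches exactly.

    The first k bases of the read act as the seed: one hash lookup yields the
    candidate positions, and each candidate is then verified base by base.
    """
    hits = set()
    for name, pos in index.get(read[:k], []):
        if references[name][pos:pos + len(read)] == read:
            hits.add((name, pos))
    return sorted(hits)


# Toy reference "database" (hypothetical sequences, for illustration only).
references = {"virus_A": "ACGTACGTTAGC", "virus_B": "TTGACCGTAAGG"}
index = build_kmer_index(references, k=4)

print(map_read("CGTTAG", index, references, k=4))
print(map_read("GGGGGG", index, references, k=4))
```

In this sketch the index is built once and each read costs one hash lookup plus verification of its candidate positions. This suits a small reference database, where the whole index fits comfortably in memory.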

Keywords

Next-generation sequencing, Oncovirus, Cancer, Hash, Tree, Sequence reads
