Title: Compression of multiple DNA sequences using intra-sequence and inter-sequence similarities
Author(s): Siu, Wan Chi; Cheng, K.-O.; Wu, P.; Law, N.-F.
Issue Date: 2015
Publisher: IEEE
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics 
Volume: 12
Issue: 6
Start page: 1322
End page: 1332
Abstract: Traditionally, intra-sequence similarity is exploited for compressing a single DNA sequence. Recently, remarkable compression performance for individual DNA sequences from the same population has been achieved by encoding each sequence's differences from a nearly identical reference sequence. Nevertheless, there is a lack of general algorithms that also allow less similar reference sequences. In this work, we extend intra-sequence similarity to inter-sequence similarity: approximate matches of subsequences are found between the DNA sequence and a set of reference sequences. Hence, a set of nearly identical DNA sequences from the same population, or a set of partially similar DNA sequences such as chromosome sequences and DNA sequences of related species, can be compressed together. For practical compressors, the compressed size is usually influenced by the order in which the sequences are compressed. Fast search algorithms for the optimal compression order are thus developed for multiple-sequence compression. Experimental results on artificial and real datasets demonstrate that the proposed multiple-sequence compression methods with fast compression order search achieve good compression performance under different levels of similarity among the DNA sequences.
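The dependence of compressed size on compression order can be illustrated with a small sketch. This is not the authors' algorithm: it assumes DEFLATE (Python's `zlib`) with a preset dictionary as a stand-in for reference-based coding, and a simple greedy search as a stand-in for the paper's fast order search. The names `compressed_size`, `greedy_order`, and `mutate` are hypothetical helpers introduced only for illustration.

```python
import random
import zlib


def compressed_size(seq: bytes, reference: bytes = b"") -> int:
    """Bytes needed to DEFLATE seq, optionally primed with a reference sequence.

    Assumption: a zlib preset dictionary stands in for inter-sequence matching.
    """
    if reference:
        comp = zlib.compressobj(level=9, zdict=reference)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(seq) + comp.flush())


def greedy_order(seqs: dict[str, bytes]) -> tuple[list[str], int]:
    """Pick a compression order greedily (an illustrative heuristic only).

    Start with the sequence that is cheapest on its own, then repeatedly add
    the remaining sequence that is cheapest given the best already-chosen
    reference (or no reference at all, if that is cheaper).
    """
    remaining = dict(seqs)
    first = min(remaining, key=lambda name: compressed_size(remaining[name]))
    order = [first]
    total = compressed_size(remaining.pop(first))
    while remaining:
        best_name, best_cost = None, None
        for name, seq in remaining.items():
            cost = min(
                [compressed_size(seq)]
                + [compressed_size(seq, seqs[ref]) for ref in order]
            )
            if best_cost is None or cost < best_cost:
                best_name, best_cost = name, cost
        order.append(best_name)
        total += best_cost
        del remaining[best_name]
    return order, total


def mutate(seq: bytes, rate: float, rng: random.Random) -> bytes:
    """Apply random point substitutions to simulate a similar sequence."""
    out = bytearray(seq)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.choice(b"ACGT")
    return bytes(out)


# Artificial data: one base sequence, two close variants, one unrelated sequence.
rng = random.Random(0)
base = bytes(rng.choice(b"ACGT") for _ in range(2000))
sequences = {
    "base": base,
    "variant_1pct": mutate(base, 0.01, rng),
    "variant_5pct": mutate(base, 0.05, rng),
    "unrelated": bytes(rng.choice(b"ACGT") for _ in range(2000)),
}
order, total = greedy_order(sequences)
standalone_total = sum(compressed_size(s) for s in sequences.values())
print(order, total, standalone_total)
```

Because every step may fall back to compressing a sequence without a reference, the greedy total never exceeds the sum of the standalone sizes. DEFLATE only looks back 32 KiB, so this stand-in is suitable for short toy sequences only.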
DOI: 10.1109/TCBB.2015.2403370
CIHE Affiliated Publication: No
Appears in Collections:CIS Publication
