Please use this identifier to cite or link to this item: https://repository.cihe.edu.hk/jspui/handle/cihe/2302
|Title:||Compression of multiple DNA sequences using intra-sequence and inter-sequence similarities|
|Author(s):||Siu, Wan Chi; Cheng, K.-O.|
|Issue Date:||2015|
|Publisher:||IEEE|
|Journal:||IEEE/ACM Transactions on Computational Biology and Bioinformatics|
|Volume:||12|
|Issue:||6|
|Start page:||1322|
|End page:||1332|
|Abstract:||
Traditionally, intra-sequence similarity is exploited to compress a single DNA sequence. Recently, remarkable compression performance has been achieved for individual DNA sequences from the same population by encoding each sequence's differences from a nearly identical reference sequence. Nevertheless, there is a lack of general algorithms that also allow less similar reference sequences. In this work, we extend intra-sequence similarity to inter-sequence similarity: approximate matches of subsequences are found between the DNA sequence and a set of reference sequences. Hence, a set of nearly identical DNA sequences from the same population, or a set of partially similar DNA sequences such as chromosome sequences and DNA sequences of related species, can be compressed together. For practical compressors, the compressed size is usually influenced by the order in which the sequences are compressed. Fast search algorithms for the optimal compression order are thus developed for multiple-sequence compression. Experimental results on artificial and real datasets demonstrate that the proposed multiple-sequence compression methods with fast compression order search achieve good compression performance under different levels of similarity among the DNA sequences.
|URI:||https://repository.cihe.edu.hk/jspui/handle/cihe/2302|
|DOI:||10.1109/TCBB.2015.2403370|
|CIHE Affiliated Publication:||No|
|Appears in Collections:||CIS Publication|