Please use this identifier to cite or link to this item:
https://repository.cihe.edu.hk/jspui/handle/cihe/1248
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Siu, Wan Chi | en_US |
dc.contributor.other | Cheng, K.-O. | - |
dc.contributor.other | Law, N.-F. | - |
dc.date.accessioned | 2021-08-11T08:10:38Z | - |
dc.date.available | 2021-08-11T08:10:38Z | - |
dc.date.issued | 2019 | - |
dc.identifier.uri | https://repository.cihe.edu.hk/jspui/handle/cihe/1248 | - |
dc.description.abstract | Due to the advancement of DNA sequencing techniques, the number of sequenced individual genomes has experienced an exponential growth. Thus, effective compression of this kind of sequences is highly desired. In this work, we present a novel compression algorithm called Reference-based Compression algorithm using the concept of Clustering (RCC). The rationale behind RCC is based on the observation about the existence of substructures within the population sequences. To utilize these substructures, k-means clustering is employed to partition sequences into clusters for better compression. A reference sequence is then constructed for each cluster so that sequences in that cluster can be compressed by referring to this reference sequence. The reference sequence of each cluster is also compressed with reference to a sequence which is derived from all the reference sequences. Experiments show that RCC can further reduce the compressed size by up to 91.0 percent when compared with state-of-the-art compression approaches. There is a compromise between compressed size and processing time. The current implementation in Matlab has time complexity in a factor of thousands higher than the existing algorithms implemented in C/C++. Further investigation is required to improve processing time in future. | en_US |
dc.language.iso | en | en_US |
dc.publisher | IEEE | en_US |
dc.relation.ispartof | IEEE/ACM Transactions on Computational Biology and Bioinformatics | en_US |
dc.title | Clustering-based compression for population DNA sequences | en_US |
dc.type | journal article | en_US |
dc.identifier.doi | 10.1109/TCBB.2017.2762302 | - |
dc.contributor.affiliation | School of Computing and Information Sciences | en_US |
dc.relation.issn | 1557-9964 | en_US |
dc.description.volume | 16 | en_US |
dc.description.issue | 1 | en_US |
dc.description.startpage | 208 | en_US |
dc.description.endpage | 221 | en_US |
dc.cihe.affiliated | No | - |
item.languageiso639-1 | en | - |
item.fulltext | No Fulltext | - |
item.openairetype | journal article | - |
item.grantfulltext | none | - |
item.openairecristype | http://purl.org/coar/resource_type/c_6501 | - |
item.cerifentitytype | Publications | - |
crisitem.author.dept | Yam Pak Charitable Foundation School of Computing and Information Sciences | - |
crisitem.author.orcid | 0000-0001-8280-0367 | - |
Appears in Collections: | CIS Publication |
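
The two-level scheme described in the abstract (cluster the sequences, build one reference per cluster, then encode the cluster references against a sequence derived from all of them) can be illustrated with a toy sketch. This is not the authors' RCC implementation. It assumes equal-length, already aligned sequences; it replaces k-means with a k-means-style loop that uses Hamming distance and majority-vote consensus sequences as centers (closer to k-modes), since the abstract does not give the feature representation; and it assumes the top-level sequence is a plain consensus of the cluster references. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Diff = List[Tuple[int, str]]  # (position, base) pairs where a sequence differs from its reference


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus(seqs: List[str]) -> str:
    """Majority base per column; ties go to the alphabetically smallest base."""
    out = []
    for column in zip(*seqs):
        counts = Counter(column)
        out.append(max(sorted(counts), key=lambda base: counts[base]))
    return "".join(out)


def cluster(seqs: List[str], k: int, iters: int = 10) -> Tuple[List[int], List[str]]:
    """k-means-style loop (assumption: Hamming distance, consensus centers)."""
    centers: List[str] = []
    for s in seqs:  # deterministic start: the first k distinct sequences
        if s not in centers:
            centers.append(s)
        if len(centers) == k:
            break
    labels = [0] * len(seqs)
    for _ in range(iters):
        # Assignment step: nearest center by Hamming distance.
        labels = [min(range(len(centers)), key=lambda c: hamming(s, centers[c]))
                  for s in seqs]
        # Update step: each center becomes the consensus of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [s for s, lab in zip(seqs, labels) if lab == c]
            new_centers.append(consensus(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def diff(ref: str, seq: str) -> Diff:
    """Encode seq as the list of substitutions relative to ref."""
    return [(i, b) for i, (a, b) in enumerate(zip(ref, seq)) if a != b]


def patch(ref: str, d: Diff) -> str:
    """Rebuild a sequence from its reference and its substitution list."""
    bases = list(ref)
    for i, b in d:
        bases[i] = b
    return "".join(bases)


def compress(seqs: List[str], k: int) -> Dict:
    labels, refs = cluster(seqs, k)
    root = consensus(refs)  # assumption: top-level sequence is a consensus of the references
    return {
        "root": root,
        "refs": [diff(root, r) for r in refs],
        "seqs": [(lab, diff(refs[lab], s)) for s, lab in zip(seqs, labels)],
    }


def decompress(model: Dict) -> List[str]:
    refs = [patch(model["root"], d) for d in model["refs"]]
    return [patch(refs[lab], d) for lab, d in model["seqs"]]
```

Calling `decompress(compress(seqs, k))` on a list of equal-length strings returns the original list. A real reference-based compressor would also handle insertions and deletions and would entropy-code the difference lists, neither of which this sketch attempts.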