Please use this identifier to cite or link to this item: https://repository.cihe.edu.hk/jspui/handle/cihe/1248
DC Field | Value | Language
dc.contributor.author | Siu, Wan Chi | en_US
dc.contributor.other | Cheng, K.-O. | -
dc.contributor.other | Law, N.-F. | -
dc.date.accessioned | 2021-08-11T08:10:38Z | -
dc.date.available | 2021-08-11T08:10:38Z | -
dc.date.issued | 2019 | -
dc.identifier.uri | https://repository.cihe.edu.hk/jspui/handle/cihe/1248 | -
dc.description.abstract | Due to the advancement of DNA sequencing techniques, the number of sequenced individual genomes has experienced an exponential growth. Thus, effective compression of this kind of sequences is highly desired. In this work, we present a novel compression algorithm called Reference-based Compression algorithm using the concept of Clustering (RCC). The rationale behind RCC is based on the observation about the existence of substructures within the population sequences. To utilize these substructures, k-means clustering is employed to partition sequences into clusters for better compression. A reference sequence is then constructed for each cluster so that sequences in that cluster can be compressed by referring to this reference sequence. The reference sequence of each cluster is also compressed with reference to a sequence which is derived from all the reference sequences. Experiments show that RCC can further reduce the compressed size by up to 91.0 percent when compared with state-of-the-art compression approaches. There is a compromise between compressed size and processing time. The current implementation in Matlab has time complexity in a factor of thousands higher than the existing algorithms implemented in C/C++. Further investigation is required to improve processing time in future. | en_US
dc.language.iso | en | en_US
dc.publisher | IEEE | en_US
dc.relation.ispartof | IEEE/ACM Transactions on Computational Biology and Bioinformatics | en_US
dc.title | Clustering-based compression for population DNA sequences | en_US
dc.type | journal article | en_US
dc.identifier.doi | 10.1109/TCBB.2017.2762302 | -
dc.contributor.affiliation | School of Computing and Information Sciences | en_US
dc.relation.issn | 1557-9964 | en_US
dc.description.volume | 16 | en_US
dc.description.issue | 1 | en_US
dc.description.startpage | 208 | en_US
dc.description.endpage | 221 | en_US
dc.cihe.affiliated | No | -
item.languageiso639-1 | en | -
item.fulltext | No Fulltext | -
item.openairetype | journal article | -
item.grantfulltext | none | -
item.openairecristype | http://purl.org/coar/resource_type/c_6501 | -
item.cerifentitytype | Publications | -
crisitem.author.dept | Yam Pak Charitable Foundation School of Computing and Information Sciences | -
crisitem.author.orcid | 0000-0001-8280-0367 | -
Appears in Collections: CIS Publication

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.