To search, Click below search items.


All Published Papers Search Service


An Efficient DNA Sequence Compression using Small Sequence Pattern Matching


Murugan. A and Punitha. K


Vol. 21  No. 8  pp. 281-287


Bioinformatics is formed with a blend of biology and informatics technologies and it employs the statistical methods and approaches for attending the concerning issues in the domains of nutrition, medical research and towards reviewing the living environment. The ceaseless growth of DNA sequencing technologies has resulted in the production of voluminous genomic data especially the DNA sequences thus calling out for increased storage and bandwidth. As of now, the bioinformatics confronts the major hurdle of management, interpretation and accurately preserving of this hefty information. Compression tends to be a beacon of hope towards resolving the aforementioned issues. Keeping the storage efficiently, a methodology has been recommended which for attending the same. In addition, there is introduction of a competent algorithm that aids in exact matching of small pattern. The DNA representation sequence is then implemented subsequently for determining 2 bases to 6 bases matching with the remaining input sequence. This process involves transforming of DNA sequence into an ASCII symbols in the first level and compress by using LZ77 compression method in the second level and after that form the grid variables with size 3 to hold the 100 characters. In the third level of compression, the compressed output is in the grid variables. Hence, the proposed algorithm S_Pattern DNA gives an average better compression ratio of 93% when compared to the existing compression algorithms for the datasets from the UCI repository.


DNA sequences, ASCII symbol, Binary Codes, compression, pattern matching, Data compression.