DNA-Inspired Information Concealing

📝 Original Info

  • Title: DNA-Inspired Information Concealing
  • ArXiv ID: 0904.4449
  • Date: 2009-04-29
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Protection of the sensitive content is crucial for extensive information sharing. We present a technique of information concealing, based on introduction and maintenance of families of repeats. Repeats in DNA constitute a basic obstacle for its reconstruction by hybridization.

📄 Full Content

Contemporary computer systems may be distributed, consisting of many interconnected processing units or a large number of networked computer subsystems. Likewise, contemporary digital networks may consist of a large number of end nodes and intermediate nodes. In all these systems, information in the form of sequences over some alphabet of symbols is circulating or being stored. The entity controlling a subsystem or a node is often unwilling, or not permitted, to share these information sequences with other nodes. However, sharing some reduced local information might be very useful for security, stability, various analyses of system performance, and data mining. Such analysis might, for example, identify frequently appearing segments through approximate statistical analysis of segment frequency, making it possible to detect replicating malicious code (worms). It also makes it possible to identify segments that are markers of computer viral infection, by detecting patterns present in a database of malicious sequences. Such databases are used, e.g., in contemporary intrusion detection systems and spam filters. It has been shown that pattern matching against only fixed-length prefixes or substrings of longer sequences can provide approximate hints as to the presence of suspicious content [2]. Likewise, established worm detection techniques such as Autograph [3] or EarlyBird [4] are based on counting the frequency of small blocks of a fixed size.
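The idea of counting fixed-size blocks can be illustrated with a minimal Python sketch. It is not the Autograph or EarlyBird algorithm; the function names `block_counts` and `frequent_blocks`, the block length `k` and the `threshold` parameter are assumptions introduced here for illustration only.

```python
from collections import Counter


def block_counts(sequence: str, k: int) -> Counter:
    """Count every contiguous block of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def frequent_blocks(sequences, k: int, threshold: int) -> set:
    """Return the blocks of length k that occur in at least `threshold` sequences.

    Hypothetical helper: each sequence contributes at most once per block,
    so a block repeated inside a single sequence is not over-counted.
    """
    seen_in = Counter()
    for seq in sequences:
        seen_in.update(set(block_counts(seq, k)))
    return {block for block, n in seen_in.items() if n >= threshold}


# Toy traffic: two of the three sequences carry the same 4-symbol block.
traffic = ["xxWORMyy", "zzWORMqq", "hello"]
print(frequent_blocks(traffic, k=4, threshold=2))
```

Only blocks of length `k` are inspected, which is the kind of reduced local information that nodes could share without exposing whole sequences.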

Sharing reduced local information among the members of an interconnected computer system or communication network thus helps to discover attacks earlier. Affected parts may be isolated and further spread of the attack prevented. The benefits of sharing local information can be reaped if there exists a computational procedure that preserves local information (e.g. all segments of a certain maximal length) while making it impossible to reconstruct longer or sensitive parts of the information sequences.

We call such information processing concealing. Systems which conceal information and share the concealed information are likely to possess a competitive advantage in the form of robustness, attack resistance and immunity, due to their ability to exchange, publish and protect information. Clearly, any information concealing algorithm needs to address two conflicting goals:

(1) preserving the presence and, possibly, the frequency rank of segments of a given size (so that spam identification and worm detection remain possible), while (2) making reconstruction of content longer than a predefined limit computationally hard (e.g. preventing interpretation or understanding of the private content).
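The tension between the two goals can be seen on a toy example. The following Python sketch is an illustration under stated assumptions, not the concealing algorithm of the paper: `segment_counts` and `preserves_segments` are hypothetical helper names, and the two strings were constructed by hand so that they share every segment of length 2 while differing as whole sequences. Comparing exact counts is a stricter check than the frequency rank mentioned in goal (1).

```python
from collections import Counter


def segment_counts(sequence: str, k: int) -> Counter:
    """Count every contiguous segment of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def preserves_segments(original: str, other: str, k: int) -> bool:
    """True if both sequences contain the same length-k segments with the same counts."""
    return segment_counts(original, k) == segment_counts(other, k)


original = "XaYXbYc"
rearranged = "XbYXaYc"  # hand-made: the parts between the repeated X and Y are swapped

print(original != rearranged)                       # True: the sequences differ
print(preserves_segments(original, rearranged, 2))  # True: same segments of length 2
print(preserves_segments(original, rearranged, 3))  # False: segments of length 3 differ
```

An observer who sees only the segments of length 2 cannot tell the two sequences apart, because the repeated symbols `X` and `Y` allow more than one consistent ordering.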

2.1. Repeats in DNA. Our inspiration comes from an important feature of eukaryotic DNA, namely that it contains various repeat families, and that their presence constitutes a basic difficulty in DNA reconstruction by hybridization [6].

A large proportion of eukaryotic genomes is composed of DNA segments that are repeated, either precisely or in variant form, more than once. Highly repeated segments are arranged in two ways: as tandem arrays or dispersed among many unlinked genomic locations. As yet, no function has been associated with many of the repeats [8]. In the companion paper [1], the authors propose that eukaryotic cells use DNA as a depository of concealed genetic information, and that the genome achieves this self-concealing by accumulating and maintaining repeats. The protected information may then be shared, which is useful for the development of intercellular communication and of multicellular organisms.

The assertion that the repeats are maintained in DNA in a programmed way for self-concealing explains several puzzling features of repeats: the uniformity, together with the polymorphism, of the repeated sequences; the freedom of the repeated DNA to adopt quite different primary sequences in closely related species; and the apparent non-functionality of the precise amount or the precise sequence of the repeats.

The presence of repeats as an obstacle to DNA sequencing is receiving extensive attention from biologists, computer scientists and mathematicians (see [5], [6], [7]).

Repeats versus DNA reconstruction. We explain the basic idea of concealing by repeats in this subsection. Assume we are given a collection K of segments of DNA. Each segment S from K is divided into two parts, the initial part S(I) and the terminal part S(T). We thus may write S = S(I)|S(T). This is an artificial assumption imposed only for the clarity of the presentation.

A reconstruction of K is a sequence of its segments such that the terminal part of each segment agrees with the initial part of the next segment in the sequence. If several of these initial and terminal parts coincide, there may be an exponential number of possible reconstructions.
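The definition of a reconstruction can be made concrete with a small brute-force count in Python. This is a sketch under assumptions that the text does not state explicitly: each segment is represented as a pair `(initial, terminal)`, a reconstruction is taken to use every segment of K exactly once, and the function name `count_reconstructions` and the toy collections are invented for illustration. They are not the example from the paper.

```python
from itertools import permutations


def count_reconstructions(segments) -> int:
    """Count orderings of all segments in which the terminal part of each
    segment equals the initial part of the next one.

    `segments` is a list of (initial, terminal) pairs. Brute force over all
    permutations, so only suitable for very small collections.
    """
    count = 0
    for order in permutations(range(len(segments))):
        if all(segments[a][1] == segments[b][0] for a, b in zip(order, order[1:])):
            count += 1
    return count


# No coinciding parts: only one way to chain the segments.
unique = [("A", "B"), ("B", "C"), ("C", "D")]

# The part "R" is repeated: the detours through X and Y can be taken in either order.
repeated = [("A", "R"), ("R", "X"), ("X", "R"), ("R", "Y"), ("Y", "R"), ("R", "Z")]

print(count_reconstructions(unique))    # 1
print(count_reconstructions(repeated))  # 2
```

In this toy setting, each additional detour through the repeated part `R` multiplies the number of valid orderings, since m such detours can be taken in m! different orders. This is the kind of growth that makes reconstruction ambiguous when many parts coincide.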

Let us consider a very simple example. Let K be the following collection of segments, where the initial an

…(Full text truncated)…
