Research Projects - Genomic Signal Processing








Analysis of Proteomics and Genomics Based on Signal Processing and Communication Theory




Problem Statement

    Over the past half century, we have undergone a revolution in our ability to archive, process and exchange information. Biological systems, by contrast, got a head start of some 3.5 billion years. Yet for all the attention directed toward the changing conception of information and its function in our world, remarkably little is known about the broad role of information in biological systems. In my work, I have sought to mathematically model and understand the genetic information processing system.

Research Summary

    Investigation of genomic structure has focused primarily on physicochemical and related issues. Yet protein synthesis is, at its core, an information transmission phenomenon. It therefore seems reasonable to postulate that the evolutionary pressures shaping the DNA sequence were not confined to physicochemical issues alone, and that informational considerations may have played a constraining evolutionary role within the limits imposed by molecular physics and chemistry. In particular, a communication system that models the genetic information storage and transmission apparatus is important for many research areas, such as “junk DNA” research, theories of aging and evolutionary studies. From a communication theory perspective, DNA is viewed as a source- and channel-encoded message, and the evolution of a species is represented as the iteration of a time-dependent communication channel over time: “the channel of evolution”. It is important to emphasize that biological communication systems are not isomorphic to engineering communication systems. One fundamental difference is that biological communication systems did not evolve to minimize transmission errors, because perfect transmission of biological information spells stagnation and, ultimately, extinction.
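To make the “channel of evolution” concrete, here is a minimal Python sketch under an assumed model that is not necessarily the one used in the publications listed on this page: each nucleotide passes through a per-generation substitution channel that copies it correctly with probability 1 - μ_t and otherwise replaces it with one of the other three bases uniformly (a Jukes-Cantor-style channel). The function names `substitution_channel`, `evolution_channel` and `mutual_information` are illustrative. Multiplying the per-generation transition matrices gives the n-generation channel, and the mutual information between the ancestral and descendant base measures how much of the parent's message survives.

```python
import math


def substitution_channel(mu):
    """Per-generation nucleotide channel (hypothetical Jukes-Cantor-style model).

    Row i is the distribution of the copied base given input base i: the base
    is kept with probability 1 - mu and replaced by each of the other three
    bases with probability mu / 3.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return [[1.0 - mu if i == j else mu / 3.0 for j in range(4)] for i in range(4)]


def matmul(a, b):
    """Product of two 4x4 row-stochastic matrices (channel composition)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def evolution_channel(mutation_rates):
    """Compose K(mu_1) K(mu_2) ... K(mu_n) for time-dependent rates.

    Entry [i][j] is P(descendant base = j | ancestral base = i) after
    len(mutation_rates) generations; an empty list gives the identity.
    """
    channel = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for mu in mutation_rates:
        channel = matmul(channel, substitution_channel(mu))
    return channel


def mutual_information(channel, prior=(0.25, 0.25, 0.25, 0.25)):
    """I(ancestor; descendant) in bits for the given ancestral distribution."""
    output = [sum(prior[i] * channel[i][j] for i in range(4)) for j in range(4)]
    info = 0.0
    for i in range(4):
        for j in range(4):
            p_joint = prior[i] * channel[i][j]
            if p_joint > 0.0:
                info += p_joint * math.log2(channel[i][j] / output[j])
    return info


if __name__ == "__main__":
    # Hypothetical time-dependent mutation rate that decays toward 0.005.
    rates = [0.005 + 0.005 / (1 + t) for t in range(1000)]
    for n in (1, 10, 100, 1000):
        mi = mutual_information(evolution_channel(rates[:n]))
        print(f"{n:5d} generations: I(ancestor; descendant) = {mi:.4f} bits")
```

Because every matrix in this symmetric family commutes with every other, the order of the time-dependent rates does not matter here; a non-symmetric substitution model would make the order of multiplication significant.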

    By studying the asymptotic behavior of the biological communication system, we predicted a distribution of amino acids in nature that nearly perfectly matches an experimental estimate of the natural abundance of amino acids made by biologists. The biological implication of the asymptotic analysis is that, after infinitely many generations, a parent organism becomes unrelated to its offspring, no matter how small the time-dependent mutation rate is, as long as it is non-zero. Investigation of the divergence rate showed that it is at least geometric. Moreover, based on the highly redundant structure of the DNA sequence (e.g., the presence of a large percentage of non-coding segments within genes, called introns), we demonstrated that the role of introns in the genome is to maintain a delicate balance between two competing yet complementary forces: stability and adaptability. Introns stabilize the otherwise unstable genome by allowing reliable transmission of the genetic information. They also drive evolution by increasing the rate of genetic recombination. The perpetuation of life depends on this fine tuning between stability and adaptability. The predictions of all our theoretical models were validated against biological data.
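The asymptotic claims above can be illustrated with the same kind of assumed model, again a hypothetical Jukes-Cantor-style channel and not necessarily the authors' model. For this channel the n-generation matrix has the closed form λ_n I + (1 - λ_n) U, where λ_n = ∏_t (1 - 4μ_t/3) and U is the 4x4 matrix whose entries are all 1/4. Every row's total-variation distance from the uniform distribution is therefore (3/4)λ_n, and if the rates satisfy μ_min ≤ μ_t ≤ 3/4 for some μ_min > 0, that distance decays at least geometrically, as (3/4)(1 - 4μ_min/3)^n. In the limit, bases and hence codons are uniformly distributed; the sketch then maps a uniform distribution over the 61 sense codons through the standard genetic code, so each amino acid's frequency is proportional to its number of codons. This only shows the mechanics and does not reproduce the comparison with experimental abundances; the function names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in TCAG order, first position varying slowest; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def distance_from_independence(mutation_rates):
    """Total-variation distance between any row of the n-generation channel
    and the uniform distribution, for the hypothetical symmetric channel in
    which a base mutates with probability mu_t, split equally among the
    other three bases.

    The n-generation channel is lam * I + (1 - lam) * U with
    lam = prod_t (1 - 4 mu_t / 3), so the distance is 0.75 * lam.
    A value of 0 means the descendant base says nothing about the ancestor.
    """
    lam = 1.0
    for mu in mutation_rates:
        if not 0.0 <= mu <= 0.75:
            raise ValueError("this sketch assumes 0 <= mu <= 3/4")
        lam *= 1.0 - 4.0 * mu / 3.0
    return 0.75 * lam


def amino_acid_distribution(code=GENETIC_CODE):
    """Amino-acid frequencies implied by a uniform distribution over the
    sense codons (stop codons excluded), i.e. codon degeneracy / 61."""
    counts = Counter(aa for aa in code.values() if aa != "*")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical time-dependent rates, all at least MU_MIN.
    MU_MIN = 1e-3
    rates = [MU_MIN * (1.0 + 0.5 * (t % 3)) for t in range(5000)]
    for n in (10, 100, 1000, 5000):
        d = distance_from_independence(rates[:n])
        bound = 0.75 * (1.0 - 4.0 * MU_MIN / 3.0) ** n
        print(f"{n:5d} generations: distance {d:.3e} <= bound {bound:.3e}")

    for aa, p in sorted(amino_acid_distribution().items(), key=lambda kv: -kv[1]):
        print(aa, round(p, 4))
```

The uniform limit is a consequence of the symmetric-channel assumption; a substitution model with unequal base preferences would have a different stationary distribution, and the same codon-to-amino-acid mapping would then give different predicted frequencies.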

Intellectual Merit

    The results derived from this work have the potential to dramatically impact our current understanding of evolution and of the role of the genetic information encoded in DNA. For example, from a communication engineering point of view, the so-called “junk DNA” may turn out to be just as important as the much sought-after genes. This research will allow us to better understand the genetic mechanisms of information transmission and to apply new models to bioinformatics tasks (e.g., genomic and proteomic database search and retrieval).

Poster presented at the Advisory Board Meeting at UIC

Related publications:

  1. L. Gong, N. Bouaynaya and D. Schonfeld, “Information-Theoretic Model of Evolution over Protein Communication Channel”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, accepted.

  2. N. Bouaynaya and D. Schonfeld, “Protein Communication System: Evolution and Genomic Structure”, Algorithmica, vol. 48, no. 4, pp. 375-397, August 2007.

  3. N. Bouaynaya and D. Schonfeld, “Analysis of Protein Evolution as a Communication System”, in IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS’06), College Station, TX, May 2006, pp. 23-24.

  4. N. Bouaynaya and D. Schonfeld, “Biological Evolution: Distribution and Convergence Analysis of Amino Acids”, in International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’06), New York, NY, August 2006, pp. 2045-2048.

  5. N. Bouaynaya and D. Schonfeld, “The Genomic Structure: Proof of the Role of Non-Coding DNA”, in International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’06), New York, NY, August 2006, pp. 4544-4547.

  6. L. Gong, N. Bouaynaya and D. Schonfeld, “Information-Theoretic Bounds of Evolutionary Processes Modeled as a Protein Communication System”, in IEEE Statistical Signal Processing Workshop, Madison, WI, August 2007, pp. 1-5. (Invited Paper)

  7. N. Bouaynaya and D. Schonfeld, “Non-Stationary Analysis of Genomic Sequences”, in IEEE Statistical Signal Processing Workshop, Madison, WI, August 2007, pp. 200-204.

  8. N. Bouaynaya and D. Schonfeld, “Non-Stationary Analysis of Coding and Non-Coding Regions in Nucleotide Sequences”, IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 3, June 2008.

  9. N. Bouaynaya and D. Schonfeld, “Emergence of New Structure from Non-Stationary Analysis of Genomic Sequences”, in IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS’08), Phoenix, AZ, June 2008.

  10. J. S. Zielinski, N. Bouaynaya, D. Schonfeld and W. O'Neill, “Time-Dependent ARMA Modeling of Genomic Sequences”, BMC Bioinformatics, vol. 9, Suppl. 9, 2008.

  11. N. Bouaynaya, D. Schonfeld and R. Nagarajan, “Analysis of Temporal Gene Expression Profiles Using Time-Dependent MUSIC Algorithm”, in IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS’09), Minneapolis, MN, May 2009.