Mathematical Analysis and Computational Integration of Massive Heterogeneous Data from the Human Retina

📝 Original Info

  • Title: Mathematical Analysis and Computational Integration of Massive Heterogeneous Data from the Human Retina
  • ArXiv ID: 1111.6631
  • Date: 2012-10-11
  • Authors: The original author list is not provided in the supplied excerpt.

📝 Abstract

Modern epidemiology integrates knowledge from heterogeneous collections of numerical, descriptive and imaging data. Large-scale epidemiological studies use sophisticated statistical analysis, mathematical models based on differential equations and versatile analytic tools that handle numerical data. In contrast, knowledge extraction from images and from descriptive information in the form of text and diagrams remains a challenge for most fields, in particular for diseases of the eye. In this article we provide a roadmap towards the extraction of knowledge from text and images, with a focus on forthcoming applications to the epidemiological investigation of retinal diseases, especially from existing massive heterogeneous collections of data distributed around the globe.

💡 Deep Analysis

Figure 1

📄 Full Content

In epidemiological studies of retinal diseases, one encounters the problem of extracting knowledge from heterogeneous collections of numerical, descriptive and imaging data. Large-scale epidemiological studies use sophisticated statistical analysis, differential equations and related versatile mathematical tools that were developed for numerical data. In contrast, knowledge extraction from retinal images and from descriptive information in the form of text and diagrams remains scarcely developed. This article provides a roadmap towards the epidemiology of eye diseases based on knowledge extraction from text, images and numerical data, and on heterogeneous data fusion. The first part of the article outlines knowledge extraction from massive text data. In the second part, we address a general technique for systematic knowledge extraction from very large retinal image collections. The mathematical tools and concepts are adapted from PDE-based image analysis, which relies on advanced numerical and symbolic computation libraries. The algorithms and the resulting code are designed specifically for very large, industrial-grade projects.
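To give a flavour of what PDE-based image analysis looks like in practice, here is a minimal sketch of one classical example, Perona-Malik anisotropic diffusion, discretized with an explicit finite-difference scheme in NumPy. The article does not say which equations its pipeline uses, so treating this particular PDE as representative is an assumption; the function names `perona_malik` and `_conductance` and the parameters `n_iter`, `kappa` and `dt` are hypothetical, and the input is assumed to be a grayscale image scaled to [0, 1].

```python
import numpy as np


def _conductance(d, kappa):
    # Perona-Malik edge-stopping function g(s) = exp(-(s / kappa)^2):
    # close to 1 in flat regions, close to 0 across strong intensity jumps.
    return np.exp(-(d / kappa) ** 2)


def perona_malik(image, n_iter=20, kappa=0.1, dt=0.2):
    """Explicit finite-difference Perona-Malik diffusion of a 2-D grayscale image.

    image: 2-D array, assumed scaled to [0, 1].
    kappa: edge threshold, in the same intensity units as the image.
    dt: time step; dt <= 0.25 keeps this four-neighbour scheme stable.
    """
    u = np.asarray(image, dtype=float).copy()
    for _ in range(n_iter):
        # Edge padding makes the differences vanish at the border, which
        # imposes zero-flux (Neumann) boundary conditions.
        p = np.pad(u, 1, mode="edge")
        d_n = p[:-2, 1:-1] - u  # north neighbour minus centre
        d_s = p[2:, 1:-1] - u   # south neighbour minus centre
        d_w = p[1:-1, :-2] - u  # west neighbour minus centre
        d_e = p[1:-1, 2:] - u   # east neighbour minus centre
        # Sum of conductance-weighted differences approximates div(g * grad u).
        flux = sum(_conductance(d, kappa) * d for d in (d_n, d_s, d_w, d_e))
        u = u + dt * flux
    return u
```

Because the conductance is nearly zero across large intensity jumps, sharp boundaries such as vessel walls are preserved while low-amplitude noise in flat background regions is smoothed. With `dt <= 0.25` each update is a convex combination of a pixel and its neighbours, so intensities stay within the original range and the total intensity is conserved.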

Bringing such sophisticated machinery to bear on retinal image data requires transforming the raw images into black and white (or grayscale). The simplified images are then preprocessed to delineate the anatomical details of the vasculature. Morphometric invariants of the brain structures encountered in such images are then extracted in the form of blood vessel diameters. A weighted graph captures the combinatorial organization of the vessels, their branching and notable curvilinear features; in addition, a branched tubular surface encodes the endothelial cells that line the inner wall of the blood vessels, as well as the whole microvessel structure. These data are organized hierarchically to separate distinct phenotypic traits, each of which arises at a particular scale and is represented at the appropriate resolution. Finally, the hierarchical graph that encodes combinatorial, quantitative and geometric phenotypic traits can be quantified for massive data analysis using graph-theoretic invariants, such as the spectrum of the Laplace operator and of other operators on graphs. In short, the image data yield data structures that quantify morphological traits according to hierarchical resolution and combinatorial features.
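The final step, summarizing a vessel graph by the spectrum of its Laplacian, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the article's implementation: nodes stand for vessel junctions and endpoints, each edge is a vessel segment weighted by a positive number such as its mean diameter, the graph has no self-loops or isolated nodes, and the toy tree and its diameters are hypothetical. The combinatorial Laplacian is L = D - W; the normalized Laplacian I - D^(-1/2) W D^(-1/2) is included as one example of the "other operators" the text mentions.

```python
import numpy as np


def weight_matrix(n_nodes, edges):
    """Symmetric weight matrix W from (i, j, w) triples with w > 0 and i != j."""
    W = np.zeros((n_nodes, n_nodes))
    for i, j, w in edges:
        W[i, j] += w
        W[j, i] += w
    return W


def laplacian_spectrum(n_nodes, edges):
    # L = D - W is real symmetric, so eigvalsh applies; eigenvalues come back ascending.
    W = weight_matrix(n_nodes, edges)
    return np.linalg.eigvalsh(np.diag(W.sum(axis=1)) - W)


def normalized_laplacian_spectrum(n_nodes, edges):
    # I - D^(-1/2) W D^(-1/2); requires every node to have positive degree.
    W = weight_matrix(n_nodes, edges)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(np.eye(n_nodes) - A)


def count_components(spectrum, tol=1e-9):
    # Multiplicity of the (numerically) zero eigenvalue = number of connected components.
    scale = max(1.0, float(np.max(np.abs(spectrum))))
    return int(np.sum(np.abs(spectrum) < tol * scale))


# Hypothetical toy vessel tree: root 0 -> bifurcation 1 -> branches 2 and 3,
# and node 3 bifurcates again into 4 and 5. Weights are made-up diameters (micrometres).
tree = [(0, 1, 120.0), (1, 2, 80.0), (1, 3, 75.0), (3, 4, 50.0), (3, 5, 45.0)]
spec = laplacian_spectrum(6, tree)
norm_spec = normalized_laplacian_spectrum(6, tree)
```

The zero eigenvalue of L appears once per connected component, and the normalized spectrum always lies in [0, 2] regardless of graph size or weight scale, which is one reason normalized spectra are convenient when comparing graphs extracted from many images.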

As a result of the exploding volume of scientific publications that appear each day, it is becoming increasingly intractable to read and absorb the knowledge produced by the scientific community in an integrated, meaningful and efficient manner. Among the main difficulties is making manuscripts and other written material, such as descriptive data about a disease or a patient's health status, usable by professionals in different fields who do not want to, and probably cannot, spend a lifetime reading every minor detail of all the disciplines involved in their interdisciplinary research. To use this knowledge and to communicate with experts in the field, one needs a general overview of the important entities and their relationships. For example, biological computation specialists need to understand the roles of different genes and their relationships in order to perform meaningful computations and analysis. Another example is to "understand" the notes and descriptions of patients' health written during physical examinations. Information extraction (IE) is among the most promising emerging techniques that, in the era of supercomputing and computational clouds, enable fast analysis of very large bodies of documents. IE specialists nowadays envision performing complex machine learning tasks on the whole Web [1] (the size of the indexed Web was estimated at about 46 billion pages as of November 2011 [2]). This ambitious goal has turned IE techniques into powerful tools that could help the academic community reduce the "time-of-flight" to useful and productive research activities.
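Building an overview of entities and their relationships can be illustrated with one of the simplest IE techniques: dictionary (lexicon) lookup to recognize entities, followed by counting which entities co-occur in the same sentence. This is a minimal pure-Python sketch, not the pipeline the article proposes; the lexicon, its entity types, the clinical-style note and all function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-lexicon mapping lower-case terms to entity types.
LEXICON = {
    "diabetic retinopathy": "DISEASE",
    "macular edema": "DISEASE",
    "hypertension": "DISEASE",
    "microaneurysm": "FINDING",
    "hemorrhage": "FINDING",
    "vegf": "GENE",
}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_entities(sentence, lexicon=LEXICON):
    """Return the set of (term, type) pairs whose term occurs as a whole word.

    Matching is case-insensitive and tolerates a plural 's'.
    """
    lowered = sentence.lower()
    return {
        (term, etype)
        for term, etype in lexicon.items()
        if re.search(r"\b" + re.escape(term) + r"s?\b", lowered)
    }


def cooccurrence_graph(text, lexicon=LEXICON):
    """Count how often each unordered pair of entities shares a sentence."""
    edges = Counter()
    for sentence in split_sentences(text):
        terms = sorted({term for term, _ in find_entities(sentence, lexicon)})
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges


note = (
    "Patient with hypertension and diabetic retinopathy. "
    "Fundus shows microaneurysms and a hemorrhage. "
    "Diabetic retinopathy with macular edema noted."
)
graph = cooccurrence_graph(note)
```

Sentence-level co-occurrence is only a weak proxy for a relationship: it ignores negation (for instance "no hemorrhage") and syntax, which is why practical IE systems layer further NLP methods, such as parsing and negation detection, on top of entity recognition.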
The goal of this phase is to incorporate various large-scale, high-throughput, flexible IE techniques to enhance our problem-solving skills and strengthen our communication with collaborating professionals in genomic sciences, health care and neuroscience, in order to establish fruitful research projects and to contribute to building a framework that translates the extensive knowledge (and wisdom) buried, sometimes for decades, inside scientific publications and descriptive healthcare records into a form that a large number of researchers and students can put into action. IE usually starts by digesting free text into a structured form using various natural language processing (NLP) methods [3]. These methods are built upon different sources of information, ranging from linguistic grammatical structures to statistical properties of various types of written and spoken

Reference

This content is AI-processed based on open access ArXiv data.
