Information Preserving Component Analysis: Data Projections for Flow Cytometry Analysis

📝 Original Info

  • Title: Information Preserving Component Analysis: Data Projections for Flow Cytometry Analysis
  • ArXiv ID: 0804.2848
  • Date: 2009-11-13
  • Authors: Kevin M. Carter, Raviv Raich, William G. Finn, Alfred O. Hero III

📝 Abstract

Flow cytometry is often used to characterize the malignant cells in leukemia and lymphoma patients, traced to the level of the individual cell. Typically, flow cytometric data analysis is performed through a series of 2-dimensional projections onto the axes of the data set. Through the years, clinicians have determined combinations of different fluorescent markers which generate relatively known expression patterns for specific subtypes of leukemia and lymphoma -- cancers of the hematopoietic system. By only viewing a series of 2-dimensional projections, the high-dimensional nature of the data is rarely exploited. In this paper we present a means of determining a low-dimensional projection which maintains the high-dimensional relationships (i.e. information) between differing oncological data sets. By using machine learning techniques, we allow clinicians to visualize data in a low dimension defined by a linear combination of all of the available markers, rather than just 2 at a time. This provides an aid in diagnosing similar forms of cancer, as well as a means for variable selection in exploratory flow cytometric research. We refer to our method as Information Preserving Component Analysis (IPCA).
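The contrast the abstract draws, between viewing two markers at a time and viewing a linear combination of all markers, can be illustrated with a small sketch. This is not the IPCA algorithm itself: the data are synthetic, the Kullback-Leibler divergence between fitted Gaussians is an assumed stand-in for the "information" between two data sets, and the names `gaussian_kl`, `projected_kl`, and `A_dense` are hypothetical. It only shows that a projection using all markers can retain more of the difference between two samples than a single axis pair.

```python
import numpy as np


def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) for full-rank covariance matrices."""
    k = mu0.shape[0]
    diff = mu1 - mu0
    cov1_inv = np.linalg.inv(cov1)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - k + logdet1 - logdet0
    )


def projected_kl(X, Y, A):
    """Project both samples with A (d x m), fit Gaussians, return their KL."""
    Xp, Yp = X @ A, Y @ A
    return gaussian_kl(
        Xp.mean(axis=0), np.cov(Xp, rowvar=False),
        Yp.mean(axis=0), np.cov(Yp, rowvar=False),
    )


# Synthetic stand-ins for two patients: 2000 "cells" with 4 "markers" each.
# The second sample is shifted by the same amount on every marker, so the
# difference between the samples is spread across all axes.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 4))
Y = rng.standard_normal((2000, 4)) + 1.0

# Axis-pair view: keep only markers 0 and 1.
A_axes = np.eye(4)[:, :2]

# Dense view: two orthonormal directions, the first spanning the all-ones
# direction, obtained from a reduced QR factorization.
M = np.column_stack([np.ones(4), np.array([1.0, -1.0, 0.0, 0.0])])
A_dense, _ = np.linalg.qr(M)

kl_full = projected_kl(X, Y, np.eye(4))
kl_axes = projected_kl(X, Y, A_axes)
kl_dense = projected_kl(X, Y, A_dense)

print(f"full space : {kl_full:.3f}")
print(f"axis pair  : {kl_axes:.3f}")
print(f"dense 2-D  : {kl_dense:.3f}")
```

In this toy setting neither 2-dimensional view can exceed the full-space divergence, and the dense projection stays closer to it than the axis pair does. Choosing the projection by optimization, as the paper proposes, is beyond this sketch.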

📄 Full Content

arXiv:0804.2848v1 [stat.ML] 17 Apr 2008

Information Preserving Component Analysis: Data Projections for Flow Cytometry Analysis

Kevin M. Carter¹, Raviv Raich², William G. Finn³, and Alfred O. Hero III¹

¹ Department of EECS, University of Michigan, Ann Arbor, MI 48109
² School of EECS, Oregon State University, Corvallis, OR 97331
³ Department of Pathology, University of Michigan, Ann Arbor, MI 48109
{kmcarter,wgfinn,hero}@umich.edu, raich@eecs.oregonstate.edu

Abstract

Flow cytometry is often used to characterize the malignant cells in leukemia and lymphoma patients, traced to the level of the individual cell. Typically, flow cytometric data analysis is performed through a series of 2-dimensional projections onto the axes of the data set. Through the years, clinicians have determined combinations of different fluorescent markers which generate relatively known expression patterns for specific subtypes of leukemia and lymphoma – cancers of the hematopoietic system. By only viewing a series of 2-dimensional projections, the high-dimensional nature of the data is rarely exploited. In this paper we present a means of determining a low-dimensional projection which maintains the high-dimensional relationships (i.e. information) between differing oncological data sets. By using machine learning techniques, we allow clinicians to visualize data in a low dimension defined by a linear combination of all of the available markers, rather than just 2 at a time. This provides an aid in diagnosing similar forms of cancer, as well as a means for variable selection in exploratory flow cytometric research. We refer to our method as Information Preserving Component Analysis (IPCA).

Index Terms

Flow cytometry, statistical manifold, information geometry, multivariate data analysis, dimensionality reduction, clustering

Acknowledgement: This work is partially funded by the National Science Foundation, grant No. CCR-0325571.

September 30, 2018 DRAFT

I. INTRODUCTION

Clinical flow cytometric data analysis usually involves the interpretation of data culled from sets (i.e. cancerous blood samples) which contain the simultaneous analysis of several measurements. This high-dimensional data set allows for the expression of different fluorescent markers, traced to the level of the single blood cell. Typically, diagnosis is determined by analyzing individual 2-dimensional scatter plots of the data, in which each point represents a unique blood cell and the axes signify the expression of different biomarkers. By viewing a series of these histograms, a clinician is able to determine a diagnosis for the patient through clinical experience of the manner in which certain leukemias and lymphomas express certain markers.

Given that the standard method of cytometric analysis involves projections onto the axes of the data (i.e. visualizing the scatter plot of a data set with respect to 2 specified markers), the multi-dimensional nature of the data is not fully exploited. As such, typical flow cytometric analysis is comparable to hierarchical clustering methods, in which data is segmented on an axis-by-axis basis. Marker combinations have been determined through years of clinical experience, leading to relative confidence in analysis given certain axes projections. These projection methods, however, contain the underlying assumption that marker combinations are independent of each other, and do not utilize the dependencies which may exist within the data. Ideally, clinicians would like to analyze the full-dimensional data, but this cannot be visualized outside of 3-dimensions.

There have been previous attempts at using machine learning to aid in flow cytometry diagnosis. Some have focused on clustering in the high-dimensional space [1], [2], while others have utilized information geometry to identify differences in sample subsets and between data sets [3], [4]. These methods have not satisfied the problem because they do not significantly approach the aspect of visualization for ‘human in the loop’ diagnosis, and the ones that do [5], [6] only apply dimensionality reduction to a single set at a time. The most relevant work, compared to what we are about to present, is that which we have recently presented [7] where we utilized information geometry to simultaneously embed each patient data set into the same low-dimensional space, representing each patient as a single vector. The current task differs in that we do not wish to reduce each set to a single point for comparative analysis, but to use dimensionality reduction as a means to individually study the distributions of each patient. As such, we aim to reduce the dimension of each patient data set while maintaining the number of data points (i.e. cells).

With input from the Department of Pathology at the University of Michigan, we have determined that the ideal form of dimensionality reduction for flow cytometric visualization would contain several properties. The data nee

…(Full text truncated)…
