This NIH-funded project studies the immunoregulatory properties of mesenchymal stem cells (MSC) during islet cell transplants. NCSA conducts Core C, which applies eScience approaches to large, already existing collections of data in order to make new discoveries from previously conducted research involving MSC. Medical research inherently suffers from what statisticians call the curse of dimensionality. Because it involves complex organisms such as ourselves, with untold numbers of interacting biological properties, and relies on lengthy and costly experiments that observe only a small number of these properties, researchers are faced with large, sparse collections of data that offer small glimpses into a very high-dimensional feature space. Keeping up with the growing body of published work that results, which is necessary to make any discoveries, is becoming difficult. Further, published works often consider only a small portion of the available data (e.g., the data gathered solely by the authors). Because of the nonlinear nature of the data, considering data from multiple studies at once may very well lead to new discoveries.
In this work we address the need to index large, diverse collections of data scattered across a variety of formats, from spreadsheet files to databases to PDFs of published articles. Our goals include providing uniform access to information within these heterogeneous sources, providing a host of user-friendly visualization and data mining tools that allow medical researchers to explore the data, and exploring means of robustly incorporating information contained within unstructured data sources such as microscopy images.
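As a rough illustration of what uniform access to heterogeneous sources could look like, the sketch below normalizes records from a spreadsheet-style CSV export and a database-style JSON export into one common shape and builds a simple field index over them. It is not the project's actual implementation: the function names, the field names, and the sample values are all hypothetical placeholders chosen for the example.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]

# Invented placeholder data standing in for two differently formatted sources.
CSV_TEXT = "Sample_ID,Marker,Value\ns1,m1,0.5\ns2,m2,0.7\n"
JSON_TEXT = '[{"sample_id": "s3", "marker": "m1", "dose": 2}]'


def read_csv_records(text: str, source: str) -> Iterator[Record]:
    """Yield normalized records from CSV text (hypothetical helper)."""
    for row in csv.DictReader(io.StringIO(text)):
        # Lowercase the field names so both sources share one vocabulary.
        rec = {k.strip().lower(): v.strip() for k, v in row.items()}
        rec["source"] = source
        yield rec


def read_json_records(text: str, source: str) -> Iterator[Record]:
    """Yield normalized records from a JSON array of objects (hypothetical helper)."""
    for obj in json.loads(text):
        rec = {str(k).strip().lower(): str(v).strip() for k, v in obj.items()}
        rec["source"] = source
        yield rec


def build_index(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Map each field name to the records that report a value for it."""
    index: Dict[str, List[Record]] = {}
    for rec in records:
        for field in rec:
            if field != "source":
                index.setdefault(field, []).append(rec)
    return index


records = list(read_csv_records(CSV_TEXT, "spreadsheet"))
records += list(read_json_records(JSON_TEXT, "database"))
index = build_index(records)

# Every record that reports a "marker", regardless of the original format.
for rec in index["marker"]:
    print(rec["source"], rec["sample_id"], rec["marker"])
```

Because most fields are observed in only some sources, an index of this kind is sparse by construction, which mirrors the sparsity described above. Sources such as PDFs or microscopy images would need their own extraction step before they could feed the same record shape.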