Main project page - Digging into image data, NCSA at the University of Illinois at Urbana-Champaign

About

Digging into Image Data to Answer Authorship-Related Questions (DID-ARQ)

Digging into Image Data to Answer Authorship-Related Questions (DID-ARQ) explores authorship in the visual arts through computational image analysis.
Authorship Overview: In the past, authorship has been explored in terms of attributions, typically of either individual masterpieces or small collections of art from the same period, location, or school. Due to these localized strategies of exploration and research, commonalities and shared characteristics are largely unexplored. In fact, it is rare to find discussions beyond a single discrete dataset. More significantly, to our knowledge, there have to date been no studies of image analyses targeting the problem of authorship applied to very large collections of images and evaluated in terms of accuracy over diverse datasets.

Addressing Authorship: DID-ARQ investigates the accuracy and computational scalability of adaptive image analyses when they are applied to diverse collections of image data. While identifying distinct characteristics of artists is time-consuming for individual researchers using traditional methodologies, computer-assisted techniques can help humanists discover salient characteristics and increase the reliability of those findings over a large-volume corpus of digitized images. Computer-assisted techniques can provide an initial bridge from the low-level image units, such as color of pixels, to higher-level semantic concepts such as brush strokes, compositions or patterns.
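As a rough illustration of the lowest rung of that bridge, the sketch below turns raw pixel colors into a normalized color histogram, one of the simplest image features. It is an assumption made for illustration and not the project's actual feature set: the function name `color_histogram`, the bin count, and the toy pixel values are all hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized histogram over quantized (r, g, b) colors.

    `pixels` is an iterable of (r, g, b) tuples with 0-255 channel values.
    """
    step = 256 // bins_per_channel
    # Quantize each channel so that similar colors fall into the same bin.
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Divide by the pixel count so images of different sizes are comparable.
    return {color_bin: n / total for color_bin, n in counts.items()}


# Toy "image" (hypothetical data): three reddish pixels and one bluish pixel.
toy_pixels = [(250, 10, 10), (240, 20, 5), (255, 0, 0), (10, 10, 250)]
histogram = color_histogram(toy_pixels)
```

A histogram like this discards spatial layout, so it could at most serve as one input among many when working toward concepts such as brush strokes or composition.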

This effort will utilize three datasets of visual works (15th-century manuscripts, 17th- and 18th-century maps, and 19th- and 20th-century quilts) to investigate what might be revealed about the authors and their artistic lineages by comparing manuscripts, maps, and quilts across four centuries.

Examples from the three image datasets: a fifteenth-century manuscript, seventeenth- and eighteenth-century maps, and quilts from the last two hundred years.

Problem Description: Based on the artistic, scientific or technological questions, DID-ARQ intends to formulate and address the problem of finding salient characteristics of artists from two-dimensional (2D) images of historical artifacts. Given a set of 2D images of historical artifacts with known authors, our project teams aim to discover what salient characteristics make an artist different from others, and then to enable statistical learning about individual and collective authorship.

The objective of this effort is to learn what is unique about the style of each artist, and to provide the results at a much higher level of confidence than previously has been feasible by exploring a large search space in the semantic gap of image understanding. As such, we would like to:
(a) design image analysis algorithms that will extract salient image features, group images based on similarity of these features, classify groups according to a priori knowledge, and optimize algorithmic steps and parameters;
(b) apply the algorithms jointly developed to the three collections of images;
(c) report accuracy and computational requirements over all of the image collections.
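Steps (a) and (c) can be sketched in miniature as follows. This is a minimal sketch under stated assumptions, not the algorithms the project teams developed: it assumes features have already been extracted as numeric vectors, swaps in a simple nearest-centroid rule for the grouping and classification steps, and reports leave-one-out accuracy. The function names, feature values, and author labels are hypothetical.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def nearest_centroid_label(feature, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))


def leave_one_out_accuracy(samples):
    """Hold out each (feature, author) pair in turn and predict its author."""
    correct = 0
    for i, (feature, author) in enumerate(samples):
        # Group the remaining samples by their known author.
        by_author = defaultdict(list)
        for j, (other_feature, other_author) in enumerate(samples):
            if j != i:
                by_author[other_author].append(other_feature)
        centroids = {a: centroid(v) for a, v in by_author.items()}
        if nearest_centroid_label(feature, centroids) == author:
            correct += 1
    return correct / len(samples)


# Hypothetical 2-D features for two hypothetical authors.
samples = [
    ([0.90, 0.10], "author_a"),
    ([0.80, 0.20], "author_a"),
    ([0.85, 0.15], "author_a"),
    ([0.10, 0.90], "author_b"),
    ([0.20, 0.80], "author_b"),
    ([0.15, 0.85], "author_b"),
]
accuracy = leave_one_out_accuracy(samples)
```

Holding out each image before predicting its author keeps the accuracy estimate from being inflated by the classifier having seen the test image during training.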

Acknowledgments

The project is supported by the National Science Foundation (NSF) and the National Endowment for the Humanities (NEH) from the United States, the Joint Information Systems Committee (JISC) from the United Kingdom, and the Social Sciences and Humanities Research Council (SSHRC) from Canada via a Digging into Data Challenge Grant Award. The material presented on this web page is based upon work supported by the National Science Foundation under Grant No. 10-39385.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Digging Into Data Grant Announcement

Project description

Open research problems were divided into artistic, scientific and technological questions based on the specific datasets that elicit those questions. DID-ARQ expects that these questions will be useful across the work of all three groups.

Artistic questions: Artistic questions include not only where and by whom the artifacts were created, but also what characteristics distinguish individual artists and groups of artists (e.g., manuscript illuminators, map makers and engravers, quilt-makers). How do the artifacts reflect artistic styles and the tastes of the particular regions and historical moments to which they belong?
Scientific questions: Scientific questions are more dataset-specific. For medieval chronicles the questions would likely include: What was the impact of the Hundred Years' War (1337-1453 C.E.) on culture as measured by the various aspects of these manuscripts? How do they reflect contacts between the cultures of France and England? For maps the questions would likely explore detailed geographical and/or climatological knowledge in representations of coastlines, rivers, and mountain passes that indicate potential routes for exploration and trade. Scientific questions about quilts would likely include: Can the quilts created by certain quilt-makers be differentiated from those of other communities? Can changes be found through changes in quilt-making styles? Can a resurgence of interest in a particular historic cultural community's quilt-making styles be found in quilt-making a century later? To what extent are quilts made by rural quilters similar or dissimilar to those made by urban quilters in the same time period? Does this change over time?
Technological questions: Technological questions are related to the design of algorithms that can extract evidence at the low-level image units that could be aggregated into higher-level semantic concepts and support humanists in image understanding and authorship assignment. This would include considerations of the statistical confidence of authorship hypotheses obtained by processing volumes of images that could not have been visually inspected with the current human resources within a reasonable time frame.
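One way to make "statistical confidence" concrete is to attach an interval to a measured attribution accuracy. The sketch below computes a Wilson score interval for a binomial proportion; it is an illustrative assumption, not a method the project states it uses, and the counts are hypothetical. It shows why processing larger volumes of images matters: the same observed accuracy yields a much narrower interval when it rests on more images.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return center - half_width, center + half_width


# Hypothetical counts: the same 90% observed accuracy on 50 vs. 5000 images.
low_small, high_small = wilson_interval(45, 50)
low_large, high_large = wilson_interval(4500, 5000)
```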

Technologies used in the DID project

Multimedia content management system (called Medici): The system developed by NCSA provides a place for web-based sharing of test data across multiple sites. Its functionality includes drag-and-drop upload, automated metadata extraction, collection creation, tagging and annotation, preview and large-size image display, metadata-based search, overlay on Google Maps when latitude and longitude metadata are embedded in files, and other features.

Image To Learn (Im2Learn): This is a software library of image analysis tools that assist in solving real-life problems in the application areas of machine vision, precision farming, land use and land cover classification, map analysis, geo-spatial information systems (GIS), bio-informatics, microscopy and medical image processing, and advanced sensor environments. The library provides basic functionality for analyzing historical maps, photographs of quilts, and illustrations in historical manuscripts.

Versus: This is an application programming interface (API) to incorporate methods for comparing digital objects. By implementing the Versus API, multiple comparison methods can be applied to various images and explored using high performance computing resources.
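The idea of plugging several comparison methods into one common interface can be sketched as below. This is not the Versus API itself: the names `Measure`, `MEASURES`, and `compare_all`, and the two example measures, are hypothetical stand-ins chosen only to show how interchangeable comparison methods might be applied to the same pair of feature vectors.

```python
import math
from typing import Callable, Dict, Sequence

Feature = Sequence[float]
# Hypothetical contract: a measure maps two feature vectors to a distance.
Measure = Callable[[Feature, Feature], float]


def euclidean(a: Feature, b: Feature) -> float:
    """Straight-line distance between two feature vectors."""
    return math.dist(a, b)


def histogram_intersection_distance(a: Feature, b: Feature) -> float:
    """1 minus the overlap of two normalized histograms (0 means identical)."""
    return 1.0 - sum(min(x, y) for x, y in zip(a, b))


# Registry of interchangeable comparison methods (illustrative only).
MEASURES: Dict[str, Measure] = {
    "euclidean": euclidean,
    "histogram_intersection": histogram_intersection_distance,
}


def compare_all(a: Feature, b: Feature, measures: Dict[str, Measure] = MEASURES):
    """Apply every registered measure to the same pair of features."""
    return {name: measure(a, b) for name, measure in measures.items()}


# Hypothetical normalized histograms for two images.
hist_a = [0.5, 0.5, 0.0]
hist_b = [0.5, 0.25, 0.25]
scores = compare_all(hist_a, hist_b)
```

Because each measure shares the same call signature, adding a new comparison method only requires registering one more function, and independent pairs of images could be dispatched to separate workers.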

