Polyglot

Repository: https://opensource.ncsa.illinois.edu/stash/scm/pol/polyglot.git
Documentation: Web | PDF | Javadoc
Bug Reporting: Jira
License: NCSA Open Source
Status: Active

Polyglot is a distributed service that carries out file format conversions using the open, save, import, and export capabilities of a dynamic and extensible collection of available software. It relies on a tool called a Software Server to program against functionality within arbitrary 3rd party software. Polyglot addresses the need to access content across the many formats used to store data digitally. It also addresses the information loss that inevitably occurs through conversion, providing a means of quantifying that loss and then minimizing it during future conversions.

Details

NCSA Polyglot (n. one who speaks many languages) was created to provide an extensible, scalable, and quantifiable conversion service. It was motivated by the need to identify which 3D file formats are best suited for long term digital preservation, given the large number of formats available. In our work we consider a file format well suited for long term preservation if it is open, widely supported, and incurs little information loss when files in many other formats are converted to it (an essential requirement given the existing collections of files spread across many formats). With suitable measures we can estimate information loss by comparing files before and after a conversion. To actually identify an ideal format for long term preservation, however, we need to be able to evaluate conversions between potentially all available formats. In other words, we require a "universal" converter.
Building a "universal" converter by directly supporting each of the available formats requires either implementing a file loader for each format (to extract its content independent of the file type) or implementing transcoders that convert directly between pairs of formats. Both tasks are arduous, if not impossible, given the number of formats available, and they are made even more difficult by the fact that many formats are proprietary with closed specifications.
To build a practical "universal" converter we take a different approach. Vendors of proprietary formats support their formats within their own software, and these applications generally also import and export some subset of other file formats. We use this built-in support across many software packages to build a conversion service. At the heart of Polyglot are software servers, which allow arbitrary software to be placed in the "cloud" under a uniform API.
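Because every application sits behind the same interface, the service can ask each one the same question: which formats can it read (open or import) and which can it write (save or export)? The Java sketch below assumes a hypothetical AppCapabilities record, not Polyglot's actual API, and shows how those answers could be turned into conversion edges, one for each readable/writable format pair.

```java
import java.util.*;

// Hypothetical: the capabilities a software server might report for one wrapped application.
record AppCapabilities(String app, Set<String> open, Set<String> imports,
                       Set<String> save, Set<String> exports) {

    // One I/O-graph edge: application `app` can turn a `from` file into a `to` file.
    record Edge(String from, String to, String app) {}

    // Readable formats (open + import) crossed with writable formats (save + export).
    List<Edge> edges() {
        Set<String> readable = new TreeSet<>(open);
        readable.addAll(imports);
        Set<String> writable = new TreeSet<>(save);
        writable.addAll(exports);
        List<Edge> result = new ArrayList<>();
        for (String from : readable)
            for (String to : writable)
                if (!from.equals(to)) // skip trivial self-conversions
                    result.add(new Edge(from, to, app));
        return result;
    }
}
```

For a hypothetical application that opens and saves stp, imports igs, and exports obj, edges() yields igs→obj, igs→stp, and stp→obj.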
The Polyglot service, focused on conversions, uses the "open", "save", "import", and "export" operations provided by a collection of distributed software servers. From these operations an input/output graph is constructed, with formats at its vertices and, as its edges, conversions from an input format to an output format using a particular piece of software. To perform a conversion between a given input and output format, we search this graph for a shortest path between the two formats, which identifies the applications capable of performing the conversion, and then call the corresponding software server operations to carry it out.
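The search step can be sketched as a breadth-first search, which finds a path with the fewest conversions when every edge counts equally. This is a minimal illustration under that assumption: the IoGraph class, the Conversion record, and the application names in the usage note are hypothetical, not Polyglot's actual code, and Polyglot's own search may weigh edges differently.

```java
import java.util.*;

// Hypothetical I/O-graph edge: application `app` converts a `from` file into a `to` file.
record Conversion(String from, String to, String app) {}

// Hypothetical I/O graph: formats are vertices, conversions are directed edges.
final class IoGraph {
    private final Map<String, List<Conversion>> edgesFrom = new HashMap<>();

    void add(String from, String to, String app) {
        edgesFrom.computeIfAbsent(from, k -> new ArrayList<>()).add(new Conversion(from, to, app));
    }

    // Breadth-first search: formats are discovered in order of conversion count,
    // so the first route found to `target` uses the fewest conversions.
    Optional<List<Conversion>> shortestPath(String source, String target) {
        Map<String, Conversion> reachedBy = new HashMap<>();
        Set<String> seen = new HashSet<>(List.of(source));
        Deque<String> queue = new ArrayDeque<>(List.of(source));
        while (!queue.isEmpty()) {
            String fmt = queue.poll();
            if (fmt.equals(target)) break;
            for (Conversion c : edgesFrom.getOrDefault(fmt, List.of())) {
                if (seen.add(c.to())) { // first visit to this format
                    reachedBy.put(c.to(), c);
                    queue.add(c.to());
                }
            }
        }
        if (!seen.contains(target)) return Optional.empty(); // no chain of applications exists
        // Walk back from the target to rebuild the chain of conversions.
        LinkedList<Conversion> path = new LinkedList<>();
        for (String f = target; !f.equals(source); f = reachedBy.get(f).from())
            path.addFirst(reachedBy.get(f));
        return Optional.of(path);
    }
}
```

With hypothetical edges stp→obj (AppA) and obj→lwo (AppB), shortestPath("stp", "lwo") returns those two steps in order, and each step names the application whose software server would be called.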

Source code is available from our git repository at https://opensource.ncsa.illinois.edu/stash/scm/pol/polyglot.git and can be checked out as follows:

git clone https://opensource.ncsa.illinois.edu/stash/scm/pol/polyglot.git
If you wish to contribute code to the project please contact kmchenry@ncsa.illinois.edu.



An I/O-Graph, with vertices representing a number of file formats and edges representing a conversion between a source and target format. The highlighted edges indicate a conversion path between the *.stp and *.lwo file formats given the 3rd party applications represented within the graph.
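The introduction notes that Polyglot quantifies conversion loss so that it can be minimized in later conversions. One way to use such measurements, sketched here as an assumption rather than a description of Polyglot's implementation, is to give each edge a non-negative loss cost, assume costs add along a path, and run Dijkstra's algorithm instead of an unweighted search. The LossAwarePlanner class, its records, and all loss values are hypothetical.

```java
import java.util.*;

// Hypothetical I/O-graph edge carrying an estimated, non-negative information-loss cost.
record LossyConversion(String from, String to, String app, double loss) {}

final class LossAwarePlanner {
    private record QueueItem(String format, double cost) {}

    private final Map<String, List<LossyConversion>> edgesFrom = new HashMap<>();

    void add(String from, String to, String app, double loss) {
        if (loss < 0) throw new IllegalArgumentException("Dijkstra requires non-negative costs");
        edgesFrom.computeIfAbsent(from, k -> new ArrayList<>())
                 .add(new LossyConversion(from, to, app, loss));
    }

    // Dijkstra's algorithm: finds the chain of conversions with the smallest summed loss.
    Optional<List<LossyConversion>> leastLossPath(String source, String target) {
        Map<String, Double> best = new HashMap<>(Map.of(source, 0.0));
        Map<String, LossyConversion> reachedBy = new HashMap<>();
        PriorityQueue<QueueItem> queue =
            new PriorityQueue<>(Comparator.comparingDouble(QueueItem::cost));
        queue.add(new QueueItem(source, 0.0));
        while (!queue.isEmpty()) {
            QueueItem item = queue.poll();
            if (item.cost() > best.get(item.format())) continue; // stale queue entry
            if (item.format().equals(target)) break;            // cheapest cost is now final
            for (LossyConversion c : edgesFrom.getOrDefault(item.format(), List.of())) {
                double cost = item.cost() + c.loss();
                if (cost < best.getOrDefault(c.to(), Double.POSITIVE_INFINITY)) {
                    best.put(c.to(), cost);
                    reachedBy.put(c.to(), c);
                    queue.add(new QueueItem(c.to(), cost));
                }
            }
        }
        if (!best.containsKey(target)) return Optional.empty(); // target unreachable
        // Walk back from the target to rebuild the chain of conversions.
        LinkedList<LossyConversion> path = new LinkedList<>();
        for (String f = target; !f.equals(source); f = reachedBy.get(f).from())
            path.addFirst(reachedBy.get(f));
        return Optional.of(path);
    }
}
```

With made-up costs, stp→obj (0.5) then obj→lwo (0.4) totals 0.9, while the three-step route stp→igs→dxf→lwo at 0.1 per step totals about 0.3, so the planner prefers the longer but less lossy chain. Whether losses really combine additively depends on the measure used.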




Videos

Overview: an overview of the prototype Polyglot conversion system, which is extensible through 3rd party software.
Converting Files: using the open/save capabilities of GUI-driven 3rd party software to create an extensible conversion service.
Previewing Files
Listening for Software
Monitoring Software

Publications

McHenry K, Ondrejcek M, Marini L, Kooper R, Bajcsy P. Towards a Universal Viewer for Digital Content. In: International Conference on Computer Science, Executable Paper Workshop; 2011.
Bajcsy P, Kooper R, Marini L, McHenry K, Ondrejcek M. A Framework for Understanding File Format Conversions. In: ACM ICPS US Workshop on roadmap for Digital Preservation Interoperability Framework; 2011.
McHenry K, Kooper R, Marini L, Bajcsy P. Designing a Scalable Cross Platform Imposed Code Reuse Framework. In: Microsoft Research eScience Workshop, Berkeley, CA; 2010.
