Vignette A — Ordering books by color and size

1. A patron at a library sought help finding a book. “It had a purple cover,” she said. “It was about so big,” she continued, framing the size of the book with her hands. “About an inch thick, too.”

2. Strand Book Store (New York, NY) (http://www.strandbooks.com/books-by-the-foot):

Book by the foot collections can be made to order based on color, binding, material, size, and height to match your specific style and home decor. We can supply book by the foot collections on any subject or genre, including art, biography, literature, New York, history, music, film, etc. Our books by the foot collections are ideal for any book collector, decorator, or set designer. Our experienced, designated staff is more than happy to work with you to build a unique books by the foot collection that meets all [your] needs!

3. Etsy vendor sorrythankyou79 offers a “books by color bundle.” A white collection includes Where Have I Been? by Sid Caesar, Allergy Cooking by Marion Conrad, The Power Eaters by Diana Davenport, and Malice in Blunderland by Thomas L. Martin.

Vignette B — From a “certain Chinese encyclopedia” (Jorge Luis Borges)

Animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.


Data and orders

Joel Herndon and I have had many stimulating conversations over the years, usually about matters of data management. We tend to the practical end of things, exchanging views on how to encourage better “data management practices” or frame more compelling and executable “data management plans” for grant proposals. Occasionally, we venture into more theoretical territory: the underpinnings of data, or their more philosophical “status.” He’s a librarian, rigorously trained in the social sciences. I’m more diffusely trained in the grab-bag known as intellectual history and literature, with a decent bit of science thrown in.

Our conversations have betrayed a certain unease that arises from a contradiction we sense about data. On the one hand, our managerial concerns push us toward processes and “ways of handling” data that pay little heed to what the data themselves depict or characterize. On the other, we assume that data mean something on their own and that our “ways of handling” them don’t taint their value. Indeed, we hope that our work makes data more useful. Our unease? We both suspect that our processes for “handling” data do in fact taint or influence them, which I chalk up to an inevitable and all-too-human failing.

The two vignettes depict ways of ordering that clash with our sensibilities, placing books or animals into frameworks that are foreign to our habits of thought. (Michel Foucault unpacked the “Chinese encyclopedia” classification quite adeptly in The Order of Things.) But the affront of ordering needn’t be as jarring as the clueless library patron’s color code, the Strand’s books-as-furniture service, or Borges’ categories of animals. Our scholarly orderings of things, so necessary for the rediscovery of knowledge and materials, also influence and constrain the things placed into their “containers.” Nothing’s better than a scholarly box to contain unruly data.

Ontologies

Which brings us to ontologies. Ontologies name the entities of a discipline, and naming them can also mean defining their relationships and the processes they take part in. The promise of ontologies lies in the standardization of naming, so that people in a discipline have a means of communicating and characterizing new knowledge using terms that are generally useful and clearly defined.

This is, of course, an enormously enticing prospect, since ontologies simplify communication. In a still more hopeful view (some hold it), an ontology might even give rise to knowledge, or at least make the muck obscuring discovery less opaque by limiting complexity and imposing order.

“Standards” by xkcd.com (CC-BY-NC license)

I once held that hopeful view, probably more fervently than most of my peers. But a few problems presented themselves:

Ontologies are profoundly difficult things. And as with many profoundly difficult things, people disagree about an ontology’s innards where there should be concord and unity. The communities that ontologies might unite are, like the builders of the Tower of Babel, divided by their languages. The result? Competing standards and ontologies.

Ontologies simplify, and they do so in spite of the complexity of reality. As a consequence, the language of ontology tends to misrepresent by design. This makes ontologies poor engines of discovery, at least in and of themselves (and discovery was my hope and dream for them in bioinformatics).

Ontologies look back, not forward. At best, they describe things already known and thought to be understood. Sure, they can be used to mine structure in corpora (of publications, for example) that may reveal latent connections. But we can’t be assured that ontologies themselves faithfully represent the structure and shape of Reality-with-a-Capital-R.

Borges’ fictional note applies: “If there is a universe, its aim is not conjectured yet,” he wrote in “The Analytical Language of John Wilkins.” “We have not yet conjectured the words, the definitions, the etymologies, the synonyms, from the secret dictionary of God.”

You might think I’m anti-ontology, but you’d be wrong. I think they’re enormously useful.

How do we wash out this damned spot?

So, is it possible to make data speak — without hearing our own inflections and accents? How do we remove the taint?

For me, the problem is a matter of expectation. My vain hope was that an ontology represented the reality of things, of the objects of study, and that refining an ontology in effect meant empowering a datum with an enriching and Real context. (I give myself a little slack; this is an easy hope to embrace, since the term ontology comes from philosophy and relates to “the nature of being.”) Now, I think of ontologies and controlled vocabularies as representing a community’s understanding of its objects of study. The shift of focus is important, because a community’s understanding can be assumed to be flawed. Framed that way, an ontology should shift to accommodate the new, since it ends up being a representation of a current language and a current set of shared understandings, not some abiding or essential (“ontological”!) standard. This mature view is more in line with Thomas Gruber’s classic definition of an ontology in information science: “an explicit specification of a conceptualization.”

Competing ontologies, like xkcd’s standards, arise from the jostling and adjustment within scholarly communities. It could be that all the discussion of ontologies in, say, bioinformatics is in part a sign of the discipline’s health, hopefulness, and energy: a sign that intellects are struggling to put discoveries into a broad scholarly context with an eye to expanding understanding and engaging a whole community.

Data management uses classifications and tools like ontologies to organize data. These tools place data into a relationship with a purpose or intent (hence, perhaps, the “taint” that Joel and I worry about). The objects are organized, anyone who knows the organizational structure can find objects within it, and the objects have meaning within that context.

You use ontologies because they are useful, not necessarily because they are true. (They are probably — even most assuredly — not true.)

Which brings me to the picture at the top of this little post. It’s part of a screenshot from Etsy.com, an offering of “ocean teal” books from sorrythankyou79. Of course, I look at the titles and behold one happily called The Design of Life, though jarringly near 1001 Questions Answered About Trees. Yet titles are not the point; subjects are not the point. Color is, for the books are colored objects: materials for decor.

Thinking of books this way strikes me as a bit of an effrontery. But sorrythankyou79 organizes wares by a different means, one that fits another purpose and intent.

Mark R DeLong, PhD (mark.delong@duke.edu)


This is the first of an occasional series of posts relating to data management.


Jorge Luis Borges. “The Analytical Language of John Wilkins.”

Michel Foucault. The Order of Things. London and New York: Routledge, 1989.

Thomas R. Gruber. “A Translation Approach to Portable Ontology Specifications.” Knowledge Systems Laboratory, Technical Report KSL 92-71, September 1992. Appeared in Knowledge Acquisition, 5(2):199-220, 1993. http://tomgruber.org/writing/ontolingua-kaj-1993.pdf. Downloaded 27 May 2015.

Mireille Silcoff. “On Their Death Bed, Physical Books Have Finally Become Sexy.” The New York Times Magazine. 25 April 2014. http://www.nytimes.com/2014/04/27/magazine/on-their-death-bed-physical-books-have-finally-become-sexy.html. Downloaded 26 May 2015.