Data is the history of the modern world

September 02, 2018

Keynote Speaker Daniel Rosenberg is Professor of History at the University of Oregon. He writes on a wide range of topics related to history, epistemology, language, and visual culture. Rosenberg has held fellowships from a range of institutions including the National Endowment for the Humanities, the Max Planck Institute for the History of Science, the Stanford Humanities Center, and the American Academy in Berlin. In this interview we talk about the history of data, the impact of the internet on the nature of data, and TimeOnline, a project by Daniel dedicated to reverse engineering infographic artifacts from the print era.

What can computer scientists learn from the history of data?

We can think about the history of data in at least two ways: as the history of a problem and as the history of an idea. Evidently, the two things are related, but they are not identical, and their changing relationship is at the heart of my research. Often in history, people encounter intellectual, cultural, and scientific problems without having a self-conscious language for dealing with them. And then, for any number of reasons, new terms for these problems are articulated. In many cases, the act of naming reflects the emergence of a new approach to the problem. In some cases, naming itself is an important step in developing a new approach. One might think, for example, of the word society, which was new in the eighteenth century, which helped to crystallize philosophical and political thought around forms of human organization, and which at the same time reflected the tremendous interest of the Enlightenment philosophers in these same phenomena.

So too with data. The history of the problem of data is very old. The story of the systematic production, aggregation, quantification, classification, manipulation, storage, management, and communication of information stretches back to Babylon and further. Indeed, the oldest artifacts of writing are not stories of gods or heroes; they are inventories, receipts, and tax rolls: the very sorts of records that we unreflectively refer to as data today. The title of the Book of Numbers in the Hebrew scripture refers to a census. Early empires around the world were supported by sophisticated systems of information gathering and processing. Ships docking at the ancient port of Alexandria were required to surrender any written texts they were carrying so that they could be copied for the great library of that city. I could go on. Archaeologists have uncovered animal bones inscribed with lunar observations made more than thirty thousand years ago. A lot of what we think of as modern in our data practices is, in fact, extraordinarily old. Not just a little old, like a couple of millennia, but as old as writing.

For computer scientists who don't want to reinvent the wheel (or the algorithm), I'm guessing there's a thing or two to be learned from this history. Before electronic computing and the other mechanical aids to data processing that are ubiquitous today, brute-force solutions to computational and informational problems were mostly inaccessible: elegance was everything.

So, that's one dimension of the question that I think may be interesting to computer scientists. A second dimension has to do with language and with the concept or category of data itself. Prior to the modern period, when one spoke of what we now call data, one spoke instead of numbers, enumerations, records, observations, facts—all manner of things, depending on context—but the covering term data was not used. In contrast to the very long history of data practices, the history of the category data is only about three or four hundred years long. Attuned as they are to the power of ontologies, I think that computer scientists are likely to grasp the significance of the articulation of the category data intuitively.

In effect, the thinging of data is a development of the eighteenth century. Knowing this helps us, in part, because it directs our attention to a number of phenomena that drove the emergence of the early concept. These include the emergence of large state bureaucracies, global financial instruments, and quantitative science. It also helps us see how ideas about data informed all of those institutions and practices. It helps us perceive at once the power of the category data to unify disparate kinds of objects under a single analytic, as well as the hardiness of the old practices of record making and information processing that proceeded for millennia without a unifying concept. Finally, this kind of nominalist approach sensitizes us to the constitutive power of category in and of itself. It helps us see more clearly that data does not name a kind of object, but rather a perspective that one brings to informational objects.

How has the nature of data changed since the emergence of the internet, the increasing availability of processing power, and the semantic interpretability of information? Evolution or revolution?

I’m a skeptic about the notion that there is any such thing as a nature to data. Data is a rhetorical category. The very same stuff (numbers, reports, what have you) may be treated as data or as something else, and the difference matters. In some contexts, the color of my eyes is a physical attribute. When I pass through border control entering the EU from the United States, it is an aspect of my personal data.

Interestingly, the rhetorical character of data was an important aspect of the concept from the very beginning. Without going into too much detail (you’ll have to attend my talk for that), I will just point out that from the outset the term data was employed to draw a distinction between things and representations. Etymologically, data simply means givens. That’s the meaning of the term in Latin, and in Latin, data can refer to anything that is given. In the Latin Vulgate version of Exodus, for example, data refers to a very wide range of things given, from gold, silver, brides, and fields to rights, glory, honor, and counsel. In early modern vernaculars, the term data often evokes the usage derived from Euclid’s work of the same name. In Euclid, data (dedomena in Greek) are the givens in a mathematical problem. One might, for example, give some of the angles and some of the lengths of a geometrical figure, and then ask the reader to solve for another. The former would be the data (the givens) and the latter the quaesita (the unknowns) in the problem. The power of the category data here is that it frees writers from all concern over whether or not such data as described by Euclid have any reference in the world, allowing them to focus on the analysis and manipulation of the data in and of themselves.

From a historical point of view, what happens next is a fascinating twist: scholars in the eighteenth century increasingly employed the category data in order to talk about measurements, observations, and the like. This ran counter to the tradition of Euclid, but it made use of the same conceptual dynamic. When they treated collected information as data, eighteenth-century scholars gave themselves permission to move straight to analysis while bracketing the problem of the veracity of the information with which they were working. There is a powerful and instructive irony here: while the term data was used in mathematics (and notably also in theology, which is an interesting story in itself) in order to delineate a conceptual territory apart from the world of things, the broad cultural importance of the concept comes from its use in empirical pursuits, from its use in pursuits in which, eventually, the question of observational truth and reference really does matter. And yet, just as in mathematics, in empirical science, government record making, statistics, and so forth, one of the most important implications of the data concept is the production of a brand new kind of fact: a fact of representation that has its own reality and efficacy regardless of its verisimilitude. A Twitter trend is a real and powerful social fact regardless of what it may be said to actually represent.

To get back to your initial question: how has the nature of data changed since the emergence of the internet? That’s something that computer scientists probably have more to say about than I do. What I prefer to emphasize is what has not changed around the concept data during this period, and that is nearly everything.

What do you perceive as the most exciting or challenging developments in the years to come? Can history tell us something about the future?

History tells us very little about the future, though it can tell us a great deal about the present. As I noted in response to your previous question, the history of the concept data is pretty much the history of the modern world and of its characteristic institutions and intellectual practices. Among the intellectual characteristics of our period (and this differentiates us from our predecessors prior to the seventeenth and eighteenth centuries) is a tendency to think of history as principally a process of change and of novelty. In other words, a key innovation of the modern world is the notion of innovation itself. And there’s nothing wrong with this perspective: new things are interesting. But, as moderns, we often overlook historical continuity in favor of change. This is nowhere more evident than around tech. So, rather than speculate about what the next big thing might be, I’ll suggest that a persistent challenge for us in years to come is to maintain an appropriate skepticism as each next big thing is announced. Theory is dead. The data deluge makes the scientific method obsolete. You know the drill.

What can you tell us about recent milestones or achievements with your current project TimeOnline?

TimeOnline is a digital experiment produced by our team based at the University of Oregon, partnered with scholars and designers at several other universities in the US and Europe. Broadly speaking, the project explores themes articulated in Cartographies of Time, the book I wrote with Anthony Grafton in 2010. In TimeOnline, we reverse engineer infographic artifacts from the print era, typically the eighteenth and nineteenth centuries, and rebuild them for the Web. Part of the payoff of this work is making some really interesting and hard-to-find old graphics available to experience online. We tell their stories, and we set up ways of interacting with them that reproduce and enhance the print experience. But our main interest as a research group is exploring what you might call the algorithmic logic of print graphics. Our working hypothesis is that these print artifacts imply interaction protocols that are as sophisticated as, or more sophisticated than, those we encounter in digital environments today. In many cases, the very limitations of the print medium make it necessary for designers to be that much cleverer than digital designers. We think there’s a lot to learn from these explorations that can be applied in the digital sphere. We also think there’s a lot to be learned about the specific dynamics and power of the print medium.

To date, we’ve made public working versions of three such artifacts. They’re still works in progress, but I think they are already quite interesting to play with. The earliest artifact we’ve built out so far is Elizabeth Palmer Peabody’s 1850 Polish-American System of Chronology. Peabody was an important American educator and a member of the transcendentalist movement in the US. In this work, she adapted a mnemonic system that originated in Poland and had been popular in Europe since the 1820s. The Polish-American system teaches students to memorize historical names and dates using a visual grid filled with colors that looks a bit like a psychedelic punch card. It’s an interesting mashup of ideas about reference and memory. Our second artifact, James Ludlow’s 1885 Concentric Chart of History, shows how paper artifacts could be dynamically interactive. His chart is a fan of minutely printed cards rotating around a pin. It’s a masterpiece of data compression and interactive design. Part of what’s telling about this one is how insufficient the digital version feels. Ours has a good look and feel, but it’s nowhere near as intuitive, quick, or satisfying as the paper original. We’re interested in why that’s the case. Finally, we have a version of Mark Twain’s Memory Builder, a chronology game designed and marketed by the American writer in the 1880s and 1890s. Twain was fascinated by the subject of history and by the problem of memory and mnemonic systems. These concerns come together in his trivia game. In some ways, Twain’s game board recalls Peabody’s grid, but something else is going on here, which has to do with play, a subject in which the humorist was an expert. Twain’s game is, however, challenging for modern users: as a person educated in nineteenth-century schools, Twain had memorized many historical dates. We don’t do that so much anymore, and so Twain’s game is difficult, again in historically interesting and telling ways.

So, those were our first experiments. We’re now working on a very robust module based on the famous 1765 Chart of Biography and 1769 New Chart of History by the great eighteenth-century scientist and theologian Joseph Priestley. In our book, Cartographies of Time (2010), Anthony Grafton and I make the case that these charts were among the key works in establishing our modern visual vocabulary for infographics. In TimeOnline, we’ve taken them apart and put them together again, and we think we’ve got both a compelling historical exploration and a piece of data visualization relevant today. Users will be able to use the charts the way Priestley intended, as tools to examine chronological relationships among historical actors and thinkers as well as nations and empires. Users will also be able to learn a lot about the dynamics and presuppositions of eighteenth-century historical thought. When this work is done, we plan to publish it in the Stanford University Press digital projects series.

Discuss the quantitative history of data with Daniel at SEMANTiCS 2018. Register now!


The annual SEMANTiCS conference is the meeting place for professionals who make semantic computing work, understand its benefits, and know its limitations. Every year, SEMANTiCS attracts information managers, IT architects, software engineers, and researchers from organisations ranging from NPOs, universities, and public administrations to the largest companies in the world.