Big Data and the Digital Humanities

  • Date: –16.00
  • Location: Zoom
  • Lecturer: William A. Kretzschmar, Jr., University of Georgia and Uppsala University
  • Organizer: CDHU and the Department of English, Uppsala University
  • Registration deadline: 2021-05-25 at 13:00.
  • Contact person: Karl Berglund
  • Seminar

Bill Kretzschmar is Harry and Jane Willson Professor in Humanities at the University of Georgia, and visiting professor at the Department of English at Uppsala University. He has long been influential in the development of digital methods for studying English. In this talk, Kretzschmar will address the role and implications of big data for such work.


Big Data is a term to conjure with among grant funders these days. For example, in 2014 the American National Science Foundation had $28 million available in its Big Data program and only $2 million in its Linguistics program. The question for us is how we can participate in all of this activity. Perhaps the best way to do so, given that much Big Data is composed of language, is to promote the fact that human language, and human culture more generally, is a complex system. The patterns of behavior that we call languages, and all of the many patterns in human culture, arise from the continuing interaction of people with the people around them. For the last century we have tried to apply logical tools to language, to make grammars, but now that we have digital tools we can make new studies of language based on Big Data collections that document and describe the patterns that emerge within a language. The benefit of doing so is a finer understanding of how language, and by extension culture, is used differently at different times and in different situations, so that we can be more effective in our own situations of language use and so that computers can be more effective in linguistic interactions with people.

This talk will introduce the kind of patterns that we should expect to find emerging from the complex system of English. Besides counting how often each word and each meaning of a word occurs, we should document the frequency of collocates for word/meaning sets. In a complex system, a few meanings and collocates will be very common in each situation of use (what C. S. Lewis called the "dangerous sense" of a word) while most of the words or collocates that can occur are rare. The same nonlinear profile will exist at every level of scale, from each genre in a national setting at each time all the way up to usage in English overall, but the top-ranked words and collocates are likely to be different at every level of scale. This scale-free patterning suggests the importance of dimensionality, the idea that things look different at different scales of observation. The potential benefit of studying English as a complex system is enormous: in the understanding of the real basis for grammar that can permit more effective teaching of English, in the understanding of differences in meaning and usage between situations of use, and in the creation of industrial products that can work much more successfully with human speech (e.g. speech recognition and speech synthesis). And Big Data in the digital humanities is one way to achieve those benefits, not only for language but also for other aspects of culture.
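
As a rough illustration of the counting described above, the following Python sketch tallies word frequencies and the collocates of a chosen word, then ranks them so that the few common items and the long tail of rare ones can be inspected. It is only a minimal sketch under stated assumptions, not the method used in the talk: the function names, the simple regular-expression tokenizer, and the fixed collocation window are all hypothetical choices made for the example, and it counts word forms only, not word meanings.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer, assumed here)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word form occurs."""
    return Counter(tokens)


def collocate_frequencies(tokens, node, window=2):
    """Count words found within `window` tokens on either side of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def rank_profile(counts):
    """Return (rank, item, frequency) triples, most frequent first."""
    return [(rank, item, n)
            for rank, (item, n) in enumerate(counts.most_common(), start=1)]


if __name__ == "__main__":
    # Toy text for illustration only; a real study would use a large corpus.
    sample = "the cat sat on the mat and the dog sat on the rug"
    tokens = tokenize(sample)
    for row in rank_profile(word_frequencies(tokens)):
        print(row)
    print(collocate_frequencies(tokens, "sat", window=1))
```

Running the same functions separately on subsets of a corpus (for example, one genre or one period at a time) and then on the corpus as a whole would be one simple way to compare the top-ranked items at different levels of scale.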