TY  - ELEC
TI  - Topic Modeling and Digital Humanities
AU  - Blei, David M.
T2  - Journal of Digital Humanities
AB  - Topic modeling provides a suite of algorithms to discover hidden thematic structure in large collections of texts. The results of topic modeling algorithms can be used to summarize, visualize, explore, and theorize about a corpus. A topic model takes a collection of texts as input. It discovers a set of “topics” (recurring themes that are discussed in the collection) and the degree to which each document exhibits those topics. Figure 1 illustrates topics found by running a topic model on 1.8 million articles from the New York Times. The model gives us a framework in which to explore and analyze the texts, but we did not need to decide on the topics in advance or painstakingly code each document according to them. The model algorithmically finds a way of representing documents that is useful for navigating and understanding the collection. In this essay I will discuss topic models and how they relate to digital humanities. I will describe latent Dirichlet allocation, the simplest topic model. I will explain what a “topic” is from the mathematical perspective and why algorithms can discover topics from collections of texts.[1] I will then discuss the broader field of probabilistic modeling, which gives a flexible language for expressing assumptions about data and a set of algorithms for computing under those assumptions. With probabilistic modeling for the humanities, the scholar can build a statistical lens that encodes her specific knowledge, theories, and assumptions about texts. She can then use that lens to examine and explore large archives of real sources.
DA  - 2012///
PY  - 2012
LA  - en
UR  - http://journalofdigitalhumanities.org/2-1/topic-modeling-and-digital-humanities-by-david-m-blei/
Y2  - 2019/01/14/08:37:22
KW  - Topic model introductions and tutorials
KW  - Topic modeling
ER  - 