Counting Words in Social Science - Matthew Taddy
Matthew Taddy, Associate Professor of Econometrics and Statistics at The University of Chicago
Abstract
Social scientists are embracing the idea of using "text as data" as a way to quantify, measure, and discover social concepts. Professor Taddy will give a brief history of how this strategy has worked and evolved, and present the high-dimensional multinomial logistic regression models that he uses as a basis for text analysis. Using a series of applications (tweets about politicians, reviews on yelp.com, congressional speech), Professor Taddy will give the how and why of this approach. The "how" touches on distributed computing and regularized estimation techniques. The "why" considers questions of prediction, treatment effect estimation, and finally inference about the content of text itself. Although all three goals are based on the same models, each involves a different set of assumptions and challenges.