Agenda: Conference on Educational Data Science

To inquire about video recordings of any of the talks presented at this conference, or to request a copy of any of the corresponding papers, slide decks, or posters, contact the staff at the Institute for Research in the Social Sciences at iriss-news [at] stanford.edu. Please note that our archive may not include papers or slide decks for every participant.

September 18, 2020

7:50am to 8:00am - Greeting and Introduction

8:00am to 8:40am - Session 1 - Data Science and Education: Broad Strokes

Judith Singer

PROFESSOR OF EDUCATION, HARVARD UNIVERSITY

Educational Data Science: Opportunities and Challenges

Nichole Pinkard

ASSOCIATE PROFESSOR, LEARNING SCIENCES, NORTHWESTERN UNIVERSITY

Moving from Theory to Systems: Infrastructuring Hyper-Local Opportunity Landscapes

David Williamson Shaffer

VILAS DISTINGUISHED PROFESSOR OF LEARNING SCIENCES, UNIVERSITY OF WISCONSIN-MADISON

The Role of Meaning in Educational Data Science

In the age of big educational data, researchers have tools to find ever more subtle patterns in data about teaching and learning – and about teachers and students. But big data presents challenges to traditional research methods, both qualitative and quantitative: challenges to our understanding of utility, reliability, validity, replicability, interpretability, and even significance itself. This talk looks at the reasons to – and ways to – address these challenges by keeping the concept of meaning central to the emerging field of educational data science.

Alyssa Wise

ASSOCIATE PROFESSOR OF LEARNING SCIENCES/EDUCATIONAL TECHNOLOGY, NYU

The Complementarity of Human Insight and Computational Power in Learning Analytics

Learning analytics is a technology for enabling better decision-making by teachers, students, and other educational stakeholders by providing them with timely and actionable information about learning-in-process on an ongoing basis. To be effective, learning analytics must thus not only be technically robust but also designed to support human use. While much has been said about the benefits that can be reaped by applying computational methods to educational big data, the role and importance of human insight in the creation and use of analytics is less clear. In this talk I'll present a variety of ways that human insight can be incorporated into analytic design and interpretation to offer important value that complements what algorithmic processing provides.

Elizabeth Stuart

PROFESSOR, JOHNS HOPKINS BLOOMBERG SCHOOL OF PUBLIC HEALTH

Discussant

8:45am to 9:25am - Session 2 - Learning Analytics and Online Learning

Nia Dowell

ASSISTANT PROFESSOR, UC IRVINE

Creating Scalable Models of Collaborative Interaction Dynamics and Outcomes

In the current globalized world, innovation in science and technology is vital for economic competitiveness, quality of life, and national security. This trend is accelerating reliance on virtual teams and their collaborative effort to solve complex environmental, social, and public health problems. To contend with these dynamic conditions, communication and collaborative problem-solving (CPS) competencies have taken a principal role in educational policy, research, and technology. Adaptive educational technologies provide a platform to deliver personalized training to improve learners’ CPS skills. However, for these systems to optimally tailor instruction, they must have key insights into learners’ interaction dynamics and team behaviors. We have been exploring these properties by employing Group Communication Analysis (GCA), a computational linguistics methodology for quantifying and characterizing the socio-cognitive processes between learners in online interactions. This talk will focus on recent studies where we have used GCA to gain a deeper understanding of role ecologies, learning and problem-solving, and issues of inclusivity in digitally-mediated group interactions. The scalability of GCA opens the door for future research efforts directed towards improving collaborative competencies and creating more inclusive online interactions.

Marcelo Worsley

ASSISTANT PROFESSOR OF COMPUTER SCIENCE, NORTHWESTERN UNIVERSITY

The Broader Impacts of Multimodal Learning Analytics

Rene Kizilcec

ASSISTANT PROFESSOR, CORNELL INFORMATION SCIENCE

Scaling Up Behavioral Science Interventions in Online Education: Part 1

Justin Reich

ASSISTANT PROFESSOR IN THE COMPARATIVE MEDIA STUDIES/WRITING DEPARTMENT, MIT

Scaling Up Behavioral Science Interventions in Online Education: Part 2

Online education is rapidly expanding in response to rising demand for higher and continuing education, but many online students struggle to achieve their educational goals. Several behavioral science interventions have shown promise for aiding students’ persistence and completion in a handful of courses. In this study, we tested a set of behavioral interventions over two-and-a-half years, with ¼ million students, from nearly every country, across 248 online courses offered by Harvard, MIT, and Stanford. Our iterative scientific process -- cyclically pre-registering new hypotheses in between waves of data collection -- enabled us to identify individual and contextual conditions under which the interventions can benefit students in developing countries in courses with an achievement gap between students in more and less developed countries. Our findings encourage funding agencies and researchers conducting large-scale field trials to reevaluate study guidelines that emphasize static investigations of average treatment effect over dynamic investigations of contextual heterogeneity.

Alyssa Wise

ASSOCIATE PROFESSOR OF LEARNING SCIENCES/EDUCATIONAL TECHNOLOGY, NYU

Discussant

9:30am to 10:10am - Session 3 - Computer Science and AI Approaches to Education

Anna Rafferty

ASSISTANT PROFESSOR OF COMPUTER SCIENCE, CARLETON COLLEGE

Recognizing student strategies and misunderstanding using inverse planning

Online educational technologies provide opportunities for students to engage in complex tasks like games or virtual chemistry labs in which they may make many different choices. Tools from machine learning can be used to make sense of students' choices in these environments, and to use their choices and the way they execute these choices to make fine-grained inferences about their understanding and their strategy. In this talk, I will present an inverse planning framework for reasoning about students’ choices. This framework is based on inverse reinforcement learning and takes a Bayesian approach to combine information about common misunderstandings with the choices of a specific student. I will present behavioral experiments demonstrating the effectiveness of the framework, including applications both in education and psychology, and discuss how this framework might be used in future educational applications.

Nick Haber

ASSISTANT PROFESSOR, GRADUATE SCHOOL OF EDUCATION, STANFORD UNIVERSITY

Learning artificial agents and cognitive models

Chris Piech

ASSISTANT PROFESSOR OF COMPUTER SCIENCE EDUCATION, STANFORD UNIVERSITY

AI teaching assistants that don't need much student data to train

Mark Warschauer and Ying Xu

PROFESSOR OF EDUCATION AND INFORMATICS, U.C. IRVINE

Can conversational agents support children’s learning?

With the development of natural language processing technologies, conversational agents can now be integrated into children’s media to support everyday learning by engaging children in meaningful conversation. We have explored this potential by developing and researching two educational applications of conversational agents: interactive audio storybooks to promote early language and literacy skills, and interactive videos to foster scientific knowledge and curiosity. In these two applications, children listen to a story or watch a video while responding to questions asked by a conversational agent and receiving elaborative feedback on their responses. Through studies of preschool-aged children’s interactions with the conversational agents, we have demonstrated the effectiveness of conversational agents in enhancing children’s learning from and engagement with storybook reading or video watching.

Zach Pardos

ASSOCIATE PROFESSOR, GRADUATE SCHOOL OF EDUCATION, UC BERKELEY

Discussant

10:15am to 10:55am - Session 4 - Data Science and Educational Measurement

Andrew Ho

CHARLES WILLIAM ELIOT PROFESSOR OF EDUCATION, HARVARD GRADUATE SCHOOL OF EDUCATION

When does measurement error matter in educational data science?

Ben Domingue

ASSISTANT PROFESSOR, GRADUATE SCHOOL OF EDUCATION, STANFORD UNIVERSITY

Interplay between speed and accuracy: Novel empirical insights based on 1/4 billion item responses

Response time is an intriguing process data element, but relatively few large-scale empirical investigations have examined its implications for respondent behavior. We take advantage of a unique dataset (roughly 1/4 billion item responses from the NWEA MAP assessment) to shed light on two important test-taker behaviors. The first, response acceleration, is a reduction in response time for responses that occur relatively late in the assessment. Further, such reductions are heterogeneous as a function of estimated ability and may have implications for our understanding of ability estimates. The second, heterogeneous processing, suggests that response time has a different relationship with the ultimate response depending on the underlying difficulty of the particular item for an individual. This indicates different processes driving responses that could potentially be modeled. These empirical findings offer potential insight into how response times could be used to improve measurement processes.

Shayan Doroudi

ASSISTANT PROFESSOR, SCHOOL OF EDUCATION, UC IRVINE

The Bias-Variance Tradeoff: How Data Science Can Inform Educational Debates

Best paper submission at the Conference on Educational Data Science 2020

In addition to providing a set of techniques to analyze educational data, we claim that data science as a field can provide broader insights to education research. In particular, we show how the bias-variance tradeoff from machine learning can be formally generalized to be applicable to several prominent educational debates, including debates around learning theories and pedagogy. We further show how various data science techniques that have been proposed to navigate the bias-variance tradeoff can yield insights for productively navigating these educational debates.

Zachary Pardos

ASSOCIATE PROFESSOR, GRADUATE SCHOOL OF EDUCATION, UC BERKELEY

Data-assistive course articulation using machine translation

Higher education at scale, such as in the California public post-secondary system, has promoted upward socioeconomic mobility by supporting student transfer from 2-year community colleges to 4-year degree-granting universities. Among the barriers to transfer is earning the right credit at a 2-year institution that qualifies for degree credit in a 4-year program. Course articulation is the process of defining how course credit earned outside of an institution maps to credit within the institution, and it is an intractable task when attempting to manually articulate all courses among the colleges and universities in a state. In this talk, I will present a methodology for making this process of defining and maintaining articulations tractable by leveraging information contained within historic enrollment patterns and course catalog descriptions. Limitations of the approach and its future integration plans will be discussed.

Joanna Gorin

VICE PRESIDENT OF RESEARCH, ETS

Discussant

11:00am to 11:40am - Session 5 - Computational Linguistics Approaches

Lily Fesler

PH.D. CANDIDATE IN ECONOMICS OF EDUCATION, STANFORD UNIVERSITY

Opening the Black Box of College Counseling using Text-as-Data Methods

Honorable mention for best paper submission at the Conference on Educational Data Science 2020

Although many programs remotely disseminate information to students about the college application process, there is little evidence as to how students experience these programs. This paper examines a large-scale remote counseling program in which college counselors initiated interactions with 15,000 high school seniors via text message to support them through the college application process. Given the passive nature of text messaging, not all of the counselors' prompts elicited similar responses from students. I use text-as-data methods to measure which interactions lead to productive engagement between counselors and students, and which do not. I show that interactions about financial aid offers and financial aid applications are much more likely to generate productive engagement than interactions about college lists.

Renzhe Yu

PH.D. CANDIDATE, UC IRVINE

Unsupervised Representations Predict Popularity of Peer-Shared Artifacts in Online Learning Environments

Honorable mention for best paper submission at the Conference on Educational Data Science 2020

In online collaborative learning environments, students create content and construct their own knowledge through complex interactions over time. To facilitate effective social learning and inclusive participation in this context, insights are needed into the correspondence between student-contributed artifacts and their subsequent popularity among peers. In this study, we represent student artifacts by their (a) contextual clickstream of interactions, (b) textual content, and (c) set of instructor-specified features, and use these representations to predict artifact popularity scores. Through a mixture of predictive analysis and visual exploration, we find that the neural embedding representation, learned from the contextual clickstream, yields the strongest predictions of popularity, ahead of the instructor's knowledge, which includes academic value and creativity ratings. Because this representation can be learnt without human labeling, it opens up possibilities for shaping student interactions on the fly towards the more inclusive and pedagogically valuable.

Ha Nguyen

PH.D. CANDIDATE, UC IRVINE

In or Out of Sync: Federal Funding and Research in Early Childhood

Honorable mention for best paper submission at the Conference on Educational Data Science 2020

Researchers have examined the influence of federal investment on productivity in science and higher education research, but not in early childhood. In addition, existing research tends to use funding and citation metrics, as opposed to examining content shifts. This study applies text mining and fixed effect models to 44,337 articles and federal grant abstracts to examine trends in research areas in early childhood and the extent to which topics in grants map onto topics from previous publications and funding amount. First, we find significant changes in trends of research and grants in early childhood over time, with an increasing distribution of topics in education and care (i.e., teacher training, education technology, and parenting) and evaluation. Second, one-way fixed effect models indicate that funding from the previous year significantly predicts the extent to which a topic appears in subsequent grants. However, topic distribution in prior publications does not strongly predict topic distribution in grants. Third, the association between grant and publication topic distribution varies across disciplines. Understanding the relation between research and government agenda has implications for promoting scientific knowledge production in early childhood, a rapidly expanding policy area that requires complicated funding and administration.

Li Lucy & Dora Demszky

PH.D. CANDIDATE, SCHOOL OF INFORMATION, UC BERKELEY

Content Analysis of Textbooks via Natural Language Processing: Novel Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks

Best paper submission at the Conference on Educational Data Science 2020

Cutting-edge data science techniques can shed new light on fundamental questions in educational research. We highlight the insights that can be gained about the representation of historically marginalized groups by applying natural language processing (NLP) to fifteen of the most widely used U.S. history textbooks in Texas between 2015 and 2017. First, Hispanic/Latinx people are rarely discussed, and the most common famous figures are nearly all white men. Second, lexicon-based approaches show that Black people perform actions with lower agency and power than others. Third, topic models and word embeddings reveal that women are described less diversely than men and are associated with domestic roles. We also find that more conservative counties tend to purchase textbooks with less representation of women and Black people. Building on a rich tradition of textbook analysis, we release our toolkit for computational analyses of textbooks to support new research directions.

Sebastian Munoz-Najar Galvez

BLUHM FAMILY ASSISTANT PROFESSOR OF DATA SCIENCE AND EDUCATION

Discussant

11:45am to 12:45pm - Poster session on Discord

Enter Discord: https://discord.com/invite/DARKKNw

AJ Alvero, Quentin Sedlacek, and Klint Kanopka

A Call for Critical and Constructive Data Science in Teacher Education

Daniel Anderson

Using extreme gradient boosting to estimate community effects on school readiness

Kylie Anglin

Using Natural Language Processing Methods to Assess Treatment Fidelity

Cynthia D’Angelo

Better together? Initial findings and implications from combining qualitative coding and computational methods to analyze classroom audiovisual data

Bernard David

Towards a Typology of School Choice: Characterizing Charter and Non-charter Public Schools in Texas

Jacob C. Fisher

Describing inequality and collaboration in grant funding for 1/3 of federal research grant spending

Jaren Haber

Sorting schools: A computational analysis of charter school identities and stratification

Hang Li

Multimodal learning for classroom activity detection

Jose M. Hernandez

Extracting Indicators for Education Research from Administrative Data Using Machine Learning Methods

Oluwaseun Ijiwade

A scientometric review of educational learning analytics research: Trends and visualization

Cary Jim and Omar Aljawfi

Exploratory Cluster Analysis of U.S. Adults Characteristics in PIAAC data

David Lang

Remote Tutoring and Digital Canvases: A text analyses

Ji-Eun Lee and Mimi Recker

The Effects of Instructors’ Use of Online Discussions Strategies on Student Participation and Performance in University Online Introductory Mathematical Courses

Yiwen Lin

Gendered Patterns in Online Collaborative Discourse Over Time

Zitao Liu

Dolphin: A Spoken Language Proficiency Assessment System for Elementary Education

Kelun Lu

The Effect of Equalization Reform on Elite and Disadvantaged Elementary Schools: Evidence from the Text Mining of Social Media

Jin Mao, James Mundie, Allan H. K. Yuen, and Dirk Ifenthaler

Ethical Issues in Using Mobile/Wearable Technologies for Research: Opportunities and Challenges

Long Pei

Educational Vision via Data Science: Insights from Alumni Networks on LinkedIn

Quan Nguyen

Modelling students’ social network structure from spatial-temporal network data

Fernando Rodriguez

Can Learning Analytics Help Us Understand Differences in Behaviors and Achievement Among Diverse Learners? Results from an Online Chemistry Course

Joshua Rosenberg

If you’re happy and you know it, post a tweet? A study of the sentiment of posts to the #NGSSchat hashtag on Twitter

Kathleen Scalise

A Taxonomy of Critical Dimensions in Learning Analytics: Some Key Elements for Interpretation of Data

SuYeong Shin, Heeyun Kim, Nicholas Bowman, and Michael Bastedo

Data-driven Insights in the Consideration of Noncognitive Attributes and Equity-promoting Practice in Selective Undergraduate Admissions

Anjali Singh

Understanding Students’ Behavioral Patterns on Interactive E-books using Doc2vec Embeddings

Ivan Smirnov

Emotional pulse of schools

FayeMarie Vassel

Finding the Leaks in the Pipeline: A Random Forest Approach to Understanding Women’s Persistence in STEM

Nicole Wang

Towards a new measurement of MOOCs

Korah Wiley

Using Foil Analysis to Develop Pedagogically Valuable Analytics

Peter Wulff

Automated classification of preservice science teachers' written reflections

Nursel Yilmaz, Arthur Baroody, and Volkan Sahin

What do eye-tracking data say about the cognitive mechanisms underlying the pattern extension skills of young children?

Renzhe Yu

Predicting College Success: What Data Are Useful and for Whom?
