In the age of big educational data, researchers have tools to find ever more subtle patterns in data about teaching and learning – and about teachers and students. But big data presents challenges to traditional research methods, both qualitative and quantitative: challenges to our understanding of utility, reliability, validity, replicability, interpretability, and even significance itself. This talk looks at the reasons to – and ways to – address these challenges by keeping the concept of meaning central to the emerging field of educational data science.
Learning analytics is a technology for enabling better decision-making by teachers, students, and other educational stakeholders by providing them with timely and actionable information about learning-in-process on an ongoing basis. To be effective, learning analytics must thus be not only technically robust but also designed to support human use. While much has been said about the benefits that can be reaped by applying computational methods to educational big data, the role and importance of human insight in the creation and use of analytics is less clear. In this talk I'll present a variety of ways that human insight can be incorporated into analytic design and interpretation to offer important value that complements what algorithmic processing provides.
In the current globalized world, innovation in science and technology is vital for economic competitiveness, quality of life, and national security. This trend is accelerating the increasing reliance on virtual teams and their collaborative effort to solve complex environmental, social, and public health problems. To contend with these dynamic conditions, communication and collaborative problem-solving (CPS) competencies have taken a principal role in educational policy, research, and technology. Adaptive educational technologies provide a platform to deliver personalized training to improve learners’ CPS skills. However, for these systems to optimally tailor instruction, they must have key insights into learners’ interaction dynamics and team behaviors. We have been exploring these properties by employing Group Communication Analysis (GCA), a computational linguistics methodology for quantifying and characterizing the socio-cognitive processes between learners in online interactions. This talk will focus on recent studies where we have used GCA to gain a deeper understanding of role ecologies, learning and problem-solving, and issues of inclusivity in digitally-mediated group interactions. The scalability of GCA opens the door for future research efforts directed towards improving collaborative competencies and creating more inclusive online interactions.
Online education is rapidly expanding in response to rising demand for higher and continuing education, but many online students struggle to achieve their educational goals. Several behavioral science interventions have shown promise for aiding students’ persistence and completion in a handful of courses. In this study, we tested a set of behavioral interventions over two-and-a-half years, with ¼ million students, from nearly every country, across 248 online courses offered by Harvard, MIT, and Stanford. Our iterative scientific process -- cyclically pre-registering new hypotheses in between waves of data collection -- enabled us to identify individual and contextual conditions under which the interventions can benefit students in developing countries in courses with an achievement gap between students in more and less developed countries. Our findings encourage funding agencies and researchers conducting large-scale field trials to reevaluate study guidelines that emphasize static investigations of average treatment effect over dynamic investigations of contextual heterogeneity.
Online educational technologies provide opportunities for students to engage in complex tasks like games or virtual chemistry labs in which they may make many different choices. Tools from machine learning can be used to make sense of students' choices in these environments, and to use their choices and the way they execute these choices to make fine-grained inferences about their understanding and their strategy. In this talk, I will present an inverse planning framework for reasoning about students’ choices. This framework is based on inverse reinforcement learning and takes a Bayesian approach to combine information about common misunderstandings with the choices of a specific student. I will present behavioral experiments demonstrating the effectiveness of the framework, including applications both in education and psychology, and discuss how this framework might be used in future educational applications.
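The Bayesian step described in this abstract can be illustrated with a minimal sketch. It is not the speaker's framework: it assumes a small set of candidate hypotheses about what the student believes, assumes the action values implied by each hypothesis are already computed, and assumes a softmax (noisily rational) choice model. The names `posterior_over_hypotheses`, `q_values`, and `beta`, and all numbers, are hypothetical.

```python
import math


def softmax(values, beta):
    """Choice probabilities for a noisily rational agent (assumed choice model)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def posterior_over_hypotheses(prior, q_values, observations, beta=2.0):
    """Combine a prior over hypotheses with one student's observed choices.

    prior:        {hypothesis: probability}, all strictly positive
    q_values:     {hypothesis: {state: [value of each action]}}
    observations: list of (state, chosen_action_index) pairs
    """
    log_post = {h: math.log(p) for h, p in prior.items()}
    for state, action in observations:
        for h in log_post:
            probs = softmax(q_values[h][state], beta)
            log_post[h] += math.log(probs[action])
    # Normalize in a numerically stable way.
    m = max(log_post.values())
    unnorm = {h: math.exp(lp - m) for h, lp in log_post.items()}
    z = sum(unnorm.values())
    return {h: u / z for h, u in unnorm.items()}


# Hypothetical example: two hypotheses that disagree only in state "s0".
prior = {"correct": 0.5, "misconception": 0.5}
q_values = {
    "correct": {"s0": [1.0, 0.0], "s1": [0.0, 1.0]},
    "misconception": {"s0": [0.0, 1.0], "s1": [0.0, 1.0]},
}
observations = [("s0", 1), ("s1", 1), ("s0", 1)]
posterior = posterior_over_hypotheses(prior, q_values, observations)
print(posterior)
```

In this sketch the prior is where information about common misunderstandings would enter, and the observations carry the specific student's choices. Choices in `s1` do not separate the two hypotheses, so only the two choices in `s0` shift the posterior toward the misconception.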
With the development of natural language processing technologies, conversational agents can now be integrated into children’s media to support everyday learning by engaging children in meaningful conversation. We have explored this potential by developing and researching two educational applications of conversational agents: interactive audio storybooks to promote early language and literacy skills, and interactive videos to foster scientific knowledge and curiosity. In these two applications, children listen to a story or watch a video while responding to questions asked by a conversational agent and receiving elaborative feedback on their responses. Through studies of preschool-aged children’s interactions with the conversational agents, we have demonstrated the effectiveness of conversational agents in enhancing children’s learning from and engagement with storybook reading or video watching.
Response time is an intriguing element of process data, but relatively few large-scale empirical investigations have examined its implications for respondent behavior. We take advantage of a unique dataset (roughly 1/4 billion item responses from the NWEA MAP assessment) to shed light on two important test-taker behaviors. The first, response acceleration, is a reduction in response time for responses that occur relatively late on the assessment. Further, such reductions are heterogeneous as a function of estimated ability and may have implications for our understanding of ability estimates. The second, heterogeneous processing, suggests that response time has a different relationship with the ultimate response depending on the underlying difficulty of the particular item for an individual. This indicates that different processes drive responses and that these processes could potentially be modeled. These empirical findings offer potential insight into how response times could be used to improve measurement processes.
In addition to providing a set of techniques to analyze educational data, we claim that data science as a field can provide broader insights to education research. In particular, we show how the bias-variance tradeoff from machine learning can be formally generalized to be applicable to several prominent educational debates, including debates around learning theories and pedagogy. We further show how various data science techniques that have been proposed to navigate the bias-variance tradeoff can yield insights for productively navigating these educational debates.
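For reference, the familiar squared-error form of the bias-variance tradeoff is sketched below. The abstract does not say which formalization is generalized, so the setup is an assumption: data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ is a model fit to a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Here the expectation is taken over training sets and noise. Approaches that lean on strong prior assumptions tend to sit at the high-bias, low-variance end, and highly flexible approaches at the low-bias, high-variance end.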
Higher education at scale, such as in the California public post-secondary system, has promoted upward socioeconomic mobility by supporting student transfer from 2-year community colleges to 4-year degree-granting universities. Among the barriers to transfer is earning credit at a 2-year institution that qualifies for degree credit in a 4-year program. Course articulation is the process of defining how course credit earned outside of an institution maps to credit within the institution, and it is an intractable task when attempting to manually articulate all courses among the colleges and universities in a state. In this talk, I will present a methodology for making the process of defining and maintaining articulations tractable by leveraging information contained within historic enrollment patterns and course catalog descriptions. Limitations of the approach and plans for its future integration will be discussed.
Although many programs remotely disseminate information to students about the college application process, there is little evidence as to how students experience these programs. This paper examines a large-scale remote counseling program in which college counselors initiated interactions with 15,000 high school seniors via text message to support them through the college application process. Given the passive nature of text messaging, not all of the counselors' prompts elicited similar responses from students. I use text-as-data methods to measure which interactions lead to productive engagement between counselors and students, and which do not. I show that interactions about financial aid offers and financial aid applications are much more likely to generate productive engagement than interactions about college lists.
In online collaborative learning environments, students create content and construct their own knowledge through complex interactions over time. To facilitate effective social learning and inclusive participation in this context, insights are needed into the correspondence between student-contributed artifacts and their subsequent popularity among peers. In this study, we represent student artifacts by their (a) contextual clickstream of interactions, (b) textual content, and (c) set of instructor-specified features, and use these representations to predict artifact popularity scores. Through a mixture of predictive analysis and visual exploration, we find that the neural embedding representation, learned from the contextual clickstream, yields the strongest predictions of popularity, ahead of the instructor-specified features, which include academic value and creativity ratings. Because this representation can be learned without human labeling, it opens up possibilities for steering student interactions on the fly towards more inclusive and pedagogically valuable ones.
Researchers have examined the influence of federal investment on productivity in science and higher education research, but not in early childhood. In addition, existing research tends to use funding and citation metrics, as opposed to examining content shifts. This study applies text mining and fixed effect models to 44,337 articles and federal grant abstracts to examine trends in early childhood research areas and the extent to which topics in grants map onto topics from previous publications and funding amount. First, we find significant changes in trends of research and grants in early childhood over time, with an increasing share of topics in education and care (i.e., teacher training, education technology, and parenting) and evaluation. Second, one-way fixed effect models indicate that funding from the previous year significantly predicts the extent to which a topic appears in subsequent grants. However, topic distribution in prior publications does not strongly predict topic distribution in grants. Third, the association between grant and publication topic distribution varies across disciplines. Understanding the relation between research and government agenda has implications for promoting scientific knowledge production in early childhood, a rapidly expanding policy area that requires complicated funding and administration.
Cutting-edge data science techniques can shed new light on fundamental questions in educational research. We highlight the insights that can be gained about the representation of historically marginalized groups by applying natural language processing (NLP) to fifteen of the most widely used U.S. history textbooks in Texas between 2015 and 2017. First, Hispanic/Latinx people are rarely discussed, and the most frequently mentioned famous figures are nearly all white men. Second, lexicon-based approaches show that Black people are described as performing actions with lower agency and power than others. Third, topic models and word embeddings reveal that women are described less diversely than men and are associated with domestic roles. We also find that more conservative counties tend to purchase textbooks with less representation of women and Black people. Building on a rich tradition of textbook analysis, we release our toolkit for computational analyses of textbooks to support new research directions.