Keynote Talk from Rakesh Agrawal
Enriching Education Through Data Mining
Education is acknowledged to be the primary vehicle for improving the economic well-being of people [1,6]. Textbooks have a direct bearing on the quality of education imparted to students, as they are the primary conduits for delivering content knowledge. They are also indispensable for fostering teacher learning and constitute a key component of teachers' ongoing professional development [5,8]. Many textbooks, particularly from emerging countries, lack clear and adequate coverage of important concepts. In this talk, we present our early explorations into developing a data-mining-based approach for enhancing the quality of textbooks. We discuss techniques for algorithmically augmenting different sections of a book with links to selected content mined from the Web. To find authoritative articles, we first identify the set of key concept phrases contained in a section. Using these phrases, we find Web (Wikipedia) articles that represent the central concepts presented in the section and augment the section with links to them. We also describe a framework for finding images that are most relevant to a section of the textbook while respecting global relevancy to the entire chapter to which the section belongs. We pose the matching of images to sections in a textbook chapter as an optimization problem and present an efficient algorithm for solving it.
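To make the image-matching problem concrete, here is a minimal sketch of one possible formulation. It is a simple greedy heuristic, not the algorithm presented in the talk. The relevance scores, the rule that an image is used at most once per chapter, the per-section cap, and the name `assign_images` are all assumptions made for illustration.

```python
def assign_images(scores, per_section=1):
    """Greedily assign images to sections of one chapter.

    scores: dict mapping (section, image) -> relevance score (hypothetical input).
    Each image is used at most once in the chapter (an assumed constraint),
    and each section receives at most `per_section` images.
    """
    assignment = {}
    used = set()
    # Visit candidate (section, image) pairs from most to least relevant.
    for (section, image), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        chosen = assignment.setdefault(section, [])
        if image not in used and len(chosen) < per_section:
            chosen.append(image)
            used.add(image)
    return assignment


scores = {
    ("s1", "a"): 0.9,
    ("s2", "a"): 0.8,
    ("s1", "b"): 0.5,
    ("s2", "b"): 0.4,
}
result = assign_images(scores)
```

In this toy input both sections prefer image "a", so the chapter-wide constraint forces the second section to take image "b". A greedy pass like this is not guaranteed to maximize total relevance.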
We also present a diagnostic tool for identifying those sections of a book that are not well-written and hence should be candidates for enrichment. We propose a probabilistic decision model for this purpose, which is based on the syntactic complexity of the writing and the newly introduced notion of the dispersion of key concepts mentioned in the section. The model is learned using a tune set that is automatically generated in a novel way. This procedure maps sampled textbook sections to the closest versions of Wikipedia articles having similar content and uses the maturity of those versions to assign need-for-enrichment labels. The maturity of a version is computed by considering the revision history of the corresponding Wikipedia article and convolving the changes in size with a smoothing filter.
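The smoothing step in the maturity computation can be sketched as follows. This is only an illustration: the abstract does not specify the filter, so the box (moving-average) filter, the window width, the use of absolute size changes, and the name `smoothed_size_changes` are assumptions.

```python
def smoothed_size_changes(sizes, window=3):
    """Convolve revision-to-revision size changes with a box filter.

    sizes: article sizes (e.g. in bytes) for consecutive revisions, oldest first.
    Returns one smoothed value per full window; an empty list if the history
    is too short for the window.
    """
    # Absolute change in size between each pair of consecutive revisions.
    changes = [abs(b - a) for a, b in zip(sizes, sizes[1:])]
    if window < 1 or len(changes) < window:
        return []
    # A moving average is a convolution with a constant kernel of 1/window.
    return [
        sum(changes[i:i + window]) / window
        for i in range(len(changes) - window + 1)
    ]


history = [100, 400, 900, 950, 960, 962]  # made-up revision sizes
smoothed = smoothed_size_changes(history, window=2)
```

Under this sketch, a version around which the smoothed change has become small could be treated as more mature. How the smoothed signal is turned into a need-for-enrichment label is not described in the abstract.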
We also provide the results of applying the proposed techniques to a corpus of widely used high school textbooks published by the National Council of Educational Research and Training (NCERT), India. We consider books from grades IX–XII, covering four broad subject areas, namely Sciences, Social Sciences, Commerce, and Mathematics. The preliminary results are encouraging and indicate that developing technological approaches to enhancing the quality of textbooks could be a promising research direction for our field.
[1] Knowledge for Development: World Development Report 1998/99. World Bank, 1998.
[2] R. Agrawal, S. Gollapudi, A. Kannan, and K. Kenthapadi. Enriching textbooks with web images. Working paper, 2011.
[3] R. Agrawal, S. Gollapudi, A. Kannan, and K. Kenthapadi. Identifying enrichment candidates in textbooks. In WWW, 2011.
[4] R. Agrawal, S. Gollapudi, K. Kenthapadi, N. Srivastava, and R. Velu. Enriching textbooks through data mining. In First Annual ACM Symposium on Computing for Development (ACM DEV), 2010.
[5] J. Gillies and J. Quijada. Opportunity to learn: A high impact strategy for improving educational outcomes in developing countries. USAID Educational Quality Improvement Program (EQUIP2), 2008.
[6] E. A. Hanushek and L. Woessmann. The role of education quality for economic growth. Policy Research Department Working Paper 4122, World Bank, 2007.
[7] R. Mohammad and R. Kumari. Effective use of textbooks: A neglected aspect of education in Pakistan. Journal of Education for International Development, 3(1), 2007.
[8] J. Oakes and M. Saunders. Education's most basic tools: Access to textbooks and instructional materials in California's public schools. Teachers College Record, 106(10), 2004.
[9] M. Stein, C. Stuen, D. Carnine, and R. M. Long. Textbook evaluation and adoption. Reading & Writing Quarterly, 17(1), 2001.