Understanding Instructional Support Needs of Emerging Internet Users for Web-based Information Seeking



Published Dec 1, 2010
Naman K. Gupta, Carolyn Penstein Rosé


As the wealth of information available on the Web increases, Web-based information seeking becomes an increasingly important skill for supporting both formal education and lifelong learning. However, Web-based information access poses hurdles that must be overcome by certain student populations, such as users with low English competency, low-literacy users, or what we will refer to as emerging Internet users. The challenge springs from the fact that the bulk of information available on the Web is provided in a small number of high-profile languages such as English, Korean, and Chinese. These issues continue to be problematic despite research in cross-linguistic information retrieval and machine translation, because these technologies are still too brittle for extensive use by these user populations for the purpose of bridging the language gulf. In this paper, we propose a mixed-methods approach to addressing these issues specifically in connection with emerging Internet users, with data mining as a key component. Our target emerging Internet users are rural children who have recently become part of a technical university student population in the Indian state of Andhra Pradesh. As Internet penetration increases in the developing world and, at the same time, populations shift from rural to urban life, such populations of emerging Internet users will be an important target for the design of scaffolding and educational support. In this context, in addition to using the Internet for their own personal information needs, students are expected to be able to receive assignments in English and use the Web to meet the information needs specified in their assignments. Thus, we begin our investigation with a small, qualitative study in which we investigate in detail the problems faced by these students when responding to search tasks given to them in English.
We first present a qualitative analysis of the result write-up in response to the given information-seeking task, along with some observations about the corresponding search behavior. This analysis reveals difficulties posed by the strategies students were observed to employ to compensate for difficulties in understanding the search task statement and retrieved materials. Based on these specific observations, we present an extensive controlled study in which we manipulate both characteristics of the search task and the manner in which it is presented (i.e., in English only, in the native language of Telugu only, or in both English and the native language) in order to understand how a light form of support might impact task success for these information-seeking tasks. One important contribution of this work is a dataset from roughly 2,000 users including their pre-search response to the task statement, a log of their click behavior during search, and their post-search write-up. A data mining methodology is presented that allows us to understand more broadly the difficulties faced by this student population as well as how the experimental manipulation affects their search behavior. Results suggest that using machine translation for the limited task of translating information-seeking task statements, which is more feasible than translating queries or large-scale translation of search results, may be beneficial for these users depending on the type of task. The data mining methodology itself, which can be applied as an assessment technique for evaluating search behavior in subsequent research, is a second contribution. Finally, the findings from statistical analysis of the study results and data mining are a third contribution of the work.

How to Cite

Gupta, N. K., & Rosé, C. P. (2010). Understanding Instructional Support Needs of Emerging Internet Users for Web-based Information Seeking. Journal of Educational Data Mining, 2(1), 38–82. https://doi.org/10.5281/zenodo.3554739



personalization, emerging internet users, nonnative English speakers, web-log analysis, information seeking task, search strategies
