As the wealth of information available on the Web increases, Web-based information seeking becomes an increasingly important skill for supporting both formal education and lifelong learning. However, Web-based information access poses hurdles for certain student populations, such as users with low English competency, users with low literacy, or what we will refer to as emerging Internet users. The challenge springs from the fact that the bulk of information available on the Web is provided in a small number of high-profile languages such as English, Korean, and Chinese. These issues remain problematic despite research in cross-linguistic information retrieval and machine translation, because these technologies are still too brittle for these user populations to rely on them extensively to bridge the language gulf. In this paper, we propose a mixed-methods approach to addressing these issues specifically in connection with emerging Internet users, with data mining as a key component. Our target emerging Internet users are rural children who have recently become part of a technical university student population in the Indian state of Andhra Pradesh. As Internet penetration increases in the developing world and populations simultaneously shift from rural to urban life, such populations of emerging Internet users will be an important target for the design of scaffolding and educational support. In this context, in addition to using the Internet for their own personal information needs, students are expected to be able to receive assignments in English and use the Web to meet the information needs specified in those assignments. Thus, we begin our investigation with a small qualitative study in which we examine in detail the problems these students face when responding to search tasks given to them in English.
We first present a qualitative analysis of the result write-ups produced in response to the given information-seeking task, along with some observations about the corresponding search behavior. This analysis reveals difficulties posed by the strategies students were observed to employ to compensate for difficulties understanding the search task statement and the retrieved materials. Based on these specific observations, we present an extensive controlled study in which we manipulate both the characteristics of the search task and the manner in which it is presented (i.e., in English only, in the native language of Telugu only, or in both English and the native language) in order to understand how a light form of support might affect task success for these information-seeking tasks. One important contribution of this work is a dataset from roughly 2,000 users, including their pre-search response to the task statement, a log of their click behavior during search, and their post-search write-up. A data mining methodology is presented that allows us to understand more broadly the difficulties faced by this student population as well as how the experimental manipulation affects their search behavior. Results suggest that using machine translation for the limited task of translating information-seeking task statements, which is more feasible than translating queries or translating search results at large scale, may be beneficial for these users depending on the type of task. The data mining methodology itself, which can be applied as an assessment technique for evaluating search behavior in subsequent research, is a second contribution. Finally, the findings from the statistical analysis of the study results and from the data mining are a third contribution of the work.
personalization, emerging internet users, nonnative English speakers, web-log analysis, information seeking task, search strategies