Comparative analysis of information retrieval and analysis of open access tools from an educational concept

Authors

  • Armando PLASENCIA-SALGUEIRO
  • Bárbara de los Milagros BALLAGAS-FLORES

Keywords:

Database, Text mining, Search engines, Information retrieval

Abstract

At the Institute of Cybernetics, Mathematics, and Physics in the Republic of Cuba, the course “Databases and digital Library” is part of the Master’s degree program in Applied Cybernetics. An essential part of the course is building document databases from information retrieved from the Internet. Equipping the laboratories for effective learning requires the information retrieval tools best suited to the course, both from an educational point of view and in terms of ease of acquisition. Therefore, the characteristics used to evaluate these tools and the methodology for selecting them were defined. As a result, from thirteen free, downloadable information retrieval and data analysis tools, the following eight were selected: Lemur Toolkit with Indri, Sphinx, WebSphinx with RapidMiner, Solr / Lucene / Hadoop / Mahout, Terrier and Dragon. These tools guaranteed the quality of the course and its connection with other courses in the Master’s degree program.

Published

2014-11-25

How to Cite

PLASENCIA-SALGUEIRO, A., & de los Milagros BALLAGAS-FLORES, B. (2014). Comparative analysis of information retrieval and analysis of open access tools from an educational concept. Transinformação, 26(3). Retrieved from https://periodicos.puc-campinas.edu.br/transinfo/article/view/6109

Section

Original