Applying latent Dirichlet allocation for analysis of publications in scientometric databases
DOI:
https://doi.org/10.15276/opu.1.43.2014.32
Keywords:
model, latent, semantic, Dirichlet, topic, publication
Abstract
The aim of this work is to determine the most appropriate model for the thematic classification of scientific publications by authors who share the same surname. Probabilistic models are analyzed, and the latent Dirichlet allocation (LDA) model is proposed. LDA is the leading probabilistic topic model, owing to its numerous generalizations and applications to the analysis of text document collections. Latent semantic analysis (LSA) is chosen for comparison. The model is used in a project for extracting publications from scientometric databases. In this project, topic modeling solves the problem of separating the publications of authors with the same surname, with publication titles serving as the document collection. The results show that LDA is outperformed by LSA when the documents contain only a small amount of text. Therefore, LSA is preferable for document collections of small volume, and LDA for large volumes.
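To make the LDA side of the comparison concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling on tokenized titles. It is an illustration under stated assumptions, not the procedure used in the paper: Blei, Ng and Jordan originally fit LDA with variational inference, and Gibbs sampling is swapped in here only because it is short to write in plain Python. The function name `lda_gibbs`, the hyperparameters (`alpha=0.1`, `beta=0.01`, 200 iterations), and the sample titles are all hypothetical choices.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: LDA via collapsed Gibbs sampling.

    docs: list of token lists (e.g. tokenized publication titles).
    Returns (theta, phi, vocab): per-document topic distributions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Assign every token a random initial topic and record the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates of theta (document-topic) and phi (topic-word).
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][v] + beta) / (n_k[t] + V * beta) for v in range(V)]
        for t in range(K)
    ]
    return theta, phi, vocab

# Hypothetical titles attributed to the same surname, from two research areas.
titles = [
    "latent dirichlet allocation topic model",
    "topic model for scientometric databases",
    "steel beam fatigue under cyclic load",
    "fatigue crack growth in steel",
]
docs = [t.split() for t in titles]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
# Group each title under its most probable topic.
groups = [max(range(2), key=lambda t: row[t]) for row in theta]
print(groups)
```

Because each title contributes only a handful of tokens to the counts, the per-document estimates in `theta` rest on very little evidence. This offers one plausible reading of the abstract's finding that LDA is weaker than LSA when the documents contain little text.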