Identification of the Leading Research Domains and Grouping of Articles on the Smart City Using Text Mining
Purpose: The paper uses text mining to identify the leading research domains concerning the smart city, based on an analysis of highly cited research articles indexed in the Web of Science.
Design/methodology/approach: An original method is proposed for analysing academic texts using the R language, tokenisation, lemmatisation, n-grams and correspondence analysis. The author analysed fifty of the most cited articles indexed in the Web of Science from 2014 to 2019.
Findings: The paper presents the advantages and drawbacks of the proposed method of analysing research publications. The advantages include automation and repeatability of the analysis of a large number of documents and improved knowledge about links among the articles in terms of research domains. The disadvantage is the loss of information from diagrams and figures. The method identified two leading research domains related to the notion of the smart city: technologies and systems. The analysed publications were categorised by selected keywords.
Research limitations/implications: Future work should include further refinement of the assumptions for the method, analyses of a larger number of research texts and a narrowing down of the domain of the smart city. It is desirable to consider other functional domains of the city, such as energy, public health, environmental protection or transport.
Practical implications: The proposed method can complement a standard literature analysis regarding the smart city. The leading research domains related to the smart city in the analysed articles were systems and technologies employed to improve how the city operates.
Social implications: Text mining can be employed by various experts focusing on the smart city and is a useful complement to other research methods, such as questionnaire surveys, interviews or observations.
Originality/value: The publication can be useful for researchers from various fields and for managers seeking to create and use simple, useful methods and tools for analysing unstructured text documents for decision-making. The paper proposes a separate text mining analysis of abstracts and whole documents using n-grams. This yielded a more precise list of areas relevant to the smart city. The grouping was done using correspondence analysis of the fifty most cited articles indexed in the Web of Science from 2014 to 2019. (original abstract)
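The tokenisation and n-gram steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the paper works in R, whereas this example uses plain Python for illustration. The stop-word list, the function names and the sample sentence are all hypothetical, and lemmatisation and correspondence analysis are left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "for", "is", "are", "on"}


def tokenise(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(text, n=2, k=3):
    """Count n-grams after stop-word removal and return the k most frequent.

    Simplification: stop words are dropped before n-grams are formed, so two
    words separated only by a stop word are treated as adjacent.
    """
    tokens = [t for t in tokenise(text) if t not in STOP_WORDS]
    return Counter(ngrams(tokens, n)).most_common(k)


# Made-up sample sentence, not taken from the analysed articles.
sample = "The smart city uses smart city technologies and smart systems."
print(top_ngrams(sample, n=2, k=1))
```

Running the same counting over abstracts and over full texts separately, as the paper proposes, would only require calling `top_ngrams` on each corpus in turn.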