On the Horizon: Making the Best Use of Free Text Data With Shareable Text Mining Analyses

Jill Rowan Deans MacKay


The current sector-wide Enhancement Theme of ‘optimising the use of existing evidence’ encourages the sector to identify what evidence exists and to explore the associated opportunities for best practice. Across the higher education sector, free text datasets generated through annual surveys are prevalent but are rarely explored across institutions, partly because of privacy concerns arising from the nature of the data. In a recent project exploring secondary analyses of National Student Survey data, the University of Edinburgh also explored text mining approaches that offer fast and repeatable analyses of free text data and that other institutions and researchers can adopt without sharing sensitive data. This method was trialled on institutional-level data from the 2016 National Student Survey, alongside an in-depth open coding approach to the same data. In this horizons paper, comparisons are drawn between some types of text mining analysis and what can be explored through open coding, and some recommendations are made for future use. The shareable code accompanies this paper so that other groups can replicate the approach on their own datasets and contribute to optimising the use of existing evidence.


