Nikola Ljubešić, Anita Peti-Stantić

Moving from static to dynamic computational word representations: predicting word concreteness in context

Large pre-trained neural language models have transformed natural language processing, significantly improving the state of the art on many end tasks such as summarisation, translation and question answering. Besides these application-oriented improvements, large pre-trained models can produce numerical representations of words in context, which has opened up a broad new avenue for language analysis. In this preliminary study we investigate to what extent the MEGACRO lexical resource, which provides context-independent assessments of a word’s concreteness, can be used to learn to predict the concreteness of a word in context. We show that such contextual predictions are feasible, and perform first analyses of the interaction between the variability of the concreteness of specific lexemes in context and their other features, such as the context-independent human concreteness ratings, the variance of those ratings, and the lexeme’s frequency and animacy. With this first analysis we only touch the tip of the iceberg of the analyses that large corpora annotated for in-context word concreteness make possible.
