The Building Blocks of Croatian Mental Grammar: Constraints of Information Structure
Anita Peti-Stantić, Mateusz-Milan Stanojević
The characteristics of words such as their concreteness and imageability have been shown to influence the way in which people use them in a variety of situations. For instance, concreteness – the degree to which a word refers to an entity that can be experienced by the senses (Paivio et al., 1968) – has an effect on how quickly we process words (concrete words are processed more quickly). Imageability – how easily and quickly a word evokes a mental image in different modalities (Paivio et al., 1968) – is highly correlated with concreteness, but it seems to be a different category, as seen, for instance, from the difference in affective saturation (Kousta et al., 2011). Both categories, moreover, seem to be related to grammatical characteristics such as word class, as shown by different human ratings and suggested by different computational predictions (Peti-Stantić et al., 2021). Finally, both categories also have practical, real-world effects, such as their influence on how we learn new vocabulary (concrete words are easier to learn).
With this in mind, the aim of the MEGACRO project (The Building Blocks of Croatian Mental Grammar: Constraints of Information Structure, funded by the Croatian Science Foundation, 2017-2021) was to determine how concreteness and imageability work on the lexical level, and how they constrain grammatical patterns and the information structure of Croatian. To do this, the MEGACRO core team of ten linguists, psycholinguists, psychologists, and computational scholars, along with four external advisors, worked in four stages.
Firstly, we built the Croatian Psycholinguistic Database (CPD; freely available at https://doi.org/10.17234/megahr.2019.hpb), with human ratings by an average of 30 participants each for 6000 Croatian nouns, verbs, adjectives and adverbs. The words were selected from the hrWaC corpus and the Croatian Frequency Dictionary (Moguš et al., 1999), with frequency and everyday use as the main criterion. An additional criterion was to include 1500 words related to content-specific and academic vocabulary used in primary schools (excerpted from textbooks of Croatian, mathematics, history, geography and science used in primary school grades 4, 5 and 6). The database includes values for word length, word class, animacy and corpus frequency, as well as ratings of concreteness, imageability, age of acquisition, and subjective frequency. The values of all word features are comparable to values available in databases for other languages, and the correlations follow the expected patterns found across previous studies and databases (Peti-Stantić et al., 2021).
In the second stage, we used computational modeling to extrapolate the ratings of concreteness and imageability to 100,000 words in the Croatian lexicon for which ratings were not collected in the CPD. Theoretically, computational modeling offers a glimpse into the significance of distribution for word ratings. Practically, it is a relatively economical way to expand the human rating database, which is particularly important for an understudied language such as Croatian. Extrapolations were performed using pretrained fastText word embeddings as explanatory variables and either concreteness or imageability as the response variable. We used a support-vector machine model with a radial basis kernel and evaluated the results by fivefold cross-validation in five iterations. The predicted values of concreteness and imageability exhibit high correlations with the human ratings. The computationally obtained ratings are freely available at https://github.com/megahr/lexicon/blob/master/predictions/hr_c_i.predictions.txt.
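The evaluation scheme described above (fivefold cross-validation repeated five times, scored by the correlation between predicted and human ratings) can be illustrated with a minimal, standard-library-only sketch. Several things here are assumptions made for illustration and are not the project's actual code: the toy two-dimensional vectors stand in for fastText embeddings, the ratings are synthetic, the choice of Pearson correlation as the score is ours, and the regressor is a simple RBF-weighted average rather than a support-vector machine (in practice a library implementation such as scikit-learn's `SVR` with `kernel="rbf"` would take its place). All function names are hypothetical.

```python
import math
import random
import statistics


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def predict(train_x, train_y, x, gamma):
    """Stand-in regressor (NOT an SVM): RBF-weighted average of training ratings."""
    weights = [rbf_kernel(x, t, gamma) for t in train_x]
    total = sum(weights)
    if total == 0.0:  # query is numerically far from every training point
        return statistics.fmean(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def repeated_kfold(n_items, k=5, repeats=5, seed=0):
    """Yield (train_indices, test_indices) for `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]  # every k-th item forms one held-out fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cross_validate(vectors, ratings, k=5, repeats=5, gamma=5.0):
    """Return one Pearson r per held-out fold (k * repeats values in total)."""
    scores = []
    for train, test in repeated_kfold(len(vectors), k, repeats):
        train_x = [vectors[i] for i in train]
        train_y = [ratings[i] for i in train]
        predicted = [predict(train_x, train_y, vectors[i], gamma) for i in test]
        observed = [ratings[i] for i in test]
        scores.append(pearson(predicted, observed))
    return scores


# Synthetic toy data (invented for illustration): 2-d "embeddings" and a
# "rating" that depends smoothly on the first dimension, plus small noise.
rng = random.Random(42)
toy_vectors = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
toy_ratings = [3.0 + 2.0 * v[0] + rng.gauss(0, 0.1) for v in toy_vectors]

fold_scores = cross_validate(toy_vectors, toy_ratings)
mean_r = statistics.fmean(fold_scores)
print(f"mean Pearson r over {len(fold_scores)} folds: {mean_r:.3f}")
```

Reshuffling the items before each of the five repetitions means every word is held out exactly once per repetition, so the 25 fold-level correlations give a sense of how stable the predictions are across different partitions of the rated words.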
Thirdly, we took a two-pronged approach to determine how the psycholinguistic characteristics (potentially) influence linear and global (hierarchical) processing constraints in the mental grammar of Croatian. From the bottom-up perspective, we sought to determine how concreteness and imageability interact with word class and contextualization. The results show that concreteness and imageability differ depending on word class, with nouns being rated as highest in concreteness and imageability, followed by verbs, adjectives and adverbs. Preliminary results of contextualized ratings show lower scores for concreteness than non-contextualized ratings. Together, these findings suggest that humans draw more on idealized conceptual knowledge when rating concreteness in isolation, relegating part of their idealized knowledge to the background when forced to do online contextual integration. From the top-down perspective, we looked at how clitic ordering and ellipsis influence sentence processing. Generally speaking, the results suggest that there are preferred constructionalized variants in both clitic ordering and elliptical sentences, determined by information structure. These results are yet to be theoretically integrated with the bottom-up approach to concreteness and imageability; however, their practical consequences are clear and are explored in the final stage of the project.
In the fourth stage of the project, the results of the first three stages were combined with an analysis of primary school textbooks and with research into predictors of literacy skills in Croatian-speaking children. Based on all of these, learning materials have been developed to help improve the productive linguistic competence of primary-school students. We developed a workshop, aimed at students and at sensitizing Croatian primary school teachers to the importance of adequate vocabulary use, vocabulary development and constructional knowledge for better reading skills, in which we piloted the teaching materials (available at: http://megahr.ffzg.unizg.hr/en/?page_id=749). The teaching materials are based on selecting vocabulary with adequate concreteness ratings to enable better reading comprehension, and on using adequate constructionalized formulae that make processing easier and signal elements of discourse structure. These theoretical and practical ideas were further developed in a book aimed at schoolteachers (Peti-Stantić, 2019).
Overall, the MEGACRO project theoretically explores and puts into practice the view whereby language is seen as sociocognitively grounded and constructional in all its guises: our mental lexicon, our knowledge of grammatical structure and its everyday use and processing. Further research includes looking into affective variables, the figurative potential of individual words and their relation to constructions and their processing.
Kousta, Stavroula-Thaleia, Gabriella Vigliocco, David P. Vinson, Mark Andrews, and Elena Del Campo. 2011. “The Representation of Abstract Words: Why Emotion Matters.” Journal of Experimental Psychology: General 140 (1): 14–34. https://doi.org/10.1037/a0021446.
Moguš, Milan, Maja Bratanić, and Marko Tadić. 1999. Hrvatski čestotni rječnik. Zagreb: Školska knjiga.
Paivio, Allan, John C. Yuille, and Stephen A. Madigan. 1968. “Concreteness, Imagery, and Meaningfulness Values for 925 Nouns.” Journal of Experimental Psychology 76 (1, Pt.2): 1–25. https://doi.org/10.1037/h0025327.
Peti-Stantić, Anita. 2019. Čitanjem do (spo)razumijevanja: od čitalačke pismenosti do čitateljske sposobnosti. Zagreb: Naklada Ljevak.
Peti-Stantić, Anita, Maja Anđel, Vedrana Gnjidić, Gordana Keresteš, Nikola Ljubešić, Irina Masnikosa, Mirjana Tonković, Jelena Tušek, Jana Willer-Gold, and Mateusz-Milan Stanojević. 2021. “The Croatian Psycholinguistic Database: Estimates for 6000 Nouns, Verbs, Adjectives and Adverbs.” Behavior Research Methods, April. https://doi.org/10.3758/s13428-020-01533-x.