Denis Vlašiček, Mirjana Tonković, Francesca Dumančić

Determinants of lexical decision times for nouns from the Croatian Psycholinguistic Database

Department of Psychology, Faculty of Humanities and Social Sciences, University of Zagreb


Large databases containing lexical norms (such as word frequency or word length) and semantic norms (such as imageability or emotional valence) have been developed in many languages and have proven to be a useful research tool. When combined with data from cognitive tasks (such as the lexical decision task or the word-naming task), they allow researchers to explore the contribution of various variables to language processing, and may serve as a training and testing ground for novel hypotheses and language models. The Croatian Psycholinguistic Database (CPD; Peti-Stantić et al., 2021) contains norms for concreteness, imageability, subjective frequency, and age of acquisition for 6000 Croatian nouns, verbs, adverbs, and adjectives. Objective word frequency and word length are also available. We used this database as a starting point for a megastudy and collected lexical decision data for all nouns in the CPD.

The aim of this study was to explore how well the psycholinguistic and lexical variables available in the CPD predict lexical decision times for 2613 nouns from the CPD, and to determine how much of the variance in lexical decision times these variables can explain. We additionally calculated a measure of orthographic similarity – the orthographic Levenshtein distance (Yarkoni et al., 2008) – and explored its role in explaining the variance of lexical decision times. Orthographic similarity refers to how many letters two words have in common. The orthographic Levenshtein distance is a continuous measure of that similarity, that is, of the density of a word's orthographic neighborhood. Larger values are given to words with sparser neighborhoods, i.e., to words that do not share many letters with other words in the language.
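To make the measure concrete, the following is a minimal sketch of how an orthographic Levenshtein distance could be computed. The Levenshtein distance between two words is the minimum number of single-letter insertions, deletions, or substitutions needed to turn one into the other; the measure for a word is then its mean distance to its k closest words in a lexicon. Yarkoni et al. (2008) use the 20 closest neighbors (OLD20); the value of k used in this study is not stated, so `k=20` below is an assumption, as are the function names and the toy lexicon.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def mean_nearest_distance(word: str, lexicon: list[str], k: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its k closest words in
    `lexicon` (the word itself is excluded). k=20 is an assumption."""
    distances = sorted(levenshtein(word, other)
                       for other in lexicon if other != word)
    nearest = distances[:k]
    if not nearest:
        raise ValueError("lexicon contains no other words")
    return sum(nearest) / len(nearest)


# Toy lexicon for illustration only; a real analysis would use a full word list.
toy_lexicon = ["kuća", "kula", "kosa", "rosa", "roba", "riba", "ruka", "muka"]
dense_example = mean_nearest_distance("kosa", toy_lexicon, k=2)
```

Lower values indicate a denser orthographic neighborhood, since the closest words are only a few letter edits away.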

Participants completed a lexical decision task in two separate one-hour sessions. They were presented with letter strings that were either nouns from the CPD or pseudowords generated in Wuggy (Keuleers & Brysbaert, 2010). Each participant responded to 1000 words and 1000 pseudowords in total, divided into ten blocks of 200 trials each. Participants completed five blocks in each session, with a short break after every block. The order of blocks and the order of stimuli within each block were randomized. Stimuli were presented using E-Prime 3.0 (Psychology Software Tools, Pittsburgh, PA), and responses were collected using a Chronos device. We currently have between 24 and 77 responses for each noun.
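The trial structure described above can be sketched as follows. This is only an illustration of one way to split 2000 stimuli into ten randomized blocks across two sessions; the actual experiment was run in E-Prime, and how words and pseudowords were assigned to blocks is not stated. The sketch assumes they were simply mixed at random, and the function name `build_sessions` is hypothetical.

```python
import random


def build_sessions(words, pseudowords, n_blocks=10, n_sessions=2, seed=None):
    """Shuffle all stimuli, cut them into equally sized blocks, and split
    the blocks evenly across sessions.

    Assumption: words and pseudowords are mixed at random across blocks;
    the source does not say whether blocks were balanced by stimulus type.
    """
    rng = random.Random(seed)
    trials = ([(w, "word") for w in words]
              + [(p, "pseudoword") for p in pseudowords])
    if len(trials) % n_blocks != 0 or n_blocks % n_sessions != 0:
        raise ValueError("stimuli and blocks must divide evenly")
    rng.shuffle(trials)  # random order of stimuli, hence random block contents
    size = len(trials) // n_blocks
    blocks = [trials[i * size:(i + 1) * size] for i in range(n_blocks)]
    per_session = n_blocks // n_sessions
    return [blocks[s * per_session:(s + 1) * per_session]
            for s in range(n_sessions)]
```

With 1000 words and 1000 pseudowords, this yields two sessions of five blocks with 200 trials each, matching the design described above.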

We conducted a hierarchical linear regression in three steps, using lexical decision times (in milliseconds) as the criterion. In the first step, we entered word frequency (per million). In the second step, we introduced a measure of orthographic similarity – the orthographic Levenshtein distance. In the final step, we introduced three psycholinguistic variables – age of acquisition, concreteness, and subjective frequency ratings. We did not include word length in the regression model because it is highly correlated with the orthographic Levenshtein distance. Imageability was also excluded, because it is highly correlated with concreteness.
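The logic of a hierarchical regression can be sketched as below: predictors are entered in blocks, and after each step we record the total R² and the increase in R² over the previous step. This is a minimal illustration on synthetic data using ordinary least squares in NumPy, not the analysis code used in the study; the simulated variables, their coefficients, and the function names are all assumptions made for the example.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(steps, y):
    """Enter predictor blocks one step at a time.

    `steps` is a list of 2-D arrays (rows = items, columns = predictors
    added at that step). Returns a list of (R^2, change in R^2) per step.
    """
    X = np.empty((len(y), 0))
    previous = 0.0
    results = []
    for block in steps:
        X = np.column_stack([X, block])
        r2 = r_squared(X, y)
        results.append((r2, r2 - previous))
        previous = r2
    return results


# Synthetic data for illustration only; coefficients are arbitrary.
rng = np.random.default_rng(0)
n_items = 500
frequency = rng.normal(size=n_items)
old = rng.normal(size=n_items)              # orthographic Levenshtein distance
ratings = rng.normal(size=(n_items, 3))     # AoA, concreteness, subjective frequency
rt = (650 - 15 * frequency + 20 * old + 10 * ratings[:, 0]
      - 40 * ratings[:, 2] + rng.normal(scale=50, size=n_items))

results = hierarchical_r2([frequency[:, None], old[:, None], ratings], rt)
```

Because each step's model contains all predictors from the previous step, R² can only stay the same or increase from one step to the next; the question of interest is whether the increase is large enough to be statistically significant.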

In the first step, word frequency per million (b = -0.30, p < .001) was a significant predictor of lexical decision times: more frequent words had shorter lexical decision times. The first model accounts for 7.22% of the variance in reaction times. In the second step of the hierarchical regression, we added the orthographic Levenshtein distance. The results show that word frequency per million (b = -0.25, p < .001) and orthographic Levenshtein distance (b = 25.68, p < .001) both predict lexical decision times. More frequent words and words with denser orthographic neighborhoods were identified faster in the lexical decision task. Adding the orthographic Levenshtein distance explained an additional 4.83% of the variance, raising the total variance explained to 12.05%. This change is statistically significant (F(1, 2610) = 143.43, p < .001).

In the final step, we added age of acquisition, concreteness, and subjective frequency ratings. Word frequency per million ceased to be a significant predictor (b = -0.002, p = .93); instead, subjective frequency ratings were predictive of lexical decision times (b = -83.89, p < .001). Orthographic similarity as measured by the orthographic Levenshtein distance (b = 11.08, p < .001) remained predictive in this final step. Age of acquisition (b = 7.2, p < .001) was also a statistically significant predictor of lexical decision times, but concreteness (b = -4.3, p = .113) was not. The final model explains 44.03% of the variance in lexical decision times, which is 31.97% more than the previous model (F(3, 2607) = 496.44, p < .001). Subjectively more frequent words, words with denser orthographic neighborhoods, and words acquired earlier in life had shorter lexical decision times.
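As a rough consistency check, the F statistics for the change in R² can be recomputed from the reported percentages with the standard formula F = (ΔR² / k) / ((1 − R²_full) / (n − p − 1)), where k is the number of predictors added, p the number of predictors in the fuller model, and n the number of observations. The sketch below assumes n = 2613 (one observation per noun), which matches the reported degrees of freedom; because the inputs are rounded percentages, the results only approximate the reported values.

```python
def f_change(r2_reduced, r2_full, n_obs, n_predictors_full, n_added):
    """F test for the increase in R^2 between two nested regression models.

    Returns (F, numerator df, denominator df).
    """
    df_residual = n_obs - n_predictors_full - 1
    f_value = ((r2_full - r2_reduced) / n_added) / ((1.0 - r2_full) / df_residual)
    return f_value, n_added, df_residual


N_NOUNS = 2613  # assumption: item-level analysis with one row per noun

# Step 2: frequency -> frequency + orthographic Levenshtein distance
step2 = f_change(0.0722, 0.1205, N_NOUNS, n_predictors_full=2, n_added=1)

# Step 3: adds age of acquisition, concreteness, and subjective frequency
step3 = f_change(0.1205, 0.4403, N_NOUNS, n_predictors_full=5, n_added=3)
```

Both recomputed values come out within about one unit of the reported statistics, with degrees of freedom (1, 2610) and (3, 2607), so the reported R² changes and F tests are mutually consistent up to rounding.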

These findings are in line with previous studies (e.g. Brysbaert et al., 2016; Soares et al., 2019), and are, in that sense, not surprising. They show that response times to nouns in the lexical decision task can be explained by lexical and psycholinguistic variables. Among the variables examined, frequency and subjective frequency seem to be the most relevant, which is also in line with many earlier findings.



Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance, 42, 441-458.

Keuleers, E., & Brysbaert, M. (2010). Wuggy: A multilingual pseudoword generator. Behavior Research Methods, 42, 627.

Peti-Stantić, A., Anđel, M., Gnjidić, V., Keresteš, G., Ljubešić, N., Masnikosa, I., … & Stanojević, M. M. (2021). The Croatian psycholinguistic database: estimates for 6000 nouns, verbs, adjectives and adverbs. Behavior Research Methods, 1-18.

Soares, A. P., Lages, A., Silva, A., Comesaña, M., Sousa, I., Pinheiro, A. P., & Perea, M. (2019). Psycholinguistic variables in visual word recognition and pronunciation of European Portuguese words: A mega-study approach. Language, Cognition and Neuroscience, 34, 689-719.

Yarkoni, T., Balota, D., & Yap, M. (2008). Moving beyond Coltheart’s N: A new measure of orthographic similarity. Psychonomic Bulletin & Review, 15, 971-979.
