Research aims and objectives, methodology

The overarching goal of the project The Building Blocks of Croatian Mental Grammar: Constraints of Information Structure is to determine the building blocks of the information structure of Croatian mental grammar and describe the correlations between lexical and grammatical patterns that determine the mental grammar of the information structure of Croatian. On the basis of this, we will establish a set of necessary, universal and language-specific patterns that determine the information structure of Croatian, both on the morphosyntactic and the semantic level. We are predominantly interested in local (linear) and global (hierarchical) processing constraints in the mental grammar of Croatian (Peti-Stantić 2005, Pickering and Ferreira 2008).

More specifically, based on corpus and psycholinguistic research, we will look into the ways in which speakers process morphosyntactic and semantic agreement of concrete and/or abstract lexical items in coordinated constructions and elliptical conditions, as well as the ways in which they process sentence information structure with regard to complex clitic cluster positioning.

The computational objective of the project is to compile a detailed annotated and structured database (which goes beyond a mere list), which will be linked to corpus data and will be a prerequisite for artificial intelligence tasks.

The psycholinguistic analytical objective is to build a repository of constructional frames at the interface of syntax and semantics that will deal with gender agreement, ellipsis and clitic cluster scrambling in Croatian. These frames will be grounded in fine-grained semantic analysis, which will take into account the gradience of the categories of concreteness and imageability, as well as the variability of semantic representation of the three grammatical phenomena. The phenomena in question have been selected because of the gradual complexity of interface relations between syntax and semantics and will serve as the foundation of a network of grammatical patterns that form the mental grammar of Croatian.

The socially relevant goal of the project is to sensitize a group of Croatian language teachers and their pupils to the importance of lexical and grammatical relations in the mental grammar of Croatian for the development of language competence.

As a result, our approach will have theoretical consequences for modelling speakers’ language comprehension and production as well as for identifying and connecting the global and local building blocks in the mental grammar of Croatian. Practical consequences of the project are related to the improvement of pupils’ productive linguistic competence within the Croatian educational system through curricular reform.

We envisage three possible future research directions resulting from the project. The first is a typological comparison of the selected parameters, initially among Slavic languages only and subsequently across other languages of the world. We believe that the linguistic insights resulting from the analysis of our material will eventually facilitate the establishment of hierarchical and linear constraints of the syntactic and semantic argument structure and the constraints of information structure. The second is the development of possible links between the semantic and grammatical results of our project and figurative language, particularly metaphor and metonymy, which may also have psycholinguistic significance, as well as significance for computational modelling of figurative language. Finally, the results of the project will establish a baseline for future applied linguistic research on the productive language competence of learners of various age groups.

The proposed project is envisaged as a series of interlinked quantitative and qualitative studies based on corpus analyses, experimental and correlational approaches, all of which are based on the overarching topic of examining the necessary, universal and language-specific phenomena that represent the conditions determining the agreement potential in Croatian on morphosyntactic levels and the deeper, semantic ones.

In targeting the syntax-semantics interface, we combine a theory-driven introspective top-down approach with a corpus-based bottom-up approach. This enables us to address two main research points, namely the complexity of systematic relations and the prototypicality and variability of constructions (e.g. Vranić and Tonković 2011; Peti-Stantić 2014; Peti-Stantić 2014a; Klubička and Ljubešić 2015).

When analyzing data, in order to establish relational patterns in both the lexical domain (categories of concreteness and imageability) and the grammatical domain (agreement, ellipsis and scrambling), we will conduct quantitative statistical analysis alongside qualitative linguistic analysis. These complementary approaches enable us to analyze and describe the phenomena at hand at different levels of specificity.

Methodologically, in researching the semantic category of concreteness, we will build human analyst intuitions into a computational system in order to produce a semi-automated system with carefully calibrated constraints (Agić and Ljubešić 2014).

Pre-experiment/Experiment preparation: All team members will cooperate to provide a viable design (factorial design, Latin square design, inter- or intra-group design, etc.), experimental material, and an adequate number of participants to support the experimental design and provide sufficient statistical power; where necessary, the material will be adapted to different age ranges. The experimental material depends on the task (elicited production, self-paced reading, RSVP, lexical decision task with target words, filler words and non-words, gap-filler task, recognition memory task, syntactic classification task) and on the manner/modality of execution (production, written, judgement).
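As an illustration of the Latin square design mentioned above, the following minimal sketch rotates items through conditions so that every participant list contains each item once and every item appears in each condition across lists. The item labels and condition names are hypothetical placeholders, not the project's actual materials.

```python
def latin_square_lists(items, conditions):
    """Build one presentation list per condition.

    Each list contains every item exactly once; across lists, every item
    is paired with every condition exactly once.
    """
    n = len(conditions)
    lists = []
    for list_idx in range(n):
        assignment = [
            (item, conditions[(item_idx + list_idx) % n])
            for item_idx, item in enumerate(items)
        ]
        lists.append(assignment)
    return lists


# Hypothetical items and conditions (e.g. order of conjuncts differing in gender)
items = ["item01", "item02", "item03", "item04"]
conditions = ["masc_fem", "fem_masc"]
lists = latin_square_lists(items, conditions)

for number, presentation_list in enumerate(lists, start=1):
    print(f"List {number}: {presentation_list}")
```

When the number of items is a multiple of the number of conditions, each list also contains the same number of items per condition, which keeps the design balanced within participants.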

The first phase is a corpus study, in which the 1.9-billion-token hrWaC corpus and the hrLeX inflectional lexicon will be used to extract a sample of 3000 high-frequency and low-frequency nouns, verbs and adjectives. The hrWaC corpus is used because its characteristics (size, up-to-date language, non-edited texts) make it a good source for modelling and data extraction. Moreover, hrWaC will be used to extract all possible combinations of verbal and pronoun clitic clusters along with their frequency data (Ljubešić, Stanojević, doctoral researcher, postdoctoral researcher, Tušek).
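The sampling step described above could be sketched roughly as follows. This is only an illustration on a toy list of lemmas: the median split into "high" and "low" frequency bands, the function name and the example words are assumptions, and the actual extraction from hrWaC would rely on the corpus's own tooling and frequency thresholds.

```python
import random
from collections import Counter


def stratified_frequency_sample(lemmas, per_band, seed=0):
    """Count lemma frequencies and sample lemmas from two frequency bands.

    Assumption: the bands are defined by a simple median split of the
    frequency ranking; the project may use different thresholds.
    """
    counts = Counter(lemmas)
    # Rank by descending frequency, breaking ties alphabetically
    ranked = sorted(counts, key=lambda lemma: (-counts[lemma], lemma))
    midpoint = len(ranked) // 2
    high, low = ranked[:midpoint], ranked[midpoint:]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = {
        "high": sorted(rng.sample(high, min(per_band, len(high)))),
        "low": sorted(rng.sample(low, min(per_band, len(low)))),
    }
    return sample, counts


# Toy lemma stream: kuća 'house', stol 'table', ideja 'idea', sloboda 'freedom'
tokens = ["kuća"] * 5 + ["stol"] * 4 + ["ideja"] * 2 + ["sloboda"] * 1
sample, counts = stratified_frequency_sample(tokens, per_band=1)
print(sample)
```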

In the second phase, these lexical and grammatical items will be edited and divided into manageable sets to be used in psycholinguistic experiments and questionnaires (Peti-Stantić, Stanojević, Anđel, Willer Gold).

In the third phase, the questionnaires and experiments will be conducted on the adult and school populations. The research will include: speaker judgments about abstractness and imageability, production (gender agreement), grammaticality judgments (clitic placement, agreement and ellipsis), comprehension studies and masked priming (ellipsis, clitic placement). Interviews will be conducted with teachers and pupils in order to obtain qualitative data on their understanding of possible difficulties in using abstract and concrete vocabulary and ellipsis, clitic placement, and agreement (Tonković, Keresteš, Peti-Stantić, Stanojević, doctoral researcher, postdoctoral researcher).

In the fourth phase, based on the aforementioned results, we will simulate the obtained classification using an artificial neural network in supervised learning, which will facilitate modelling the parameters of mental grammar (Anđel, Ljubešić).
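To make the idea of supervised learning concrete, the sketch below trains a single logistic unit, the simplest possible stand-in for the artificial neural network mentioned above, to separate concrete from abstract words. The ratings, the scaling and the labels are invented for illustration; the project's actual network architecture and input features are not specified here.

```python
import math


def train_logistic_unit(samples, epochs=2000, learning_rate=0.5):
    """Train one sigmoid unit by stochastic gradient descent.

    `samples` is a list of (features, label) pairs with labels 0 or 1.
    """
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = 1.0 / (1.0 + math.exp(-z))
            error = label - prediction
            # Gradient step on the log-loss for this single example
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, features):
    """Return 1 (concrete) if the unit's activation is positive, else 0."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if z > 0 else 0


# Hypothetical mean ratings (concreteness, imageability) on a 7-point scale,
# divided by 7 so that all feature values are at most 1
training = [
    ((6.5 / 7, 6.0 / 7), 1),  # rated as concrete
    ((6.0 / 7, 6.5 / 7), 1),
    ((2.0 / 7, 2.5 / 7), 0),  # rated as abstract
    ((1.5 / 7, 2.0 / 7), 0),
]
weights, bias = train_logistic_unit(training)
print([predict(weights, bias, features) for features, _ in training])
```

A realistic model would use many more items, held-out test data and hidden layers; the sketch only shows the training loop that "supervised" refers to.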

The fifth phase will consist of theoretical modelling of the obtained results using parallel architecture (all researchers and consultants).

Post-experiment/Data analysis: All team members will, within their particular areas of expertise, work on interpreting the results with the aim of extracting maximal information from the collected data and creating an easily searchable and accessible presentation of the results that can be reused as data in follow-up experiments or in project presentations. Based on the results, the team will formulate new, data-informed hypotheses which, together with more sophisticated statistical methods (linear mixed-effects models, Bayesian statistics), will allow a deeper analysis of the collected data and provide more nuanced information.

In what follows we focus on the methodological details connected with the central, third phase, given that it is methodologically the most complex one and relates to working with participants.

Sample

Two groups of participants will take part in the study. One group will consist of healthy native speakers of Croatian (mainly university students). The second group will consist of three subgroups of primary and secondary school pupils (4th grade of primary school, 7th or 8th grade of primary school and 3rd grade of secondary school). These groups have been selected on the basis of existing research which suggests that there is a difference between mental grammars of the same language depending on different social and age groups (Dąbrowska 2012). These studies particularly emphasize that one segment of the differences, which relates to the reproductive and productive use of complex morphosyntactic and lexical structures, is quantitatively and qualitatively different and contingent on education. This crucial difference manifests itself as the (varied) ability to acquire basic lexical and grammatical patterns of a language as well as complex patterns whose acquisition during the education process is stimulated by a number of related procedures of awareness-raising, primarily related to semantic lexical networks which both widen and deepen the mental lexicon of an individual. Moreover, these groups, particularly the pupil population, have been selected in order to determine the decisive moment in the educational process when it is developmentally and educationally necessary to change the approach and move from a reproductive acquisition of grammatical principles to productive acquisition in context.

Procedures

Quantitative research:

The studies will be conducted using computers, pen-and-paper and online questionnaires, depending on the aim of the particular study and the planned sample.

Experimental studies will include lexical or semantic decision tasks, sentence completion tasks, and classification, recognition, recall and judgment tasks. Participants will answer using devices that can measure reaction times precisely, using a keyboard or orally. Pen-and-paper studies will include tasks to assess the abstractness and imageability of words, using Likert-type scales. Moreover, studies will include tasks assessing reading comprehension.
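The Likert-type ratings described above would typically be aggregated into per-word norms. The following sketch assumes a 7-point scale and invented ratings; the scale length, the example words and the function name are illustrative assumptions only.

```python
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Return the number of raters, mean and standard deviation per word."""
    summary = {}
    for word, scores in ratings.items():
        summary[word] = {
            "n": len(scores),
            "mean": round(mean(scores), 2),
            "sd": round(stdev(scores), 2),
        }
    return summary


# Hypothetical concreteness ratings on an assumed 7-point scale
# (stol 'table', sloboda 'freedom')
ratings = {
    "stol": [7, 6, 7, 6],
    "sloboda": [2, 1, 2, 3],
}
summary = summarize_ratings(ratings)
for word, stats in summary.items():
    print(word, stats)
```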

Qualitative studies will consist of targeted interviews with teachers and pupils.

Data analysis will be conducted using SPSS or the publicly available statistical software JASP.

The methodological innovativeness of this research is primarily reflected in connecting theoretically informed qualitative and quantitative psycholinguistically relevant studies, which will enable the establishment of a predictive lexical analysis on the level of the corpus, as well as on the level of mental grammar. In this way we will establish a completely new link between lexicon and grammar and make it possible to understand the deep relations that result from conceptualizing linguistic structure without making a clear-cut distinction between lexicon and grammar, but rather seeing them as dependent on each other. Such research has not thus far been done for Croatian. Because of the specific features and richness of the morphological and syntactic structure of Croatian, our research will make it possible to look into both assumed universal lexical features (such as abstractness and concreteness) and specific grammatical features of Croatian (such as syntactic agreement of coordinated elements differing in gender, discourse semantic agreement with pronoun omission, and the influence of word order shift on the information structure of a sentence). On the one hand, this will lead to new insights which will inform theoretical linguistics, placing Croatian as a relevant point on the linguistic map of the world. On the other hand, it will make it possible to establish a database of specific lexico-grammatical problems connected with Croatian, which will be used as a basis for improving education, primarily the skills of deep reading and reading comprehension.

Feasibility and risk management. Previous experience of project collaborators, both in their individual studies and their work in teams, qualifies all the researchers to conduct such a comprehensive study which includes a sequence of psycholinguistic experiments. The project coordinator in particular has taken part in a number of psycholinguistic and sociolinguistic studies with adults, teachers and primary and secondary school pupils throughout Croatia, most of which she conducted as the leader. This qualifies her to lead the team in conducting such a challenging series of psycholinguistic studies that deal with information structure and deep lexical-grammatical relations. One of the possible risks is the need for good organization of all research phases so as to achieve the projected goals. This risk will be addressed by providing a detailed plan of all the phases, by holding regular meetings (at least once every six months, or more frequently if needed), and by monitoring project tasks and making adjustments if necessary. Moreover, each research segment will be coordinated by a single researcher, and the leader will coordinate work on the entire project. Some, although minimal, risks are related to the need to plan our research so as to include a representative sample of students and pupils from various parts of Croatia. Given that the project coordinator has organized and conducted sociolinguistic and psycholinguistic research on multiple occasions and taken part in the development of the proposal of the linguistic-communicational document within the framework of the comprehensive curricular reform, the contacts from these activities will be helpful in enlisting a sufficient number of schools interested in participation.
In the last eight years, Gordana Keresteš has also been the leader of the Croatian segment of two international scholarly projects dealing with initial reading and has developed a cooperation with several primary schools in Zagreb and the Zagreb County, which is why we believe that we will have no significant problems in enlisting a sufficient number of interested schools to be able to conduct a relevant and controlled sequence of psycholinguistic studies. In order to make this possible, the first and second project year, in addition to the main task of determining the specific relevant issues in the relationship between the lexical and grammatical structure, will be devoted to establishing contacts with participants interested in our research project and obtaining the necessary documents so as to conduct the study according to all relevant ethical principles. We consider that in any research, and particularly in a study of the native language that involves students and pupils, it is necessary to establish a partnership with individual participants as well as representatives of competent institutions, as this is the only way to ensure that participants remain interested in the research itself as well as its results, and the possible improvements that these results may bring to the teaching process.

Although we are aware of the possible risk related to the perception of school principals and/or teachers that this study will be an additional burden on the time planned for teaching, which is limited anyway, we hope that the positive atmosphere created in teacher meetings throughout Croatia that was connected with the presentation of the Integrated curricular reform, as well as an awareness of relatively poor scores by Croatian students on PISA tests and the need to improve the teaching process with regard to the linguistic and communication competence, will all contribute to their willingness to take part in our research project.