Below you'll find the official releases of the datasets (zipped CSV) in Dutch and English. Both projects are works in progress and new releases will be added. Many hours of work have been put into this project and many people have dedicated their time to contribute, so if you find any of this useful, please consider sharing the URL of the study. Also, don't hesitate to let us know if you have any difficulties importing the data, find mistakes or have other suggestions.

English Data

Word association and participant data for 100 primary, secondary and tertiary responses to 12,292 cues. These data were collected between 2011 and 2018 and are currently submitted for publication. Note that the final version is subject to change. In the preprocessed data, cues and responses have been normalized by spell-checking them, correcting capitalization and Americanizing the spelling. In addition to normalizing cues and responses, the preprocessing script also extracts a balanced dataset, in which each cue is judged by exactly 100 participants.

Sometimes it's convenient to know how many participants give a specific response to a cue. In this case, you should download the associative strength files (i.e. the conditional probability of a response given a cue).
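As a rough illustration of what the associative strength files contain, the conditional probability of a response given a cue can be estimated by dividing the number of times a response was given to a cue by the total number of responses to that cue. The sketch below assumes the data have already been loaded as (cue, response) pairs; the function name and the toy pairs are hypothetical and do not come from the released files.

```python
from collections import Counter, defaultdict


def associative_strength(pairs):
    """Return {cue: {response: P(response | cue)}} from (cue, response) pairs."""
    counts = defaultdict(Counter)
    for cue, response in pairs:
        counts[cue][response] += 1

    strengths = {}
    for cue, responses in counts.items():
        total = sum(responses.values())  # all responses given to this cue
        strengths[cue] = {r: n / total for r, n in responses.items()}
    return strengths


# Toy pairs invented for illustration only (not taken from the dataset).
pairs = [("dog", "cat"), ("dog", "cat"), ("dog", "bone"), ("dog", "bark")]
strengths = associative_strength(pairs)
print(strengths["dog"]["cat"])  # 0.5, since 2 of the 4 responses to "dog" are "cat"
```

The released strength files may be computed differently (for example, per response position or after removing missing responses), so this is only meant to make the definition concrete.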

The cue statistics provide information about which words were known and how many responses were missing for each cue. The response statistics include response counts for tokens and for distinct types.
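To make the distinction between tokens and types concrete, the sketch below counts, for each cue, the number of missing responses, the number of response tokens (all responses given) and the number of response types (distinct responses). It assumes a hypothetical in-memory layout in which each row holds a cue and its three responses, with None marking a missing response; the released files may encode missing responses differently and may aggregate these statistics in another way.

```python
from collections import Counter


def cue_statistics(rows):
    """rows: iterable of (cue, [r1, r2, r3]); None marks a missing response."""
    stats = {}
    for cue, responses in rows:
        entry = stats.setdefault(cue, {"missing": 0, "counts": Counter()})
        for response in responses:
            if response is None:
                entry["missing"] += 1
            else:
                entry["counts"][response] += 1

    return {
        cue: {
            "missing": entry["missing"],
            "tokens": sum(entry["counts"].values()),  # all non-missing responses
            "types": len(entry["counts"]),            # distinct responses
        }
        for cue, entry in stats.items()
    }


# Toy rows invented for illustration only (not taken from the dataset).
rows = [
    ("dog", ["cat", "bone", None]),
    ("dog", ["cat", "bark", "leash"]),
]
stats = cue_statistics(rows)
print(stats["dog"])  # {'missing': 1, 'tokens': 5, 'types': 4}
```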

When using these data, please cite: De Deyne, S., Navarro, D., Perfors, A., Brysbaert, M. & Storms, G. (2018). Measuring the associative structure of English: The “Small World of Words” norms for word association. Manuscript submitted for publication.

Scripts with a processing pipeline to analyse these data in R can be obtained from the SWOWEN-2018 GitHub repository.

Dutch Data

Word association and participant data for 100 primary, secondary and tertiary responses to 12,571 cues as reported in De Deyne, Navarro and Storms (2013).

The most recent reference describes the Dutch data for cues and their responses collected until November 2010. It remains the most up-to-date reference and can be cited as: De Deyne, S., Navarro, D., & Storms, G. (2013). Better explanations of lexical and semantic cognition using networks derived from continued rather than single word associations. Behavior Research Methods, 45 (2), 480-498. Related publications about earlier versions of these data can be found here.

Data in other languages

Contact [email protected] for work-in-progress files in other languages.

Project interface

Note that the data can be accessed on the project page as well, but these data correspond to a work-in-progress snapshot with limited preprocessing. Furthermore, the purpose of the project interface's exploration and visualization options is to make the data accessible to a wide audience, not only to scientists. Among other things, this means that only the strongest responses are shown. We are currently working on an interface to facilitate querying the published datasets online, which will provide more advanced functionality.

License and fair use

The data are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. They cannot be redistributed or used for commercial purposes.

