Below you'll find the official releases of datasets (zipped csv) in Dutch and English. Both projects are a work-in-progress and new releases will be added.
Many hours of work have been put into this project, and we gratefully acknowledge all the volunteers who have dedicated their time to contribute.
If you find these data useful, please share the link to the study: smallworldofwords.org.
Don't hesitate to get in touch if you have any difficulties importing the data, find mistakes, or have other suggestions.
Word association and participant data for 100 primary, secondary and tertiary responses to 12,292 cues.
The data published in Behavior Research Methods were collected between 2011 and 2018.
The preprocessed data contain cues and responses that were normalized by spell-checking them, correcting capitalization and Americanizing the spelling.
In addition to the normalized cues and responses, the preprocessed file contains data in which each cue was judged by exactly 100 participants (see the GitHub repository for details).
Sometimes it's convenient to know how many participants give a specific response to a cue. In this case, you should download the associative strength files (i.e. the conditional probability of a response given a cue).
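For readers who want to see what associative strength means in practice, the following is a minimal Python sketch of the conditional probability of a response given a cue, computed from raw cue-response pairs. The function name and the toy pairs are made up for illustration and are not taken from the SWOW files; how the published strength files treat missing or unknown responses is not covered here.

```python
from collections import Counter, defaultdict


def associative_strength(pairs):
    """Return {cue: {response: P(response | cue)}} from (cue, response) pairs.

    Hypothetical helper for illustration; not part of the SWOW pipeline.
    """
    counts = defaultdict(Counter)
    for cue, response in pairs:
        counts[cue][response] += 1

    strength = {}
    for cue, responses in counts.items():
        total = sum(responses.values())  # all responses given to this cue
        strength[cue] = {r: n / total for r, n in responses.items()}
    return strength


# Toy, made-up data: four responses to "dog" and two to "cat".
toy_pairs = [
    ("dog", "cat"), ("dog", "cat"), ("dog", "bone"), ("dog", "bark"),
    ("cat", "dog"), ("cat", "mouse"),
]
toy_strength = associative_strength(toy_pairs)
print(toy_strength["dog"]["cat"])  # 2 of the 4 "dog" responses are "cat"
```

Within each cue the strengths sum to 1, so multiplying a strength by the number of responses to that cue recovers the response count.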
The cue statistics provide information about which words were known and how many responses were missing for each cue. The response statistics include response counts for tokens and for different types.
When using these data, please cite: De Deyne, S., Navarro, D. J., Perfors, A., Brysbaert, M., & Storms, G. (2018). The “Small World of Words” English word association norms for over 12,000 cue words. Behavior Research Methods. DOI 10.3758/s13428-018-1115-7.
Scripts with a processing pipeline to analyse these data in R can be obtained from the SWOWEN-2018 GitHub repository.
Note to R users: use the following command (read_delim from the readr package) to switch off quote handling; otherwise the entire file might not be read in correctly.
library(readr)
X <- read_delim('strength.SWOW-EN.R123.csv', delim = '\t', quote = '', escape_backslash = FALSE, escape_double = FALSE)
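Python users may run into the same quoting problem. The sketch below shows one way to read a tab-delimited file with quote handling switched off, using only the standard library. The helper name, the column names and the sample rows are hypothetical; only the tab delimiter and the need to disable quoting come from the note above.

```python
import csv
import io


def read_tab_delimited(handle):
    """Read tab-delimited rows as dicts, treating quote characters as plain text.

    Hypothetical helper for illustration; not part of the SWOW scripts.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Made-up sample with stray quotation marks in the response column.
sample = io.StringIO(
    'cue\tresponse\tstrength\n'
    'say\t"hello"\t0.25\n'
    'inch\t12"\t0.10\n'
)
rows = read_tab_delimited(sample)
print(len(rows), rows[1]["response"])

# For a real file (the encoding is an assumption):
# with open("strength.SWOW-EN.R123.csv", newline="", encoding="utf-8") as f:
#     rows = read_tab_delimited(f)
```

With quoting disabled, an unbalanced quotation mark inside a response stays part of the field instead of swallowing the lines that follow it. If you prefer pandas, passing sep="\t" and quoting=csv.QUOTE_NONE to read_csv should have a similar effect.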
Word association and participant data for 100 primary, secondary and tertiary responses to 12,571 cues as reported in De Deyne, Navarro and Storms (2013).
The most recent reference describes the Dutch data for cues and their responses collected until November 2010. It remains the most up-to-date reference and can be cited as: De Deyne, S., Navarro, D., & Storms, G. (2013). Better explanations of lexical and semantic cognition using networks derived from continued rather than single word associations. Behavior Research Methods, 45(2), 480-498. Related publications about earlier versions of these data can be found here.
Contact [email protected] for work-in-progress files in other languages.
Note that the data can be accessed on the project page as well, but these data correspond to a work-in-progress snapshot with limited preprocessing. Furthermore, the purpose of the explore and visualization options in the project interface is to make the data accessible to a wide audience, not only to scientists. Among other things, this means that only the strongest responses are shown. We are currently working on an interface for querying the published datasets online, which will provide more advanced functionality.
The data are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. They cannot be redistributed or used for commercial purposes.