Research resources

Below you'll find the official releases of the datasets (zipped csv files) in Rioplatense Spanish, English and Dutch. Please check the release version when trying to replicate published results.

Many hours of work have gone into this project and we gratefully acknowledge all the volunteers who have dedicated their time to contribute. If you find these data useful, please share the link to the study: smallworldofwords.org.
Don't hesitate to get in touch if you have any difficulties importing data, find mistakes or have other suggestions.

Rioplatense Spanish Data

Updated 17 September 2022

Rioplatense is the variant of Spanish spoken in South America, primarily in Uruguay and Argentina. The following dataset is currently under review and is likely to be subject to minor updates (e.g., spelling corrections).
Here we release the full raw data of the SWOW-RP project as well as a balanced preprocessed dataset in which each cue is judged by exactly 70 participants and responses are normalized and spell-checked. Scripts for preprocessing and evaluation can be found at https://github.com/almadana/SWOW-RP.

Sometimes it's convenient to know how many participants give a specific response to a cue. In this case, you should download the associative strength files (i.e. the conditional probability of a response given a cue).
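As a rough illustration of what associative strength means, the sketch below computes the conditional probability of each response given its cue from a list of (cue, response) pairs. The function name and the toy data are assumptions made for this example, and the released strength files may apply additional steps (for instance in how missing responses are handled), so treat this as a minimal sketch rather than the project's own procedure.

```python
from collections import Counter, defaultdict


def associative_strength(pairs):
    """Return {cue: {response: P(response | cue)}} from (cue, response) pairs."""
    counts = defaultdict(Counter)
    for cue, response in pairs:
        counts[cue][response] += 1

    strength = {}
    for cue, responses in counts.items():
        total = sum(responses.values())  # number of responses given to this cue
        strength[cue] = {r: n / total for r, n in responses.items()}
    return strength


# Toy data, not taken from the released files.
pairs = [("dog", "cat"), ("dog", "cat"), ("dog", "bone"), ("dog", "bark")]
strength = associative_strength(pairs)
print(strength["dog"]["cat"])  # 2 of the 4 responses to "dog" -> 0.5
```

The strengths for a single cue sum to 1, so they can be compared across cues that received different numbers of responses.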

The cue statistics provide information about which words were known and how many responses were missing for each cue.
Two files are available. The first contains statistics based on the first response each participant gave (R1); the second is based on all three responses given by each participant (R123).
The response statistics include response counts for tokens and for different types.
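The difference between R1 and R123 can be sketched as follows: R1 keeps only the first response to a cue, while R123 pools the first, second and third responses. The row layout and the column names "cue", "R1", "R2" and "R3" are assumptions for illustration, as is the use of None for a missing response; check the released files for the actual format.

```python
# Hypothetical participant rows: one cue with up to three responses each.
rows = [
    {"cue": "sol", "R1": "luna", "R2": "calor", "R3": "playa"},
    {"cue": "sol", "R1": "calor", "R2": "luz", "R3": None},  # third response missing
]


def collect_responses(rows, columns):
    """Collect (cue, response) pairs from the chosen columns, skipping missing responses."""
    return [
        (row["cue"], row[col])
        for row in rows
        for col in columns
        if row[col] is not None
    ]


r1 = collect_responses(rows, ["R1"])
r123 = collect_responses(rows, ["R1", "R2", "R3"])
print(len(r1), len(r123))  # 2 5
```

Statistics computed on R123 therefore rest on more responses per cue than those computed on R1, which is worth keeping in mind when comparing the two files.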


When using these data, please cite our PsyArXiv preprint:
Cabana, Á., Zugarramurdi, C., Lisboa, J. V., & De Deyne, S. (2022, September 19). The "Small World of Words" Free Association Norms for Rioplatense Spanish. https://doi.org/10.31234/osf.io/w6hc8

English Data

Updated 18 October 2018

Word association and participant data for 100 primary, secondary and tertiary responses to 12,292 cues. The data published in Behavior Research Methods were collected between 2011 and 2018. In the preprocessed data, cues and responses have been normalized by spell-checking them, correcting capitalization and Americanizing the spelling. The preprocessed file is also balanced, so that each cue is judged by exactly 100 participants (see the GitHub repository for details).

Sometimes it's convenient to know how many participants give a specific response to a cue. In this case, you should download the associative strength files (i.e. the conditional probability of a response given a cue).

The cue statistics provide information about which words were known and how many responses were missing for each cue. The response statistics include response counts for tokens and for different types.


When using these data, please cite: De Deyne, S., Navarro, D. J., Perfors, A., Brysbaert, M., & Storms, G. (2018). The “Small World of Words” English word association norms for over 12,000 cue words. Behavior Research Methods. DOI 10.3758/s13428-018-1115-7.

Scripts with a processing pipeline to analyse these data in R can be obtained from the SWOWEN-2018 GitHub repository. Note to R users: read the files with quoting disabled, as in the following readr command; otherwise the entire file might not be read in correctly.
X <- read_delim('strength.SWOW-EN.R123.csv', delim = '\t', quote = '', escape_backslash = FALSE, escape_double = FALSE)
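For Python users, the same precaution (treating quote characters as ordinary text in a tab-delimited file) can be sketched with the standard csv module. The in-memory sample, its column names and the function name are assumptions made for this example; they are not the layout of the released files.

```python
import csv
import io

# In-memory stand-in for a tab-delimited file. Column names and values are made up.
# The first data row starts with a stray double quote and the second contains an
# apostrophe, the kind of characters that can confuse a parser with quoting enabled.
sample = (
    "cue\tresponse\tstrength\n"
    '"hello\tworld\t0.5\n'
    "it's\ttime\t0.25\n"
)


def read_tab_delimited(handle):
    """Read tab-delimited rows, treating quote characters as ordinary text."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


records = read_tab_delimited(io.StringIO(sample))
print(len(records))  # 2
```

For a file on disk, the handle would come from something like open('strength.SWOW-EN.R123.csv', newline='', encoding='utf-8'); the encoding is an assumption and should be checked against the downloaded file.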

Dutch Data

Word association and participant data for 100 primary, secondary and tertiary responses to 12,571 cues as reported in De Deyne, Navarro and Storms (2013).

The most recent reference describes the Dutch data for cues and their responses collected until November 2010. It remains the most up-to-date reference and can be cited as: De Deyne, S., Navarro, D., & Storms, G. (2013). Better explanations of lexical and semantic cognition using networks derived from continued rather than single word associations. Behavior Research Methods, 45 (2), 480-498. Related publications about earlier versions of these data can be found here.

Data in other languages

Contact [email protected] for work-in-progress files in other languages.

Project interface

Note that the data can be accessed on the project page as well, but these data correspond to a work-in-progress snapshot with limited preprocessing. Furthermore, the purpose of the project interface's explore and visualization options is to make the data accessible to a wide audience, not only to scientists. Among other things, this means that only the strongest responses are shown. We are currently working on an interface to facilitate querying the published datasets online, which will provide more advanced functionality.

License and fair use

The data are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. They cannot be redistributed or used for commercial purposes.
