Research resources

This page contains the officially released datasets for Slovenian (SWOW-SL), Rioplatense Spanish (SWOW-RP), English (SWOW-EN), Dutch (SWOW-NL), and Mandarin Chinese (SWOW-ZH). Please check the release version when replicating published results. For consistency, use the dataset acronym (SWOW-SL, SWOW-RP, SWOW-EN, SWOW-NL, SWOW-ZH) together with the release date as the version identifier. Do not hesitate to get in touch if you have any questions or suggestions.


Acknowledgements and Fair Use

Many hours of work have been put into this project, and we gratefully acknowledge all the volunteers who have dedicated their time to contribute. If you find these data useful, PLEASE SHARE THE LINK TO THE STUDY. This small gesture makes a huge difference in helping us keep the project going.

https://smallworldofwords.org

Your support is greatly appreciated!


Releases

Raw and processed data. For each of these languages, we release the raw data and a balanced data file where each cue has the same number of associate responses.

Associative strength. Sometimes it's convenient to know how many participants give a specific response to a cue. In this case, you should download the associative frequency files (i.e., the conditional probability of a response given a cue, where the number of responses may differ across cue words). The first file contains statistics based on the first response a participant gave (R1), and the second file contains all three responses (R123).
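As a minimal sketch of how associative strength relates to response counts, the Python snippet below computes the conditional probability of each response given a cue from a list of (cue, response) pairs. The toy data and the function name associative_strength are illustrative assumptions; they are not taken from the released files or the project scripts.

```python
from collections import Counter, defaultdict

def associative_strength(pairs):
    """Return {cue: {response: P(response | cue)}} from (cue, response) pairs."""
    counts = defaultdict(Counter)
    for cue, response in pairs:
        counts[cue][response] += 1
    strength = {}
    for cue, responses in counts.items():
        # Normalize by the number of responses for this cue only,
        # so cues with different response totals remain comparable.
        total = sum(responses.values())
        strength[cue] = {r: n / total for r, n in responses.items()}
    return strength

# Toy data (hypothetical): four responses to "dog", two to "cat".
toy_pairs = [
    ("dog", "cat"), ("dog", "cat"), ("dog", "bone"), ("dog", "bark"),
    ("cat", "dog"), ("cat", "mouse"),
]
strength = associative_strength(toy_pairs)
print(strength["dog"]["cat"])  # 0.5
```

Restricting the input pairs to first responses would correspond to the R1 variant, and pooling all three responses to R123.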

Cue and response statistics. Cue statistics provide information about which words were known and how many responses were missing for each cue. Two files are available: one based on the first response a participant gave (R1), and a second based on all three responses given by participants (R123). Response statistics include response counts for tokens and types.
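To illustrate the distinction between tokens and types, here is a small Python sketch that counts both for the responses to a single cue. It assumes that a missing response is represented as None; the released files may code missing responses differently, so check the documentation of each dataset before adapting it.

```python
from collections import Counter

def response_statistics(responses):
    """Count tokens (all given responses), types (distinct responses), and missing ones."""
    given = [r for r in responses if r is not None]
    freq = Counter(given)
    return {
        "tokens": len(given),                      # every response that was given
        "types": len(freq),                        # distinct response words
        "missing": len(responses) - len(given),    # responses coded as None here
    }

# Toy responses to one cue (hypothetical); None marks a missing response.
toy_responses = ["cat", "cat", "bone", None, "bark", None]
stats = response_statistics(toy_responses)
print(stats)  # {'tokens': 4, 'types': 3, 'missing': 2}
```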

Slovenian Data (SWOW-SL24 1.0)

Updated: 18 March 2026

The word association norms for Slovenian SWOW-SL 1.0 contain words and their associations collected in the scope of the project "Mali svet besed" or Small World of Words. SWOW-SL 1.0 contains free word associations for 1,000 different cues in Slovenian collected up to November 5, 2024. It includes all 19,898 responses collected online from more than 1,100 native Slovenian speakers, each providing up to 3 associations per given cue. The word association norms - the associative frequency and associative strength - comprise more than 37,000 unique cue-association pairs.

The file SWOW-SL1.0_responses.tsv contains all collected responses, which are provided both in their original form and in two normalized forms (word-lemmatized, normalized). SWOW-SL1.0_participants.tsv contains participant metadata collected in the experiment, such as age and education. The file SWOW-SL1.0_statistics_normalized.tsv provides the aggregated word association norms, i.e., frequency statistics of all cue-association pairs on normalized responses, while SWOW-SL1.0_statistics_raw.tsv is based on raw, unprocessed responses. Additional information about the data and processing is provided in README.txt.

 SWOW-SL1.0.zip [1.3Mb]

Citation: Brglez, Mojca; Vintar, Špela and De Deyne, Simon, 2024, Word association norms for Slovenian SWOW-SL 1.0, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1980.

Mandarin Chinese Data (SWOW-ZH23)

Updated: 18 March 2026

Three-response associations were collected from 2016 to 2023 for 10,000 cue words, each answered by about 76 participants on average. Participants were native speakers of Mandarin Chinese or other Chinese dialects. We provide a dataset from before word and participant cleaning, in which 85 taboo words in the responses were masked and 19 taboo cue words were deleted. We also provide a dataset after preprocessing, in which each cue word retains 55 participants; it contains responses for 10,024 cue words contributed by 30,504 participants, for a total of 551,320 trials. Each row of the datasets is a single trial, consisting of demographic information from one participant and that participant's three responses to one cue word. MATLAB scripts for preprocessing can be found on the SWOW-ZH GitHub page https://github.com/lib314a/SWOWZH, and the raw and preprocessed data, cue and response statistics, and centrality measures can be found below:

 SWOW-ZH23 [49Mb]

We also provide relatedness measures for word pairs based on cosine similarity between associative strength distributions, distributions weighted by positive pointwise mutual information (PPMI), random walk vectors (RW), and compressed random walk vectors. Note that each row of these files contains a pairwise entry and that the files are xz-compressed to reduce file size.

 SWOW-ZH23 Relatedness measures (R1) [321Mb]
 SWOW-ZH23 Relatedness measures (R123) [395Mb]
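For readers unfamiliar with these measures, the following Python sketch shows one common way to weight cue-response counts by PPMI and to compare two cues by cosine similarity. It is an illustration under stated assumptions: the toy counts and the function names ppmi_vectors and cosine are hypothetical, and the exact weighting used to produce the released files may differ.

```python
import math
from collections import defaultdict

def ppmi_vectors(counts):
    """Turn {cue: {response: count}} into sparse PPMI-weighted vectors."""
    total = sum(n for row in counts.values() for n in row.values())
    cue_totals = {c: sum(row.values()) for c, row in counts.items()}
    resp_totals = defaultdict(int)
    for row in counts.values():
        for r, n in row.items():
            resp_totals[r] += n
    vectors = {}
    for c, row in counts.items():
        vec = {}
        for r, n in row.items():
            # PMI = log2( P(c, r) / (P(c) * P(r)) ); keep only positive values.
            pmi = math.log2(n * total / (cue_totals[c] * resp_totals[r]))
            if pmi > 0:
                vec[r] = pmi
        vectors[c] = vec
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy counts (hypothetical): "dog" and "cat" share the response "pet".
toy_counts = {
    "dog": {"bark": 3, "pet": 2, "bone": 1},
    "cat": {"meow": 3, "pet": 2, "mouse": 1},
    "car": {"drive": 4, "road": 2},
}
vectors = ppmi_vectors(toy_counts)
sim_dog_cat = cosine(vectors["dog"], vectors["cat"])  # positive: shared response
sim_dog_car = cosine(vectors["dog"], vectors["car"])  # zero: no shared response
```

The xz-compressed files can be read in Python without unpacking them first, for example with lzma.open(path, "rt") from the standard library.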

Citation: Li, B., Ding, Z., De Deyne, S., & Cai, Q. (2025). A large-scale database of Mandarin Chinese word associations from the Small World of Words project. Behavior Research Methods, 57, 34. https://doi.org/10.3758/s13428-024-02513-1

Rioplatense Spanish Data (SWOW-RP22)

Updated: 17 September 2022

Rioplatense is a Spanish variant spoken in South America, primarily in Uruguay and Argentina. The following dataset is currently under review and is likely to be subject to minor updates (e.g., spellchecks).
Here we release the full raw data of the SWOW-RP project as well as a balanced preprocessed dataset where each cue is judged by exactly 70 participants and responses are normalized and spellchecked. Scripts for preprocessing and evaluation can be found at https://github.com/almadana/SWOW-RP. Raw and processed data, together with cue and response statistics can be found below.

 SWOW-RP22 [79Mb]

Citation: Cabana, A., Zugarramurdi, C., Valle-Lisboa, J.C., & De Deyne, S. (2024). The “Small World of Words” free association norms for Rioplatense Spanish. Behavior Research Methods, 56 (2), 968-985.

English Data (SWOW-EN18)

Updated: 18 October 2018

Word association and participant data for 100 primary, secondary and tertiary responses to 12,292 cues. The data published in Behavior Research Methods were collected between 2011 and 2018. In the preprocessed data, cues and responses are normalized by spell-checking them, correcting capitalization, and Americanizing spelling. In addition, the preprocessed file is balanced so that each cue is judged by exactly 100 participants (see the GitHub repository for details).

Scripts with a processing pipeline to analyse these data in R can be obtained from the SWOWEN-2018 GitHub repository. Note to R users: disable quote handling when reading the files with read_delim from the readr package, otherwise the entire file might not be read in correctly:

    library(readr)
    X <- read_delim('strength.SWOW-EN.R123.csv', delim = '\t', quote = '', escape_backslash = FALSE, escape_double = FALSE)

Raw and processed data, together with cue and response statistics, can be found below.

 SWOW-EN18 [80Mb]
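The same quoting caveat applies outside R. As a hedged sketch, Python's standard csv module can be told to treat quote characters as ordinary text with quoting=csv.QUOTE_NONE. The column names and sample rows below are hypothetical and only serve to show a field that consists of a quote character.

```python
import csv
import io

def read_tab_delimited(handle):
    """Read tab-separated rows, treating quote characters as ordinary text."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)

# Hypothetical sample: the first response is a bare double quote, which
# would start a quoted field under the default quoting rules.
sample = 'cue\tresponse\tstrength\nquote\t"\t0.01\ninch\tfoot\t0.20\n'
rows = read_tab_delimited(io.StringIO(sample))
print(len(rows), rows[0]["response"])  # 2 "
```

For a file on disk, pass a handle opened with open(path, newline="", encoding="utf-8") instead of the io.StringIO object.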

Citation: De Deyne, S., Navarro, D.J., Perfors, A. et al. (2019). The “Small World of Words” English word association norms for over 12,000 cue words. Behavior Research Methods, 51, 987–1006. https://doi.org/10.3758/s13428-018-1115-7

Dutch Data (SWOW-NL13)

Dutch word association data (SWOW-NL). Word association and participant data for 100 primary, secondary and tertiary responses to 12,571 cues as reported in De Deyne, Navarro and Storms (2013).

 Associative strength [22Mb]
  Matlab .mat file [10Mb]

Citation: De Deyne, S., Navarro, D.J. & Storms, G. (2013). Better explanations of lexical and semantic cognition using networks derived from continued rather than single-word associations. Behavior Research Methods, 45, 480–498. https://doi.org/10.3758/s13428-012-0260-7

Data in other languages

Contact [email protected] to discuss work-in-progress files in other languages.

Current data

Note that current data can be accessed on the project page as well, but there are some caveats. These data correspond to a work-in-progress snapshot with limited preprocessing. The purpose of the project interface, Explore, and visualization options is to make the data accessible to the general public in a concise form, which means only the strongest responses are shown.

License

The data are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. They cannot be redistributed or used for commercial purposes.
