What a Synonym Is, How Languages Shape Their Meaning, and How to Make It a Game

Three years ago I became intrigued by what I would later come to find is a fairly common question. Even now, upon googling, I see a plethora of questions on the topic: is there such a thing as a “true synonym”?

Webster’s dictionary defines a synonym as the following, and implies synonyms can share the “same” meaning:

“One of two or more words or expressions of the same language that have the same or nearly the same meaning in some or all senses”

Certainly not all synonyms can be exactly the same. If that were the case, one would assume by induction that all words linked by a chain of synonyms should mean the same thing or, taken a bit more loosely, that vast clusters of words would be strongly isolated from one another. The alternative, though, implies something interesting: if synonyms are only “nearly the same” in most cases, to what extent are two entirely disparate words related by meaning, if at all?

A model of the 20,000 most commonly used English words and their synonymic relationships, mapped in 3D using force-directed algorithms.

It was this question that led me to map the 20,000 most common English words and their synonymic relationships. The results were staggering and totally antithetical to what I had imagined. Rather than a vast array of isolated clusters, the model showed that every one of these words is bound together by meaning in a surprisingly centralized way. In other words, one can trace a relationship between any one word and any other.
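To make "tracing a relationship" concrete, here is a minimal sketch of how a path between two words could be found once the synonym links are treated as a graph. It is not the game's actual code: the tiny thesaurus, the chosen links and the function names are all made up for illustration, and breadth-first search is simply the most obvious technique for the job.

```python
from collections import deque

# Hypothetical toy thesaurus; the real model covers about 20,000 words.
SYNONYMS = {
    "despair": ["gloom", "misery"],
    "gloom": ["shade"],
    "shade": ["hint"],
    "hint": ["promise"],
    "promise": ["hope"],
}

def build_graph(synonyms):
    """Treat every synonym link as an undirected edge."""
    graph = {}
    for word, others in synonyms.items():
        for other in others:
            graph.setdefault(word, set()).add(other)
            graph.setdefault(other, set()).add(word)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of words, or None if no path exists."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            path = []
            while word is not None:
                path.append(word)
                word = previous[word]
            return path[::-1]
        for neighbour in sorted(graph[word]):
            if neighbour not in previous:
                previous[neighbour] = word
                queue.append(neighbour)
    return None

GRAPH = build_graph(SYNONYMS)
print(shortest_path(GRAPH, "despair", "hope"))
```

If the whole vocabulary sits in one connected piece, this search returns a path for every pair of words, which is the property described above.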

This struck me as delightful: our language is homogeneous and malleable in a way I did not think possible. It is almost romantic, if you’ll indulge a bit of sentimentality. No matter how vast the chasm seems between two words, say “despair” and “hope”, or “ire” and “love”, there exists at least one path uniting them. This compelled me to create a video game, Synonymy (despite not knowing the first thing about games at the time), made a reality with the help of Richard Dawkins, Stephen Fry, Noam Chomsky, George Steiner, Steven Pinker, David Crystal and Avital Ronell.

But back to the original question: if there were such a thing as a “true synonym”, we would expect the two nodes to overlap fully in the 3D model. The mapping uses software commonly used for astronomy, and a somewhat accurate analogy is that the engine imagines each word (node) as a planet, all with the same mass. When one word is a synonym of another, we apply a force (a gravitational force of sorts) pulling the synonym towards its parent node. The words at the center are the most “vague”, which is the best term I can think of, in that their synonyms are the most disparate. The vector length between two nodes is an excellent estimate of lexical similarity (more on this below), and it is how the game engine calculates the relative distance between one word and another.
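For readers who want to see the planet analogy in motion, here is a rough sketch of a force-directed layout in three dimensions. It is a generic spring-and-repulsion toy, not the astronomy software or any of the specific algorithms used for the real model; the word list, the force formulas and the parameters (iterations, step, seed) are all assumptions chosen to keep the example short.

```python
import math
import random

# Hypothetical synonym links for illustration only.
EDGES = {("big", "large"), ("large", "huge"), ("tiny", "small")}

def layout_3d(edges, iterations=200, step=0.05, seed=0):
    """Toy force-directed layout: every node repels every other node,
    and each synonym link pulls its two endpoints together."""
    rng = random.Random(seed)
    nodes = sorted({word for edge in edges for word in edge})
    pos = {w: [rng.uniform(-1, 1) for _ in range(3)] for w in nodes}
    for _ in range(iterations):
        force = {w: [0.0, 0.0, 0.0] for w in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
                dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
                # Repulsion is identical for every pair (equal "mass").
                push = 1.0 / (dist * dist)
                # Attraction applies only along synonym links.
                pull = dist if (a, b) in edges or (b, a) in edges else 0.0
                for k in range(3):
                    f = (push - pull) * delta[k] / dist
                    force[a][k] += f
                    force[b][k] -= f
        for w in nodes:
            for k in range(3):
                # Clamp each move so the toy simulation stays stable.
                pos[w][k] += step * max(-1.0, min(1.0, force[w][k]))
    return {w: tuple(p) for w, p in pos.items()}

def distance(pos, a, b):
    """Vector length between two nodes in the 3D layout."""
    return math.dist(pos[a], pos[b])

POSITIONS = layout_3d(EDGES)
print(distance(POSITIONS, "big", "large"), distance(POSITIONS, "big", "tiny"))
```

The `distance` function is the "vector length" idea in its simplest form: once every word has coordinates, similarity is read off as ordinary Euclidean distance.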

Despite my trying a variety of force-directed algorithms (Yifan Hu, Fruchterman-Reingold, etc.), no arrangement yielded any two nodes, or words, occupying the same space. In fact, even allowing for a fairly sizable margin of error, no two nodes ever get close enough to satisfy the condition of being “true synonyms”.
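The check itself is simple to state in code. The sketch below assumes each word already has 3D coordinates (the ones shown are invented) and that "true synonyms" means two nodes lying within some tolerance of each other; the tolerance value is arbitrary and only there to show the shape of the test.

```python
import math
from itertools import combinations

# Hypothetical coordinates; a real layout would supply these.
POSITIONS = {
    "big": (0.0, 0.0, 0.0),
    "large": (0.3, 0.4, 0.0),
    "huge": (0.0, 0.0, 2.0),
}

def closest_pair(positions):
    """Return (distance, word_a, word_b) for the two nearest nodes."""
    return min(
        (math.dist(positions[a], positions[b]), a, b)
        for a, b in combinations(sorted(positions), 2)
    )

def true_synonyms(positions, tolerance):
    """List every pair of words whose nodes lie within `tolerance` of each other."""
    return [
        (a, b)
        for a, b in combinations(sorted(positions), 2)
        if math.dist(positions[a], positions[b]) <= tolerance
    ]

print(closest_pair(POSITIONS))
print(true_synonyms(POSITIONS, tolerance=0.01))
```

Comparing every pair is quadratic in the number of words, which is fine for a toy but would call for a spatial index on a 20,000-word model.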

In this 2D rendering of a 3D model, two words may appear close to one another but actually be very far apart along the z-axis.

With the success of Synonymy, my intention was to expand the game to other languages, even with the ability to travel between languages to win a game, i.e. going from a word in Japanese to a word in German. This of course sounds awesome… in theory. I began applying the methodology above to other languages, only to find that in most cases the results were similar to how I originally imagined English would look: island clusters of somewhat isolated words.
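One crude way to put a number on "island clusters" is to count the connected groups in the synonym graph. The sketch below does that with a small union-find; the French word pairs are invented for illustration, and counting fully disconnected groups is only an approximation, since islands joined by a handful of thin bridges would still register as one group.

```python
# Hypothetical synonym links; not taken from the real French data.
EDGES = [
    ("grand", "gros"),
    ("gros", "large"),
    ("joie", "bonheur"),
    ("colère", "rage"),
]

def islands(edges):
    """Group words into connected components using union-find."""
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            # Path halving keeps the trees shallow.
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for word in list(parent):
        groups.setdefault(find(word), set()).add(word)
    # Largest island first.
    return sorted(groups.values(), key=len, reverse=True)

for group in islands(EDGES):
    print(len(group), sorted(group))
```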

French, using the same methodology as outlined above for English

In most cases you could get from just about any one word to just about any other, but these paths were very few and specific, increasing the difficulty tenfold for an already very hard game. (I once had a group approach me at E3 saying they all wanted to play Synonymy because they were masochists.) Two playable alpha versions of Synonymy were ultimately made in French and German but never officially released. (I shared them with language professors and even they had difficulty on the easiest setting.) If you don’t believe me and are something of a masochist yourself, you can try a web-based Flash version of the unreleased builds here.

Japanese, using the same methodology as outlined above for English

All this raises the question: why does English work so well when other languages (including Romance languages) don’t? What is it about English that makes its synonymic relationships so much more taut? I’m afraid I can’t say with any certainty, but I would speculate that homonyms have much to do with it. These mappings treat words as distinct only by spelling (“arm”, for example, can refer to a weapon or a limb), and the result in the game of Synonymy is a kind of “wormhole” through which you can jump to other areas of the model very quickly (English has far more homonyms than French, German or Japanese). Another possible explanation is that, because English contains considerably more words than most languages, meaning could become diluted. Finally, it could be that English has simply adapted to become more lexically ambiguous. It makes sense if you think about it: we’re drawn to puns and wordplay culturally in a way that might affect our language. For those interested, Nguyễn Thị Vân Lam explores this more deeply in a marvelously crafted paper entitled “Lexical Ambiguity in the English Language”.
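The wormhole effect is easy to reproduce on a toy graph. In the sketch below, each word is written as "spelling#sense" (a notation invented for this example, as are the links themselves). Keeping the senses separate leaves two disconnected regions; merging nodes by spelling alone, as the mappings do, lets “arm” bridge them.

```python
from collections import deque

# Hypothetical sense-tagged links, written as "spelling#sense".
SENSE_LINKS = [
    ("arm#limb", "limb#body"),
    ("limb#body", "branch#tree"),
    ("arm#weapon", "weapon#tool"),
    ("weapon#tool", "gun#tool"),
]

def build(links, by_spelling):
    """Build an undirected graph, optionally collapsing senses by spelling."""
    graph = {}
    for a, b in links:
        if by_spelling:
            a, b = a.split("#")[0], b.split("#")[0]
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def hops(graph, start, goal):
    """Number of links on the shortest path, or None if unreachable."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            return seen[word]
        for neighbour in graph.get(word, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[word] + 1
                queue.append(neighbour)
    return None

by_sense = build(SENSE_LINKS, by_spelling=False)
merged = build(SENSE_LINKS, by_spelling=True)
print(hops(by_sense, "branch#tree", "gun#tool"))  # unreachable: None
print(hops(merged, "branch", "gun"))              # reachable through "arm"
```

A language with fewer such shared spellings would have fewer of these shortcuts, which fits the speculation above without proving it.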

Synonymy was showcased at GDC in the experimental games workshop and at E3 through IndieCade, and was honored as North American Educational Game of the Year at the BIG festival in Brazil. While it has always been non-profit, today I am releasing Synonymy as open source on GitHub, along with path data (which has always been publicly available to higher education institutions). Path data (the journey from one synonym to the next) was logged for NLTK purposes to contribute to a metric known as “perceived semantic distance” in Natural Language Processing.