Article of the Month, July 2018 (Tishby's lab)

July 19, 2018

Efficient compression in color naming and its evolution

Authors: Noga Zaslavsky, Charles Kemp, Terry Regier and Naftali Tishby

Published in PNAS in July 2018

Link: www.pnas.org/cgi/doi/10.1073/pnas.1800521115



Languages package ideas into words in different ways. For example, English has separate terms for “black,” “blue” and “green,” but some other languages have only a single term for these colors. At the same time, there are universal tendencies in word meanings, such that similar or identical meanings often appear in unrelated languages. A major question is how to account for such semantic universals and variation of the lexicon. In this work, we derive a principled information-theoretic account of cross-language semantic variation. Specifically, we argue that languages efficiently compress ideas into words by optimizing the Information Bottleneck (IB) tradeoff between the complexity and accuracy of the lexicon. We test this proposal in the domain of color naming, and show that: (i) color naming systems across languages achieve near-optimal compression; (ii) small changes in a single tradeoff parameter account, to a large extent, for observed cross-language variation; (iii) efficient IB color naming systems exhibit soft rather than hard category boundaries, and often leave large regions of color space inconsistently named, both of which are observed empirically; and (iv) these IB systems evolve through a sequence of structural phase transitions, in a single process that captures key ideas associated with different accounts of color category evolution. These results suggest that a drive for information-theoretic efficiency may shape color naming systems across languages. This principle is not specific to color, and so it may also apply to cross-language variation in other semantic domains.
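
To make the complexity-accuracy tradeoff concrete, here is a minimal sketch in Python. It assumes the standard IB form of the objective, complexity minus beta times accuracy, where complexity is the mutual information I(M;W) between the speaker's meanings M and words W, accuracy is I(W;U) between words and referents U, and beta is the single tradeoff parameter. The function names, the four-meaning toy domain, and the identity mapping from meanings to referents are illustrative assumptions, not the paper's color data or code.

```python
import numpy as np


def mutual_information(joint):
    """Mutual information, in bits, of a joint distribution given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # row marginal, shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)  # column marginal, shape (1, k)
    mask = joint > 0                       # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (px @ py)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))


def ib_objective(p_m, q_w_given_m, p_u_given_m, beta):
    """Return (complexity, accuracy, objective) for a naming system.

    complexity = I(M;W), accuracy = I(W;U),
    objective  = complexity - beta * accuracy (lower is better).
    """
    p_m = np.asarray(p_m, dtype=float)
    q = np.asarray(q_w_given_m, dtype=float)    # shape (meanings, words)
    m_u = np.asarray(p_u_given_m, dtype=float)  # shape (meanings, referents)
    joint_mw = p_m[:, None] * q                 # p(m, w) = p(m) q(w|m)
    joint_wu = joint_mw.T @ m_u                 # p(w, u) = sum_m p(m, w) p(u|m)
    complexity = mutual_information(joint_mw)
    accuracy = mutual_information(joint_wu)
    return complexity, accuracy, complexity - beta * accuracy


# Hypothetical toy domain: four equally likely meanings, each pointing at
# exactly one referent (an assumption made only to keep the numbers simple).
p_m = np.full(4, 0.25)
p_u_given_m = np.eye(4)

# Three hard naming systems: one word for everything, two words, four words.
one_word = np.ones((4, 1))
two_words = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
four_words = np.eye(4)

for name, q in [("one", one_word), ("two", two_words), ("four", four_words)]:
    c, a, f = ib_objective(p_m, q, p_u_given_m, beta=1.5)
    print(f"{name:>4} word(s): complexity={c:.2f} accuracy={a:.2f} objective={f:.2f}")
```

In this toy setting a richer lexicon buys accuracy at the cost of complexity, and the value of beta decides which system scores best. Soft category boundaries correspond to rows of q_w_given_m that spread probability over several words rather than placing it all on one.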