Auditory Streaming as an Online Classification Process with Evidence Accumulation

When human subjects hear a sequence of two alternating pure tones, they often perceive it in one of two ways: as one integrated sequence (a single “stream” consisting of the two tones), or as two segregated sequences, one sequence of low tones perceived separately from another sequence of high tones (two “streams”). Perception of this stimulus is thus bistable. Moreover, subjects report on-going switching between the two percepts: unless the frequency separation is large, initial perception tends to be of integration, followed by toggling between integration and segregation phases. The process of stream formation is loosely named “auditory streaming”. Auditory streaming is believed to be a manifestation of human ability to analyze an auditory scene, i.e. to attribute portions of the incoming sound sequence to distinct sound generating entities. Previous studies suggested that the durations of the successive integration and segregation phases are statistically independent. This independence plays an important role in current models of bistability. Contrary to this, we show here, by analyzing a large set of data, that subsequent phase durations are positively correlated. To account together for bistability and positive correlation between subsequent durations, we suggest that streaming is a consequence of an evidence accumulation process. Evidence for segregation is accumulated during the integration phase and vice versa; a switch to the opposite percept occurs stochastically based on this evidence. During a long phase, a large amount of evidence for the opposite percept is accumulated, resulting in a long subsequent phase. In contrast, a short phase is followed by another short phase. We implement these concepts using a probabilistic model that shows both bistability and correlations similar to those observed experimentally.

Publications