So I started working on a sketch today that links two ideas, and the combination is surprising.

On the one hand we have O’Regan and Noë’s take on sensorimotor correspondences. This is actually not far from a lot of Gibsonian work within Ecological Psychology. The basic idea is that in perceiving, we are skillfully engaging with the world, and that practiced, tuned action gives rise to a corresponding, characteristic change in the sensory array. Gibsonians would say this is the basis of direct perception. Enaction-heads would say this is skillful coping, or some such.

On the other hand, we have the peculiar issue of sensorimotor synchronization, perhaps best illustrated by a group of people dancing or beating drums together. In the scientific literature, this has withered to a laboratory situation in which people tap in time to a metronome. (The horror, the horror.) Synchronization of this kind is a singularly human achievement, the very odd animal counterexample notwithstanding (yes, Snowball, I’m looking at you and the Gelada baboons). A fuller account of its basis would help us enormously. It may underpin a burgeoning theory of memes; it speaks to Gibson’s intuition that the nervous system displays resonant properties; it fits with a range of specific situations, from air guitar to stuttering. All of these can be described, in some fuzzy essence, by a conceptually simple model in which two processes become coupled and enter into a synergy that looks like resonance within and among systems with many degrees of freedom.
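To make that fuzzy picture a little more concrete, here is a minimal sketch, offered purely as an illustration and not as a model of tapping, dancing, or speech. It uses two Kuramoto-style phase oscillators, a standard textbook stand-in for "two coupled processes" that I am swapping in myself. The function name `simulate_pair`, the preferred rates of 2.0 and 2.2 cycles per second, and the coupling strength are all assumptions chosen only for the demonstration.

```python
import math

def simulate_pair(omega1, omega2, coupling, dt=0.001, steps=20000):
    """Euler-integrate two mutually coupled phase oscillators.

    Each oscillator advances at its own preferred rate (rad/s) and is
    nudged toward the phase of the other. Returns the final phase
    difference and the rate at which that difference is still changing.
    """
    theta1, theta2 = 0.0, 1.0  # arbitrary, unequal starting phases
    for _ in range(steps):
        d1 = omega1 + coupling * math.sin(theta2 - theta1)
        d2 = omega2 + coupling * math.sin(theta1 - theta2)
        theta1 += d1 * dt
        theta2 += d2 * dt
    phase_diff = theta1 - theta2
    # Instantaneous drift of the phase difference: zero means phase-locked.
    drift = (omega1 - omega2) - 2.0 * coupling * math.sin(phase_diff)
    return phase_diff, drift

# Hypothetical preferred rates: 2.0 and 2.2 cycles per second.
slow = 2.0 * math.pi * 2.0
fast = 2.0 * math.pi * 2.2

# With coupling, the pair settles into a fixed phase relation.
locked_diff, locked_drift = simulate_pair(slow, fast, coupling=2.0)

# Without coupling, each keeps its own rate and the phases slide apart.
free_diff, free_drift = simulate_pair(slow, fast, coupling=0.0)
```

In this toy, the pair locks whenever the difference in preferred rates is smaller than twice the coupling strength. The oscillators then share a common tempo while keeping a small, constant phase offset, which is about the simplest thing one could call "resonance" between two processes.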

These two themes arise and mingle in understanding the peculiar, skilled, coordinated action that is speech. Almost invariably, a speaking act comes with its own auditory accompaniment. Unlike feedback based on communicative effect, which we might receive from others as we speak, this feedback is immediate and tightly coupled to the ongoing action. My hunch is that this cybernetic link provides stability. I think it ought to bear mathematical modeling, though I do not have the chops (yet?). Seen this way, speech is a natural case for an O’Regan and Noë style account.

My hunch (call it a hypothesis) is that normal skilled action is based upon a tuned relationship between the consequences of movement and the urge to move. Ordinarily, when one moves, one experiences the consequences. But under the cybernetic picture I am painting here, when you act together with others, an exogenous element, their speech or actions, plays the role that your own productions normally play. The feedback signal comes from your joint production or action, and not from you, as an individual, alone.

Now, two experimental strands lend credence to this suggestion. In one long-established experimental tradition, subjects are asked to speak while listening to their own voices with a clearly perceptible time lag. This is a horrible thing to do to a person, but hey. It’s also kinda fun. With a delay of about 180 ms, people start to make all kinds of errors. Well, not all kinds of errors, but some specific and interpretable ones: they prolong segments, producing elongated syllables. (I need to study the detail of exactly what they do a little more closely.)

In the second experimental strand, my own synchronous speech experiments, we see that the sound of the other can play the same stabilizing role. When we speak synchronously, your speech plays the stabilizing role in the production of my speech that normally only my own speech plays. That’s a bit intimate!