The Science of Tones & Pitch Perception

How does the brain transform air pressure fluctuations into the rich experience of musical pitch? A deep dive into the neuroscience, psychoacoustics, and cognitive science behind tone perception.

Tonotopic Organization

The auditory system maintains a systematic spatial representation of frequency throughout its entire pathway, from the cochlea to the auditory cortex. This organizational principle, called tonotopy, means that different frequencies activate different locations in neural tissue - effectively creating a "map" of pitch in the brain.

From Cochlea to Cortex

The cochlea, a spiral-shaped organ in the inner ear, performs the initial frequency analysis. High frequencies cause maximum displacement near the base (closest to the middle ear), while low frequencies resonate near the apex. This spatial separation is preserved as signals travel through the auditory nerve, brainstem nuclei, thalamus, and finally to the auditory cortex.

Formisano et al. (2003) used high-resolution functional MRI to demonstrate that human primary auditory cortex (Heschl's gyrus) contains multiple tonotopic maps arranged in mirror-image fashion. Their research showed that the frequency gradient runs approximately along the medial-lateral axis, with low frequencies represented laterally and high frequencies medially.

Key Research Findings

  • The cochlea separates frequencies with approximately 1/3 octave resolution per millimeter
  • Primary auditory cortex contains 2-3 mirror-image tonotopic maps
  • Frequency representation is logarithmic - each octave occupies roughly equal cortical distance
  • Training and experience can modify tonotopic organization (cortical plasticity)
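
The logarithmic place-frequency relationship can be sketched with Greenwood's widely used fitted function. This is a sketch, not part of the cited studies; the function name is illustrative, and the constants (A = 165.4, a = 2.1, k = 0.88, with position expressed as a fraction of basilar-membrane length from the apex) are the commonly quoted human fit:

```python
def greenwood_frequency(x):
    """Greenwood place-frequency map for the human cochlea.

    x: fractional distance along the basilar membrane, from the
       apex (0.0) to the base (1.0).
    Returns the characteristic frequency in Hz, using the commonly
    quoted human constants A=165.4, a=2.1, k=0.88.
    """
    return 165.4 * (10 ** (2.1 * x) - 0.88)

# Apex codes low frequencies, base codes high frequencies:
print(round(greenwood_frequency(0.0), 1))  # ~20 Hz at the apex
print(round(greenwood_frequency(1.0)))     # ~20.7 kHz at the base
```

Because the exponent is linear in x, each equal step along the membrane spans roughly the same frequency ratio, which is the logarithmic spacing described above.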

Clinical Implications

Tonotopic organization underlies the design of cochlear implants, which stimulate different regions of the cochlea to recreate frequency perception. Understanding these maps also helps explain why damage to specific regions of the auditory system produces selective frequency-specific hearing loss.

Pitch Perception

The Missing Fundamental Phenomenon

One of the most fascinating aspects of pitch perception is that we can perceive the pitch of a complex tone even when its fundamental frequency is absent. This "missing fundamental" or "residue pitch" phenomenon reveals that pitch is not simply the detection of the lowest frequency component, but rather a computational process that infers periodicity from harmonic relationships.

de Cheveigne (2005) provided a comprehensive review of pitch perception theories, distinguishing between "place" theories (based on tonotopic activation patterns) and "temporal" theories (based on neural timing). Modern understanding suggests both mechanisms contribute, with temporal coding dominant for frequencies below about 4-5 kHz.

Place Theory

Pitch is extracted from the pattern of which locations along the basilar membrane are activated. Different frequencies activate different places, and the brain reads this "place code" to determine pitch.

Temporal Theory

Pitch is extracted from the timing of neural firing patterns. Neurons phase-lock to the waveform, firing at particular phases of the cycle. The period between spike clusters encodes frequency.

Pattern Recognition

The brain recognizes harmonic patterns and infers the fundamental. When we hear 400, 600, 800 Hz together, we perceive a pitch of 200 Hz (the implied fundamental), even though 200 Hz is absent.
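
As a toy illustration of this pattern-recognition account: for exact integer-valued partials, the implied fundamental is simply their greatest common divisor. (Real pitch models must tolerate mistuned partials, e.g. via autocorrelation; this sketch assumes idealized input.)

```python
from functools import reduce
from math import gcd

def implied_fundamental(partials_hz):
    """Infer the missing fundamental of exact integer partials as
    their greatest common divisor (a toy model of residue pitch)."""
    return reduce(gcd, partials_hz)

print(implied_fundamental([400, 600, 800]))  # → 200, though 200 Hz is absent
```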

Pitch Discrimination Thresholds

Human frequency discrimination is remarkably acute. Under optimal conditions, trained listeners can detect frequency differences as small as 0.2% (about 3 cents) for pure tones in the 500-2000 Hz range. This corresponds to detecting a 1 Hz difference at 500 Hz.

Frequency Range    Typical JND           Musical Context
100-500 Hz         1-3 Hz (0.5-1%)       Bass register - slightly less acute
500-2000 Hz        1-2 Hz (0.2-0.3%)     Speech range - most acute discrimination
2000-8000 Hz       3-10 Hz (0.3-0.5%)    Upper register - still quite acute
Above 8000 Hz      Progressively worse   Pitch perception becomes unreliable
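
JNDs across registers are easiest to compare in cents, the logarithmic unit of musical interval (100 cents = 1 equal-tempered semitone). A minimal sketch of the conversion:

```python
import math

def cents(f1, f2):
    """Interval between two frequencies in cents: 1200 * log2(f2/f1).
    100 cents = 1 semitone; 1200 cents = 1 octave."""
    return 1200 * math.log2(f2 / f1)

# A 1 Hz JND at 500 Hz is only ~3.5 cents - far finer than a semitone:
print(round(cents(500, 501), 2))
```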

Equal Loudness Contours

Fletcher-Munson Curves and ISO 226:2003

Human hearing sensitivity varies dramatically across the frequency spectrum. The original equal loudness contours were measured by Fletcher and Munson in 1933, establishing that we perceive different frequencies as equally loud only when their physical intensities differ substantially. These curves were refined by Robinson and Dadson (1956) and standardized internationally as ISO 226:2003.

The current ISO standard reflects measurements from multiple laboratories across different countries, providing more accurate contours than earlier versions. Key characteristics include:

  • Peak sensitivity at 3-4 kHz: The ear canal resonates at approximately 2.5-3 kHz, amplifying sounds by 10-15 dB in this region. Combined with the middle-ear transfer function, this places peak sensitivity around 3-4 kHz.
  • Steep low-frequency rolloff: At 20 Hz, sounds must be approximately 70 dB more intense than at 1 kHz to be perceived as equally loud. This is why subwoofers require enormous power.
  • Level-dependent shape: The contours flatten at higher listening levels - at 90 phons, sensitivity varies far less across frequency than it does near threshold. This explains why music sounds "fuller" when played loud.
  • Practical applications: A-weighting in sound level meters approximates the 40-phon curve. Loudness compensation circuits in audio equipment boost bass and treble at low volumes.
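
The A-weighting mentioned above has a standard closed form (defined in IEC 61672); a sketch, where the +2.00 dB offset normalizes the curve to 0 dB at 1 kHz:

```python
import math

def a_weighting_db(f):
    """A-weighting gain in dB (IEC 61672 closed form), a rough inverse
    of the 40-phon equal-loudness contour."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20 * math.log10(ra) + 2.00

print(round(a_weighting_db(1000), 2))  # ~0 dB by construction
print(round(a_weighting_db(20), 1))    # ~-50 dB: the steep low-frequency cut
```

The ~50 dB attenuation the weighting applies at 20 Hz mirrors the large intensity boost low frequencies need to sound equally loud.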

Why This Matters for Tone Generation

  • A 100 Hz tone at 60 dB SPL sounds much quieter than a 1 kHz tone at 60 dB
  • When testing speakers with sweeps, apparent loudness changes dramatically across frequency
  • Headphone frequency response interacts with equal loudness curves
  • Hearing tests must account for frequency-dependent sensitivity

Auditory Scene Analysis

Separating Simultaneous Sounds

Bregman (1994) introduced the framework of Auditory Scene Analysis (ASA) to describe how the auditory system parses complex acoustic environments into distinct "auditory objects" or "streams." When multiple tones sound simultaneously, the brain must determine which frequency components belong together and which come from separate sources.

Several principles govern this perceptual organization:

Harmonicity

Frequency components that form a harmonic series (integer multiples of a fundamental) tend to fuse into a single perceived sound. This is why we hear a single instrument rather than dozens of separate partials.

Common Onset/Offset

Components that start and stop together are grouped as a single sound. Even brief asynchronies of 30-50 ms can cause components to segregate perceptually.

Common Modulation

Frequency components whose frequency modulation (vibrato) or amplitude modulation (tremolo) is correlated are grouped together. Natural instruments produce correlated modulations across all partials, binding them into unified percepts.

Spatial Location

Sounds from the same location tend to group together. Binaural cues (interaural time and level differences) help segregate sources in space.

Streaming and the Cocktail Party Effect

When tones alternate rapidly between two frequency regions, perception can flip between hearing one stream (integrated) or two separate streams (segregated). The probability of streaming increases with frequency separation and presentation rate. This relates to the "cocktail party effect" - our ability to follow one voice among many.

Critical Bandwidth

Frequency Resolution of the Auditory System

Zwicker & Fastl (2007) extensively documented the critical bandwidth phenomenon in their comprehensive psychoacoustics textbook. Critical bandwidth refers to the frequency range within which sounds interact strongly - masking each other and combining loudness.

The auditory system can be modeled as a bank of overlapping bandpass filters, each tuned to a different center frequency. The bandwidth of these "auditory filters" varies with frequency:

Center Frequency   Critical Bandwidth   Bandwidth as % of CF
100 Hz             ~100 Hz              100%
500 Hz             ~100 Hz              20%
1000 Hz            ~160 Hz              16%
2000 Hz            ~300 Hz              15%
4000 Hz            ~700 Hz              17.5%
10000 Hz           ~1800 Hz             18%
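
These tabulated values are well approximated by Zwicker's analytic formula for critical bandwidth as a function of center frequency; a sketch (the function name is illustrative):

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker's analytic approximation of critical bandwidth in Hz:
    CB = 25 + 75 * (1 + 1.4 * F^2)^0.69, with F in kHz."""
    f_khz = f_hz / 1000.0
    return 25 + 75 * (1 + 1.4 * f_khz**2) ** 0.69

for f in (100, 500, 1000, 2000, 4000):
    print(f, "Hz ->", round(critical_bandwidth_hz(f)), "Hz")
```

Below about 500 Hz the formula levels off near 100 Hz; above that it tracks the roughly constant 15-20% of center frequency seen in the table.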

Practical Implications

  • Masking: A tone can mask another tone within the same critical band. Noise bands wider than one critical band don't mask any more effectively than narrower bands.
  • Roughness and beating: Two tones within a critical band produce perceived roughness or beating. Beyond the critical band, they're heard as separate smooth tones.
  • MP3 compression: Perceptual audio codecs exploit masking within critical bands to reduce data without audible degradation.
  • Musical consonance: Intervals smaller than a critical band tend to sound dissonant due to roughness from interfering partials.

Waveform & Timbre Perception

How We Distinguish Instruments

Two sounds can have identical pitch and loudness yet sound completely different - a violin versus a trumpet playing the same note. This quality, called timbre or tone color, depends primarily on harmonic content (the amplitudes and phases of overtones) and temporal envelope (how the sound evolves over time).

McAdams & Giordano (2009) reviewed decades of timbre research, identifying key acoustic dimensions that listeners use to distinguish instruments:

Spectral Centroid

The "center of gravity" of the spectrum - higher values sound brighter. A sawtooth wave has a higher spectral centroid than a triangle wave at the same frequency.

Attack Time

How quickly the sound reaches peak amplitude. Percussion has fast attacks; bowed strings have slow attacks. This is a primary timbre cue.

Spectral Flux

How much the spectrum changes over time. Brass instruments have more spectral flux than woodwinds, contributing to their "brassy" quality.

Harmonic vs Inharmonic

Whether partials form a perfect harmonic series. Bells and gongs have inharmonic partials, giving them their distinctive metallic quality.
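
The first of these dimensions is straightforward to compute from a discrete spectrum. A sketch comparing naive (non-band-limited) sawtooth and triangle waves at the same fundamental; aliasing in the naive waveforms is ignored here, since only the ordering of the centroids matters:

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Amplitude-weighted mean frequency of the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.sum(freqs * spectrum) / np.sum(spectrum)

sr = 44100
t = np.arange(sr) / sr  # one second of samples
f0 = 220.0
saw = 2 * ((f0 * t) % 1.0) - 1.0                    # sawtooth: all harmonics, ~1/n
tri = 2 * np.abs(2 * ((f0 * t) % 1.0) - 1.0) - 1.0  # triangle: odd harmonics, ~1/n^2

# The sawtooth's stronger upper harmonics pull its centroid higher:
print(spectral_centroid(saw, sr) > spectral_centroid(tri, sr))  # True
```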

Waveform Shapes and Their Spectra

The relationship between waveform shape and harmonic content is governed by Fourier analysis:

  • Sine wave: Contains only the fundamental frequency. The purest possible tone - no harmonics, no timbre complexity.
  • Square wave: Contains only odd harmonics (1, 3, 5, 7...) at amplitudes proportional to 1/n. Sounds hollow and clarinet-like because the even harmonics are missing.
  • Sawtooth wave: Contains all harmonics at amplitudes proportional to 1/n. Bright and buzzy, like brass instruments - the harmonically richest of the simple waveforms.
  • Triangle wave: Contains only odd harmonics at amplitudes proportional to 1/n^2. Softer than a square wave because the higher harmonics are much weaker. Sounds flute-like.
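
These spectra can be built directly by additive synthesis. A sketch, assuming idealized amplitudes; overall scaling factors and the sign alternation of true triangle-wave partials are ignored, since the harmonic magnitudes are what shape the steady-state timbre:

```python
import numpy as np

def additive(partial_amps, f0, sr=44100, dur=1.0):
    """Sum sine partials; partial_amps maps harmonic number n -> amplitude."""
    t = np.arange(int(sr * dur)) / sr
    return sum(a * np.sin(2 * np.pi * n * f0 * t)
               for n, a in partial_amps.items())

f0, N = 220, 20
square   = additive({n: 1 / n    for n in range(1, N, 2)}, f0)  # odd, 1/n
sawtooth = additive({n: 1 / n    for n in range(1, N)},    f0)  # all, 1/n
triangle = additive({n: 1 / n**2 for n in range(1, N, 2)}, f0)  # odd, 1/n^2

# The square wave's spectrum contains only the odd harmonics:
spec = np.abs(np.fft.rfft(square))  # 1 Hz bins for a 1 s signal
print([n for n in range(1, 8) if spec[n * 220] > 1.0])  # → [1, 3, 5, 7]
```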

Phase and Perception: Although waveform shape depends on both amplitude AND phase of harmonics, human hearing is largely insensitive to phase relationships for steady-state tones. A square wave with randomized phases sounds identical to one with aligned phases, despite looking completely different on an oscilloscope. However, phase matters for transients and binaural processing.
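
This phase insensitivity is easy to check numerically: scrambling the phases of a square wave's partials changes the waveform drastically while leaving the magnitude spectrum, which drives steady-state perception, untouched. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr  # one second
f0 = 100
harmonics = np.arange(1, 40, 2)  # odd harmonics of a square-like tone

aligned = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in harmonics)
scrambled = sum(np.sin(2 * np.pi * k * f0 * t + rng.uniform(0, 2 * np.pi)) / k
                for k in harmonics)

# The waveforms differ, but their magnitude spectra are identical:
same_spectrum = np.allclose(np.abs(np.fft.rfft(aligned)),
                            np.abs(np.fft.rfft(scrambled)), atol=1e-6)
print(same_spectrum)  # True
```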

Key Research References

  1. Bregman, A. S. (1994). Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press.
  2. de Cheveigne, A. (2005). Pitch perception models. In C. J. Plack, A. J. Oxenham, R. R. Fay, & A. N. Popper (Eds.), Pitch: Neural Coding and Perception (pp. 169-233). Springer.
  3. Fletcher, H., & Munson, W. A. (1933). Loudness, its definition, measurement and calculation. Bell System Technical Journal, 12(4), 377-430.
  4. Formisano, E., Kim, D. S., Di Salle, F., van de Moortele, P. F., Ugurbil, K., & Goebel, R. (2003). Mirror-symmetric tonotopic maps in human primary auditory cortex. Neuron, 40(4), 859-869.
  5. ISO 226:2003. Acoustics - Normal equal-loudness-level contours. International Organization for Standardization.
  6. McAdams, S., & Giordano, B. L. (2009). The perception of musical timbre. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford Handbook of Music Psychology (pp. 72-80). Oxford University Press.
  7. Moore, B. C. J. (2012). An Introduction to the Psychology of Hearing (6th ed.). Brill.
  8. Zwicker, E., & Fastl, H. (2007). Psychoacoustics: Facts and Models (3rd ed.). Springer.