A growing selection of terms encountered in acoustics,
psychoacoustics, vision, audio/video, entertainment technologies
and related fields.
Generally, the sensitivity of human hearing is restricted to the frequency
range of 20 Hz to 20,000 Hz, with greatest sensitivity centered in the
500 to 8,000 Hz frequency range. Above and below this range, the ear becomes
progressively less sensitive. To account for this feature of human hearing,
sound level meters apply filtering of acoustic signals according to frequency.
This filtering is called A-weighting. Sound pressure level values obtained
using this weighting are referred to as A-weighted sound pressure levels
and are signified by the identifier dBA.
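As a rough illustration of the filtering described above, the sketch below evaluates the commonly published analog form of the A-weighting curve. The constants (20.6, 107.7, 737.9 and 12194 Hz, plus the 2.0 dB offset that sets the gain to about 0 dB at 1 kHz) and the function name `a_weighting_db` are assumptions brought in for the example; they do not come from this glossary.

```python
import math

def a_weighting_db(freq_hz: float) -> float:
    """Approximate A-weighting gain in dB at freq_hz (about 0 dB at 1 kHz).

    Constants are the commonly published analog-curve values (an assumption
    of this sketch, not taken from the glossary text).
    """
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(numerator / denominator) + 2.0

# Low frequencies are attenuated strongly; the mid band is left nearly untouched.
for f in (31.5, 100, 1000, 8000):
    print(f"{f:>7} Hz: {a_weighting_db(f):6.1f} dB")
```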
The exact pitch value of a musical note (for example, middle C) as opposed
to its position relative to other pitches. 
The point in space of the origin of sound. For a sound emitting
transducer (e.g., a loudspeaker), the point from which the spherical
waves appear to diverge as observed at remote points. (See also acoustic
The complete set of all objects and their respective physical properties
having an influence on the sound field that surrounds a listener. The
acoustic environment is a major determinant of perceived sound quality
because most of the sound emitted by a source (e.g., a loudspeaker)
typically arrives at the listener through a multiplicity of paths. (A
single bounce off an object is termed a "first-order" reflection,
two bounces a "second-order reflection," and so on.) Each time
a sound reflects off an object, the object's material properties affect
how much each frequency component of the sound wave is absorbed and how
much is reflected back into the environment. Sounds can also pass through
objects, including such "substantial" objects as walls, ceilings,
floors and windows. An object's material properties and its geometry—its
corners, edges, openings, shape, size, etc.—often
influence sound in ways more complex than just reflection, including diffraction,
refraction and diffusion.
The point in time at which the signal originates. (See also acoustic
The measured percentage of Articulation Loss of
Consonants by a listener. %ALCONS of 0 indicates
perfect clarity and intelligibility with no loss of consonant understanding,
while values of 10% and beyond indicate increasingly poor intelligibility, with 15% typically
representing the maximum acceptable loss. %ALCONS
can be measured by acoustic analyzers such as TEF.
In room acoustics, early reflections and reverberation. The audible sense
of a room or environment surrounding a sound source.
A parameter of sound related to the extent of oscillation of a vibrating
body, of sound pressure, or of an analog voltage. 
The function describing how the maximum amplitude of a sound waveform
evolves over time. The amplitude envelope is often characterized as consisting
of four parts: The attack portion (i.e., the part during which
the amplitude is rapidly increasing); the decay portion (i.e.,
the "backside" of the attack, during which the amplitude is
rapidly diminishing); the sustain portion (i.e., the part during
which the amplitude is relatively stable); and the release portion
(i.e., the final part during which the amplitude diminishes into silence).
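The four-part envelope can be sketched as a piecewise-linear function of time. This is only a minimal illustration: the function name `adsr_envelope`, the straight-line segments and every default duration and level below are hypothetical choices, not values from the glossary.

```python
def adsr_envelope(t, attack=0.05, decay=0.10, sustain_level=0.6,
                  sustain_time=0.50, release=0.30):
    """Piecewise-linear amplitude (0..1) at time t seconds after onset.

    All default durations and levels are hypothetical illustration values.
    """
    if t < 0:
        return 0.0
    if t < attack:                 # attack: amplitude rises rapidly to its peak
        return t / attack
    t -= attack
    if t < decay:                  # decay: amplitude falls from the peak
        return 1.0 - (1.0 - sustain_level) * (t / decay)
    t -= decay
    if t < sustain_time:           # sustain: amplitude stays relatively stable
        return sustain_level
    t -= sustain_time
    if t < release:                # release: amplitude diminishes into silence
        return sustain_level * (1.0 - t / release)
    return 0.0
```

Sampling this function at the audio rate and multiplying it by a steady tone would impose the envelope on that tone.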
A change in amplitude according to a periodic or aperiodic function. If
the modulation is done periodically, its effects on the carrier tone can
be described in two equivalent ways. The first is by simply describing
the result as a repeating change in the amplitude of the carrier. The
second is to describe it as a mixture of a fixed intensity carrier with
a number of additional fixed intensity tones, called "side bands."
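The equivalence of the two descriptions can be checked numerically for the simplest case, a sinusoidal modulator. The sketch below assumes a cosine carrier at `fc`, a cosine modulator at `fm` and a modulation depth `m`; all names and values are hypothetical. It relies on the identity cos(a)cos(b) = [cos(a+b) + cos(a-b)] / 2.

```python
import math

def am_direct(t, fc, fm, m):
    """Carrier at fc whose amplitude changes periodically at rate fm, depth m."""
    envelope = 1.0 + m * math.cos(2 * math.pi * fm * t)
    return envelope * math.cos(2 * math.pi * fc * t)

def am_sidebands(t, fc, fm, m):
    """Same signal as a fixed carrier plus two fixed-amplitude side bands."""
    carrier = math.cos(2 * math.pi * fc * t)
    upper = 0.5 * m * math.cos(2 * math.pi * (fc + fm) * t)
    lower = 0.5 * m * math.cos(2 * math.pi * (fc - fm) * t)
    return carrier + upper + lower

# Compare the two forms over one second sampled at 8 kHz (hypothetical values).
max_diff = max(
    abs(am_direct(n / 8000, 1000, 50, 0.5) - am_sidebands(n / 8000, 1000, 50, 0.5))
    for n in range(8000)
)
print(max_diff)  # differs from zero only by floating-point rounding
```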
A general term referring to the impairment of musical abilities due to
damage to one or both cerebral hemispheres. 
The ability of a listener to perceptually isolate individual elements
of a complex sound or sequence, such as frequency components in a complex
sound or individual events in rapid sequences. In synthetic
listening the tendency is to perceive sound complexes or temporal
sequences in a global fashion. 
Literally, without echo. An anechoic chamber is a low-noise, highly absorptive
environment, often used in acoustical testing, that allows the direct
sound of the device under test (e.g., a loudspeaker) to be measured without
contamination from reflections off the chamber's walls, floor or ceiling.
A general term referring to the impairment of language abilities following
damage to the left hemisphere of right-handed people. 
If two lamps at two different locations in space are flashed in close
succession, the viewer obtains an impression of motion between them. 
Apparent source width (ASW).
Discovered and developed by A. H. Marshall, ASW is a subjective parameter
of spaciousness in concert halls, and is related to the level, at the
listener's ears, of lateral reflections in the first 50 to 80 milliseconds
after the arrival of the direct sound. Increasing the ratio of this reflected
energy to the direct sound increases the sense of spaciousness. Narrow,
rectangular, "shoebox-shaped" halls like the famous Musikvereinsaal
in Vienna and Symphony Hall in Boston tend to foster strong, early-arriving
reflections from the side walls, subjectively broadening the sound source
and imparting body and fullness to the music. 
(From the Italian appoggiare meaning to lean.) A short-duration
tone that is a neighboring note (a semitone or whole tone higher or lower)
of the principal note which it precedes. 
Articulation loss of consonants.
A measure of speech intelligibility. The percentage of consonants heard
incorrectly, strongly influenced by noise or excessive reverberation.
(See also %ALCONS.)
A type of very soft noise appearing in speech sounds. It occurs in the
phoneme "h" in English, or with less duration after the release
of an unvoiced consonant, for example, after the "p" in "pie."
(See tonal system.)
The lessening of sound signal level due to divergence, absorption, reflection,
refraction, diffraction, etc., typically expressed in decibels. 
A general term referring to impairments in recognizing auditory objects,
events, and sequences that usually follow damage to both temporal lobes.
The sensation of periodic fluctuation that results when two simultaneous
components are very close to one another in frequency. Listeners hear
the fluctuation pattern as consisting of beats when their auditory system
lacks enough frequency resolution to distinguish the component frequencies.
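For two equal-amplitude sine components, the fluctuation rate is the difference between the two frequencies, because the sum can be rewritten as a tone at the average frequency multiplied by a slowly varying envelope. A minimal sketch, with hypothetical function names and example frequencies:

```python
import math

def beat_rate_hz(f1, f2):
    """Number of loudness fluctuations per second for two close components."""
    return abs(f1 - f2)

def two_tone(t, f1, f2):
    """Sum of two equal-amplitude sine components."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def product_form(t, f1, f2):
    """Same signal: slow envelope times a tone at the average frequency."""
    envelope = 2 * math.cos(math.pi * (f1 - f2) * t)
    return envelope * math.sin(math.pi * (f1 + f2) * t)

print(beat_rate_hz(440, 443))  # components 3 Hz apart beat 3 times per second
```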
The perceived image of the acoustic environment; the way the acoustic
environment is perceived. (See also virtual
auditory environment.) 
What is perceived auditorily, in contrast to a sound event, which is
a physical phenomenon of vibrations and waves in air or other elastic
medium. The relationship between sound events and auditory events is the
subject of psychoacoustics. 
A mental description of a physical (or virtual) sound source and its behavior
through time. Auditory stream segregation refers to the process of perceptual
organization of sound that accomplishes the construction of this description.
(See Eustachian tube.)
The technique of using computer-based mathematical models of an acoustic
environment and 3-D sound processing methods to make audible the sound
field of a source in the modeled space. Somewhat analogous to building
and viewing a scale model of a contemplated building, auralization enables
an acoustician or sound designer to build a computer model of a listening
space and then "play" the room's sound through headphones. 
See also article, "Virtual Acoustic
Backward recognition masking (also called
The reduction in the ability to recognize a sound pattern due to the subsequent
presentation of another sound pattern with similar information content.
This kind of masking is thought to result from a process different from
that of normal (sensory) masking.
The organ of hearing. More specifically, a membrane that runs the length
of the cochlea which is a bony, fluid-filled
spiral in the inner ear. The basilar membrane
performs a kind of frequency analysis of the incoming acoustic signal:
different locations along the membrane vibrate preferentially in response
to different frequencies. The hair cells connected to each part of the
membrane thus preferentially send neural information about the presence
of those frequencies to the brain. The spatial pattern of activity along
the basilar membrane thus encodes the frequency content of the signal.
(See auditory beats.)
In the home entertainment context, pertaining to presentations involving
the visual and auditory sensory modalities. 
Pertaining to two ears. A presentation of sound is binaural when both
ears are presented with the sound. Binaural sound also refers to a specific
sound playback technology, used mainly in headphones-based research and
virtual reality applications, in which an individual's HRTFs are determined
and synthesized to enable 3-D auditory experiences that are indistinguishable
from reality, or nearly so. 
BR (bass ratio).
In concert hall acoustics, the ratio of the average reverberation
times at 125 and 250 Hz to the average
of the RT's at 500 and 1000 Hz. It is determined only for a hall when
fully occupied. 
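The ratio described above can be sketched as follows. The function name and the reverberation times in the usage line are hypothetical; real values would come from measurements in the fully occupied hall.

```python
def bass_ratio(rt_125, rt_250, rt_500, rt_1000):
    """BR: average RT at 125 and 250 Hz over average RT at 500 and 1000 Hz."""
    low_average = (rt_125 + rt_250) / 2.0
    mid_average = (rt_500 + rt_1000) / 2.0
    return low_average / mid_average

# Hypothetical occupied-hall reverberation times, in seconds.
print(bass_ratio(2.2, 2.0, 1.9, 1.9))
```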
In concert hall acoustics, a bright, clear, ringing sound, rich in harmonics,
is called "brilliant." In a brilliant sound the treble frequencies
are prominent and decay slowly. This means that the high frequencies are
diminished only by the natural absorption of the sound in the air itself.
C80(3) or clarity factor.
In concert hall acoustics, the ratio, expressed in decibels,
of the energy in the first 80 milliseconds of an impulse sound arriving
at a listener's position divided by the energy in the sound after 80 milliseconds.
The divisor is approximately the total energy of the reverberant sound.
The symbol (3) indicates the average of the C80
values in the 500, 1000 and 2000 Hz bands. More generally, clarity
refers to the degree to which the separate strands in a musical performance
perceptually stand apart from one another; see also definition.
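The energy ratio behind C80 can be sketched from a sampled impulse response. This is a simplified illustration that assumes the response has already been band-filtered, that index 0 is the arrival of the direct sound, and that energy is proportional to the squared sample values. The function names are hypothetical.

```python
import math

def clarity_c80_db(impulse_response, sample_rate_hz):
    """C80 in dB: energy in the first 80 ms over the energy after 80 ms.

    Assumes index 0 is the direct-sound arrival and that the response
    extends past 80 ms (otherwise the late energy is zero).
    """
    split = int(round(0.080 * sample_rate_hz))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    return 10.0 * math.log10(early / late)

def c80_3(c80_500, c80_1000, c80_2000):
    """C80(3): average of the C80 values in the 500, 1000 and 2000 Hz bands."""
    return (c80_500 + c80_1000 + c80_2000) / 3.0
```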
In musical research, a unit of pitch change equal to 0.01 semitones. 
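Assuming equal-tempered semitones (twelve to the octave, so an octave is 1200 cents), the size of an interval in cents can be computed from the two frequencies. The function name is hypothetical.

```python
import math

def interval_in_cents(f_low, f_high):
    """Interval between two frequencies in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_high / f_low)

print(interval_in_cents(440.0, 880.0))  # one octave
```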
The simultaneous sounding of a group of notes, usually three or more.
In Western music, chords of three notes consisting of the first, third
and fifth degrees of a scale are called triads. Major triads consist of
intervals of a major third (four semitones) and perfect fifth (seven semitones)
with respect to a reference pitch (the root). The third is minor (three
semitones) in a minor triad. The third is major and the fifth is augmented
(eight semitones) in an augmented triad. The third is minor and the fifth
is diminished (six semitones) in a diminished triad. When the notes of
a chord are played in ascending or descending succession, the melodic
figure is called an arpeggio. 
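The four triad types can be written as semitone offsets from the root. In the sketch below the note spelling is simplified to sharps only, so a minor third above C is printed as D# rather than E-flat. That shortcut and the names used are hypothetical choices for the example.

```python
# Semitone offsets from the root for each triad quality described above.
TRIAD_INTERVALS = {
    "major": (0, 4, 7),        # major third, perfect fifth
    "minor": (0, 3, 7),        # minor third, perfect fifth
    "augmented": (0, 4, 8),    # major third, augmented fifth
    "diminished": (0, 3, 6),   # minor third, diminished fifth
}

# Simplified sharps-only pitch-class names (an assumption of this sketch).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad(root, quality):
    """Return the three pitch-class names of the triad built on root."""
    start = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(start + step) % 12] for step in TRIAD_INTERVALS[quality]]

print(triad("C", "major"))
print(triad("A", "minor"))
```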
Going around the perimeter of the pinna; said
of certain headphones. 
The snail-shaped cavity, approximately 1-1/4 inches long, 3/8 inches wide
and 2 inches high, in the temporal bone that contains the basilar
membrane which is the organ of hearing. 
Cocktail party effect.
A form of auditory stream segregation by which
a listener's ability to localize sound sources (see localization)
can increase intelligibility. So called because at a cocktail party, a
listener can focus on and understand a conversation while dozens or even
hundreds of other conversations occur all around. If a conventional 2-channel
high-resolution recording were made and subsequently played back, the
listener would not be able to understand individual conversations because
they have been spatially blended into the two speakers. 
A sequence of evenly spaced peaks or dips in the frequency response when
viewed on a linear scale, caused by two or more identical signals which combine
at nearly equal amplitudes but at slightly different arrival times. So
called because the frequency response plot resembles the teeth of a comb.
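For the simplest case, one signal plus a single delayed copy, the dips fall at odd multiples of 1/(2 x delay) and are therefore evenly spaced by 1/delay on a linear frequency axis. A minimal sketch, with hypothetical function names and a hypothetical 1 ms delay:

```python
import cmath
import math

def comb_magnitude(freq_hz, delay_s, gain=1.0):
    """Magnitude response of a signal summed with one delayed copy of itself."""
    return abs(1.0 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s))

def notch_frequencies(delay_s, max_hz):
    """Frequencies of the dips (odd multiples of 1 / (2 * delay)) up to max_hz."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches

# A 1 ms delay (hypothetical) places dips near 500 Hz, 1500 Hz, 2500 Hz, ...
print(notch_frequencies(0.001, 5000))
```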
A problem for virtual reality designers (who by definition must add interactivity
to immersiveness), because when a person can probe the environment, the
VR designer must provide for nearly infinite possible simulations. 
A tone composed of two or more pure tones.  (See also spectrum.)
In acoustics, the portion of a sound wave in which air molecules are pushed
together, forming a region with higher-than-normal atmospheric pressure.
The opposite of rarefaction. In audio signal
processing, the reduction in dynamic range caused by a compressor. 
(See neural net.)
A range of frequencies surrounding the frequency of a designated pure
tone. When other pure tones whose frequencies are within this range are
played at the same time as the designated tone, the auditory system does
not hear the two completely independently. The designated tone may be
masked (see masking), beats may be heard,
or other forms of interaction may occur. The size of the critical band
increases for higher frequency tones, ranging from about 100 Hz for low-frequency
tones to above 2 kHz for very high ones. 
The distance from a sound source at which direct sound and reverberant
sound are at the same level. 
Abbreviation of decibel.
(See amplitude envelope.)
A unit of the intensity of sound. The decibel (abbreviated dB) is a relational
measure, expressing the relative intensity of the described sound to a
reference sound. The decibel is a logarithmic measure, specifically 10
times the base-10 logarithm of the ratio of two powers or intensities, which is
equivalent to 20 times the logarithm of the ratio of two voltages, currents or
sound pressures.
A difference of 20 dB between two sounds means that the more intense one
has 10 times the amplitude (100 times the power) of the softer. A single
decibel is commonly thought to be the smallest change in sound pressure
level that the trained human ear can detect. 
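The 20 dB example above can be checked with a short sketch. Power-like quantities take a factor of 10 in front of the logarithm and amplitude-like quantities (voltage, current, sound pressure) take a factor of 20, because power is proportional to amplitude squared. The function names are hypothetical.

```python
import math

def db_from_power_ratio(ratio):
    """Level difference in dB for a ratio of two powers or intensities."""
    return 10.0 * math.log10(ratio)

def db_from_amplitude_ratio(ratio):
    """Level difference in dB for a ratio of two pressures, voltages or currents."""
    return 20.0 * math.log10(ratio)

# Ten times the amplitude is one hundred times the power: both give 20 dB.
print(db_from_amplitude_ratio(10.0), db_from_power_ratio(100.0))
```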
A trademark term of Keith Yates Design Group referring to highly immersive
entertainment characterized by the depth and multiplicity of sensory modalities
presented to the audience. 
In concert hall acoustics, definition, like clarity, refers to the degree
to which individual strands in a musical presentation can be differentiated
from each other. There are two kinds of definition: horizontal, which
applies to tones played in succession; and vertical, in which tones are
played simultaneously. Horizontal definition refers to the degree
to which sounds that follow one another stand apart. Composers can specify
certain musical factors that determine the horizontal definition, such
as tempo, repetition of tones in a phrase, and the relative loudness of
successive notes. Performers can vary the horizontal definition by the
manner in which they choose to phrase a passage. Acoustical factors that affect
horizontal definition are the length of the reverberation
and the ratio of the loudness of the early sound to that of the reverberant
sound--the same two factors that determine fullness of tone, but in inverse
relation. Vertical definition refers to the degree to which sounds
that occur simultaneously are heard separately. Composers specify vertical
definition by choosing simultaneous tones and their relation to the tones
surrounding them, and the choice of instruments on which they're
played. Performers can alter vertical definition by varying the dynamics
of their simultaneous sounds and through the precision of their ensemble.
Acoustical factors such as the energy ratio of early sound to reverberant
sound also affect vertical definition. 
The bending of a wave front around an obstacle in the sound field. 
Sound field in which the sound pressure level is the same everywhere and
the flow of energy is equally probable in all directions. 
The spatial and/or temporal scattering of sound energy.  See also
feature article, "A Matter of Diffusion"
by Keith Yates.
An acoustical device designed to spread sound reflections.  See
also feature article, "A Matter of
Diffusion" by Keith Yates.
Directional transfer function (DTF).
(See head-related transfer function.)
Directivity factor (Q).
The ratio of the sound pressure squared, radiated directly ahead of a
sound source, to the sound pressure squared averaged over all directions.
The perception of fine distinctions or differences between stimuli. 
In Western tonal music, the fifth degree of the diatonic scale or the
triad (see chord) built on it. This is an
important degree from the standpoint of the tonal hierarchy since, as
its name indicates, it dominates the other degrees (excepting the tonic).
 (See also tonal system.)
Early decay time (EDT).
In concert hall acoustics, the measurement, expressed in seconds, taken
in the same fashion as reverberation time
except that EDT is the time it takes for a signal to decay from 0 to -10
dB relative to its steady-state value. A multiplying factor of 6 is necessary
to make the EDT comparable to RT. Short decay times cause music and
speech to sound dry or muffled. Long decay times make speech difficult
to understand or even unintelligible.
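The factor of 6 follows from extrapolating a 10 dB decay to the 60 dB decay on which reverberation time is based. The sketch below reads EDT from a sampled decay curve by locating the -10 dB crossing with linear interpolation. That is a simplification chosen for illustration; the function name and input layout are hypothetical.

```python
def early_decay_time(times_s, levels_db):
    """EDT in seconds from a decay curve given in dB relative to steady state.

    Assumes levels_db starts near 0 dB at times_s[0] and decreases over time.
    """
    for i in range(1, len(times_s)):
        if levels_db[i] <= -10.0:
            # Interpolate linearly between the samples that bracket -10 dB.
            t0, t1 = times_s[i - 1], times_s[i]
            l0, l1 = levels_db[i - 1], levels_db[i]
            t_cross = t0 + (t1 - t0) * (-10.0 - l0) / (l1 - l0)
            # Multiply by 6 so the result is comparable to a 60 dB decay time.
            return 6.0 * (t_cross - times_s[0])
    raise ValueError("decay curve never reaches -10 dB")
```

For a perfectly straight decay the result equals the reverberation time; the two differ when the early part of the decay is steeper or shallower than the rest.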
An individual's unique head-related transfer function
(HRTF), typically derived for each ear by placing a tiny probe
microphone inside the meatus, placing a loudspeaker
at a known location relative to the listener, playing a test signal through
the loudspeaker and recording the microphone signal. By comparing the
original test signal to the signal received by the probe microphone, the
filter function of a sound source at that position, and for that ear,
is known. The loudspeaker is then moved to another location and the process
is repeated until an entire, spherical map of filter sets has been devised.
A sound wave which has been reflected or otherwise returned with sufficient
magnitude and delay (typically >90 milliseconds) to be perceived as
distinct from that directly transmitted.
A hypothetical preperceptual sensory register within which auditory information
is temporarily stored without being recorded. The function of this memory
would be to preserve sensory information during the time needed for higher-level
processing mechanisms to extract useful information. Echoic memory does
not last more than a few seconds. It corresponds to iconic memory in the
visual modality. 
Energy-Time Curve (ETC).
In TEF measurements, a display of all the
energy returned during a specified time span. Time is displayed on the
abscissa (x axis) and energy on the ordinate (y axis). An ETC reveals
how energy is released from a system or room or device after it is hit
with a sudden application of input energy confined to a given frequency
In concert hall acoustics, envelopment is the second component of spaciousness,
and generally describes a listener's impression of the strength and directions
from which the reverberant sound appears to arrive. Listener
envelopment (abbreviated LEV) is judged highest when the reverberant
sound seems to arrive at a person's ears equally from all directions--forward,
overhead and behind. 
Trade name of an infrasonic floor-motion system developed by Keith Yates
and produced and marketed under the Immersive Technologies brand name.
The eQuake system relies on several proprietary elements, including a
real-time processor that takes a subwoofer-output audio feed from a surround-sound
processor, and synthesizes a sub-20 Hz signal that is output to a below-floor
excitation system. The eQuake is the world's first residential system
to add realistic, infrasonic, haptic content
to conventional audio/video playback, thereby marking the transition from
a bimodal to a trimodal sensory
Also known as the auditory tube, the Eustachian tube is an approximately
1-1/2 inch long conduit that serves to equalize air pressure on both sides
of the tympanic membrane (eardrum), and to
allow for drainage of the middle ear by serving as a portal into the nasopharynx
(a region of the alimentary canal). 
Judging that a signal is present when it is not or that a change occurred
when none did. Also called a false-positive response.
The distribution of sound energy at a very much greater distance from
a source than the linear dimensions of the source and in which the sound
waves can be considered to be plane waves. 
A device that can change the relative amplitudes
and phases of the frequency components in
the spectrum of a signal. A high-pass filter
attenuates low frequencies and lets the high ones pass through. A low-pass
filter does the opposite. 
(See temporal coherence boundary.)
In room acoustics, a series of specific reflective returns caused by large
surfaces being parallel to each other. 
(See resonance structure.)
A mathematical analysis of waves, discovered by the French mathematician
Fourier (1768-1830). Fourier proved that any periodic sound, or any non-periodic
sound of limited duration, could be represented (Fourier analysis) or
created out of (Fourier synthesis) the sum of a set of pure tones
with different frequencies, amplitudes and phases. 
A mathematical description of the relationship between functions of time
and corresponding functions of frequency; a map for converting from one
domain to the other. For example, if we have a signal that is a function
of time--an impulse response--then the Fourier Transform will convert
that time domain data into frequency data, for example, a frequency response.
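Both directions can be illustrated with a small discrete example: a signal is synthesized as a sum of two pure tones, and a direct (slow, unoptimized) discrete Fourier transform then recovers their amplitudes. The signal length, tone frequencies and names below are hypothetical, and the naive transform stands in for the faster algorithms used in practice.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: time samples -> complex frequency bins."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]

# Fourier synthesis: two pure tones, 3 and 5 cycles per 64-sample frame.
n = 64
signal = [
    math.sin(2 * math.pi * 3 * k / n) + 0.5 * math.sin(2 * math.pi * 5 * k / n)
    for k in range(n)
]

# Fourier analysis: scaled bin magnitudes give back the tone amplitudes.
spectrum = dft(signal)
magnitudes = [abs(c) / (n / 2) for c in spectrum[: n // 2]]
print(round(magnitudes[3], 6), round(magnitudes[5], 6))
```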
The central portion of the retina where visual acuity, or the ability
to distinguish small objects and details, is greatest. Only about half
a millimeter in diameter, the fovea is the retina's "rod-free zone"
and is densely packed with cones. (See also retina.)
An environment in which there are no reflective surfaces within the frequency
region of interest. 
A measure of the rate at which something repeats. This term usually refers
to the repetition rate of a periodic waveform and is expressed in Hz
(cycles per second) or kHz (thousands of cycles per second). The period
is the inverse of frequency, or the amount of time a single cycle lasts.
 (See also harmonicity.)
A speech sound produced by frication, that is, by forcing air through
a constriction in the vocal tract. Examples are "s" and "f."
G (strength factor).
In concert hall acoustics, the ratio, expressed in decibels,
of the sound energy at a seat in a hall that comes from a non-directional
source (usually located successively at one to three different positions
on the stage) to the sound energy from the same source when measured in
an anechoic room at a distance of 10 meters.
G is measured in six frequency bands: 125, 250, 500, 1000, 2000 and 4000
Same as G (strength factor), except that the
decibel levels are the average of the G's measured in the 500 and 1000
Hz bands. 
Same as G (strength factor), except that the
decibel levels are the average of the G's measured in the 125 and 250
Hz bands.
From the German word for "form" or "shape." The
central idea of Gestalt psychology is that the properties of a whole form
cannot be derived by simply summing the properties of its individual parts.
The constitution of these forms obeys the perceptual laws (or principles)
that were demonstrated for visual perception by the Gestalt psychologists
in the early decades of the 20th century, but which have in general been
confirmed for auditory perception as well. These principles include the
grouping into forms of elements on the basis of their proximity, similarity,
continuity, symmetry and closure. A configuration of elements that obeys
one or more of these principles may be considered to be "well formed"
and as such is a preferred way of experiencing the sensory input. 
(See also auditory stream.)
In concert hall acoustics, if the side walls or the surfaces of hanging
panels are flat and smooth and are positioned to produce strong early
sound reflections, the sound from them may take on a brittle or harsh
quality, analogous to optical glare. Acoustical glare can generally be
prevented by adding irregularities to these surfaces or by curving them.
In the 18th and 19th centuries, fine-scale irregularities on sound-reflecting
surfaces were provided by baroque carvings or plaster ornamentation. 
(See auditory stream, segmentation.)
(See basilar membrane.)
Pertaining to the sense of touch, from the Greek word haptein,
to grasp. There are four types of sensory neurons (mechanoreceptors)
involved in the haptic modality. The haptic, or tactile, sensory modality
is the only active sense that can be used to explore our environment;
vision and hearing are passive senses since they cannot act upon the environment
[no e-mail regarding the Heisenberg Uncertainty Principle, please!].
One component (or partial, or overtone) of a complex tone whose component frequencies
are all integer multiples of a common fundamental frequency (see frequency).
The intervals between components of the harmonic series are defined
by harmonic ratios (i.e., ratios of simple integer numbers). The
term "harmonic ratios" can also be applied to very low frequency
rates of repetition as are found in rhythms.
The state of being harmonic or periodic. Periodicity is mathematically
synonymous with harmonicity, though the former refers to a regularity
in the sound's time description while the latter refers to a regularity
in its frequency description. Contrasting terms to this one include inharmonicity
or aperiodicity (usually for complex tones composed of inharmonically
related partials) and randomness (usually employed to refer to noise waveforms).
(See precedence effect.)
Head-related transfer function (HRTF).
The frequency response between the point in space where a sound source
is located, and the ear, due to anatomical features of the head, upper
torso and pinnae. These features shape the response in such a way as to
allow the ear to localize a sound source in space. (Also known as head
transfer function [HTF], pinnae transform, outer ear transfer
function [OETF], and directional transfer function [DTF]. See
Helmholtz, Hermann von.
Scientist who, during the second half of the 19th century, contributed
to our knowledge about almost every topic in the fields of perception
and sensory processes. Helmholtz argued that perception was based upon
a process of inference, in which, through past experience, we infer from
the sensations we receive at a given time the nature of the object or
event that they probably represent. 
(See temporal lobe.)
The organization of a set of elements into subsets according to relations
of dominance and subordination. Each element of a subset is subordinate
to the subset as a whole which itself is subordinate to the superset of
which it is an element, and so on. In a strict hierarchy no element can
be a member of more than one subset at a given level of the hierarchy.
Abbreviation for head-related transfer function.
(See also earprint and localization.)
Abbreviation for heating, ventilation and air conditioning. 
Abbreviation of Hertz. (See frequency.)
The measure of the difference in the sounds arriving at the two ears of
a listener facing the performing entity in a hall. IACC is usually measured
by recording on a digital tape recorder the outputs of two tiny microphones
located at the entrances to the ear canals of a person or a dummy head,
and quantifying the two ear differences with a computer program. IACCA
is determined with a frequency bandwidth of about 100 to 8000 Hz and for
a time period of 0 to about 1 second. No frequency weighting is used.
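One way the "computer program" step is commonly carried out is a normalized cross-correlation of the two ear signals, taking the largest magnitude over a small range of interaural lags. The sketch below follows that convention. The lag range of plus or minus 1 ms, the function name and the requirement that both signals have the same length are assumptions of the example, not details given in this glossary.

```python
import math

def iacc(left, right, sample_rate_hz, max_lag_s=0.001):
    """Peak magnitude of the normalized cross-correlation of two ear signals.

    Assumes left and right have equal length and are not silent.
    """
    energy = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    max_lag = int(round(max_lag_s * sample_rate_hz))
    n = len(left)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        best = max(best, abs(acc) / energy)
    return best
```

Identical signals at the two ears give a value of 1; the more the two ear signals differ, the closer the value falls toward 0.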
The interaural cross-correlation coefficient determined for a time period
of 0 to 80 milliseconds. It is the average of the values measured in the
three octave bands with mid-frequencies of 500, 1000 and 2000 Hz. It has
been shown to be a sensitive measure for determining the apparent
source width (ASW) of a performing entity as heard by a person seated
in the audience. 
The interaural cross-correlation coefficient determined by averaging the
values in the 500, 1000 and 2000 Hz bands, for a time period of 80 to
750 milliseconds. It correlates approximately to the state of sound diffusion
in a concert hall. 
The ability to retrieve from memory a name or concept associated with
an object or event. 
Abbreviation for inside-the-head localization.
Abbreviation for Impact Isolation Class.
Pertaining to "immersion," or the feeling of being present in
a mediated world rather than the immediate physical environment. The success
of the phenomenon is thus dependent on the absence of, or the ability
to block out, sensory cues associated with the immediate environment (the
"real world"), and the degree to which the cues supplied by
the mediated world are both deep (i.e., rich in informational content)
and broad (i.e., correlated across multiple sensory modalities; see also
haptic and eQuake).
The mediated environment can be purely fictional or a temporally and/or
spatially distant real environment. The question isn't whether the created
world is as real as the physical world, but whether the created world
is real enough for you to suspend your disbelief for a period of time.
The introduction of perspective in painting by Masaccio in the 1420s took
a first step toward immersion by creating a sense of depth that integrated
the spectator into the pictorial space. But because the medium of painting
simulates depth on a flat surface the spectator cannot break through the
canvas and walk into the pictorial space. (See also DeepEntertainment(tm).)
Immersive Technologies Corporation.
A California company founded by Keith Yates in 1998 to develop and manufacture
and/or license technologies to increase the immersive power of movie and
music playback experiences. See also eQuake.
Impact Isolation Class (IIC).
A measure or specification of isolation effectiveness of building structures
from impact noises such as slammed doors, dropped objects, footfalls,
shuffled furniture, etc. The higher the IIC rating, the better such isolation.
Impact noises can be transmitted through walls, floors, and ceilings throughout
a building and re-radiated at distant locations. Careful design and special
construction materials (floating floors, isolation pads, resilient channels,
spring rails, flexible connectors and hangers, for example) can help improve
IIC ratings, which may be thought of as the structure-borne equivalent
of the airborne noise ratings addressed by STC.
A measurement of sound pressure versus time, showing how a device responds
to an impulse. 
A key concept in cognitive psychology. Drawing on the image of the way
computers work, information resulting from stimulation of the sense organs
is analyzed and transformed by a number of serial and parallel processors
(see neural net) each of which takes as input
the information output by another processor. 
(See backward recognition masking.)
Pertaining to frequencies below the audible range, i.e., sub-20 Hz. Note:
Sound in the 2-5 Hz range played at 100-125 dB may produce difficulty in
swallowing and slight post-exposure headache. Sound in the 2-5 Hz range
played at 125-137 dB may produce chest wall vibration; difficulty in speaking
and voice modulation; swaying sensations; lethargy and drowsiness; and
post-exposure fatigue and headaches. Sound in the 5-15 Hz range played
at 125-137 dB may produce middle-ear pain; difficulty in speaking and voice
modulation; severe chest wall vibration; severe abdomen vibration and
associated feelings of nausea; a falling sensation; lack of concentration
and drowsiness; tinnitus; and severe post-exposure
fatigue and headaches. According to some researchers, 7 Hz is possibly
the most disturbing frequency, being close to the natural resonance frequency
of many of the internal body organs and being the same frequency as the
alpha brainwaves. Sound in the 15-20 Hz range played at 125-137 dB may produce
severe middle-ear pain; respiratory difficulties (gagging sensations);
nasal cavity vibration; persistent eye watering; tinnitus; sensation of
fear; excessive perspiration and shivering; and severe post-exposure fatigue
and headaches.
A tone composed of partials that are not all
integer multiples of a common fundamental.
Initial time delay gap (ITDG).
The deepest part of the ear. It is contained within a system of spaces
and canals, known as the osseous or bony labyrinth, in the temporal bone.
These spaces and canals are divided into three sections: the vestibule,
which contains two balance organs, the utricle and saccule;
the semicircular canals, located behind the vestibule, and the
cochlea. The spaces between the bony walls
of the osseous labyrinth and the membranous labyrinth are filled with
one of several types of fluid, which deliver nutrients to the cells of
the inner ear; provide the chemical environment needed for transfer of
energy from a vibratory stimulus to a neural signal; and function as the
medium to carry vibratory stimuli from the oval window to the sensory
structures along the cochlear partition.
Inside the head localization (IHL).
The name given to the physical energy with which a sound is present. It
contrasts with "loudness," which is the perceptual experience
approximately correlated with that physical intensity. 
Intimacy (or presence).
In concert hall acoustics, a venue is said to have "acoustical
intimacy" if music played in it gives the impression of being played
in a small hall. In the language of the recording and broadcast industries,
an intimate hall is said to have "presence." See also t1
(initial time-delay gap). 
A sequence of events is called isochronous if the time separating each
pair of successive events is strictly equal. The absence of isochrony
is called anisochrony. 
Abbreviation for just noticeable difference.
Just noticeable difference (jnd).
The smallest change in a stimulus parameter (frequency, intensity, duration)
that can be detected by a listener at a predefined level of performance
(e.g., 71 percent of the time).  (See also Weber's
Perceptual proximity of the keys of the Western tonal system. Keys sharing
more pitches are considered to be more closely related than those with
fewer pitches in common. 
Abbreviation of kiloHertz. (See frequency.)
Lateral Energy Fraction.
Lateral geniculate body.
A peanut-sized area of the brain to which the output of the retina is sent. Each lateral geniculate body (there
are two, one on each side of the brain) routes its output to the visual
The identification of a sound that is presented over headphones is described
as "lateralization" rather than localization in recognition
of the fact that sound playback over headphones is generally not "externalized,"
i.e., it is experienced as coming from somewhere between the two ears
rather than from somewhere in the surrounding environment. Lateralization
is the identification of the position of the sound on the left-right dimension.
Also referred to as inside-the-head localization (IHL).
Abbreviation for listener envelopment.
The lateral energy fraction determined by the ratio of the output
of a figure-8 microphone with its null axis pointed to the source of the
sound, divided by the output of a non-directional [i.e., omnidirectional]
microphone at the same position. LFE4 is determined for the time period of 0 to 80 milliseconds
and is the average of the LF's in the four frequency bands, 125, 250,
500 and 1000 Hz. It is equal to the ratio of the weighted energy in the
sound that does not come from the direction of the source to that which
comes from all directions including that of the source. LFE4
also correlates with the apparent source width
In concert hall acoustics, a component of spaciousness referring to a
listener's impression of the strengths and directions from which the reverberant
sound seems to arrive. Listener envelopment is judged highest when the
reverberant sound seems to arrive equally from all directions: forward,
overhead, behind. 
In concert hall acoustics, a subjective quality related primarily to the
reverberation times at the middle and high frequencies, those above about
350 Hz. A hall can sound "live" and still be deficient in bass.
If a room is sufficiently reverberant at low frequencies, it is said to
sound "warm." 
The judgment of the place of spatial origin of a sound. Humans localize
sounds based on two primary cues: interaural intensity difference (IID),
and interaural time difference (ITD). IID refers to the fact that
a sound is louder at the ear it is closer to (the "ipsilateral"
ear) for two reasons: because sound intensity diminishes with distance
traveled; and because the head itself blocks the sound path to the more
distant ("contralateral") ear. ITD refers to the fact
that a sound will arrive at the ipsilateral ear before the contralateral
ear. Generally speaking, the ear-brain system uses ITD cues to determine
the spatial origin of low-frequency sounds, and IID cues to determine
the spatial origin of higher frequency sounds. The IID/ITD keys to localization
were first proposed by Lord Rayleigh in the first decade of the 20th century,
and are sometimes referred to as the duplex theory of localization.
About 60 years later researchers discovered that, in addition to IID and
ITD information, the brain processes information about the sound source's
location based on how its energy has been accentuated or attenuated in
the mid- and high-frequency ranges by minute time delays caused by the
folds and depressions in the listener's pinnae
(and at lower frequencies by the shoulders and upper torso): Because of
the pinna's asymmetry, different angles of sound incidence produce different
characteristic filtering. (The spectral-shaping influence of the pinnae
can be readily verified by trying to localize sound after filling their
cavities with putty.) The effect of filtering by the pinna and upper body
is termed the head-related transfer function (HRTF) and is unique
for each individual, similar to a fingerprint. (In fact, an individual's
HRTF is sometimes called his or her earprint.)
Localization accuracy in humans is most precise for sound sources located
in front of the listener and at ear level. Localization is not simply
an auditory process, but includes higher order brain functions which combine
learned responses, complex pattern matching, and cross referencing with
other senses in the brain, resulting in a unified (though not always correct)
perception of the location of a sound source. (See also lateralization
and visual capture.)
A scale in which the logarithm of the physical variable is used instead
of the raw value. This has the effect that equal steps along the scale
represent equal ratios between the raw values. Examples in audition are
the decibel scale and the scale of musical pitch. 
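The "equal steps represent equal ratios" property can be checked numerically. The following is a minimal sketch; the function names are illustrative, and it assumes the usual conventions of 10 times the base-10 logarithm of an intensity ratio for decibels and the base-2 logarithm of a frequency ratio for octaves.

```python
import math


def level_difference_db(intensity_a, intensity_b):
    """Level difference in decibels between two sound intensities."""
    return 10.0 * math.log10(intensity_a / intensity_b)


def interval_in_octaves(freq_a_hz, freq_b_hz):
    """Pitch interval in octaves between two frequencies."""
    return math.log2(freq_a_hz / freq_b_hz)


# The same 10:1 intensity ratio gives the same step in dB,
# regardless of the raw values involved.
print(level_difference_db(10.0, 1.0))
print(level_difference_db(1000.0, 100.0))

# The same 2:1 frequency ratio gives the same step in octaves.
print(interval_in_octaves(440.0, 220.0))
print(interval_in_octaves(4000.0, 2000.0))
```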
The "hammer" bone of the middle ear.
The process by which one sound (the masker) affects the threshold of audibility
of another sound (the target or probe) when played at the same time. More
intense sounds mask less intense ones. The amount of masking depends on
the proximity of the frequency components (see critical
bands, frequency and harmonic)
of the two sounds, as well as on the global intensity of the masker. The
greater the level, the greater the extent to which a given masker frequency
can mask target components at higher frequencies (see backward
recognition masking). 
Meatus (also called the external
The ear canal, leading from the concha to the tympanic
membrane (eardrum). Approximately 1 inch long, the outer one-third
of the meatus is cartilaginous; the remaining two-thirds is bony. Ceruminous
(wax) and sebaceous (oil) glands are plentiful in the cartilaginous segment,
and are also found on the posterior and superior walls of the bony canal.
The wax and oil lubricate the canal and help keep it free of debris and
foreign objects. 
Mechanoreceptors are the receptors involved in the haptic
(tactile) sensory system and come in four distinct types: Merkel's
receptors and Meissner's corpuscles, both with relatively small
receptive fields and located in the dermal papillae (superficial skin);
and pacinian corpuscles and Ruffini corpuscles, both with
larger receptive fields and located deeper in the skin, i.e., subcutaneously.
The smaller receptive fields of the Merkel's and Meissner's structures
allow them to resolve finer spatial details than the pacinian and Ruffini
structures. The four mechanoreceptor types respond to different intensity
and frequency ranges of mechanical stimuli. Meissner's corpuscles are
most sensitive to low-frequency (< 100 Hz) sinusoidal mechanical stimuli;
their excitation is felt as a gentle fluttering in the skin, sometimes
termed flutter sense. In contrast, pacinian corpuscles are maximally
sensitive to higher frequency (50-500 Hz) stimuli, which evoke a diffuse,
humming sensation in the deeper tissue. Ruffini corpuscles and Merkel's
receptors respond to indentation of the skin. The spatial distribution
of mechanoreceptors is not uniform; the densest distribution can be found
in the fingertips. (See also sensory experience.)
The pattern of ascending and descending pitch changes in a melody. 
A hypothetical pattern of mental or brain activity that represents some
feature of the world, of the person, or of the interaction between the
person and the world. 
A mental program or formula that has been proposed by Jean Piaget and
other psychologists as a means by which people represent the world and
regulate their interactions with it. The concept implies more of an active
control mechanism than the concept of mental "representation."
The group of phenomena related to the musical measure. It consists of
the hierarchical ordering of the piece of music into units of equal duration
(beats; see also hierarchy).
This ordering is indicated by the time signature at the beginning of the
score. From a phenomenological point of view, the presence of a metric
organization in the heard piece is evidenced by the fact that one can
tap one's foot or dance in synchrony with the music. 
A six-sided cavity between the outer ear and
the inner ear, principally containing the
ossicles (often called the "hammer" [malleus], "anvil"
[incus] and "stirrup" [stapes], the three smallest bones in
the body); two muscles, the tensor tympani and the stapedius;
and the opening to the Eustachian tube. Sound
is transformed at the middle ear from acoustical energy at the eardrum
to mechanical energy at the ossicles; the ossicles convert the mechanical
energy into fluid pressure within the inner ear via motion at the oval
window.
The phenomenon of the "missing fundamental" is one in which
the listener, presented with a harmonic tone in which the fundamental
is absent, hears the same pitch as would be heard if the fundamental had
been present. Therefore, only some of the harmonics are needed to hear
the pitch. The pitch that is heard when the fundamental is absent is called
periodicity pitch because the period of the wave is the same whether
the fundamental is present or not. 
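The statement that the period is unchanged when the fundamental is removed can be illustrated with simple arithmetic. This sketch assumes the partials are exact harmonics given as whole numbers of hertz, so that the repetition rate of the summed wave is their greatest common divisor. It illustrates the period argument only; it is not a model of how the auditory system derives pitch, and the function name is hypothetical.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(partials_hz):
    """Repetition rate (Hz) of a wave built from integer-frequency harmonics."""
    return reduce(gcd, partials_hz)


with_fundamental = [200, 400, 600, 800]
without_fundamental = [400, 600, 800]

# Both sets repeat 200 times per second, i.e. share a 5 ms period.
print(implied_fundamental_hz(with_fundamental))
print(implied_fundamental_hz(without_fundamental))
```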
That part of a sound field, usually within about two wavelengths from
a sound source, where there is no simple relationship between sound level
and distance. 
A system composed of many simple processing units, formally mimicking
the operation of nerve cells, which are connected together in complex
patterns of excitation and inhibition and propagate activation to other
units by way of these connections. The current state of a given unit and
the degree to which it excites other units can be influenced by the success
it has had in activating them. Propagated activity among cells can lead
the system to stable states in which the activity of the units remains
relatively constant. These states constitute the "response"
of the system to a given stimulation by the (external or internal) environment.
The main hypothesis concerning this kind of architecture (also called
connectionist or parallel distributed processing networks),
is that it is better suited to modeling the microstructure of cognition
than more classical data flow or serial processing models: processing,
representation and memory are postulated to be distributed over units
in the net rather than being constrained to specific storage locations
and processing routines. 
A nerve cell. A neuron's job is to take in information from the cells
that feed into it; to integrate (sum up) that information; and to deliver
that integrated information to the next neuron. The information is usually
conveyed in the form of brief nerve impulses. In a given cell,
one impulse is the same as any other; they are "stereotyped"
events. Impulse rates vary from one every few seconds to about 1000 per
second. Anatomically, the nerve cell consists of a globular-shaped cell
body with a nucleus, mitochondria and other organelles; a cylindrical-shaped,
signal-transmitting nerve fiber called an axon; and a number of
branching and tapering fibers called dendrites, typically under
one millimeter in length. The entire nerve cell (the cell body, axon and
dendrites) is encased in the cell membrane. The cell body and dendrites
receive information from other nerve cells; the axon, which may be anywhere
from less than a millimeter to more than one meter in length, transmits
this information from the nerve cell to other nerve cells. Near the point
where it ends, an axon typically splits into many smaller branches whose
ends come very close to, but do not touch, the cell bodies or dendrites
of other nerve cells. At these regions, called synapses, information
is conveyed from one nerve cell, called the presynaptic cell, to
the next, called the postsynaptic cell. Neural signals originate
at a point near where the axon joins the cell body, and travel down the
length of the axon, away from the cell body and toward the terminal branches.
At a terminal, the information is transferred across the synapse to the
next cell or cells by a process called chemical transmission. 
A random waveform. A noise signal whose frequency spectrum contains all audible
frequencies with equal energy per hertz is called white noise. A noise signal that contains all frequencies
with equal energy per octave is called pink noise, commonly used
to test loudspeakers. A noise signal that is filtered, removing higher
and lower frequencies and just letting through a small band of frequencies,
is called narrow-band or band-pass noise. Filtering out
the high frequencies starting from a certain cut-off frequency gives low-pass
noise. Taking a noise waveform over a certain time period and then repeating
this segment gives what is called frozen noise.
Noise criteria (NC) curves.
A measure of background noise in rooms. Each NC curve is defined by its
sound pressure level at eight octave-band center frequencies: 63, 125,
250, 500, 1000, 2000, 4000 and 8000 Hz. The lower the NC rating, the lower
the background noise level. The preferred range of NC performance for
sound-critical spaces (e.g., home theaters, home media rooms, home listening
rooms, concert and opera halls, recital halls and broadcasting and recording
studios) is < NC-20. Factors that must be addressed in achieving satisfactory
NC performance typically include mechanical (HVAC)
design and the construction detailing of the room's envelope, i.e., its
walls, ceiling, floor, windows and doors, in order to reduce noise infiltration
from areas exterior to the room. (See also Room
Criteria, Sound Transmission Class (STC) and Impact Isolation Class (IIC).)
One of the pitch intervals in music. Physically,
a note that is an octave higher than another has a frequency that is twice
that of the lower one. 
Not directly in front of a microphone or loudspeaker. 
Pertaining to the ear; aural. 
Inflammation of the ear, which may be marked by pain, fever, hearing abnormalities,
deafness, tinnitus, and vertigo. 
The external structure of the ear, consisting of the pinna
Outer ear transfer function (OETF).
(See head-related transfer function.)
A major clue to the perception of depth in vision, parallax arises from
the relative motions of near and far objects that are produced when the
viewer moves his or her head up and down or from side to side. See also
Parallel distributed processing.
(See neural net.)
Passing tone. Ornamental notes melodically
interleaved between two notes that are part of the triad (see chord)
of the principal key. 
What the perceiver sees or hears as a result of stimulation, as opposed
to the physical reality of the stimulation. The percept may be considered
the "object" of study in perceptual psychology. 
The fixing of one's gaze for sometimes very short periods of time in specific
areas as one explores a visual form. These fixation points constitute
the zones of perceptual centration. This term was applied to auditory
perception by Frances to designate the auditory information upon which
listeners focus their attention at a given moment. 
The impression of perceiving the same object, event or pattern in spite
of variations in stimulus structure, due, for example, to being played
louder or softer, faster or slower, higher or lower, or in different acoustic
The phase is the particular point in a wave that is passing a position
in space at a certain instant of time. Phase is measured in units of degrees,
with 360 degrees representing one complete cycle of the wave. If two tones
have the same period and are occurring at the same time, the temporal
lag of one with respect to the other can be described in terms of phase.
If two waves are out of phase by 180 degrees, the later one is lagging
by one-half a period. 
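The conversion from a temporal lag to a phase difference follows directly from the 360-degrees-per-cycle convention. A minimal sketch, with an illustrative function name and the assumption that both tones share the same period:

```python
def phase_lag_degrees(time_lag_s, period_s):
    """Phase lag in degrees of a tone delayed by time_lag_s, for a given period."""
    return (time_lag_s / period_s * 360.0) % 360.0


# A 1000 Hz tone has a period of 1 ms; a lag of 0.5 ms is half a period.
print(phase_lag_degrees(0.0005, 0.001))

# A lag of one quarter period corresponds to 90 degrees.
print(phase_lag_degrees(0.00025, 0.001))
```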
The basic classes of sounds used to form the words of a language. Examples
in English are "k," "oo," and "th." They
are often represented by single written letters. 
A hypothetical active process by which a speech sequence that is interrupted
by a noise sound in place of a given phoneme results in the listener's
impression of having heard the phoneme. This effect does not occur if
a silent gap is left at the place where the phoneme normally occurs. 
The external, visible, largely cartilaginous appendage of the outer ear.
Its perimeter is demarcated by a ridge-like rim called the helix,
which curves down to the earlobe (lobule) at its bottom. Roughly in the
middle is a relatively large, cup-shaped depression called the concha.
(See head-related transfer function.)
The auditory attribute on the basis of which tones may be ordered on a
musical scale. Two aspects of the notion of pitch can be distinguished
in music: one related to the frequency (or fundamental frequency) of a
sound (measured in Hz) which is called pitch
height, and the other related to its place in a musical scale which
is called pitch chroma. Pitch height varies directly with frequency
over the range of audible frequencies. This "dimension" of pitch
corresponds to the sensation of "high" and "low."
Pitch chroma, on the other hand, embodies the perceptual phenomenon of
octave equivalence, by which two sounds separated by an octave
(and thus relatively distant in terms of pitch height) are nonetheless
perceived as being somehow equivalent. This equivalence is demonstrated
by the fact that almost all scale systems in the world in which the notes
are named assign the same names to notes that are roughly separated by
an octave, i.e., the labeling system cycles at every octave. Thus pitch
chroma is organized in a circular fashion, with octave-equivalent pitches
considered to have the same chroma. Chroma perception is limited to the
frequency range of musical pitch (50-4000Hz). 
In TDS acoustical measurements, Polar Energy-Time
Curves (ETC) measure the magnitude and time of arrival of reflections,
and, importantly, display the direction of the reflecting surface relative
to the microphone placement. Polar ETC's can thus allow the operator to
pinpoint the location of one or many reflecting surfaces in a concert
hall, auditorium, theater, recording studio or residential playback venue.
The characteristic sound radiation pattern of a microphone or loudspeaker,
usually plotted to show sound sensitivity or output, respectively, at
various angles of sound incidence. 
The positive or negative direction of an electrical, acoustical or magnetic
force. Two identical signals in opposite polarity are 180 degrees apart
at all frequencies. Polarity is not frequency dependent. 
An effect in which the human auditory system suppresses early reflections
of a direct sound, i.e., it "fuses" the direct sound and its
early reflections and localizes the source on the basis of the earlier
(i.e., direct) sound. The basis for the distinction is that the reflections
arrive with a certain delay compared to the direct sound. Precedence effect
is sometimes referred to as the law of the first wavefront or the
Haas effect. 
Gradual and biologically normal loss of acute hearing with advancing age.
Primary auditory cortex.
(See temporal lobe.)
The travel of sound waves through a medium (e.g., air). 
The sense of body position. 
A notion introduced by Rosch to designate an abstract representation of
a whole class of objects, of which the prototype would constitute the
central tendency. 
The study of the relationship between physical measures of sound (e.g.,
amplitude and frequency) and the perception of them. 
A tone with a sinusoidal waveform is called
a pure tone because it is considered to be the simplest form of tone and
sounds pure when played in isolation. 
(See directivity factor.)
The portion of a sound wave in which air molecules are spread apart, forming
a region with lower-than-normal atmospheric pressure. The opposite of
An increase in correct recall rate for the most recently presented items
of a list compared with those presented earlier in the list. 
The impression that an object, event or sequence has been experienced
before or is familiar. 
In acoustics, the bouncing or return of a sound wave from an object larger
than one-quarter wavelength of the sound. When the object is one-quarter
wavelength or slightly smaller, it also causes diffraction
of the sound. 
The change in direction of a sound wave that occurs when sound passes
from one medium to another (e.g., from air to glass to air, or through
layers of air with different temperatures).
The phase of one sine wave compared to another. 
(See absolute pitch.)
A resonance structure can be described in terms of the relative level
produced at each frequency by a resonating object. Most physical objects
(membranes, bars, air columns, strings) have several modes of vibration
that resonate at different frequencies, thus constituting a complex resonance
structure. In the case of speech, these resonance regions are called formants.
The placement of the formants is a major clue to the identity of a vowel.
The way resonant frequencies change rapidly over time is a clue to the
identity of several classes of consonants. 
Technically a part of the brain and located on the inner surface of the
eyeball, the retina translates light into nerve signals, which are then
routed via the optic nerve to the lateral geniculate
body. The retina consists of three layers of nerve-cell bodies. The
layer at the back of the retina contains roughly 125 million light receptors,
the rods and cones. Rods, which considerably outnumber cones,
are responsible for our vision in dim light and are out of commission
in bright light. The three types of cones do not respond to dim light
but are responsible for our ability to see fine detail and for color vision;
cones are "tuned" to absorb long, medium or short wavelengths
of light, loosely corresponding to red, green and blue. The distribution
of rods and cones varies considerably over the surface of the retina;
in the center, where fine-detail vision is best, is the fovea,
which is densely packed with cones. The retina's middle layer contains
three types of nerve cells: bipolar cells, which receive input
from the receptors (i.e., rods and cones); horizontal cells, which
link receptors and bipolar cells; and amacrine cells, which link
bipolar cells and retinal ganglion cells. The layer at the front
of the retina contains approximately 1 million of the aforementioned retinal
ganglion cells, whose axons pass across the surface of the retina, collect
in a bundle, and leave the eye to form the optic nerve. 
Reverberant sound field.
A sound field made of reflected sounds in which the time average of the
mean square sound pressure is everywhere the same and the flow of energy
in all directions is equally probable. This requires an enclosed space
with essentially no acoustic absorption, e.g., a reverberation chamber.
In concert hall acoustics, reverberation refers to sound that persists
in a venue after a tone is suddenly stopped. A hall that is reverberant
is called a "live" hall. (See also liveness.)
A room that is not reverberant is called a "dead" or "dry"
Reverberation time (RT).
Defined as the time, multiplied by a factor of 2, that it takes for the
sound in a hall to decay from 5 dB to 35 dB below its steady-state value.
The factor of 2 is necessary because RT must conform to the original definition
of sound decay which was from 0 to -60 dB. Roughly speaking, RT is the
time it takes for a loud sound to decay to inaudibility after its source
is cut off. RT is usually measured in octave or one-third octave bands.
The source of sound may be a pink noise or a sound impulse. Originally,
RT was determined from a plot of sound pressure level vs. time as recorded
on the moving paper of a graphic level recorder. Today it is determined
by the Schroeder (1965) method which involves computer integration of
a backward-played tape recording of the decaying signal. The mid-frequency
RT is the average of the RTs at 500 and 1000 Hz. The measurement is generally
made in both occupied and unoccupied halls, at two positions when occupied
or at 8 to 24 positions when unoccupied. The data in each frequency band
at the various positions are averaged. A least-squares fit to the -5 to
-35 dB portion of the decay curve is used in setting the value of RT for
each band and position. The RTs of the largest stone cathedrals can be
nearly 10 seconds; the world's most renowned concert halls typically fall
in the range of 1.8 to 2.2 seconds; opera houses typically fall in the
1.2 to 1.6 second range; aggressively damped home theaters can exhibit
RTs below 0.25 seconds. A venue's use and its RT must be consonant: A
home theater with a 6 second RT would render movie dialog unintelligible,
while a cathedral with a 0.3 second RT would deflate its sonic grandeur.
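The measurement procedure described above (backward integration of the decaying signal, a least-squares fit to the -5 to -35 dB portion, and the factor of 2) can be sketched in a few lines. This is an illustrative sketch, not a standards-compliant implementation: the function names are hypothetical, the input is assumed to be a single-band impulse response given as a list of samples, and band filtering and position averaging are omitted.

```python
import math


def schroeder_decay_db(impulse_response):
    """Backward-integrated decay curve in dB, normalized to 0 dB at the start."""
    if not impulse_response:
        raise ValueError("impulse response is empty")
    energy = [sample * sample for sample in impulse_response]
    # Integrate the last energy first: running sum from the end to the start.
    remaining = 0.0
    backward = []
    for value in reversed(energy):
        remaining += value
        backward.append(remaining)
    backward.reverse()
    total = backward[0]
    if total <= 0.0:
        raise ValueError("impulse response contains no energy")
    # Trailing zeros cannot be expressed in dB, so they are dropped.
    return [10.0 * math.log10(v / total) for v in backward if v > 0.0]


def reverberation_time_s(impulse_response, sample_rate_hz):
    """Estimate RT from a least-squares fit to the -5 to -35 dB decay range."""
    decay = schroeder_decay_db(impulse_response)
    points = [(i / sample_rate_hz, level)
              for i, level in enumerate(decay) if -35.0 <= level <= -5.0]
    if len(points) < 2:
        raise ValueError("not enough decay range for a fit")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_level = sum(level for _, level in points) / n
    slope = (sum((t - mean_t) * (level - mean_level) for t, level in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second
    # Extrapolating the fitted 30 dB decay to 60 dB is the "factor of 2".
    return -60.0 / slope


# Synthetic test signal: an ideal decay whose amplitude falls 60 dB in 0.5 s.
sample_rate_hz = 1000
true_rt_s = 0.5
signal = [10.0 ** (-3.0 * (n / sample_rate_hz) / true_rt_s)
          for n in range(2 * sample_rate_hz)]
print(round(reverberation_time_s(signal, sample_rate_hz), 3))
```

Fitting a line to the integrated curve rather than reading two points off it makes the estimate less sensitive to local irregularities in the decay.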
A sequence of events having a specific set of time intervals between the
onsets of successive events. Sequences having different onset-to-onset
intervals are said to have different rhythmic structures or temporal structures.
The time taken for a signal to rise from silence to full intensity. The
tones of different instruments can be distinguished by their rise time,
the tones of percussive instruments like the piano rising very rapidly
and others like the tuba, more slowly. In music, "rise time"
is called "attack" (see amplitude envelope).
Room criteria (RC) curves.
A measure or specification of background noise from HVAC
systems according to measured sound pressure level at 10 octave-band center
frequencies: 16, 31.5, 63, 125, 250, 500, 1000, 2000, 4000 and 8000 Hz.
Room Criteria curves were derived for use in office spaces and are more
demanding than Noise Criteria curves at low
Frequencies at which sound waves in a room resonate (in the form of standing
waves), based on the room dimensions. 
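For an idealized rectangular room with rigid walls, the mode frequencies follow from the room dimensions by a standard formula: f = (c / 2) * sqrt((nx / L)^2 + (ny / W)^2 + (nz / H)^2), where nx, ny and nz are non-negative integers. The sketch below assumes that idealization and a speed of sound of 343 m/s; the function name and the example room are hypothetical, and real rooms with absorptive or non-parallel surfaces will deviate from these values.

```python
import math


def room_mode_hz(nx, ny, nz, length_m, width_m, height_m, speed_of_sound=343.0):
    """Mode frequency of an ideal rigid-walled rectangular room."""
    return (speed_of_sound / 2.0) * math.sqrt(
        (nx / length_m) ** 2 + (ny / width_m) ** 2 + (nz / height_m) ** 2
    )


# Lowest axial modes of a 5.0 m x 4.0 m x 2.5 m room.
for indices in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 0, 0)]:
    print(indices, round(room_mode_hz(*indices, 5.0, 4.0, 2.5), 1))
```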
Root mean square (rms).
The effective DC voltage of an AC signal. The square root of the mean
value of the squares of the instantaneous values of a varying quantity.
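The definition translates directly into code: square each instantaneous value, take the mean, then take the square root. A minimal sketch with an illustrative function name, applied to one cycle of a unit-amplitude sine wave, whose rms value is 1 divided by the square root of 2 (about 0.707):

```python
import math


def rms(samples):
    """Root mean square of a sequence of instantaneous values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# One full cycle of a sine wave with a peak value of 1.0.
cycle = [math.sin(2.0 * math.pi * n / 1000) for n in range(1000)]
print(round(rms(cycle), 4))
```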
In acoustics, a unit of absorption equal to the absorption of 1 square
foot of surface which is totally sound absorbent. Named after Wallace
Clement Sabine, the Harvard professor honored as the "father of architectural
acoustics" for his investigations into concert hall sound at the
turn of the century. 
The normal, but largely unnoticed rapid darting of the eyes from one fixed
point to another. 
A set of pitches (or notes) arranged with certain intervals among them
within the span of an octave (see also pitch).
The scale pattern generally repeats in each octave.
Each note constitutes a degree of the scale. Each diatonic scale consists
of intervals between adjacent notes that are either minor or major seconds
(one or two semitones, respectively). The different arrangements of major
and minor seconds yield different modes. The two most important
modes in Western tonal music are the major and minor modes.
The chromatic scale contains all twelve semitone steps within an
octave. Another kind of scale which does not fall within the tonal system
but which was used extensively in the music of Debussy and Ravel is the
whole-tone scale, which has only six notes, all separated by whole
tones. Intonation (or tuning system) refers to the exact tuning of the
notes of a given scale system. The most widely used tuning system in Western
music is the equal-tempered system in which all intervals can be
expressed as integer multiples of a standardized semitone. This system
was brought to Europe from China and adopted during the 17th century.
Schroeder integration of reverberation.
In acoustics, an integration of reverberant data in which the last energy
is integrated first and the initial arrival is integrated last, all of
which is normalized by the total. The integration simulates the effect
of taking many time measurements and averaging them together. 
The "blind spot" in human vision corresponding to the region
where the optic nerve enters the eye, i.e., the oval-shaped area about
2 millimeters in diameter with no rods or cones. You can "map"
your blind spot simply by closing one eye and gazing at a small object
across the room. Hold a Q-Tip at arm's length directly in front of the
object and slowly move it out to the right exactly horizontally. The white
cotton will vanish when it is about 18 degrees out. Now, if you place
the stick so that it runs through the blind spot, it will appear as a
single, continuous stick, without any gap. (This feature is referred to
as "completion.") You are not normally aware of your blind spot,
and cannot be, unless you test for it. You don't see black or white or
anything there; you see nothing.
The process by which speech signals are divided into phonemes, syllables
or words. It consists of creating boundaries between groups of elements.
In music, segmentation refers to the process of dividing an event sequence
into distinct groups of sounds. The factors playing a role in segmentation
are similar to the principles of grouping addressed by Gestalt
The smallest standard musical interval (i.e., step in pitch)
in the Western equal-tempered pitch system (see scale).
All other intervals can be described as containing an integer number of
semitones, e.g., the octave contains 12 semitones,
the perfect fifth 7 semitones, etc. A tone that is a semitone higher than
another is approximately 6 percent higher in frequency.
There is a semitone separation between any black key on the piano and
its nearest white neighbor or between adjacent white keys that have no
black keys between them. 
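The figure of approximately 6 percent follows from dividing the octave (a frequency ratio of 2) into 12 equal steps, so that each semitone is a ratio of the twelfth root of 2. A short sketch of that arithmetic; the function name is illustrative.

```python
def transpose_hz(frequency_hz, semitones):
    """Frequency reached by moving a number of equal-tempered semitones."""
    return frequency_hz * 2.0 ** (semitones / 12.0)


# One semitone up from 100 Hz: an increase of just under 6 percent.
print(round(transpose_hz(100.0, 1) - 100.0, 2))

# Seven semitones (a perfect fifth) and twelve semitones (an octave) above 440 Hz.
print(round(transpose_hz(440.0, 7), 2))
print(round(transpose_hz(440.0, 12), 2))
```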
Sensory experiences occur when stimulus energies excite one or more types
of receptor neurons, of which there are five
specialized types in animals: chemoreceptors, mechanoreceptors, thermoreceptors,
photoreceptors and nociceptors. These receptors transduce (change) the
form of input energy into a neural (electro-chemical) signal. A single
photon or micrometer of mechanical displacement is sufficient to excite
photoreceptors in the retina or mechanoreceptors
in the skin, respectively. Receptors selectively relay certain features
of the stimulus to the central nervous system. Individual receptors are
tuned to one or several stimulus features. Localization of a sensation
is a function of the size of the receptive field of the receptor. The
duration of a sensation is related both to the duration of the stimulus and
the perceived intensity. The intensity of a sensation is mediated by two
mechanisms: Stimuli of increasing intensity evoke progressively more activity
in a receptor, and recruit additional receptors with higher activation
Signal-to-noise ratio (S/N).
The ratio in decibels between signal and noise. An audio component with
a high signal-to-noise ratio has relatively little background noise accompanying
the signal; a component with a low signal-to-noise ratio is noisy. 
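As a rough illustration of how the ratio is expressed in decibels, the sketch below assumes the signal and noise are available either as powers or as RMS amplitudes (voltage or pressure); the function names are hypothetical:

```python
import math

def snr_db_from_power(signal_power, noise_power):
    """Signal-to-noise ratio in dB from two power values."""
    return 10 * math.log10(signal_power / noise_power)

def snr_db_from_amplitude(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from two RMS amplitudes.

    Power is proportional to amplitude squared, hence the factor of 20.
    """
    return 20 * math.log10(signal_rms / noise_rms)
```

For example, a signal whose RMS voltage is 1000 times that of the noise has an S/N of 60 dB.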
Sine wave.
The simplest form of periodic wave motion, expressed by the equation y
= sin x, where x is the phase angle in degrees and y is the instantaneous
voltage or sound pressure.
All other periodic wave forms can be created by adding (mixing) a number of sine waves.
The wave form of a "pure tone" is a sine wave. 
Sinusoidal.
Having the shape of a sine wave. 
Sone.
In acoustics, a unit of loudness. Defined as the loudness of a 1000 Hz
tone 40 dB above threshold. A millisone is one-thousandth of a
sone and is often called the loudness unit. 
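A commonly used rule of thumb, which is an assumption added here and not stated in this glossary, is that loudness in sones doubles for every 10-phon increase in loudness level above 40 phons (where the phon value equals the dB level of an equally loud 1000 Hz tone). A minimal sketch under that assumption:

```python
import math

def sones_from_phons(loudness_level_phons):
    """Loudness in sones, assuming doubling per 10 phons (1 sone at 40 phons)."""
    return 2 ** ((loudness_level_phons - 40) / 10)

def phons_from_sones(sones):
    """Inverse of the rule above."""
    return 40 + 10 * math.log2(sones)

def millisones_from_sones(sones):
    """One sone is 1000 millisones (loudness units)."""
    return sones * 1000
```

The rule is only an approximation and is generally not applied below about 40 phons.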
Sound.
Energy that is transmitted by pressure waves in air or other materials and
is the objective cause of the sensation of hearing. Longitudinal vibrations
in a medium in the frequency range 20 Hz to 20 kHz. 
Sound Transmission Class (STC).
In acoustics, a single number rating for describing sound transmission
loss of a wall or partition. 
In concert hall acoustics, a hall is said to be "spacious" if
the music performed in it appears to the listener to emanate from a source
wider than the visual width of the actual source, and if the listener
is noticeably enveloped by the reverberant sound. The former attribute
is often referred to as apparent source width
(ASW); the latter attribute is often referred to as listener
envelopment (LEV).
Spectrum.
A description of the frequency content of a sound waveform, usually presented
as a graph with frequency on the abscissa (x axis) and amplitude on the
ordinate (y axis). A pure tone would have a single vertical line at the
appropriate frequency with a height indicating its amplitude. A complex
sound (see complex tone) would have several
such lines, indicating the multiple components. Drawing a curve through
the tops of the lines would describe the spectral envelope. A spectrogram
is another representation of a spectrum in which the time component is
reintroduced: time is represented on the abscissa, frequency on the ordinate,
and amplitude is coded as the darkness of the trace at a given frequency
and time. In an auditory neural spectrogram, instead of a continuous
signal, the probability of occurrence of nerve spikes at a given moment
in time is represented. The frequency axis is replaced by a frequency-specific
auditory nerve channel (see basilar membrane).
A third type of spectral representation called a time-frequency perspective
plot is drawn in three dimensions, with time along the x axis, amplitude
along the y axis, and frequency along the z axis. 
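To make the "single vertical line" for a pure tone concrete, the sketch below computes an amplitude spectrum with a plain discrete Fourier transform. It is an illustration only: the function name, the 64 Hz sample rate and the 8 Hz test tone are assumptions chosen so the tone falls exactly on one analysis frequency.

```python
import cmath
import math

def amplitude_spectrum(samples, sample_rate_hz):
    """Return (frequency_hz, amplitude) pairs from 0 Hz up to half the sample rate."""
    n = len(samples)
    lines = []
    for k in range(n // 2 + 1):
        coefficient = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        # Interior bins share their energy with a mirrored negative-frequency
        # bin, so they are doubled to recover the amplitude of the sinusoid.
        scale = 1 if k == 0 or 2 * k == n else 2
        lines.append((k * sample_rate_hz / n, scale * abs(coefficient) / n))
    return lines

# Hypothetical test signal: a pure tone of amplitude 0.5 at 8 Hz,
# sampled at 64 Hz for one second.
tone = [0.5 * math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
spectrum = amplitude_spectrum(tone, 64)
```

Plotting the pairs in `spectrum` with frequency on the abscissa gives one line at 8 Hz and essentially zero everywhere else; a complex tone would produce several lines.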
A mirror-like reflection of sound from a flat surface; reflections that
do not spread out. 
A measure of sound clarity that indicates the ease of understanding speech.
It is a complex function of psychoacoustics, signal-to-noise ratio of
the sound source, and direct-to-reverberant energy within the listening
Speed of sound.
In air, approximately 1130 feet per second at 20 degrees Centigrade. 
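The figure depends on temperature. A minimal sketch using the common ideal-gas approximation for dry air (the constant 331.3 m/s at 0 degrees and the function names are assumptions added for illustration, not values from this glossary):

```python
import math

FEET_PER_METER = 1 / 0.3048

def speed_of_sound_m_per_s(temperature_c):
    """Approximate speed of sound in dry air, in meters per second."""
    return 331.3 * math.sqrt(1 + temperature_c / 273.15)

def speed_of_sound_ft_per_s(temperature_c):
    """Same approximation, converted to feet per second."""
    return speed_of_sound_m_per_s(temperature_c) * FEET_PER_METER
```

At 20 degrees this approximation gives a value within about half a percent of the 1130 feet per second quoted above.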
Square wave.
A square wave is one in which there are only two values of the displacement
of the wave from the neutral position, a positive displacement and an
equally large negative displacement. The wave moves instantaneously from
one state to the other and remains equally long in each state. Its spectrum
contains odd harmonics only, whose amplitudes are inversely proportional
to the harmonic number. 
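The odd-harmonic structure can be checked by adding sine waves together. The sketch below sums odd harmonics whose amplitudes fall as 1/n, following the standard Fourier series of a unit square wave; the function name and the 4/pi scaling to a peak of 1 are illustrative assumptions:

```python
import math

def square_wave_partial_sum(phase_radians, highest_harmonic):
    """Sum the odd harmonics 1, 3, 5, ... up to highest_harmonic.

    Harmonic n has amplitude 1/n; the 4/pi factor scales the result so
    the finished wave sits at +1 and -1.
    """
    total = 0.0
    for n in range(1, highest_harmonic + 1, 2):
        total += math.sin(n * phase_radians) / n
    return 4 / math.pi * total

# Middle of the positive half-cycle and middle of the negative half-cycle.
positive_plateau = square_wave_partial_sum(math.pi / 2, 999)
negative_plateau = square_wave_partial_sum(3 * math.pi / 2, 999)
```

With only a few harmonics the sum is visibly wavy; as more are added it approaches the two flat states described in the entry.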
In concert hall acoustics, the measure of the degree of support that the
hall, including the walls and ceiling of the hall and of the enclosure
immediately surrounding the players, gives to the players on stage. It
is the difference, in decibels, between the impulse sound energy from
an omnidirectional sound source that arrives at a player's position within
the first 10 milliseconds, measured at a distance of 1 meter from the
sound source, and that which arrives in the time interval between 20 and
100 milliseconds at the same position. The sound arriving in the later
interval has been reflected from one or more surfaces surrounding the
player's position on the stage; the support value is its strength minus
the strength of the sound in the first 10 milliseconds. The measurement
is made with the chairs, music stands and percussion in place, except that
those near the source and receiver are set aside. The measurements are made
at several positions and the data are averaged. 
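A minimal sketch of the calculation described in this entry, assuming a sampled impulse response measured 1 meter from the source, with time zero at the arrival of the direct sound. The function names and the list-of-samples representation are hypothetical:

```python
import math

def window_energy(impulse_response, sample_rate_hz, start_ms, end_ms):
    """Sum of squared samples between two times (in milliseconds)."""
    start = int(round(start_ms * sample_rate_hz / 1000))
    end = int(round(end_ms * sample_rate_hz / 1000))
    return sum(sample * sample for sample in impulse_response[start:end])

def support_db(impulse_response, sample_rate_hz):
    """Energy arriving from 20 to 100 ms relative to the first 10 ms, in dB."""
    early = window_energy(impulse_response, sample_rate_hz, 0, 10)
    late = window_energy(impulse_response, sample_rate_hz, 20, 100)
    return 10 * math.log10(late / early)

def averaged_support_db(values_db):
    """Average over several measurement positions, as the entry describes."""
    return sum(values_db) / len(values_db)
```

A response with a direct sound of amplitude 1 and a single reflection of amplitude 0.1 arriving at 50 ms would give a value of -20 dB. Real measurements are normally band-filtered first; that step is omitted here.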
Standing wave.
In acoustics, an apparently stationary waveform created by multiple reflections
between opposite room surfaces. At certain points along the standing wave,
the direct and reflected waves cancel, and at other points the waves add
together or reinforce each other. These are sometimes called room
Stapedius.
The smallest muscle in the body, located in the middle
ear. Contraction of the stapedius pulls the stapes, altering
the mechanical efficiency of the ossicular chain.
STC.
(See Sound Transmission Class.)
Stereopsis. The most important mechanism
for assessing depth in human vision. First enunciated in 1838 by Sir Charles
Wheatstone (who also invented the "Wheatstone bridge" in electricity),
stereopsis depends on the slight differences in the two pictures projected
on the retinas. (See also parallax.)
(See analytic listening.)
t1 (initial time-delay gap or ITDG).
In concert hall acoustics, the time interval, measured in milliseconds,
between the arrival at a seat in the hall of the direct sound from a source
on stage and the arrival of the first significant reflection. It corresponds
with the subjective impression of "intimacy." 
TDS.
Abbreviation for Time Delay Spectrometry.
A computer-based platform for measuring audio devices and acoustic environments,
manufactured by Techron and more recently Goldline under license from
the Jet Propulsion Laboratory, Pasadena, California. See also Time
Delay Spectrometry. 
Tempo.
The speed of occurrence of the beats for a given metric structure. In
a musical score, the tempo is specified in terms of the number of metric
units per minute, for example, quarter-note = 60, in which the time value
of each quarter-note is 1 second. The inverse of tempo, the time between
beats, is called the beat period. 
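The inverse relationship between tempo and beat period is simple arithmetic. A minimal sketch, with hypothetical function names:

```python
def beat_period_seconds(beats_per_minute):
    """Time between beats for a tempo given in metric units per minute."""
    return 60.0 / beats_per_minute

def tempo_bpm(beat_period_s):
    """Tempo in beats per minute for a given beat period in seconds."""
    return 60.0 / beat_period_s
```

At quarter-note = 60 the beat period is 1 second, as in the entry; at quarter-note = 120 it is half a second.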
An adjective meaning "pertaining to time." 
The degree to which the auditory system can resolve, or separately distinguish,
events separated by extremely brief time periods. 
Temporal coherence boundary.
Defines the threshold for hearing a repeating two-tone sequence as composed
of a single auditory stream across a range of frequency differences between
the tones and rates of tone presentation when the listener is trying to
hear a single stream. Above the boundary, the sequence is always heard
as two streams. Below it, the sequence may be heard as a single stream.
This boundary is contrasted with the fission boundary, which defines
the threshold for hearing the same kind of repeating sequences when the
listener is trying to hear two separate streams. Above the fission boundary,
the sequence may be heard as two streams, but below it the sequence is
always heard as a single stream. 
A region of the lateral part of cortex (just center of and slightly behind
the ears) concerned with audition and containing primary auditory cortex
(i.e., the first cortical area to which auditory signals are relayed,
also known under the name of Heschl's gyri).
(See rhythm pattern.)
Tensor tympani.
Like the stapedius, a small muscle in the
middle ear. Contraction of the muscle increases
the stiffness of, and thus lessens the amount of energy conducted by,
the ossicular chain. Though to a significantly lesser extent than the
stapedius, the tensor tympani is involved in the acoustic reflex, which
is the automatic, protective response of the intratympanic muscles to
intense sound stimulation.
In TDS measurements, the 3-D display shows
the change in magnitude/frequency response versus time for a number of
individual TDS sweeps. Each sweep is offset in time by a constant amount,
and on the screen the sweeps form a three-dimensional surface display sometimes called
a "waterfall" plot. The three dimensions are time, energy and
THX.
A division of Lucasfilm, San Rafael, California. Also, a set of specifications
for the enhancement of sound playback in the residential environment.
Timbre.
Also referred to as sound quality or sound color. The classic negative
definition of timbre is: the perceptual attribute of sound that allows
a listener to distinguish among sounds that are otherwise equivalent with
respect to pitch, loudness, and subjective duration. Contemporary research
has begun to decompose the attribute into several perceptual dimensions
of a temporal, spectral, or spectro-temporal nature. 
Time Delay Spectrometry (TDS).
A method, conceived by Richard Heyser, that permits a spectrum
that has been delayed to be measured with the signal delay removed. TDS
measures in the frequency domain, then transforms the results mathematically
for interpretation in the time, energy or frequency domains. The principal
advantages of TDS measurements are superior noise and distortion rejection
properties, fast data gathering capability, and the ability to make acoustical
measurements under actual use situations. TDS measurements include the
frequency response, phase response, and time response data associated
with other techniques, plus energy-time curves,
polar energy-time curves, and energy-time-frequency
curves (3-D displays). 
Tinnitus.
A sensation of noise, frequently of ringing, in the ears. Tinnitus
aurium refers to a subjective sensation of noises in the ears. Objective
tinnitus refers to abnormal or pathological sounds originating within
the body, in the region of the ear, which are audible to people other than the
subject.
A set of musical rules that characterize Western music since the Baroque
(17th century), Classical, and Romantic styles. This system is still quite
prominent in the large majority of traditional and popular musics of the
Western world. Other musical systems in use in the West do not conform
to these rules, and are consequently called non-tonal or atonal. 
Tonic.
The principal note or chord of a key in the
Western tonal system. 
Transients.
Instabilities present in the oscillation pattern of a physical object
that is set into vibration before the object settles into a stable oscillation.
Also called attack transients (see amplitude envelope). Similar oscillatory instabilities
("legato transients") can be observed when the object changes
state suddenly as occurs when a musical instrument changes pitch (by changing
fingering on a woodwind instrument, pushing on a valve or piston in a
brass instrument, pressing down on a string with a finger, or lifting
one up on a string instrument). Transients are often characterized by
a noisy or inharmonic spectrum.  (See also harmonicity,
In the home entertainment context, pertaining to the auditory, visual
and haptic sensory modalities. 
Tuning system, musical.
Tympanic membrane (eardrum).
A thin, translucent, elliptically-shaped and slightly concave membrane
at the end of the meatus. The eardrum is made
up of four layers. The outermost layer is continuous with the skin of
the meatus, and the innermost layer is continuous with the mucous membrane
of the middle ear. Of the two inner layers,
the outer layer is composed of radial fibers, while the inner layer is
composed of non-radial fibers. The tympanic membrane attaches to the malleus
(hammer) of the middle ear.
Virtual auditory environment.
A perceived auditory environment which has
been manipulated so that it does not correspond to the immediate physical
environment. A trivial example is the use of headphones, which typically
foster the sense of sound originating within the head, while the physical
situation contains two sound sources located on either side of the head.
The phenomenon in which visual perception dominates when visual cues and
other sensory cues--auditory, proprioceptive,
haptic, etc.--are in direct conflict. In audio
design, the effect allows a loudspeaker to be placed at some distance
away from a video display without the audience perceiving the disparity
in location between the visual event generated on the screen, and the
sonic event generated in the distant speaker. There are limits to vision's
tendency to "overpower" the other senses: In the case of audio
design, the limits can be usefully defined in terms of angular disparity,
beyond which the audience "hears" the sonic event as being spatially
distinct from, and thus conflicting with, the locus of the visual event.
Warmth.
In concert hall acoustics, warmth is defined as liveness of the bass,
or fullness of the tone between 75 and 350 Hz, relative to that of the
mid-frequency tones (350 to 1,400 Hz). Musicians sometimes describe as
"dark" a hall that has too strong a bass, or whose high frequencies
are greatly attenuated. 
Weber's law.
Discovered by Ernst Heinrich Weber in 1834. States that the smallest
detectable change (jnd) in intensity is a constant
fraction of the level of stimulation. Gustav Fechner extended Weber's law
into a psychophysical law in which the magnitude of sensation (S) is proportional
to the logarithm of the magnitude of stimulation (I), or
S = k log I. A great deal of psychophysical research has attempted to
establish the Weber-Fechner law for sensory dimensions other than intensity,
e.g., frequency and duration in audition. While the empirical data conform
fairly well to the law over a certain range of values for each dimension,
they can differ substantially at extremes of the range of perceptible
 McAdams, S. & E. Bigand, eds. (1993). Thinking in Sound: The
Cognitive Psychology of Human Audition. Oxford: Clarendon Press.
 Beranek, L. (1996). Concert and Opera Halls: How They Sound.
Woodbury, NY: Acoustical Society of America.
 Keith Yates
 Bregman, A. S. (1990). Auditory Scene Analysis. Cambridge,
MA: MIT Press.
 Gelfand, S. (1998). Hearing: An Introduction to Psychological
and Physiological Acoustics, 3rd ed. New York: Marcel Dekker.
 Hubel, D. (1988). Eye, Brain and Vision. New York: Scientific
 Dorland's Illustrated Medical Dictionary, 25th ed. Philadelphia:
 Kandel, E., Schwartz, J. & Jessell, T. (1991). Principles
of Neural Science, 3rd ed. Norwalk, CT: Appleton & Lange.
Copyright (c) 1998-2001 Keith Yates Design Group, Inc. All rights reserved.