Reprinted from MCS Review, Vol.6, No.1, Summer 1984.
The title, as well as the philosophy, of this contribution is taken with the sincerest flattery from Larry Clifton's Spring 1984 editorial in which he discusses what is so special about surround sound, and in amazingly few words analyzes so many of the reasons for the old resistance to taking the step from mono to stereo, and now from stereo to surround sound. He is surely right that it comes from people who are content with sound 'out there', the whole picture being complicated by keeping up with the Vanderbilts, and by those who spend all their time listening to equipment and little of it listening to music.
What then is so special about surround sound? First we have to be sure what we are talking about, and here it is not sufficient to call things what you will (or even how you like it), because different words and phrases do actually mean different things.
Surround sound is properly the generic name for all systems which extend the sound stage from the stereo restriction of a frontal sector (usually not exceeding 60°) so as to surround the listener with possible directions from which sound may be heard. There are two stages, first horizontal surround (for which the specific name is pantophony), and full spherical surround (known as periphony), which provides information about height as well as horizontal direction.
Multi-channel surround means exactly what the words say, namely that there is a multiplicity of audio channels, usually taken to mean more than two, between the source and the listener. Thus the famous Philadelphia-Washington relay of 1933 was multi-channel, but certainly not surround since reproduction was confined to a proscenium stage. Nor is multi-channel necessary in order to achieve surround reproduction; obviously it cannot be done with only one channel, but two are quite sufficient (see the later remarks on UHJ).
'Quadraphonics' is a name of dubious etymological ancestry referring to a particular set of early attempts at surround sound. Unfortunately, it took a wrong trail and became bushed. It assumed that surround sound had to come from exactly four loudspeakers, and, by confusing loudspeakers with channels, concluded that 'four channels' were needed. Since, in general and particularly in the stereo media of recording and broadcasting in widespread use, four channels are not available, a method was invented of pretending to have four channels when one did not.
These mythical extra channels were said to be 'matrixed', which of course makes no sense. 'Matrix' simply means to mix, so one can mix or matrix signals, but not channels. Channels can be multiplexed as is done to realize the two channels of FM stereo broadcasting, and as was done in the Nippon-Columbia UD-4 and JVC CD-4 carrier-disc records, but that is a separate issue. Finally it was pretended that one could achieve the mathematical impossibility of inverting a 4 x 2 matrix and thus recover 'the original channels' (actually signals, and not the original signals, but derivatives).
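The impossibility can be made concrete with a small numerical sketch. The mixing coefficients below are purely hypothetical and are not those of any commercial 'quadraphonic' system; the point is only that any linear mix of four signals into two discards information, so that two different sets of four signals can produce identical channel signals, and no decoder can then tell them apart.

```python
# Hypothetical mixing coefficients (an assumption for illustration only):
# two transmitted channel signals formed from four source signals.
ENCODE = [
    [1.0, 0.5, 0.5, 0.0],
    [0.0, 0.5, -0.5, 1.0],
]


def encode(four_signals):
    """Mix four source samples into two channel samples."""
    return [sum(c * s for c, s in zip(row, four_signals)) for row in ENCODE]


first = [1.0, 0.0, 0.0, 0.0]
second = [0.0, 1.0, 1.0, 0.0]  # a different set of four source samples

print(encode(first))   # [1.0, 0.0]
print(encode(second))  # [1.0, 0.0]
```

Both inputs arrive as the same pair of channel signals, so whatever the decoder does, it must give the same output for both. Changing the coefficients changes which inputs collide, but some always will.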
In formal logic it is proved that from any false premise any false conclusion can be derived; the standard exercise to illustrate this is of the form 'given three equals five, prove that the Dragon Lady is President of the United States.' [1] The 'four channel' fallacy thus engendered a chain of consequent errors. Despite these technical shortcomings, some so-called 'quadraphonic' material has merit, so that it is worthwhile for the enthusiast to possess this material and means of replaying it. Nevertheless this approach to surround reproduction failed in the marketplace at large, with heavy losses to the audio industry which have left the industry with a very negative attitude to surround sound in general.
This is bad enough, but perhaps the worst misdemeanor of 'quadraphonics' has been to get everyone thoroughly muddled. Perhaps the Philadelphia-Washington relay, excellent though it was in its day, also has something to answer for as the ancestor of some of the misconceptions.
When one has a dense array of microphones on the stage at Philadelphia coupled to corresponding loudspeakers on a stage in Washington, there is indeed a one-to-one correspondence between microphones, signals, channels and speakers, and little harm in using these names interchangeably, as 'quadraphonics' did; but this naive identification is not good enough for the demands of a practicable surround-sound technology. It has already been noted that the Philadelphia-Washington relay was not surround sound, and in essence it was a spatial rather than a directional system; the audience in Washington was presented with the sounds crossing the Philadelphia proscenium in approximately corresponding positions on the Washington stage, and any directional effect was a result of the geometry of the receiving hall. It was (to repeat) essentially positional not directional information that was conveyed, and this is not an approach that can be extended to domestic surround reproduction.
In order to make progress, it is necessary first to see that there is no necessary one-to-one correspondence between microphone outputs, channels and loudspeaker-feed signals, or any pair of these. Moreover we do not (whatever some comedians would have us believe) live in a square world. What needs to be encoded is not a set of 'four channels' (or even signals) arbitrarily associated with corner positions, but direction itself.
Thus the modern surround-sound technology uses what is mathematically called kernel encoding of directions. This is not difficult, even using only two channels. Indeed, since a pair of channel-signals may differ both in amplitude and in phase, and these differences may be associated mathematically with horizontal and vertical angles, two channels are all that are needed to convey full spherical surround information.
Two channels would indeed suffice for full with-height periphonic surround if at each instant there were only one source of sound; with the reservation that it would require what is properly called a signal-actuated parametrically (SAP) variable decoder (sometimes called 'logic' or 'variomatrix' in 'quadraphonic' parlance). This is a big reservation. A fixed linear decoder has the protection of the additive property governing all linear systems; this ensures that if such a decoder works for any sound source acting alone, it will work equally well when they all sound together. Variable decoders do not enjoy the protection of this theorem, and, although this lack can be mitigated by stratagems such as band-splitting, two-channel periphony is hardly practicable, whereas two-channel horizontal surround is.
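The additive property can be checked numerically. In the sketch below the decoding coefficients and the gain-switching rule are both hypothetical, chosen only to contrast a fixed linear decoder with a toy signal-actuated one; neither represents an actual commercial or ambisonic decoder.

```python
# Hypothetical fixed decoding coefficients (an assumption for illustration):
# four loudspeaker feeds formed from two channel signals.
DECODE = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, -0.5],
    [-0.5, 0.5],
]


def fixed_decode(pair):
    """Fixed linear decoder: the same coefficients for every input."""
    return [row[0] * pair[0] + row[1] * pair[1] for row in DECODE]


def variable_decode(pair):
    """Toy signal-actuated decoder: gains switch with the louder channel."""
    if abs(pair[0]) > abs(pair[1]):
        gains = [2.0, 1.0, 2.0, 1.0]
    else:
        gains = [1.0, 2.0, 1.0, 2.0]
    return [g * f for g, f in zip(gains, fixed_decode(pair))]


def add(u, v):
    """Sample-by-sample sum of two signal lists."""
    return [a + b for a, b in zip(u, v)]


source_a = [1.0, 0.0]  # channel signals from one source alone
source_b = [0.0, 2.0]  # channel signals from another source alone
both = add(source_a, source_b)

fixed_ok = fixed_decode(both) == add(fixed_decode(source_a), fixed_decode(source_b))
variable_ok = variable_decode(both) == add(variable_decode(source_a), variable_decode(source_b))
print(fixed_ok, variable_ok)  # True False
```

The fixed decoder treats the two sources together exactly as it treats each alone. The variable decoder does not, because the louder source sets gains that are then applied to the quieter one as well.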
The possibilities of surround sound therefore begin with two-channel horizontal surround, and the first step in achieving this is the acceptance that it is impossible (except perhaps at the lowest frequencies) to reproduce the original physical soundfield. The aim has to be restricted to giving the listener the subjective impression of sound coming from the intended direction. This essentially involves the mechanisms of hearing, and particularly the ways in which the ear locates sound direction.
The decoder has therefore to be psychoacoustically designed. Its job is to take the directional information contained in the signals it receives and to generate loudspeaker-feed signals which, after amplification, will cause the loudspeakers to radiate sounds which combine to deceive the ear into hearing sounds in the right places.
Notice here two important differences from the approach and aim of earlier surround-sound attempts. First there is a carefully tailored inter-relation between the sounds coming from the speakers, so that these sounds mix and cooperate to give the listener the desired illusion of direction. When this works well, the individual speakers become 'invisible' (or perhaps one should call it inaudible) as discrete sources of sound; the listener wishes, after all, to hear the directional character of what happened in the studio or concert hall, not to be confined to hearing nothing more interesting than where the loudspeakers happened to be placed in his own room. This is all in direct contrast to the 'quadraphonic' emphasis on 'separation' which tends to make the location of the speakers obtrusive.
The second major distinction is recognition that correct speaker-feeds depend on the size and shape of the listener's loudspeaker layout, and of course on the number of loudspeakers used (there is no restriction to four speakers, although it can be proved that this is the minimum practicable number). Thus there is no question of pretending that the decoder 'recovers' or 'reconstructs' non-existent signals. Correct loudspeaker-feed signals cannot exist at any earlier point in the chain than the listener's own room, since only here is the necessary information about loudspeaker layout available. To put it another way, the decoder receives directional information and interprets it according to the layout parameters to which it has been set; it is not supposed to recreate what has never existed, and which logically cannot previously have existed for want of necessary information.
Ambisonics conforms accurately to these modern requirements for surround reproduction, but does not otherwise restrict the way in which it can be used according to the taste and circumstances of the user. It is thus more like a marine chart than a course to steer; it shows the features of the ocean of possibilities, but does not dictate which port should be visited. It is therefore correct to refer to ambisonics as a technology, within which a number of specific systems can be realized according to need.
For the initial step of encoding direction, the ambisonic format recommended for professional purposes is 'B-format', in which the signals represent respectively the front-back, omni-directional, left-right and up-down components; if height information is not required, the up-down signal may be omitted. For consumer use, the UHJ formats form a compatible series beginning with two-channel, known strictly as BHJ, to which further signals can be added successively when the extra channels are available, thus enabling full advantage to be taken of modern multi-channel media, whether analog or digital.
Figure 1. The Ambisonic UHJ hierarchical system of encoding and decoding directional sound.
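As a rough illustration of what encoding direction itself means, the following sketch encodes a single sample arriving from a given direction into the four B-format components. The names W, X, Y and Z (omni-directional, front-back, left-right and up-down respectively), the measurement of azimuth anticlockwise from front, and the 1/√2 weighting of W are the conventional B-format choices and are assumptions here, since the text above does not spell them out.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one sample from a given direction into (W, X, Y, Z).

    Azimuth is assumed to be measured anticlockwise from front,
    elevation upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                # omni-directional component
    x = sample * math.cos(az) * math.cos(el)   # front-back component
    y = sample * math.sin(az) * math.cos(el)   # left-right component
    z = sample * math.sin(el)                  # up-down component
    return w, x, y, z


front = encode_b_format(1.0, 0.0)
left = encode_b_format(1.0, 90.0)
above = encode_b_format(1.0, 0.0, 90.0)
print(front)
print(left)
print(above)
```

Nothing in these signals refers to loudspeakers or to corner positions; for horizontal-only work the fourth component is simply dropped.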
Since the ear uses predominantly different mechanisms of location in different frequency ranges, ambisonic decoders are frequency-dependent in order to give a uniform and consistent subjective impression over the audio range. They have of course simple means of being set to the size and shape of the loudspeaker layout, and numbers of loudspeakers other than four can be accommodated.
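The dependence of loudspeaker feeds on layout can be sketched with a deliberately naive horizontal decoder. This is not the psychoacoustically designed, frequency-dependent decoder described above: it is a single frequency-independent projection, valid only for speakers equally spaced on a circle, and the function name and formula are illustrative assumptions. W, X and Y are taken to be the conventional labels for the omni-directional, front-back and left-right B-format signals, with W weighted by 1/√2.

```python
import math


def speaker_feeds(w, x, y, speaker_azimuths_deg):
    """Naive projection decode for a regular horizontal layout.

    Each feed combines the omni-directional signal with the directional
    signals projected onto that speaker's azimuth (anticlockwise from front).
    """
    n = len(speaker_azimuths_deg)
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = math.radians(az_deg)
        directional = x * math.cos(az) + y * math.sin(az)
        feeds.append((math.sqrt(2.0) * w + 2.0 * directional) / n)
    return feeds


# One and the same directional signal: unit sound from straight ahead.
w, x, y = 1.0 / math.sqrt(2.0), 1.0, 0.0

square = [45.0, 135.0, 225.0, 315.0]
hexagon = [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]

square_feeds = speaker_feeds(w, x, y, square)
hexagon_feeds = speaker_feeds(w, x, y, hexagon)
print(square_feeds)
print(hexagon_feeds)
```

The same three transmitted signals yield four feeds for the square and six different feeds for the hexagon, which is why the feeds cannot be fixed anywhere upstream of the listener's room.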
William Sommerwerck's review of the Minim AD10 Ambisonic Decoder (Spring 1984) is first and foremost a tribute to Minim's excellent product, but is also an indication of the extent to which the underlying ambisonic technology succeeds in fulfilling the objectives that have been described. It is therefore encouraging that the review emphasizes the stability of location, the realism of sound, and the coherence between direct and ambient sounds.
This of course applies primarily to the decoding of UHJ material. There are at present some 200 such recordings commercially issued. In one sense this is quite a lot, but of course only a small fraction of the whole record output. It is therefore relevant to ask whether ambisonic decoding can be used to enhance playback of stereo material, and indeed this is possible. The result naturally falls short of playback of UHJ material, and results must inevitably be variable because there are no precise engineering specifications for what we call 'stereo', which is a way of encoding frontal directional information defined largely empirically, and therefore varying somewhat from one studio and one recording to another. Nevertheless for the majority of material the effect can be pleasing, with enhanced sense of realism and involvement.
The review praises the AD-10's stereo decode mode for the naturalness of the effect, saying that the direct and ambient sounds form a seamless whole, with no sense of the ambience being tacked on. It adds that it does the best job so far of any non-delay device in extracting ambience; and it is indeed correct that ambisonic stereo-decode brings out ambience present in the material being reproduced. The qualification about delay devices seems a little strange, since, however pleasing such devices may be, their action is inherently to tack on a reverberant effect of their own rather than to extract it from the program material; much the same is true of artificial reverberation added in the studio.
Ambisonic decoders, including the AD-10, can offer two further facilities about which there has been some puzzlement. The names originally given to these were 'dominance' and 'preference'. This terminology may not be ideal, and has incurred disapprobation from some journalists, but it does at least have the merit of being accurate, which in the end is the best way of avoiding confusion. Efforts to find alternative names seem so far to have been counter-productive, and evidently we need to keep on searching. It is probably most helpful to begin the explanation by stating what these facilities actually do.
In the imperfect world of technology, compromise is continually needed; for example, between quiet and loud passages in the setting of recording level. The basic design of two-channel ambisonic decoders is carefully balanced so as to give good all-around performance, but it may sometimes be wished to give preference to one thing over another. In particular, for many kinds of orchestral, choral or chamber music, a slightly more pleasing result may be secured by setting the parameters of the decoder to favor direct frontal sounds, since indirect sound from the rear and sides carries less information. This can be done by means of the 'front preference' control (to give it its original name, which in context does express its function in a natural way).
This is the control Minim has chosen to call 'focus', a term that does carry a correct implication of sharpness but misleads by suggesting that a particular distance, rather than a particular direction, is favored. It is odd that the reviewer thought the effect 'phasey', since objectively phase differences are reduced in the direction to which preference is given. Perhaps something was not quite right, but the preference control does seem to provoke personal preferences, and in any case its use is optional. The ambisonic technology also provides for preference to be exercised in any direction, and automatically in accordance with the dominant signal (with or without band-splitting) at each moment, giving the ambisonic implementation of SAP decoding, known for obvious reasons as Variable Directional Preference (VDP). VDP decoders are likely to be useful mainly for program material such as drama, special effects and rock.
The control originally called 'dominance' produces an effect similar to that of moving the recording microphone either forward so as to be nearer the front sound stage, or else further back away from it. (Sideways dominance is also possible, but seldom required.) Thus when front-dominance is selected, frontal sounds dominate both by becoming louder and by spreading out and becoming apparently larger. The name 'zoom' has been suggested, but this is positively misleading because, in photography, zoom, as is only too well known, magnifies the image without being able to make a matching adjustment in perspective, so that the apparent perspective is distorted, as when athletes on television appear to be pounding down the track without getting any further away from the high-jumper waiting in the background. It is just such falsification that dominance is designed to avoid by its simultaneous adjustment of loudness and spread. This is presumably the facility Minim calls 'position', but surely one expects a control so labeled to alter the position of something.
So, to return to the original question, what is so special about surround sound? The question may perhaps be answered in two ways. Objectively, it is one more step in the now century-long struggle to overcome distortion in reproduced sound, in this case directional distortion. Subjectively, it is a further step in enabling the listener to become more involved in the program material and to hear more accurately what is happening in the performance. Writing in the British publication Studio Sound (June 1984, page 70), John Whiting of October Sound places mono, stereo, horizontal surround, periphony and real live sound in a list, each member of which becomes unsatisfactory when compared with the next. This is a view the writer shares, but it is not perhaps for those who prefer their sound 'out there'; perhaps such people have had few opportunities to come to appreciate sound that is not 'out there', for are not many 'stereo' recordings really a multi-mono mix from spot mono microphones?
As a postscript not specific to surround sound, but relevant to it, Mr. Sommerwerck mentions that records 'spit' much less when thoroughly cleaned, fully in accord with the present writer's experience. Almost all the nasties that have not been embedded in the disc right in the pressing plant respond to putting the record under the tap and scrubbing it (there is some art hidden behind this bald statement, but in a literal sense it is true). Also, is there really evidence that 'static' is the cause of the ills blamed on it?
[1] No marks will be awarded for proving that the Dragon Lady is the Prime Minister of Great Britain.
Professor Peter Fellgett is Head of the Department of Cybernetics at the University of Reading in Great Britain. In the early 1970s, Michael Gerzon and he developed the concepts and mathematics which describe the Ambisonic surround-sound system.
Copyright © 1984, 2000 by Laurence A. Clifton
Last updated: April 9, 2006