Caruso on Stage

Ralph Glasgal, http://www.ambiophonics.org

[August 2001. Originally appeared in La Folia 3:4.]

I have lately been revisiting my small, but now growing, collection of very old (circa 1916) acoustic Victor Red Seal vocal 78s of Caruso, Galli-Curci, Melba, Schumann-Heink, and Tetrazzini. These particular pressings are all single-sided and mostly over eighty years old. Played on an amazingly well-preserved Orthophonic Credenza Victrola, manufactured by the Victor Talking Machine Co. of Camden, N. J. in 1926, they are astonishingly and unexpectedly listenable, at least for this listener, whose ears are more accustomed to stereo reproduction or live concerts. The Credenza enclosed a folded six-foot logarithmic horn newly invented and licensed to Victor by Bell Labs. This all-acoustic playback machine has none of the coloration or extreme resonances of most other horned players and allows one to appreciate what these singers of the golden age really sounded like.

Some 60 million of these classical Red Seals were sold throughout the world before 1923. In 1921 alone, some 55,000,000 Victor 78 RPM discs in both 10″ and 12″ sizes were sold, including classical, popular, and ethnic music. Thus, a lot of other music lovers must also have thought they sounded pretty realistic despite the imperfections obvious to us, such as poor signal-to-noise ratio, limited frequency response, and lack of stereo. As a private researcher into psychoacoustic phenomena and a devoted, but practical, high-end audiophile, I was inspired to take a closer look at what makes these ancient acoustic vocal recordings so listenable even by today’s standards, and then to see what I could do with their present-day digital CD reincarnations by applying psychoacoustically correct, very realistic sound reproduction methods. But first, let us consider how these recordings can sound as good as they do when played on a 70-year-old talking machine.

The Ideal Recording Chain

Many, if not most, audiophiles who feel strongly that LPs outperform CDs would have no difficulty subscribing to the following description of the ideal recording/reproduction system:

  • 100% analog recording and reproduction chain.
  • Pinpoint imaging.
  • As few active elements, tone controls, equalizers, or cables in the signal path as possible.
  • Playback volume automatically set identical to the recording site level.
  • 100-year lifetime for the storage medium.

The acoustic recording system, brought to an extraordinary level of performance (considering the resources they had) by Calvin Child, Caruso himself, and the Victor group in Camden, does meet all the audiophile criteria listed above. This leads me to paraphrase the old saw that audiophiles should be careful what they wish for because they just might get it. Despite the analog purity that is inherent in the old all-acoustic recording process, such recordings fall far short of the audiophile dream for the usual mundane reasons such as noise, distortion, resonances, flutter, etc. One does wonder, however, what purely acoustic recordings made today using modern materials, logarithmic horns and diamond/titanium technology would sound like.

The Psychoacoustics of the Caruso Record

Let us consider what it is that acoustic-era recordings offer that is uniquely and acoustically correct, at least where the vocal part of the sound is concerned. When Caruso stepped up to the horn and sang directly into it, he effectively eliminated any possibility of inadvertently inscribing any significant early reflections or reverberant tails stimulated by his voice in the Victor work space. This is seldom the case with microphoned electrical recordings. The advantage of having such a dry recording of a solo voice is that when this essentially anechoic recording is played back in a home living room, it sounds relatively more realistic than a piano disc because there is no conflict between the listening room’s character and the recording site’s sonic signature. This factor is especially significant when we are dealing with a small discrete or point sound source, such as a single human voice, that could comfortably fit in the average living room. Furthermore, in the case of an essentially unaccompanied vocal soloist, the lack of a stereo effect is hardly noticed. Since the piano or orchestra could not be squeezed into the horn, the accompaniment on these discs often sounds echoey or unnatural.

When a non-electrical gramophone reproduces a dry acoustic recording of a single voice, the ear-brain system hears a “he is here” sound field that is everyday normal and free of the interaural crosstalk and pinna-angle error that bedevil 60-degree stereo loudspeaker reproduction. In modern recordings, the recorded hall reverberation, unnaturally coming from the front, vitiates a realistic “the soloist is here” effect. Another serendipitous advantage of the early deluxe acoustic gramophones is that the loudness of the reproduction is amazingly close to what one would hear if Caruso were there in the room, sometimes painfully so. I have measured Caruso at the ear-splittingly high level of 95 decibels six feet from the Credenza playback horn. This combination of an accurate sound level and a realistic acoustic milieu makes for an exciting listening experience, despite noise and distortion, and also explains why early instrumental or choral recordings were less successful. In the latter case, it is not just the restricted frequency response; it is also the lack of separation between the instruments and the halo of recording studio echo that surrounds them that makes them seem less than realistic, especially when monophonically reproduced in the typical home environment.

Spaced Out Caruso

Since my methods have been successful in the past in creating a “you are there” opera-house sound field for modern vocal stereo CDs, I decided to see what could be done with the CD version of the Caruso recordings from RCA, as processed by Stockham, and the Romophone version of the complete Galli-Curci. Using the digital form of these acoustic recordings makes it easier to digitally adjust a sound field that best complements the original recording. The goal was to be there to hear Caruso as he would have sounded if he were singing in a recital hall or other suitable space.

That a stereo recording is not a priori necessary to achieve such realism was demonstrated over 50 years ago by the pioneering G. A. Briggs, who played mono recordings of small chamber groups in real concert halls and fooled a lot of people into thinking they were hearing live musicians. Another way of understanding that stereo is nice but not always essential is to imagine that Caruso and his small band of instrumentalists are alive and sounding off at stage center in Carnegie Hall. Now assume that you have only been able to get a seat in the last row of the top balcony. You would hear almost no left-right separation, but the performance would still, by definition, be real and in this case be a recital to treasure.

Another advantage of psychoacoustically correct sound reproduction, even without stereo, is that defects such as needle scratch or random pops and ticks can be more easily tolerated, just as music at a live concert can be enjoyed, even in the balcony, despite heating system noise, coughing, or program rattling.

Putting Caruso Front and Center

Before we get to the punch line, two more psychoacoustic principles need to be considered. Playing a CD of a mono recording through two widely spaced loudspeakers is begging for audio trouble. The interaural crosstalk, caused when each speaker communicates the same signal to both ears, causes peaks and dips in the response at each ear, which the brain interprets as positional or spatial unreality. Thus, when listening to any monophonic recording, it is a good idea to place your loudspeakers as close together as possible or, if this is inconvenient, to turn one of them off. The other problem is pinna-angle error. Launching centrally located sounds from speakers 30 degrees to the left and right of the listener instead of from straight ahead causes spatial unease in most listeners. Fortunately, the cure here is the same as for speaker crosstalk, i.e., move the speakers together or use just one directly in front. Furthermore, as for any audiophile-caliber reproduction, the listening room must be treated to reduce early room reflections at the listening position and to bring the room reverberation time below two-tenths of a second.
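The peaks and dips caused by crosstalk can be illustrated with a rough calculation. The Python sketch below is not taken from any Ambiophonic processor; it assumes a simple spherical-head model (Woodworth's approximation, with an assumed head radius of 8.75 cm), a speed of sound of 343 m/s, and an arbitrary far-ear gain of 0.7 to stand in for head shadowing. The function names are hypothetical. It estimates the extra delay with which a speaker's signal reaches the far ear, and from that the frequencies at which the same mono signal from the two speakers partly cancels at one ear.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value
HEAD_RADIUS = 0.0875     # m, assumed average head radius (spherical-head model)


def interaural_delay(angle_deg):
    """Extra travel time (seconds) to the far ear for a source at angle_deg
    off center, using Woodworth's spherical-head approximation."""
    theta = math.radians(angle_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))


def comb_notches(angle_deg, count=3):
    """First `count` cancellation frequencies (Hz) at one ear when two speakers
    at +/- angle_deg carry the same mono signal."""
    dt = interaural_delay(angle_deg)
    if dt == 0.0:
        return []  # speakers coincide: no delayed copy, hence no comb filter
    return [(2 * k + 1) / (2.0 * dt) for k in range(count)]


def magnitude_db(freq_hz, angle_deg, far_gain=0.7):
    """Level (dB) at one ear of near-speaker signal plus delayed, attenuated
    far-speaker signal, relative to the near speaker alone.
    far_gain is an assumed stand-in for head shadowing."""
    phase = 2.0 * math.pi * freq_hz * interaural_delay(angle_deg)
    real = 1.0 + far_gain * math.cos(phase)
    imag = -far_gain * math.sin(phase)
    return 20.0 * math.log10(math.hypot(real, imag))


if __name__ == "__main__":
    for angle in (30, 10, 0):
        print(angle, "degrees:", [round(f) for f in comb_notches(angle)])
```

Under these assumptions, speakers at 30 degrees put the first dip a little under 2 kHz, squarely in the vocal range, while moving the speakers closer together pushes the dips upward in frequency, which is consistent with the advice above.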

Now that we have a good solid frontal sound source for our flesh-and-blood Caruso, we must create a proper performing space for him to project his voice into. The Nimbus approach of re-recording Caruso records by playing them on a horned gramophone in a live room with an (Ambisonic?) microphone at the far end is a start in this direction, but this approach is not sophisticated enough to synthesize a space that sounds both live and realistic on playback. Studio ambience, if mixed with direct sound coming from front speakers, as in the Nimbus case, is not good enough to fool the brain into sensing a real space that it could be within. Just as surround sound for movies requires additional rear loudspeakers to create a proper effect, the recreation of concert-hall sound fields also requires additional loudspeakers to the side and rear. The Ambiophonic method uses up to ten ambience speaker pairs, two pairs to output discrete early reflections from the proscenium and sides of the hall and three pairs to output three different but related rearward reverberant fields. For Caruso, I use two JVC XP-A1010s and a Yamaha 3090 to control and formulate all ten pairs of synthesized channels. Unfortunately, the present concentration on surround-sound processing has left hall recreation hardware production and design in limbo. However, before too much time passes, I believe the multi-media, virtual-reality-auralization crowd will give us stored hall programs for the PC that will be able to do the job as well as or better than the difficult-to-obtain JVC. The new Yamaha DSP-A1 seems promising in this regard, but you would need two or, better, three of them.

Caruso au Naturel

For best results, it is necessary that the hall ambience be related to the ambience of the recording coming from the main front speakers. In our case, the old acoustic recordings and their CD versions are quite dry. This means that setting Caruso in a church, an opera house or a cavernous concert hall is not going to work well. After much experimentation, I preferred Amelita and Enrico singing in a low-ceilinged, salon-type space with a reverberation time of about 1 second and standing about twenty feet from my listening position. No direct sound or simple delayed direct sound is fed to any of the ambience speakers, and the closely spaced front speakers are left unprocessed, although with a different DSP computer some early reflections coming from the front speakers, mixed with the direct sound, would probably be okay and would allow a larger space to sound natural.
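To give a feel for the numbers involved, here is a minimal Python sketch of the two quantities mentioned above: the arrival delay for a singer about twenty feet away, and a reverberant tail with a reverberation time of about 1 second. It is only an illustration, not the algorithm used by the JVC or Yamaha processors. It assumes a speed of sound of roughly 1125 feet per second and models the tail as exponentially decaying random noise, a common textbook simplification; the function names and the 8 kHz sample rate are hypothetical choices.

```python
import random

SPEED_OF_SOUND_FT_PER_S = 1125.0  # assumed approximate value at room temperature


def arrival_delay_ms(distance_ft):
    """Time in milliseconds for direct sound to travel distance_ft."""
    return 1000.0 * distance_ft / SPEED_OF_SOUND_FT_PER_S


def decay_gain(t_seconds, rt60_seconds):
    """Amplitude envelope that has fallen by 60 dB (a factor of 1000)
    when t_seconds equals the reverberation time."""
    return 10.0 ** (-3.0 * t_seconds / rt60_seconds)


def reverb_tail(rt60_seconds=1.0, sample_rate=8000, seed=0):
    """Simplistic reverberant tail: uniform noise shaped by the decay envelope.
    Returns a list of samples covering one reverberation time."""
    rng = random.Random(seed)
    n_samples = int(rt60_seconds * sample_rate)
    return [
        rng.uniform(-1.0, 1.0) * decay_gain(i / sample_rate, rt60_seconds)
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    print("delay from 20 ft (ms):", round(arrival_delay_ms(20.0), 1))
    tail = reverb_tail(rt60_seconds=1.0)
    print("tail length (samples):", len(tail))
```

A tail of this kind would be convolved with the dry recording and fed only to the ambience speakers, in keeping with the rule that no direct sound goes to them; a real hall processor would also add discrete early reflections, which this sketch omits.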

In the end, I have been able to select discrete early reflections and reverberant tails of sufficient accuracy to produce the eagerly anticipated Caruso-on-Stage apparition. The background noise and poor instrumental pickup become only minor annoyances, as binaural listening theory predicts. When one has heard Caruso, Galli-Curci and others with proper depth cues and in such a realistically intimate recital hall space, one can understand why these singers were so acclaimed by their contemporaries. Using similar techniques on later electrical recordings, I have had really exciting results recreating Marian Anderson as she sounded in 1935, the Toscanini video Beethoven 9th and his Wagner performances with Melchior in Carnegie Hall in 1941.

Unfortunately, the present state of the art in home concert-hall synthesis makes it difficult for the non-technical vintage recording connoisseur to duplicate these results. However, it is to be hoped that when the surround-sound furor has burned itself out, as far as music reproduction is concerned, new and easy-to-use hall processors will be made available or, even if it is more expensive, that the new multichannel DVD format will be used to produce new editions of treasured recordings, precoded with hall ambience data for what I have termed Ambiophonic playback.

RALPH GLASGAL, B.E.P., M.S.E.E. (IEEE, AES), a Cornell University Engineering Physicist and Electronics Engineer, holds a patent for a stereo dimension control, and designed recording equipment for RCA as well as high fidelity components for Fisher Radio. He has authored many articles and papers for magazines including Stereophile, The Audiophile Voice, Stereotimes, AES Preprints, etc. He is the founder of the Ambiophonics Institute, a not-for-profit sponsor and doer of research on the psychoacoustics of high-fidelity. He is the author of the book, “Ambiophonics, Replacing Stereophonics to Achieve Concert Hall Realism,” available, cost free, at http://www.ambiophonics.org/. Glasgal Island in the Antarctic was named for him in recognition of his scientific research on the Aurora Australis during the International Geophysical Year. He is the founder and past President/Chairman of Datatec Systems Inc. and is a noted authority on wide-area networking and data communications.