Friday, September 19, 2008

Spontaneous formant tracking: A point of logic, Part 1

Formant tracking is simply adjusting the vocal tract to a resonant state by attempting to match the frequency of one of the vowel formants to one of the harmonics of the sung pitch.

First, I will define the elements that we are trying to match.

The source: The sung pitch.

When we speak of pitch, we are speaking not only about the fundamental frequency that we are trying to sing, but about the fundamental and its endless set of overtones. The overtones are whole-number multiples of the fundamental. For example, the pitch G2 (bass low G) is approximately 100 Hz (or 100 vibrations per second). Its overtones would be 200, 300, 400, 500, 600, etc.

The fundamental is labeled F0 traditionally. It is also called the first harmonic or H1. The overtone above it is H2, the next one is H3 and so on.
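As a minimal sketch of the labeling described here (the function name `harmonic_series` is only illustrative, not a standard tool), each harmonic is a whole-number multiple of the fundamental, and H1 is the fundamental itself:

```python
def harmonic_series(f0_hz, count):
    """Return (label, frequency) pairs for the first `count` harmonics.

    H1 is the fundamental (F0); H2, H3, ... are the overtones above it.
    """
    return [(f"H{n}", n * f0_hz) for n in range(1, count + 1)]


# The example from the post: a fundamental of 100 Hz.
for label, freq in harmonic_series(100, 6):
    print(label, freq, "Hz")
```

For a 100 Hz fundamental this lists H1 through H6 at 100, 200, 300, 400, 500 and 600 Hz.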

The filter: The vowel (vocal tract)

Vowels can be defined by their first two formants, which correspond to the vocal tract divided by the tongue:


[Image: model of the vocal tract divided by the tongue]

I altered the picture above to have the formant separation reflected. The red area is the first formant space (F1) and the white is the second formant space (F2). A picture of all the vowels can be seen here. I copied the page and include the picture here below so that you do not have to navigate away from the blog page:

[Image: vocal tract shapes for all the vowels]

I also include here the formant frequency values. Directly below are what are referred to as formant centers. These are the values given to the frequency pairs that give the most common version (some might say the purest form) of the vowel. They are color-coded by vowel. The smooth column represents the F1 values and the dotted column represents the F2 values.

[Chart: formant centers (F1 and F2) by vowel]

However, there is wide latitude as to what the human ear will hear as a specific vowel in context. The following chart, found at the National Center for Voice and Speech, is a standard chart that has been used for years. It also shows where certain vowels intersect, that is, where a formant pair might be perceived as two different vowels depending on context.

[Chart: NCVS vowel formant chart showing where vowels intersect]



What we are attempting to do is to alter the size of the formant spaces to match the frequency of one of the harmonics of the sung pitch. This is the concept of vowel modification. The following chart, which I've developed over the last year (there is one for each vowel spectrum, e.g. [o to u], [i to e], [E to ae]), shows the male passaggio up to tenor high B, which corresponds to the female first passaggio and middle voice.

[Chart: vowel modification chart for the male passaggio up to tenor high B]

A larger version of this file can be found in PDF form here.

This chart tells us a lot. Before I go further, I must add that vowel modification (formant tracking) is most important where a precarious balance between the cricothyroid muscle and the vocalis muscle exists. Read the next blog entry which should appear in a couple of days for more details. For our purposes let us assume the muscular balance is not precarious.

The [a] vowel is colored dark blue on the chart. What is immediately noteworthy is that we do not see a lot of dark blue on the chart. That means that in the male passaggio and upper range and in the female first passaggio and middle range, a pure form of the [a] vowel is not the best choice. Keep in mind that neighboring vowels will sound like [a] in context when phonation is efficient (i.e. the quality of the vowel is strongly dependent upon a good phonation mode).

The problem in this range is a bigger problem for men than for women. We will discuss the female difficult range a bit later.

For men it is important to know that the range between C4-B4 is not only a problem of resonance (formant tracking) but one of muscular balance.

Many professional singers develop the crico-thyroid dominance that makes it possible to stretch the vocal folds for high notes regardless of resonance adjustments. Therefore, in the best case scenario, resonance tracking is a point of refinement. However, because accurate resonance adjustment takes pressure off of the vocal folds (see previous post on Inertial Reactance), a singer who experiences a precarious muscular balance would benefit from exact formant tracking. Most certainly a singer who has great muscular balance in general becomes a great singer when formant tracking is added into the mix. The voice would sound more consistently resonant and richer.

In the range between C4 and G4 the issues are different depending on whether one is a bass, baritone or tenor. This is because basses, baritones and tenors reach the muscular threshold at different points. The muscular change has been dealt with concurrently with the acoustic (resonance) issue. One of the points of this post is that formant changes sometimes occur before the muscular threshold. Let us take the different archetypal male voice parts one at a time:

1. Basso: Let us say that the muscular change (from vocalis dominance in the low and middle range to crico-thyroid dominance in the upper range) occurs on C4#. A basso will begin to feel the muscular tension around B3b or even A3. The question is whether at that point, the basso should try to access F1 tuning or F2. This brings us to how the singer alters the formant frequencies to tune to F1 or F2. The following rules come from the National Center for Voice and Speech and are scientifically proven and accepted as standard by the voice science community. More details on these rules can be found at the link directly above.

Four Rules for Modifying Vowels

1. All formant frequencies decrease uniformly as the length of the vocal tract increases.

The vocal tract length increases when the larynx lowers.

2. All formant frequencies decrease uniformly with lip rounding and increase with lip spreading.

Lip rounding and lip trumpeting have the same effect (see details on the NCVS page).

3. A mouth constriction lowers the first formant and raises the second formant.

This includes, principally, the raising of the tongue, as in going from the [a] to the [i] vowel, whereby the space below the tongue increases (lowering its pitch, since larger spaces have lower pitch) and the space above decreases (raising its pitch, since smaller spaces have higher pitch).

4. A pharyngeal constriction raises the first formant and lowers the second formant.

The reverse of number 3.
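As a rough way to keep the four rules straight, here is a minimal sketch that stores only the direction of change for F1 and F2. The adjustment names and the `describe` helper are illustrative assumptions, not NCVS terminology, and the table says nothing about how large each change is (rule 1 and rule 2 speak of all formants, while this table tracks only the first two).

```python
# Direction of change for (F1, F2): -1 means "lowers", +1 means "raises".
# Keys are hypothetical labels chosen for this sketch.
RULES = {
    "lengthen_tract": (-1, -1),     # rule 1, e.g. the larynx lowers
    "round_lips": (-1, -1),         # rule 2
    "spread_lips": (+1, +1),        # rule 2
    "constrict_mouth": (-1, +1),    # rule 3, e.g. [a] moving toward [i]
    "constrict_pharynx": (+1, -1),  # rule 4, the reverse of rule 3
}


def describe(adjustment):
    """Spell out what one adjustment does to F1 and F2."""
    d1, d2 = RULES[adjustment]
    word = {-1: "lowers", 1: "raises"}
    return f"{adjustment}: {word[d1]} F1, {word[d2]} F2"


for name in RULES:
    print(describe(name))
```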

In order to follow these rules, we must establish what the default position of the vocal tract should be. I proceed from the following:

The larynx cannot fall to its naturally low position without the jaw being released. The laryngeal position that produces accurately resonant notes in the speaking range (men between 110 and 150 Hz and women between 220 and 260 Hz) should be the default. Therefore:

1. The larynx should maintain that basic low position.
2. The jaw should always return to the [a] position and the tongue and lips should articulate for all changes (consonants and vowels).

If the jaw had to close for vowels and the larynx had to rise, the variables would be too many, and since both would narrow and shorten the larynx, the voice would take on a thinner quality.

Given the parameters that I have established, the rest is a matter of logic. Let us continue with our basso on the [a] vowel:

C4 is an interesting note. The choice is either to raise F1 (the [a] vowel is the closest) or to round to [Ɔ] (as in "fort") to access F2. F2 is a better choice for the basso, but he is still in his lower register muscularly and it might feel more "natural" (more speech-like) to sing the [a] vowel, although the resonance might be imprecise and cause the tone to spread. C# fits the [a] vowel perfectly on F2. This is a moment where an F2 change might be better, as mentioned earlier, even though the muscular balance is borderline and the singer would still feel comfortable singing the more speech-like F1 resonance. The vowel of the word "up" fits nicely to continue the second formant tracking through D4 and Eb4. Remember that to keep first formant tracking the lower space must become smaller. This means that the larynx may rise when tongue migration and lip spreading are not enough to accomplish the frequency rise. If the larynx is kept low during this change, it becomes difficult to match the formant with the harmonic in question (which is fixed with a given pitch). Some bassos are able to let the larynx rise and keep a first formant resonance, but they will lose the darker color, which is a basic characteristic of the basso voice type.

E4b is an interesting note. The same vowel (of the word "up") matches both formants. Some basses I hear round the vowel slightly, which lowers both formants. This discourages F2, which needs to rise from its center (1180) to meet the 4th harmonic (1264), and lowers F1 from its center (640) to meet the 2nd harmonic (632). There are more options. It is possible that the singer might track the second formant of the schwa, [Ə] (1450), and by rounding bring it closer to the 4th harmonic (1264). It might be less exact but would diminish the competition from F1, since the first formant of the [Ə] (430) is out of range (too distant from the two relevant harmonics: H1 (316) and H2 (632)).
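The arithmetic in this paragraph can be checked with a short sketch. The helper name `offset_to_harmonic` is an assumption made for illustration; the frequencies are the ones quoted above (a sung pitch of 316 Hz and the formant centers for the vowel of "up" and for the schwa).

```python
def offset_to_harmonic(formant_hz, f0_hz, harmonic_number):
    """Signed distance in Hz from a formant to a chosen harmonic.

    Positive: the formant must rise to meet the harmonic.
    Negative: the formant must fall to meet it.
    """
    return harmonic_number * f0_hz - formant_hz


F0 = 316  # Hz, the sung pitch used in the paragraph above

# Vowel of "up": F1 center 640 Hz, F2 center 1180 Hz.
print(offset_to_harmonic(640, F0, 2))   # F1 against H2 (632 Hz)
print(offset_to_harmonic(1180, F0, 4))  # F2 against H4 (1264 Hz)

# Schwa: F2 center 1450 Hz against the same H4.
print(offset_to_harmonic(1450, F0, 4))
```

The three offsets come out as -8, 84 and -186 Hz. Slight rounding lowers both formants, so it moves F1 of "up" toward H2 while pulling its F2 further from H4; the schwa's F2, by contrast, sits above H4 and moves toward it when rounded.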

In truth, either choice is possible. This is to say that while formant tracking makes for a more continually resonant quality, second formant dominance in the upper range is not as crucial for basses as it would be for baritones or tenors, unless the basso is singing F4-G4. The following graph (large version here) shows the E4b (D4#) as sung by two great basses, Nicolai Ghiaurov and Jerome Hines, at "passar nelle tue tasche..." in the famous "coat aria" from Puccini's La Bohème.

[Image: Voce Vista spectrograms of the E4b, Ghiaurov (top) and Hines (bottom)]

The top spectrogram (using the Voce Vista software) is Ghiaurov and the lower is Hines. The green cursor goes through the dominant formant, F2 for Ghiaurov and F1 for Hines. Ghiaurov is clearly dominant on F2, which essentially means that most of the energy of the sound gathers on the 3rd harmonic (H3). This acoustic "focus" has the positive characteristic of clearly delineating the harmonics and helps increase energy in them and in the singer's formant range (between the two orange lines). The singer's formant range is the most sensitive acoustic range for the human ear. Hines's spectrogram in this instance is less efficient. The acoustic energy is not only spread between F1 and F2 (although they look nearly equal, F1 is slightly stronger), but F1 lies between the fundamental (F0) and the second harmonic (H2), causing another spread of energy between those two harmonics. In short, the energy is spread between the first three harmonics, which seems to weaken energy in the singer's formant range. The two Youtube.com clips in question are found below.






Both singers sound wonderful. However, it seems my analysis of the chart bears out. Hines seems to favor a rather extremely low larynx. If the larynx were deeper than natural overall, the first formant would be tracked on D4# because the lowered larynx would lower both formants. That slightly depressed larynx, it seems, was enough to cause some balance problems between the two formants and suppress the strength of the harmonics to the point where their exact frequency is difficult to ascertain at sight. In other words, the acoustic adjustment of the vocal tract is out of phase with the frequencies of the natural harmonic series of the sung pitch.

The main point of this article is that maintaining a naturally low larynx promotes a natural transition from F1 dominance to F2 dominance. The next article will continue this discussion with analysis of the same range relative to baritones, tenors and female voices. I will also discuss the next octave which will deal with the female top voice.

© 09/23/2008 (Date of publication)

16 comments:

Martin Berggren said...

Dear TS,

Thanks for another nice and enlightening blog. The exploratorium plastic vocal model is quite cool! I am already looking forward to the next blog and the dissection of the tenor voice...

One small point that I find somewhat confusing is the association of the first and the second formant with the lower and upper region of the vocal tract, respectively. From that statement, it is easy to believe that the first formant somehow "lives" in the lower and the second formant in the upper portion of the vocal tract, which I believe is slightly misleading.

The simplest model for a vocal tract is a 17 cm piece of tubing, for which the formants can be visualized as in http://hep.physics.indiana.edu/~rickv/Standing_Sound_Waves.html. That is, when a vocal tract is fed with sound of a frequency that is in the vicinity of the first formant, the largest pressure oscillations occur in the lower half of the vocal tract. In that sense, it may be fair to associate the first formant with the lower portion of the vocal tract. However, the vibration pattern for the second harmonic is more complicated, with two maxima for the acoustic pressure, one in the lower part of the vocal tract, and one higher up.
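For readers who want numbers for that 17 cm tube, here is a minimal sketch using the textbook quarter-wavelength formula for a uniform tube closed at one end and open at the other. The closed-glottis, open-lips idealization and the speed of sound of 350 m/s (warm, humid air) are assumptions of the sketch, not values taken from the comment, and the function name is illustrative.

```python
def tube_resonances(length_m, count=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Quarter-wavelength formula: f_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, count + 1)]


# A 17 cm tube as the simplest stand-in for a vocal tract.
for n, freq in enumerate(tube_resonances(0.17), start=1):
    print(f"F{n} ~ {freq:.0f} Hz")
```

Under these assumptions the first three resonances fall near 515, 1544 and 2574 Hz, each belonging to a standing wave that occupies the whole length of the tube.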

Just above the passaggio, many of the good tenors seem to be able to adjust their vocal tract to obtain a strong F2 resonance that makes H3 very strong. The big question is of course how to posture everything to obtain that effect!

Cheers,

Toreadorssong's Vocal Technique Blog said...

Dear Martin,

You have a point, but I do believe in fact that F1 and F2 correspond respectively to the lower and higher portions of the vocal tract. As for the pressure oscillations, I believe there is a reason for the apparently simpler behavior of F1 rich sounds and the relative complexity of F2-rich sounds.

Lower F0 frequencies in the voice, which coincide with a muscular balance whereby the folds are thick and lax, produce strong low harmonics and relatively weaker upper harmonics. This would explain why notes that are in the lower part of the male passaggio have little strength when they are tuned to be F2 dominant.

In fact I believe that the F1 dominant notes also have energy in the F2 range but it is less discernible. In the passaggio, F1 dominant pitches would probably display strong pressure oscillations in both parts of the vocal tract. It is my strong belief that the passive formant has an effect throughout the voice. However, for the reasons mentioned regarding the weakness of high harmonics in low F0, this fact has much greater significance in the passaggio and high range where the energies in both parts of the vocal tract interact (we are speaking about male voices of course).

When we compare close-formant vowels like [a] with open-formant vowels like [i], the relationship of formant frequencies to the tongue's partitioning of the vocal tract is inescapable.

Martin Berggren said...

[TS wrote:]
but I do believe in fact that F1 and F2 correspond respectively to the lower and higher portions of the vocal tract.

Hmm, I do not follow you here! How would they correspond to lower and higher portions of the vocal tract? Formants are the tops in the absolute value (or magnitude) of the input impedance spectrum as measured at the location of the vocal folds. Their definition does not involve locations in the vocal tract. Or do you mean that manipulations of the formant frequencies are associated with changes in these areas? But to me it seems like the rules for formant manipulations are much more complicated than that.

Toreadorssong's Vocal Technique Blog said...

Dear Martin,

I'm interested in what you think the main factors are and would love for you to discuss the science of this as you understand it. You know more about the physics of this than I do. I am going back to basic physics to have a deeper understanding of the acoustic theory involved, but the four rules I wrote regarding vocal tract adjustments are the result of scientific studies by Titze and others before him. If the rules we go by are based on vocal tract modification then yes, the vocal tract changes influence how sound is propagated. The fact that the changes in the partitioning of the vocal tract have a direct effect on formant frequencies (i.e. vowels) has a lot to do with the way the two subspaces of the vocal tract are shaped as well as their relative sizes, which are interdependent (with tongue migration, F1 decreases as F2 increases, although I imagine they are not exactly inversely proportional because of other acoustic factors). These high energy frequency bands depend on how the vocal tract is shaped; otherwise why do the formant ranges change with vowel changes?

Martin Berggren said...

Dear TS,

Of course, the shape of the vocal tract determines the formant structure!

The issue I had was not that, but your basis to associate F1 with the lower and F2 with the upper part of the vocal tract. How have you come to that conclusion? The wave pattern that exists in the vocal tract when frequencies around F1 and F2 are excited does not obey such a partition. And it is not the case that F1 and F2 can be independently controlled by manipulating the upper and lower part of the vocal tract.

(Of course, this is a very minor issue that does not in any way diminish the value of the latest interesting blog!)

Toreadorssong's Vocal Technique Blog said...

Dear Martin,

Don't ever feel a need to soften your comments. You are a treasure and I welcome all commentary with strength. Now I understand what you meant. I will try to find the literature, but I remember a paper (Miller's book might cover this. I have to look) in which the logic for the F1/F2 and vocal tract partitions is discussed.

The four rules set about vocal tract modification go as follows:

1) Lengthening of the vocal tract by laryngeal depression lowers both formants but more so the first.

2) Lengthening the vocal tract by lip rounding lowers both formants but more so the second.

3) Tongue migration that constricts the mouth raises the second formant and lowers the first and vice versa.

These rules show a superficial connection between the size of the vocal tract partitions and formant tuning.

You are right. In a way this is coincidental when we consider the complex acoustical formulas that yield formant frequency ranges.

More crucial to the conversation are the conscious vocal tract adjustments that affect formant tuning. By following the rules we arrive at some predictable possibilities. More in the next post.

Martin Berggren said...

Dear TS,

I had a look in Miller's book (which I recently acquired based on your recommendation!). He presents a simplified model of the vocal tract consisting of four cylinders (laryngeal cavity, tongue constriction, oral cavity, and lips) and he tells in the text something to the effect (I do not have the book in front of me right now) that the laryngeal and oral cavities can be associated with F1 and F2, respectively. In fact, I have not seen that particular statement elsewhere (except in your writings). This idea may however make some sense as a rule of thumb to remember the effects of vocal tract changes. However, I still think it is a misleading idea from a physical point of view: all formants are actually associated with wave motion in the whole vocal tract.

Something I wonder about is the following: Theoretically, based on the general nature of resonances, there should be a maximum for the acoustic particle velocity around the lips when F1 or F2 is strongly resonating. Additionally, there should be a maximum for the acoustic particle velocity somewhere inside the oral cavity (quite deep I think) when F2 is strongly resonating, a maximum that is not present for an F1 resonance. I am curious whether or not the latter maximum can be felt as some kind of vibrational sensation by the singer. I am not yet able to get a nice strong F2 resonance for the notes right above the passaggio so I cannot experiment on myself... What do you think?

Toreadorssong's Vocal Technique Blog said...

Dear Martin,

I believe Miller and the scientists he works with have observed that modifications of the lower vocal tract do have a more direct effect on F1, just as modification of the upper portion has a more direct effect on F2. This in no way superficially dismisses the complex interaction of sound waves within the vocal tract. The distance that certain waves travel depends on pressure (i.e. /shape/directional) changes throughout the vocal tract. I think it is difficult to come up with a formula that predicts why certain waves experience less friction than others and which wavelengths end up in phase with the glottal opening and which ones do not. Since each vocal tract is different, such a formula would not be universal.

My thought is that the volume ratio of the two parts of the vocal tract and the basic shape of the two parts may encourage a relationship between F1 and the lower part and F2 and the upper part.

Along those lines, I have found that the fundamentals for achieving F2 dominance in the tenor upper range are:

1) A relatively efficient phonation pattern

2) A comfortably low larynx that does not rise at the passaggio point. And this is facilitated by:

3) A released jaw that allows the larynx to remain unfettered. It is my experience that a closed temporomandibular joint (http://en.wikipedia.org/wiki/Temporomandibular_joint) is tantamount to a slightly raised larynx. From a closed position, the only way to get the larynx lower is to depress it with the back of the tongue, which among other things tends to push on the epiglottis and alter the natural resonance of the vocal tract.

All vowels can be articulated with a released jaw. But our speech habits are against this mode of vowel articulation and so it feels foreign to many singers at first.
Also, singers who have excellent phonation modes are the singers we value regardless of their resonance strategies. Logical formant tracking strategies are not consistent among the singers we value. That is why we have so much opposition to the logic that science dictates. Many singers are fans of Kraus (who is a great model for F1 dominance in the upper range), usually because their voices are of similar weight and they tend naturally to follow the resonance model of their hero.

Toreadorssong's Vocal Technique Blog said...

Martin,

I must add that a change in one formant has an effect on the other. As you noticed, I wrote that maintaining a low larynx will have a positive effect on achieving F2 dominance in the tenor upper range. In the passaggio, if the larynx goes up, the first formant will rise to keep up with the rise of pitch and the F1 dominance will remain. Discouraging the rise of F1 will then bring F2 influence because F2 will be closer to H3 than F1 to H2 or H1 depending on vowel.
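The comparison of gaps described here (F2 to H3 against F1 to H2 or H1) can be sketched as follows. The function names are illustrative assumptions, and the example pairs the formant centers quoted in the post for the vowel of "up" (640 and 1180 Hz) with G4 at about 392 Hz, a pitch chosen only for illustration. The sketch compares frequencies only and does not model the larynx itself.

```python
def closest_harmonic(formant_hz, f0_hz):
    """Return (harmonic_number, gap_hz) for the harmonic nearest a formant."""
    n = max(1, round(formant_hz / f0_hz))
    return n, abs(n * f0_hz - formant_hz)


def better_tuned_formant(f0_hz, f1_hz, f2_hz):
    """Name the formant that lies closer to one of the harmonics."""
    _, gap1 = closest_harmonic(f1_hz, f0_hz)
    _, gap2 = closest_harmonic(f2_hz, f0_hz)
    return "F1" if gap1 <= gap2 else "F2"


# Illustrative values: G4 (about 392 Hz) on the vowel of "up".
print(closest_harmonic(640, 392))    # F1 against its nearest harmonic
print(closest_harmonic(1180, 392))   # F2 against its nearest harmonic
print(better_tuned_formant(392, 640, 1180))
```

With these values F1 sits 144 Hz from H2 while F2 sits only 4 Hz from H3, so the sketch names F2 as the better-tuned formant. Raising F1 toward H2 would shrink the first gap, which is the situation in which F1 dominance remains.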

Iris said...

How does having the tongue tip either behind the bottom teeth, or on top of the bottom teeth as some teachers advocate, affect the formants?
Thanks
Iris

Martin Berggren said...

[TS wrote:]
I believe Miller and the scientists he works with have observed that modifications of the lower vocal tract do have a more direct effect on F1, just as modification of the upper portion has a more direct effect on F2.

All right.

[TS wrote:]
This in no way superficially dismisses the complex interaction of sound waves within the vocal tract. The distance that certain waves travel depends on pressure (i.e. /shape/directional) changes throughout the vocal tract.

Well, it is rather the case that each geometry change in the vocal tract locally generates some reflections. The propagated and all reflected waves interact in a complicated way, causing formant structures.

[TS wrote:]
I think it is difficult to come up with a formula that predicts why certain waves experience less friction than others and which wavelengths end up in phase with the glottal opening and which ones are not.

Well, friction is not much of an issue in the vocal tract...

[TS wrote:]
Since each vocal tract is different, such a formula would not be universal.

Correct, there is no simple formula for how the formants depend on the shape, but the input impedance spectrum (whose tops are the formants) is readily computable (and also measurable) for any given vocal tract. That is, a computer program can compute the formants accurately from any given vocal tract shape. (We have done such computations for brass instruments.)

I have heard the term "released jaw" before, but never really understood the meaning of the term! Thanks for the explanation! I checked Corelli and Björling on Youtube with regard to signs of a "released jaw". Indeed, the released jaw is quite clear for Corelli, and also in Björling's higher register (but not as much in the lower register).

Toreadorssong's Vocal Technique Blog said...

Dear Iris,

The tip of the tongue behind the lower teeth for vowel production is correct because that is the natural at-rest position of the tongue. When the tongue is pulled back, it is a reflection of laryngeal tension of which tongue retraction is a symptom.

In short, when we see the tongue retract it is a sign of tensions that need to be corrected.

Naturally when there is tongue tension, the larynx is not at its released position. A tense larynx has a direct influence on the stiffness of the tongue. In such a case, vowels cannot be produced correctly and/or flexibly. That is certainly a major problem in terms of formant tracking since vocal tract adjustments which are our primary tools for formant tracking would be compromised. The tongue is the most significant modifier of the vocal tract.

Toreadorssong's Vocal Technique Blog said...

Dear Martin,

By friction I meant the interaction of reflecting waves. When waves collide and reduce each other's strength, some interpret that as friction relative to the wave in question.

The formulas work more practically for brass instruments and probably any instrument other than the voice. The distinguishing feature of the voice is its intricately flexible resonator, the vocal tract. While it can be argued that trombones have a very flexible resonance system as well, the system is more linear and therefore predictable. When we think of the many geometric changes that can occur in the vocal tract even during the length of a single note, it becomes impracticable to apply the formulas. Certainly the knowledge we gain is significant in terms of our understanding of how the instrument functions.

Martin Berggren said...

[TS wrote:]
By friction I meant the interaction of reflecting waves. When waves collide and reduce each other's strength, some interpret that as friction relative to the wave in question.

Yes, there is interaction, that is, amplification of in-phase waves and damping of out-of-phase waves. But friction, no; that's a different phenomenon...

[TS wrote:]
The formulas work more practically for brass instruments and probably any instrument other than the voice. The distinguishing feature of the voice is its intricately flexible resonator, the vocal tract. While it can be argued that trombones have a very flexible resonance system as well, the system is more linear and therefore predictable. When we think of the many geometric changes that can occur in the vocal tract even during the length of a single note, it becomes impracticable to apply the formulas.

Well, computing the impedance spectrum for a vocal tract of arbitrary complexity is no problem with today's technology. Already in 1996, Story, Titze, and Hoffman determined vocal tract shapes from MR scans and computed associated formant structures. Today, it is possible to use even more accurate models for the wave propagation (it is probably also done).

Of course, there are modeling differences between the vocal tract and brass instruments, such as the soft walls of the vocal tract that require a model of absorption.

Blue Yonder said...

I just spent the evening catching up on reading your blog, and it was time well spent. I'm incredibly grateful that you share your encyclopedic knowledge with the singing community at large, through your blog and your NFCS posts. You have a gift, and we all benefit from it. Really, you should write a book at some point, if you are ever able to carve out the time from your performing and teaching duties. But if you do write a book, you've got to have a multimedia website to go with it, in order to illustrate your explanations with all of the helpful sound clips, videos, and images you use!

Toreadorssong's Vocal Technique Blog said...

Hello Blue Yonder,

Thank you for your comments. A book will eventually be written. My hope is to use the blog as a starting point. The multimedia part could work well as a CD-ROM or an online resource or both. Be sure I will ask the readership for their input when I decide to formalize this book.

Warmest Regards,

TS/JRL