upsetting stop series i

2025-11-10

finally getting into our linguistics arc

A stop¹ is a consonant like English’s p, b, t, d, ch, j, k, g², or -³. These can be divided into the affricates (ch and j) and the plosives (everything else). I think the difference between affricates and plosives might be more intuitive to English speakers if they look at examples like the Japanese ts in tsunami or the German z in Zugzwang: to English speakers, these sounds mostly seem like a t followed by an s. According to the teachings of the linguists, ch and j can similarly be thought of as t + sh ([tʃ]) and d + zh ([dʒ])⁴, though their t and d are a little further back in the mouth than usual. But I think this is a relatively insignificant distinction, and that your average native English speaker on the street will happily include ch and j with the rest of the stops as one natural class of sounds⁵.

We can arrange these stops into a chart to help understand what’s happening in more detail:

\scriptsize \begin{array}{|l|c|c|c|c|c|} \hline & \text{Bilab.} & \text{Alveolar} & \text{Postalv.} & \text{Velar} & \text{Glot.} \\ \hline \text{Voiceless} & \text{p} & \text{t} & \text{ch} & \text{k} & \text{-} \\ \hline \text{Voiced} & \text{b} & \text{d} & \text{j} & \text{g} & \\ \hline \end{array}

The columns represent places of articulation, or what parts of your mouth are used to make the sound. The bilabials p and b use both lips, the alveolars⁶ t and d use the tip of your tongue and the alveolar ridge behind your teeth, the postalveolars ch and j use more of the blade of your tongue and are a little further back, the velars k and g use the back of your tongue and your velum or soft palate, and the glottal - uses your glottis⁷, the part of the larynx with the vocal cords.

To my intuition, any educated person ought to understand places of articulation on at least a basic level, though I appreciate that this intuition might not be widely shared. However, this mostly isn’t the topic of this particular article—I mainly wanted to be able to sort the stops into pairs.

The rows divide the consonants into “voiced” and “unvoiced”, a distinction typically called “voicing”. A more helpful pair of sounds to understand this distinction might be s and z. The s and z sounds are very similar, with the main difference being that during a z sound, there’s an extra buzzing sort of thing happening in the voice box (larynx), which you can probably feel pretty well if you put a hand on your throat.

For various reasons, people often understand English stops as having a voicing distinction, which isn’t incorrect per se, but it’s too much of an oversimplification for the purposes of this article. So I’m going to introduce the terms fortis and lenis, which are Latin for strong and weak⁸.

\scriptsize \begin{array}{|l|c|c|c|c|} \hline & \text{Bilabial} & \text{Alveolar} & \text{Postalv.} & \text{Velar}\\ \hline \text{Fortis} & \text{p} & \text{t} & \text{ch} & \text{k}\\ \hline \text{Lenis} & \text{b} & \text{d} & \text{j} & \text{g} \\ \hline \end{array}
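Since the main point of these charts is to sort the stops into pairs, here is a small sketch of the fortis/lenis chart as a lookup table. The names `ENGLISH_STOPS` and `partner` are made up for illustration; the data is just the chart above, nothing more authoritative than that.

```python
# English stops from the fortis/lenis chart: symbol -> (place, series).
# The glottal stop "-" is left out because it has no lenis partner.
ENGLISH_STOPS = {
    "p": ("bilabial", "fortis"),
    "b": ("bilabial", "lenis"),
    "t": ("alveolar", "fortis"),
    "d": ("alveolar", "lenis"),
    "ch": ("postalveolar", "fortis"),
    "j": ("postalveolar", "lenis"),
    "k": ("velar", "fortis"),
    "g": ("velar", "lenis"),
}


def partner(stop):
    """Return the stop with the same place but the opposite series."""
    place, series = ENGLISH_STOPS[stop]
    for other, (other_place, other_series) in ENGLISH_STOPS.items():
        if other_place == place and other_series != series:
            return other
    raise ValueError(f"no partner for {stop!r}")


print(partner("p"))  # b
print(partner("j"))  # ch
```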

English’s fortis plosives p, t, and k (you can forget about -) are, in many contexts, furthermore given a quality known as aspiration, which we denote with a superscript h.

\scriptsize \begin{array}{|l|c|c|c|} \hline & \text{Bilabial} & \text{Alveolar} & \text{Velar}\\ \hline \text{Fortis} & \text{p}^\text{h} & \text{t}^\text{h} & \text{k}^\text{h}\\ \hline \text{Lenis} & \text{b} & \text{d} & \text{g} \\ \hline \end{array}

This is why it’s still possible to distinguish words like pin and bin, even if you’re whispering (which makes everything voiceless). But in a word like⁹ span, stan, or scan, the aspiration is often dropped. Notice that sban, sdan, and sgan wouldn’t be valid English words, because s is fortis and can’t mix with a lenis plosive like that.
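As a rough sketch of that pattern, the toy function below aspirates a fortis plosive only when it starts the word. That is a deliberate simplification and an assumption of this example: real English also aspirates in other positions, and t does extra things. The function name `pronounce` and the list-of-symbols input format are invented for illustration.

```python
FORTIS_PLOSIVES = {"p", "t", "k"}


def pronounce(phonemes):
    """Aspirate a word-initial fortis plosive; leave everything else alone.

    Assumption: only word-initial position triggers aspiration here, so the
    plosive in an s-cluster (span, stan, scan) comes out plain.
    """
    phones = list(phonemes)
    if phones and phones[0] in FORTIS_PLOSIVES:
        phones[0] += "ʰ"
    return phones


print(pronounce(["p", "ɪ", "n"]))       # ['pʰ', 'ɪ', 'n']
print(pronounce(["s", "p", "æ", "n"]))  # ['s', 'p', 'æ', 'n']
```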

The thing is, if you were to edit a file and chop the “s” off the start of someone saying the word span, most English speakers would think they were hearing ban with a b¹⁰! So there’s almost an argument that the aspiration is the more central distinction in English, in the sense that an unaspirated yet voiceless [p] (also known as a tenuis p) will be recognized as the lenis b rather than the fortis p, if it occurs in any context where there can be a minimal pair between p and b. And the voicing contrast being lost in some contexts isn’t even very weird for a Germanic language.

I don’t totally disagree with that take exactly¹¹, but I think everything slides into place if we just observe that aspiration is sort of just voicelessness deluxe, in a sense:

Hindi speakers, please hold your questions, we will get to them

If you say a plosive slowly, you notice that it starts by stopping the previous vowel (if there was one), and then releases with the start of the next vowel (if there will be one). In this diagram, the zigzagged line represents voicing, the open parts on the left and right represent the vowels before and after a consonant, and the line through the middle represents the stopping itself.

One notices that an aspirated plosive is sort of just like a voiceless plosive that was voiceless for extra long: it leaks into the following vowel. So what I’d really say is, fortis and lenis consonants in English have a voice onset time (VOT) distinction¹², this distinction is partially (or even wholly) neutralized in certain contexts, and the contrast is strong enough that the fortis consonants usually fall into the aspirated category¹³.
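To make the VOT framing concrete, here is a minimal sketch that sorts a stop into a category by when voicing starts relative to the release. Everything numeric in it is an assumption: the 30 ms cutoff is an illustrative round number, not a measurement, and the sample inputs are invented. The function names are hypothetical too.

```python
def vot_category(vot_ms, short_lag_max_ms=30.0):
    """Sort a stop into a phonetic category by voice onset time.

    vot_ms is when voicing starts relative to the release, in milliseconds;
    negative means voicing began before the release.
    short_lag_max_ms is an illustrative cutoff, not a measured value.
    """
    if vot_ms < 0:
        return "voiced"
    if vot_ms <= short_lag_max_ms:
        return "tenuis"
    return "aspirated"


def english_series(vot_ms):
    """Word-initially, only aspirated stops count as fortis for English ears."""
    return "fortis" if vot_category(vot_ms) == "aspirated" else "lenis"


print(english_series(70))   # fortis, like the p in pin
print(english_series(10))   # lenis, like the p of span with the s chopped off
print(english_series(-60))  # lenis, a fully voiced b
```

A language like Mandarin would draw its line in the same place, between tenuis and aspirated, while Spanish would draw it between voiced and tenuis instead.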

Okay, so now (hopefully) you understand a lot of what’s up with the distinctions between the two entries in each of English’s four stop series. And (hopefully) that should also be enough that you can sort of see what’s up when I talk about some other languages. Let’s go:

Mandarin

\scriptsize \begin{array}{|l|c|c|c|c|c|c|} \hline & \text{Lab.} & \text{Den.} & \text{Alv.} & \text{Ret.} & \text{Pal.} &\text{Vel.} \\ \hline \text{Asp.} & \text{p } (\text{p}^\text{h}) & \text{c } (\text{ts}^\text{h}) & \text{t } (\text{t}^\text{h}) & \text{ch } (\text{ʈʂ}^\text{h}) & \text{q } (\text{tɕ}^\text{h}) & \text{k } (\text{k}^\text{h}) \\ \hline \text{Ten.} & \text{b } (\text{p}) &\text{z } (\text{ts}) & \text{d } (\text{t}) & \text{zh } (\text{ʈʂ}) & \text{j } (\text{tɕ}) & \text{g } (\text{k}) \\ \hline \end{array}

So the main thing I want to point out here is that Mandarin pretty much just makes its distinction based on aspiration, rather than on voicing. I think this doesn’t actually present a problem for most English speakers trying to learn Mandarin. Really, the retroflexes ch and zh are probably a bigger issue most of the time. It might also be worth noting that in Mandarin the affricates definitely draw an aspiration distinction.

Also, so I don’t leave anyone with misconceptions: I called c and z dentals, meaning produced with the tongue against the (lower) teeth, mostly because I wanted to shove the affricates and plosives all into two rows and six columns. It’s not that c and z are unreasonable to describe as dentals, exactly. It’s just that t and d would also be rather reasonable to describe as dentals. I labelled q and j as “pal.”, but more precisely these are alveolo-palatals rather than plain palatals, because the t is usually alveolar (though palatalized). And Mandarin doesn’t avoid voicing its stops perfectly consistently; they’re just usually not voiced, and voicing isn’t a key part of the distinction.

Sometimes q and j are analyzed as allophones of the dentals, retroflexes, or velars—historically they come from a merger between the dentals and velars before i and ü. This, the distinction being aspiration rather than voicing, and the knowledge that <e> in a five vowel system really isn’t too far off from a diphthong /ej/ (weigh, hey, bade), can be put together into a full explanation for why Beijing used to be spelt Peking.

Spanish

\scriptsize \begin{array}{|l|c|c|c|c|} \hline & \text{Bilabial} & \text{Dental} & \text{Postalv.} & \text{Velar}\\ \hline \text{Tenuis} & \text{p} & \text{t} & \text{ch} & \text{k}\\ \hline \text{Voiced} & \text{b}\sim\text{β} & \text{d}\sim\text{ð} & \text{y (\& ll)} & \text{g}\sim\text{ɣ} \\ \hline \end{array}

Spanish does not aspirate stops. The “voiced stops” are fricatives or even approximants half the time. The letters b and v are identical—β in the chart is a bilabial approximant and could be compared to either v or w—and some countries need to call the letters be grande and ve chica or similar in order for their names to not be homophonous (but Spain calls v uve). Spanish d is sometimes pronounced like English hard th (/ð/), and g sometimes like Dutch’s g, which the International Phonetic Alphabet writes as /ɣ/. Also, Spanish y is not quite English y, being almost comparable to b d and g in some ways.

So overall it seems fair to say that Spanish has an even stronger distinction between its pairs of stops than English does, if you’re even willing to call these “stops”.

Greek

First let’s look at Ancient Greek:

\scriptsize \begin{array}{|l|c|c|c|} \hline & \text{Labial} & \text{Coronal} & \text{Velar}\\ \hline \text{Asp.} & \phi \text{ (p}^\text{h}) & \theta\text{ (t}^\text{h}) & \chi\text{ (k}^\text{h})\\ \hline \text{Tenuis} & \pi\text{ (p)} & \tau\text{ (t)} & \kappa\text{ (k)} \\ \hline \text{Voiced} & \beta \text{ (b)} & \delta\text{ (d)} & \gamma\text{ (g)} \\ \hline \end{array}

(Coronal is a slightly more general term that includes both dentals and alveolars, or anything that uses the tip or blade of the tongue. It can be contrasted with labials that use the lips, dorsals that use the dorsum (back of the tongue), and laryngeals that use the larynx.)

Ancient Greek had a three way distinction, distinguishing both aspirated from tenuis and tenuis from voiced. The Greek alphabet makes much more sense once you understand this fact. It won’t quite explain ξ or ψ for you, but English has x anyways, so it’s not like we have room to complain about letters that represent consonant clusters.

If you know the Greek alphabet well, you might look at φ and θ and have a question or two. Like, okay, φ being spelt “phi” and English words with Greek etymologies spelling it ph makes a little more sense if φ was an aspirated p, but… ph is pronounced as an f, which is a fricative, not an aspirated stop. And when a word with Greek etymology has a th that comes from a theta, it’s pronounced with English’s fricative th sound, not as a t.

If you’re a normal person, you might be grateful that chi isn’t mysteriously pronounced with a fricative, and notice that this helps explain why ch is usually pronounced with the k sound when it comes from a χ (technology, Christ). But if you’re like me, you’re noticing that ch is a dorsal fricative in both German and Hebrew¹⁴, and wondering why χ doesn’t fit the pattern established by φ and θ.

Well, let’s look at what’s changed in Modern Greek:

\scriptsize \begin{array}{|l|c|c|c|} \hline & \text{Labial} & \text{Coronal} & \text{Velar}\\ \hline \text{Unvoiced fricative} & \phi \text{ (f)} & \theta \text{ (soft th)}& \chi \text{ (x)}\\ \text{Voiced fricative} & \beta \text{ (v)} & \delta\text{ (hard th)} & \gamma \text{ (ɣ)} \\ \hline \text{Unvoiced stop} & \pi\text{ (p)} & \tau\text{ (t)} & \kappa\text{ (k)} \\ \text{Voiced stop} & \mu\pi \text{ (b)} & \nu\tau\text{ (d)} & \gamma\kappa\text{ (g)} \\ \hline \end{array}

So, χ in fact also became a fricative. As did β, δ, and γ! We looked at Spanish, so that sort of thing shouldn’t come totally out of left field.

But Greek didn’t actually lose the b, d, and g sounds! They’re just written with nasal consonants now, for some reason! Except for g, which is γκ!

It’s a little less weird than it looks—the voiced stops in Greek are sometimes (though not always) at least a little prenasalized, which is sort of like saying mb or nd instead of just saying b or d, except maybe without the nasalized part being pronounced quite as fully as if there were an actual m or n there. Swahili and other Bantu languages do a lot of this.
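The shift between the two charts can be tabulated directly. This is only a sketch that restates the single-letter values from the Ancient and Modern tables above (the digraphs are left out), with dictionary names invented for the example.

```python
# Single-letter values copied from the Ancient and Modern Greek charts.
ANCIENT = {"φ": "pʰ", "θ": "tʰ", "χ": "kʰ",
           "π": "p", "τ": "t", "κ": "k",
           "β": "b", "δ": "d", "γ": "g"}
MODERN = {"φ": "f", "θ": "θ", "χ": "x",
          "π": "p", "τ": "t", "κ": "k",
          "β": "v", "δ": "ð", "γ": "ɣ"}

# Print each letter with either "unchanged" or its old and new value.
for letter, old in ANCIENT.items():
    new = MODERN[letter]
    note = "unchanged" if new == old else f"{old} -> {new}"
    print(letter, note)

unchanged = [l for l in ANCIENT if ANCIENT[l] == MODERN[l]]
print(unchanged)  # ['π', 'τ', 'κ']
```

Only the tenuis row survives intact; both the aspirated row and the voiced row turned into fricatives.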

Hindi

\scriptsize \begin{array}{|l|c|c|c|c|c|} \hline & \text{Lab.} & \text{Den.} & \text{Ret.} & \text{Pal.} & \text{Vel.}\\ \hline \text{Aspirated} & \text{ph} & \text{th} & \text{ṭh} & \text{ch} & \text{kh}\\ \hline \text{Voiceless} & \text{p} & \text{t} & \text{ṭ} & \text{c} & \text{k} \\ \hline \text{Voiced Asp.} & \text{bh} & \text{dh} & \text{ḍh} & \text{jh} & \text{gh} \\ \hline \text{Voiced} & \text{b} & \text{d} & \text{ḍ} & \text{j} & \text{g} \\ \hline \end{array}

(Urdu is pretty similar, but also has a voiceless uvular q.)

This chart may raise some questions. For example, it has a row labelled “Voiced Asp.”, but earlier we said:

Aspiration is sort of just voicelessness deluxe.

which might make the idea of a “voiced aspirated” consonant a little confusing. Basically, “voiced aspirated” is at least arguably a sort of misnomer. You see, voicing can’t really be represented with a boolean “is it voiced or is it unvoiced”. To start voicing, you tense the glottis, and the glottis can be tensed by various amounts.

A totally lax glottis results in a voiceless sound. A glottis tensed just the right amount to allow the maximum vibration results in a voiced sound, or more specifically a modally voiced sound. A fully tensed glottis… would just be holding your breath, but going a little short of that produces vocal fry.

Hindi’s voiced aspirates are somewhere between unvoiced and voiced: they specifically use breathy voice (or murmured voice). So if you think of aspiration pretty loosely and say it just describes some sort of move towards the “less glottal tension” part of the spectrum (and normal aspirated stops need to start increasing the voice onset time to be further along that spectrum than tenuis stops), then it pretty much makes sense to see breathy voice as sort of like an aspirated voiced stop. But it isn’t quite the same exact thing.

This post is titled upsetting stop series i, so I’d like to be clear that I don’t really find any of these stop series all that upsetting. Or at least, I don’t anymore, I’m pretty sure I used to.

Like, Mandarin having four coronal stop series maybe seems like a little much, that’s a whole lot of t~ch-ish sounds, but it actually helps a lot that the alveolo-palatals are in complementary distribution with the dentals. And sure, I’d struggle to personally draw a three way distinction like Ancient Greek does, much less a four way distinction like Hindi. But really these are actually like, not all that unreasonable, even if I don’t personally totally get retroflexes yet.

No, come back another time and we’ll talk about languages like Korean and Basque. Maybe we can also go over that neat thing Irish and Russian do, even though I actually like that one and it makes a lot more sense than Korean or Basque.

Sometimes m, n, ñ, and ng are called “nasal stops”, but more commonly they’re just “nasals”. I’m not talking about those today. Also, sometimes people don’t count affricates as stops, but I could just use the word “plosive” in that case, and I really would like to have a word for specifically the plosives and affricates.
↩
When I don’t specify otherwise, I pretty much always mean the “hard g”, the one that can’t also be written j. If I wanted to talk about “soft g” I’d just call it a j sound.
↩
That is, the - sound from uh-oh, the glottal stop /ʔ/. It’s the sound א makes when it isn't silent or standing in for /a/.
You can probably ignore this one if it’s confusing you; I just find it funny to treat it like a full-on English phoneme occurring in lots of words, like atmosphere (/ˈæʔməsˌfɪɹ/) or the British pronunciation of button (/ˈbəʔn̩/). But okay fine those are actually an allophone of t. And it’s pretty reasonable to say that uh-oh is /ə.ow/ with a hiatus avoidance thing. (British people: exactly how stupid would it sound to you if someone used an r sound in uh-oh?)
However. You, Tue(sday), and ew clearly form minimal pairs /ju/, /tju/, and /ʔju/.
(Unfortunately bolding the letter representing ʔ in ew—that is, the space in front of the word—doesn’t cause an easy-to-notice change.)
Admittedly I don’t think you’ll find very many people other than me transcribing ew as /ʔju/, but like I’m pretty sure that in fact the way I draw a distinction between you and ew is that one starts with /j/ and the other starts with /ʔj/? It’s only one syllable, am I supposed to invent a whole new type of vowel or something when the diphthong really seems like a perfectly normal /ju/ and I can hear a clear [ʔj] at the start of the word?
But even for me, I will happily admit that /ʔ/ is a marginal phoneme at best, with only two reasonable examples. I even use /x/ in more English words than that, if you let me get away with using LaTeX, arχiv, and chi as my examples. Loch, Bach, or Chanukah would be cheating, but chi isn’t really a loan word—the Greek word has the vowel /i/, but I use the vowel /aj/ (like in the English words pi, phi, and psi, which all have an /aj/ sound even though the Greek word has an /i/ sound.)
Substack really makes the difference between x and χ unusually subtle. Maybe this makes actual Greek writing look nice, but that’s not at all what I use χ for.
↩
That is, the sound in fusion or treasure, which doesn't really have a dedicated way to write that’s actually used in English spelling—but using zh (by analogy to sh) is pretty much the only reasonable choice if you don’t want to just use the IPA /ʒ/. I also use zh in garage, beige, and genre, but I think those would make less clear examples—some people use a j there.
…What do you mean some British people say /ˈɡæ.ɹɪdʒ/? That’s so bad.
↩
Aside from the questions they’ll have about -.
↩
In American English, usually this is pronounced /ælˈvi.lɚ/, for some reason. But a schwa or long o can happen, especially in British English.
↩
The h sound is described by the IPA as the “voiceless glottal fricative”, which is largely a misnomer—in most languages it’s not a fricative produced with the glottis, though it is voiceless. Voicelessness is kind of its only trait. Your tongue is mostly just in the position for the following vowel when you say /h/. In heat or who the h sound is pretty similar to Japanese’s devoiced vowels in… I don’t know Japanese words… shita or Satsuke. Oh, or desu, osu!, gozaimasu, there’s lots of -su words that sound to an English ear like they just end in [s], but usually (at least outside of rapid speech) still have the [u], just devoiced and reduced. (I think. I’m not a Japanese expert.)
↩
I sometimes find it a little unintuitive that the voiceless ones are seen as the stronger ones. I also found it kind of unintuitive when my German class described the /z/ in words like sehen as a “soft s”, and the name “sharp s” for ß (which makes an /s/ sound in contexts where the letter <s> would make a /z/ sound) is a little odd to me too. It’s not that I can’t see why voicelessness would be the more marked form (vowels and sonorants are almost always voiced, and /s/ for example is pretty acoustically intense in some ways), but I sometimes need to pay attention to not mix it up.
Maybe part of my issue is how “soft th” is the voiceless one and “hard th” is the voiced one.
↩
Consonant clusters aren’t the only case, it can also (at least) happen in some unstressed syllables. I think it pretty much never happens word-initially. I’m not totally sure whether “consonant clusters” + “non-word-initial unstressed syllables” makes a complete list.
Also, note that t in particular does a bunch of weird stuff in a lot of contexts. Sometimes t turns into a glottal stop, and in some contexts t or d can turn into a tapped r (like from Spanish), which is a lot like t losing its aspiration.
↩
See the relevant video from the excellent Geoff Lindsey:
↩
But come back in a couple days for my argument that vowel length is the real distinguishing factor between fortis and lenis consonants in (General American) English.
↩
You might think it is weird to describe voiced stops as even having a “voice onset” which one could consider the timing of. It might help to note that the chart is sort of an oversimplification—voicing isn’t necessarily continuous throughout a whole b d or g, especially not at full strength. But it starts appreciably before the release of the stop, not at the release.
↩
Also, ch might or might not be a partial exception to some of the neutralization patterns—it being an affricate makes the release a little different, but I’m not really sure exactly how extensively that interacts with the aspiration.
↩
Well, if you transcribe ח as ch, anyways. And sometimes it’s more laryngeal.
↩