# Formal semantic analysis of natural language quantifiers

Natural language quantifiers are an interesting subset of words in that it is possible to define them formally using set theory, by taking them to be binary relations between sets. For example, here are the formal definitions of some English quantifiers.

• “every” is the binary relation $\forall$ between sets such that for every pair of sets $A$ and $B$, $\forall(A, B)$ if and only if $A \subseteq B$. For example, the sentence “every man is in the room” is true if and only if the set of all men is a subset of the set of everything in the room.
• “some” (in the sense of “at least one”) is the binary relation $\exists$ between sets such that for every pair of sets $A$ and $B$, $\exists(A, B)$ if and only if $A \cap B$ is non-empty. For example, the sentence “some (at least one) man is in the room” is true if and only if the set of all men and the set of everything in the room have at least one member in common.
• More generally, each natural number $n$ (as an English word, in the sense of “at least $n$”) is the binary relation $\exists n$ between sets such that for every pair of sets $A$ and $B$, $\exists n(A, B)$ if and only if $|A \cap B| \ge n$. (We are assuming here that the number is not interpreted to be exhaustive, so that the statement “two men are in the room” would still be seen as true in the case where three men are in the room.) For example, the sentence “two men are in the room” is true if and only if the set of all men and the set of everything in the room have at least two members in common.
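• “most” (in the sense of “more often than not”) is the binary relation $M$ between sets such that for every pair of sets $A$ and $B$, $M(A, B)$ if and only if $|A \cap B| > |A \setminus B|$. For example, the sentence “most men are in the room” is true if and only if the men who are in the room outnumber the men who are not.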
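These definitions translate almost directly into code. Here is a minimal Python sketch, assuming finite sets; the function names and the toy individuals are my own inventions for illustration. Note that “most” is coded with a strict comparison, so a case where exactly half of $A$ is in $B$ does not count.

```python
from typing import Callable

# A quantifier is modelled as a function from two sets to a truth value.
Quantifier = Callable[[set, set], bool]


def every(a: set, b: set) -> bool:
    """every(A, B) holds iff A is a subset of B."""
    return a <= b


def some(a: set, b: set) -> bool:
    """some(A, B) holds iff A and B share at least one member."""
    return len(a & b) > 0


def at_least(n: int) -> Quantifier:
    """Build the quantifier 'at least n': |A ∩ B| >= n."""
    def quantifier(a: set, b: set) -> bool:
        return len(a & b) >= n
    return quantifier


def most(a: set, b: set) -> bool:
    """most(A, B) holds iff more members of A are in B than are not."""
    return len(a & b) > len(a - b)


# Hypothetical toy model: three men, two of whom are in the room.
men = {"Al", "Bob", "Carl"}
in_room = {"Al", "Bob", "Dana"}

print(every(men, in_room))        # False: Carl is not in the room
print(some(men, in_room))         # True
print(at_least(2)(men, in_room))  # True
print(most(men, in_room))         # True: 2 in the room, 1 not
```

Writing `at_least` as a function that returns a quantifier mirrors the way the text defines a whole family of relations, one per natural number $n$.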

Admittedly, some natural language quantifiers, like “few” and “many”, cannot be satisfactorily defined in this way. But quite a lot of them can be, and I’m going to just focus on those that can be in the rest of the post. From now on you can take the term “natural language quantifiers” to refer specifically to those natural language quantifiers that can be given a formal definition as a binary relation between sets.

Now, once we have taken this approach to natural language quantifiers, an interesting question arises: which binary relations between sets correspond to natural language quantifiers? Clearly, no individual language could have natural language quantifiers corresponding to every single binary relation between sets, because there are infinitely many such binary relations, and only finitely many words in a given language. In fact, we can be quite sure that the vast majority of binary relations between sets will never correspond to natural language quantifiers in any language, because most of them are simply too obscure. Consider, for example, the binary relation $R$ between sets such that for every pair of sets $A$ and $B$, $R(A, B)$ if and only if $A$ is the set of all men and $B$ is the set of all women. If this corresponded to an English quantifier, which might be pronounced, say, “blort”, then the sentence “blort men are women” would be true, and every other sentence of the form “blort Xs are Ys” would be false. I don’t know about you, but I can’t think of any circumstances under which such a word would be of any use in communication whatsoever.
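To see just how picky “blort” is, here is a small Python sketch. The sets of individuals are made up for illustration, and `blort` is of course a hypothetical word.

```python
# Hypothetical toy model: the sets below stand in for "all men" and "all women".
MEN = frozenset({"Al", "Bob"})
WOMEN = frozenset({"Cat", "Dee"})


def blort(a, b) -> bool:
    """blort(A, B) holds iff A is exactly the set of men and B exactly the set of women."""
    return a == MEN and b == WOMEN


dogs = frozenset({"Rex", "Fido"})
cats = frozenset({"Tom", "Kit"})

print(blort(MEN, WOMEN))  # True: the one pair of sets that satisfies the relation
print(blort(dogs, cats))  # False, like every other pair of sets
print(blort(WOMEN, MEN))  # False: even swapping the arguments fails
```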

Another problem with our supposed quantifier “blort” is that it can’t reasonably be called a quantifier, because its definition has absolutely nothing to do with quantities! You probably know what I mean here, but it’s worth trying to spell out exactly what it is (after all, the whole point of formal analysis of any subject is that trying to spell out exactly what you mean often leads to interesting new insights). It seems that the problem is to do with the objects and properties that are referred to in the definition of “blort”. Our definition of “blort” refers to the identities of the two arguments $A$ and $B$—it includes the phrases “if $A$ is the set of all men” and “if $B$ is the set of all women”. But the definition of a quantifier should refer to quantities only, not identities. From the point of view of set theory, “quantity” is just another word for “cardinality”, which means the number of members a given set contains. So perhaps we should say that the definition of a natural language quantifier can only refer to the cardinalities of the arguments $A$ and $B$. This is still not a proper formal definition, because we have not been specific about what it actually means for a definition to “refer” to the cardinalities of the arguments only. If we take the statement very literally, we could take it to mean that the definition of a natural language quantifier should be a string consisting only of the substrings “$|A|$” and “$|B|$” (with $A$ and $B$ replaced by whichever symbols you want to use to refer to the two arguments), interpreted in first-order logic. But that’s ridiculous, and not just because such a string would evaluate to a natural number rather than a truth value. In order to find out what the proper constraints on the string should be, let’s have a look again at the definitions we gave above.

• For “every”, we have that $\forall(A, B)$ if and only if $A \subseteq B$, or, equivalently, $|A \setminus B| = 0$.
• For each natural number $n$, we have that $\exists n(A, B)$ if and only if $|A \cap B| \ge n$ (with “some” as the special case $n = 1$).
• For “most”, we have that $M(A, B)$ if and only if $|A \cap B| > |A \setminus B|$.

In order for these to count as quantifiers, our definition must allow us to compare the cardinalities as well as refer to them. We also need to refer to the cardinalities of combinations of the two arguments $A$ and $B$, such as $A \cap B$ and $A \setminus B$, as well as $|A|$ and $|B|$. And, although none of the definitions above involves the logical connectives $\wedge$ (AND) and $\vee$ (OR), we will need them for more complex quantifiers that are formed as phrases, such as “most but not all”.
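As a rough illustration of why the connectives are needed, here is one way “most but not all” might be written in Python. This is only a sketch: it assumes finite sets and reads “most” as strictly more than half, and the function name is my own.

```python
def most_but_not_all(a: set, b: set) -> bool:
    """'most but not all': the 'most' condition AND the negation of 'every'."""
    both = len(a & b)     # |A ∩ B|
    a_only = len(a - b)   # |A \ B|
    is_most = both > a_only
    is_all = a_only == 0
    return is_most and not is_all


print(most_but_not_all({1, 2, 3}, {1, 2}))  # True: two of three, but not all
print(most_but_not_all({1, 2}, {1, 2}))     # False: all of them
print(most_but_not_all({1, 2, 3}, {1}))     # False: fewer than half
```

The definition only ever looks at two numbers, $|A \cap B|$ and $|A \setminus B|$, and combines comparisons on them with a logical connective.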

The question of exactly which combinations of sets we need to refer to is quite an interesting one. Given our two arguments $A$ and $B$, we can see all the possible combinations by drawing a Venn diagram:

There are four disjoint regions in this Venn diagram, corresponding to the sets $A \cap B$, $A \setminus B$, $B \setminus A$ and (not labelled, but we mustn’t forget it) $U \setminus (A \cup B)$ (where $U$ is the universal set). We also might need to refer to regions that are composed of two or more of these disjoint regions, but such regions can be referred to by using $\cup$ to refer to the union of the disjoint regions.
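The four regions can be computed directly. The following Python sketch assumes a small finite universal set; the function and key names are hypothetical, chosen just for this example.

```python
def regions(a: set, b: set, universe: set) -> dict:
    """Split the universe into the four disjoint regions of the Venn diagram."""
    return {
        "both": a & b,                  # A ∩ B
        "a_only": a - b,                # A \ B
        "b_only": b - a,                # B \ A
        "outside": universe - (a | b),  # U \ (A ∪ B)
    }


U = {1, 2, 3, 4, 5}
A = {1, 2}
B = {2, 3}

r = regions(A, B, U)
print(r["both"], r["a_only"], r["b_only"], r["outside"])

# Larger regions are unions of the disjoint ones, e.g. A itself:
print(r["both"] | r["a_only"] == A)  # True
```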

But do we need to be able to refer to each of these disjoint regions? Note that in the definitions above, we only needed to refer to $|A \setminus B|$ and $|A \cap B|$, not to $|B \setminus A|$ and $|U \setminus (A \cup B)|$. In fact, it is thought that these are the only two disjoint regions that definitions of natural language quantifiers ever need to refer to. Quantifiers which can be defined without reference to $|U \setminus (A \cup B)|$ are called extensional quantifiers, and quantifiers which can be defined without reference to $|B \setminus A|$ are called conservative quantifiers. So now, if all this seems like pointless formalism to you, you might be relieved to see that we can make an actual falsifiable hypothesis:

Hypothesis 1. All natural language quantifiers are conservative and extensional.

To give you a better sense of exactly what it means for a quantifier to be conservative or extensional, let’s give some examples of quantifiers which are not conservative, and not extensional.

• Let $NE$ be the binary relation between sets such that for every pair of sets $A$ and $B$, $NE(A, B)$ if and only if $|U \setminus (A \cup B)| = 0$. For example, if we suppose $NE$ corresponds to an English quantifier “scrong”, the sentence “scrong men are in the room” is true if and only if everything which is not a man is in the room (it’s therefore identical in meaning to “every non-man is in the room”). “scrong” is conservative, but not extensional.
• Let $NC$ be the binary relation between sets such that for every pair of sets $A$ and $B$, $NC(A, B)$ if and only if $|B \setminus A| = 0$. For example, if we suppose $NC$ corresponds to an English quantifier “gewer”, the sentence “gewer men are in the room” is true if and only if there is nothing in the room which is not a man. “gewer” is extensional, but not conservative.
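These claims can be sanity-checked by brute force. In the Python sketch below, a quantifier is modelled as a predicate on the sizes of the four regions, and a quantifier counts as ignoring a region if changing that region's size alone never flips its truth value. The helper names are my own, and the check only covers region sizes up to a small bound, so it is evidence rather than a proof.

```python
from itertools import product

# Each quantifier takes the four region sizes, in this order:
# (|A ∩ B|, |A \ B|, |B \ A|, |U \ (A ∪ B)|)


def every(both, a_only, b_only, outside):
    return a_only == 0


def scrong(both, a_only, b_only, outside):
    return outside == 0


def gewer(both, a_only, b_only, outside):
    return b_only == 0


def ignores(quantifier, index, max_size=3):
    """True if changing only the size at `index` never changes the truth value
    (checked for all region sizes from 0 to max_size)."""
    sizes = range(max_size + 1)
    for counts in product(sizes, repeat=4):
        for new_size in sizes:
            changed = list(counts)
            changed[index] = new_size
            if quantifier(*counts) != quantifier(*changed):
                return False
    return True


def is_conservative(quantifier):
    return ignores(quantifier, 2)  # no dependence on |B \ A|


def is_extensional(quantifier):
    return ignores(quantifier, 3)  # no dependence on |U \ (A ∪ B)|


for q in (every, scrong, gewer):
    print(q.__name__, is_conservative(q), is_extensional(q))
# every True True
# scrong True False
# gewer False True
```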

Wait a minute, though! I don’t know if you noticed, but “gewer” as defined above has exactly the same meaning as a real English word: “only”. The sentence “only men are in the room” means exactly the same thing as “gewer men are in the room”. (It’s true that “only men are in the room” might just mean that there are no women in the room, not that there is nothing in the room that is not a man; there could be furniture, a table, etc. But “only” still has the same meaning there. It’s just that the universal set is taken to be the set of all people, not the set of all objects. In semantics, the universal set is understood to be a set containing every entity relevant to the current discourse context, not the set that contains absolutely everything.)

Does that falsify Hypothesis 1? Well… I said “only” was a word, but I didn’t say it was a quantifier. In fact, the people who propose Hypothesis 1 would analyse “only” as an adverb, rather than a quantifier. I guess this makes sense considering “only” has the “-ly” suffix. But that’s not proper evidence. Some people have argued that “only” cannot be a determiner (and hence cannot be a quantifier) based on syntactic evidence: “only” does not pattern like other determiners. The example I was given at university was the following sentence:

The girls only danced a tango.

Here, “only” occurs in front of the VP, rather than the NP, hence it cannot be a determiner.

I’m sure my lecturer could have given better evidence had he not been pressed for time. The obvious problem with this argument is that there is a well-known group of determiners which can appear in front of the VP, rather than the NP: the “floating” quantifiers, such as “all”:

The girls all danced a tango.

Anyway, I remain not totally convinced that “only” is not a quantifier, and, taking a very brief look at the literature, it seems like a far from settled topic, with, for example, de Mey (1991) arguing that “only” is a determiner after all (although I don’t really understand the argument, having not read the paper very carefully). Payne (2010) mentions that “only” should be seen as a kind of adverb-quantifier hybrid, which I guess is probably the best way to think about it, although it is kind of inconvenient if you’re trying to analyse these words in a formal semantic approach.

I wonder if there are any words in natural languages which have ever been analysed as non-extensional quantifiers. Google Scholar doesn’t turn up anything on the subject.

In any case, perhaps the following weakened statement of Hypothesis 1 is more likely to be true.

Hypothesis 1 (weakened). In every natural language, the words that can be analysed as non-conservative or non-extensional quantifiers will exhibit atypical behaviour compared to the conservative and extensional quantifiers, so that it may be better to analyse them as adverbs.