The incremental development of formal semantics

Tom Bosc

2020-10-23 (updated 2021-07-03)


The goal of formal semantics is to determine the truth or falsehood of natural language sentences according to a mathematical description of the world. Meaning is the tool which determines these truth-values; in other words, specifying what meaning is should allow us to say what is true and what is not.

For that purpose, how can we define meaning? We will try to answer the question incrementally, and only using simple sentences such as ⌜X is Y⌝ and ⌜X thinks that Y⌝. Although syntactically simple, they have been real puzzles for linguists and philosophers: sentences that broke theories and helped repair them. We will start with a naive theory, break it, fix it, break it again with a different puzzle and try to fix it again.

For conciseness, we will skip a lot of things usually covered by textbooks. This includes generalized quantifiers, which were an important development. Apart from this omission, we will more or less follow the history of the discipline. I don’t think these notes are detailed enough to serve as a standalone introductory text. Initially, I only wanted to explain how to go beyond intension, starting where Winter (2016)’s excellent textbook stops. This is what I have done in the third section about structured propositions, but I ended up adding a lot more to give context.

Method and assumptions

Content, meaning and the general method

The meaning of an expression will have its intuitive and informal sense. When we judge that two sentences have different meanings, it is as competent speakers that we do so, and we don’t need to justify ourselves.

On the other hand, the content of an expression (a sub-string or sub-expression within a sentence, be it an isolated word, a sequence of words or the entire sentence) is a mathematical object. The goal is to define this mathematical object so that it matches meaning. Starting from a naive definition of content, we are going to show sentences that have different meanings, yet are assigned a mathematically identical content. Such sentences demonstrate that our theory is broken, and we will have to redefine the mathematical definition of content so that 1) our example sentences will get different contents, while 2) other sentences that were not problematic for our theory will be assigned content that still discriminates between them. We will repeat this method two times.

Some authors use “content” and “meaning” interchangeably; I did so in the introduction. Similarly, physicists call “position” the predicted position of the object of study, even though it might not match observations perfectly. From now on, I will differentiate the two for clarity.

Semantics rely on syntax

The theory that we will build should be compositional: the content of an expression should be a function of the contents of its sub-expressions. Moreover, the order of composition will be guided by the constituency tree of the expression. Please read on if this doesn’t make sense, or skip ahead if it’s clear.

It will probably be intuitive that the content of the sentence “The third Cabinet of Angela Merkel was sworn in on 17 December 2013” should depend on the content of sub-expressions like “The third Cabinet of Angela Merkel” or “sworn in” or “on 17 December 2013”. And in turn, the content of “The third Cabinet of Angela Merkel” depends on the content of “The third Cabinet” and on the content of “Angela Merkel”. This is a quick and intuitive justification for compositionality.

Moreover, it also seems that not every sub-expression should have content. For instance, “in on 17” seems to lack context, to be incomplete, to be meaningless without more information.

Constituency trees are the scaffold for the content function. Constituency trees are binary trees, where each node represents an expression, and the two children of a node represent two sub-expressions which, concatenated, form the expression of the parent. The expressions (labels of nodes) in a constituency tree are called constituents. For instance, our example sentence is split into “The third Cabinet of Angela Merkel” and “was sworn in on 17 December 2013”. These correspond to the grammatical categories of subject and predicate. Then, the children of these two nodes divide each constituent in two, and so on recursively, such that the leaves are associated with individual words.
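For concreteness, here is a minimal sketch of such a binary tree in Python (a toy `Node` class of my own, not taken from any parser library):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # A constituent: either a single word (a leaf) or the concatenation
    # of exactly two sub-constituents (binary branching).
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    word: Optional[str] = None

    def expression(self) -> str:
        if self.word is not None:
            return self.word
        # siblings concatenate to form the parent's expression
        return self.left.expression() + " " + self.right.expression()

# A truncated tree for the subject of our example sentence:
subject = Node(Node(word="The third Cabinet"), Node(word="of Angela Merkel"))
print(subject.expression())  # "The third Cabinet of Angela Merkel"
```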

By the way, the parsing algorithm which computes constituency trees does not necessarily split constituents in a top-down fashion, from sentences to words. But whatever parsing algorithm is used, the concatenation of expressions of sibling nodes is equal to the expression associated with their parent nodes.

Content will be computed bottom-up, from the leaves (words) to the root node representing the entire sentence. In particular, the content of a given constituent in the tree will depend on its children. Expressions like “in on 17” are not constituents: they straddle two sibling constituents (“sworn in” and “on 17 December 2013”). Thus, the content function will only be defined on constituents.

I have appealed to intuition to justify the analysis of the example, but syntacticians delineate constituents in a data-driven manner, using constituency tests. How to obtain constituency trees is a matter of syntax, not today’s topic, so we will consider that we have access to such trees. The semantic analysis of a sentence – the computation of the content of that sentence – will depend on the syntactic analysis of that sentence. The hidden assumption here is that syntax is relatively independent of questions of meaning and that syntactic analysis can be performed as a pre-processing step. This assumption is often illustrated by Chomsky’s “Colorless green ideas sleep furiously”, and is obviously debatable.

In short, we assume 1) that syntax is independent of semantics, that the construction of the constituency tree does not appeal to meaning at all, and that we have access to such trees; and 2), that content is a recursively-defined function, and that the recursion follows the syntactic tree of the sentence. Now, the question is the following: given a sentence and its constituency tree, how could we define content to match meaning?

Content as reference

Our first, naive theory considers that the content of names and noun phrases is directly the things that they refer to in the actual world. For instance, the meaning of “Angela Merkel” is the person Angela Merkel, and “all the green apples harvested this summer by my uncle” means all the green apples harvested this summer by my uncle. This sounds trivial, circular, hard to formalize, and useless. However, it gets us a long way.

Linking words to sets with structures

A vocabulary is a set whose elements are called words. A language is a subset of the set of all possible finite sequences of words; the elements of this subset are called sentences. Interesting languages put strong grammatical requirements on their sentences (for instance, a given word cannot follow certain other words), but this is not our concern: we naively assume that syntax is independent of questions of meaning, and that we have access to a constituency parser.
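In code, these definitions are almost literal (a made-up, finite toy language for illustration):

```python
# A vocabulary: a set of words.
vocabulary = {"Angela", "Merkel", "runs", "sleeps"}

# A language: a subset of all finite sequences of words, here given by enumeration.
language = {
    ("Angela", "Merkel", "runs"),
    ("Angela", "Merkel", "sleeps"),
}

print(("Angela", "Merkel", "runs") in language)  # True: a sentence of the language
print(("runs", "Angela") in language)            # False: not a sentence
```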

The actual world is described as a mathematical structure (also called a model, or a world). To simplify a little, it consists of:

  - a domain \(D\), the set of all things we can talk about;
  - an interpretation function, linking words to elements, subsets, and relations of \(D\).

\(D\) is partitioned into two subsets \(D_e\) and \(D_t\):

  - \(D_e\) contains the entities, i.e., the individuals (such as Angela Merkel);
  - \(D_t\) contains the two truth-values, true and false.

For example, if we take the entire set of things in the world to be our domain, the structure has a function mapping “Angela Merkel” to the individual Angela Merkel. The structure also maps the constituent “Chancellor” to the set of all the Chancellors that have ever been, and “of Germany” to the set of things that are German (loosely speaking), so that both are subsets of the individuals \(D_e\).

It seems that “Chancellor of Germany” refers to the intersection of the subsets denoted by “Chancellor” and “of Germany”. So we realize that we can obtain the referents of larger expressions bottom-up.

So far, we have discussed the referents of noun phrases and have established that they point to elements or subsets in the domain. But what do verbs refer to and how do we obtain truth-values for entire sentences? Let’s look at a simple sentence, “Angela Merkel runs”. It has the following constituency tree:

           S      
       /       \  
Angela Merkel runs

Intuitively, this sentence is true if and only if Angela Merkel runs. How can we express this necessary and sufficient condition and relate it to our structure? Well, since Angela Merkel is an individual of the domain, we let “runs” denote the subset of individuals that run. Then, the entire sentence is true iff Angela Merkel is in that subset. This mechanism also works with more complex predicates. For instance, in the sentence “Angela Merkel talked for two hours with Hu Jintao during the G8”, the predicate “talked for […]” will be compositionally reduced to the subset of the domain containing the elements which have talked with Hu Jintao during the G8 for two hours.

In summary, one can think of a structure as 1) a sort of coherent database of facts defined in terms of elements, sets, and relations, and 2) a function linking words to these elements, sets, and relations. Given a structure, we can proceed bottom-up along the constituency tree and apply simple set-theoretic operations to compute the referents of parent nodes.
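Here is this “database” view in miniature (all facts made up):

```python
# A toy structure: a domain of entities plus an interpretation function linking
# words and constituents to elements and subsets of the domain.
D_e = {"angela_merkel", "olaf_scholz", "emmanuel_macron"}

interpretation = {
    "Angela Merkel": "angela_merkel",                   # an element of the domain
    "Chancellor":    {"angela_merkel", "olaf_scholz"},  # a subset of the domain
    "of Germany":    {"angela_merkel", "olaf_scholz"},  # a subset of the domain
    "runs":          {"angela_merkel"},                 # a subset of the domain
}

# Set intersection computes the referent of "Chancellor of Germany":
print(interpretation["Chancellor"] & interpretation["of Germany"])

# Set membership computes the truth-value of "Angela Merkel runs":
print(interpretation["Angela Merkel"] in interpretation["runs"])  # True
```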

It seems that reference is a possible candidate to represent meaning. Indeed, reference follows the constituency tree, just like we established above that content should. Moreover, the referent of a sentence is a truth-value. Our first theory is nothing more than that: content is reference.

Lambda-calculus for expressing computations

We have not defined the content function precisely; we have merely said that it is recursive and that it performs set-theoretic operations like set intersection. As we go beyond very simple sentences, we will quickly need ways to express relatively complicated computations of reference. For instance, we need to be able to describe the meaning of function words like “and”, “not”, etc., but these words do not refer; they seem to glue together the referents of nearby constituents in a specific way. Other words like “herself” do seem to refer, but the reference is indirectly given by the subject of the sentence, higher up the tree. To deal with these words, we will define content as a function that computes referents.

From Winter (2016)’s textbook, we take the following example: “Tina praised herself”. Its constituency tree looks like this:

        S           
     /     \        
 Tina       VP      
           /  \     
     praised herself

Here, “herself” refers to the same individual as the subject of the sentence, “Tina”. The problem is that this subject is higher up the tree. As a result, the referent of the whole predicate “praised herself” also depends on the subject “Tina”. How can we express this? We would like to define the content of the predicate “praised herself” to be a function of the content of the subject, not a static, fixed subset of the domain. The computation of the subset denoted by “herself” would be delayed until the top of the tree, where we have access to the subject.

To define this function, we use lambda-calculus. It is a formal language whose elements, the lambda-expressions, denote functions. Here is an example of a lambda-expression: \(\lambda x.\lambda y.x+y\). It denotes the function of a variable \(x\) that returns the function of a variable \(y\) that sums \(x\) and \(y\). The variables \(x\) and \(y\) themselves are lambda-expressions. We’re going to use this language to define the content of constituents.
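In Python, the same curried function looks like this:

```python
# The lambda-expression λx.λy.x+y, written as a curried Python lambda:
add = lambda x: lambda y: x + y
add_two = add(2)   # applying to 2 returns the inner one-argument function
print(add_two(3))  # 5
```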

Lambda-calculus can only describe functions of one argument, which matches the binary structure of constituency trees: one sibling, whose content is a lambda-expression, will “consume” the other sibling’s content (which can also be a lambda-expression) to produce the content of the parent constituent. Moreover, the example shows how a lambda-expression can denote a function that returns another function (see also currying, the conversion of a function of several arguments into a composition of one-argument functions). We want the content of “praised herself” to be a function that will consume the content of “Tina”, which we can achieve by having the content of “herself” return that function.

We are now ready to reverse-engineer the content of “herself” based on what we want the content of higher constituents (of “praised herself” and the whole sentence) to be. This is summarized in the following trees where content is added under the constituents between {}. The structure maps “praised” to the mathematical relation \(\mathrm{Pr}\), “Tina” to \(\mathrm{Ti}\).

          S                  
     {Pr(Ti,Ti)}             
    /           \            
Tina            VP           
{Ti}         {λx.Pr(x,x)}    
              /       \      
         praised      herself
           {Pr}        {???} 

You can take a moment to try to figure out the content of “herself”, or read on.

If \(\mathrm{P}\) is a binary relation defined in the structure, we write \(\mathrm{P}(x, y)\) for the function that is true iff \((x, y) \in \mathrm{P}\). We define the content of “herself” as \(\lambda P.\lambda x.P(x,x)\), and apply this function recursively, bottom-up, to obtain:

           S                       
{λx.Pr(x,x)(Ti)=Pr(Ti,Ti)}         
     /           \                 
 Tina            VP                
 {Ti} {λP.λx.P(x,x)(Pr)=λx.Pr(x,x)}
                /      \           
          praised      herself     
            {Pr}    {λP.λx.P(x,x)} 

\(\mathrm{Pr}(\mathrm{Ti}, \mathrm{Ti})\) is precisely what we want, since it is true iff Tina praises herself.
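The bottom-up computation can be replayed directly in Python (made-up model facts):

```python
praised = {("Tina", "Tina")}           # the relation Pr, as a set of pairs
Pr = lambda x, y: (x, y) in praised    # Pr(x, y) is true iff (x, y) is in the relation
Ti = "Tina"

herself = lambda P: lambda x: P(x, x)  # content of "herself": λP.λx.P(x,x)
vp = herself(Pr)                       # content of "praised herself": λx.Pr(x,x)
print(vp(Ti))                          # True: Pr(Ti, Ti)
```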

Formal semantics uses the simply-typed variant of lambda-calculus. Thus the lambda-expressions have types which constrain the arguments they can take and the values they can return, much like the types of statically-typed programming languages. In most texts, the types of the variables are indicated as indices, possibly with brackets to indicate input and output types. In our case, we would write the content of “herself” as \(\lambda P_{e(et)}.\lambda x_{e}.P(x,x)\): a simple entity like \(x\) has type \(e\); a VP predicate has type \(et\), as it consumes an entity and returns a truth-value of type \(t\); a transitive verb like \(\mathrm{Pr}\) has type \(e(et)\), as it consumes two entities one after the other. The type of the content of “herself” is \((e(et))(et)\), as it consumes a binary relation of type \(e(et)\) (like \(\mathrm{Pr}\)) and returns a VP predicate of type \(et\).
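The same typing discipline can be mimicked with type hints (a sketch; the relation is uncurried here for readability):

```python
from typing import Callable

E = str                       # type e: entities
T = bool                      # type t: truth-values
ET = Callable[[E], T]         # type et: one-place predicates like λx.Pr(x,x)
EET = Callable[[E, E], T]     # type e(et): binary relations like Pr (uncurried)

def herself(P: EET) -> ET:    # overall type (e(et))(et)
    return lambda x: P(x, x)
```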

As you may have noticed, constituency trees only partially guide the order of composition, as we never know in advance whether the left sibling is going to consume the right one or the reverse. In fact, the order varies depending on the two siblings: if the object of “praised” is “Mary”, or any term that directly refers to an individual of the domain (or a subset), then the content of “praised” takes its sibling as argument. However, when “herself” is used as the object, the direction is reversed: the content of “herself” consumes the content of the verb.

To recap, sentences are meaningless sequences of symbols, until they are interpreted by a structure. A structure assigns referents (elements, sets, relations) to words. The referents of intermediary constituents are computed recursively by some function, following a constituency tree. The truth-value (true or false) of the entire sentence is the content of the root constituent. The precise computations performed are defined using lambda-calculus. In this early-stage theory, the content of a constituent is its referent according to the structure, or, for function words like “herself” and “and”, some function that is used to compute the content of constituents higher up in the tree. Let us now see problems with this theory.

Content as intension

Frege’s puzzle

Frege (1892) showed that two constituents can have the same referents, yet different contents:

  1. The noun phrases “the evening star” and “the morning star” both refer to Venus. However, “the evening star is the morning star” seems to have a different meaning than “the evening star is the evening star”, even if both sentences are true. Since these sentences differ only by the substitution of “the morning star” by “the evening star”, and since our theory is compositional, the two constituents should have a different content as well.
  2. Any two true sentences \(T_1\) and \(T_2\) both refer to true. However, this makes the contents of the belief sentences ⌜Fran believes \(T_1\)⌝ and ⌜Fran believes \(T_2\)⌝ identical. The problem is that the belief relation is modeled as a relation between individuals and truth-values. This is terrible: if Fran believes one true sentence, then she believes all other true sentences, and similarly for false sentences.

A single issue underlies both puzzles. There are pairs of sentences in which two constituents are swapped (noun phrases in 1, embedded sentences in 2). These constituents have the same referents. Since content is computed compositionally and bottom-up (from words to bigger constituents), if the only constituents that differ have the same content, the two sentences obtain the same content as well. Therefore, content cannot be reduced to reference, and we need to improve our theory.

Modality and possible worlds

Let \(A\) be a structure describing things in our world, and denote by \(f_A\) the reference function of that structure. In our current theory, we compute the referent of “The evening star is the morning star” in two steps: we look up the referents, reducing \(f_A(\text{“the evening star”}) = f_A(\text{“the morning star”})\) to \(\text{Venus} = \text{Venus}\), which then evaluates to true. Similarly, “The evening star is the evening star” is analyzed as \(f_A(\text{“the evening star”}) = f_A(\text{“the evening star”})\), which also evaluates to true. But here, the lookup step is superfluous: \(f_A(\text{“the evening star”}) = f_A(\text{“the evening star”})\) evaluates to true directly, and we don’t even need to consider the particular referent of “the evening star” in \(A\). In other words, regardless of the way things are in the world, this sentence is true. In such cases, people often say that “it must be true” or that “it is necessary”. On the other hand, there exists a structure \(B\) that differs from \(A\) only in that \(f_B(\text{“the evening star”}) \neq \text{Venus}\), and in which “the evening star is the morning star” is false. Therefore, it is conceivable, imaginable, or possible for “the evening star is the morning star” to be false, despite the fact that it is true in the actual world.

Formally, a sentence is necessary if it is true in all possible worlds (in mathematical logic, we say it is valid). A sentence is contingent if it is true in the actual world, but is false in a different possible world. A sentence is possible if it is true in at least one possible world. These definitions rely on the set of all possible worlds, which we will talk about more later.
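These definitions are easy to operationalize once worlds are explicit (all facts made up):

```python
# Worlds as interpretation dicts; a sentence can then be checked across worlds.
worlds = [
    {"the evening star": "venus", "the morning star": "venus"},  # the actual world A
    {"the evening star": "venus", "the morning star": "mars"},   # a world like B above
]

def evening_is_morning(w):
    return w["the evening star"] == w["the morning star"]

def evening_is_evening(w):
    return w["the evening star"] == w["the evening star"]

print(all(evening_is_evening(w) for w in worlds))  # True: necessary
print(all(evening_is_morning(w) for w in worlds))  # False: not necessary
print(any(evening_is_morning(w) for w in worlds))  # True: possible (here, contingent)
```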

Intensions

We have considered pairs of sentences that differ by a constituent referring to the same thing, and established that even though both are true in the actual world, there are worlds in which one is true and the other is not. So the two sets of possible worlds in which the two sentences are true differ, and can therefore be used to discriminate between the two sentences.

This is the intuition behind one possible solution to the puzzle: define the content of an expression not as its referent, but as a function from a world to its referent in that world. For a sentence, whose referent is a truth-value, this function is equivalently represented as the set of worlds in which the sentence is true (since the function can only take two values, true and false). Such a function is called an intension (following Carnap), while the referent in a particular world is the extension.

In Winter (2016)’s formalism, possible worlds exist within a single structure. We add to the domain a set of worlds \(D_s\), called indices, and introduce the corresponding type s. The intension of a sentence is now called a proposition (of type st, mapping indices of type s to truth-values of type t). The intension of an entity is called an individual concept (of type se, mapping indices to individuals of type e). Without going into details, we can update our lambda-expressions systematically to support possible worlds.

The theory is compatible with the previous one, in that it determines the same referents for the actual world as our previous theory. But the content of a constituent is a much richer object than its extension in the actual world: it can determine the extension of a constituent in all possible worlds.

This solution also solves the second puzzle. The problem was that the content of a sentence was reduced to a single truth-value, either true or false. Thus, the content of ⌜Fran believes that T⌝ was identical for every true T. Now that we have introduced intensions, how is belief encoded in the structure? Let us write \(B\) for the belief relation in the structure corresponding to “believe that”. In each particular world \(W\), \(B(W)\) is a (potentially different) relation between individuals (type \(e\)) and propositions (type \(st\)), i.e., a subset of \(D_e \times D_t^{D_s}\). The lambda-expression for “believes that” would consume a proposition \(S\) of type \(st\) as input, and return a function of type \((se)(st)\) as output, something like this:

\(\mathrm{believe} = \lambda S_{st}.\lambda I_{se}.\lambda W_s.(I(W),S) \in B(W)\)
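A sketch of this lambda-expression, with all facts made up; propositions are represented as frozensets of worlds (the worlds where they are true), and individual concepts map worlds to entities:

```python
WORLDS = {"w0", "w1"}
it_snows = frozenset({"w0"})                    # a proposition (type st)

# The belief relation B: for each world, a set of (individual, proposition) pairs.
B = {"w0": {("fran", it_snows)}, "w1": set()}

fran = lambda w: "fran"                         # a constant individual concept (type se)

believe = lambda S: lambda I: lambda w: (I(w), S) in B[w]

print(believe(it_snows)(fran)("w0"))  # True: in w0, Fran believes that it snows
print(believe(it_snows)(fran)("w1"))  # False: in w1, she does not
```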

Content as structured propositions

Unfortunately, there is still a problem with necessary truths such as \(T_1\) = “It is snowing or it is not snowing” and \(T_2\) = “All students are students” (from Cann (1993), p. 316). Our theory cannot distinguish between these sentences: they have the same intension, the set of all possible worlds. So if ⌜Bernard believes \(T_1\)⌝ is true, then ⌜Bernard believes \(T_2\)⌝ is also true, which is not a valid inference. Once again, our theory is too coarse-grained: it assigns the same content to sentences which mean different things.

We can first try to solve this problem without much effort, by modifying the support of the intension functions to include worlds in which some of these sentences are true and others are false. In particular, including logically impossible worlds could broaden the domain on which intensions are defined enough to discriminate between \(T_1\) and \(T_2\). The second solution is more complex: redefine content as something else that determines intension but is richer than intension, just like intension determines reference but cannot be reduced to it. This is the structured-proposition approach. Finally, I will risk sharing some thoughts about a third possible approach, where lambda-expressions could work as content.

Possible worlds, sensible worlds, impossible worlds

“There are not enough worlds to differentiate the different statements from one another”, according to Cann (1993). But he already uses a broader set of possible worlds than other authors, for whom things are even worse. King (2019), for instance, exposes the problem above using different examples:

For example, consider any pair of sentences that express metaphysically necessary propositions, say “Bachelors are unmarried” and “Brothers are male siblings”.

The worlds in which these sentences are false are worlds where linguistic facts differ from ours. Whether such worlds should be considered possible is debatable. Soames (1987) talks about metaphysically possible worlds, while Cann (1993) talks about sensible worlds, to denote worlds where language is used in the same way as it is in our world. To define sensible worlds, we can use Carnap (1952)’s meaning postulates: propositions that indicate how words are used in the actual world, rather similar to dictionary definitions. For example, the proposition expressed by the sentence “bachelors are unmarried men” is a meaning postulate. A world is sensible if the meaning postulates are true in this world. By contrast, a world in which “Rabbits are robots from Mars” is true (Cann (1993), p. 277) is not sensible: it does not satisfy the meaning postulates encoding the facts that a rabbit is an animal and that robots are machines, not animals.

But as we have seen, our problem holds even when the set of worlds includes all the worlds where language is used differently from ours (Cann (1993)’s possible worlds or Soames (1987)’s logically possible worlds). A solution is to broaden the support of the intension function even further, to include not only possible worlds but also logically impossible worlds. For instance, there would be an impossible world where the law of excluded middle does not hold, and in which “it is snowing or it is not snowing” would therefore be false.
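A sketch of what this broadening buys us, with stipulated facts: once an impossible world is added to the support, the two necessary truths above receive different intensions.

```python
snowing = {"w0": True, "w1": False, "w_imp": False}

def t1(w):  # "it is snowing or it is not snowing"
    if w == "w_imp":
        return False  # stipulated: excluded middle fails in the impossible world
    return snowing[w] or not snowing[w]

def t2(w):  # "all students are students"
    return True       # stipulated to hold even in w_imp

worlds = ["w0", "w1", "w_imp"]
print([t1(w) for w in worlds])                             # [True, True, False]
print([t1(w) for w in worlds] == [t2(w) for w in worlds])  # False: intensions differ
```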

In summary, by defining intensions over more and more worlds (logically possible or impossible worlds), intensions may become fine-grained enough.

Proponents of structured propositions deny that this solution is satisfying. Let’s turn to their criticism and proposed solution.

Structured propositions

Other solutions to the necessary truths problem can be grouped under the term structured propositions. King (2019) says:

Roughly, to say that propositions are structured is to say that they are complex entities, entities having parts or constituents, where the constituents are bound together in a certain way. Thus, particular accounts of structured propositions can (and do) differ in at least two ways: 1) they can differ as to what sorts of things are the constituents of structured propositions; and 2) they can differ as to what binds these constituents together in a proposition.

I am going to discuss the relatively popular neo-Russellian flavor of structured propositions as described by Soames (1987).

For Soames, we cannot salvage the view that content is intension, even when we use more than sensible worlds as support. He exhibits issues arising from inferences based on distribution over conjunction: if ⌜Manny says that A and B⌝ is true, then ⌜Manny says that A⌝ and ⌜Manny says that B⌝ are true. Take the conjunction operator over propositions to be the intersection of the sets of possible worlds representing the two propositions. Then, the conjunction of a necessarily true proposition and any proposition P is P. So for every necessary truth T and any sentence A, ⌜Manny says that A⌝ is true iff ⌜Manny says that A and T⌝ is true. By distribution over conjunction, then, ⌜Manny says that T⌝ is true, which is not warranted. There is a similar problem for necessarily false sentences F: if ⌜Manny says that F⌝ is true, then, for every sentence B, ⌜Manny says that B and F⌝ is true, and by distribution over conjunction, ⌜Manny says that B⌝ is true for every sentence B, which is absurd. In summary, Manny says every necessary truth whenever he says anything (however non-trivial those truths are!), and cannot say a necessarily false statement without saying everything.

In addition, Soames is trying to accommodate direct reference theory, (re?)popularized by Saul Kripke with “Naming and Necessity”. Without going into details (which I don’t know well), Kripke holds that proper and common nouns are rigid designators: they refer to the same individual in all possible worlds. Formally, there are no individual concepts – or rather, individual concepts are constant functions. But as soon as we adopt this viewpoint, Frege’s puzzle comes back. In Soames (1987)’s original example (9), using our current theory and assuming the first sentence is true, the other sentences can be inferred in this order:

  1. “The ancients believed that the evening star is Hesperus and that the morning star is Phosphorus”.
  2. “The ancients believed that the evening star is Hesperus and that the morning star is Hesperus”: Since “Hesperus” and “Phosphorus” are rigid designators which refer to a single object, the embedded clauses in 1) and 2) are true in the same worlds, and therefore they have the same content.
  3. “The ancients believed that the evening star is Hesperus and that the morning star is Hesperus and that there is an x such that the evening star is x and the morning star is x”: The set of possible worlds realizing the embedded clause of 2) is included in the set realizing the added conjunct. So the content of the conjunction, computed as set intersection, is equal to the content of the clause of 2).
  4. “The ancients believed that there is an x such that the evening star is x and the morning star is x”: Distribution over conjunction.

(I ignore subtleties related to using definite descriptions and assume “the evening star” only picks out an individual.) The problem is that the ancients did not know that “Hesperus” and “Phosphorus” referred to the same individual, but that is contradicted by 4). Therefore, we need to break this chain of inference. But which inference(s) is/are unwarranted?

As Soames explains, the inference from the first to the second sentence is not that weird. Sentential and propositional attitudes should be treated differently. “Assert” and “believe” are relations between individuals and propositions, but “utter” and “say” are relations between individuals and sentences. This is different: the ancients would certainly not literally say “the evening star is Hesperus and the morning star is Hesperus”, but it is true that they believed it. (Is it simply wrong under a de dicto interpretation and correct under a de re interpretation?) As for the inference from 3) to 4), it simply comes from distribution over conjunction, which seems intuitively correct. Therefore, the problem lies in the 2) to 3) inference.

To make it illegal, Soames proposed two variants of the same idea: “structured Russellian propositions” and a simpler variant based on “truth-supporting circumstances”. The latter is minimalist: it just solves the paradox, nothing more. But Soames argues in section VII that it is less cognitively plausible, again using distribution over conjunction. Therefore, let us only discuss the first variant. The approach boils down to defining the content of a sentence as a nested tuple that mirrors its syntactic structure, whose elements are the contents of its constituents: directly referential terms contribute the entities they denote, predicates contribute their intensions, and logical operators contribute themselves.

For instance, “John does not run” would be represented as <NEG, <<J>, Run>>, where J is the entity directly denoted by “John”, and Run is the intension of “run”.

To get back to our example: if the content of the clause of 2) is a proposition P2 (not detailed here), then the content of the clause of 3) is the proposition P3 = <CONJ, <P2, <SOME, g>>>. That is, P3 encodes the conjunction of P2 and some other proposition <SOME, g> (where g is properly defined). Since P2 and P3 are different mathematical objects, the inference from 2) to 3) is invalid.
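In code, the point is almost trivial: nested tuples are compared structurally, so the two contents are distinct objects (placeholder encodings, made up for illustration):

```python
CONJ, SOME = "CONJ", "SOME"

p2 = ("IDENT", "hesperus", "hesperus")  # placeholder encoding of the clause of 2)
g = "g"                                 # placeholder for the properly defined g
p3 = (CONJ, (p2, (SOME, g)))            # P3 = <CONJ, <P2, <SOME, g>>>

print(p2 == p3)  # False: distinct structures, so the 2) to 3) inference is blocked
```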

Soames’ structured Russellian propositions are nested tuples, so they can be equivalently represented as trees. As Soames says:

One of the striking features of Russellian propositions is that they encode a good deal of the syntactic structure of the sentences that express them. Sentences that are negations, conjunctions, or quantifications express propositions which are themselves negative, conjunctive, or quantificational in structure.

Finally, note that other proposals for structured propositions seem quite similar. For example, Cann (1993) summarizes Cresswell’s approach (p.317) as follows:

For example, Cresswell (1985) develops some ideas of Rudolf Carnap and David Lewis by defining a structural concept of meaning which interprets the objects of belief and knowledge as combinations of the intension of a formula plus all the intensions of its component parts.

It seems rather similar to Soames’ variant based on “truth-supporting circumstances” and essentially consists of (non-nested) tuples.

Content as lambda-expression?

It is easy to forget that lambda-expressions denote functions, but are not the functions they denote: they are just strings, sequences of symbols. The distinction is not made very clearly in Winter (2016)’s text, and actually, it was not very clear in this text either until now. It is pedantic in ordinary, informal mathematics, but it can be useful in a text about semantics. And without this distinction, it is not clear why content as intension does not work! In fact, I think that if we take content to be the lambda-expression that represents the intension, we avoid a lot of problems.

A lambda-expression is finer-grained than an intension function: several lambda-expressions can represent the same function. The lambda-expression that computes the intension of the sentence “bachelors are unmarried” is \(\lambda i_s.\mathrm{bachelor}(i) \subset \neg \mathrm{married}(i)\). It is clearly different from that of “brothers are male siblings”, which could be something like \(\lambda i_s.\mathrm{brothers}(i) \subset \mathrm{male}(i) \times \mathrm{male}(i) ...\). Yet, both expressions represent an identical intension (using all sensible worlds): the constant true function \(i \mapsto 1\).
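The same contrast is immediate in Python, where expressions (strings) and the functions they evaluate to are distinct objects (predicate names made up):

```python
e1 = "lambda w: bachelor(w) <= unmarried(w)"   # <= is subset on Python sets
e2 = "lambda w: brother(w) <= male_sibling(w)"

print(e1 == e2)  # False: as expressions, the two contents differ,
                 # even if eval(e1) and eval(e2) would agree on every sensible world.
```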

Now, if content is the lambda-expression, and not the denoted function, content is an expression from an intermediary language. The string representations of neo-Russellian propositions also obey strict syntactic rules, from which we can recover the tuples. So the two solutions might be very similar.

Closing thoughts

Linguists, philosophers and mathematicians have incrementally refined the notion of content/meaning in formal semantics: from reference, to intension, to structured propositions. The story does not end here, see Speaks (2019)’s section on Fregean semantics.

Perhaps we could draw a parallel between the trajectories of neural NLP and formal semantics. Neural NLP used to represent sentences as single, high-dimensional vectors (bag-of-words embeddings, last hidden states of LSTM encoders), not unlike the infinite-dimensional binary vectors that represent intensions of sentences in formal semantics. But with ELMo and the transformer-based models that followed, neural NLP also shifted towards structured representations which share several characteristics with neo-Russellian structured propositions. For instance, BERT keeps one representation per token rather than a single sentence vector, and these representations encode a good deal of the syntactic structure of the sentence (Jawahar, Sagot, and Seddah 2019), just as neo-Russellian propositions do.

Finally, I would like to share some naive thoughts, doubts and questions about the whole project.

References

Cann, Ronnie. 1993. Formal Semantics: An Introduction. Cambridge Textbooks in Linguistics. Cambridge University Press. https://doi.org/10.1017/CBO9781139166317.
Carnap, Rudolf. 1952. “Meaning Postulates.” Philosophical Studies 3 (5): 65–73.
Frege, Gottlob. 1892. “Über Sinn und Bedeutung.” Zeitschrift für Philosophie und Philosophische Kritik 100: 25–50.
Jawahar, Ganesh, Benoît Sagot, and Djamé Seddah. 2019. “What Does BERT Learn about the Structure of Language?” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3651–57.
King, Jeffrey C. 2019. “Structured Propositions.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta, Summer 2019. https://plato.stanford.edu/archives/sum2019/entries/propositions-structured/; Metaphysics Research Lab, Stanford University.
Soames, Scott. 1987. “Direct Reference, Propositional Attitudes, and Semantic Content.” Philosophical Topics 15 (1): 47–87.
Speaks, Jeff. 2019. “Theories of Meaning.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta, Winter 2019. https://plato.stanford.edu/archives/win2019/entries/meaning/; Metaphysics Research Lab, Stanford University.
Winter, Yoad. 2016. Elements of Formal Semantics: An Introduction to the Mathematical Theory of Meaning in Natural Language. Edinburgh University Press.