Tag: topos

The shape of languages

Published March 3, 2023 by lievenlb

In the topology of dreams we looked at Sibony’s idea to view dream-interpretations as sections in a fibered space.

The ‘points’ in the base-space and fibers consist of chunks of text, perhaps connected by links. The topology and shape of this fibered space is still shrouded in mystery.

Let’s look at a simple approach to turn a large number of texts into a topos, and define a loose metric on it.

There’s this paper An enriched category theory of language: from syntax to semantics by Tai-Danae Bradley, John Terilla and Yiannis Vlassopoulos.

Tai-Danae Bradley is an excellent communicator of everything category related, so probably it is more fun to read her own blogposts on this paper:

or to watch her Categories for AI talk: ‘Category Theory Inspired by LLMs’:

Let’s start with a collection of notes. In the paper, they consider all possible texts written in some language, but it may be a set of webpages to train a language model, or a set of recollections by someone.

Next, shred these notes into chunks of text, and point one of these to all the texts obtained by deleting some words at the start and/or end of it. For example, the note ‘a red rose’ will point to ‘a red’, ‘red rose’, ‘a’, ‘red’ and ‘rose’ (but not to ‘a rose’).

You may call this a category, to me it is just a poset $(\mathcal{L},\leq)$. The maximal elements are the individual words, the minimal elements are the notes, or websites, we started from.
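
To make the shredding concrete, here is a minimal Python sketch. It assumes that a note is split on whitespace and that a snippet is stored as a tuple of words; the names `snippets` and `leq` are mine, not the paper's.

```python
def snippets(note):
    """All contiguous chunks of words of a note, as tuples of words."""
    words = note.split()
    n = len(words)
    return {tuple(words[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


def leq(q, p):
    """q <= p in the convention used here: p is a contiguous chunk of q."""
    k = len(p)
    return any(q[i:i + k] == p for i in range(len(q) - k + 1))


note = ("a", "red", "rose")
chunks = snippets("a red rose")
# The note lies below each of its chunks; the single words are maximal.
below_all = all(leq(note, c) for c in chunks)
```

Here `chunks` contains the note itself and the five snippets listed above, and `("a", "rose")` is not among them because it is not a connected chunk.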

A down-set $A$ of this poset $(\mathcal{L},\leq)$ is a subset of $\mathcal{L}$ closed under taking smaller elements, that is, if $a \in A$ and $b \leq a$, then $b \in A$.

The intersection of two down-sets is again a down-set (possibly empty), and the union of down-sets is again a down-set. That is, down-sets define a topology on our collection of text-snippets, or if you want, on language-fragments.

For example, the open determined by the word ‘red’ is the collection of all text-fragments containing this word.
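
A small sanity check in Python, on a toy collection of three made-up notes (the notes and the helper names `down` and `is_down_set` are assumptions for illustration only):

```python
def snippets(note):
    words = note.split()
    n = len(words)
    return {tuple(words[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


def leq(q, p):
    """q <= p when p is a contiguous chunk of q."""
    k = len(p)
    return any(q[i:i + k] == p for i in range(len(q) - k + 1))


notes = ["a red rose", "the red dwarf", "a rose"]  # toy data
L = set().union(*(snippets(n) for n in notes))


def down(p):
    """The open determined by p: all snippets containing p."""
    return {q for q in L if leq(q, p)}


def is_down_set(A):
    return all(b in A for a in A for b in L if leq(b, a))


red, rose = down(("red",)), down(("rose",))
```

On this toy poset `red` has seven elements, and both `red & rose` and `red | rose` pass `is_down_set`, as the topology requires.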

The corresponding presheaf topos $\widehat{\mathcal{L}}$ is then just the category of all (set-valued) presheaves on this topological space.
As an example, the Yoneda-presheaf $\mathcal{Y}(p)$ of a text-snippet $p$ is the contra-variant functor

$$(\mathcal{L},\leq) \rightarrow \mathbf{Sets}$$

sending any $q \leq p$ to the singleton $\{ \ast \}$, with $\ast$ the unique map from $q$ to $p$, and if $q \not\leq p$ then we map it to $\emptyset$. If $A$ is a down-set (an open of our topological space) then the sections of $\mathcal{Y}(p)$ over $A$ are $\{ \ast \}$ if for all $a \in A$ we have $a \leq p$, and $\emptyset$ otherwise.

The presheaf $\mathcal{Y}(p)$ already contains some semantic information about the snippet $p$ as it gives all contexts in which $p$ appears.

Perhaps interesting is that the ‘points’ of the topos $\widehat{\mathcal{L}}$ are the notes we started from.

Recall that Connes and Gauthier-Lafaye want to construct a topos describing someone’s unconscious, and points of that topos should be the connection with that person’s consciousness.

Suppose you want to unravel your unconscious. You start by writing down a large set of notes containing all relevant facts of your life. Then you construct from these notes the above collection of snippets and its corresponding pre-sheaf topos. Clearly, you wrote your notes consciously, but probably the exact phrasing of these notes, or recurrent themes in them, or some text-combinations are ruled by your unconscious.

Ok, it’s not much, but perhaps it’s a germ of a potential approach…


Now we come to the interesting part of the paper, the ‘enrichment’ of this poset.

Surely, some of these text-snippets will occur more frequently than others. For example, in your starting notes the snippet ‘red rose’ may appear ten times more often than the snippet ‘red dwarf’, but this is not visible in the poset-structure. So how can we bring in this extra information?

If we have two text-snippets $p$ and $q$ with $q \leq p$, that is, $p$ is a connected sub-string of $q$, we can compute the conditional probability $\pi(q|p)$ which tells us how likely it is that if we spot an occurrence of $p$ in our starting notes, it is part of the larger sentence $q$. These numbers can be easily computed and from the rules of probability we get that for snippets $r \leq q \leq p$ we have that

$$\pi(r|p) = \pi(r|q) \times \pi(q|p)$$

so these numbers (all between $0$ and $1$) behave multiplicatively along paths in the poset.
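
One naive way to get such numbers, sketched in Python: estimate $\pi(q|p)$ as the number of occurrences of $q$ divided by the number of occurrences of $p$ in the notes. This count-based estimate is my assumption, not the definition used in the paper, and the four notes are made up.

```python
from collections import Counter
from fractions import Fraction

notes = ["a red rose", "a red rose bush", "the red dwarf", "a red dwarf"]  # toy data

# Count every occurrence of every contiguous chunk of words.
counts = Counter()
for note in notes:
    w = note.split()
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            counts[tuple(w[i:j])] += 1


def pi(q, p):
    """Count-based estimate of pi(q|p); the caller must ensure q <= p."""
    return Fraction(counts[q], counts[p])


p = ("red",)
q = ("a", "red", "rose")
r = ("a", "red", "rose", "bush")
multiplicative = pi(r, p) == pi(r, q) * pi(q, p)
```

With these notes ‘red’ occurs four times, ‘a red rose’ twice and ‘a red rose bush’ once, so both sides of the product rule equal $1/4$.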

Nice in theory, but it requires an awful lot of computation. From the paper:

The reader might think of these probabilities $\pi(q|p)$ as being most well defined when $q$ is a short extension of $p$. While one may be skeptical about assigning a probability distribution on the set of all possible texts, it’s reasonable to say there is a nonzero probability that cat food will follow I am going to the store to buy a can of and, practically speaking, that probability can be estimated.

Indeed, existing LLMs successfully learn these conditional probabilities $\pi(q|p)$ using standard machine learning tools trained on large corpora of texts, which may be viewed as providing a wealth of samples drawn from these conditional probability distributions.

It may be easier to have an estimate $\mu(q|p)$ of this conditional probability for immediate successors (that is, if $q$ is obtained from $p$ by adding one word at the beginning or end of it), and then extend this measure to all arrows in the poset by taking the maximum of products along paths. In this way we have for all $r \leq q \leq p$ that

$$\mu(r|p) \geq \mu(r|q) \times \mu(q|p)$$
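
A minimal Python sketch of this extension step. The one-step values in `step` are hypothetical numbers chosen for illustration; in practice they would come from a language model or from counts.

```python
from fractions import Fraction
from functools import lru_cache

a_red, red_rose, a_red_rose = ("a", "red"), ("red", "rose"), ("a", "red", "rose")

# Hypothetical one-step estimates mu(q | q'), where q adds one word to q'.
step = {
    (a_red, ("red",)): Fraction(1, 2),
    (red_rose, ("red",)): Fraction(2, 5),
    (a_red_rose, a_red): Fraction(3, 5),
    (a_red_rose, red_rose): Fraction(9, 10),
}


@lru_cache(maxsize=None)
def mu(q, p):
    """Maximum over all paths from p up to q of the product of one-step values."""
    if q == p:
        return Fraction(1)
    if len(q) <= len(p):
        return Fraction(0)  # q cannot properly contain p
    # The last step of a path drops the first or the last word of q.
    return max(step.get((q, shorter), Fraction(0)) * mu(shorter, p)
               for shorter in (q[1:], q[:-1]))


best = mu(a_red_rose, ("red",))
```

There are two paths from ‘red’ to ‘a red rose’, with products $9/25$ and $3/10$, so `best` is $9/25$, and the inequality above holds for both intermediate snippets.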

The upshot is that this measure $\mu$ turns our poset (or category) $(\mathcal{L},\leq)$ into a category ‘enriched’ over the unit interval $[ 0,1 ]$ (suitably made into a monoidal category).

I’ll spare you the details, I just want to flesh out the corresponding notion of ‘enriched presheaves’ which are the objects of the semantic category $\widehat{\mathcal{L}}^s$ in the paper, which is the enriched version of the presheaf category $\widehat{\mathcal{L}}$.

An enriched presheaf is a function (not functor)

$$F~:~\mathcal{L} \rightarrow [0,1]$$

satisfying the condition that for all text-snippets $r,q \in \mathcal{L}$ we have that

$$\mu(r|q) \leq [F(q),F(r)] = \begin{cases} \frac{F(r)}{F(q)}~\text{if $F(r) \leq F(q)$} \\ 1~\text{otherwise} \end{cases}$$

Note that the enriched (or semantic) Yoneda presheaf $\mathcal{Y}^s(p)(q) = \mu(q|p)$ satisfies this condition, and now this data not only records the contexts in which $p$ appears, but also measures how likely it is for $p$ to appear in a certain context.
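
The condition is easy to test by brute force on a toy example. In the Python sketch below I take for $\mu$ the count-based ratio on three made-up notes (an assumption, any $\mu$ satisfying the inequality would do), and `hom(a, b)` is the bracket $[a,b]$ above, written in the equivalent form ‘$1$ if $a \leq b$, else $b/a$’.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

notes = ["a red rose", "a red dwarf", "the red rose"]  # toy data
counts = Counter()
for note in notes:
    w = note.split()
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            counts[tuple(w[i:j])] += 1
L = list(counts)


def contains(q, p):
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def mu(q, p):
    """Count-based stand-in for mu(q|p); zero when q is not below p."""
    return Fraction(counts[q], counts[p]) if contains(q, p) else Fraction(0)


def hom(a, b):
    """The internal hom [a, b] of the unit interval."""
    return Fraction(1) if a <= b else b / a


def is_enriched_presheaf(F):
    return all(mu(r, q) <= hom(F[q], F[r]) for q, r in product(L, repeat=2))


yoneda_red = {q: mu(q, ("red",)) for q in L}
indicator_red = {q: Fraction(int(q == ("red",))) for q in L}
```

The Yoneda presheaf `yoneda_red` passes the test, whereas the indicator function of the single snippet ‘red’ does not: it forgets that ‘red’ is regularly followed by ‘rose’.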

Another cute application of the condition on the measure $\mu$ is that it allows us to define a ‘distance function’ (satisfying the triangle inequality) on all text-snippets in $\mathcal{L}$ by

$$d(q,p) = \begin{cases} -\ln(\mu(q|p))~\text{if $q \leq p$} \\ \infty~\text{otherwise} \end{cases}$$

So, the higher $\mu(q|p)$ the closer $q$ lies to $p$, and now the snippet $p$ (for example ‘red’) not only defines the open set in $\mathcal{L}$ of all texts containing $p$, but now we can structure the snippets in this open set with respect to this ‘distance’.
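
For completeness, here is why the triangle inequality holds. Take $r \leq q \leq p$ and use that $-\ln$ is decreasing, together with the inequality $\mu(r|p) \geq \mu(r|q) \times \mu(q|p)$ (and the convention $-\ln(0) = \infty$):

$$d(r,p) = -\ln(\mu(r|p)) \leq -\ln(\mu(r|q) \times \mu(q|p)) = -\ln(\mu(r|q)) - \ln(\mu(q|p)) = d(r,q) + d(q,p)$$

If $r \not\leq q$ or $q \not\leq p$, the right-hand side is $\infty$ and there is nothing to prove. Note that $d$ is not symmetric: for $q \leq p$ with $q \not= p$ we have $d(p,q) = \infty$, whereas $d(q,p)$ is finite as soon as $\mu(q|p) > 0$.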

In this way we can turn any language, or a collection of texts in a given language, into what Lawvere called a ‘generalized metric space’.

It looks as if we are progressing slowly in our, probably futile, attempt to understand Alain Connes’ and Patrick Gauthier-Lafaye’s claim that ‘the unconscious is structured like a topos’.

Even if we accept the fact that we can start from a collection of notes, there are a number of changes we need to make to the above approach:

there will be contextual links between these notes
we only want to retain the relevant snippets, not all of them
between these ‘highlights’ there may also be contextual links
texts can be related without having to be concatenations
we need to implement changes when new notes are added
… (much more)

Perhaps, we should try to work on a specific ‘case’, and explore all technical tools that may help us to make progress.

(tbc)

Previously in this series:

The topology of dreams

Next:

Loading a second brain

The topology of dreams

Published February 27, 2023 by lievenlb

Last May, the meeting Lacan et Grothendieck, l’impossible rencontre? took place in Paris (see this post). Videos of that meeting are now available online.

Here’s the talk by Alain Connes and Patrick Gauthier-Lafaye on their book A l’ombre de Grothendieck et de Lacan : un topos sur l’inconscient ? (see this post ).

Let’s quickly recall their main ideas:

1. The unconscious is structured as a topos (Jacques Lacan argued it was structured as a language), because we need a framework allowing logic without the law of the excluded middle for Lacan’s formulas of sexuation to make some sense at all.

2. This topos may differ from person to person, so we do not all share the same rules of logic (as observed in real life).

3. Consciousness is related to the points of the topos (they are not precise on this, neither in the talk, nor the book).

4. All these individual toposes are ruled by a classifying topos, and they see Lacan’s work as the very first steps towards trying to describe the unconscious by a geometrical theory (though his formulas are not first order).

Surely these are intriguing ideas, if only we would know how to construct the topos of someone’s unconscious.

Let’s go looking for clues.

At the same meeting, there was a talk by Daniel Sibony: “Mathématiques et inconscient”

Sibony started out as a mathematician, then turned to psychiatry in the early 70s. He was acquainted with both Grothendieck and Lacan, and even brought them together once, over lunch, some day in 1973. He makes a one-line appearance in Grothendieck’s Récoltes et Semailles, when G describes his friends in ‘Survivre et Vivre’:

“Daniel Sibony (who stayed away from this group, while pursuing its evolution out of the corner of a semi-disdainful, smirking eye)”

In his talk, Sibony said he had a similar idea, 50 years before Connes and Gauthier-Lafaye (3.04 into the clip):

“At the same time (early 70s) I did a seminar in Vincennes, where I was a math professor, on the topology of dreams. At the time I didn’t have categories at my disposal, but I used fibered spaces instead. I showed how we could interpret dreams with a fibered space. This is consistent with the Freudian idea, except that Freud says we should take the list of words from the story of the dream and look for associations. For me, these associations were in the fibers, and these thoughts on fibers and sheaves have always followed me. And now, after 50 years I find this pretty book by Alain Connes and Patrick Gauthier-Lafaye on toposes, and see that my thoughts on dreams as sheaves and fibered spaces are but a special case of theirs.”

This looks interesting. After all, Freud called dream interpretation the ‘royal road’ to the unconscious. “It is the ‘King’s highway’ along which everyone can travel to discover the truth of unconscious processes for themselves.”

Sibony clarifies his idea in the interview L’utilisation des rêves en psychothérapie with Maryse Siksou.

“The dream brings blocks of words, of “compacted” meanings, and we question, according to the good old method, each of these blocks, each of these points and which we associate around (we “unblock” around…), we let each point unfold according to the “fiber” which is its own.

I introduced this notion of the dream as fibered space in an article in the review Scilicet in 1972, and in a seminar that I gave at the University of Vincennes in 1973 under the title “Topologie et interpretation des rêves”, to which Jacques Lacan and his close retinue attended throughout the year.

The idea is that the dream is a sheaf, a bundle of fibers, each of which is associated with a “word” of the dream; interpretation makes the fibers appear, and one can pick an element from each, which is of course “displaced” in relation to the word that “produced” the fiber, and these elements are articulated with other elements taken in other fibers, to finally create a message which, once again, does not necessarily say the meaning of the dream because a dream has as many meanings as recipients to whom it is told, but which produces a strong statement, a relevant statement, which can restart the work.”

Key images in the dream (the ‘points’ of the base-space) can stand for entirely different situations in someone’s life (the points in the ‘fiber’ over an image). The therapist’s job is to find a suitable ‘section’ in this ‘sheaf’ to further the therapy.

It’s a bit like translating a sentence from one language to another. Every word (point of the base-space) can have several possible translations with subtle differences (the points in the fiber over the word). It’s the translator’s job to find the best ‘section’ in this sheaf of possibilities.

This translation-analogy is used by Daniel Sibony in his paper Traduire la passe:

“It therefore operates just like the dream through articulated choices, from one fiber to another, in a bundle of speaking fibers; it articulates them by seeking the optimal section. In fact, the translation takes place between two fiber bundles, each in a language, but in the starting bundle the choice seems fixed by the initial text. However, more or less consciously, the translator “bursts” each word into a larger fiber, he therefore has a bundle of fibers where the given text seems after the fact a singular choice, which will produce another choice in the bundle of the other language.”

This paper also contains a pre-ChatGPT story (we’re in 1998), in which the language model fails because it has far too few alternatives in its fibers:

I felt it during a “humor festival” where I was approached by someone (who seemed to have some humor) and who was a robot. We had a brief conversation, very acceptable, beyond the conventional witticisms and knowing sighs he uttered from time to time to complain about the lack of atmosphere, repeating that after all we are not robots.

I thought at first that it must be a walking walkie-talkie and that in fact I was talking to a guy who was remote-controlling it from his cabin. But the object was programmed; the unforeseen effects of meaning were all the more striking. To my question: “Who created you?” he answered with a strange word, a kind of technical god.

I went on to ask him who he thought created me; his answer was immediate: “Oedipus”. (He knew, having questioned me, that I was a psychoanalyst.) The piquancy of his answer pleased me (without Oedipus, at least on a first level, no analyst). These bursts of meaning that we know in children, psychotics, to whom we attribute divinatory gifts — when they only exist, save their skin, questioning us about our being to defend theirs — , these random strokes of meaning shed light on the classic aftermaths where when a tile arrives, we hook it up to other tiles from the past, it ties up the pain by chaining the meaning.

Anyway, the conversation continuing, the robot asked me to psychoanalyse him; I asked him what he was suffering from. His answer was immediate: “Oedipus”.

Disappointing and enlightening: it shows that with each “word” of the interlocutor, the robot makes correspond a signifying constellation, a fiber of elements; choosing a word in each fiber, he then articulates the whole with obvious sequence constraints: a bit of readability and a certain phrasal push that leaves open the game of exchange. And now, in the fiber concerning the “psy” field, chance or constraint had fixed him on the same word, “Oedipus”, which, by repeating itself, closed the scene heavily.

Okay, we have a first potential approximation to Connes and Gauthier-Lafaye’s elusive topos, a sheaf of possible interpretations of base-words in a language.

But, the base-space is still rather discrete, or at best linearly ordered. And also in the fibers, and among the sections, there’s not much of a topology at work.

Perhaps, we should have a look at applications of topology and/or topos theory in large language models?

(tbc)

Next:

The shape of languages


Mamuth to Elephant (2)

Published March 8, 2022 by lievenlb

Last time, we’ve viewed major and minor triads (chords) as inscribed triangles in a regular $12$-gon.

If we move clockwise along the $12$-gon, starting from the endpoint of the longest edge (the root of the chord, here the $0$-vertex) the edges skip $3,2$ and $4$ vertices (for a major chord, here on the left the major $0$-chord) or $2,3$ and $4$ vertices (for a minor chord, here on the right the minor $0$-chord).

The symmetries of the $12$-gon, the dihedral group $D_{12}$, act on the $24$ major- and minor-chords transitively, preserving the type for rotations, and interchanging majors with minors for reflections.

Mathematical Music Theoreticians (MaMuTh-ers for short) call this the $T/I$-group, and view the rotations of the $12$-gon as transpositions $T_k : x \mapsto x+k~\text{mod}~12$, and the reflections as involutions $I_k : x \mapsto -x+k~\text{mod}~12$.

Note that the elements of the $T/I$-group act on the vertices of the $12$-gon, from which the action on the chord-triangles follows.
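
These actions are easy to play with in Python. In the sketch below a chord is the set of its three tones modulo $12$; the function names are mine.

```python
def major(k):
    return frozenset((k + i) % 12 for i in (0, 4, 7))


def minor(k):
    return frozenset((k + i) % 12 for i in (0, 3, 7))


MAJORS = {major(k) for k in range(12)}
MINORS = {minor(k) for k in range(12)}


def T(k, chord):
    """Transposition T_k : x -> x + k mod 12, applied to each tone."""
    return frozenset((x + k) % 12 for x in chord)


def I(k, chord):
    """Inversion I_k : x -> -x + k mod 12, applied to each tone."""
    return frozenset((k - x) % 12 for x in chord)


# The T/I-orbit of the major 0-chord.
orbit = {T(k, major(0)) for k in range(12)} | {I(k, major(0)) for k in range(12)}
```

The orbit of the major $0$-chord consists of all $24$ chords, rotations send majors to majors, and every $I_k$ sends a major chord to a minor one (for example $I_7$ maps the major $0$-chord to the minor $0$-chord).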

There is another action on the $24$ major and minor chords, mapping a chord-triangle to its image under a reflection in one of its three sides.

Note that in this case the reflection $I_k$ used will depend on the root of the chord, so this action on the chords does not come from an action on the vertices of the $12$-gon.

There are three such operations: (pictures are taken from Alexandre Popoff’s blog, with the ‘funny names’ removed)

The $P$-operation is reflection in the longest side of the chord-triangle. As the longest side is preserved, $P$ interchanges the major and minor chord with the same root.

The $L$-operation is reflection in the shortest side. This operation interchanges a major $k$-chord with a minor $k+4~\text{mod}~12$-chord.

Finally, the $R$-operation is reflection in the middle side. This operation interchanges a major $k$-chord with a minor $k+9~\text{mod}~12$-chord.
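
Here is a hedged Python sketch of these three operations. I implement ‘reflection in the side joining the tones $a$ and $b$’ as the involution $I_{a+b}$, which swaps $a$ and $b$ and hence preserves that side; the helper names are assumptions of mine.

```python
from itertools import combinations


def major(k):
    return frozenset((k + i) % 12 for i in (0, 4, 7))


def minor(k):
    return frozenset((k + i) % 12 for i in (0, 3, 7))


def sides(chord):
    """The three sides of the chord-triangle, sorted from shortest to longest."""
    def length(side):
        a, b = side
        return min((a - b) % 12, (b - a) % 12)
    return sorted(combinations(sorted(chord), 2), key=length)


def reflect(chord, side):
    """Apply I_{a+b}, the reflection preserving the side {a, b}."""
    a, b = side
    return frozenset((a + b - x) % 12 for x in chord)


def P(chord):
    return reflect(chord, sides(chord)[2])  # longest side


def L(chord):
    return reflect(chord, sides(chord)[0])  # shortest side


def R(chord):
    return reflect(chord, sides(chord)[1])  # middle side
```

For instance `L(major(0))` is `{4, 7, 11}`, the minor $4$-chord, and `R(major(0))` is `{0, 4, 9}`, the minor $9$-chord.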

From this it is already clear that the group generated by $P$, $L$ and $R$ acts transitively on the $24$ major and minor chords, but what is this $PLR$-group?

If we label the major chords by their root-vertex $1,2,\dots,12$ (GAP doesn’t like zeroes), and the corresponding minor chords $13,14,\dots,24$, then these operations give these permutations on the $24$ chords:

P:=(1,13)(2,14)(3,15)(4,16)(5,17)(6,18)(7,19)(8,20)(9,21)(10,22)(11,23)(12,24);
L:=(1,17)(2,18)(3,19)(4,20)(5,21)(6,22)(7,23)(8,24)(9,13)(10,14)(11,15)(12,16);
R:=(1,22)(2,23)(3,24)(4,13)(5,14)(6,15)(7,16)(8,17)(9,18)(10,19)(11,20)(12,21);

Then GAP gives us that the $PLR$-group is again isomorphic to $D_{12}$:

gap> G:=Group(P,L,R);;
gap> Size(G);
24
gap> IsDihedralGroup(G);
true
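
If you don’t have GAP at hand, the same check can be sketched in plain Python. Here the chords are labeled $0,\dots,23$ (major $k$ is $k$, minor $k$ is $12+k$) rather than $1,\dots,24$, and a permutation is a tuple; both conventions are mine. The group is dihedral because it is generated by the two involutions $L$ and $R$ whose product has order $12$.

```python
# Major k -> label k, minor k -> label 12 + k.
P = tuple([12 + k for k in range(12)] + [k for k in range(12)])
L = tuple([12 + (k + 4) % 12 for k in range(12)] + [(k - 4) % 12 for k in range(12)])
R = tuple([12 + (k + 9) % 12 for k in range(12)] + [(k - 9) % 12 for k in range(12)])

IDENTITY = tuple(range(24))


def compose(f, g):
    """The permutation 'first g, then f'."""
    return tuple(f[g[i]] for i in range(24))


def generate(gens):
    """All elements of the group generated by gens (naive closure)."""
    group, todo = {IDENTITY}, [IDENTITY]
    while todo:
        x = todo.pop()
        for g in gens:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                todo.append(y)
    return group


def order(f):
    n, x = 1, f
    while x != IDENTITY:
        x, n = compose(f, x), n + 1
    return n


G = generate([P, L, R])
```

The closure `G` has $24$ elements, coincides with the group generated by $L$ and $R$ alone, and `compose(L, R)` has order $12$.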

In fact, if we view both the $T/I$-group and the $PLR$-group as subgroups of the symmetric group $Sym(24)$ via their actions on the $24$ major and minor chords, these groups are each other’s centralizers! That is, the $T/I$-group and $PLR$-group are dual to each other.
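
A brute-force search through $Sym(24)$ is out of the question, but the following Python sketch checks the two ingredients of the argument: every $T_k$ and $I_k$ commutes with $P$, $L$ and $R$, and both groups have $24$ elements. Since the $PLR$-group acts transitively on $24$ chords and has order $24$, its action is regular, and the centralizer of a regular group has the same order, so it must be the $T/I$-group. The encoding of chords as tone-sets and all helper names are my own choices.

```python
def major(k):
    return frozenset((k + i) % 12 for i in (0, 4, 7))


def minor(k):
    return frozenset((k + i) % 12 for i in (0, 3, 7))


chords = [major(k) for k in range(12)] + [minor(k) for k in range(12)]
index = {c: i for i, c in enumerate(chords)}


def as_perm(op):
    """Turn an operation on chords into a permutation of the 24 labels."""
    return tuple(index[op(c)] for c in chords)


def reflect_in_side(rank):
    """Reflection preserving the shortest (0), middle (1) or longest (2) side."""
    def op(chord):
        t = sorted(chord)
        sides = sorted([(t[0], t[1]), (t[0], t[2]), (t[1], t[2])],
                       key=lambda s: min((s[0] - s[1]) % 12, (s[1] - s[0]) % 12))
        a, b = sides[rank]
        return frozenset((a + b - x) % 12 for x in chord)
    return op


PLR = [as_perm(reflect_in_side(rank)) for rank in (2, 0, 1)]  # P, L, R
TI = [as_perm(lambda c, k=k: frozenset((x + k) % 12 for x in c)) for k in range(12)]
TI += [as_perm(lambda c, k=k: frozenset((k - x) % 12 for x in c)) for k in range(12)]


def compose(f, g):
    return tuple(f[g[i]] for i in range(24))


def generate(gens):
    start = tuple(range(24))
    group, todo = {start}, [start]
    while todo:
        x = todo.pop()
        for g in gens:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                todo.append(y)
    return group


commute = all(compose(s, t) == compose(t, s) for s in PLR for t in TI)
```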

For more on this, there’s a beautiful paper by Alissa Crans, Thomas Fiore and Ramon Satyendra: Musical Actions of Dihedral Groups.

What more does this new MaMuTh info teach us about our Elephant, the Topos of Triads, studied by Thomas Noll?

Last time we’ve seen the eight-element triadic monoid $T$ of all affine maps preserving the three tones $\{ 0,4,7 \}$ of the major $0$-chord, computed the subobject classifier $\Omega$ of the corresponding topos of presheaves, and determined all its six Grothendieck topologies, among which were these three:

Why did we label these Grothendieck topologies (and corresponding elements of $\Omega$) by $P$, $L$ and $R$?

We’ve seen that the sheafification of the presheaf $\{ 0,4,7 \}$ in the triadic topos under the Grothendieck topology $j_P$ gave us the sheaf $\{ 0,3,4,7 \}$, and these are the tones of the major $0$-chord together with those of the minor $0$-chord, that is the two chords in the $\langle P \rangle$-orbit of the major $0$-chord. The group $\langle P \rangle$ is the cyclic group $C_2$.

For the sheafification with respect to $j_L$ we found the $T$-set $\{ 0,3,4,7,8,11 \}$ which are the tones of the major and minor $0$-,$4$-, and $8$-chords. Again, these are exactly the six chords in the $\langle P,L \rangle$-orbit of the major $0$-chord. The group $\langle P,L \rangle$ is isomorphic to $Sym(3)$.

The $j_R$-topology gave us the $T$-set $\{ 0,1,3,4,6,7,9,10 \}$ which are the tones of the major and minor $0$-,$3$-, $6$-, and $9$-chords, and lo and behold, these are the eight chords in the $\langle P,R \rangle$-orbit of the major $0$-chord. The group $\langle P,R \rangle$ is the dihedral group $D_4$.
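
The three orbits, and the tones appearing in them, can be recomputed with a few lines of Python. A chord is encoded as a pair (`"M"` or `"m"`, root), which is a convention of mine; the sheafifications themselves are not computed here, only the orbits they are compared with.

```python
def P(chord):
    kind, root = chord
    return ("m", root) if kind == "M" else ("M", root)


def L(chord):
    kind, root = chord
    return ("m", (root + 4) % 12) if kind == "M" else ("M", (root - 4) % 12)


def R(chord):
    kind, root = chord
    return ("m", (root + 9) % 12) if kind == "M" else ("M", (root - 9) % 12)


def orbit(start, ops):
    """All chords reachable from start by repeatedly applying the operations."""
    seen, todo = {start}, [start]
    while todo:
        c = todo.pop()
        for op in ops:
            d = op(c)
            if d not in seen:
                seen.add(d)
                todo.append(d)
    return seen


def tones(chords):
    steps = {"M": (0, 4, 7), "m": (0, 3, 7)}
    return sorted({(root + i) % 12 for kind, root in chords for i in steps[kind]})


orbit_P = orbit(("M", 0), [P])
orbit_PL = orbit(("M", 0), [P, L])
orbit_PR = orbit(("M", 0), [P, R])
```

The orbits have $2$, $6$ and $8$ chords, and `tones` returns exactly the three $T$-sets listed above.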

More on this can be found in the paper Commuting Groups and the Topos of Triads by Thomas Fiore and Thomas Noll.

The operations $P$, $L$ and $R$ on major and minor chords are reflections in one side of the chord-triangle, so they preserve two of the three tones. There’s a distinction between the $P$ and $L$ operations and $R$ when it comes to how the third tone changes.

Under $P$ and $L$ the third tone changes by one halftone (because the corresponding sides skip an even number of vertices), whereas under $R$ the third tone changes by two halftones (a full tone), see the pictures above.

The $\langle P,L \rangle = Sym(3)$ subgroup divides the $24$ chords in four orbits of six chords each, three major chords and their corresponding minor chords. These orbits consist of the

$0$-, $4$-, and $8$-chords (see before)
$1$-, $5$-, and $9$-chords
$2$-, $6$-, and $10$-chords
$3$-, $7$-, and $11$-chords
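
A quick Python check of these four orbits, walking each one as a hexagon by alternately applying $P$ and $L$. As before, a chord is a pair (`"M"` or `"m"`, root), a convention of mine.

```python
def P(chord):
    kind, root = chord
    return ("m", root) if kind == "M" else ("M", root)


def L(chord):
    kind, root = chord
    return ("m", (root + 4) % 12) if kind == "M" else ("M", (root - 4) % 12)


def hexagon(start):
    """Alternate P and L, starting with P, until we are back at start."""
    cycle, chord, step = [], start, 0
    while not cycle or chord != start:
        cycle.append(chord)
        chord = (P, L)[step % 2](chord)
        step += 1
    return cycle


orbits = [hexagon(("M", k)) for k in range(4)]
roots = [sorted({root for _, root in h}) for h in orbits]
```

Starting from the major $0$-chord the walk visits major $0$, minor $0$, major $8$, minor $8$, major $4$, minor $4$ and then closes up, and `roots` reproduces the four lists above.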

and we can view each of these orbits as a cycle tracing six of the eight vertices of a cube with one pair of antipodal points removed.

These four ‘almost’ cubes are the NE-, SE-, SW-, and NW-regions of the Cube Dance Graph, from the paper Parsimonious Graphs by Jack Douthett and Peter Steinbach.

To translate the funny names to our numbers, use this dictionary (major chords are given by a capital letter):

The four extra chords (at the N, E, S, and W places) are augmented triads. They correspond to the triads $(0,4,8),~(1,5,9),~(2,6,10)$ and $(3,7,11)$.

That is, two triads are connected by an edge in the Cube Dance graph if they share two tones and differ by a halftone in the third tone.
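
Taking this edge rule at face value, the graph can be rebuilt in Python from the $28$ triads. This is a sketch built only from the rule just stated, not from the Douthett-Steinbach paper, and the names are mine.

```python
from itertools import combinations


def triad(root, intervals):
    return frozenset((root + i) % 12 for i in intervals)


majors = [triad(k, (0, 4, 7)) for k in range(12)]
minors = [triad(k, (0, 3, 7)) for k in range(12)]
augmented = [triad(k, (0, 4, 8)) for k in range(4)]
triads = majors + minors + augmented


def adjacent(a, b):
    """Two shared tones, and the remaining tones are a halftone apart."""
    if len(a & b) != 2:
        return False
    (x,), (y,) = a - b, b - a
    return (x - y) % 12 in (1, 11)


edges = [(a, b) for a, b in combinations(triads, 2) if adjacent(a, b)]
degree = {t: sum(t in e for e in edges) for t in triads}
```

Every major or minor triad has three neighbours (its $P$- and $L$-image and one augmented triad), every augmented triad has six, which gives $48$ edges in total.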

This graph screams for a group or monoid acting on it. Some of the edges we’ve already identified as the action of $P$ and $L$ on the $24$ major and minor triads. Because the triangle of an augmented triad is equilateral, we see that they are preserved under $P$ and $L$.

But what about the edges connecting the regular triads to the augmented ones? If we view each edge as two directed arrows assigned to the same operation, we cannot do this with a transformation because the operation sends each augmented triad to six regular triads.

Alexandre Popoff, Moreno Andreatta and Andree Ehresmann suggest in their paper Relational poly-Klumpenhouwer networks for transformational and voice-leading analysis that one might use a monoid generated by relations, and they show that there is such a monoid with $40$ elements acting on the Cube Dance graph.

Popoff claims that the usual presheaf toposes, that is, contravariant functors to $\mathbf{Sets}$, are not enough to study transformational music theory. He suggests using instead functors to $\mathbf{Rel}$, that is, sets with binary relations as morphisms, composed in the usual way.

Another Elephant enters the room…

(to be continued)