
neverendingbooks Posts

The tropical brain-forest

If machine learning, AI, and large language models are here to stay, there’s this inevitable conclusion:


At the start of this series, the hope was to find the topos of the unconscious. Pretty soon, attention turned to the shape of languages and LLMs.

In large language models all syntactic and semantic information is encoded in huge arrays of numbers and weights. It seems unlikely that $\mathbf{Set}$-valued presheaves will be useful in machine learning, but surely Huawei will prove me wrong.

$[0,\infty]$-enriched categories (aka generalised metric spaces) and associated $[0,\infty]$-enriched presheaves may be better suited to understand existing models.

But, as with ordinary presheaves, there are just too many $[0,\infty]$-enriched ones. So, how can we weed out the irrelevant ones?

For inspiration, let’s turn to evolutionary biology and its theory of phylogenetic trees. Biologists want to trace back common (extinct) ancestors of existing species by studying overlaps in their DNA.



(A tree of life, based on completely sequenced genomes, from Wikipedia)

The connection between phylogenetic trees and tropical geometry is nicely explained in the paper Tropical mathematics by David Speyer and Bernd Sturmfels.

The tropical semi-ring is the set $(-\infty,\infty]$, equipped with a new addition $\oplus$ and multiplication $\odot$

$$a \oplus b = \min(a,b), \quad \text{and} \quad a \odot b = a+b$$

Because tropical multiplication is ordinary addition, a tropical monomial in $n$ variables

$$\underbrace{x_1 \odot \dots \odot x_1}_{j_1} \odot \underbrace{x_2 \odot \dots \odot x_2}_{j_2} \odot \dots$$

corresponds to the linear polynomial $j_1 x_1 + j_2 x_2 + \dots \in \mathbb{Z}[x_1,\dots,x_n]$. But then, a tropical polynomial in $n$ variables

$$p(x_1,\dots,x_n)=a \odot x_1^{i_1}\dots x_n^{i_n} \oplus b \odot x_1^{j_1} \dots x_n^{j_n} \oplus \dots$$

gives the piece-wise linear function $p : \mathbb{R}^n \rightarrow \mathbb{R}$

$$p(x_1,\dots,x_n)=\min(a+i_1 x_1 + \dots + i_n x_n,b+j_1 x_1 + \dots + j_n x_n, \dots)$$

The tropical hypersurface $\mathcal{H}(p)$ then consists of all points $v \in \mathbb{R}^n$ where $p$ is not linear, that is, where the minimum $p(v)$ is attained by at least two of the linear terms in the description of $p$.
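To make the definitions concrete, here is a minimal Python sketch of evaluating a tropical polynomial and testing membership of its hypersurface. The representation of a polynomial as a list of (coefficient, exponent vector) pairs, the function names, and the numerical tolerance are all choices made for this illustration.

```python
from typing import List, Sequence, Tuple

# One tropical monomial a ⊙ x_1^{i_1} ... x_n^{i_n} is stored as (a, (i_1, ..., i_n)).
Term = Tuple[float, Tuple[int, ...]]


def term_values(poly: List[Term], v: Sequence[float]) -> List[float]:
    """Value of each monomial at v: the ordinary linear form a + i_1 v_1 + ... + i_n v_n."""
    return [a + sum(i * x for i, x in zip(exps, v)) for a, exps in poly]


def trop_eval(poly: List[Term], v: Sequence[float]) -> float:
    """Tropical sum of the monomials, i.e. the minimum of the linear forms."""
    return min(term_values(poly, v))


def on_hypersurface(poly: List[Term], v: Sequence[float], tol: float = 1e-9) -> bool:
    """True if the minimum is attained by at least two monomials (up to tol)."""
    vals = term_values(poly, v)
    m = min(vals)
    return sum(1 for t in vals if abs(t - m) <= tol) >= 2


# Example: the tropical line p(x, y) = x ⊕ y ⊕ 0 = min(x, y, 0).
line: List[Term] = [(0.0, (1, 0)), (0.0, (0, 1)), (0.0, (0, 0))]

print(trop_eval(line, (1.0, 2.0)))         # only the constant term attains the minimum
print(on_hypersurface(line, (1.0, 2.0)))   # so this point is off the hypersurface
print(on_hypersurface(line, (3.0, 0.0)))   # on the ray y = 0, x > 0
```

For this example the hypersurface is the union of three rays leaving the origin, which is the usual picture of a tropical line in the plane.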

Now, for the relation to phylogenetic trees: let’s sequence the genomes of human, mouse, rat and chicken and compute the values of a suitable (necessarily symmetric) distance function between them:




From these distances we want to trace back common ancestors and their difference in DNA-profile in a consistent manner, that is, such that the distance between two nodes in the tree is the sum of the distances of the edges connecting them.

In this example, such a tree is easily found (it is unique, except that the weights of the two edges leaving the root can be chosen freely as long as their sum is $0.8$):



In general, let’s sequence the genomes of $n$ species and determine their distance matrix $D=(d_{ij})_{i,j}$. Biology asserts that this distance must be a tree-distance, and those can be characterised by the condition that for all $1 \leq i,j,k,l \leq n$, among the three numbers

$$d_{ij}+d_{kl},~d_{ik}+d_{jl},~d_{il}+d_{jk}$$

the maximum is attained at least twice.
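This four-point condition is easy to test by brute force. The sketch below does so in Python. The function name is made up for this illustration, and the species distances are assumed values in units of $0.1$ (chosen to be consistent with the root-edge sum $0.8$ mentioned above), not numbers read off the table.

```python
from itertools import product
from typing import List


def is_tree_distance(d: List[List[float]], tol: float = 1e-9) -> bool:
    """Check that for all i, j, k, l the maximum of the three pair sums is attained twice.

    Repeated indices are allowed, as in the text; for a symmetric matrix with
    zero diagonal those cases also enforce the triangle inequality.
    """
    n = len(d)
    for i, j, k, l in product(range(n), repeat=4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:  # the largest value occurs only once
            return False
    return True


# Assumed distances (human, mouse, rat, chicken) in tenths, for illustration only.
SPECIES = [
    [0, 11, 10, 14],
    [11, 0, 3, 13],
    [10, 3, 0, 12],
    [14, 13, 12, 0],
]

# Shortest-path metric on a 4-cycle with unit edges: a metric, but not a tree metric.
FOUR_CYCLE = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

print(is_tree_distance(SPECIES))
print(is_tree_distance(FOUR_CYCLE))
```

The 4-cycle fails because the three sums are $2, 4, 2$, so the maximum occurs only once. The check runs over $n^4$ index tuples, which is fine for a handful of species but not for large $n$.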

What has this to do with tropical geometry? Well, $D$ is a tree distance if and only if $-D$ is a point in the tropical Grassmannian $Gr(2,n)$.

Here’s why: let $e_{ij}=-d_{ij}$; then the above condition says that the minimum of

$$e_{ij}+e_{kl},~e_{ik}+e_{jl},~e_{il}+e_{jk}$$

is attained at least twice, or that $(e_{ij})_{i,j}$ is a point of the tropical hypersurface

$$\mathcal{H}(x_{ij} \odot x_{kl} \oplus x_{ik} \odot x_{jl} \oplus x_{il} \odot x_{jk})$$

and we recognise this as one of the defining quadratic Plücker relations of the Grassmannian $Gr(2,n)$.
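As a small worked check, take the four species human ($H$), mouse ($M$), rat ($R$) and chicken ($C$), and assume, purely for illustration, the distances $d_{HM}=1.1$, $d_{HR}=1.0$, $d_{HC}=1.4$, $d_{MR}=0.3$, $d_{MC}=1.3$, $d_{RC}=1.2$ (these are assumed values, consistent with the root-edge sum $0.8$ mentioned earlier). With $e_{ij}=-d_{ij}$ the three terms of the tropical Plücker polynomial are

$$e_{HM}+e_{RC} = -2.3, \qquad e_{HR}+e_{MC} = -2.3, \qquad e_{HC}+e_{MR} = -1.7$$

so the minimum $-2.3$ is attained twice, and $(e_{ij})_{i,j}$ lies on the tropical hypersurface, as it should for a tree distance.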

More on this can be found in another paper by Speyer and Sturmfels The tropical Grassmannian, and the paper Geometry of the space of phylogenetic trees by Louis Billera, Susan Holmes and Karen Vogtmann.

What’s the connection with $[0,\infty]$-enriched presheaves?

The set of all species $V=\{ m,n,\dots \}$, together with the distance function $d(m,n)$ between their DNA-sequences, is a $[0,\infty]$-category. Recall that a $[0,\infty]$-enriched presheaf on $V$ is a function $p : V \rightarrow [0,\infty]$ satisfying for all $m,n \in V$

$$d(m,n)+p(n) \geq p(m)$$

For an ancestor node $p$ we can take as $p(m)$, for every $m \in V$, the tree distance from $p$ to $m$. The required inequality is then just the triangle inequality in the tree, so every ancestor is a $[0,\infty]$-enriched presheaf.

We also defined the distance between such $[0,\infty]$-enriched presheaves $p$ and $q$ to be

$$\hat{d}(p,q) = \sup_{m \in V}~\max(q(m)-p(m),0)$$

and this distance coincides with the tree distance between the nodes.
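The claim can be checked numerically on a small example. The sketch below uses a hypothetical weighted tree on the leaves H, M, R, C with internal nodes named a, b and root r. The edge weights (in units of $0.1$) and all function names are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical tree, weights in tenths: leaves H, M, R, C; internal nodes a, b; root r.
EDGES: List[Tuple[str, str, int]] = [
    ("M", "a", 2), ("R", "a", 1), ("a", "b", 3),
    ("H", "b", 6), ("b", "r", 4), ("r", "C", 4),
]
LEAVES = ["H", "M", "R", "C"]


def tree_distances(edges: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """All pairwise path lengths in a tree, by a depth-first walk from every node."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist: Dict[str, Dict[str, int]] = {}
    for start in list(adj):
        dist[start] = {start: 0}
        stack = [start]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist[start]:
                    dist[start][v] = dist[start][u] + w
                    stack.append(v)
    return dist


DIST = tree_distances(EDGES)


def presheaf_of(node: str) -> Dict[str, int]:
    """The function m -> tree distance from node to the leaf m."""
    return {m: DIST[node][m] for m in LEAVES}


def is_presheaf(p: Dict[str, int]) -> bool:
    """Check d(m, n) + p(n) >= p(m) for all leaves m, n."""
    return all(DIST[m][n] + p[n] >= p[m] for m in LEAVES for n in LEAVES)


def hat_d(p: Dict[str, int], q: Dict[str, int]) -> int:
    """The distance sup_m max(q(m) - p(m), 0) between two presheaves."""
    return max(max(q[m] - p[m], 0) for m in LEAVES)


for x in DIST:
    assert is_presheaf(presheaf_of(x))
    for y in DIST:
        assert hat_d(presheaf_of(x), presheaf_of(y)) == DIST[x][y]
print("hat_d agrees with the tree distance on all pairs of nodes")
```

In this example the agreement holds for every pair of nodes, leaves included, since a leaf $m$ corresponds to the presheaf $d(-,m)$.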

So, all ancestor nodes in a phylogenetic tree are very special $[0,\infty]$-enriched presheaves, optimal for the connection with the underlying $[0,\infty]$-enriched category (the species and their differences in genome).

We would like to single out such exceptional $[0,\infty]$-enriched presheaves in general, but clearly the underlying distance of a generalised metric space, even when it is symmetric, is in general not a tree metric.

Still, there might be regions in the space where we can do the above. So, in general we might expect not one tree, but a forest of trees formed by the $[0,\infty]$-enriched presheaves, optimal for the metric we’re exploring.

If we think of the underlying $[0,\infty]$-category as the conscious manifestations, then the presheaves in this forest are the underlying brain-states (or, if you want, the unconscious) leading up to these.

That’s why I like to call this mental picture the tropical brain-forest.




Where’s the tropical coming from?

Well, I think that in order to pinpoint these ‘optimal’ $[0,\infty]$-enriched presheaves a tropical-like structure on these, already mentioned by Simon Willerton in Tight spans, Isbell completions and semi-tropical modules, will be relevant.

For any two $[0,\infty]$-enriched presheaves we can take $p \oplus q = p \wedge q$ (the pointwise minimum), and for every $s \in [0,\infty]$ we can define

$$s \odot p : V \rightarrow [0,\infty] \qquad m \mapsto \max(p(m)-s,0)$$

and check that this is again a $[0,\infty]$-presheaf. The mental idea of $s \odot p$ is that of a fat point centered at $p$ with size $s$.
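The check can also be done numerically. Below is a minimal Python sketch, assuming a hypothetical four-point space with distances in units of $0.1$ and two hypothetical presheaves $p$ and $q$. All names and numbers are illustrative.

```python
from typing import Dict

POINTS = ["H", "M", "R", "C"]

# Hypothetical symmetric distances (in tenths) on the four points.
MATRIX = [
    [0, 11, 10, 14],
    [11, 0, 3, 13],
    [10, 3, 0, 12],
    [14, 13, 12, 0],
]
d = {m: {n: MATRIX[i][j] for j, n in enumerate(POINTS)} for i, m in enumerate(POINTS)}

Presheaf = Dict[str, int]


def is_presheaf(p: Presheaf) -> bool:
    """Check d(m, n) + p(n) >= p(m) for all points m, n."""
    return all(d[m][n] + p[n] >= p[m] for m in POINTS for n in POINTS)


def trop_add(p: Presheaf, q: Presheaf) -> Presheaf:
    """p ⊕ q: the pointwise minimum of p and q."""
    return {m: min(p[m], q[m]) for m in POINTS}


def trop_scale(s: int, p: Presheaf) -> Presheaf:
    """s ⊙ p: subtract s from p and truncate at 0."""
    return {m: max(p[m] - s, 0) for m in POINTS}


# Two hypothetical presheaves on the space above.
p = {"H": 9, "M": 2, "R": 1, "C": 11}
q = {"H": 6, "M": 5, "R": 4, "C": 8}

assert is_presheaf(p) and is_presheaf(q)
assert is_presheaf(trop_add(p, q))
assert all(is_presheaf(trop_scale(s, p)) for s in range(0, 13))
print(trop_add(p, q))
print(trop_scale(3, p))
```

With $s = 3$ the values of $p$ at M and R drop to $0$, which fits the picture of a fat point around $p$ that now reaches those two points.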

(tbc)


Stephen Wolfram on ChatGPT

A month ago, Stephen Wolfram put out a little booklet (140 pages) What Is ChatGPT Doing … and Why Does It Work?.



It gives a gentle introduction to large language models and the architecture and training of neural networks.

The entire book is freely available:

The advantage of these online texts is that you can click on any of the images, copy their content into a Mathematica notebook, and play with the code.

This really gives a good idea of how an extremely simplified version of ChatGPT (based on GPT-2) works.

Downloading the model (within Mathematica) uses about 500 MB, but afterwards you can complete any prompt quickly, and see how the results change if you turn up the ‘temperature’.

You shouldn’t expect too much from this model. Here’s what it came up with from the prompt “The major results obtained by non-commutative geometry include …” after 20 steps, at temperature 0.8:


(* model is assumed to be the GPT-2 net already loaded, as in Wolfram's text *)
NestList[StringJoin[#, model[#, {"RandomSample", "Temperature" -> 0.8}]] &,
  "The major results obtained by non-commutative geometry include ", 20]

The major results obtained by non-commutative geometry include vernacular accuracy of math and arithmetic, a stable balance between simplicity and complexity and a relatively low level of violence.

Lol.
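For readers without Mathematica, here is a rough Python sketch of what turning up the ‘temperature’ usually means: the model’s scores (logits) for the next token are divided by the temperature before being turned into probabilities. This is the standard softmax-with-temperature recipe. That the "Temperature" option above does exactly this is an assumption, and the logits below are made up.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Draw the index of the next token from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]
for t in (0.1, 0.8, 10.0):
    print(t, [round(x, 3) for x in softmax_with_temperature(logits, t)])

rng = random.Random(0)
print([sample_token(logits, 0.8, rng) for _ in range(10)])
```

A low temperature concentrates almost all probability on the top-scoring token, while a high temperature flattens the distribution, which is why the completions get wilder as you turn it up.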

In the more philosophical sections of the book, Wolfram speculates about the secret rules of language that ChatGPT must have found if we want to explain its apparent success. One of these rules, he argues, must be the ‘logic’ of languages:

But is there a general way to tell if a sentence is meaningful? There’s no traditional overall theory for that. But it’s something that one can think of ChatGPT as having implicitly “developed a theory for” after being trained with billions of (presumably meaningful) sentences from the web, etc.

What might this theory be like? Well, there’s one tiny corner that’s basically been known for two millennia, and that’s logic. And certainly in the syllogistic form in which Aristotle discovered it, logic is basically a way of saying that sentences that follow certain patterns are reasonable, while others are not.

Something else ChatGPT may have discovered are language’s ‘semantic laws of motion’, being able to complete sentences by following ‘geodesics’:

And, yes, this seems like a mess—and doesn’t do anything to particularly encourage the idea that one can expect to identify “mathematical-physics-like” “semantic laws of motion” by empirically studying “what ChatGPT is doing inside”. But perhaps we’re just looking at the “wrong variables” (or wrong coordinate system) and if only we looked at the right one, we’d immediately see that ChatGPT is doing something “mathematical-physics-simple” like following geodesics. But as of now, we’re not ready to “empirically decode” from its “internal behavior” what ChatGPT has “discovered” about how human language is “put together”.

So, the ‘hidden secret’ of successful large language models may very well be a combination of logic and geometry. Does this sound familiar?

If you prefer watching YouTube over reading a book, or if you want to see the examples in action, here’s a video by Stephen Wolfram. The stream starts about 10 minutes into the clip, and the whole lecture is pretty long, well over 3 hours (about as long as it takes to read What Is ChatGPT Doing … and Why Does It Work?).


an einStein

On March 20th, David Smith, Joseph Myers, Craig Kaplan and Chaim Goodman-Strauss announced on the arXiv that they’d found an ein-Stein (German for ‘one stone’), that is, a single tile that tiles the entire plane in uncountably many different ways, all of them non-periodic (that is, the pattern does not allow even a single translation symmetry).

This einStein, called the ‘hat’ (some prefer ‘t-shirt’), has a very simple form: you take the most symmetric of all plane tessellations, $\ast 632$ in Conway’s notation, and glue sixteen copies of its orbifold (or if you so prefer, eight ‘kites’) to form the gray region below:



(all images copied from the aperiodic monotile paper)

Surprisingly, you do not even need to impose gluing conditions (unlike in the two-piece aperiodic kite and dart Penrose tilings), but you’ll need flipped hats to fill up the gaps left.

A few years ago, I wrote some posts on Penrose tilings, including details on inflation and deflation, aperiodicity, uncountability, Conway worms, and more:

To prove that hats tile the plane, and do so aperiodically, the authors do not apply inflation and deflation directly on the hats, but rather on associated tilings by ‘meta-tiles’ (rough outlines of blocks of hats). To understand these meta-tiles it is best to look at a large patch of hats:



Here, the dark-blue hats are the ‘flipped’ ones, and the thickened outline around the central one gives the boundary of the ‘empire’ of a flipped hat, that is, the collection of all forced tiles around it. So, around each flipped hat we find such an empire, possibly with different orientation. Also note that most of the white hats (there are also isolated white hats at the centers of triangles of dark-blue hats) make up ‘lines’ similar to the Conway worms in the case of the Penrose tilings. We can break up these ‘worms’ into ‘propeller-blades’ (gray) and ‘parallelograms’ (white). This gives us four types of blocks, the ‘meta-tiles’:



The empire of a flipped hat consists of an H-block (for Hexagon) made of one dark-blue (flipped) and three light-blue (ordinary) hats, one P-block (for Parallelogram), one F-block (for Fylfot, a propeller blade), and one T-block (for Triangle) for the remaining hat.



The H,T and P blocks have rotational symmetries, whereas the underlying block of hats does not. So we mark the intended orientation of the hats by an arrow, pointing to the side having two or three hat-pieces sticking out.

Any hat-tiling gives us a tiling with the meta-tile pieces H,T,P and F. Conversely, not every tiling by meta-tiles has an underlying hat-tiling, so we have to impose gluing conditions on the H,T,P and F-pieces. We can do this by using the boundary of the underlying hat-block, cutting away and adding hat-parts. Then, any H,T,P and F-tiling satisfying these gluing conditions will come from an underlying hat-tiling.

The idea is now to devise ‘inflation’- and ‘deflation’-rules for the H,T,P and F-pieces. For ‘inflation’ start from a tiling satisfying the gluing (and orientation) conditions, and look for the central points of the propellers (the thick red points in the middle picture).



These points will determine the shape of the larger H,T,P and F-pieces, together with their orientations. The authors provide an applet to see these inflations in action.

Choose your meta-tile (H,T,P or F), then click on ‘Build Supertiles’ a number of times to get larger and larger tilings, and finally unmark the ‘Draw Supertiles’ button to get a hat-tiling.

For ‘deflation’ we can cut up H,T,P and F-pieces into smaller ones as in the pictures below:



Clearly, the hard part is to verify that these ‘inflated’ and ‘deflated’ tilings still satisfy the gluing conditions, so that they will have an underlying hat-tiling with larger (resp. smaller) hats.

This calls for a lengthy case-by-case analysis, which is the core of the paper and depends on computer verification.

Once this is verified, aperiodicity follows as in the case of Penrose tilings. Suppose a tiling is preserved under translation by a vector $\vec{v}$. As ‘inflation’ and ‘deflation’ only depend on the direct vicinity of a tile, translation by $\vec{v}$ is also a symmetry of the inflated tiling. Now, iterate this process until the diameter of the large tiles becomes larger than the length of $\vec{v}$ to obtain a contradiction.

Siobhan Roberts wrote a fine article Elusive ‘Einstein’ Solves a Longstanding Math Problem for the New York Times on this einStein.

It would be nice to try this strategy on other symmetric tilings: break the symmetry by gluing together a small number of its orbifolds in such a way that this extended tile (possibly with its reversed image) tiles the plane, and find out whether you discovered a new einStein!
