nor's blog

By nor, 6 months ago, In English

As usual, content here: https://nor-blog.codeberg.page/posts/2024-07-28-thoughts-on-the-present-and-future-of-ai-and-problem-solving/. Consider subscribing to the RSS feeds for more posts!

Bloopers: while I was trying to make the point that AI is good for basic things like coming up with a TL;DR by asking ChatGPT to summarize my post, AI disappointed me yet again with the clearly wrong and generic summary (and its references, of course).

Or is it secretly sentient with a sense of humour? Read more to find out I guess...

image

  • Vote: I like it
  • +63
  • Vote: I do not like it

»
6 months ago, # |
  Vote: I like it 0 Vote: I do not like it

NEAR AI is future!!!!

»
6 months ago, # |
  Vote: I like it 0 Vote: I do not like it

NEAR AI is future!

»
6 months ago, # |
  Vote: I like it +3 Vote: I do not like it
Chatgpt broke up with your blog
»
6 months ago, # |
  Vote: I like it 0 Vote: I do not like it

NEAR AI is future!!

»
6 months ago, # |
  Vote: I like it 0 Vote: I do not like it

I wonder if there're anyone REALLY read your blogs / essays with that long

  • »
    »
    6 months ago, # ^ |
      Vote: I like it +6 Vote: I do not like it

    Yes

  • »
    »
    6 months ago, # ^ |
      Vote: I like it +12 Vote: I do not like it

    yes

    but not me

  • »
    »
    6 months ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    I get pretty insightful conversations (quite unlike most comments on this comments section) out of comments over email, mostly from people outside the competitive programming community. Initially I was planning to not post this on CF, but talking to some of my competitive programmer friends made me decide to write about AlphaCode in my post too, and post it here since CF has had quite a few discussions on this topic (and lots of people who misunderstand AI).

»
6 months ago, # |
Rev. 3   Vote: I like it +4 Vote: I do not like it

Main points from my comment in a group discord:

does lack of thinking in axiomatic logic imply lack of intelligence? i think cats for example have some level of intelligence, no where near human level, but they also cannot perform similar abstract axiom-based reasoning

from skimming, i def agree with deepmind has been elusive and overembellishing stuff lately at least but i think a big part of his cynisism is a fundamentally different believe in what qualifies as logic, and therefore what counts as a good step in simulating logic

there are a lot of statements throughout like

We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.”

but more and more i believe generally human problem solving is closer to a kind of brute force than not, just with some good heuristics baked in and u hope after long enough u mash together and tweak things into something slightly new and call it original. what does nor exactly want to be satisfied a program is also performing "human-like original discovery"

i think the impressive part of alphageometry and alphaproof if it has any advantage over purely mechanical like wu's method is that we're now able to integrate small parts of non-sequential heuristics just like human problem solving. to me that means we're making big step towards human solving, but to nor it seems by itself that is non-factor

  • »
    »
    6 months ago, # ^ |
    Rev. 3   Vote: I like it +3 Vote: I do not like it

    Intelligence is definitely something that can be debated upon to no end, and you might have noticed that I left off that discussion by saying that one of the major aspects of intelligence consists of being able to incorporate logic rigorously into what the model says, and being careful about adversarial/out-of-sample inputs (the latter is especially important, and animals have this part too). I did not call this constraint "human-like reasoning" here because, well, that's easily misinterpretable to be saying that the AI does reasoning like a human in its methods.

    I don't say that AI must reason like a human in the sense of a Turing test. I am saying that its reasoning should hold up to the high standards we have set for it. People involved in the IMO Grand Challenge understand this, and I think it is a very reasonable constraint on a machine that claims to be AGI. At the very least, reasoning must be able to encompass all the math we have, including reasoning with different axioms and systems of logic. Of course, I don't see AI becoming the next Grothendieck yet, but there is some value in having an AI that can "understand" math in some sense, and to be able to do that, it must have a system for logic that never produces a wrong conclusion (but can produce conjectures and mark them as such).

    Coming up with a new language (abstractions, logic systems and such) in math is something I hope AI is able to achieve in the future though it does not seem anywhere close now — which is a part of the reason why I suggested, as a first step, a language (something like Cur) for a proof assistant that allows you to build smaller domain specific languages for the task, because it will help the model learn better meta strategies on a higher level of abstraction (the success of abstraction is clear from stuff like category theory, software engineering and so on).

    As far as the methods are concerned, one of my main suggestions was that the methods are already at their limits, and we need more directed search, somehow, to be able to deal with the super large branching factor. Considering how the attention mechanism was derived from a high-level understanding of how words fit together in a context, in a manner that was arguably human-motivated, why not use the lowest high-level heuristics to solve problems?

    It might seem like patchwork — we are trying to patch the inability of proof engines to "generate" transient theorems that will help the final proof. However, my claim is that learning this is in fact half of the remaining part of reasoning that we can't capture using current methods. There are, somewhere out there, some building blocks (in the sense of fundamental nodes of a graph) — explicit or encoded in an architecture, that will allow us to construct generalized policies for traversal in order to discover more of said non-sequential heuristics better in the first place.

    Purely mechanical Wu's method has been adapted with things like deductive engines (DD+AR approaches) that use databases and algebraic rules to give very human readable proofs — so it is not really the AI that brings the human heuristics. I think it's more an issue of flawed accounting on their part, that I wanted to set straight while talking about AlphaGeometry.

»
6 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Guess I should point out that the chatgpt browsing function is off (fortunately I never used it). If you simply copy paste your article the summary turns out fine.

Result
  • »
    »
    6 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Another funny thing is that ChatGPT does support sending additional word files, but it doesn't seem to be that competent either. I told it the other day to summarize the History material that my teacher gave me (the file contains no image, no additional symbols, just pure text, mind you, so it probably doesn't require that much computation power to process), and it just spit out pure gibberish.

»
6 months ago, # |
  Vote: I like it 0 Vote: I do not like it

You mention LLMs are bad at counting and arithmetics, and you say

and that is simply one consequence of their failure to encode basic axioms (in any of their many equivalent forms).

What kind of axioms do you mean here? It seems to me it's more of a problem of the tokenization; 'cause I have the feeling that the structure of composed attention is ideal for some arithmetics (at least with a fixed number of digits). For instance, for multiplication, let's say you have two numbers A = a0 + a1*x^1 + a2*x^2 and B = b0 + b1*x^1 + b2*x^2 (I'm assuming the coefficients of the polynomials are the tokens produced by the tokenizer).

Then, a single layer of cross attention should be able to learn the product of all pairs {a_i * b_j * x^{i+ j}} of digits, just because it naturally correlates each pair of tokens. Perhaps you'd need another layer to permute the pairs in the right positions and then the model then should be able to learn addition on the corresponding powers of x. Addition in particular has been partially studied in mechanistic interpretability (https://www.youtube.com/watch?v=OI1we2bUseI)

Obviously, even if we can simulate permutations through standard DL operations, it doesn't mean a NN can learn that with the standard training methods on lots of data. But intuitively, it seems to me they are possible to do.

Probably something on the realm of neural algorithmic reasoning (https://arxiv.org/pdf/2105.02761) is needed there.

The same for combinatorics. Many problems in combinatorics are easily modelled through some recurrence which competitive programmers call Dynamic Programming (ignoring hard GF problems or stuff with Symmetry Groups, ...). Papers like this one tackle learning classical algorithms like Bellman-Ford's (https://arxiv.org/pdf/2203.15544). This algorithm is applying arithmetics over a graph in the tropical semi-ring. It seems to me that similar ideas could be applied to other "arithmetics", moving us from optimisation problems to counting problems.

Lastly, I wonder what are your takes on the fields of Geometric Deep Learning and the more recent Categorical Deep Learning.

  • »
    »
    6 months ago, # ^ |
    Rev. 3   Vote: I like it +10 Vote: I do not like it

    Thanks for the greatly detailed response! I'll try to address your concerns as much as possible.

    What kind of axioms do you mean here?

    I mean basic math axioms, like those for arithmetic. You can extend this to rules that you can derive for different operations (as in, base 10 addition for example).

    I have the feeling that the structure of composed attention is ideal for some arithmetics (at least with a fixed number of digits).

    This is correct. And GPTs seem to have some support for addition — with the correct embeddings that is (for example, this paper talks about these).

    There are a few issues with such arguments when generalizing this to multipurpose models that claim emergent behaviour, though:

    • Just because something is expressible, does not mean that it will be learnt as easily. We generally need the right representations for things (people who say the model will learn representations automatically — refer to the paper I just linked). ML models are not oracles that will magically learn everything you give them, we need to treat them like a kid and train them on things represented in ingestible forms.
    • When you have too many moving parts, it is not clear what parts of the model capacity are being used for what, and performance can deteriorate on putting them together. In most cases I have seen, trying to get one thing to do too much usually leads to the model becoming worse at everything. So a model might be able to do 10 tasks well when tuned on each of them, but not all of them at once unless you train them separately and stack them on top of each other.
    • Training algorithms have limitations. There are lots of papers you can find on hardness bounds on learning the set of expressible functions by a neural network, even if the architecture is relatively simple (for example, consider this paper). The common argument is that weights have some properties that evade such learning issues generally (and are not perfectly random), but this paper shows that with high probability, there is no generic property that holds with high probability in most random networks and allows efficient learning. The trouble begins when you combine that with the fact that if your LLM was able to reason perfectly, it could hypothetically solve the Halting problem, unless it refuses to give us an output for some superset of undecidable tasks. And we all know that LLMs are not good at not hallucinating.
    • Peano arithmetic (equivalent to elementary school arithmetic) is Turing complete. So as you said, the constraint of fixed width comes into the picture. In my blog too, I mention that for these models to be able to prove things, it is necessary for them to have arbitrarily large context lengths. Same reason. Otherwise the model is just a very large DFA (or perhaps pushdown automaton) with complex rules.

    Obviously, even if we can simulate permutations through standard DL operations, it doesn't mean a NN can learn that with the standard training methods on lots of data. But intuitively, it seems to me they are possible to do.

    I agree. Back in 2021 when there was only GPT2, I was wondering — what if we make an LLM that emulates a Turing machine? Then in 2023, there came this paper which proved that LLMs, if given memory storage, can emulate Turing machines (the language models are admittedly brittle, by design though, and I still think this is a good direction). There are of course things here that can be improved, but that's a discussion for a later time.

    Probably something on the realm of neural algorithmic reasoning (https://arxiv.org/pdf/2105.02761) is needed there.

    That is quite similar to (but is a specialization of) my suggestions for different solver nodes for different reasoning "types" and it does seem promising. From my understanding, there is already a decent amount of headway in this direction (for instance, here, here, here and here). However, using this for doing basic arithmetic seems like an overkill, though I acknowledge that if we want that can do non-trivial arithmetic and reasoning, something more general is needed.

    Many problems in combinatorics are easily modelled through some recurrence which competitive programmers call Dynamic Programming (ignoring hard GF problems or stuff with Symmetry Groups, ...).

    I think there is a clarification in order here — when I was referring to combinatorics, I was referring to it in the IMO sense, and not the CF sense. The combinatorics section in the IMO shortlist is basically a place for all the ad hocs and research problems to get dumped, and as a result, that part of the shortlist is notoriously hard for automated theorem provers.

    It seems to me that similar ideas could be applied to other "arithmetics", moving us from optimisation problems to counting problems.

    I did not quite understand this part, can you elaborate?

    Lastly, I wonder what are your takes on the fields of Geometric Deep Learning and the more recent Categorical Deep Learning.

    Well, I was convinced in uni that graph ML will be the future, simply because it has such cool and general properties. Geometric DL uses the fact that graphs are just discretized manifolds as well as discrete structures with an absurd amount of expressive power, and ends up doing some really cool stuff with all that power, so I think it will do a lot of non-trivial things in the future. In fact, my recommendation in the blogpost was partly inspired from graph learning and partly from my blog post on learning (and later turned out to be some sort of meta RL).

    As far as categorical deep learning is concerned, theoretically it looks really cool and sensible, and should greatly help in motivating high level architectures based on what properties we want them to have. However I feel that we tend to overestimate what we know about data, and it might not be the most practical for us at the moment (just as theoretical ML is lagging behind practical ML). And even though this is the approach I would like the community to take, I don't think it is the approach that will be the easiest (albeit possibly being the one towards AGI with the least detours). As an aside, UMAP is a pretty concrete application of category theory to ML stuff, and I would recommend reading it to anyone who knows category theory and is curious about its applications in ML.