*The organizers thank Andrej for the word choice (& mindset) of “sandbox”.
“Sandbox” $\approx$ a minimal setup that helps with understanding.
NeurIPS page
Speakers: Ashok Vardhan Makkuva (EPFL), Bingbin Liu (Simons Institute), Jason Lee (Princeton)
Table of contents
Outline (with slides)
Introduction (10min)
intro.pdf
- Why sandboxes & structured data?
Part I: Representability (50min)
tutorial_part1_representation.pdf
- Architectures of interest: Transformers and RNNs.
- Tools for bounding the size of the construction
- Note: “upper/lower bounds” here refer to bounds on the size of the construction, as opposed to bounds on the set of representable functions, where the meaning of upper/lower is flipped (there, an “upper bound” means Transformers cannot represent anything more complicated).
- Upper bound: connection to automata (see the first sketch after this outline).
- Lower bound: circuit complexity (depth lower bound); communication complexity (width lower bound).
- Implications:
- Understanding architecture design: depth-width tradeoff.
- Comparing architectures: Transformer vs RNN (SSM).
- Improvements: Chain-of-Thought (see the second sketch below), hybrid architectures.
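To make the automata connection concrete, here is a minimal sketch in Python; it is our illustration rather than anything from the slides, and the function name `parity_rnn` is hypothetical. It shows the textbook direction of the upper bound: a finite automaton’s transition function can be baked into a constant-size recurrent update, here for the two-state parity DFA.

```python
def relu(z):
    return max(z, 0.0)

def parity_rnn(bits):
    """A constant-size ReLU RNN simulating the 2-state parity DFA.

    The hidden state h mirrors the automaton state:
    h = 0 for even parity, h = 1 for odd parity.
    """
    h = 0.0  # start state: even parity
    for x in bits:
        # XOR(h, x) = ReLU(h - x) + ReLU(x - h), so one recurrent
        # step with two ReLU units implements the DFA transition.
        h = relu(h - x) + relu(x - h)
    return int(h)  # "accept" iff the parity is odd

assert parity_rnn([1, 0, 1, 1]) == 1  # three ones -> odd parity
```

The same recipe generalizes: a DFA with $k$ states can be tracked with a one-hot state vector of size $k$ and one linear-plus-ReLU transition per step, which is the sense in which automata yield size upper bounds on recurrent constructions.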
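Similarly, a toy view of why Chain-of-Thought helps (again our own sketch; `parity_with_cot` is a hypothetical name): externalizing intermediate results as tokens turns one globally hard function into a sequence of locally trivial steps, each of which a small fixed-size model can emit.

```python
def parity_with_cot(bits):
    """Scratchpad-style parity: each 'thought' token depends only on
    the previous thought and one input bit, so every step is trivial
    even though the end-to-end function (parity) is not."""
    thoughts = [0]  # scratchpad starts at "even"
    for x in bits:
        thoughts.append(thoughts[-1] ^ x)  # one easy step per token
    return thoughts  # the last entry is the answer

print(parity_with_cot([1, 0, 1, 1]))  # [0, 1, 1, 0, 1] -> answer 1
```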
(10min Q&A / break)
Part II: Learning and optimization (50min)