*The organizers thank Andrej for the word choice (& mindset) of “sandbox”.
“Sandbox” $\approx$ a minimal setup that helps with understanding.
NeurIPS page
Speakers: Ashok Vardhan Makkuva (EPFL), Bingbin Liu (Simons Institute), Jason Lee (Princeton)
Table of contents
Outline (with slides)
Introduction (10min)
intro.pdf
- Why sandboxes & structured data?
Part I: Representability (50min)
tutorial_part1_representation.pdf
- Architectures of interest: Transformers and RNNs.
- Tools for bounding the size of the construction
- Note: “upper/lower bounds” here refer to bounds on the size of the construction, as opposed to bounds on the set of representable functions, where the meaning of upper/lower is flipped (there, an “upper bound” means Transformers cannot represent anything more complicated).
- Upper bound: connection to automata (see the first sketch after this outline).
- Lower bound: circuit complexity (depth lower bound); communication complexity (width lower bound).
- Implications:
- Understanding architecture design: depth-width tradeoff.
- Comparing architectures: Transformer vs RNN (SSM).
- Improvements: Chain-of-Thought (see the second sketch below), hybrid architectures.
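To make the automata connection concrete, here is a minimal sketch in Python; it is our illustration rather than anything from the slides, and the function name `parity_rnn` is hypothetical. It shows the textbook direction of the upper bound: a finite automaton’s transition function can be baked into a constant-size recurrent update, here for the two-state parity DFA.

```python
def relu(z):
    return max(z, 0.0)

def parity_rnn(bits):
    """A constant-size ReLU RNN simulating the 2-state parity DFA.

    The hidden state h mirrors the automaton state:
    h = 0 for even parity, h = 1 for odd parity.
    """
    h = 0.0  # start state: even parity
    for x in bits:
        # XOR(h, x) = ReLU(h - x) + ReLU(x - h), so one recurrent
        # step with two ReLU units implements the DFA transition.
        h = relu(h - x) + relu(x - h)
    return int(h)  # "accept" iff the parity is odd

assert parity_rnn([1, 0, 1, 1]) == 1  # three ones -> odd parity
```

The same recipe generalizes: a DFA with $k$ states can be tracked with a one-hot state vector of size $k$ and one linear-plus-ReLU transition per step, which is the sense in which automata yield size upper bounds on recurrent constructions.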
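Similarly, a toy view of why Chain-of-Thought helps (again our own sketch; `parity_with_cot` is a hypothetical name): externalizing intermediate results as tokens turns one globally hard function into a sequence of locally trivial steps, each of which a small fixed-size model can emit.

```python
def parity_with_cot(bits):
    """Scratchpad-style parity: each 'thought' token depends only on
    the previous thought and one input bit, so every step is trivial
    even though the end-to-end function (parity) is not."""
    thoughts = [0]  # scratchpad starts at "even"
    for x in bits:
        thoughts.append(thoughts[-1] ^ x)  # one easy step per token
    return thoughts  # the last entry is the answer

print(parity_with_cot([1, 0, 1, 1]))  # [0, 1, 1, 0, 1] -> answer 1
```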
(10min Q&A / break)
Part II: Learning and optimization (50min)