
Substrate: Intuitions

This post and the related sequence were written as part of the AI Safety Camp project “MoSSAIC: Scoping out Substrate Flexible Risks.” This was one of the three projects supported by, and continuing the work of, Groundless. Specifically, it develops one of the key concepts referred to in the original MoSSAIC (Management of Substrate-Sensitive AI Capabilities) paper (sequence here). Matthew Farr and Aditya Adiga co-mentored the project; Vardhan, Vadim Fomin, and Ian Rios-Sialer participated as team members.

In a previous post and paper, we informally sketched out a definition of substrate as follows:

“the (programmable) environment in which a system is implemented. In other words, it is the essential context that enables an algorithm to be implemented beyond the whiteboard.”

Or, more informally,

“that (layer of abstraction) which you don’t have to think about.”

We gave several examples of differences in this programmable context producing differences in measurable and relevant aspects of computation. These included the adoption of GPUs allowing networks to train at scale, or how quantum computers run entirely different algorithms from their classical counterparts.

In the following posts, we expand upon this concept more thoroughly, giving (i) an informal, intuitive introduction to substrate and its role in computation, (ii) some case studies that argue for its salience in cybersecurity and AI safety, and (iii) our initial formal characterization.

Substrates (from the ground up)

There is a principled sense in which the physical and architectural substrate of a computation shapes its behavior in ways that are invisible at the purely computational level.

We start with pure, unembedded mathematics. These are numbers as Plato intended them. In this case, there is no substrate, no wider context needed to implement the numbers.
(We neglect the embedded nature of the cognition in which the numbers are placed.) Thus, we can say that 3 = 3 in all possible respects within this Platonic reality.

Now we consider a simple embedding of these Platonic ideals in physical reality. We write these numbers on a sheet of paper. Here, the paper is part of the substrate. This might seem trivial in the above example, but in more complex calculations, the paper becomes an essential part of what makes the computation possible. Note that we can now start to characterise differences between these two numbers: they live at different locations on the sheet.

To see how these differences become increasingly relevant in computation, let’s now consider two separate pieces of paper. One is on my desk; the other is kept at my local library.

These two 3s, despite having the same mathematical/formal meaning, perform very differently when I want to use them in computation. If I’m working at my desk and want to compute something using the number written on the sheet of paper next to me, I can do so very quickly. However, if I need the number on the other sheet, the one at the library, my calculation will take considerably longer.

Computational substrates

Now we generalize this to actual modern computer systems. Behind the layers of abstraction and useful ergonomics of modern computers, we have something remarkably similar to the above example of sheets of paper located across the city. Instead of sheets of paper, we have addresses in the computer’s memory. These are updated and retrieved for computational work via code, itself a layer of abstraction that hides the various stages of assembly sitting between the user and the physical changes she’s making to her computer. Instead of 3s located at a desk and in the local library, we now have 3s located at different memory addresses.
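The “same number, different location” point can be made concrete in a small, CPython-specific sketch: in CPython, `id()` returns an object’s memory address, and building the integers at runtime via `int("…")` prevents the interpreter from folding the two literals into a single shared object.

```python
# Two formally equal numbers, stored at different memory addresses.
# CPython-specific sketch: id() returns an object's memory address there.
a = int("1000000")  # built at runtime, so a fresh object is allocated
b = int("1000000")  # another fresh object with the same value

print(a == b)          # True: formally equivalent
print(id(a) == id(b))  # False in CPython: different addresses
```

Formally, `a` and `b` are indistinguishable; physically, they occupy distinct locations in memory, just like the two 3s on separate sheets of paper.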
The locations of these have a noticeable, and often exploitable, effect on the computation performed.

For instance, for those unfamiliar, a CPU has a memory hierarchy structured roughly like this:

The L1 cache is the smallest and most immediate, with a latency of ~1 ns. Think of it as a book open on your desk: small, but accessible very quickly.

The L2 cache is next. It is larger but takes more time to access (~3–5 ns). Think of it as a stack of closed books on your desk: they contain more information, but you have to open them and find the correct page to access it.

The L3 cache is larger still. Multiple cores share it, and its latency is ~10–20 ns. Think of it as a bookshelf in your room: you have to leave your desk and search the shelves to find the information you need.

And so on…

The point is this: whether a 3 is stored in the L1 cache or the L3 cache makes a difference to the computation performed, despite the formal equivalence. Each such difference is trivial by itself, on the order of a few nanoseconds. But as we scale systems to increasingly complicated computations, these differences count.

Engineers have come up with various tricks to exploit these differences and improve performance.

MergeSort vs QuickSort. Two algorithms of the same asymptotic complexity perform differently in real systems. QuickSort partitions arrays in place and sequentially, working on local memory. MergeSort accesses scattered data across caches during merging, causing more frequent cache misses.

Data-oriented design. In object-oriented code, each entity stores its fields together in memory. A physics system iterating over positions must load entire objects to update single fields, wasting cache lines on irrelevant data. Data-oriented design stores each field type contiguously: all positions in one array, all velocities in another, and so on.
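The contrast between the two layouts can be sketched in plain Python, using a hypothetical particle system. (Python lists store pointers to boxed objects, so the real cache benefits of the structure-of-arrays layout only materialize in languages like C or in NumPy; this is a layout illustration, not a benchmark.)

```python
# Hypothetical particle system illustrating the two memory layouts.

# Array-of-structures (object-oriented): each entity's fields live together.
class Particle:
    __slots__ = ("x", "y", "vx", "vy")
    def __init__(self, i):
        self.x, self.y = float(i), 0.0
        self.vx, self.vy = 1.0, 2.0

N, dt = 10_000, 0.1
aos = [Particle(i) for i in range(N)]

# Structure-of-arrays (data-oriented): each field type stored contiguously.
xs  = [float(i) for i in range(N)]
ys  = [0.0] * N
vxs = [1.0] * N
vys = [2.0] * N

# One physics step under each layout.
for p in aos:                 # touches whole objects just to update x and y
    p.x += p.vx * dt
    p.y += p.vy * dt

xs = [x + vx * dt for x, vx in zip(xs, vxs)]   # streams through one array
ys = [y + vy * dt for y, vy in zip(ys, vys)]   # ...then another

# Both layouts compute identical results; only the memory arrangement differs.
assert all(abs(p.x - x) < 1e-12 for p, x in zip(aos, xs))
assert all(abs(p.y - y) < 1e-12 for p, y in zip(aos, ys))
```

The two versions are formally equivalent programs; the data-oriented one is laid out so that each update pass walks a single contiguous field, which is exactly the access pattern caches reward.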
Iteration then streams linearly through memory, which can be 2–10× faster.

To summarize: two entities can be formally equivalent yet meaningfully different when implemented in real computation. Above we point at the performance aspects, though other differences include security/vulnerability, interpretability, stability, and so on. Whilst these differences are often individually trivial, at scale they accumulate, having a meaningful impact on the tractability of certain formal entities (numbers at locations, algorithms using certain caches).

We use the term “substrate” to capture this essential context.

