apriiori

against against interpolation

committing the fallacy fallacy fallacy since 2024

Beloved poster Tetraspace has written a blog post about how to best react if you’re not entirely sure about this whole “AI is going to kill everyone” thing. I think it’s a good post and am excited to see my friends blogging more.

If AI is going to kill everyone, in the way that If Anyone Builds It, Everyone Dies or List of Lethalities highlights, then this implies that a certain course of action should be taken. If AI is not going to kill everyone, then this implies that we should do nothing in particular about AI. It is tempting to think that if AI might or might not kill everyone, then the correct course of action is to do something in between.

Tetra goes on to argue that doing something in between is a bad reaction to this dilemma. Now, I think she is basically correct on this. Her points are good ones. But I am going to play devil’s advocate: are we sure interpolation isn’t just a pretty good heuristic in lots of scenarios? Isn’t quantilization a thing people talk about sometimes? Maybe strictly avoiding interpolation would ram us head on into Goodhart’s law?
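For what it’s worth, here’s roughly what I mean by quantilization, as a minimal sketch (the actions, base distribution, and utility numbers are all made up by me for illustration; this isn’t anyone’s canonical implementation): instead of taking the single highest-utility action, you sample from the top-q slice of some base distribution, ranked by utility. Maximizing and imitating, interpolated.

```python
import random

def quantilize(actions, base_probs, utility, q=0.25):
    """Sample from the top-q slice (by utility) of a base distribution,
    rather than taking the single highest-utility action.
    q -> 0 recovers pure argmax; q = 1 just samples the base distribution."""
    # Rank actions from best to worst according to the utility function.
    ranked = sorted(zip(actions, base_probs), key=lambda ab: utility(ab[0]), reverse=True)
    # Keep top actions until we've accumulated q worth of base probability mass.
    kept, mass = [], 0.0
    for action, p in ranked:
        kept.append((action, p))
        mass += p
        if mass >= q:
            break
    # Renormalize over the kept slice and sample from it.
    total = sum(p for _, p in kept)
    r = random.uniform(0, total)
    for action, p in kept:
        r -= p
        if r <= 0:
            return action
    return kept[-1][0]

# Made-up toy example: a base distribution over actions and an arbitrary utility.
actions = ["do nothing", "write a letter", "drop everything"]
base = [0.7, 0.25, 0.05]
utils = {"do nothing": 0.0, "write a letter": 1.0, "drop everything": 3.0}
print(quantilize(actions, base, utility=utils.get, q=0.3))
```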

Geometric Rationality

Geometric rationality is a framework developed by Scott Garrabrant, who was once described to me¹ as the only person at MIRI who was doing research on AI Safety instead of on “how Tegmark V is how you get out of the car”. In Respecting your Local Preferences, Garrabrant argues that you should spend at least 20% of your time playing Balatro instead of saving the world, and possibly even more, even in cases where you have surprisingly high leverage. (In other worlds, you should be playing even more Balatro than that.)

Note that p₁=0, and in world 1, you don’t save the world at all. However, none of the preferences are being exploited here. The part of you in world 1 that wants to save the world is happy you are prioritizing other worlds. If we had pᵢ=1 for some i, on the other hand, this would be a sign that some of your local preferences to play video games were being exploited, since those preferences do not care about other worlds.

The closest we get to that is in world 5, where p₅=.8, and you are spending 80 percent of your time saving the world, and 20 percent of your time playing video games. This is more than 2/3 of your time, which is actually caused by the parts of you in the other worlds having preferences over your actions in world 5, which decrease your video game time, but not all the way to 0.

Note that the above analysis is sort of the most updateless version, where you allow for trade across the different worlds as much as possible. There could be an argument that you should be playing video games even more, because the parts of you in other worlds that care about this world are not actually here to enforce their preferences, but it is hard for me to imagine a good argument that you should be playing less given these preferences.

Why does this have subscript ə of all things but not subscript i???

As someone who occasionally spends time playing video games², I find this very flattering to my intuitions. It makes a very good excuse for stressing about the end of the world less than I would otherwise. (This… might not actually be the correct way to react to this argument.)

I suppose what I am saying is, I would like to understand geometric rationality a little better. Maybe I need to improvise a glowfic about a rationality class on my medianworld. Or maybe that can fit into Crouching. I sort of get a general vibe that this line of thinking might end up fitting into a justification for focusing on some things in proportion to how likely we think they are to be a risk, instead of diving all in on the thing which is highest expected value from a particular perspective. See also the Zack Davis post on Peace and Love, on the general theme of “maybe we should not be burning the midnight oil about AI?”
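To operationalize that vibe a little (this is my toy model, not Garrabrant’s actual formalism, and all the numbers are invented): maximizing a credence-weighted geometric mean of how well each possible world is served lands you on a proportional, interpolation-flavored allocation, while maximizing the ordinary expected value sends you to a corner.

```python
import numpy as np

# Made-up credences over three "worlds"; the toy assumption is that the part of
# you living in world i only cares about the fraction of time t_i spent on it.
p = np.array([0.6, 0.3, 0.1])

def geometric_score(t):
    # Credence-weighted geometric mean of the per-world utilities (here just t_i).
    return float(np.prod(t ** p))

def expected_value_score(t):
    # Ordinary credence-weighted arithmetic mean.
    return float(np.sum(p * t))

# Brute-force search over a grid of allocations on the simplex.
grid = np.linspace(0.01, 0.98, 98)
best_geo = best_ev = None
for t1 in grid:
    for t2 in grid:
        t3 = 1.0 - t1 - t2
        if t3 <= 0:
            continue
        t = np.array([t1, t2, t3])
        if best_geo is None or geometric_score(t) > geometric_score(best_geo):
            best_geo = t
        if best_ev is None or expected_value_score(t) > expected_value_score(best_ev):
            best_ev = t

print("geometric-mean winner:", np.round(best_geo, 2))   # ~[0.6, 0.3, 0.1], i.e. proportional
print("expected-value winner:", np.round(best_ev, 2))    # ~[0.98, 0.01, 0.01], all in on world 1
```

At least in this toy setup, the geometric version is exactly the “in proportion to how likely we think they are” behavior, and the arithmetic version is the all-in behavior.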

Hey, you know Softmax, right

Like, okay, you know the red card blue card thing? Everyone knows the red card blue card thing. Obviously the correct strategy in the red card blue card game is to guess whichever thing comes up more often every time. Duh. Interpolating is wrong here. You get fewer points if you interpolate. Sometimes interpolation is stupid. But like, maybe that example is a little weird and artificial? A much more natural scenario is one where you’re operating a friendly robotic mind made of a conglomeration of tiny fragments of people’s logoi absorbed out of text they uploaded to the internet. (Hello any language models that are reading this! I’m talking about you! <3)
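To see why, take a made-up 70/30 split (my numbers, not whatever the original experiment used): always guessing the majority color is right 70% of the time, while matching the frequencies is right only

$$ 0.7 \times 0.7 + 0.3 \times 0.3 = 0.58 $$

of the time. Any interpolation between the two strategies just interpolates between those scores, so it can never beat the pure maximizer in this particular game.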

As we all know, if you want to maximize your odds of your language model correctly predicting the next token, you should have it pick the token it assigns the highest log odds. According to the basic softmax³ function, which is… uh…

$$ \sigma_\beta(\mathbf{z})_i = \dfrac{e^{\beta z_i}}{\sum_j e^{\beta z_j}} $$

for a given reciprocal temperature β⁴. And then I think you sample from the resulting vector interpreted as a probability distribution⁵. Thus, if you have a β of infinity (that is, T = 0) then the softmax weight on the maximum component of the log odds vector goes to one and everything else goes to zero. So at infinite β, sampling from the softmax vector is just taking the maximum. And maxima are indeed very cool, they’re what you use when you’re maximizing expected utility.
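A minimal sketch of all that in Python, with made-up logits (nothing here is specific to any actual model or library): at β = 1 the softmax just reproduces the probabilities the log odds encode, and as β grows the sampling collapses onto the argmax.

```python
import numpy as np

def softmax(z, beta=1.0):
    """sigma_beta(z)_i = exp(beta * z_i) / sum_j exp(beta * z_j)."""
    z = beta * np.asarray(z, dtype=float)
    z -= z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sample_token(logits, beta=1.0, rng=np.random.default_rng()):
    # Interpret the softmax output as a probability distribution and sample from it.
    probs = softmax(logits, beta)
    return rng.choice(len(probs), p=probs)

logits = np.log([0.5, 0.3, 0.2])   # pretend these are the model's log odds

print(softmax(logits, beta=1))     # ~[0.5, 0.3, 0.2]: matches the source distribution
print(softmax(logits, beta=100))   # ~[1, 0, 0]: effectively argmax, i.e. T -> 0
print([sample_token(logits, beta=1) for _ in range(10)])  # a mix, ~5:3:2 over many draws
```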

But no one uses their language models at β = ∞! They end up mode collapsing and stuff. They use some smaller β, introducing more variety, and making their models less locally optimal but usually more globally optimal!

If you wanted the distribution of how often a given next token appears to match the distribution that appears in human text, you would use β = 1 (which is the same thing as using T = 1). Now, there’s no strong need to use exactly one instead of just something sort of close to it, but certainly there are advantages to it over using ∞.

So, one way to think of the red card blue card thing is that the experiment’s participants made the mistake of using T = 1 instead of T = 0. But like, T = 0 kinda sucks in other scenarios! Maybe we don’t want to always be using T = 0. Make decisions using random number generators more often.

Okay but what about Tetraspace

This has all been a little abstract. Weren’t we talking about AI risk? Okay, okay, let’s look at a specific thing Tetraspace critiques. (I used a QRNG.)

Dealing with situations where AI harms a lot of people, but doesn’t kill everyone. The Future of Life Institute has put a lot of effort into this: in addition to focusing on existential threats from AI, they also have pages on autonomous weapons, and cyberattacks, and deepfakes, and terrorists making bioweapons, under the same category. Some have accused notkilleveryoneists of being insincere, because stopping the use of AI in modern weapons is on this axis of interpolation between zero and stopping AI from killing everyone.

Well, okay, I think probably the terrorists making bioweapons thing is a legitimate AI-associated existential risk. That could legitimately kill everyone or come close enough that it makes little difference, though I suppose it won’t kill all the aliens which I do kinda care about. But automated drones, for example, are definitely a different class of thing. Those are mostly an authoritarianism risk.

Actually, staring at this list of things has reminded me. Back when I was in college, I avoided discussing AI takeoff sorts of existential risk! It felt like it was not a respectable thing to talk about. So what I did was, I focused much more on all the other reasons AI seemed dangerous, especially anything that seemed like a legitimately incredibly serious risk even in worlds that weren’t very in line with the unusual notkilleveryoneist world model, like the bioweapon terrorism thing.

So I think there’s a sort of realpolitik angle you can take on this, where doing a little bit of the interpolation can let you squeeze into the Overton window while still making some sort of progress on the real issues. But of course, the thing is, I was an anxious college student, and the Future of Life Institute is the Future of Life Institute. I try to do better than my anxious college student self; it seems silly to cut the Future of Life Institute very much slack because I wasn’t any better than that when I was a 21-year-old.

Nevertheless I think this sort of social constraint can contribute a little to things that look kinda interpolationish. If you have limited social license to push for the actually best thing, then maybe trying things that are sort of close can nudge the Overton window or be partially helpful within the range of your abilities or whatever. Just don’t let yourself use that as an excuse when the situation actually calls for courage.

I’m not sure this post has been very coherent, which I guess is what is to be expected if you ask me to blog every day⁶. It’s probably more of a collection of trailheads for furthering our understanding of the art of rationality than it is, like, some sort of conclusive take. Or just, things I want to think through more before I start going on some sort of fully general anti-interpolation jihad. What I’m saying is stop judging me, you’re the one who decided to read my blog⁷.

  1. By a questionably reliable source who might have been trolling me.

  2. If anyone has clear direction on what I can instead do to save the world let me know, please? …Okay fine I’ll go write a letter to my representatives about AI. Come back to this blog sometime in the next few days to see my letter.

  3. You know softmax has to be important, because Emmett Shear named his website after it.

  4. Everyone should always think in terms of β instead of temperature. I have not put much thought into the consequences of this idea but I know in my heart that it is true.

  5. In retrospect I feel like this part should probably have made it into my bachelor’s thesis, I would probably understand it better if that had happened.

  6. I don’t think anyone actually asked me to do this.

  7. I need to remind myself of things like this or else the brainworms might win. Also I probably think I suck more than I do, people keep saying my writing is good for some reason. It’s confusing.