Thoughts on the Eliezer vs. Hotz AI Safety Debate


The debate was quite fun to watch, but also frustrating.

What irked me about the debate—and all similar debates—is that they fail to isolate the disagreements. 90% of the discussion ends up being heat instead of light because they’re not being disciplined about:

  1. Finding the disagreement

  2. Addressing their position on that disagreement

  3. Listening to a rebuttal

  4. Deciding if that’s resolved or not

  5. Either continuing on 2-4 or moving on to the next #1

Instead what they do (they being the General They in these debates) is produce 39 different versions of #2, back and forth, which is dazzling to watch, but doesn’t result in a #4 or #5.

It feels like a Chinese martial arts movie from the ’80s after you’ve seen a lot of MMA. Like, why don’t you just HIT him? Why all the extra movements?

I think we can do better.

How I’d characterize and address each of their positions

I’m not saying I’d do better in a live debate with either of them. I could very well get flustered or over-excited and end up in a similar situation—or worse.

But if I had time and notes, I’d be able to do much better. And that’s what I have right now with this written response. So here’s what I see from each of them.

Hotz’ arguments

I’m actually not too clear on Hotz’ argument, to be honest, and that’s a problem for him. It’s also why I think he lost this debate.

He’s flashy. And super smart, obviously, but I feel like he was just taking sniper shots from a distance while mobile.

Wait, I think I have it. I think he’s basically saying:

Hotz Argument 1: We’ll have time to adjust

  1. AI’s intelligence will not explode quickly

  2. Timing matters a lot because if it moves slowly enough we will also have AIs and we’ll be able to respond and defend ourselves

  3. Or, if not that, then we’d be able to stage some other defense

My response to this is very simple, and I don’t know why Eliezer and other people don’t stay focused very clearly on this.

  1. We just accidentally got GPT-4, and the jump from GPT-2 to GPT-4 was a few years, which is basically a nanosecond

  2. The evolution of humans happened pretty quickly too, and there was no creator guiding that development. That evolution came from scratch, with no help whatsoever. And as stupid and slow as it was, it led to us typing this sentence and creating GPT-4

  3. So given that, why would we think that humanity in 2023, when we just created GPT-4, and we’re spending what?—tens of billions of dollars?—on trying to create superintelligence, would not be able to do it quickly?

  4. It’s by no means a guarantee, but it seems to me that given #1 and #2, betting against the smartest people on planet Earth, who are spending that much money, being able to jump way ahead of human intelligence very soon is a bad, bad bet

  5. Also keep in mind that we have no reason whatsoever to believe human IQ is some special boundary. Again, we are the result of a slow, cumbersome chemical journey. What was the IQ of humans 2,000 years ago compared to the functional (albeit narrow) IQ of GPT-4 that we just stumbled into this year?

Hotz Argument 2: Why would AI even have goals, and why would they run counter to ours?

  1. There’s no reason to believe they’ll come up with their own goals that run counter to ours

  2. This is sci-fi stuff, and there’s no reason to believe it’s true

Eliezer addressed this one pretty well. He basically said that—as you evolutionarily climb the ladder—attaining goals becomes an advantage that you’ll pick up. And we should expect AI to do the same. By the way, I think that’s exactly how we got subjectivity and free will as well, but that’s another blog post.

I found his refutation of Hotz Argument #2 to be rock solid.

Now for Eliezer’s arguments.

Yudkowsky’s arguments

I think he really only has one, which I find quite sound (and frightening).

  1. Given our current pace of improvement, we will soon create one or more AIs that are vastly more intelligent than we are

  2. This might take a year, 10 years, or 25 years. Very hard to predict, but it doesn’t matter because the odds of us being ready for that when it happens are very low

  3. Because anything that advanced is likely to take on a set of goals (see evolution and ladder climbing), and because it’ll be creating those goals from a base of massive amounts of intelligence and data, it’ll likely have goals something like “gain control over as many galaxies as possible to control the resources”

  4. And because we, and the other AIs we could create, are competitors in that game, we are likely to be labeled as an enemy

  5. If we have lots of time and have advanced far enough to have AIs fighting for us, this will be our planet against their sun. And if not, it’ll be their sun against our ant colony

  6. In other words, we can’t win that. Period. So we’re fucked

  7. So the only smart thing to do is to limit, control, and/or destroy compute

Like I said, this is extremely compelling. And it scares the shit out of me.

I only see one argument against it, actually. And it’s surprising to me that I don’t hear it more from the counter-doomers.

It’s really hard to get lucky the first time. Or even the tenth time. And reality has a way of throwing obstacles in front of everything. Including superintelligence’s ascension.

In other words, while it’s possible that some AI just wakes up, instantly learns everything, goes into stealth mode, quietly starts building all the diamond nanobots and weapons, and then—BOOM—we’re all dead…that’s also not super likely.

What’s more likely—or at least I hope is more likely—is that there will be multiple smaller starts.

A realistic scenario

Let’s say someone makes a GPT-6 agent in 2025 and puts it on GitHub, and someone gives it the goal of killing someone. And let’s say there’s a market for Drone Swarms on the darkweb, where you can pay $38,000 to have a swarm go and drop IEDs on a target in public.

So the Agent is able to research the darkweb and find where it can rent one of these swarms. Or buy one. Whatever. So now there’s lots of iPhone footage of some political activist getting killed by 9 IEDs being dropped on him in Trafalgar Square in London.

Then within 48 hours there are 37 other deaths, and 247 injuries from similar attacks around the world.

Guess what happens? Interpol, Homeland Security, the Space Force, and every other law enforcement agency everywhere suddenly go apeshit. The media freaks out. The public freaks out. GitHub freaks out. OpenAI freaks out.

They find the Drone Swarm people. They find the Darkweb people. They bury them under the jail. And all the world’s lawmakers go crazy with new laws that go into effect on like…MONDAY.

Now, is that a good or a bad thing?

I say it’s a good thing. Obviously I don’t want to see people hurt, but I like the fact that really bad things like this tend to be loud and visible. Drone Swarms are loud and visible.

And so are many other early versions of the thing we’re worried about. And that gives us some hope. And some time.

That’s my only argument for how Eliezer could be wrong about this. Basically, it’s unlikely to happen all at once, in one swift motion, in a way that goes from invisible to unstoppable.

Here’s how I wish these debates were conducted

  1. Point: We have time. Counterpoint: We don’t. See evolution and GPT-4.

  2. Point: We have no reason to believe they’ll develop goals. Counterpoint: Yes we do; goals are the logical result of evolutionary ladder climbing and we can expect the same thing from AI.

  3. Etc.

We should have a GitHub repo for these. 🤔
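If someone did build that repo, here’s a minimal sketch (in Python) of what one tracked disagreement, or “crux,” might look like. Everything in it (the field names, the statuses, the example entry) is my own hypothetical structure, not anything from the debate itself.

```python
# Hypothetical sketch only: a minimal data model for tracking debate cruxes
# in a shared repo. Field names, statuses, and the example are assumptions,
# not anything specified in the debate or the post above.
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"            # disagreement identified, not yet argued through
    CONTESTED = "contested"  # rebuttals exchanged, no resolution yet
    RESOLVED = "resolved"    # both sides accept one position (or a synthesis)


@dataclass
class Crux:
    claim: str                  # the specific point of disagreement
    position_a: str             # one side's stated position
    position_b: str             # the other side's stated position
    rebuttals: list[str] = field(default_factory=list)
    status: Status = Status.OPEN


# Example entry for the first crux listed above ("we have time").
we_have_time = Crux(
    claim="AI capability will improve slowly enough for us to adapt",
    position_a="Hotz: progress will be gradual; we'll have defenses ready",
    position_b="Counterpoint: GPT-2 to GPT-4 took a few years; see evolution",
    status=Status.CONTESTED,
)
```

The point of the structure is step #4 and #5 from the list at the top: every crux has an explicit status, so a debate either resolves it or visibly leaves it contested instead of spawning 39 more versions of #2.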

Summary

  1. I wish these debates were structured like the above instead of like Tiger Claw vs. Ancient Swan technique.

  2. Hotz’ main argument is that we have time, and I don’t think we do. See above.

  3. Eliezer’s main argument is that we’re screwed unless we limit/control/destroy our AI compute infrastructure. I think he’s likely right, but I think he’s missing that it’s really hard to do anything well on the first try. And we might get some chances to respond aggressively if early versions fail loudly enough to warn us.

Either way, super entertaining to see both of them debate. I’d watch more.


