Initial Thoughts on AI Safety Goals

4/20/2021

These thoughts may be obvious to people more familiar with the AI safety space, but I've found the following a helpful framing of the goal of AI safety:

  1. Make it irrational for the AI to do “unsafe” things (e.g. purposely kill all humans)
    • The AI will always be working toward some kind of “utility” function / goal of its own. We humans, as the creators of AI, must make sure that doing “unsafe” things en route to that goal is simply sub-optimal. Said another way, the AI should be coldly disinterested in doing “unsafe” things, ideally so much so that it doesn’t even consider them as options, the way humans don’t even think of steel as a food option when hungry.
  2. A sub-goal (or possible path) to #1 is to “keep the AI(s) stuck” in safe Nash equilibria
    • Perhaps via “competition” with humans or other AI?
      • I expect this will eventually have to be vs. other AI because at some point humans will simply not be able to keep up
    • Nash equilibria are key because they describe situations where no agent / actor can improve its own outcome by unilaterally changing strategy. That property can be used to hold agents in a state that is better globally for everyone, at the cost of any particular agent’s own best possible outcome (which, getting back to safety, might be an inherently “unsafe” option). A toy example follows this list.
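
This framing lends itself to a small worked example. The sketch below (in Python) is illustrative only: the two agents, the two actions, and every payoff number are assumptions I made up, not anything derived from real systems. The payoffs are chosen so that “unsafe” behavior is sub-optimal for each agent no matter what the other does (point 1), and a brute-force check then confirms that the all-safe profile is the only pure-strategy Nash equilibrium the agents can get “stuck” in (point 2).

from itertools import product

# A hypothetical two-player "safety game". The agents, actions, and all payoff
# numbers below are made up for illustration; the point is only to show the
# mechanics of checking for a safe Nash equilibrium.
ACTIONS = ("safe", "unsafe")

# payoffs[(action_1, action_2)] = (utility to agent 1, utility to agent 2).
# Unsafe behavior is penalized enough that it is never the better choice,
# no matter what the other agent does -- i.e. "unsafe" is made irrational.
payoffs = {
    ("safe",   "safe"):   (3, 3),
    ("safe",   "unsafe"): (1, 2),
    ("unsafe", "safe"):   (2, 1),
    ("unsafe", "unsafe"): (0, 0),
}

def is_nash_equilibrium(a1, a2):
    """True if neither agent can improve its own utility by unilaterally
    switching to a different action (pure-strategy Nash equilibrium)."""
    u1, u2 = payoffs[(a1, a2)]
    best_unilateral_1 = max(payoffs[(alt, a2)][0] for alt in ACTIONS)
    best_unilateral_2 = max(payoffs[(a1, alt)][1] for alt in ACTIONS)
    return u1 >= best_unilateral_1 and u2 >= best_unilateral_2

for a1, a2 in product(ACTIONS, repeat=2):
    label = "Nash equilibrium" if is_nash_equilibrium(a1, a2) else "not an equilibrium"
    print(f"({a1}, {a2}): {label}")

With these made-up numbers, (“safe”, “safe”) is the only pure-strategy equilibrium: each agent does strictly better playing safe regardless of what the other does. Whether realistic utilities and environments can be shaped so this keeps holding is exactly the hard, open part.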

While this is pretty high level and there’s a lot of work required to actually figure out what a real, workable implementation looks like, I do believe this is a simple but useful framing for thinking about the problem.

Sources

  1. https://en.wikipedia.org/wiki/Nash_equilibrium
