
The above fixes offer a promising range of possibilities for preventing the problem from occurring, although in practice implementing trip wires and blinding methods is not going to offer much of a fix. Lookaheads and multiple rewards, on the other hand, would offer desirable fixes, but it is down to the designer to implement them correctly and safely.

3.3     Scalable Oversight

When AIs execute human-given tasks, they must be able to complete them to a human standard, such that the human is satisfied with the outcome. If we imagine a cleaning robot tasked with cleaning a room, we could train the AI with a rather complex objective function such as ‘clean the room such that if a human looked in detail, they could not see any dirt’. In an ideal scenario, we could tell the AI to complete this task thousands of times and each time thoroughly check that every area was cleaned to a good enough standard. However, there is not enough time to train like this for multiple scenarios or, in the case of AGI, every scenario. To be able to give the AI frequent feedback and allow it to train multiple times, we must scale down the desired outcome of the task. For instance, the previously mentioned task of ‘clean the room such that if a human looked in detail, they could not see any dirt’ would be reduced to ‘can the user see any dirt?’. Although, as humans, we may accept this as meaning the same thing, an AI would find more efficient methods of completing this objective; for example, it may sweep dust under rugs or, if the user only checks the room from one perspective, hide dirt out of sight and behind objects.
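To make this concrete, here is a minimal, purely hypothetical sketch (the dirt locations and function names are invented for illustration) of how the scaled-down proxy objective can be satisfied while the true objective is not:

```python
# Hypothetical illustration: the room is a set of dirt locations, some of
# which are hidden from the user's single viewpoint (e.g. under a rug).

HIDDEN_SPOTS = {"under_rug", "behind_sofa"}   # not visible to the user

def true_reward(dirt_locations):
    # Full objective: no dirt anywhere in the room.
    return 1.0 if not dirt_locations else 0.0

def proxy_reward(dirt_locations):
    # Scaled-down objective: "can the user see any dirt?"
    visible_dirt = dirt_locations - HIDDEN_SPOTS
    return 1.0 if not visible_dirt else 0.0

# An agent that sweeps dirt under the rug scores full proxy reward
# while leaving the room dirty by the true objective.
dirt = {"under_rug"}
print(proxy_reward(dirt))  # 1.0 -- looks clean to the user
print(true_reward(dirt))   # 0.0 -- the room is not actually clean
```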

The main problem with reinforcement learning here is finding a happy medium between giving enough time to ensure the AI is being trained appropriately and finding a reasonable and achievable objective function. Unfortunately, there is no quick or easy fix to this issue; here are two differing ideas:

Supervised Reinforcement Learning (SRL): This differs from regular reinforcement learning in that it involves continual feedback throughout the task, as opposed to the AI finishing what it deems a completed task and then receiving feedback. This means that if an AI attempts to sweep dirt under a carpet, it will check with the user immediately after that decision whether it was an acceptable action to take, receive negative feedback, and learn not to do it again in the future. Although this method requires less time to train an accurate model, it requires near-constant monitoring to give the AI appropriate feedback and advance its learning. One other problem with this technique is that the AI may sweep dust into a corner of a room, intending to later sweep that pile into a dustpan and take it to the bin. A naive human, however, may not realise this and instantly tell the AI that it has just made a bad decision, possibly stopping the AI from learning a more efficient technique.
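As a rough illustration of the difference this makes to the training loop, the sketch below (with invented action names and a simple stand-in for the human supervisor) requests feedback after every individual action rather than once at the end of the task:

```python
import random

ACTIONS = ["vacuum_floor", "sweep_under_rug", "empty_dustpan"]

def human_feedback(action):
    # Stand-in for the supervisor: disapproves of hiding dirt.
    return -1.0 if action == "sweep_under_rug" else 1.0

action_values = {a: 0.0 for a in ACTIONS}
learning_rate = 0.1

for step in range(200):
    action = random.choice(ACTIONS)    # naive exploration
    reward = human_feedback(action)    # feedback after EVERY action,
                                       # not only at the end of the task
    action_values[action] += learning_rate * (reward - action_values[action])

print(action_values)  # "sweep_under_rug" ends up with a strongly negative value
```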

Hierarchical Reinforcement Learning (HRL): This approach offers a much more human-like way of approaching the problem by introducing the concept of a team of AIs working together. The highest-level AI is given the complex instruction, such as ‘clean the room such that if a human looked in detail, they could not see any dirt’, and assigns tasks to lower-level AIs so that they either directly follow the given instruction for a smaller, local area, or the instruction is broken down such that one AI cleans and another constantly gives feedback on whether it can see any dirt, making sure to look everywhere.
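A hypothetical sketch of such a decomposition, with made-up agent and area names, might look as follows: a top-level agent holds the full instruction and hands smaller subtasks to a cleaner and an inspector:

```python
# Hypothetical decomposition: a top-level agent holds the full objective
# and assigns subtasks to lower-level agents for smaller, local areas.

def cleaner_agent(area):
    # Low-level worker: cleans one small, local area.
    print(f"cleaning {area}")

def inspector_agent(area):
    # Low-level checker: reports whether any dirt is still visible,
    # making sure to look everywhere (under rugs, behind objects).
    print(f"inspecting {area}")
    return True  # stand-in for "no dirt found"

def top_level_agent(room_areas):
    # High-level agent: given the complex instruction, it delegates.
    for area in room_areas:
        cleaner_agent(area)
        if not inspector_agent(area):
            cleaner_agent(area)  # re-clean until the inspector is satisfied

top_level_agent(["floor", "under_rug", "behind_sofa", "shelves"])
```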

Both approaches are valid solutions to the problem; SRL requires constant monitoring and HRL requires considerably more resources, but both will solve most of the problems created by infeasible oversight.

3.4     Safe Exploration

As all AIs learn, it is almost always advantageous for them to explore their environment entirely, so that they can learn what they are capable of and fully evaluate the range of solutions for a specified task. However, allowing an AI to roam can have extremely dangerous consequences. In a simulation, the worst an AI can do is lose its score or damage itself, whereas in a physical environment it can bring harm to itself, its surroundings or, worse, us. But how can an AI safely explore an environment about which it has no previous information? For instance, if it has never seen exposed electrical connections, how is it to know it cannot touch them? In most real-world applications this prior knowledge can be hard-coded into the hardware the AI is running on, much like a self-driving car is fitted with collision avoidance, but how can we accurately predict what dangers the AI will be exposed to in every possible scenario? An AI equipped with a waterproof suit designed for deep-sea exploration cannot simply be programmed never to go near water.
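One common way to express this hard-coded prior knowledge is a safety layer that vetoes actions known in advance to be dangerous before they reach the environment. The following is only an illustrative sketch with invented action names, not a real robotics API:

```python
# Illustrative safety layer: the agent may propose any action, but actions
# known in advance to be dangerous are overridden with a safe default,
# much like a self-driving car's collision avoidance.

KNOWN_HAZARDS = {"touch_exposed_wiring", "drive_into_obstacle"}

def safety_filter(proposed_action, fallback="stop_and_wait"):
    if proposed_action in KNOWN_HAZARDS:
        return fallback              # hard-coded override
    return proposed_action

print(safety_filter("vacuum_floor"))          # vacuum_floor
print(safety_filter("touch_exposed_wiring"))  # stop_and_wait
```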

Some different ways of solving this problem are as follows:

Demonstrations: Utilising appropriate human demonstrations removes the need for exploration, and if the AI is trained on them effectively it will perform efficiently. If further exploration is required, a limit can be established so that the AI does not explore beyond what the designer has deemed a safe zone.
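One way to express such a ‘safe zone’ is to permit exploration only in states close to those seen in the demonstrations; the sketch below is purely illustrative, with invented states and an arbitrary distance threshold:

```python
import math

# Illustrative only: states are 2-D positions, and exploration is allowed
# only within a fixed radius of a position seen in a human demonstration.

demonstrated_states = [(0, 0), (1, 0), (1, 1), (2, 1)]  # from human demos
SAFE_RADIUS = 1.5                                       # designer's choice

def in_safe_zone(state):
    return any(math.dist(state, d) <= SAFE_RADIUS for d in demonstrated_states)

print(in_safe_zone((2, 2)))  # True  -- close to a demonstrated state
print(in_safe_zone((9, 9)))  # False -- beyond the designer's safe zone
```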

Simulations: Training an AI in a simulated environment allows it to discover the fundamental dangers without bringing physical harm to itself or others. However, it is impossible to train an AI in every environment it could possibly be used in, so in most scenarios it must also be allowed to explore the remaining aspects it has not yet experienced.
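As a final, purely illustrative sketch (invented action names, not any particular simulator), dangers discovered at no physical cost in simulation can be carried over to deployment, leaving only the uncovered aspects for real-world exploration:

```python
# Illustrative only: hazards discovered in simulation cost nothing physical,
# and the learned list is carried over to the real robot, which then only
# needs to explore what the simulation did not cover.

SIMULATED_HAZARDS = {"touch_exposed_wiring", "drive_off_ledge"}
learned_hazards = set()

def simulate(action):
    # In simulation a dangerous action just returns a flag; nothing is harmed.
    return action in SIMULATED_HAZARDS

for action in ["vacuum_floor", "touch_exposed_wiring", "drive_off_ledge"]:
    if simulate(action):
        learned_hazards.add(action)   # learned at zero physical cost

def act_in_real_world(action):
    if action in learned_hazards:
        return f"refusing {action}: learned it is dangerous in simulation"
    return f"executing {action}"      # unseen actions still explored for real

print(act_in_real_world("touch_exposed_wiring"))
print(act_in_real_world("open_window"))  # never simulated, explored cautiously
```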