
Monday, November 26, 2012

A Small Experiment in Emergent Behavior


So, I am not interested in robotics simply to create neat mechanical platforms that do interesting things (although that's a lot of fun). I have other motives for doing this. One of the primary questions I want to answer is this: can I, as a layman, create a robotics platform, and then write code for it, that in any way models cognition, or in some way models living processes that interact with their environment in intelligent ways?

It's a big question. A really big question.

If you take away the "as a layman" phrase, some very bright people have been tackling this very problem. It starts small, with things like Braitenberg's vehicles – a thought experiment in minimalistic, sensor-driven control systems and artificial intelligence – and ends up with things like ASIMO, the DARPA challenges, and the work Boston Dynamics is doing.

Along the way we meet people such as John McCarthy and the LISP programming language, other languages such as PROLOG, the concept of "strong" AI, SLAM algorithms, emergence, and many other attempts at building machines that act intelligently in some way, attempts at understanding cognition, and so forth.

In a way, mankind has been on this search for a very long time, as evidenced by such things as the golem legends.

My latest attempt is structured around emergence. Here's the basic idea:

1. Let's define a vehicle which has a limited number of moves it can make. It can move forward, move back, and turn in any direction, as long as the turn it makes is some multiple of 45 degrees. That means 0 degrees (no turn), 45, 90, 135, or 180, in either direction. We'll also let it turn 22.5 degrees (half of 45), so it can make small corrections.

2. For this vehicle, let's restrict the inputs. The robot will likely have some pretty varied input available on its sensors, but we'll do a bit of sensor fusion to calm down what we see and put it into buckets. For instance, let's say we have a sonar or IR sensor which has a range of 3 to 100 inches, but is sensitive in the millimeter range. Instead of returning things like 78.4589998 inches, let's put it into buckets such as "very near", "near", "middle range", "far". We'll do sensor fusion and return those strings instead of the raw numbers. Same for all the sensors. A compass sensor may, instead of returning the raw theta value for its pose, be distilled down into compass rose points. "45", "NE", or "Northeast" would all be fine here, as long as it's in buckets and not things like "47.35". This will allow us to reason in more general terms about the robot's current state. We could expand this to any number of sensors and sensor types – and this is similar to how the human brain generalizes things in, say, the visual cortex, where processing proceeds from more chaotic, raw values (retina, optic nerve, V1) to more abstract, more general values (dorsal and ventral streams on through to the prefrontal cortex/executive function areas). This would roughly model things that happen before V1, and we're not doing anything even remotely close to the PFC here. We're also being pretty simplistic – the process does not keep state of any sort, does not deal with such problems as visual flow, does not mimic working memory, etc. It just throws sensor data into buckets.

3. From here, let's create a mechanism that can define rules. Rules will consist of a set of triggers (the buckets from point #2) and a set of actions (the moves from point #1). The general idea is this: we will continually get sensor data from our environment, sorted into the buckets described above. When the distilled sensor data matches the triggers for a rule, that rule's actions will be executed. In reality, getting a complete match on the sensor values we're looking for is probably not going to happen very often, so we'll define a minimum threshold – if, say, 75% of the triggers match the current sensor data, that's good enough – run the actions. The actions will be executed sequentially. The list of actions can be arbitrarily long – but I have found, by experimenting and tweaking the program I wrote, that two actions are ideal. The first is a turn (or no action), and the second is a movement (or no action). For instance, an action list may be something like {"turn left 45 degrees", "go forward"}. The actions defined are simplistic (turns, forward or backward, or "don't do anything"), but could easily be more complex actions ("use the A* algorithm to find a path out of the pickle you've gotten yourself into").

4. The next thing we'll need to do is determine whether executing a rule actually worked, and for that we'll need a goal. After executing the rule, we will determine how close we are to the goal state, and whether we are closer to or farther from the goal state than before we executed the rule. To do this we will measure the current state (against some arbitrary goal), execute the actions, and then measure the robot's perceived state against the goal state again, and see if things got better, got worse, or stayed the same. This determines what I call the fitment of our rule. In the long run, rules with better fitment will survive (and spawn new rules), while rules with poor fitment will be discarded. For my experiment, I decided to define the goal state as having all distance sensors register equal distances, but with the one pointing directly forward reporting "far". This has the effect that, if you are in a circle looking out (all sensors reporting the same distance except the front one, which doesn't see anything), you are in the goal state. As it turns out, the goal state is impossible to reach – which is good! That tension will make the robot come up with interesting results, and hopefully ones we have not anticipated. If it does, that would be an example of emergence. (A very cool book on related subjects: The Computational Beauty of Nature by Gary William Flake.) A rough sketch, in code, of the pieces described in points 1 through 4 follows this list.

5. Next, we'll need a way to create new rules. There are several ways we can go about this, and it brings up the question of whether our robot has a priori knowledge of its environment or not. Some animals do – for instance, newly born calves know how to stand within minutes of being born, insects automatically know how to fly, and so forth. On the other hand, you were not born with the ability to speak, or walk, or do many other things – but you figured it out. The question here is, do we build in some base rules and watch them grow and/or be discarded, or do we start with a tabula rasa and then allow things to build? The first is faster but doesn't tell us as much; the second can be far slower, and will either tell us nothing or produce very interesting results. In the end, I decided to go with the second approach – starting with no rules and letting things grow by themselves.
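Here's that sketch – a minimal illustration, in C#, of points 1 through 4. The type and member names (Evidence, Rule, GoalScorer, the bucket thresholds, and so on) are illustrative only, not the exact code from my program:

using System;
using System.Collections.Generic;
using System.Linq;

// Point #1: the restricted action vocabulary -- turns in multiples of 45 degrees
// (plus the 22.5 degree correction), and simple movements.
public enum TurnAction { None, Left22, Left45, Left90, Left135, Right22, Right45, Right90, Right135, TurnAround }
public enum MoveAction { None, Forward, Backward }

// Point #2: sensor fusion into buckets. Raw inches in, a coarse label out.
// (The thresholds here are arbitrary, just to show the idea.)
public static class SensorFusion
{
    public static string BucketDistance(double inches)
    {
        if (inches < 10) return "very near";
        if (inches < 30) return "near";
        if (inches < 60) return "middle range";
        return "far";
    }
}

// One distilled sensor reading, e.g. { Sensor = "FrontSonar", Bucket = "far" }
// or { Sensor = "Compass", Bucket = "NE" }.
public class Evidence
{
    public string Sensor;
    public string Bucket;
}

// Point #3: a rule is a set of triggers plus a two-slot action list (a turn, then a move).
public class Rule
{
    public List<Evidence> Triggers = new List<Evidence>();
    public TurnAction Turn = TurnAction.None;
    public MoveAction Move = MoveAction.None;
    public double Fitment;   // updated after each execution (point #4)

    // A rule is eligible to fire when at least 75% of its triggers are present
    // in the current, bucketed sensor data.
    public bool Matches(IEnumerable<Evidence> current, double threshold = 0.75)
    {
        if (Triggers.Count == 0) return false;
        var snapshot = current.ToList();
        int hits = Triggers.Count(t => snapshot.Any(c => c.Sensor == t.Sensor && c.Bucket == t.Bucket));
        return (double)hits / Triggers.Count >= threshold;
    }
}

// Point #4: measure the current state against the goal -- all distance sensors reporting
// the same bucket, except the forward-facing one, which should report "far".
// Returns a score in [0, 1]; higher is closer to the goal.
public static class GoalScorer
{
    public static double Score(IDictionary<string, string> distanceBuckets, string frontSensor)
    {
        var others = distanceBuckets.Where(kv => kv.Key != frontSensor)
                                    .Select(kv => kv.Value)
                                    .ToList();
        double agreement = others.Count == 0
            ? 0.0
            : (double)others.Count(b => b == others[0]) / others.Count;
        double frontIsFar = distanceBuckets[frontSensor] == "far" ? 1.0 : 0.0;
        return (agreement + frontIsFar) / 2.0;
    }
}

The actual program is more general than this – actions carry costs and undo operations, as the hill climbing code further down shows – but this is the shape of the idea.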

But how do we create new rules?

At first, I thought, let's assume infinite processing power and storage. Given that, and removing it as a concern, I can generate all possibilities of the input states, since they are finite (if we constrain the length of the list of triggers), and the actions we can do are also finite – after all, we are dealing with eight compass rose points we can turn to in the first slot, and then two actions, forward and backward, in the second slot. While large, the list is finite, so let's flip things on their head a little and start with the set of all possible rules. This approach would be similar to the one Noam Chomsky gives for linguistics in Syntactic Structures, where he starts with the set of all possible words in a language, and then defines recursive rule sets which can generate all possible sentences in that language. His notation is also similar in form to Backus-Naur Form (BNF), although BNF is typically used to describe context-free grammars, and Chomsky's generative grammars can produce infinite sets. What we are doing here is closer to a finite state machine, in that the set is indeed finite, since the lengths of the lists (triggers and actions) are constrained, the rules are not recursive (there are no subrules within rules), and the rules essentially form a directed acyclic graph (DAG) restricted to a depth of one.

A good idea, but it produced a problem. Let's say I allow for just five triggers (a good idea, it turns out, since it models how the human brain focuses its attention on a limited number of stimuli at a time), and we'll keep to just the two action slots: a turn, and then forward or back. We'll also say the triggers must be unique – that is, if you say you're looking for "left front sensor = very near", there's no sense in listing it twice, so each trigger slot holds a unique value. If we have just five sensors (for instance, front, back, left and right sonar, and a compass), and each can hold about eight values (compass rose points for the compass; variants of near, far, wherever you are for the sonar), that means that with our five slots, we have the following:


possible trigger values == 8 * 7 * 6 * 5 * 4 == 6,720
possible actions == 8 * 2 == 16
all possible permutations == 107,520

OK, not bad. A computer can deal with that many rules. Still, if we say that each rule takes one second to test, 107,520 seconds is about 30 hours. I can deal with that, but let's say we add one more sensor...

possible trigger values == 8 * 7 * 6 * 5 * 4 * 3 == 20,160
possible actions == 8 * 2 == 16
all possible permutations == 322,560
hours to test all permutations == 89.6
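For the curious, that counting is just a falling factorial (ordered choices of unique trigger values) multiplied by the number of possible action pairs. A few throwaway lines of C# reproduce the numbers above (the helper name is just for illustration):

static class RuleMath
{
    // Ordered choices of 'slots' unique trigger values from 'values' possibilities,
    // multiplied by the number of possible action pairs.
    public static long RulePermutations(int values, int slots, int actionPairs)
    {
        long triggers = 1;
        for (int i = 0; i < slots; i++)
            triggers *= values - i;
        return triggers * actionPairs;
    }
}

// RuleMath.RulePermutations(8, 5, 16) == 107,520   (five trigger slots)
// RuleMath.RulePermutations(8, 6, 16) == 322,560   (six trigger slots)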
[Image: simulated skid-steer robot, showing the simulated sonar sensors (green lines), the compass (small magenta line and circle), and the path it has traveled (the curved line)]

But this isn't realistic. If we go past where we are now – adding a couple of sensors, or adding a new type of movement – the number of permutations skyrockets, and before long you're dealing with centuries' worth of processor time. I don't have access to a supercomputer (well, I do, actually, but that's a different story :-) ), so I am limited in the amount of horsepower I can throw at this. Our original premise becomes unworkable, even though it's a great thought experiment. I also have my doubts that this is the way nature works. Instead of throwing all possible permutations at a problem, I have a hunch that nature takes shortcuts and figures out the permutations that have a chance of working. In fact, ant colony optimization (ACO) and particle swarm optimization (PSO) are good examples of just that.

So, instead of trying all possible permutations, what I did is take some tried and true methods, but with a twist.

To generate the first rules – again, without a priori knowledge – I used a Monte Carlo method for the actions and empirical sensor readings for the triggers. For instance, given a set of previously unencountered sensor readings, I would randomly generate a response, first randomly generating a turn (or deciding to do nothing), and then randomly deciding on a movement (forward or back). From there, we measure the fitment and cull bad rules. From the survivors, I used a combination of genetic algorithms, mutation of rules (such as randomly deleting a trigger entry, adding a new one, or randomly changing a resulting action), and hill climbing. These are all fairly common ways to solve this problem, but it occurred to me that while this is how nature evolves organisms, it's likely not how they learn – or, at the least, it's not the entire picture. Evolution is glacial, and learning is not. That spawned a couple of things: firstly, this layman needs to learn more about how organisms learn, and secondly, how can I alter the program so that learning is not so bumbling and glacial, which is what I got at first by using GAs and hill climbing?

My response to the second was to do what I called "hill climb in place" – I made each of the possible actions have an "undo". For instance, turning left 45 degrees is undone by turning right 45 degrees. Forward is undone by backing up. And so on. Once that is in place, it allows the robot to do trial and error. Given a completely randomly generated set of actions, it can gradually move away from the action it has, measure how well it did in relation to the goal and to its initial state, undo, and try again – gradually hill climbing, in place, toward a local optimum, if there is one. Trial and error! Organisms do that, it's not excruciatingly slow, and it works! One other tweak needed to be in place, though – actions need a cost. For instance, turning 180 degrees should cost more than turning 22.5. Once that is in place, it makes sense to try things out in ascending order of cost, until you get to a state where your measurement against the desired goal went down instead of up. From there, simply undo what you did last, and there you are – the best thing you could have done for the state you are in is the thing you did right before the thing that screwed everything up :-)



/// <summary>
/// The hill climb in place routine: try nearby actions (ordered by how close their
/// cost is to the current action's cost), measure, undo, and keep the best one found.
/// </summary>
/// <returns>The action that scored best against the goal for the current state</returns>
private static RuleAction HillClimbSingleActionInPlace(Rule rule, RuleAction action, int position, IRuleInformationProvider provider, Func<IEnumerable<Evidence>> getEvidence)
{
    RuleAction result = action;
    double lastBestResult = provider.GetCurrentScore();

    // Only consider actions that can be undone, ordered by how close their cost is
    // to the cost of the action we already have.
    var possibleActions = provider.GetPossibleActionsByActionListPosition()[position]
                            .Where(action1 => action1.Value.CanUndo)
                            .Select(action1 => action1.Value)
                            .OrderBy(ruleAction => Math.Abs(ruleAction.Cost - action.Cost));

    foreach (var candidateAction in possibleActions.ToList())
    {
        // Try the candidate, measure how we did against the goal, then undo it.
        candidateAction.Action(rule, getEvidence());
        double currentScore = provider.GetCurrentScore();

        candidateAction.Undo(rule, getEvidence());

        if (currentScore >= lastBestResult)
        {
            // Things got better (or at least no worse) -- remember this action and keep going.
            lastBestResult = currentScore;
            result = candidateAction;
        }
        else break;   // things got worse; the previous candidate was the local optimum
    }

    return result;
}
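For completeness, here is roughly what the other half looks like – generating a brand new rule from unrecognized sensor readings (the Monte Carlo part) and mutating a surviving rule. Again, this is a hedged sketch reusing the illustrative types and using directives from the earlier listing, not the exact code from my program:

static class RuleFactory
{
    static readonly Random Rng = new Random();

    // Monte Carlo generation: given sensor readings no existing rule matches, create a
    // rule whose triggers are those readings and whose actions are chosen at random.
    public static Rule CreateFromEvidence(IList<Evidence> unrecognized)
    {
        var rule = new Rule();
        rule.Triggers.AddRange(unrecognized);
        var turns = (TurnAction[])Enum.GetValues(typeof(TurnAction));
        var moves = (MoveAction[])Enum.GetValues(typeof(MoveAction));
        rule.Turn = turns[Rng.Next(turns.Length)];   // random turn, or none
        rule.Move = moves[Rng.Next(moves.Length)];   // random move, or none
        return rule;
    }

    // Mutation: randomly drop a trigger, add a trigger drawn from what we currently
    // see, or change one of the resulting actions.
    public static void Mutate(Rule rule, IList<Evidence> currentEvidence)
    {
        switch (Rng.Next(3))
        {
            case 0:
                if (rule.Triggers.Count > 1)
                    rule.Triggers.RemoveAt(Rng.Next(rule.Triggers.Count));
                break;
            case 1:
                if (currentEvidence.Count > 0)
                    rule.Triggers.Add(currentEvidence[Rng.Next(currentEvidence.Count)]);
                break;
            default:
                var turns = (TurnAction[])Enum.GetValues(typeof(TurnAction));
                rule.Turn = turns[Rng.Next(turns.Length)];
                break;
        }
    }
}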




So, that's the outline of the program. A limited set of triggers and actions, which we use to define rules, and then we use Monte Carlo, genetic algorithms, random mutation, hill climbing, and a tweaked version of hill climbing which allows us to do trial and error, in place, and figure out the best action to take.

One other note: groups of neurons fatigue. After firing repeatedly, their signal strength degrades. This serves multiple purposes in the brain, and I imitate it here. In the program, it means that even if a rule is a good fit for a situation, if it fires too many times in a row, it has to sit down and be quiet for a while, and allow other rules to have a chance.
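In code, the fatigue mechanism can be as simple as a counter per rule – again a hedged sketch with an arbitrary threshold, not the exact implementation:

// Rule fatigue: each consecutive firing tires the rule a little; sitting out a turn
// lets it recover. A fatigued rule is skipped even if its triggers match.
public class RuleFatigue
{
    private const int MaxConsecutiveFirings = 5;   // arbitrary threshold for this sketch
    private int consecutive;

    public bool IsFatigued { get { return consecutive >= MaxConsecutiveFirings; } }

    public void OnFired()  { consecutive++; }                      // this rule just ran
    public void OnRested() { if (consecutive > 0) consecutive--; } // some other rule ran instead
}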

So does it work?

Yep. Sure does.

I created a simulation of a robot in a maze, and what I have observed is this:


Early on, I see completely stochastic movement and a lot of bad decisions, such as running into walls or turning in a bad direction. If I let it run for a while, say an hour or so, I start to see good decisions. I also see elegant motion. For instance, instead of jerkily moving away from a wall, I will see the robot moving in arcs and curves, smoothly going around obstacles or following a wall. Keep in mind, I never once told it how to follow a wall or how to move in an arc. This is all emergent behavior, derived from simpler building blocks. The robot came up with these solutions to its problems by itself, and those solutions mimic nature in form through the process of emergence.

Since the rules fatigue, I have also seen the simulation fall into patterns of rules, or repeating sets of rules. For instance, instead of sitting in one spot and running a rule over and over (in which case it would quickly fatigue), the robot would move in patterns that look like a five-pointed star, or a small circle, where it executes one rule, and then another, and so on, eventually coming back to a spot where the first one can fire again. In this way, it takes much, much longer for a rule to fatigue, since it gets a rest of five turns between firings. Again, this behavior is emergent – there is nothing in the programming which allows for patterns of rules, and, in fact, the program is generally stateless. Yet repeating patterns and interlocking sets of rules arise.


[Image: robot simulation executing organic-like, curved paths in search of its goal state]


So this is learning, in miniature, in a way. Both for me and for the robot. It points out how much can be made from small, simple pieces, given a mechanism to allow for emergent behavior. It shows the tendency of life to move away from entropy and towards structure.

And it shows me how much more I need to learn.


3 comments:

  1. Really great stuff here, Rick. The whole thing was a great read and fascinating. By the end, though, I couldn't help but think of the emergent behaviors being similar to the observed patterns in Conway's Life, where the simple rules form predictably repeating patterns depending on the starting conditions.

    I liked your idea about fatiguing the rules. It is close, but opposite, to what I was thinking -- somehow apply stakes. Not sure what they could be, they would have to be artificial. But make the robot have a goal, a motivation, or have stakes on its performance. Perhaps something as simple as a timer to complete the maze, always trying to beat its previous score.
