Monday, November 26, 2012

A Small Experiment in Emergent Behavior

So, I am not interested in robotics simply to create neat mechanical platforms that do interesting things (although, that's a lot of fun). I have other motives for doing this. One of the primary questions I want to answer is this: can I, as a layman, create a robotics platform, and then write code for it, that in any way models cognition, or in some way models living processes that interact with their environment in intelligent ways?

It's a big question. A really big question.

If you take away the “as a layman” phrase, some very bright people have been tackling this very problem. It starts small, with things such as Braitenberg vehicles, a gedankenexperiment on minimalistic control systems and artificial intelligence, and ends up with Asimo, the DARPA challenges, and the work Boston Dynamics is doing.

Along the way we meet people such as John McCarthy and the LISP programming language, other languages such as PROLOG, the concept of “strong” AI, SLAM algorithms, emergence, and many other attempts at forming machines that in some way act intelligently, attempts at understanding cognition, and so forth.

In a way, mankind has been on this search for a very long time, as evidenced by such things as the golem legends.

My latest attempt structures around emergence. Here's the basic idea:

1. Let's define a vehicle which has a limited number of moves it can make. It can move forward, move back, and turn in either direction, as long as the turn is some multiple of 45 degrees: 0 degrees (no turn), 45, 90, 135, or 180. We'll also let it turn 22.5 degrees (half of 45), so it can make small corrections.

2. For this vehicle, let's restrict the inputs. The robot will likely have some pretty varying input available on its sensors, but we'll do a bit of sensor fusion to calm down what we see and put it into buckets. For instance, let's say we have a sonar or IR sensor which has a range of 3 to 100 inches, but is sensitive in the millimeter range. Instead of returning things like 78.4589998 inches, let's put it into buckets such as “very near”, “near”, “middle range”, “far”. We'll do sensor fusion and return those strings instead of the raw numbers. Same for all the sensors. A compass sensor may, instead of returning the raw theta value for its pose, be distilled down into compass rose points. “45”, “NE”, or “Northeast” would all be fine here, as long as it's in buckets and not things like “47.35”. This will allow us to reason in more general terms about the robot's current state. We could expand this to any number of sensors and sensor types – and is similar to how the human brain generalizes things in, say, the visual cortex, where things proceed from more chaotic and raw values (retina, optic nerve, V1), to more abstract, more general values (dorsal and ventral streams on through to the prefrontal cortex/executive function areas). This would roughly model things that happen before V1, and we're not doing anything even remotely close to the PFC here. We're also being pretty simplistic – the process does not keep state of any sort, does not deal with such problems as visual flow, does not mimic working memory, etc. It just throws sensor data into buckets.
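As a sketch of that bucketing step, here's roughly what such a distillation could look like in Python (the thresholds and the compass snapping below are invented for illustration, not taken from the actual program):

```python
def bucket_distance(inches):
    """Collapse a raw distance reading into a coarse, nameable bucket.
    The thresholds here are illustrative, not the ones the robot uses."""
    if inches < 6:
        return "very near"
    elif inches < 20:
        return "near"
    elif inches < 60:
        return "middle range"
    else:
        return "far"

def bucket_heading(degrees):
    """Snap a raw compass heading to the nearest of the 8 compass rose points."""
    points = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return points[int(((degrees % 360) + 22.5) // 45) % 8]
```

So a reading of 78.4589998 inches comes out as simply "far", and a pose of 47.35 degrees comes out as "NE" -- general terms we can write rules against.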

3. From here, let's create a mechanism that can define rules. Rules will consist of a set of triggers (the buckets from point #2), and a set of actions (the actions from point #1). The general idea is this: we will continually get sensor data from our environment, sorted into the buckets described above. When the distilled sensor data matches the triggers for a rule, that rule's actions will be executed. In reality, getting a complete match on the sensor values we're looking for is probably not going to happen very often, so we'll define a minimum threshold – if, say, 75% of the triggers match the current sensor data, that's good enough -- run the actions. The actions will be executed sequentially. The list of actions can be arbitrarily long – but I have found, by experimenting and tweaking, that for the program I wrote, two actions are ideal. The first consists of a turn (or no action), and the second consists of a movement (or no action). For instance, an action list may be something like {“turn left 45 degrees”, “go forward”}. The actions defined are simplistic (turns, forward or backward, or “don't do anything”), but could easily be more complex (“use the A* algorithm to find a path out of the pickle you've gotten yourself into”).
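A minimal sketch of that threshold matching in Python (the rule layout here is my own shorthand, not the program's actual data structures):

```python
def match_fraction(triggers, readings):
    """What fraction of the rule's triggers match the current bucketed readings?"""
    hits = sum(1 for sensor, bucket in triggers.items()
               if readings.get(sensor) == bucket)
    return float(hits) / len(triggers)

def fire_if_matched(rule, readings, threshold=0.75):
    """Return the rule's action list if enough triggers match, else None."""
    if match_fraction(rule["triggers"], readings) >= threshold:
        return rule["actions"]
    return None
```

With a 75% threshold, a rule with four triggers still fires when one of them disagrees with what the sensors currently report.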

4. The next thing we'll need to do is determine whether executing a rule actually worked, and for that we'll need a goal. To do this, we will measure the robot's perceived state against the goal state (for some arbitrary goal), execute the rule's actions, and then measure again, and see if things got better, got worse, or stayed the same. This determines what I call the fitment of the rule. In the long run, rules with a better fitment will survive (and spawn new rules), while rules with a poor fitment will be discarded. For my experiment, I decided to define the goal state as having all distance sensors register equal distances, with the one pointing directly forward reporting “far”. This has the effect that, if you are in a circle looking out (all sensors reporting the same distance except the front one, which doesn't see anything), you are in the goal state. As it turns out, the goal state is impossible to reach – which is good! That tension will make the robot come up with interesting results, and hopefully ones we have not anticipated. If it does that, then this would be an example of emergence. (A very cool book on related subjects: The Computational Beauty of Nature by Gary William Flake.)
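That before-and-after measurement might be sketched like this (the scoring function is a guess at one way to encode the goal state described above; the real program's scoring surely differs):

```python
def goal_score(readings):
    """Score closeness to the goal state: all non-front sensors agreeing,
    with the front sensor reporting 'far'."""
    score = 0.0
    sides = [bucket for sensor, bucket in readings.items() if sensor != "front"]
    if sides:
        most_common = max(set(sides), key=sides.count)
        score += float(sides.count(most_common)) / len(sides)  # agreement term
    if readings.get("front") == "far":
        score += 1.0                                           # open space ahead
    return score

def fitment_delta(before, after):
    """Positive if the rule's actions moved us toward the goal state."""
    return goal_score(after) - goal_score(before)
```

A rule's fitment then accumulates from these deltas over many firings: consistently positive deltas keep it alive, consistently negative ones get it culled.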

5. Next, we'll need a way to create new rules. There are several ways we can go about this, and it brings up the question of whether our robot has a priori knowledge of its environment, or not. Some animals do – for instance, newborn calves know how to stand within minutes of being born, insects automatically know how to fly, and so forth. On the other hand, you were not born with the ability to speak, or walk, or many other things – but you figured it out. The question here is, do we build in some base rules and watch them grow and/or be discarded, or do we start with a tabula rasa, and allow things to build from there? The first is faster but doesn't tell us as much; the second can be far slower, and will either tell us nothing or produce very interesting results. In the end, I decided on the second approach – start with no rules and let things grow by themselves.

But how do we create new rules?

At first, I thought, let's assume infinite processing power and storage. Given that, and removing it as a concern, I can generate all possibilities of the input states, since they are finite (if we constrain the length of the list of triggers), and the actions we can take are also finite – after all, we are dealing with eight compass rose points we can turn to in the first slot, and then two actions, forward and backward, in the second slot. While large, the list is finite, so let's flip things on their head a little, and start with the set of all possible rules.

This approach would be similar to the approach given for linguistics by Noam Chomsky in Syntactic Structures, where he starts with the set of all possible words in a language, and then defines recursive rule sets which can generate all possible sentences in that language. His notation is also similar in form to Backus-Naur Form (BNF), although BNF is normally used to describe context-free grammars, and Chomsky's generative grammars can produce infinite sets. What we are doing here is closer to a finite state machine, in that the set is indeed finite: the lengths of the lists (triggers and actions) are constrained, the rules are not recursive (there are no subrules within rules), and the rules are essentially a directed acyclic graph (DAG) restricted to a depth of one.

A good idea, but it produced a problem. Let's say I allow for just five triggers (a good choice, it turns out, since it roughly models how the human brain focuses its attention on a limited amount of stimuli at a time), and we'll keep with just the two actions: a turn, and then going forward or back. We'll also say the triggers must be unique – that is, if you say you're looking for “left front sensor = very near”, there's no sense in listing it twice, so each trigger slot holds a unique value.
If we have just five sensors (for instance, front, back, left and right sonar, and a compass), and each can hold about eight values (compass rose points for the compass, variants of near, far, wherever you are for the sonar), that means that with our five slots, we have the following:

possible trigger values == 8 * 7 * 6 * 5 * 4 == 6,720
possible actions == 8 * 2 == 16
all possible permutations == 6,720 * 16 == 107,520

Ok, not bad. A computer can deal with that many rules. Still, if we say that each rule takes one second to test, 107,520 seconds is about 30 hours. I can deal with that, but, let's say we add one more sensor...

possible trigger values == 8 * 7 * 6 * 5 * 4 * 3 == 20,160
possible actions == 8 * 2 == 16
all possible permutations == 20,160 * 16 == 322,560
Hours to test all permutations == 89.6
Simulated skid steer robot,
showing simulated sonar sensors (the green lines),
the compass (small magenta line and circle), and
the path it has traveled (the curved line)
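Those counts are easy to sanity-check with a few lines of Python (a back-of-the-envelope script, not part of the robot's code):

```python
from math import prod

def rule_space(trigger_slots, trigger_values=8, turns=8, moves=2):
    """Count all possible rules: ordered, unique trigger choices
    times all (turn, move) action pairs."""
    triggers = prod(range(trigger_values, trigger_values - trigger_slots, -1))
    return triggers * turns * moves

print(rule_space(5), rule_space(5) / 3600.0)   # five trigger slots
print(rule_space(6), rule_space(6) / 3600.0)   # one more sensor
```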

But, this isn't realistic. If we go past where we are now, adding a couple of sensors, or adding a new type of movement, the number of permutations skyrockets, and before long, you're dealing with centuries worth of processor time. I don't have access to a supercomputer (well, I do, actually, but that's a different story :-) ), so I am limited in the amount of horsepower I can throw at it. Our original premise becomes unworkable, even though it's a great thought experiment. Also, I have my doubts that this is the way nature works. Instead of throwing all possible permutations at a problem, I have a hunch that nature takes shortcuts, and figures out the permutations that have a chance of working. In fact, ant colony optimization (ACO) and particle swarm optimization (PSO) are good examples of just that.

So, instead of trying all possible permutations, what I did is take some tried and true methods, but with a twist.

To generate the first rules, again without a priori knowledge, I used a Monte Carlo method for the actions, and empirical sensor readings for the triggers. For instance, given a set of previously unencountered sensor readings, I would randomly generate a response, first randomly generating a turn (or deciding to do nothing), and then randomly deciding on a movement (forward or back). From there, we measure the fitment and cull bad rules. From the survivors, I used a combination of genetic algorithms, mutation of rules (such as randomly deleting a trigger entry, adding a new one, or randomly changing a resulting action), and hill climbing.

These are all fairly common ways to solve this problem, but it occurred to me: this is how nature evolves organisms, but it's likely not how they learn – or, at the least, it's not the entire picture. Evolution is glacial, and learning is not. So, that spawned a couple of things. Firstly, this layman needs to learn more about how organisms learn, and secondly, how can I alter the program so that learning is not so bumbling and glacial, which is what I got at first by using GAs and hill climbing?

My answer to the second was to do what I called “hill climbing in place” – I made each of the possible actions have an “undo”. For instance, turning left 45 degrees is undone by turning right 45 degrees. Forward is undone by backing up. And so on. Once that is in place, it allows the robot to do trial and error. Given a completely randomly generated set of actions, it can gradually move from the action it has, measure how well it is doing in relation to the goal and to its initial state, undo, and try again – gradually hill climbing, in place, toward a local optimum, if any. Trial and error! Organisms do that, it's not excruciatingly slow, and it works! One other tweak needed to be in place, though – actions need a cost. For instance, turning 180 degrees should cost more than turning 22.5.
Once that is in place, it makes sense to try things out in ascending order according to cost, until you get to a state where your measurement against the desired goal went down, instead of up. From there, simply undo what you did last, and there you are – the best thing you could have done for the state you are in is the thing you did right before the thing you did that screwed everything up :-)

/// <summary>
/// The hill climb in place routine
/// </summary>
/// <returns>A rule aimed towards a local optimum</returns>
private static RuleAction HillClimbSingleActionInPlace(Rule rule, RuleAction action, int position, IRuleInformationProvider provider, Func<IEnumerable<Evidence>> getEvidence)
{
    RuleAction result = action;
    double lastBestResult = provider.GetCurrentScore();

    // Only consider actions that can be undone, trying the cheapest changes first
    var possibleActions = provider.GetPossibleActionsByActionListPosition()[position]
                            .Where(action1 => action1.Value.CanUndo)
                            .Select(action1 => action1.Value)
                            .OrderBy(ruleAction => Math.Abs(ruleAction.Cost - action.Cost));

    foreach (var candidateAction in possibleActions.ToList())
    {
        // Try the candidate action, measure, then undo it to stay in place
        candidateAction.Action(rule, getEvidence());
        double currentScore = provider.GetCurrentScore();

        candidateAction.Undo(rule, getEvidence());

        if (currentScore >= lastBestResult)
        {
            lastBestResult = currentScore;
            result = candidateAction;
        }
        else
        {
            // The score went down -- the previous candidate was the local optimum
            break;
        }
    }

    return result;
}
So, that's the outline of the program. A limited set of triggers and actions, which we use to define rules, and then we use Monte Carlo, genetic algorithms, random mutation, hill climbing, and a tweaked version of hill climbing which allows us to do trial and error, in place, and figure out the best action to take.

One other note: groups of neurons fatigue after a while. After firing repeatedly for a while, their signal strength degrades. This serves multiple purposes in the brain, and I imitate it here. In the program, it means that, even if a rule is a good fit for a situation, if it fires too many times in a row, it has to sit down and be quiet for a while, and allow other rules to have a chance.
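Here's a sketch of that fatigue mechanism (the firing limit and rest period are arbitrary tuning values I picked for illustration):

```python
class FatiguingRule:
    """Wraps a rule so that repeated consecutive firings force a rest period,
    mimicking how groups of neurons lose signal strength when fired repeatedly."""
    def __init__(self, rule, max_consecutive=3, rest_turns=5):
        self.rule = rule
        self.max_consecutive = max_consecutive
        self.rest_turns = rest_turns
        self.consecutive = 0
        self.resting = 0

    def try_fire(self):
        if self.resting > 0:
            self.resting -= 1      # fatigued: sit down and be quiet for a while
            return None
        self.consecutive += 1
        if self.consecutive >= self.max_consecutive:
            self.resting = self.rest_turns   # fired too many times in a row
            self.consecutive = 0
        return self.rule
```

The effect is that even a well-fitting rule eventually yields the floor, and other rules get their chance to fire.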

So does it work?

Yep. Sure does.

I created a simulation of a robot in a maze, and what I have observed is this:

Early on, I see completely stochastic movement, and a lot of bad decisions, such as running into walls, turning in a direction that is bad, and so on. If I let it run for a while, say an hour or so, I start to see good decisions. I also see elegant motion. For instance, instead of jerkily moving away from a wall, I will instead see the robot moving in arcs and curves, smoothly going around obstacles or following a wall. Keep in mind, I never once told it how to follow a wall or how to move in an arc. This is all emergent behavior, derived from simpler building blocks. The robot came up with these solutions to its problems by itself, and those solutions mimic nature in form by the process of emergence.

Since the rules fatigue, I have also seen the simulation fall into patterns of rules, or repeating sets of rules. For instance, instead of sitting in one spot and running a rule over and over (in which case, it will quickly fatigue), the robot would instead move in patterns that look like a five pointed star, or a small circle, where it executes one rule, and then another, and so on, eventually coming back to a spot where the first one can fire again. In this way, it takes much, much longer for a rule to fatigue, since it gets a rest of five turns in between firings. Again, this behavior is emergent – there is nothing in the programming which allows for patterns of rules, and, in fact, the program is generally stateless. Yet repeating patterns and interlocking sets of rules arise.

Robot simulation executing organic-like, curved paths in search of its goal state

So this is learning, in miniature, in a way. Both for me and for the robot. It points out how much can be made from small, simple pieces, given a mechanism to allow for emergent behavior. It shows the tendency of life to move away from entropy and towards structure.

And it shows me how much more I need to learn.

Saturday, September 8, 2012


(addendum - FR3DDY got written about on Hack a Day! Awesome!)

So I've been up to some interesting stuff lately. Let me introduce you to FR3DDY.

Freddy (easier to type) is a Heathkit HERO 1 robot, model ETW-18. The HERO robot series are some of the quintessential examples of early personal robotics, and many roboticists have drawn inspiration from them, such as the White Box Robotics group of robots (which now uses the Heathkit name), and, of course, my very own HouseBot. The ETW-18 model was the factory assembled model, whereas the ET-18 was the do-it-yourself kit.

When I was a kid, it was every nerd's dream to own one of these. But they were very expensive! Now that I'm older, and thanks to eBay, I can own one. They're occasionally available, but they are all antiques. Some are in better shape than others. Freddy's in pretty good shape, but does need some work here and there.

So why the name Freddy? When you turn this old boy on, one of the first things it does is say "Ready!" in an old 1980's style, shall-we-play-a-game, War Games type voice. My girlfriend thinks it sounds like "FREDDY", so, Freddy it is, and "FR3DDY", just to emphasize the computer generated nature of it all.

Freddy saying "Freddy"!

Another thing that Freddy says from time to time is "LOW VOLTAGE", which comes out sounding like "NO CARNAGE". Coming from a robot, it's awfully nice to know that it doesn't intend to cause murder and mayhem. Freddy's a good robot that way.

"LOW VOLTAGE" or "NO CARNAGE"? You decide...

As one of the defining examples of early personal robotics, Freddy is, of course, a restoration project. First and foremost, he needs to be brought back to top running condition. And, in fact, when he came to me, he was not doing that bad. I've had to work a little on the sonar, and he's got some minor problems with the main drive wheel, and there were some missing outer side panels (since replaced), but in general, not too bad -- not too bad at all for being approximately 30 years old.

Work continues on the restoration project, but what fun is just doing that? The ET-18 and ETW-18 robots were primarily programmable in machine code, by directly entering bytecodes through the hex numeric keypad on the robot's head. It's exactly as awful as it sounds, maybe more so! So what I did was contact this guy, who sold me an upgrade kit for the HERO BASIC ROM, plus the memory upgrade needed to support it. You also need some sort of serial or USB connection to hook to some sort of terminal. I generally use my laptop and PuTTY for this. Installation was pretty simple -- it mostly just snapped into existing ports inside the robot.

So that's cool, but, of course, not great. I'd love to be able to work with this machine using some more modern programming paradigms than a 30 year old dialect of BASIC.

I want to program it in Python.

And I don't want to have some huge wire running to it across the floor. So I'll need wifi or xbee of some sort. The problem here is, Freddy *is* a restoration project, so anything that I do has to be easily reversible, and as non-invasive to the original hardware as possible.

So enter OpenWrt. This is where the fun stuff begins! As you may know, a lot of routers these days are hackable. You can load OpenWrt, dd-WRT, Tomato, and so forth on your router, and turn a kind of clunky interface and router into a sleek, linux-driven router capable of doing a lot more than what it could just out of the box. My personal favorite for just being a router is Tomato. Unfortunately, Tomato has a read only file system, and, while it can be made writeable, it means compiling your own kernel, which means installing the toolchain, which means.. yeah... nah, let's find another route for now.

So what I did is use OpenWrt.  OpenWrt is perhaps a bit more basic than some others, as far as the interface, but it's very capable, and will certainly allow you to hack and tweak your router to how you want it, including having a writeable file system, allowing you to SSH and Telnet into the router, and allow you to install packages (through the opkg package manager).

Now, I need a router that has a serial port. Many have serial ports built in. I lucked out here -- I walked into my local thrift store, and found a WRT54GS, just sitting there waiting to be hacked! Five bucks! I grabbed it as quickly and as calmly as I could, and made for the door. I have an awesome thrift store near me, where I find all sorts of stuff like this --  and wild horses won't drag out of me where it is!

Freddy's new auxiliary brain, partially hacked

The WRT54xx models are the ones that started all of this hackable router business, and it has two built in serial ports, although you have to solder headers on them and do level conversion in order to use them. They also have several GPIO lines, although they're pretty much tied up monitoring the switches and running the lights on the router. If it's an open router thing you want to do, you can probably do it on one of these devices or one of their descendants -- although the early ones aren't so great in the memory department. (They didn't need to be -- they're just routers!)

So, after an install of OpenWrt, some configuring of the router to put it in client mode and have it automatically log into my house's wifi, allowing for SSH and Telnet login, and installing some packages, I now have a router running Linux with a BusyBox userland, with a pared down version of Python on it, the PySerial library (to talk to Freddy through the serial port), minicom (a HyperTerminal-type program), and nano (because the vi editor irks me).

Freddy's router up and running, logged in remotely via SSH, and doing some communication tests

Using the arduino software's built in serial port monitor, I can respond across the router's serial port

Plenty of this was needed - that really is the size of a soup bowl!

This matches - or beats - some less capable and more expensive boards I've played around with in the past, such as the TS-7800 -- and it has wifi, two serial ports, five ethernet ports, and a bunch of GPIO lines built in. Ready to be hooked up to Freddy!

The router, with wires soldered onto the serial port headers, 
and a USB FTDI Pro from CKDevices on port 0, and a BrainStem MAX232 level converter on port 1.

Some soldering on the router, and I have wires coming from the serial ports. Information on how to solder serial headers to your WRT54G, as well as a lot of other information about the WRT54G router, can be found here. MAX232 serial level converters can be found in a lot of different places, but the one I used for this was the BrainStem level converter from Acroname Robotics. Serial port 0 (/dev/tts/0) is the debug port, where you can see the router boot sequence, telnet to a shell, and so forth. /dev/tts/1 is free to use, so I can talk to Freddy on that, using either minicom, or Python programs which send commands to the serial port. I could also hook up another device on /dev/tts/0, such as an arduino to run some sensors, but the arduino would have to be tolerant of the stream of data coming from the router during bootup -- probably by waiting for a specific code before it started interacting with the router.

So good enough, we have a little computer that runs linux, has wifi, and has Python installed on it -- although not all the batteries are included, as the Python philosophy would have it. As you might surmise, the router doesn't have a whole lot of space on it, and after installing the bare minimum I need, I only have about 1.5 MB free. Fortunately, you can expand the storage by using an SDIO card, but that's a project for another day.

The next thing to do is to install this in Freddy, with minimal to no impact to the original machinery. The router normally runs off of 5 volts at 2.5 amps, but a little experimentation proved that it could do just fine on less than half that amperage. I assume the antenna output is less, but so far haven't seen any problems with it. What remains is to find a steady source of 5 volts on Freddy, plug up the serial port, and find a place for the router to live.

Always good to have handy...
A little bit of experimentation, and I found that the main processor board had a nice supply of 5 volts at a reasonable amount of amperage, and running the router off this line did not interfere with the main board in the robot.

If you blow a fuse while experimenting, wrapping it with tinfoil to make it work again is not the correct solution. This is a bad example and you should never do this. (it sure works in a pinch, though!)

Next was finding a place to mount the router. I had hoped to find a place on the lower chassis to mount it, perhaps with a little velcro, but the router was too thick. I ended up mounting it in the head, underneath the keyboard.

This almost worked out, not quite...

The router, mounted under the keyboard

The cover on, and the wire to the serial port hooked up

All put back together, my cat gives Freddy an inspection

And there you have it! A ~30 year old robot, accessible through wifi, capable of running Python (not to mention a built in, extendable web site run by the Lua programming language), and with minimal impact to the original hardware!

Just to watch it work, here's a demo of Freddy running a short Python script.

And here's the code:

import serial
from serial import *

import time

# 7E1 at 9600 baud, matching the HERO BASIC ROM's serial settings
port = Serial(port='/dev/tts/1', baudrate=9600, bytesize=SEVENBITS, parity=PARITY_EVEN, stopbits=STOPBITS_ONE, timeout=1, interCharTimeout=0.25)

def read():
    return port.read(port.inWaiting())

def wait():
    # poll until Freddy has something to say, or give up after a second or so
    i = 0
    while port.inWaiting() == 0 and i < 100:
        time.sleep(0.01)
        i += 1

def write(line):
    # send one character at a time -- the 30 year old hardware needs the breathing room
    for c in list(line):
        port.write(c)
        time.sleep(0.05)

def writeln(line):
    write(line + '\r')

def execute(command):
    writeln(command)
    wait()
    return read()

def get(command):
    result = execute(command)
    return str(result.split('\r\n')[1])

execute('dprint "$hello"')
execute('speak 64709')
execute('dprint "$freddy"')

print "run complete."

Thursday, August 23, 2012

In which I bark orders at a robot, and it actually listens!

So, here's my latest adventure with HouseBot. Since I have the Kinect, and since the Kinect has directional microphones in it, I decided to do a little experimenting around with the Microsoft Speech SDK. In an earlier iteration of HouseBot (well, same platform, much less powerful computer, much more finicky drive wheels.. lots of improvements since then), I had also played around with this, but without quite as much success. The main reason for that was that I was using a microphone plugged into the computer directly, instead of using the Kinect microphones. With the other microphone, you basically had to be right on top of it (and sometimes shout) in order to get it to respond. With the Kinect, since it is designed for gaming (et al), you can be across the room and still have the mic respond.

So, I think I'll just jump right into the finished result, and then do a little of a dive into how it's done. Here's a film of HouseBot responding to voice commands.

As you can see, she's still not going to get me a beer. *sigh*... Science, such a harsh mistress you are..

So how does it work?

In order to do something like this, you need to do a couple of things:

1. Build a robot
2. Build a vocabulary of the expected commands
3. Stream audio from the Kinect to the voice recognition code
4. On recognizing a command, take some action

Underneath it all, it is using the Microsoft Speech SDK, the Kinect for Windows SDK, and the Kinect for Windows Developer Toolkit (more info here). The code shown here, at least the setup code in the recognizer object, is an adaptation of the C# speech recognition example in the toolkit. I highly recommend the toolkit! I wrote all this in C# using Visual Studio 2010, but there's no reason why this couldn't be developed in one of the Express editions, or in one of my favorite freeware products, SharpDevelop (which is also great for developing in IronPython!)

So keep in mind, there's a lot of support code for the robot that I'm not going to show here. It'll be fairly obvious where I'm calling into the robot, but the basic idea is that the robot has various tools attached to it (objects that implement ITool), such as the light, the turret, and the speech generator. The robot itself is a platform (implements IMobilityPlatform) and a sensor provider (implements ISensorProvider). So, if you see something that says "UseTool", that method call is asking the tool to do its core action or some alternate action, and if you tell a platform to turn or move forward, that's the robot itself. Some tools, such as the turret or voice generation tool, also have some specialized actions -- such as "Say()" on the voice tool.

What it looks like when you are consuming all this is that first, we'll do a little setup:

      ITool recognizer;
      ITool light;
      KinectViewAngle kinectAngle;
      ArduinoStepperMotor turret;
      Voice voice;
      StringSensor speech;
      ObjectSensor objectSensor;
      Robotics.Framework.Oscillators.Timeout timeout = new Robotics.Framework.Oscillators.Timeout(new TimeSpan(hours: 0, minutes: 0, seconds: 2));

      public override void Setup()
      {
            recognizer = (Platform as IToolProvider).Tools.Where(tool => tool.Name == "Speech Recognizer").First();
            turret = (ArduinoStepperMotor)((Platform as IToolProvider).Tools.Where(tool => tool.Name == "Head Position Motor").First());
            voice = (Voice)((Platform as IToolProvider).Tools.Where(tool => tool.Name == "Speech").First());
            speech = (StringSensor)((Platform as ISensorProvider).Sensors["Speech Recognition"]);
            light = ((Platform as IToolProvider).Tools.Where(tool => tool.Name == "Light").First());
            kinectAngle = (KinectViewAngle)((Platform as IToolProvider).Tools.Where(tool => tool.Name == "Kinect View Angle 1").First());
            objectSensor = (ObjectSensor)((Platform as ISensorProvider).Sensors.Values.Where(sensor => sensor is ObjectSensor).First());
            speech.SensorChanged += new SensorEvent(speech_SensorChanged);
      }


And then we'll just wait for voice commands. A note - the recognizer object is the actual object doing the speech recognition, but, to make things easier, it fills in a value on a StringSensor object, which is a kind of base sensor object type I use on the robotics platform to easily represent and sense things that are string values (RFID sensor values, sensed speech, values coming from an IR remote receiver, things like that). This is the code that interprets commands:

        bool isProcessing = false;
        void speech_SensorChanged(object sender, SensorEventArgs args)
        {
            if (isProcessing) return;

            isProcessing = true;

            if (!string.IsNullOrEmpty(speech.StringValue) && timeout.IsElapsed)
            {
                if (isWaitingForCommand)
                {
                    switch (speech.StringValue)
                    {
                        case "speech on":
                            isQuietMode = false;
                            Say("Speech is turned on now.");
                            break;

                        case "speech off":
                            Say("Speech is turned off.");
                            isQuietMode = true;
                            break;

                        case "center":
                            Say("Centering the turret.");
                            break;

                        case "forward":
                            Say("Moving forward.");
                            break;

                        case "backward":
                            Say("Moving back.");
                            break;

                        case "use turret":
                            Say("Commands will move the turret.");
                            isUsingTurret = true;
                            break;

                        case "use motors":
                            Say("Commands will move the drive motors.");
                            isUsingTurret = false;
                            break;

                        case "left":
                            if (isUsingTurret)
                                Say("Turning turret left.");
                            else
                                Say("Turning left.");
                            break;

                        case "right":
                            if (isUsingTurret)
                                Say("Turning turret right.");
                            else
                                Say("Turning right.");
                            break;

                        case "turn around":
                            Say("Turning around.");
                            break;

                        case "get beer":
                            Say("Ha. Go get your own beer! They are in the fridge.");
                            kinectAngle.PositionAt(0, 15, 0);
                            kinectAngle.PositionAt(0, 0, 0);
                            break;

                        case "stop":
                            Say("Stopping. Say 'row bought' to continue.");
                            isWaitingForCommand = false;
                            break;

                        case "light on":
                        case "light off":
                            break;

                        case "look up":
                            kinectAngle.PositionAt(0, 15, 0);
                            break;

                        case "look down":
                            kinectAngle.PositionAt(0, -15, 0);
                            break;

                        case "look middle":
                            kinectAngle.PositionAt(0, 0, 0);
                            break;

                        case "good job":
                            Say("Thank you.");
                            break;

                        case "robot":
                            Say("I am awaiting commands.");
                            break;

                        case "status":
                        case "yes":
                        case "no":
                            break;

                        default:
                            //Say(string.Format("The phrase, '{0}', does not map to a command.", speech.StringValue));
                            break;
                    }
                }
                else
                {
                    if (speech.StringValue == "robot")
                    {
                        isWaitingForCommand = true;
                        Say("I am listening.");
                    }

                    if (speech.StringValue == "good job")
                        Say("Thank you.");
                }
            }

            speech.StringValue = String.Empty;
            isProcessing = false;
        }

Behind the scenes, there's a little setup going on. In the recognizer object, we set up a vocabulary, get a reference to the Kinect audio stream, and wire up the speech recognition engine provided by the Kinect SDK, like so:

namespace Robotics.Platform.HouseBot.Kinect
{
    using System;
    using System.Threading;
    using Microsoft.Kinect;
    using Microsoft.Speech.AudioFormat;
    using Microsoft.Speech.Recognition;
    using Robotics.Framework.Tools;

    public class KinectSpeechRecognizer : ITool
    {
        const double ConfidenceThreshold = 0.5;  // Speech utterance confidence below which we treat speech as if it hadn't been heard

        public delegate void SpeechRecognizedEventHandler(object sender, SpeechRecognizedEventArgs args);
        public event SpeechRecognizedEventHandler SpeechRecognized = delegate { };

        RecognizerInfo recognizer;
        private KinectSensor sensor;
        private SpeechRecognitionEngine speechEngine;

        public KinectSpeechRecognizer(KinectSensor newSensor)
        {
            sensor = newSensor;
            InUse = false;
        }

        private RecognizerInfo GetKinectRecognizer()
        {
            var recognizers = SpeechRecognitionEngine.InstalledRecognizers();
            foreach (RecognizerInfo recognizer in recognizers)
            {
                string value;
                recognizer.AdditionalInfo.TryGetValue("Kinect", out value);
                if (value == "True" && recognizer.Culture.Name == "en-US")
                    return recognizer;
            }

            return null;
        }

        private void InitializeSpeechRecognition()
        {
            int i = 0;

            // The Kinect recognizer can take a while to show up, so retry a few times.
            while (recognizer == null && ++i < 10)
            {
                recognizer = GetKinectRecognizer();
                if (recognizer == null)
                    Thread.Sleep(1000);
            }

            if (recognizer == null)
                return;

            speechEngine = new SpeechRecognitionEngine(recognizer.Id);

            var phrases = new Choices();

            phrases.Add(new SemanticResultValue("go forward", "forward"));
            phrases.Add(new SemanticResultValue("back up", "backward"));
            phrases.Add(new SemanticResultValue("stop", "stop"));
            phrases.Add(new SemanticResultValue("turn left", "left"));
            phrases.Add(new SemanticResultValue("turn right", "right"));
            phrases.Add(new SemanticResultValue("turn the light on", "light on"));
            phrases.Add(new SemanticResultValue("turn the light off", "light off"));
            phrases.Add(new SemanticResultValue("use the turret", "use turret"));
            phrases.Add(new SemanticResultValue("use the drive motors", "use motors"));
            phrases.Add(new SemanticResultValue("center", "center"));
            phrases.Add(new SemanticResultValue("robot", "robot"));
            phrases.Add(new SemanticResultValue("look up", "look up"));
            phrases.Add(new SemanticResultValue("look down", "look down"));
            phrases.Add(new SemanticResultValue("look straight", "look middle"));
            phrases.Add(new SemanticResultValue("good job", "good job"));
            phrases.Add(new SemanticResultValue("get me a beer", "get beer"));

            var grammarBuilder = new GrammarBuilder { Culture = recognizer.Culture };
            grammarBuilder.Append(phrases);

            var grammar = new Grammar(grammarBuilder);
            speechEngine.LoadGrammar(grammar);

            speechEngine.SpeechRecognized += speechEngine_SpeechRecognized;

            var stream = sensor.AudioSource.Start();
            speechEngine.SetInputToAudioStream(stream, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
            speechEngine.RecognizeAsync(RecognizeMode.Multiple);
        }

        void speechEngine_SpeechDetected(object sender, SpeechDetectedEventArgs e)
        {
            //  throw new NotImplementedException();
        }

        void speechEngine_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            // throw new NotImplementedException();
        }

        void speechEngine_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
        {
            // throw new NotImplementedException();
        }

        Robotics.Framework.Oscillators.Timeout timeout = new Robotics.Framework.Oscillators.Timeout(new TimeSpan(days: 0, hours: 0, minutes: 0, seconds: 1, milliseconds: 500));

        void speechEngine_SpeechRecognized(object sender, Microsoft.Speech.Recognition.SpeechRecognizedEventArgs e)
        {
            if (!timeout.IsElapsed)
                return;

            if (e.Result.Confidence >= ConfidenceThreshold)
                SpeechRecognized(this, new SpeechRecognizedEventArgs { RecognizedSpeech = e.Result.Semantics.Value.ToString(), });
        }

        public bool InUse { get; set; }

        public string Name { get; set; }

        public bool UseTool()
        {
            // Lazily initialize the engine the first time the tool is used.
            if (speechEngine == null)
                InitializeSpeechRecognition();

            return true;
        }

        public bool UseTool(int action)
        {
            throw new NotImplementedException();
        }

        public void Stop()
        {
            if (speechEngine != null)
                speechEngine.RecognizeAsyncStop();
        }

        public bool PositionAt(double degreesX, double degreesY, double radius)
        {
            throw new NotImplementedException();
        }
    }
}

So that's about it. One hurdle I had to overcome was making sure I had references to the correct SDK assemblies. If I set the reference to the Microsoft.Kinect assembly through the IDE, things didn't work correctly; I had to look at the csproj file in the developer toolkit example and manually edit my csproj to match. Once that was figured out, it was smooth sailing. Play with the confidence threshold if you get spurious recognitions. At one point I had it set to 0.9 -- below 90% certainty, the robot won't respond -- and that actually seemed a pretty good setting.


Sunday, July 22, 2012

Dance, my pets! Dance!

Yesterday, I went into a thrift shop and found not one, but two RoboSapiens in great condition, with a remote. Both were marked at $14.99, but they let me have one for $9.99, since there was only one remote.

$25! Not one, but two RoboSapiens for $25! They normally go for $100 each! (YMMV)

Where did I find such a steal? Oh yeah, riiiight. Like I'm going to reveal my sources on a public blog -- especially with a find like that.

I was thinking of hacking one of them. Wikipedia says they are very hackable -- they've even been used in robot soccer -- and there are many examples of hacked RoboSapien robots on YouTube.

Meanwhile, here they are, doing some synchronized dancing.

And so begins my mechanical army of minions.

Sunday, July 15, 2012

Get me a beer, robot!

I make robots. Well, I do a lot of things, but one of the things I do is make robots. One of my main projects over the past few years has been Housebot. Housebot's stated purpose in life is to solve a very challenging technical and engineering problem -- one which, if you look around on the youtubes, you will see many attempts at solving, some failing much more spectacularly than others. What's this challenge?

Go to the fridge. Get me a beer. Bring it back.

Extra credit: open it for me.

Now, you're smirking. I see you there. Ha, he wants a machine to go get him a beer. Lazy.

But think about it. There's many things that the robot (or, any machine) will have to do in order to complete this task. Here's a quick rundown, and then I'll show some pictures.

1. First, the robot has to know I want a beer. Otherwise, it just continually brings me beers. While cool, I do have to go to work tomorrow. There are many ways to solve this: easy is to press a button, harder is voice recognition ("Computer! <fleebp> Go make me a sammich!"), and harder yet is using artificial intelligence of some sort.

2. Next up is figuring out where the fridge is. This is usually accomplished by what is called SLAM (Simultaneous Localization and Mapping) -- also not an easy problem, and, in fact, one we as an industry have not satisfactorily solved yet. It involves questions about whether the machine has a priori knowledge of its locale or discovers it as it goes, how signal noise from the sensors is dealt with, and many other problems. Some of the best minds on the planet have been tackling this, as evidenced by the DARPA Grand Challenge.

3. Next up is getting to the fridge -- preferably without destroying any of the furniture along the way. This involves determining the robot's pose relative to its known world, picking out features in the landscape that it can use as waypoint markers/beacons, and then correctly moving relative to those markers in order to navigate to the fridge. It also involves artificial intelligence at this point, in that you have to do planning of some sort to determine the path to take. The likely solution to this is the A* algorithm, but there are others, such as breadth-first search, or ACO (Ant Colony Optimization).
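To make the planning step concrete, here's a minimal A* sketch over a small occupancy grid. Housebot's code is C#, but I'll use Python here for brevity; the grid, the "coffee table" obstacle, and the couch/fridge coordinates are all made up for illustration:

```python
import heapq

def a_star(grid, start, goal):
    """A* over a 4-connected occupancy grid; 0 = free cell, 1 = obstacle.

    Returns the list of cells from start to goal, or None if unreachable.
    """
    def h(cell):
        # Manhattan distance: admissible on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_set = [(h(start), start)]      # priority queue ordered by f = g + h
    came_from = {start: None}
    g_score = {start: 0}

    while open_set:
        _, cell = heapq.heappop(open_set)
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nb
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                ng = g_score[cell] + 1
                if ng < g_score.get(nb, float("inf")):
                    g_score[nb] = ng
                    came_from[nb] = cell
                    heapq.heappush(open_set, (ng + h(nb), nb))
    return None

# A living room as a tiny grid: couch at (0, 0), fridge at (4, 4),
# with a "coffee table" blocking the middle.
room = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
path = a_star(room, (0, 0), (4, 4))
print(len(path) - 1)  # number of moves in the shortest path
```

The same idea scales to a real SLAM-produced map; the bucketed sensor values just determine which cells get marked as obstacles.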

So, wow. Now we're at the fridge. And we've done a lot of work to get there, and there were many pitfalls, and many technical and engineering challenges. At this point, we're not even halfway done. We need to open the fridge (actuators, robotic arms, motor control, and all sorts of related things are needed here), recognize a beer (vision systems and pattern recognition here), get the beer (more actuator control here), close the fridge, and return (repeat steps 1-3, above, in order to return to the couch).

So, yeah, at this point I've already gotten my own damn beer.

But, it brings us to Housebot's second mission: be a configurable robotics experimentation platform, capable of modular expansion, and hackability. So, without further ado, I'll let Housebot introduce herself:

So, I'm sure you're asking: why wood? Well, wood as a construction material is cheap and readily available. All of those pieces were bought at the Home Depot in Alameda, CA, which was right down the road from me. Wood is easily workable, and if you screw it up, you just chuck it in the fireplace -- no big deal. Also, Housebot makes a nice end table when not in use, which was pretty useful in my cramped San Francisco area apartment.

So how's it work?

Basically, there's a USB cable sticking out of it, and you plug a brain into it. There used to be a built-in computer below decks, but I found it was much more useful (and versatile) to be able to plug in any sort of device: a laptop, a Beagleboard, or even, if you're trying to figure something out and need a bit more horsepower, your desktop (if you actually still have a Grandpa Box, that is). Also, since practically everything inside Housebot is Arduino or Netduino based, it can actually do a lot of things on its own, without any sort of computer involved, just using its microcontrollers. Generally, though, I have a laptop that is more or less dedicated to Housebot. It dual-boots Windows 7 and Ubuntu, but I mainly use Win7 for my development work. Since the laptop is actually an old tablet computer, I will likely spend the $40 to upgrade to Windows 8 when it comes out, which should open up some interesting ways for Housebot to interact with her world.

So let me wrap this up with a couple pictures!

Here's Housebot from the front. From the top, we have the Kinect on the turret; below that, on the front panel, are three long-range Sharp infrared distance sensors (two on the corners, one left middle), one MaxBotix sonar sensor, a motion detector from Radio Shack, and a couple of switches and LEDs. The reddish switch is the emergency kill switch, which drops power to all motors. This has proven to be very important -- Housebot is very strong. I take Porsche's approach here: there's no such thing as "too much horsepower". That has proven dangerous at times, and the kill switch is necessary. The LEDs, the black pushbutton switch, and the gauge in the middle are all programmatically controllable, i.e., their function (if any) is defined by the program currently running.

This is the drive motor for the turret: a windshield wiper motor out of a Chevy S-10 pickup, driven by an Arduino and a 10-amp-capable H-bridge. The motor has a worm drive with tons of torque -- it's physically stronger than I am. I can't stop it when it's running.

This is the drive train. It's pretty simple -- it's a kit from Parallax. The kit has some low-resolution encoders on it -- 16 ticks per revolution, if I remember correctly -- which means I can't turn a single degree. The smallest turn I can make is about 11 degrees. This turns out to be basically OK -- I can hit all the points on the compass rose, and for sub-encoder-resolution turning, I can always use timing (i.e., short bursts of PWM activity to the motors) and the AHRS to tell me where I am pointing. That hasn't turned out to be necessary, though -- 11 degrees is quite sufficient.
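For the curious, that ~11 degree figure falls out of simple geometry. This little sketch (Python for brevity) shows how encoder resolution maps to turn resolution for a differential drive spinning in place; the 6-inch wheel diameter and 12-inch track width are assumed values, not Housebot's actual measurements:

```python
import math

ticks_per_rev = 16       # encoder resolution of the Parallax kit
wheel_diameter = 6.0     # inches -- assumed, not a measured value
track_width = 12.0       # inches between the drive wheels -- also assumed

# Distance one wheel travels per encoder tick.
travel_per_tick = math.pi * wheel_diameter / ticks_per_rev

# Spinning in place, each wheel sweeps an arc of radius track_width / 2,
# so the heading changes by travel / (track_width / 2) radians per tick.
degrees_per_tick = math.degrees(travel_per_tick / (track_width / 2))

print(round(degrees_per_tick, 2))  # 11.25 with these dimensions
```

With those (guessed) dimensions the smallest commandable turn comes out to 11.25 degrees, which lines up nicely with the roughly 11 degrees I see in practice.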

And here's the guts! It's scary in there! Believe it or not, underneath that snake nest there are three Arduino boards (including an Arduino Mega), a Netduino, and a Robotics Connection Serializer (now sold by cmrobot.com as the Element -- a very capable board, in my opinion).

I hope you enjoyed my introduction of Housebot!

As a footnote, there is one highly successful robot which has solved the Go Get Me A Beer problem: a fascinating robotics platform from Willow Garage, called the PR2. Here's a film of it in action.

Unfortunately, they have a larger budget than I do; last I heard, a PR2 went for about $400,000 USD.