Gesture Recognition

The goal of my summer internship was to get the computer to recognize gestures made with an accelerometer. The accelerometer feeds instantaneous accelerations to the computer, and that is it. Using these accelerations however, provides the opportunity to modify and analyze their graphs. Using text files to store the readings and algorithms to transform them, a set of average readings can be made for each gesture. Using these averages allows us to check if a gesture was performed. The internship was spent trying to find the best way to create these average graphs so the gesture recognition was the most accurate.

Getting an Average

Once a gesture was chosen, several samples of the gesture were needed to create a realistic average. However, the problem with using several samples is that each sample won't be the same length of time, and thus have different amount of readings. Some samples might also be more vigorous than others, making it hard to create an average. This is where most of the time was spent; deciding on the best approach to these problems. The very first thing that had to be done was getting rid of "noise". Accelerometers are incredibly sensitive to subtle movements, which makes their graphs very hard to analyse. They will often be very "bumpy", creating problems for the average.


The solution to this first problem is through smoothing. The first five readings are averaged out into a new value, and that becomes the first value of the new graph. The second reading and the four after it are averaged out to become the second value of the new graph, and so on. The amount of readings averaged out is known as the kernel, and in this case for most gestures the kernel was chosen to be five. The process is demonstrated visually on the following page. This process creates a new graph that is much easier to analyze as its features are more prominent. This is done to every sample for every gesture, as noise always occurs.


The graph after smoothing now looks like this, all the bumps are gone:


The next problem is that this graph now has 18 readings, while others might have 26, 16, 21, etc. It will rarely be consistant. To fix this, the graphs need to be normalised over time so they are always the same length. To do this the critical points need to be analyzed. A critical point in this situation is defined when the derivative of the graph (since this is a graph of accelerations, the derivative is the jerk, or change in acceleration) changes sign. From positive to negative or negative to positive.

A closer look at the critical point on the Z-axis and what the derivative looks like:


The fact that it goes from positive to negative shows that this is a critical point. Since these are discrete sets of data, and not functions, the derivative is calculated by subtracting each point by the point before it. For example, the second point is the critical point, and subtracting it by its predecessor is a positive result, and subtracting the third by the second is a negative result. To normalise the graphs, all the important critical points must be placed at the same spot. This will result in some data creation (interpolation) or data removal. A major problem over the summer is deciding what are important critical points. By looking at the smoothed graph, it appears there should be two critical points on the Z-axis for this motion, and zero for X and Y. However, on closer inspection this particular graph has three critical points, the third being at location 5. This creates problems when two are expected.

At first, a text file was created that stored information for each gesture. It indicated how many critical points each axis should have, and if they have any, their normalised location. A program was written to open this text file and use this information to decide how to normalise a particular gesture. Each sample would be normalised through this method. Since each sample is different, some might have more or less critical points than required. The first resolution to this was that if a sample had the incorrect amount of critical points, it would be considered invalid. This proved to be a problem as more than half the samples had inconsistant amounts of critical points, even when all their graphs had similar shapes. It was concluded that there will still minor bumps, even after smoothing. The next solution was for each sample, increase the kernel size until it had the right amount of critical points. If smoothing made it have less than required critical points, the sample was deemed incorrect. The kernel would be increased each time by two.

Graph of critical point with a kernel size of 5 and of 7:


This method however also had some problems. Frequently the kernel size would become grossly enlarged, sometimes to 21 or higher, which would create problems. This happened especially with the axes that should have zero critical points. This would also happen to axes with motion.

Another approach was to continually smooth the entire graph and each subsequent graph with the same kernel. This would stop if the number of critical points doesn't change, or the required number of critical points is reached. For example, a graph would have 6 critical points after one smooth. This new graph would then be smoothed again and would have let us say 4 critical points, then 2, then after a final smooth it will still have 2. This implies that it should have 2 critical points. If this is not what is desired, the sample is rejected. An example of a sample after one smoothing, then another:


As you can see, the graphs are much smoother now and that extra bump on the Z-axis is gone, but another problem is observed. This method will occasionally remove real critical points that are near the edge of the graph. The second graph is missing the local minimum the first graph has. Really bumpy axes which should have no have motion would also have problems as they would be continously smoothed until all that is left is a single point. This single point would then have to be repeated for the entire normalised graph, which isn't realistic.


Even though the previous solutions allowed for more than two critical points (which complex gestures possess), the final solution decided upon would only allow for two or less. If a sample had more than two critical points, it was decided that the program would only look at the maximum critical point and the minimum critical point. In this example, there are two local maxes, but one of them is higher than the other, and that will be the one normalised:


If the axis is only meant to have one critical point in a particular axis, the text file with all the critical point data will mention if this is mean to be a local max or local min. The program will then look for the absolute max or min of all the critical points it finds depending on which it should be looking for. If the axis is not meant to have any critical points, after a single smoothing the new graph will simply be stretched to the length of the normalised graphs. For most of the gestures the normalised length was 30 readings. To stretch the axis (as well as normalise the other axis with motion) an adaptation of the Bresenham algorithm was applied.

The Bresenham algorithm was created to calculate how to draw a line on a computer. Since computer monitors use pixels, a line can only go horizontally, vertically, or 45 degrees. To get from point a to point b, there needs to be a certain amount of horizontal movements and vertical movements to simulate a diagonal line. Normalising data to a size larger than it was originally requires some data to be interpolated. This simple algorithm was applied to decide where there would be real data and where there would be interpolated data. Instead of drawing a line vertically or horizontally, the algorithm would either put a real data point or extrapolate. Since interpolation requires real data points, all the real data points were placed first and wherever interpolation should occur an impossible value was placed. After the algorithm was run, while looping through the readings, whenever the impossible value was detected interpolation would occur. The code is here:

        double rate = (from - 2) / (to - 2);
        double error = 0;

        // Set first and last values to be real no matter what
        norm[toBeginIndex] = arr[fromBeginIndex];
        norm[toEndIndex] = arr[fromEndIndex];

        for (j = toBeginIndex + 1; j < toEndIndex; j++) {
            error = error + rate;
            if (error >= 1.0 || (1.0 - error) <= 0.000000000000005) {
                norm[j] = arr[k];
                error = error - 1.0;
            else norm[j] = IMPOSSIBLE;	
        // impossible value, used to check when interpolating
        for (j = toBeginIndex + 1; j < toEndIndex; j++) {
            if (norm[j] == IMPOSSIBLE)
                interpolateData(norm, toEndIndex + 1, j);

The variables from and to represent how many readings we are going from and what we are normalising to. In the case where there are meant to be no critical points, fromBeginIndex = 0, toBeginIndex = 0, fromEndIndex = #of readings we started with -1, and toEndIndex = what we are normalising to -1. The first and last values of the normalised array are forced to be real, the first and last values we started with. Rate is a value calculated, and while looping through the original array rate is continously added to error. Whenever error is greater than or equal to 1, or really close to one (due to computer rounding issues), a real value is used. Otherwise, an impossible value is placed. After the algorithm finishes, whenever IMPOSSIBLE is found, interpolation occurs. Interpolation is done through a simple equation. If you have n gaps between two values, you find the difference between the two values and divide by n+1. The result is then added to the first real value, which is then put in the first gap. For each subsequent gap you add the result to the previous real value.

For example: 6, _, _, 15:

15 - 6 / 2 + 1 = 3. 6 + 3 = 9, 9 + 3 =12: 6, 9, 12, 15.

The following code uses j, which indicates the current location of the first spot that needs interpolation. It then continuously goes left in the data until it finds a real value. It does the same for going right. It then uses these values for the formula above to interpolate all the data. Together with the Bresenham algorithm and interpolation, it is simple to simply stretch a sample to any size, in this case 30.

        int i;
        int ngaps = 0;
        double leftValue = IMPOSSIBLE, rightValue = IMPOSSIBLE, interpolatedData;
        for (i = j; i >= 0; i--) {
            if (norm[i] != IMPOSSIBLE) {
                leftValue = norm[i];

        for (i = j; i < endOfArray; i++) {
            if (norm[i] != IMPOSSIBLE) {
                rightValue = norm[i];
        interpolatedData = (rightValue - leftValue) / (double) (ngaps + 1);

        for (i = 0; i < ngaps; i++)
            norm[j + i] = norm[j + i - 1] + interpolatedData;

Another case is where a sample has more readings than what we are normalising the samples to. Instead of the Bresenham algorithm, some data is ignored using this simple algorithm.

        // shrink it by ignoring some data
        rate = (double) from / (double) to;
        for (i = 0; i < to; i++) {
            index = (int) (rate * i);
            norm[i] = arr[index];

A very similar approach is done to normalise arrays with critical points. To normalise axes with two critical points, first the location of the absolute max and min of the critical points must be found. Knowing the locations of the normalised critical points (found in the text file), the critical points are explicitly placed at their new locations. This creates three sections, called A, B, and C. The X signifies critical points:


Each section is then treated separately. A will always start off with the first value of the original array. If the amount of data in A in the original (from) is less than the amount in A in the normalised array (to), Bresenham algorithm is used, otherwise some data is ignored. For A: fromBeginIndex and toBeginIndex are both 0, fromEndIndex is the location of the first critical point minus one, and toEndIndex is the location of the first normalised critical point minus one. This is because Bresenham places real values are those locations. B and C are similar, with different values for the variables. For B: fromBeginIndex is the first critical point plus one, toBeginIndex is the location of the first normalised critical point plus one, fromEndIndex is the second critical point minus one, and toEndIndex is the location of the second normalised critical point minus one. For C: fromBeginIndex is the second critical point plus one, toBeginIndex is the location of the second normalised critical point plus one, fromEndIndex is from minus one, and toEndIndex is to minus one.

The final situation is if there is a gesture where an axis has only one critical point, it is placed in the middle of the normalised array, creating only two sections. The same methods are performed as an axis with two critical points, there is just no section B. For A: fromBeginIndex and toBeginIndex are both 0, fromEndIndex is the location of the critical point minus one, and toEndIndex is the location of the normalised critical point minus one. For C: fromBeginIndex is the critical point plus one, toBeginIndex is the location of the normalised critical point plus one, fromEndIndex is from minus one, and toEndIndex is to minus one.

Each axis of every sample is smoothed then normalised in this fashion. The axes are done separately because some might have different critical points than others. After normalising every array and axis are all the same length with critical points at the same spots. However, averaging them the way they are might pose a problem. Depending on the force of the motion, some axes will have their data points all be negative, and some have all of them positive. If the average of these two were taken, it would average out to 0.


To avoid this, almost all the data points are shifted up or down the same way, so as to have the least amount of negative data points. Also, to make the maximum critical points all at the same magnitude, every normalised sample is multiplied so as the critical points are all the same, in this case the chosen value was 5.

After smoothing, normalising and stretching, the samples are ready to be averaged out. The whole point of all this was to have an average data set to compare live data against. If a gesture is made during a game, to be able to recognize if it is a real gesture, we need something to tell the computer "this is what this gesture should look like". The average of the arrays is exported to a text file that can be later imported into a program, and a graph can be made of it in Excel.


Live Data

Once an average text file is made for every gesture that we want to recognize, a program for live detection must be made. Similar to how samples were recorded, the button on the accelerometer must be pressed while the gesture is being performed. Each time a gesture is attempted, it is stored in an array which is then checked against each gesture stored. Each check smoothes, normalises and stretches the gesture. If at any point the check fails (too few critical points for example), the gesture is not recognized, and it continues to the next gesture. If it successfully finishes stretching the sample, it is then compared to the average text file of the gesture it is checking. To compare two, the Cartesian Difference is taken. To get the Cartesian Difference, this equation is used:


Where ref[i] is the ith data point of the reference text file and norm[i] is the ith data point of the stretched and normalised sample. This equation is done for each axis. The result is the Cartesian Difference. A maximum Cartesian Difference is decided and if the calculated Cartesian Difference is less than the maximum, the gesture is recognized. If it is more than the maximum, the gesture is not recognized and the check continues.