Pets, Plants, and Computer Vision
Header
Tufted Titmouse - I love these guys, so cute.

Tufted Titmouse – I love these guys, so cute.

I’ve been working on building an automated bird call recognition system using python. The first step in getting everything working is to pull down a data set of bird calls from which to train and test my ideas. To get this data I needed a lot of bird calls. It just so happens that there are a couple of large repositories of this type of data including the Xeno-Canto library and the Cornell Ornithology Library. The only problem is that it lives in websites with embedded players and I need the raw data. I decided to try and write a basic web-scraper that would pull down the data I wanted. To do this I first created a list the scientific names of all of the song birds that I am pretty sure live in my neighborhood (at least the common ones I’ve seen before). I checked a couple of websites to cross check my assumptions and developed the following list:

To do the scraping I used the BeautifulSoup python library to help me navigate the DOM from xeno-canto bird library. The code works by crafting a query for each bird species, and parsing the DOM to look for the xc-button-audio in a div element. In that div element there is a sub tag called data-xc-filepath which points to the mp3 file URL. My friend Ben helped me figure out that last little bit as I am barely competent as a web money. Once I have the mp3 file URL I use os.system to call wget on the mp3 url. I also do some book keeping to keep all the bird calls in different directories and navigate the multiple pages of results. If I get some more time I will try to extract all of the metadata and save it to a CSV file, but for right now this works. You can see the code below:

Now that I have the data I need to figure out how to extract individual calls from each file that can contain multiple calls. My working idea is to look for regions where the peak-to-peak audio values are low enough to be considered silence. I will use these silence intervals to split the files into individual calls. Once I get to the short individual calls I am thinking I will run an FFT over the audio and then bin different regions of the spectrum to create a feature vector. I will also probably keep some information about the call length and maybe try to determine the “warblyness” of the call (i.e. how many different sub-tweets make up a call). I am thinking that it may be useful to do the binned FFT over fixed time slices of the call and calculate the FFT on that (e.g. break the call up into five time chunks and get the FFT values for each chunk). I have an idea that a binary descriptor can be used to compress each time slice if I set an appropriate threshold at each frequency (e.g. use a 32 bit int to encode if one of 32 chunks the frequency space are discernible). If I can get that idea to work I could probably encode each call very succinctly in only a few bytes of memory. Once I have my data I suspect that a K Nearest Neighbors classifier will work reasonably well. I may combine the KNN with a final correlation with the truth data to choose between the K best matches.



I’ve had this idea, that I have been working on for awhile, to make a system that can recognize fasteners (screws, nuts, bolts, nails, washers, etc) on the fly and then measure some of the descriptive statistics of the parts (e.g. length, width, inner and outer diameter etc). Today I finally got a prototype of the recognition system up and running. The system uses a raw threshold of the scene to extract the parts from the scene background and then does some operations to get the parts into a standard form (namely aligned to the major axis) and then extracts some basic statistics like edge length and orientation histograms, and Hu moments. This feature vector then gets piped into a support vector machine to do the recognition. Right now the system runs at about 8FPS at full resolution. The training error on the SVM was about 13% but the training data was really, really, poor and not that large (i.e. 75 samples with about 10% of those being basically misfires from an automatic data extraction module I wrote). I still have a long way to go, as the feature extractor could use some work and the whole data processing pipeline needs to be optimized. Right now there are some fairly costly image rotation operations that can be modified to improve performance. I also need to train the full set of features not just bolts and nuts.

Once the recognition system is working well I hope to use the ALVAR augmented reality library and its fiducials to determine the part dimensions by assuming that the part is effectively planar with the fiducial. The fiducial should also give us a three dimensional location for the recognized part. Right now I am doing this work for the CGUI lab at Columbia. Our end game is to wrap this code up into an augmented reality system for maintenance applications where there may be knowledge shared between a remote subject matter expert and an on-site maintenance technology. Our hope is that a system like this can speed up maintenance tasks by assisting the maintainer in identifying parts and locating them faster. Part of the problem of using AR in this domain is that head mounted displays really don’t have all that great of resolution which reduces your visual acuity and makes it difficult to recognize individual parts.

Classifying Images

March 25th, 2011 | Posted by admin in C++ | classification | code | Columbia | computer vision | machine learning | OpenCV | segmentation - (Comments Off)

Beach classifier, I wish I was here.

I have been burning the midnight oil finishing up a project for my Computational Photography course at Columbia University. For this project we had to make two classification systems, one which classified beach and grassland imagery using a given feature vector description, and a second classifier for any two objects using whatever technique we wished to generate the feature vectors. It was suggested that we do our work in Matlab, but we had the option to work in C++. I opted for the latter as I really wanted to write something that I could possible re-use in another project. The final system was developed under Windows using Visual Studio 9, and makes liberal use of OpenCV 2.2 and LibSVM 3.0.

"Misclassification"

The beach / grassland images were classified by dividing the image into a three by three grid and calculating the color average, color standard deviation, and color skew for each of the HSV channels. This feature vector was then used in a support vector machine with a linear kernel. The overall error rate was 13.33%. For the beach images 11.67% were misclassified as grassland, while 15.00% of the grass images were classified as beach. The classification is written in the top, left corner. If the image was misclassified there are two values listed. The red value is the classification, and the green value is true value.

Correctly Classified Grassland

For the second part of the project I wrote a system that classifies images as either screws or nails. The system assumes that both of these objects are aligned to be roughly vertical. I wrote a separate class awhile back that would re-orientate the images based on the major axis of the extracted contour. To do the classification I first thresholded the gray scale image and then extracted the resulting contour. After doing this a few morphological operations were performed on the contour and the Hu-moments and a few other statistics were calculated. I also applied the Canny edge detector to the images and piped the results into a Hough line detector. The results of the line detector were then binned according to length and orientation. This data was used to generate a feature vector which was used for classification via a support vector machine with a linear kernel. The overall error rate was 3.58%. 1.33% of the screw images were misclassified as nails, while 7.27% of the nail images were misclassified as screws. The classification is written in the top, left corner. I used the letter “N” to indicate nails, and the letter “S” to indicate If the image was misclassified there are two values listed. Some results are shown below.

Samples from the screw / nail classifier

The complete set of beaches and grassland images can be found in my beach / grassland classification set on flickr along with the complete set of screw/nail classification results. The code is posted on my computational photography Google code page. The code was written under the gun so it isn’t nearly as clean as I would like it to be, and everything is very data set specific. Hopefully once the semester is over I can go back and refactor it to be a more general solution.

RPS simulator

March 5th, 2011 | Posted by admin in artificial intelligence | code | Columbia | demo | In the news | machine learning - (Comments Off)

I came across this cool interactive feature in the New York Times: RPS Simulator. Basically you play rock, paper, scissors against an algorithm that has learned how to play an optimal game based on prior data. The trick is that humans try to think about the game, versus playing truly randomly. If you play a truly random game you should be able to at least tie the computer. To generate random numbers I moved my mouse around with my eyes closed and guessed my move based on the mouse location. Alternatively you could use the seconds hand on a clock and modulo the number by three.

For my machine learning class this week we have to write this algorithm given a training data set. I will post the code after the homework submission deadline.