Pets, Plants, and Computer Vision
Header
Tufted Titmouse - I love these guys, so cute.

Tufted Titmouse – I love these guys, so cute.

I’ve been working on building an automated bird call recognition system using python. The first step in getting everything working is to pull down a data set of bird calls from which to train and test my ideas. To get this data I needed a lot of bird calls. It just so happens that there are a couple of large repositories of this type of data including the Xeno-Canto library and the Cornell Ornithology Library. The only problem is that it lives in websites with embedded players and I need the raw data. I decided to try and write a basic web-scraper that would pull down the data I wanted. To do this I first created a list the scientific names of all of the song birds that I am pretty sure live in my neighborhood (at least the common ones I’ve seen before). I checked a couple of websites to cross check my assumptions and developed the following list:

To do the scraping I used the BeautifulSoup python library to help me navigate the DOM from xeno-canto bird library. The code works by crafting a query for each bird species, and parsing the DOM to look for the xc-button-audio in a div element. In that div element there is a sub tag called data-xc-filepath which points to the mp3 file URL. My friend Ben helped me figure out that last little bit as I am barely competent as a web money. Once I have the mp3 file URL I use os.system to call wget on the mp3 url. I also do some book keeping to keep all the bird calls in different directories and navigate the multiple pages of results. If I get some more time I will try to extract all of the metadata and save it to a CSV file, but for right now this works. You can see the code below:

Now that I have the data I need to figure out how to extract individual calls from each file that can contain multiple calls. My working idea is to look for regions where the peak-to-peak audio values are low enough to be considered silence. I will use these silence intervals to split the files into individual calls. Once I get to the short individual calls I am thinking I will run an FFT over the audio and then bin different regions of the spectrum to create a feature vector. I will also probably keep some information about the call length and maybe try to determine the “warblyness” of the call (i.e. how many different sub-tweets make up a call). I am thinking that it may be useful to do the binned FFT over fixed time slices of the call and calculate the FFT on that (e.g. break the call up into five time chunks and get the FFT values for each chunk). I have an idea that a binary descriptor can be used to compress each time slice if I set an appropriate threshold at each frequency (e.g. use a 32 bit int to encode if one of 32 chunks the frequency space are discernible). If I can get that idea to work I could probably encode each call very succinctly in only a few bytes of memory. Once I have my data I suspect that a K Nearest Neighbors classifier will work reasonably well. I may combine the KNN with a final correlation with the truth data to choose between the K best matches.



Here are a few cool things that came up at CVPR. Today KIT released a benchmark data set for autonomous vehicles. KIT has spent a small fortune outfitting a vehicle with a Velodyne LIDAR, a stereo camera rig, GPS, INS, and all of the other goodies you would expect a DARPA urban challenge vehicle to have. KIT drove the vehicle around a city and recorded six hours of real world data. KIT then painstakingly rectified everything together and paid someone to mechanically segment and classify the data in the scenes (i.e. all pedestrians and vehicles have 3D boxes around them and are labeled in the data). The data is also registered to open street map data. This means that the world now has open source, real-world data for autonomous vehicle navigation. Since KIT provides benchmark along with the data it should be trivial to use the data and compare how your algorithms perform. This work will really serve to drive competition in the field.

Tomorrow at CVPR Kitware is hosting a Python for Computer Vision workshop. Kitware provides open source python tools for computer vision, and they have opened up the materials. You can find them here. I will report more information tomorrow after the workshop

RPS simulator

March 5th, 2011 | Posted by admin in artificial intelligence | code | Columbia | demo | In the news | machine learning - (Comments Off)

I came across this cool interactive feature in the New York Times: RPS Simulator. Basically you play rock, paper, scissors against an algorithm that has learned how to play an optimal game based on prior data. The trick is that humans try to think about the game, versus playing truly randomly. If you play a truly random game you should be able to at least tie the computer. To generate random numbers I moved my mouse around with my eyes closed and guessed my move based on the mouse location. Alternatively you could use the seconds hand on a clock and modulo the number by three.

For my machine learning class this week we have to write this algorithm given a training data set. I will post the code after the homework submission deadline.