So I have this friend who built a machine to chase intruding cats out of his home. He's not on Reddit, so I offered to post it for him to see what people who know this stuff have to say about it.
Here’s his story:
When I got home from work, my house was a mess. A plant from the windowsill was on the floor, the cats' drinking bowl was upside down, it reeked of cat pee, and our own two cats seemed very stressed. We found out why the next Sunday morning, when we were rudely awakened by screeching noises and I found a huge red cat in the hallway: a neighborhood cat had been coming into our house.
It did not take long after placing a webcam in the kitchen until I caught it on tape: Here you can see how the red cat enters our house, eats our food, drives our cat Zoey out of the door and knocks over the water bowl.
I decided to drive this cat out of my house using data science.
There are cat doors that selectively let cats into the house based on their chip. These are quite expensive and also take a few seconds to register the chip and open. My fear was that our cats, while waiting for the hatch to let them in, would still be harassed by that big red cat. For that reason (and because I am a bit of a geek) I decided to try to scare the red cat out of my house with data science.
I approached this like a real project, with the following phases:
1. Functional requirements
2. Preliminary design
3. Data gathering
4. Training the model
5. Go-live
1. Functional requirements
My requirements were simple: the red cat had to be chased out of my house, without my own cats being bothered and without disturbing us in our daily life. In addition, the cost had to be low: less than the 90 euros that an electric cat door costs.
2. Preliminary design
Even when you do not know exactly how everything is going to turn out, it helps to make a preliminary design. I wanted to train a so-called convolutional neural network that had to recognize the red cat. After the intruder was identified, a sound had to be played. So I needed: a camera, a computer and a speaker. See part a) of the schematic at the bottom.
3. Data gathering
To build a neural network that recognizes a specific cat, I had to teach my “artificial intelligence” what this cat looks like. For that I needed images, lots of images. Data collection was therefore by far the most work. In short, I took the following steps:
Collecting images: The Linux package Motion turned my laptop with webcam into a security camera that collected images when motion was detected.
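As an illustration, a minimal Motion configuration along these lines turns a webcam into a motion-triggered snapshot camera. The option names come from the Motion documentation, but the values and paths here are my own guesses, not the author's actual settings:

```
# motion.conf -- minimal sketch, illustrative values only
videodevice /dev/video0      # the webcam
width 640
height 480
framerate 6                  # the post mentions 6 frames per second
threshold 1500               # number of changed pixels that counts as motion
picture_output on            # save a jpeg for every frame with motion
target_dir /home/user/cat-captures
```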
Region of interest cropping: By identifying which pixels in the photo had changed, I could automatically crop the parts from the photos where the movement had occurred. The following image shows how that works.
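A rough sketch of that cropping step, assuming the frames are available as grayscale NumPy arrays (the function name, threshold and margin are illustrative, not the author's code):

```python
import numpy as np

def crop_motion_region(prev, curr, thresh=30, margin=10):
    """Crop the area of `curr` that differs from `prev`.

    prev, curr: 2-D grayscale arrays of shape (H, W).
    Returns the cropped region around the changed pixels,
    or None when no pixel changed by more than `thresh`.
    """
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    ys, xs = np.where(diff > thresh)
    if ys.size == 0:
        return None                      # no motion detected
    top = max(ys.min() - margin, 0)
    bottom = min(ys.max() + 1 + margin, curr.shape[0])
    left = max(xs.min() - margin, 0)
    right = min(xs.max() + 1 + margin, curr.shape[1])
    return curr[top:bottom, left:right]
```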
Categorizing: I sorted the images into different classes by hand: our cats, the red intruder, the robot vacuum cleaner, empty pictures, and people.
Multiplying: To get a large and robust dataset I increased the number of images by shifting, mirroring and disturbing them. As a result, I had about 40 times as much data at my disposal. The following image shows an example.
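The multiplication step can be sketched with plain NumPy: mirroring, shifting and adding noise ("disturbing"). This toy version only produces six variants per image rather than the roughly 40x described above:

```python
import numpy as np

def augment(image, rng):
    """Return a list of perturbed copies of `image` (a 2-D array):
    one mirrored, four shifted, and one noise-disturbed variant."""
    variants = [np.fliplr(image)]                      # mirror
    for dx, dy in [(-5, 0), (5, 0), (0, -5), (0, 5)]:  # shifts
        variants.append(np.roll(np.roll(image, dy, axis=0), dx, axis=1))
    noisy = image.astype(np.int16) + rng.integers(-20, 21, image.shape)
    variants.append(np.clip(noisy, 0, 255).astype(image.dtype))
    return variants
```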
Splitting and combining: I split the dataset into a train, validation and test dataset for each category. Then I created a combined dataset with roughly the same number of images for each class.
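The per-category split could look something like this sketch; the exact fractions are illustrative (the post uses roughly 80/20 plus a held-out test set):

```python
import random

def split_dataset(paths, train=0.8, val=0.17, seed=42):
    """Shuffle one class's image paths and split them into
    (train, validation, test) lists; whatever is left after the
    train and validation fractions becomes the test set."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)   # deterministic shuffle
    n_train = int(len(paths) * train)
    n_val = int(len(paths) * val)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```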
4. Training the model
Convolutional neural networks are deep learning models that can classify images into groups based on their characteristics. To do this, the model must be trained. During training you show the network images whose labels are known, and the weights in the model are continuously adjusted. After each round of training, the validation data is used to check how well the model works, and overarching model settings (the hyperparameters) are adjusted based on the result. This process is repeated until the model no longer improves. Finally, the test set, with images that the model has never seen before, is used to measure the quality of the model.
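The train/validate/stop cycle described above can be sketched as a generic early-stopping loop; the two callables are placeholders, not the author's actual Keras training code:

```python
def train_until_plateau(train_step, validate, patience=3):
    """Run one training epoch at a time and stop once validation
    accuracy has not improved for `patience` consecutive epochs.

    train_step: callable running one pass over the training data.
    validate:   callable returning accuracy on the validation set.
    Returns (best validation accuracy, accuracy history).
    """
    best, since_best, history = 0.0, 0, []
    while since_best < patience:
        train_step()
        acc = validate()
        history.append(acc)
        if acc > best:
            best, since_best = acc, 0    # improvement: reset counter
        else:
            since_best += 1              # plateau: count toward stop
    return best, history
```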
Technicalities: For the data scientists among us: I wrote my code in Python and built my models using Keras (with the TensorFlow backend). I trained multiple networks with different architectures, settings and datasets. Usually I used about 30,000-35,000 images: roughly 80% (25,000) for the training set, 20% (5,000) for the validation set, and I kept about 1,000 images aside as a test set. The training time per model was approximately 8 hours on the CPU, but dropped to 30-60 minutes when I let Keras use my NVIDIA GPU. The global model architecture can be seen in part c) of the image at the bottom of this article.
Quality: All successful models were more than 95% accurate in classifying images. At first, however, I struggled with bias: my AI was a racist! In addition, the accuracy for the red cat was not high enough. It is nice to have a model that can distinguish the three cats, but it is much more important that it recognizes the red cat and does not unjustly scare my own cats away. I wanted to minimize the false positives for the red cat as much as possible.
The bias and the false positives were solved by using three neural networks in succession: the first checked whether there was a cat in the picture at all; the second and third checked whether that cat was the red one. Because all three networks had to agree, not one of my own cats was confused with the red intruder in my test set (0% false positives). The accuracy for the red cat did drop to 85%, but that was more than enough for my goal.
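The agreement rule can be sketched as a simple cascade; the three classifier callables here are placeholders for the trained models:

```python
def is_red_cat(image, cat_detector, red_check_a, red_check_b):
    """Return True only when all three classifiers agree.

    cat_detector:           is there a cat in the image at all?
    red_check_a/red_check_b: two independent red-cat classifiers.
    A single dissenting model vetoes the alarm, trading recall
    for near-zero false positives on the household's own cats.
    """
    if not cat_detector(image):
        return False
    return red_check_a(image) and red_check_b(image)
```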
A biased artificial intelligence
A machine learning model is trained for a specific task using a selection of training data. If a certain type of information is missing during training, the model will not handle it well in practice: it has bias. If a football player is never passed the ball on his left leg during practice, he will also struggle when this happens during a match.
My first model didn't just have a bias, it was a racist! It had been trained on images of three cats and was very capable of distinguishing them. But when it was presented with an empty image, an image of the robot vacuum cleaner or a human, it also classified these into one of the three cat categories. People and robot vacuum cleaners were therefore often incorrectly labeled as the red cat!
An unbiased data set (filled with all types of realistic practical situations) is essential for a properly functioning machine learning application.
5. Go-live
When we left the house in the morning, I turned on the setup. In my kitchen there was a laptop (later replaced by a Raspberry Pi) with a webcam and a speaker. The Motion package turned the webcam into a security camera, and a running script fed the captured images to my model whenever motion had occurred.
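One way to sketch that glue script, assuming Motion writes JPEG snapshots into a folder: scan the folder for files that have not been classified yet, and call this from a polling loop:

```python
import os

def scan_new_images(folder, seen):
    """Return paths of snapshots in `folder` that have not been
    handed to the classifier yet; `seen` is updated in place so
    each file is processed exactly once."""
    new = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".jpg") and name not in seen:
            seen.add(name)
            new.append(os.path.join(folder, name))
    return new
```

A loop along the lines of `while True: for path in scan_new_images(folder, seen): classify(path)`, with a short sleep between iterations, completes the picture.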
Real-world testing: The image of the moving object was classified by the neural networks. Only when all three models were in agreement did I deem the identification sufficient. Since my webcam took 6 frames per second, I did not mind that the red cat was recognized in only 85% of frames: a missed frame just meant the next one would catch him a fraction of a second later, and I thought it was more important not to scare my own cats away. After a week of testing, the red cat was identified within 2 seconds every time he entered the house. Not once was one of my own cats wrongly classified as the red cat. Test passed!
Expel mechanism: One thing was still missing: a way to expel the cat from my kitchen. So my girlfriend and I made a number of sound recordings of ourselves screaming and clapping our hands. When the red intruder was identified, the script randomly picked one of these recordings and played it as loud as possible. See item b) of the following figure.
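The trigger itself could be as simple as this sketch; the filenames are hypothetical, and aplay is just one example of a command-line audio player:

```python
import random
import subprocess

# Hypothetical filenames -- the actual recordings are the author's own.
SOUNDS = ["scream_1.wav", "scream_2.wav", "clap_1.wav"]

def scare(player="aplay", rng=random):
    """Pick one of the pre-recorded scare sounds at random and play it
    through a command-line audio player; pass player=None to skip the
    actual playback (useful for testing). Returns the chosen file."""
    sound = rng.choice(SOUNDS)
    if player:
        subprocess.run([player, sound], check=False)
    return sound
```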
For less than €70 of investment in new materials (webcam, Raspberry Pi, housings and cables) I built a system that was able to distinguish one specific cat from my own cats and the robot vacuum cleaner in real time. Once the intruder was identified, a chase mechanism was activated. For this I trained neural networks that classified the images from the webcam with deep learning techniques: an "artificial intelligence" that was able to recognize the red cat under varying circumstances. I did this on my own laptop and completely with open-source software.
Two days after the go-live of this setup, the red cat entered my kitchen. The following animation shows the scene and the actions taken by the automated script. First the model does not recognize the intruder as a cat (false negative). Then the ‘AI’ identifies the red cat and the furious screams of me and my girlfriend are played through the speakers. The red cat looks straight into the camera for a moment, looking for the source of this noise. Then it flees from the kitchen.
This process was repeated over the following days. The cat came in and was chased away by my neural network setup. Unfortunately, in all fairness I must admit, the success did not last. It took many evenings of work and thousands of images to teach this artificial intelligence to recognize the cat, but it took this smart cat only a week to learn that the setup did nothing more than make noise. I had used deep learning to train specialized neural networks that classify images and autonomously activate a cat-scaring mechanism. The cat had learned to ignore the result.
After a week, my ‘artificial intelligence’ was beaten by a cat… maybe this is for the best.