Douwe Osinga's Blog: Introducing Karakame

Wednesday, October 5, 2016

This week's project is a camera project: Karakame. Sorry, Android guys, iOS only. The app takes five pictures, three seconds apart. After adjusting for small movements of the camera, it picks, for each pixel position, the median value across the five images. The effect is that when you point it at a scene where people walk in and out, those people disappear from the combined picture.
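To make the median trick concrete, here is a minimal sketch in plain Python rather than the app's actual iOS/OpenCV code. The function name `median_stack` and the tiny grayscale "frames" are made up for illustration, and the frames are assumed to be aligned already.

```python
from statistics import median

def median_stack(frames):
    """Per-pixel median over equally sized grayscale frames (lists of rows)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]

# Five 1x4 frames of a static background (value 10). A "person" (value 200)
# walks through and covers a different pixel in each of the first four frames.
background = [10, 10, 10, 10]
frames = []
for i in range(5):
    row = list(background)
    if i < 4:
        row[i] = 200
    frames.append([row])

result = median_stack(frames)
print(result)
```

Every pixel sees the person in at most one of the five frames, so the median at each position is still the background value. That only holds as long as the background is visible in more than half of the frames at that position.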

It works reasonably well. The app is by no means a replacement for the main camera app, more a proof of concept. It seems like the sort of thing mainstream camera apps should add - if you have an app like that, you can get the source for this on GitHub. We were in Leipzig this weekend and I tried it out on a statue of Bach:
Bach in Leipzig

See? No people.

Karaoke famously means "Empty Orchestra" in Japanese - "hauntingly beautiful". Except that it doesn't quite. Kara means empty (see also Karate - empty hand), but the "oke" bit is just the first bit of the English word orchestra as borrowed into Japanese. So I called the app Karakame, from the almost Japanese for "empty camera".

Some notes on the implementation. The app uses OpenCV, which you can quite easily integrate into iOS these days. I extracted the interoperability code into an OpenCVBitmap class, so have a look if you're interested in that sort of thing. The image stabilization works really well. I normalize to the middle bitmap (i.e. the third one if you take five pictures). Stabilization shifts the images, so some of the border pixels will be missing from some of the pictures, but because we pick the median pixel value, most of the time we'll still have values from the other bitmaps.
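The border handling can be sketched like this, again in plain Python and not taken from the app. I'm assuming here that a pixel shifted out of view during alignment is represented as `None`; the helper name `median_ignoring_missing` and the `fallback` parameter are hypothetical. The alignment step itself is not shown.

```python
from statistics import median

def median_ignoring_missing(values, fallback=0):
    """Median of the samples that are present; None marks a missing sample."""
    present = [v for v in values if v is not None]
    # With an even number of samples, statistics.median averages the two
    # middle values, so the result may be a float.
    return median(present) if present else fallback

# A border pixel that fell outside the frame in two of the five pictures.
samples_at_border = [None, 12, 11, None, 13]
border_value = median_ignoring_missing(samples_at_border)
print(border_value)
```

Only when a position is missing from every picture is there nothing to vote with, which is where the fallback value comes in.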

I also experimented with object detection. OpenCV comes with a set of detectors called Haar cascades that can detect faces, cars and people - no deep learning needed. It works well for face detection, but for cars and people I didn't get a lot of good results. The idea was to leave pixels inside rectangles that were classified as cars or people out of the median voting, but I took that out again.
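For what it's worth, the "leave detected rectangles out of the vote" idea could look roughly like the sketch below. This is not the code I removed from the app. The rectangle format `(left, top, width, height)`, the function names and the fall-back behaviour are all assumptions for illustration, and the detection step is replaced by hand-written rectangles.

```python
from statistics import median

def inside(rect, x, y):
    """rect is (left, top, width, height) in pixel coordinates."""
    left, top, width, height = rect
    return left <= x < left + width and top <= y < top + height

def vote(frames, rects_per_frame, x, y):
    """Median at (x, y) over the frames where no detection covers that pixel."""
    votes = [
        frame[y][x]
        for frame, rects in zip(frames, rects_per_frame)
        if not any(inside(r, x, y) for r in rects)
    ]
    # If every frame is masked at this pixel, fall back to using all frames.
    if not votes:
        votes = [frame[y][x] for frame in frames]
    return median(votes)

# Three 1x2 grayscale frames; a "person" (200) stands on pixel 0 in two of them
# and a hypothetical detector has put a rectangle around them in those frames.
frames = [[[200, 10]], [[200, 10]], [[10, 10]]]
rects_per_frame = [[(0, 0, 1, 1)], [(0, 0, 1, 1)], []]

masked = vote(frames, rects_per_frame, 0, 0)
plain = median(frame[0][0] for frame in frames)
print(masked, plain)
```

Without the mask the person wins the vote at pixel 0 because they are there in the majority of frames; with the mask only the clean frame gets to vote. That is the case where detection would help, provided the detector is reliable.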

Finally, the median pixel implementation. Calculating medians in higher dimensions is expensive, so I decided to just calculate the median for the red, green and blue channels separately. This could lead to weird results, since the combined color need not occur in any of the pictures, but in my testing it seemed OK. I suppose I could do a little better by calculating the median for the three colors and, in the case where there is a disagreement, picking whichever pixel has the smallest distance to the other candidates.
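Here is a small sketch of both variants in plain Python, not the app's code. I'm reading "disagreement" as "the per-channel median is not a color that any of the pictures actually had at that position", which is one possible interpretation; the function names are made up.

```python
from math import dist
from statistics import median

def channel_median(pixels):
    """Median of R, G and B independently; may give a color no picture had."""
    return tuple(median(p[c] for p in pixels) for c in range(3))

def closest_real_pixel(pixels):
    """The observed pixel with the smallest total distance to all the others."""
    return min(pixels, key=lambda p: sum(dist(p, q) for q in pixels))

def median_pixel(pixels):
    """Per-channel median if it is an observed color, else the closest real pixel."""
    candidate = channel_median(pixels)
    return candidate if candidate in pixels else closest_real_pixel(pixels)

# The same position in five pictures: three slightly different views of a
# gray wall, plus someone in a red coat and someone in a blue coat.
pixels = [
    (100, 100, 100),
    (110, 90, 105),
    (90, 112, 95),
    (255, 0, 0),
    (0, 0, 255),
]

print(channel_median(pixels))
print(median_pixel(pixels))
```

In this example the per-channel median is (100, 90, 100), a color that appears in none of the five pictures, so the fallback picks the real pixel (100, 100, 100) instead. The fallback compares every pixel with every other one, which is fine for five pictures but would get slow for long bursts.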

If you have read this far, you're probably ready to get the project from GitHub: