

(C) Seewald Solutions, 1180 Wien, Austria. Commercial use prohibited.



Handwritten Digit Recognition

To have full control over preprocessing, I created my own dataset for handwritten digit recognition. The full preprocessing is described in a technical report (see the citation below). The digits were contributed by Austrian university students as part of my 2005 lecture AI Methods for Data Analysis, which has sadly been discontinued.

Please mail me if you want to set up something similar for your own lecture. I welcome contributions: with this approach, a single lecture of similar size would need around 5 years to collect a dataset as large as MNIST, and size seems to be the main determinant of error rate for SVM classifiers. Note that MNIST, because of its automated segmentation, has a segmentation error rate of around 1%, which makes quoted error rates below 1% quite hard to interpret.

The approach has the following advantages for you and your students.

Download

All files are in gzipped ARFF format (for WEKA). Please gunzip before use.
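For readers without the gunzip command-line tool, the decompression step can be sketched in Python using only the standard library. The file name digits-train.arff.gz is a placeholder and not the actual download name, and the helper functions gunzip and count_arff_instances are illustrative names rather than part of WEKA or of this dataset. The instance count is a rough sanity check that assumes one instance per line after the @data marker.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: str, dst: str) -> None:
    """Decompress a .gz file to dst, streaming so large files need little memory."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


def count_arff_instances(path: str) -> int:
    """Count non-empty, non-comment lines after the @data marker."""
    in_data = False
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            stripped = line.strip()
            if not in_data:
                # Everything before @data is the ARFF header.
                in_data = stripped.lower() == "@data"
                continue
            if stripped and not stripped.startswith("%"):
                count += 1
    return count


if __name__ == "__main__":
    # Placeholder file name (assumption); substitute the file you downloaded.
    src = Path("digits-train.arff.gz")
    if src.exists():
        gunzip(str(src), "digits-train.arff")
        print(count_arff_instances("digits-train.arff"))
```

The decompressed .arff file can then be opened in WEKA as usual.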

If you use this dataset, please cite: Seewald A.K.: Digits - A Dataset for Handwritten Digit Recognition. Technical Report, Austrian Research Institute for Artificial Intelligence, TR-2005-27, 2005. PDF

I also revisited some assumptions about machine learning in 2009 and found that state-of-the-art machine learning systems are just as brittle as their classical AI counterparts. Brittleness in this context means that their generalization performance on the whole task space (estimated by three distinct datasets) is very unsatisfactory: they are unable to recognize handwritten digits in general, and the models are very specific to each dataset. The empirically grounded argument is laid out in the paper below. The same may hold for machine learning more generally, although that would be extremely hard to prove.

Seewald A.K.: On the Brittleness of Handwritten Digit Recognition Models. Technical Report, Seewald Solutions, Wien, 2009. PDF
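The idea of estimating brittleness across datasets can be illustrated with a small cross-dataset evaluation: train a model on each dataset and measure its error on every dataset, including the ones it was not trained on. The sketch below is not the method from the report. It swaps in a toy nearest-centroid classifier in place of the SVMs discussed above, and the dataset names "A" and "B" and all data points are made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)
Predictor = Callable[[Sequence[float]], int]


def train_nearest_centroid(data: List[Sample]) -> Predictor:
    """Toy stand-in classifier (assumption): predict the class with the closest mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x: Sequence[float]) -> int:
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def error_rate(predict: Predictor, data: List[Sample]) -> float:
    """Fraction of samples in data that predict labels incorrectly."""
    wrong = sum(1 for x, y in data if predict(x) != y)
    return wrong / len(data)


def cross_dataset_errors(
    train_sets: Dict[str, List[Sample]],
    test_sets: Dict[str, List[Sample]],
) -> Dict[Tuple[str, str], float]:
    """Error rate for every (trained-on, tested-on) pair of datasets."""
    result = {}
    for train_name, train_data in train_sets.items():
        model = train_nearest_centroid(train_data)
        for test_name, test_data in test_sets.items():
            result[(train_name, test_name)] = error_rate(model, test_data)
    return result


# Made-up data: dataset "B" has the same two classes as "A", shifted along the first axis.
train_sets = {
    "A": [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([4.0, 0.0], 1), ([4.0, 1.0], 1)],
    "B": [([3.0, 0.0], 0), ([3.0, 1.0], 0), ([7.0, 0.0], 1), ([7.0, 1.0], 1)],
}
test_sets = {
    "A": [([0.0, 0.0], 0), ([4.0, 1.0], 1)],
    "B": [([3.0, 0.0], 0), ([7.0, 0.0], 1)],
}
errors = cross_dataset_errors(train_sets, test_sets)
```

In this toy setup the model trained on "A" makes no errors on the "A" test set but misclassifies half of the "B" test set, which is the kind of gap between within-dataset and cross-dataset error that the brittleness argument is about.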