HPL Isolated Handwritten Tamil Character Dataset

This dataset contains approximately 500 isolated samples of each of 156 Tamil “characters” (details), written by native Tamil writers, including school children, university graduates, and adults, from the cities of Bangalore, Karnataka, India and Salem, Tamil Nadu, India. The data was collected using HP TabletPCs and is stored in the standard UNIPEN format.
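For readers new to UNIPEN, the following Python sketch shows one way pen strokes might be read from such a file. It assumes that strokes are delimited by the UNIPEN keywords .PEN_DOWN and .PEN_UP, and that each coordinate line starts with an X and a Y value; the function name and the sample text are hypothetical and are not taken from the dataset files.

```python
def parse_unipen_strokes(text):
    """Return a list of strokes; each stroke is a list of (x, y) float pairs.

    Assumption: coordinate lines appear between .PEN_DOWN and .PEN_UP,
    with X and Y as the first two whitespace-separated fields.
    """
    strokes = []
    current = None  # stroke being collected, or None while the pen is up
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(".PEN_DOWN"):
            current = []
            strokes.append(current)
        elif line.startswith(".PEN_UP"):
            current = None
        elif line.startswith("."):
            # Other keywords (headers, comments, segment labels) are skipped.
            continue
        elif current is not None:
            fields = line.split()
            if len(fields) >= 2:
                current.append((float(fields[0]), float(fields[1])))
    return strokes


# Hypothetical sample text, for illustration only.
sample = """.COMMENT hypothetical sample
.PEN_DOWN
10 20
15 25
.PEN_UP
.PEN_DOWN
30 40
.PEN_UP
"""

strokes = parse_unipen_strokes(sample)
print(len(strokes), "strokes")
```

Header keywords in the real files may carry information (for example the coordinate layout) that this sketch ignores, so it is worth checking a file from the archive before relying on it.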

An offline version of the data is also available in the form of bi-level TIFF images, generated from the online data using simple piecewise linear interpolation with a constant thickening factor applied.
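The exact conversion procedure is not described beyond the sentence above. As a rough illustration of the general idea, the Python sketch below joins consecutive pen points with straight segments (piecewise linear interpolation) and stamps a fixed-size square at every interpolated point (a constant thickening factor). The function name, the thickness parameter, and the assumption that points are already integer pixel coordinates are all hypothetical; scaling and writing the TIFF file are left out.

```python
def rasterize_strokes(strokes, width, height, thickness=1):
    """Render strokes into a bi-level image given as a list of rows (0 or 1).

    strokes: list of strokes, each a list of (x, y) integer pixel coordinates.
    thickness: half-width of the square stamped around each interpolated point.
    """
    image = [[0] * width for _ in range(height)]

    def stamp(cx, cy):
        # Set a (2 * thickness + 1)-pixel square centred on (cx, cy), clipped
        # to the image bounds.
        for dy in range(-thickness, thickness + 1):
            for dx in range(-thickness, thickness + 1):
                x, y = cx + dx, cy + dy
                if 0 <= x < width and 0 <= y < height:
                    image[y][x] = 1

    for stroke in strokes:
        if len(stroke) == 1:
            stamp(*stroke[0])  # a single dot has no segment to interpolate
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            # Enough steps that successive points are at most one pixel apart.
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for i in range(steps + 1):
                t = i / steps
                stamp(round(x0 + t * (x1 - x0)), round(y0 + t * (y1 - y0)))
    return image


# Hypothetical example: one horizontal stroke on a 5 x 3 canvas.
for row in rasterize_strokes([[(0, 0), (4, 0)]], width=5, height=3, thickness=0):
    print("".join("#" if pixel else "." for pixel in row))
```

Each stroke is drawn separately, so pen-up movements between strokes leave no ink in the image.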

The data is available only for research use. Subsets of this dataset were used for the IWFHR 2006 Tamil Character Recognition Competition.

Downloads

Downloading the dataset implies that you have understood and accepted the terms of the license agreement.

hpl-tamil-iso-char

Complete dataset containing approximately 500 samples per character.

Online data, UNIPEN format, tar.gz file (Version 1.0, Released June 08, 2006, 45 MB)
Offline (image) data, Bi-level TIFF, tar.gz file (Version 1.0, Released June 08, 2006, 41 MB)

hpl-tamil-iso-char-train

Subset of approximately 300 samples per character, used as the training set for the IWFHR 2006 Online Tamil Handwritten Character Recognition Competition.

Online data, UNIPEN format, tar.gz file (Version 1.0, Released Feb 1, 2006, 30 MB)
Offline (image) data, Bi-level TIFF, tar.gz file (Version 1.0, Released Feb 1, 2006, 25 MB)

hpl-tamil-iso-char-test

Subset of approximately 170 samples per character (26926 samples in total), used as the test set for the IWFHR 2006 Online Tamil Handwritten Character Recognition Competition. The samples have been randomised across writers and classes, and are serially numbered from 00000 to 26925. Ground truth is available here.

Online data, UNIPEN format, tar.gz file (Version 1.0, Released May 04, 2006, 15.4 MB)
Offline (image) data, Bi-level TIFF, tar.gz file (Version 1.0, Released May 04, 2006, 12.6 MB)

Note: When these files are downloaded with Internet Explorer on Windows XP, the filename extension is incorrectly changed to ".tar.tar". Rename the downloaded file so that it ends in ".tar.gz" before extracting it.
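If several archives need fixing, the rename can be scripted. The Python sketch below is one possible way to do it; the function name is hypothetical, and the path passed to it should be wherever the download was actually saved.

```python
from pathlib import Path


def restore_tar_gz_name(path):
    """Rename a file ending in '.tar.tar' to end in '.tar.gz'.

    Returns the resulting path; files with any other ending are left as-is.
    """
    path = Path(path)
    wrong_suffix = ".tar.tar"
    if path.name.endswith(wrong_suffix):
        fixed = path.with_name(path.name[: -len(wrong_suffix)] + ".tar.gz")
        path.rename(fixed)
        return fixed
    return path
```

Once renamed, the archive can be opened with any tool that reads gzip-compressed tar files, such as Python's standard tarfile module.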
