Cursive Character Challenge (CCC)

This is the home page of the Cursive Character Challenge (C-Cube), a new benchmark for machine learning and pattern recognition algorithms.

Get Data



The database contains 57293 cursive characters manually extracted from cursive words, including both upper and lower case versions of each letter.
The database is split into:

- training set (38160 characters)
- test set (19133 characters)

 

Publications


Articles reporting results over the CCC benchmark


F. Camastra, M. Spinetti and A. Vinciarelli. "Offline Cursive Character Challenge: a New Benchmark for Machine Learning and Pattern Recognition Algorithms". Proceedings of the International Conference on Pattern Recognition (ICPR), 2006.

Data Format


The description of the vector files (extension .vec) is at the bottom of the page, below the description of the character files (extension .chr).


Description of the character file format (extension .chr)

As an example, we show three bitmaps (see below). Each character starts with five integers corresponding to the following quantities:

  1. Width (number of columns in the bitmap)
  2. Height (number of rows in the bitmap)
  3. Distance baseline-upperline (distance between baseline and upperline)
  4. Position upper extreme (distance of the upper extreme from the baseline, i.e. the line on which the word is written)
  5. Position lower extreme (distance of the lower extreme from the baseline)

After these five integers, the bitmap is written using the following convention:

  1. "0" for white pixels
  2. "1" for black pixels

The bitmaps are written row by row and contain Width columns and Height rows (see above).

The last row of the bitmap is followed by the ground truth, i.e. the class the bitmap belongs to.

The characters are written one after another, with no blank lines separating one character from the next (see below).


20 20 28 -19 0
00000001111110000000
00000011111111100000
00000111111111111000
00011111111111111100
00111111111111111111
01111011100011111111
11111000000001111110
11110000000000111110
11100000000000011111
11110000000000000111
11110000000000000011
11110000000000000011
11110000000000000111
11110000000000000111
11111110000000001111
11111111111101111111
01111111111111111110
00111111111111111100
00011011111111110000
00000000011100000000
o
33 26 54 -37 -12
000100000000000000000000000000000
001110000000000000000000000000000
001111000000000000000000000000000
111111000000000000000000000000000
111110000000000000000000000000000
111111000000000000000000000000000
011111000000000000000000011000000
011111000000000000000000111100000
111110000000000000000001111100000
111111000000000000000001111110000
011111000000000000000011111110000
011111000000000000000011111100000
111111000000000000001111111100000
111111000000000000011111111000000
111111000000000001111111111000000
111111000000000011111111111000000
111111000000000111111111111100000
111111100000011111111111111100000
111111111111111111110011111100000
011111111111111111100011111110000
001111111111111111000001111111000
000111111111111000000000111111100
000011111111010000000000111111111
000000000000000000000000011111111
000000000000000000000000001111111
000000000000000000000000000001111
u
18 43 54 -56 -14
000000000001111000
000000000011111000
000000000011110000
000000000111110000
000000001111110000
000000000111110000
000000000111110000
000000001111100000
000000011111100000
000000011111000000
000000011111000000
000000111111000000
000000111110000000
000000111110000000
000001111100000000
000001111100000000
000011111100000000
000011111000000000
000111111000000000
000111111000000000
000111110000000000
001111110000000000
000111111000000000
000111111000000000
000111111000000000
000111111000000000
000111111000000000
000111110000000000
001111110000000000
001111110000000000
001111110000000000
001111100000000000
111111100000000111
111111100001111111
111111111111111111
011111111111111111
011111111111111111
111111111111111100
111111111111110000
111111111110000000
111111000000000000
011110000000000000
001100000000000000
L
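The record layout described above (five integers, Height rows of Width pixels, then the label) can be read with a short parser. The following Python code is only a sketch under that reading of the format: the names CursiveChar and read_chr are hypothetical and are not part of the CCC distribution, and skipping stray blank lines is an assumption made for robustness.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class CursiveChar:
    """One record of a .chr file (hypothetical container, not part of CCC)."""
    width: int                  # number of columns in the bitmap
    height: int                 # number of rows in the bitmap
    baseline_to_upperline: int  # distance between baseline and upperline
    upper_extreme: int          # position of the upper extreme w.r.t. the baseline
    lower_extreme: int          # position of the lower extreme w.r.t. the baseline
    bitmap: List[List[int]]     # height rows of width pixels, 0 = white, 1 = black
    label: str                  # ground truth class


def read_chr(stream: TextIO) -> Iterator[CursiveChar]:
    """Yield the characters stored in an open .chr text stream."""
    lines = (line.strip() for line in stream)
    for header in lines:
        if not header:
            continue  # assumption: tolerate stray blank lines between records
        fields = header.split()
        if len(fields) != 5:
            raise ValueError(f"expected 5 header integers, got: {header!r}")
        width, height, base_up, upper, lower = map(int, fields)
        bitmap = []
        for _ in range(height):
            row = next(lines, None)
            if row is None or len(row) != width or set(row) - {"0", "1"}:
                raise ValueError(f"malformed bitmap row: {row!r}")
            bitmap.append([int(pixel) for pixel in row])
        label = next(lines, None)
        if not label:
            raise ValueError("missing label after bitmap")
        yield CursiveChar(width, height, base_up, upper, lower, bitmap, label)


# Tiny made-up record (not taken from the database), just to show the call.
sample = "3 2 5 -2 0\n010\n111\no\n"
for ch in read_chr(io.StringIO(sample)):
    print(ch.label, ch.width, ch.height)  # prints: o 3 2
```

For a real file, pass an open text file instead of the StringIO object, e.g. open("some_file.chr") with whatever file name the downloaded archive actually uses.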


Description of the vector file format (extension .vec)

The vector files have the following format:

The first line contains the vector dimension (34 for our data). Each following line contains, for one character, the 34 components (separated by single blanks) followed by the character label (separated from the last component by a single blank).

Below are the first vectors of the file test.vec (the line break always comes right after the label):

34
0.492188 0.058594 0 0.5 0.061013 0.46516 0.028676 0.47672 0 0.5 0 0.5 0.114704 0.478785 0.042709 0.558553 0.014643 0.506298 0.145821 0.565497 0.217206 0.513586 0.127517 0.592522 0.092739 0.517026 0.125076 0.549628 0.067114 0.503845 0.117755 0.549835 0.122636 0.536008 f
0 0.169118 0 0.5 0.104143 0.423875 0.075028 0.44925 0 0.5 0.113102 0.504082 0.238522 0.51348 0.166853 0.543632 0.12766 0.54592 0.157895 0.446195 0.097424 0.4385 0.006719 0.498125 0.053751 0.505957 0.12206 0.430894 0.031355 0.495312 0 0.5 0 0.5 f
0.432692 0.067308 0 0.5 0.059361 0.455547 0.020548 0.483564 0 0.5 0.118721 0.58998 0.187215 0.540695 0.147641 0.601309 0.112633 0.585089 0.108828 0.609092 0.172755 0.49628 0.104262 0.48486 0.13242 0.512003 0 0.5 0.064688 0.52058 0.151446 0.537297 0.064688 0.515998 f
0.378947 0.082456 0 0.5 0.022167 0.49449 0.062397 0.453885 0 0.5 0.130542 0.605603 0.187192 0.537299 0.168309 0.564711 0.108374 0.565149 0.098522 0.592561 0.139573 0.499713 0.038588 0.547971 0.133826 0.484528 0 0.5 0.062397 0.479501 0.116585 0.558689 0.100985 0.49206 f
0.358025 0.069959 0.086909 0.584098 0.115512 0.599831 0.167217 0.547775 0.115512 0.569966 0.178218 0.547025 0.185919 0.537472 0.216722 0.518732 0.19692 0.502536 0.011001 0.507278 0.028603 0.528223 0.070407 0.445766 0.047305 0.544132 0 0.5 0.023102 0.486382 0.083608 0.441774 0 0.5 f
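The vector format described above (one dimension line, then one line per character holding the components and the label) can be loaded along the following lines. This Python code is a minimal sketch: the function name read_vec is hypothetical, and the check that every line has exactly dimension + 1 fields is an assumption derived from the description, not something the page states is enforced.

```python
import io
from typing import List, TextIO, Tuple


def read_vec(stream: TextIO) -> Tuple[int, List[List[float]], List[str]]:
    """Read an open .vec text stream; return (dimension, vectors, labels)."""
    dim = int(stream.readline().strip())  # first line: vector dimension
    vectors: List[List[float]] = []
    labels: List[str] = []
    for line_no, line in enumerate(stream, start=2):
        fields = line.split()
        if not fields:
            continue  # assumption: ignore empty lines, e.g. at end of file
        if len(fields) != dim + 1:
            raise ValueError(
                f"line {line_no}: expected {dim} components and a label, "
                f"got {len(fields)} fields"
            )
        vectors.append([float(value) for value in fields[:dim]])
        labels.append(fields[dim])  # the label is the last field on the line
    return dim, vectors, labels


# Tiny made-up example with dimension 3 (the real files use dimension 34).
sample = "3\n0.5 0 0.25 f\n0 0.5 1 g\n"
dim, vectors, labels = read_vec(io.StringIO(sample))
print(dim, labels)  # prints: 3 ['f', 'g']
```

Keeping the labels in a separate list from the numeric components makes it easy to hand both to a classifier; for test.vec, pass an open text file instead of the StringIO object.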