Thursday, August 31, 2006

Announcing Tesseract OCR

An acquaintance who works on OCR (Optical Character Recognition) for a university library turned me on to Tesseract. She is really excited about it, and I think it is pretty neat as well.

Tesseract is a rather old HP project. Along the way, HP decided it did not really want to be in the OCR business. Tesseract is not a product, and in fact it is fairly far from being ready for prime time. However, the "engine" is unique and contains features that might serve other open source OCR projects. Conversely, features from other projects might be added to Tesseract. At least, those are the hopes.

While there are already a host of pretty good OCR products out there, both proprietary and free software, there is room for improvement in all of them. So any new technology released for this purpose is interesting and could change our user experience for the better.


