Optical Character Recognition

Optical Character Recognition (OCR) is the process by which software extracts machine-readable text from scanned documents and other images of text.

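OCR depends on both the scanning hardware and the software. The scanner has to capture a clean image at a sufficient resolution, and separate software, the “OCR engine,” performs the actual recognition on that image. Many scanners ship with bundled OCR software or advertise OCR support, but the recognition itself happens in software run on the captured image.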

A classic OCR engine works by splitting the scanned image into individual characters and comparing each one against a database of reference character images. For each candidate letter, the engine assigns a probability based on how closely the scanned character matches it, then picks the best match. As a result, OCR has the most trouble with characters that look alike, such as the capital letter ‘O’ and the number ‘0’, or a capital ‘S’ and the number ‘5’. More advanced engines also base their probabilities on the context in which a character appears: an ambiguous ‘O’/‘0’ sitting between two digits is more likely to be a zero.
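The match-and-score idea above can be sketched in a few lines of Python. This is a toy illustration built on loudly stated assumptions: the 5x4 bitmap templates, the pixel-agreement similarity, the softmax `sharpness` parameter and the digit-boosting `apply_context` rule are all hypothetical choices made for this example, not how any particular OCR engine works.

```python
import math

# Hypothetical 5x4 bitmap templates ('#' = ink, '.' = paper).
# Real engines use far richer features; this only illustrates the idea.
TEMPLATES = {
    "O": [".##.", "#..#", "#..#", "#..#", ".##."],
    "0": [".##.", "#..#", "#.##", "##.#", ".##."],  # slashed zero
    "S": [".###", "#...", ".##.", "...#", "###."],
    "5": ["####", "#...", "###.", "...#", "###."],
}


def similarity(glyph, template):
    """Fraction of pixels on which the scanned glyph and the template agree."""
    cells = [(g, t) for grow, trow in zip(glyph, template) for g, t in zip(grow, trow)]
    return sum(g == t for g, t in cells) / len(cells)


def char_probabilities(glyph, sharpness=10.0):
    """Turn similarity scores into one probability per candidate (softmax)."""
    scores = {ch: math.exp(sharpness * similarity(glyph, t)) for ch, t in TEMPLATES.items()}
    total = sum(scores.values())
    return {ch: s / total for ch, s in scores.items()}


def apply_context(probs, neighbors, boost=5.0):
    """Hypothetical context rule: if all neighbors are digits, favor digits."""
    if neighbors and all(c.isdigit() for c in neighbors):
        probs = {ch: p * (boost if ch.isdigit() else 1.0) for ch, p in probs.items()}
        total = sum(probs.values())
        probs = {ch: p / total for ch, p in probs.items()}
    return probs


def recognize(glyph, neighbors=""):
    """Pick the most probable character, optionally using neighboring characters."""
    probs = apply_context(char_probabilities(glyph), neighbors)
    return max(probs, key=probs.get)


# A round glyph with one stray pixel in its bottom-left corner.
smudged = [".##.", "#..#", "#..#", "#..#", "###."]
print(recognize(smudged))                   # O
print(recognize(smudged, neighbors="12"))   # 0
```

On its own, the smudged glyph agrees with the ‘O’ template on 19 of 20 pixels and with the slashed ‘0’ on 17, so it is read as ‘O’. When both neighbors are digits, the context boost outweighs that small gap and the same shape is read as ‘0’, which is the kind of context-based correction described above.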

OCR is most effective on uniform, computer-printed type. Even the best OCR engines can only promise roughly 90-98% accuracy on the kind of text most commonly seen in invoices, receipts, and similar documents.
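One common way to attach a number like “95% accuracy” to OCR output is character accuracy: one minus the character error rate, where the error rate is the edit (Levenshtein) distance between the true text and the OCR output, divided by the length of the true text. The sketch below assumes that definition; vendors may measure accuracy differently (for example per word or per field), so treat it as illustrative. Under this definition, 95% accuracy on a 1,000-character receipt still means roughly 50 wrong, missing, or extra characters.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """Assumed metric: 1 - edit distance / reference length.

    Can drop below zero if the OCR output contains many spurious characters.
    """
    if not reference:
        raise ValueError("reference text must be non-empty")
    return 1.0 - levenshtein(reference, ocr_output) / len(reference)


truth = "TOTAL 5.00"
scanned = "T0TAL S.00"  # 'O' read as '0', '5' read as 'S'
print(f"{character_accuracy(truth, scanned):.0%}")  # 80%
```

Two look-alike substitutions in a 10-character field already cost 20 percentage points, which is why the ‘O’/‘0’ and ‘S’/‘5’ confusions matter so much on short, number-heavy documents like receipts.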

It’s not a stretch to say the world runs on OCR technology: without it, most business automation would grind to a halt. OCR enables key business processes such as indexing scanned documents and storing their contents as searchable data.

