
Adventures in OCR: You, Too, Can Get Good Results

Once upon a time, the thought of using a scanner to convert the printed word to text that you could manipulate and share made many a brave soul shudder. Though Optical Character Recognition (OCR) sounded good in theory, until recently the amount of cleanup that OCR'd text required simply wasn't worth the potential time savings. It often seemed easier to retype something than to OCR it.

Time and technology have changed everything

Why should OCR be the exception to the rule? OCR is now a practical and time-saving way to convert large amounts of printed text (and images and tables) to a computer-friendly file you can edit at will. If you start with good OCR software, such as that integrated into the Scan Directory available with many of HP's Scanjet scanners, take the time to prepare your source documents carefully, and keep an eye on a few details during the OCR process, your OCR work should be a success. You'll save time and end up with content in a format you can work with.

Start with a good source

The better your source document, the better your OCR results. And while not every document you scan has to be pristine, the following tips will help you prepare your documents for a more successful scan.

Quality: Always start with a high-quality original. Tears, wrinkles, and smudges can confuse the OCR software and lead to errors in the final output. Touch up a dirty original with correction fluid, or make a photocopy to improve its contrast.
Simplicity: OCR software generally prefers large amounts of clean text with no columns, rules, text boxes, or other layout elements.
Parameters: Scanning text from a page with multiple columns is simple if you handle each column as a separate component. OCR software generally lets you designate specific areas of a page to process. If you define each column as a separate text area, the OCR process will start with the first column, then move to the second, and so on.
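
The column-by-column approach described above can be sketched in a few lines of Python. This is only an illustration, not how any particular OCR package works: the Zone class, its field names, and the reading_order function are hypothetical, and the sketch assumes a left-to-right language with columns that run the full height of the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular text area on the page (hypothetical; units are inches)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float


def reading_order(zones):
    """Return zones sorted left to right, then top to bottom.

    Assumes a left-to-right script and simple side-by-side columns.
    """
    return sorted(zones, key=lambda z: (z.left, z.top))


if __name__ == "__main__":
    # Two columns on one page, deliberately listed out of order.
    page = [
        Zone("second column", 4.3, 1.0, 7.8, 10.5),
        Zone("first column", 0.5, 1.0, 4.0, 10.5),
    ]
    for zone in reading_order(page):
        print(zone.name)
```

Sorting on the left edge first is what makes the first column come out before the second, whatever order the areas were drawn in.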

Tips for better execution

Once you are ready to scan your documents, the following suggestions should help improve the results:

Verify your scanner settings. Be sure your scanner is not using the dithering or halftone settings. While these settings can improve the quality of photographic scans, they make it difficult for the OCR software to process text.
Consider your paper colour. If you are scanning text printed on coloured paper, increase the brightness and contrast by about 10%.
Increase the resolution. You will get better results from a 200dpi image than from a 100dpi image, and a 600dpi image is better still. However, before you ratchet up the dpi settings on your scanner, remember that high-resolution scans take up quite a bit of space on your hard drive. Be sure to balance your resolution settings against your available space.
Double-check your language. Be sure that your OCR software is set to process in the proper language. Most OCR software supports a variety of languages, so make sure that your language selection and the language you are trying to OCR match.
Make use of trial and error. If you're going to be scanning a large or long document, scan the first page on its own and process it all the way to the final output text. This gives you a chance to find and address any errors or deficiencies before you commit to the whole document.
Work with the right equipment. The faster your computer processor and scanner, the less time you'll spend waiting. You'll need at least 64MB of RAM (though 128MB is better) for basic functions such as differentiating images from text, identifying characters, and translating a document's layout into electronic form. In addition, if you need to scan many pages at once, an automatic document feeder (ADF) might be a good investment.
Read the manual. The more you know about any tool, the more success you will have using it. Before you start scanning and OCRing, take the time to read the manuals and readme files for both your scanner and your software. The 30 minutes you devote to familiarizing yourself with the OCR tools at your disposal can save you hours of time and make you significantly more productive.
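
To get a feel for the trade-off between resolution and disk space mentioned in the tips above, the rough calculation below estimates the uncompressed size of a single scanned page. It is a back-of-the-envelope sketch under stated assumptions: the function name scan_size_bytes is made up for this example, the page is assumed to be A4 (about 8.27 by 11.69 inches), and real scan files are usually compressed and therefore smaller.

```python
def scan_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the uncompressed size of a scan in bytes.

    width_in, height_in: page size in inches
    dpi: scan resolution in dots per inch
    bits_per_pixel: 1 for black and white, 8 for greyscale, 24 for colour
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Assumed A4 page size in inches.
    a4_width, a4_height = 8.27, 11.69
    for dpi in (100, 200, 600):
        size = scan_size_bytes(a4_width, a4_height, dpi, 8)
        print(f"{dpi} dpi greyscale: about {size / 1_000_000:.1f} MB uncompressed")
```

Because the pixel count grows with the square of the resolution, tripling the setting from 200dpi to 600dpi makes the uncompressed scan about nine times larger.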

© 2007 Hewlett-Packard Development Company, L.P.