Miscellaneous Ramblings

OCR Software for the Mac

A 'Best of Miscellaneous Ramblings' Column

Charles Moore - 2001.11.27

Optical Character Recognition (OCR) software enables your Mac to interpret and convert text characters rendered in a graphics format such as TIFF into editable text.

If you have a scanner, a limited-functionality "lite" version of some OCR program probably came bundled with it, but for serious OCR work you will likely want full-featured OCR software.

Like speech recognition dictation software, good OCR software is either pretty amazing or frustratingly annoying, depending upon your expectations and proofreading skills. The better OCR programs may be up to 99 percent accurate, given reasonably clear images to interpret, but even at that level of accuracy, a 2000 word article is going to have 20 bloopers that need correction, and you can't necessarily depend on a spellchecker to catch them.
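
To put that arithmetic in concrete terms, here is a minimal Python sketch. It assumes the accuracy figure is measured per word; the function name and the figure of roughly 5 characters per word are illustrative assumptions, not anything an OCR vendor publishes.

```python
def expected_errors(unit_count, accuracy):
    """Expected number of misrecognized units at a given accuracy (0..1)."""
    return unit_count * (1.0 - accuracy)

words = 2000

# Per-word accuracy of 99 percent, as in the example above.
word_errors = expected_errors(words, 0.99)

# If the 99 percent were measured per character instead (assuming roughly
# 5 characters per word), the same article would contain more mistakes.
char_errors = expected_errors(words * 5, 0.99)

print(round(word_errors))  # 20
print(round(char_errors))  # 100
```

Either way, the count grows in direct proportion to the length of the document, which is why proofreading matters more on long scanning jobs.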

That caveat notwithstanding, OCR software can be a great boon when you need to transcribe faxes, letters, or published articles into text that you can edit and paste into word-processing documents or database files.

The traditional big names in Mac OCR software have been Caere (now ScanSoft) OmniPage Pro and Xerox (also now ScanSoft) TextBridge. Unfortunately, TextBridge for the Mac has been discontinued, and so has development of the Mac version of OmniPage Pro, although version 8.01 (1999 vintage) is still available.

However, a couple of newcomer Mac OCR programs from England and Russia are now available to fill the slot more or less abdicated by OmniPage and TextBridge. Here's what's currently available in Mac OCR software:

OmniPage Pro 8.01

I don't do a lot of OCR work myself, having realized the paperless office paradigm to a substantial degree, but I do have an old (1996 or 1997) copy of OmniPage Pro 7 that works quite well when I need it, and I don't doubt that version 8.01 is better yet.

OmniPage Pro can handle multiple columns, inset text boxes, color art, and fonts in multiple sizes, colors, and styles.

OmniPage Pro 8.01 features:

  • Over 99% accurate on laser quality documents using standard fonts: OmniPage Pro 8.01 features improved ability to OCR "trouble" documents such as skewed or crooked pages, faded and degraded faxes, poor-quality photocopies, extremely small text, reversed-out type (white text on a dark background), multiple language documents, and more. With version 8.0 you can straighten crooked pages up to 10 degrees for better OCR results.
  • Retains color images: OmniPage Pro 8.01 supports the ability to retain color images in the recognized document, whether they're loaded or scanned (using most major color scanners). This capability makes OCR useful for projects that include color graphics. Color images appear in the Thumbnail Viewer, Image Viewer and Text Viewer, and they are maintained in the final document at either 75 or 150 dots per inch.
  • Maintains original page layouts: OmniPage Pro's exclusive True Page technology lets you maintain your document's original formatting, including columns, graphics, tables, and font attributes (size, bold, underline, italics, etc.).
  • OCR Proofreader: Automatically checks your OCR results within OmniPage Pro and displays the original image for comparison. Like a spell checker, it highlights suspicious words and suggests corrections.
  • Easy to learn. Easy to use: OmniPage Pro's AutoOCR Toolbar steps you through the OCR process with the push of a single button. Customize OCR settings directly from the Toolbar to maximize accuracy.
  • Maximum zoning flexibility: OmniPage Pro identifies regions of text and graphics on a page to improve recognition and formatting. You can create zones automatically or manually and modify zones created by OmniPage Pro. You can add, delete, move, combine and split zones, or zone one part of the page, and let OmniPage Pro do the rest. You can even draw irregularly shaped zones.
  • Define styles: OmniPage Pro lets you create personally defined document styles that can be used for current and future OCR jobs. You can specify font, size, bold, italics, margins, and other elements to save any recognized document in a format of your choosing. You can transform any document into a common look regardless of the original formatting. OmniPage Pro also offers several predefined style sets for commonly used formats like memos and magazine articles.
  • Output HTML: You can save documents directly to HTML documents while preserving fonts, formats, and graphics. OmniPage Pro 8.01 opens and saves to a wide variety of file formats.

Accuracy features:

  • 3D OCR uses grayscale data from your scanner to accurately recognize hard-to-read characters, particularly on colored pages.
  • Language Analyst uses dictionary and linguistic information to maximize accuracy.

Other features:

  • Thumbnail Viewer shows miniature versions of pages so you can easily manage multipage documents.
  • Schedule OCR to run any time of the day, even when you're not at your desk.
  • Comprehensive language support automatically detects and recognizes 11 Western languages - Danish, Dutch, English (US and UK), Finnish, French, German, Italian, Norwegian, Portuguese (Lisbon and Brazilian), Spanish, and Swedish - even multiple languages on a single page.
  • WYSIWYG text editor allows you to compare your originals against your recognized text in side-by-side, zoomable, resizable windows. Correct and edit results before saving to your word processor.
  • Thumbnail representations of individual pages make managing multipage OCR jobs easier. Bars representing the status of the page (i.e., "zoned" and "recognized") are placed below each thumbnail image. The thumbnails are drag-and-drop movable.
  • Smart Windows intelligently adjust interface elements based on the user's working environment. Windows are automatically sized to match the user's screen. And the floating palettes reposition themselves so they never sit in the active work area.
  • OmniPage Guide provides onscreen assistance using Apple Guide technology.
  • Direct Input allows users working in an application to activate OmniPage Pro from the Apple menu, perform OCR on an image, and automatically place the resulting text into the application. OmniPage Pro does not have to be opened separately.
  • Double-sided recognition allows users to quickly scan a stack of double-sided pages using an automatic document feeder (ADF). Double-sided recognition first scans the front side of a stack of pages and then the back side. OmniPage Pro automatically places the pages in the correct order.
  • Output format support: Added support for AppleWorks allows the user to save files in a format accessible by AppleWorks.
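
The double-sided recognition feature listed above amounts to interleaving two scan passes. A minimal Python sketch of that reordering follows. It assumes the stack is flipped over as a whole for the second pass, so the back sides arrive in reverse order; the function name is hypothetical, and this is not OmniPage Pro's actual implementation.

```python
def interleave_duplex(fronts, backs, backs_reversed=True):
    """Merge a front-side pass and a back-side pass into reading order.

    fronts: scans of pages 1, 3, 5, ... in the order they were scanned.
    backs:  scans of the back sides from the second pass.
    backs_reversed: True if the stack was flipped as a whole, so the
    last sheet's back side was scanned first (an assumption here).
    """
    if len(fronts) != len(backs):
        raise ValueError("both passes must contain the same number of sheets")
    if backs_reversed:
        backs = list(reversed(backs))
    ordered = []
    for front, back in zip(fronts, backs):
        ordered.extend([front, back])
    return ordered

# Three sheets: fronts scanned as 1, 3, 5; backs arrive as 6, 4, 2.
pages = interleave_duplex(["p1", "p3", "p5"], ["p6", "p4", "p2"])
print(pages)  # ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

If a particular document feeder delivers the back sides in forward order, passing backs_reversed=False skips the reversal.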

System Requirements

  • Power Macintosh or greater (will not run on 68K Macs)
  • System 7.5 or later
  • 10 MB free RAM
  • 25 MB free hard disk space
  • 640 x 480 pixel resolution or better

And now for the bad news. The price for OmniPage Pro Mac 8.01 Full Version, despite the fact that there has been no new development of the application for three years, is a suck-in-your-breath $499.99.

Its frozen development and high price notwithstanding, OmniPage Pro is, in my experience, a very satisfactory product.

OmniPage Pro 7.0

However, there are less expensive OCR alternatives.

One is, interestingly, OmniPage Pro 7.0, which is still available occasionally as remaindered stock. MegaMacs.com is currently offering new, shrink-wrapped copies of OmniPage Pro 7.0 for a very friendly $29.99.

OmniPage 7.0 includes Thumbnail Views that display small graphic representations of documents as they are scanned in, and a MacPaint-like "eraser" tool, used to eliminate unwanted noise that may appear on a scanned image. Erasing the noise speeds the OCR process.

OmniPage Pro 7.0 also provides several customizable features, such as Smart Windows, which intelligently resizes the tool palettes you are using to suit the size of your monitor. Open palettes automatically reposition themselves so none overlap. Tool and zone palettes float above the scanned image window, making it easier to access commonly used OCR tools.

You can manually select portions or "zones" of the page rather than having to submit the whole page to the OCR process. You are also now able to proofread the recognized text against the scanned image before inputting the finished text into the word processor. While this does not mean higher initial recognition accuracy, it speeds the overall OCR process by reducing the time spent on cleaning up misrecognized text.

You can also define document styles that can be used to transform scanned text into differently formatted documents. Font size, margins and typeface controls are included. Style formats can be saved and used for future OCR jobs.

Readiris Pro 6.0

The U.K. software developer I.R.I.S. also offers Mac OCR software. Like OmniPage Pro, Readiris Pro 6 for Mac OS converts printed documents such as letters, faxes, magazine articles, and newspaper columns into editable text files with a very high rate of recognition accuracy.

Readiris Pro 6 features:

  • Simple and easy-to-use toolbar. Scan and recognize at the touch of a button. Change the OCR language and formatting options with a few clicks. Each function is easily identified by a tooltip.
  • Reads all types of fonts, in any size and any style. Readiris Pro also recognizes faxes, dot matrix printouts, and complex documents. Text on a colour background and text blocks printed in colour are also recognized.
  • Readiris Pro automatically processes complex formats and recreates word processor output that maintains the original layout. Columns, tables, and graphics are saved in your text result.
  • Click the language icon to select one of the 55 available language entries, based on the Latin, Greek, and Cyrillic alphabets. Need to read Spanish, Portuguese, French, or Russian? Readiris Pro is your solution. Even mixed alphabets are no problem.
  • OCR speed of up to 1,000 characters per second (on a G4-based machine)
  • Opens documents saved in JPEG, recognizes documents scanned in colour, including text on colour backgrounds, and saves pictures in colour.
  • Autoformat technology recreates the original document layout, including graphics and tables. Outputs to word processors and spreadsheets. Generates ASCII, RTF, and HTML output. Sends the result directly to any applications such as Word, Excel, AppleWorks, and ClarisWorks.
  • Automatic table recognition detects and reads tables; recreates table objects in Excel, in Word, and in RTF files. Improved recognition of "ungridded" tables
  • Auto button - single-click OCR for automatic processing.
  • Supports all the scanners that use the Photoshop plug-ins technology.
  • Graphic input
    • colour, greyscale and black-and-white images in FlashPix, GIF, JPEG, MacPaint, Photoshop, PICT, PNG, QuickDraw GX, QuickTime, Silicon Graphics, Targa, TIFF, Windows Bitmaps (BMP)
    • allows direct recognition of fax files
    • "drag and drop" of images onto the Readiris image zone
  • Text output
    • All leading Macintosh word processor applications and text formats including ASCII, RTF, and HTML
    • direct export of OCR result to any applications
    • paragraph detection ensures word wrap
  • Table output
    • generic table format, including Excel
    • direct export of OCR result to spreadsheet and word processor
  • Graphic output
    • black-and-white, greyscale and colour graphics in the output.
    • graphics included in text file when "autoformatting" is applied
  • Fonts
    • virtually any proportional and fixed ("monospace") typeface
    • normal, bold, italic and underlined typestyles
    • detection and recreation of font type, typestyle and point size of original document in "autoformatting" mode
    • character sizes 6 to 72 point (0.08 to 1" or 0.21 to 2.54 cm)
    • drop letter ("drop caps") recognition
  • Character sets
    • all American and European character sets, including the Central-European, Cyrillic ("Russian"), and Greek alphabets
    • use of mixed character sets for recognition of "Western" words (proper names, brand names etc.) in Cyrillic ("Russian") and Greek documents
    • numeric mode for recognition of tables and figures
    • trainable on any special symbol: mathematic and scientific symbols, dingbats etc.
  • Zoning
    • automatic page analysis discriminates text zones, graphics, gridded and ungridded tables
    • adjustable manual windowing of relevant zones
    • storage of zoning templates for future use
  • Verification and learning tools - easy correction of mistakes, possibility to train the system on new symbols.
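
As a quick check on the character-size range listed under Fonts above, this short Python sketch converts points to inches and centimetres. It assumes the desktop-publishing convention of 72 points per inch; the helper name is illustrative only.

```python
POINTS_PER_INCH = 72   # desktop-publishing (PostScript) point
CM_PER_INCH = 2.54

def point_size(points):
    """Return (inches, centimetres) for a type size given in points."""
    inches = points / POINTS_PER_INCH
    return inches, inches * CM_PER_INCH

for pts in (6, 72):
    inches, cm = point_size(pts)
    print(f"{pts} pt = {inches:.2f} in = {cm:.2f} cm")
```

Under that assumption, the 6 to 72 point range works out to roughly 0.08 to 1 inch, or 0.21 to 2.54 cm.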

System Requirements

  • A Mac OS Computer
  • 25 MB free disk space
  • 32 MB free RAM
  • Mac OS 8.5 with QuickTime 4.0 installed

All this at a very reasonable $79.99.

FineReader 5 Pro for Mac

ABBYY FineReader 5 Pro for Mac is an OCR program from Russia with a high level of word accuracy and format retention, even when converting complex pages and poor-quality documents, and it claims to be the most Mac-friendly OCR on the market.

FineReader Pro 5, codeveloped by Sound & Vision, is designed from the ground up as a Macintosh application with features that fully leverage the strengths of the Mac platform. The software's controls, including toolbars, icons and dialog boxes, are designed to work seamlessly with the Mac OS Appearance Manager.

The application uses Apple Speech to provide a voice read-back tool that helps users proofread recognition results. FineReader also takes advantage of Mac OS technologies such as QuickTime, Drag-and-Drop, and Navigation Services. In addition, the program supports AppleScript, so users can run the application from scripts written to automate repetitive tasks, such as recognizing fax files as soon as they are received.

FineReader 5 Pro for Mac is priced at $129 for a competitive upgrade from any OCR software, including products and versions bundled free with scanners.

FineReader 5 Pro for Mac features:

  • Excellent recognition quality: ABBYY's new IPA (Integrity, Purposefulness and Adaptability) technology is incorporated into FineReader, enabling it to provide top-notch recognition quality and to overcome all kinds of print defects. Even low-quality documents (dot matrix printouts, typewritten texts, photocopies, faxes, etc.) are recognized with a remarkable degree of accuracy.
  • Full retention of source document layout: Improved document layout analysis means that the complete layout of any source document can be retained, including columns, tables, pictures, fonts, and font sizes.
  • Fast Internet publishing: Convert your documents into web pages. FineReader retains the original document layout, including pictures and tables, and exports it in HTML format.
  • Save your documents in PDF format: FineReader supports the following types of PDF format:
    • text over the image
    • text under the image
    • text and pictures only
    FineReader can also replace uncertain characters with their corresponding PDF images.
  • A Mac-like user-friendly interface: Thumbnails in the Batch window allow you to quickly identify and select page images, with the selected images subsequently appearing in the Image window in color.
  • Click on the Scan&Read button
  • The Scan&Read Assistant makes OCR easy: the assistant guides you through the OCR process, allowing you to select the OCR settings of your choice, and to save and automate repetitive tasks.
  • Batch Document Support provides you with the tools you need to work with multipage documents. Processes such as "read", "rotate", "locate blocks", "despeckle", and "save" can all be applied universally, with control maintained by means of thumbnail diagnostic icons. You can even add your own comments to a page. Results can be saved to file or exported to the word processing application of your choice (AppleWorks, MS Word, MS Excel, or SimpleText).
  • A spellcheck system with an ergonomic interface highlights any text containing uncertain characters, shows a list of suggested words, and zooms in on the relevant image area.
  • Image processing: FineReader supports a wide range of image formats, including TIFF and PICT. Images originating from fax modems and other sources can be saved in either of these two formats before being recognized by FineReader.
  • Color images
  • AppleScript Interface support: FineReader can be run using scripts, without any need for the keyboard or mouse, and tasks such as the detection and recognition of fax files can also be automated. FineReader is both scriptable and recordable; not only does it respond to Apple events, it also allows you to write your own scripts by recording events as they occur.
  • Recognition Languages: FineReader supports 117 recognition languages and can spellcheck 23 languages. Even multilingual documents can be recognized (e.g. English and Latin in the case of medical documents).
  • FineReader works with all scanners via the TWAIN standard, SilverFast drivers, or the PowerPC Adobe Photoshop Import plug-in. For a complete list of compatible scanners, see the ABBYY website.

You can download a trial version of FineReader, which works like the fully functional version for 30 OCR sessions. After the trial period ends, the application switches to a demo mode that cannot save the recognition results.

System requirements:

  • iMac, iBook, PowerBook, PowerMac, PowerPC compatible computer, G3 or higher processor recommended
  • Mac OS 8.6 or later, Mac OS X (Classic Environment only)
  • 32 MB RAM (64 MB recommended). With only 32 MB of RAM, 80 MB of virtual memory is required
  • Free hard disk space: 80 MB for installation and 20 MB for system functioning
  • 100% TWAIN-compatible scanner, digital camera, or fax modem, or a scanner accessible via PowerPC Adobe Photoshop Import plug-ins
  • CD-ROM drive

Supported Image Input Formats

  • PICT: b/w, gray, color
  • PCX, DCX: b/w, gray, color
  • JPEG: gray, color
  • PNG: b/w, gray, color
  • TIFF: b/w, gray, color, multipage.

Methods of TIFF compression:

  • Unpacked
  • CCITT Group 3
  • CCITT Group 3 FAX (2D)
  • CCITT Group 4
  • PackBits
  • JPEG

Document Saving Formats

  • RTF (MS Word 98 or higher, AppleWorks 5.0 or higher)
  • MS Excel
  • SimpleText
  • HTML
  • PDF
  • DBF
  • CSV
  • Text (Unicode, ANSI, MAC)

You need to have a TWAIN-compatible scanner to get your images. If you do not have a scanner, you can still use FineReader to recognize images in TIFF, PCX, JPEG, or PICT formats (you may have got these images by fax, for example).


I haven't used either Readiris Pro or FineReader Pro 5, but reports from readers indicate that either of these programs equals or surpasses OmniPage Pro in both features and performance for a fraction of the price.
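
To make "a fraction of the price" concrete, this small Python sketch compares the prices quoted in this column against OmniPage Pro 8.01. The FineReader figure is the competitive-upgrade price, and the dictionary layout and function name are just for illustration.

```python
BASELINE = 499.99  # OmniPage Pro 8.01 full version, as quoted above

prices = {
    "OmniPage Pro 7.0 (remaindered)": 29.99,
    "Readiris Pro 6": 79.99,
    "FineReader 5 Pro (competitive upgrade)": 129.00,
}

def share_of_baseline(price, baseline=BASELINE):
    """Price expressed as a percentage of the baseline price."""
    return 100.0 * price / baseline

for name, price in prices.items():
    pct = share_of_baseline(price)
    print(f"{name}: ${price:.2f} is about {pct:.0f}% of ${BASELINE:.2f}")
```

On those figures, Readiris Pro costs roughly a sixth of what OmniPage Pro 8.01 does, and the FineReader upgrade roughly a quarter.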

Charles Moore has been a freelance journalist since 1987 and began writing for Mac websites in May 1998. His The Road Warrior column was a regular feature on MacOpinion, he is news editor at Applelinks.com and a columnist at MacPrices.net. If you find his articles helpful, please consider making a donation to his tip jar.
