Optical Character Recognition (OCR) software enables your Mac to
interpret and convert text characters rendered in a graphics format
such as TIFF into editable text.
If you have a scanner, a limited-functionality "lite" version of
some OCR program probably came bundled with it, but for serious OCR
work you will likely want full-featured OCR software.
Like speech recognition dictation software, good OCR software is
either pretty amazing or frustratingly annoying, depending upon
your expectations and proofreading skills. The better OCR programs
may be up to 99 percent accurate, given reasonably clear images to
interpret, but even at that level of accuracy, a 2000 word article
is going to have 20 bloopers that need correction, and you can't
necessarily depend on a spellchecker to catch them.
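The arithmetic behind that figure is easy to check. The sketch below is illustrative only: it assumes the quoted accuracy applies per word, as the 20-error figure above implies, and the function names are made up for this example. If a vendor's figure is really per character, the expected number of damaged words is higher, which the second function estimates under the simplifying assumption that character errors are independent.

```python
def expected_errors(units: int, accuracy: float) -> float:
    """Expected number of misrecognized units (words or characters)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return units * (1.0 - accuracy)


def word_accuracy_from_char_accuracy(char_accuracy: float,
                                     chars_per_word: float) -> float:
    """Chance that every character in a word is read correctly,
    assuming character errors are independent (a simplification)."""
    return char_accuracy ** chars_per_word


if __name__ == "__main__":
    words = 2000
    # Per-word reading of "99 percent accurate"
    print(round(expected_errors(words, 0.99)))
    # Per-character reading, assuming 5 characters per word on average
    word_acc = word_accuracy_from_char_accuracy(0.99, 5)
    print(round(expected_errors(words, word_acc)))
```

Either way, the point stands: a small error rate still leaves a meaningful amount of proofreading.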
That caveat notwithstanding, OCR software can be a great boon
when you need to transcribe faxes, letters, or published articles
into text that you can edit and paste into word-processing
documents or database files.
The traditional big names in Mac OCR software have been Caere
(now ScanSoft) OmniPage Pro and Xerox (also now ScanSoft)
TextBridge. Unfortunately, TextBridge for the Mac has been
discontinued, and so has development of the Mac version of OmniPage
Pro, although version 8.01 (1999 vintage) is still available.
However, a couple of newcomer Mac OCR programs from England and
Russia are now available to fill the slot more or less abdicated by
OmniPage and TextBridge. Here's what's currently available in Mac
OCR software:
OmniPage Pro 8.01
I don't do a lot of OCR work myself, having realized the
paperless office paradigm to a substantial degree, but I do have an
old (1996 or 1997) copy of OmniPage Pro 7 that works quite well
when I need it, and I don't doubt that version 8.01 is better yet.
OmniPage
Pro can handle multiple columns, inset text boxes, color art,
and fonts in multiple sizes, colors, and styles.
OmniPage Pro 8.01 features:
- Over 99% accurate on laser quality documents using standard
fonts: OmniPage Pro 8.01 features improved ability to OCR "trouble"
documents such as skewed or crooked pages, faded and degraded
faxes, poor-quality photocopies, extremely small text, reversed-out
type (white text on a dark background), multiple language
documents, and more. With version 8.0 you can straighten crooked
pages up to 10 degrees for better OCR results.
- Retains color images: OmniPage Pro 8.01 supports the ability to
retain color images in the recognized document, whether they're
loaded or scanned (using most major color scanners). This
capability makes OCR useful for projects that include color
graphics. Color images appear in the Thumbnail Viewer, Image Viewer
and Text Viewer, and they are maintained in the final document at
either 75 or 150 dots per inch.
- Maintains original page layouts: OmniPage Pro's exclusive True
Page technology lets you maintain your document's original
formatting, including columns, graphics, tables, and font
attributes (size, bold, underline, italics, etc.).
- OCR Proofreader: Automatically checks your OCR results within
OmniPage Pro and displays the original image for comparison. Like a
spell checker, it highlights suspicious words and suggests
corrections.
- Easy to learn. Easy to use: OmniPage Pro's AutoOCR Toolbar
steps you through the OCR process with the push of a single button.
Customize OCR settings directly from the Toolbar to maximize
accuracy.
- Maximum zoning flexibility: OmniPage Pro identifies regions of
text and graphics on a page to improve recognition and formatting.
You can create zones automatically or manually and modify zones
created by OmniPage Pro. You can add, delete, move, combine and
split zones, or zone one part of the page, and let OmniPage Pro do
the rest. You can even draw irregularly shaped zones.
- Define styles: OmniPage Pro lets you create personally defined
document styles that can be used for current and future OCR jobs.
You can specify font, size, bold, italics, margins, and other
elements to save any recognized document in a format of your
choosing. You can transform any document into a common look
regardless of the original formatting. OmniPage Pro also offers
several predefined style sets for commonly used formats like memos
and magazine articles.
- Output HTML: You can save documents directly to HTML documents
while preserving fonts, formats, and graphics. OmniPage Pro 8.01
opens and saves to a wide variety of file formats.
Accuracy features:
- 3D OCR uses grayscale data from your scanner to accurately
recognize hard-to-read characters, particularly on colored
pages.
- Language Analyst uses dictionary and linguistic information to
maximize accuracy.
Other features:
- Thumbnail Viewer shows miniature versions of pages so you can
easily manage multipage documents.
- Schedule OCR to run any time of the day, even when you're not
at your desk.
- Comprehensive language support automatically detects and
recognizes 11 Western languages - Danish, Dutch, English (US and
UK), Finnish, French, German, Italian, Norwegian, Portuguese
(Lisbon and Brazilian), Spanish, and Swedish - even multiple
languages on a single page.
- WYSIWYG text editor allows you to compare your originals
against your recognized text in side-by-side, zoomable, resizable
windows. Correct and edit results before saving to your word
processor.
- Thumbnail representations of individual pages make managing
multipage OCR jobs easier. Bars representing the status of the page
(i.e., "zoned" and "recognized") are placed below each thumbnail
image. The thumbnails are drag-and-drop movable.
- Smart Windows intelligently adjust interface elements based on
the user's working environment. Windows are automatically sized to
match the user's screen. And the floating palettes reposition
themselves so they never sit in the active work area.
- OmniPage Guide provides onscreen assistance using Apple Guide
technology.
- Direct Input allows users working in an application to activate
OmniPage Pro from the Apple menu, perform OCR on an image, and
automatically place the resulting text into the application.
OmniPage Pro does not have to be opened separately.
- Double-sided recognition allows users to quickly scan a stack
of double-sided pages using an automatic document feeder (ADF).
Double-sided recognition first scans the front side of a stack of
pages and then the back side. OmniPage Pro automatically places the
pages in the correct order.
- Output format support: Added support for AppleWorks allows the
user to save files in a format accessible by AppleWorks.
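
The page reordering behind the double-sided recognition feature listed above can be sketched in a few lines. This is not OmniPage Pro's code, just an illustration of the idea. It assumes the stack is flipped over as a whole before the second pass, so the back sides arrive in reverse order, and the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave_duplex(fronts: Sequence[T], backs: Sequence[T]) -> List[T]:
    """Merge a front-side pass and a back-side pass into reading order.

    Assumes the whole stack was flipped for the second pass, so backs[0]
    is the back of the LAST sheet.
    """
    if len(fronts) != len(backs):
        raise ValueError("both passes must contain the same number of sheets")
    ordered: List[T] = []
    for front, back in zip(fronts, reversed(backs)):
        ordered.append(front)
        ordered.append(back)
    return ordered


if __name__ == "__main__":
    fronts = ["p1", "p3", "p5"]  # first pass through the feeder
    backs = ["p6", "p4", "p2"]   # second pass, stack flipped over
    print(interleave_duplex(fronts, backs))
```

If a feeder delivers the back sides in the same order as the fronts, the `reversed()` call would simply be dropped.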
System Requirements
- Power Macintosh or greater (will not run on 68K Macs)
- System 7.5 or later
- 10 MB free RAM
- 25 MB free hard disk space
- 640 x 480 pixel resolution or better
And now for the bad news. The price for OmniPage Pro Mac 8.01
Full Version, despite the fact that there has been no new
development of the application for three years, is a
suck-in-your-breath $499.99.
Although it is frozen in development and expensive, OmniPage Pro
is, in my experience, a very satisfactory product.
OmniPage Pro 7.0
However, there are less expensive OCR alternatives.
One is, interestingly, OmniPage Pro 7.0, which is still
available occasionally as remaindered stock. MegaMacs.com is currently offering
new, shrink-wrapped copies of OmniPage Pro 7.0 for a very friendly
$29.99.
OmniPage 7.0 includes Thumbnail Views that display small graphic
representations of documents as they are scanned in, and a
MacPaint-like "eraser" tool, used to eliminate unwanted noise that
may appear on a scanned image. Erasing the noise speeds the OCR
process.
OmniPage Pro 7.0 also provides several customizable features,
such as Smart Windows, which intelligently resizes the tool
palettes you are using to suit the size of your monitor.
Open palettes automatically reposition themselves so none overlap.
Tool and zone palettes float above the scanned image window, making
it easier to access commonly used OCR tools.
You can manually select portions or "zones" of the page rather
than having to submit the whole page to the OCR process. You are
also now able to proofread the recognized text against the scanned
image before inputting the finished text into the word processor.
While this does not mean higher initial recognition accuracy, it
speeds the overall OCR process by reducing the time spent on
cleaning up misrecognized text.
You can also define document styles that can be used to
transform scanned text into differently formatted documents. Font
size, margins and typeface controls are included. Style formats can
be saved and used for future OCR jobs.
Readiris Pro 6.0
The U.K. software developer I.R.I.S.
also offers Mac OCR software. Like OmniPage Pro, Readiris Pro 6 for
Mac OS converts printed documents such as
letters, faxes, magazine articles, columns in a newspaper, etc.
into editable text files with a very high rate of recognition
accuracy.
Readiris Pro 6 features:
- Simple and easy-to-use toolbar. Scan and recognize at the touch
of a button. Change the OCR language and formatting options with a
few clicks. Each function is easily identified by a tooltip.
- Reads all types of fonts, in any size, in any style. Readiris
Pro also recognizes faxes, dot matrix printouts, and complex
documents. Texts on a colour background and text blocks printed in
colour are also recognized.
- Readiris Pro automatically processes complex formats and
recreates word processor output that maintains the original layout.
Columns, tables, and graphics are saved in your text result.
- Click the language icon to select one of the 55 available
language entries, based on Latin, Greek, and Cyrillic alphabets.
Need to read Spanish, Portuguese, French, or Russian? Readiris
Pro is your solution. Even mixed alphabets are no problem.
- OCR speed of up to 1,000 characters per second (on a G4-based
machine)
- Opens documents saved in JPEG, recognizes documents scanned in
colour, including text on colour backgrounds, and saves pictures in
colour.
- Autoformat technology recreates the original document layout,
including graphics and tables. Outputs to word processors and
spreadsheets. Generates ASCII, RTF, and HTML output. Sends the
result directly to applications such as Word, Excel,
AppleWorks, and ClarisWorks.
- Automatic table recognition detects and reads tables; recreates
table objects in Excel, in Word, and in RTF files. Improved
recognition of "ungridded" tables
- Auto button - single-click OCR for automatic processing.
- Supports all the scanners that use the Photoshop plug-ins
technology.
- Graphic input
  - colour, greyscale and black-and-white images in FlashPix, GIF,
    JPEG, MacPaint, Photoshop, PICT, PNG, QuickDraw GX, QuickTime,
    Silicon Graphics, Targa, TIFF, Windows Bitmaps (BMP)
  - allows direct recognition of fax files
  - "drag and drop" of images onto the Readiris image zone
- Text output
  - all leading Macintosh word processor applications and text
    formats including ASCII, RTF, and HTML
  - direct export of OCR result to any application
  - paragraph detection ensures word wrap
- Table output
  - generic table format, including Excel
  - direct export of OCR result to spreadsheet and word
    processor
- Graphic output
  - black-and-white, greyscale and colour graphics in the
    output
  - graphics included in text file when "autoformatting" is
    applied
- Fonts
  - virtually any proportional and fixed ("monospace")
    typeface
  - normal, bold, italic and underlined typestyles
  - detection and recreation of font type, typestyle and point size
    of original document in "autoformatting" mode
  - character sizes 6 to 72 point (0.08 to 1" or 0.21 to 2.54
    cm)
  - drop letter ("drop caps") recognition
- Character sets
  - all American and European character sets, including the
    Central-European, Cyrillic ("Russian"), and Greek alphabets
  - use of mixed character sets for recognition of "Western" words
    (proper names, brand names etc.) in Cyrillic ("Russian") and Greek
    documents
  - numeric mode for recognition of tables and figures
  - trainable on any special symbol: mathematical and scientific
    symbols, dingbats etc.
- Zoning
  - automatic page analysis discriminates text zones, graphics,
    gridded and ungridded tables
  - adjustable manual windowing of relevant zones
  - storage of zoning templates for future use
- Verification and learning tools - easy correction of mistakes,
possibility to train the system on new symbols.
System Requirements
- A Mac OS Computer
- 25 MB free disk space
- 32 MB free RAM
- Mac OS 8.5 with QuickTime 4.0 installed
All this at a very reasonable $79.99.
FineReader 5 Pro for Mac
ABBYY FineReader 5 Pro for Mac is an
OCR program from Russia with a high level of word accuracy and
format retention - even when converting
complex pages and poor quality documents - and claims to be the
most Mac-friendly OCR on the market.
FineReader Pro 5, codeveloped by Sound & Vision, is designed
from the ground up as a Macintosh application with features that
fully leverage the strengths of the Mac platform. The software's
controls, including toolbars, icons and dialog boxes, are designed
to work seamlessly with the Mac OS Appearance Manager.
The application utilizes Apple Speech to enable a
voice-read-back tool that helps users to easily proofread
recognition results. FineReader also takes advantage of Mac OS
technologies such as QuickTime, Drag-and-Drop, and Navigation
Services. In addition, the program supports AppleScript, enabling
users to run the application from scripts, which can be written to
automate repetitive tasks, such as recognizing fax files as soon as
they are received.
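
As a rough illustration of that kind of automation, here is a simple polling loop, written in Python rather than AppleScript. It does not use FineReader's actual scripting commands, which are not described here: the `recognize` callback, the folder name, and the assumption that incoming faxes are saved as TIFF files are all placeholders.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set

FAX_EXTENSIONS = {".tif", ".tiff"}  # assumption: faxes arrive as TIFF files


def find_new_faxes(folder: Path, seen: Set[Path]) -> List[Path]:
    """Return fax files in `folder` that have not been handled yet."""
    return [
        p for p in sorted(folder.iterdir())
        if p.is_file() and p.suffix.lower() in FAX_EXTENSIONS and p not in seen
    ]


def watch(folder: Path, recognize: Callable[[Path], object],
          interval: float = 10.0, max_polls: Optional[int] = None) -> None:
    """Poll `folder` and pass each new fax file to `recognize`.

    `recognize` is a hypothetical hand-off to whatever OCR program is
    in use. `max_polls=None` means poll until interrupted.
    """
    seen: Set[Path] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_faxes(folder, seen):
            recognize(path)
            seen.add(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)


if __name__ == "__main__":
    folder = Path("incoming_faxes")  # hypothetical folder name
    if folder.is_dir():
        watch(folder, print, max_polls=1)
```

A scriptable OCR program would take the place of `print` here, receiving each new file as it shows up.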
FineReader 5 Pro for Mac is priced at $129 for a competitive
upgrade from any OCR software, including products and versions
bundled free with scanners.
FineReader 5 Pro for Mac features:
- Excellent recognition quality: ABBYY's new IPA (Integrity,
Purposefulness and Adaptability) technology is incorporated into
FineReader, enabling it to provide top notch recognition quality,
and to overcome all kinds of print defects. Even low quality
documents (dot matrix printouts, typewritten texts, photocopies,
faxes etc.) are recognized with a remarkable degree of
accuracy.
- Full retention of source document layout: Improved document
layout analysis means that the complete layout of any source
document can be retained, including columns, tables, pictures,
fonts, and font sizes.
- Fast Internet publishing: Convert your documents into web
pages. FineReader retains the original document layout, including
pictures and tables, and exports it in HTML format.
- Save your documents in PDF format: FineReader supports the
following types of PDF format:
  - text over the image
  - text under the image
  - text and pictures only
  FineReader can also replace uncertain characters with their
  corresponding PDF images.
- A Mac-like user-friendly interface: Thumbnails in the Batch
window allow you to quickly identify and select page images, with
the selected images subsequently appearing in the Image window in
color.
- The Scan&Read Assistant makes OCR easy: click on the Scan&Read
button and the assistant guides you through the OCR process,
allowing you to select the OCR settings of your choice, and to
save and automate repetitive tasks.
- Batch Document Support provides you with the tools you need to
work with multipage documents. Processes such as "read", "rotate",
"locate blocks", "despeckle", and "save" can all be applied
universally, with control maintained by means of thumbnail
diagnostic icons. You can even add your own comments to a page.
Results can be saved to file, or exported to the word processing
application of your choice (AppleWorks, MS Word, MS Excel, or
SimpleText).
- A spellcheck system with an ergonomic interface highlights any
text containing uncertain characters, shows a list of suggested
words, and zooms in on the relevant image area.
- Image processing: FineReader supports a wide range of image
formats, including TIFF and PICT. Images originating from fax
modems and other sources can be saved in either of these two
formats before being recognized by FineReader.
- Color images
- AppleScript Interface support: FineReader can be run using
scripts, without any need for the keyboard or mouse, and tasks such
as the detection and recognition of fax files can also be
automated. FineReader is both scriptable and recordable; not only
does it respond to Apple events, it also allows you to write your
own scripts by recording events as they occur.
- Recognition Languages: FineReader supports 117 recognition
languages and can spellcheck 23 languages. Even multilingual
documents can be recognized (e.g. English and Latin in the case of
medical documents).
- FineReader works with all scanners via the TWAIN standard,
SilverFast drivers, or the PowerPC Adobe Photoshop Import plug-in.
For a complete list of compatible scanners, see the ABBYY website.
You can download a trial version of FineReader that works like
the fully functional version for 30 OCR sessions. After the trial
period ends, the application switches to a demo mode that cannot
save the recognition results.
System requirements:
- iMac, iBook, PowerBook, PowerMac, PowerPC compatible computer,
G3 or higher processor recommended
- Mac OS 8.6 or later, Mac OS X (Classic Environment only)
- 32 MB RAM (64 MB recommended). With 32 MB of RAM, 80 MB of
virtual memory is required
- Free hard disk space: 80 MB for installation and 20 MB for
system functioning
- 100% TWAIN-compatible scanner, digital camera or fax-modem or
scanner accessible via PowerPC Adobe Photoshop Import Plug-Ins
- CD-ROM drive
Supported Image Input Formats
- PICT: b/w, gray, color
- PCX, DCX: b/w, gray, color
- JPEG: gray, color
- PNG: b/w, gray, color
- TIFF: b/w, gray, color, multipage.
Methods of TIFF compression:
- Unpacked
- CCITT Group 3
- CCITT Group 3 FAX (2D)
- CCITT Group 4
- PackBits
- JPEG
Document Saving Formats
- RTF (MS Word 98 or higher, AppleWorks 5.0 or higher)
- MS Excel
- SimpleText
- HTML
- PDF
- DBF
- CSV
- Text (Unicode, ANSI, MAC)
You need to have a TWAIN-compatible scanner to get your images.
If you do not have a scanner, you can still use FineReader to
recognize images in TIFF, PCX, JPEG, or PICT formats (you may have
got these images by fax, for example).
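
If you are unsure what format a received image file is really in, its first few bytes usually tell you. The sketch below is a generic illustration, not part of FineReader. It checks only the well-known TIFF, JPEG, and PNG signatures, leaves out PCX and PICT because they lack an equally distinctive signature, and uses function names that are made up for this example.

```python
from pathlib import Path
from typing import Optional

# Well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"II*\x00", "TIFF"),           # little-endian TIFF
    (b"MM\x00*", "TIFF"),           # big-endian TIFF
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Return the format name matching the leading bytes, or None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: Path) -> Optional[str]:
    """Read the first 8 bytes of a file and identify its format."""
    with path.open("rb") as f:
        return sniff_format(f.read(8))


if __name__ == "__main__":
    print(sniff_format(b"II*\x00\x08\x00\x00\x00"))
    print(sniff_format(b"not an image"))
```

A `None` result only means the file is not one of the three formats checked; it may still be a valid PCX or PICT image.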
I haven't used either Readiris Pro or FineReader Pro 5, but
reports from readers indicate that either of these programs equals
or surpasses OmniPage Pro in both features and performance for a
fraction of the price.