Scanning and OCR

Post by GOSA » Mon, 2007-09-10 08:42

Peter Moss posted the following to SALGO. Posted here with permission.


Scanners have been with us for many years and, just like PCs,
remain a mystery to some. They are cheap, and many people could
not do without one once they know how to use it correctly.


A scanner is not difficult to use but, like everything, there
are certain rules that apply depending on the end result you
want. I am only going to discuss scanning an article so that it
can be posted to SALGO.


First, the principle of garbage in, garbage out has never been
more true.


To be able to recognise characters, all OCR (optical character
recognition) software requires a crisp, clean scan at a
resolution of 300DPI or better. Almost every scanner sold comes
with an OCR package. Some are better than others but all are
usable.


First align the article as closely as possible so that the
writing is not slanted. Yes, you can correct it later, but you
must remember to do so, as a slanted character does not look the
same as an upright one and makes recognition more difficult.
Some packages will automatically correct this, but unless you
know for sure, align it properly first. You can use the preview
function for this.
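
If the preview shows a slant, its size can be estimated from two
points picked along one line of text. The following is only a
sketch: the function name and the pixel coordinates are made up
for illustration and do not come from any scanner package.

```python
import math


def skew_angle_degrees(x1, y1, x2, y2):
    """Angle of the line through two points on a text baseline, in degrees.

    The coordinates are hypothetical pixel positions read off the
    preview image, with x growing to the right and y growing downwards.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


# A baseline that drops 35 pixels over a width of 2000 pixels
# is slanted by roughly one degree.
print(skew_angle_degrees(0, 0, 2000, 35))
```

A result close to zero means the page is straight enough. Anything
larger suggests re-seating the page on the glass, or rotating the
scan by that amount in the opposite direction in an editor.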


Next make sure that the contrast and brightness are set
correctly. All packages have default settings but in many cases
these can be improved. You need to scan as either Black and
White or Greyscale. A test OCR of even a small section of a
large document can be done to show any improvement.
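
For readers who like to see what these controls do to the
numbers, here is a minimal sketch of brightness, contrast and
Black and White conversion on greyscale values from 0 (black) to
255 (white). The formula is a common simple one and is an
assumption; actual scanner software may calculate it differently.

```python
def adjust(pixel, brightness=0, contrast=1.0):
    """Apply a simple linear brightness/contrast change to one grey value."""
    # Contrast stretches or squeezes the distance from mid-grey (128);
    # brightness then shifts the result up or down.
    value = (pixel - 128) * contrast + 128 + brightness
    # Clamp to the valid 0-255 range.
    return max(0, min(255, round(value)))


def to_black_and_white(pixels, threshold=128):
    """Turn a greyscale image (list of rows) into pure black (0) and white (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in pixels]


row = [40, 120, 135, 220]
softened = [adjust(p, brightness=20, contrast=0.8) for p in row]
print(softened)
print(to_black_and_white([softened]))
```

Trying a few settings on a small test area, as suggested above,
is the practical way to find values that suit a given original.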


Look at the scan. Does it have a large number of characters
that run into each other and are not separate? Reduce contrast.
Is the scan lacking in white, so that the paper appears grey?
Increase brightness. Are there a large number of speckles and
dots? Reduce contrast and check again.


Dirty marks and small stray speckles can be removed with the
editor program. Superfluous lines and junk can be removed with
the eraser, and large rectangular areas can be removed with the
marquee tool and cut.


For example, you may not want that photo included with the
article. Cut it out.
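
The two clean-up jobs above, removing stray speckles and cutting
out a rectangle, can be sketched on a tiny Black and White image
held as a list of rows (1 = black ink, 0 = white paper). This is
an illustration only, with made-up function names. A graphic
editor does the same thing by hand and with far more control.

```python
def remove_speckles(image):
    """Return a copy with isolated black pixels (no black neighbours) cleared."""
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_neighbour = False
            # Look at the eight surrounding pixels.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and image[ny][nx] == 1:
                        has_neighbour = True
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


def erase_rectangle(image, top, left, bottom, right):
    """Return a copy with rows top..bottom and columns left..right set to white."""
    cleaned = [row[:] for row in image]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            cleaned[y][x] = 0
    return cleaned


page = [
    [1, 0, 0, 0],  # lone speckle in the corner
    [0, 0, 0, 0],
    [0, 0, 1, 1],  # part of a character, kept
    [0, 0, 0, 0],
]
print(remove_speckles(page))
```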


Use the ZOOM function to get to areas that would normally be too
small to correct.


If a large number of characters are misrecognised, either you
did not do a good job or the original has problems. Try to
identify what the problems are. It may be a problem of
resolution, in which case increase the scan resolution to
600DPI. This is my standard scan resolution. Note that 300DPI
is the absolute minimum required for a scan. Once the text has
been recognised and checked as correct, the huge file created by
the scan can be deleted, but hard disk space is so cheap these
days that you may want to keep the original if it is a scan of a
legal document.
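
To get a feel for how big that file is, the uncompressed size of
a scan can be estimated from the page size, the resolution and
the colour depth. This is a rough sketch under the assumption of
no compression at all; real files saved as TIFF, PNG or JPEG
will usually be smaller.

```python
def scan_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the uncompressed size of a scan in bytes.

    width_in and height_in are the page size in inches.
    bits_per_pixel is 1 for Black and White, 8 for Greyscale
    and 24 for full colour.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# An A4 page is about 8.27 x 11.69 inches.
for dpi in (300, 600):
    size = scan_size_bytes(8.27, 11.69, dpi, 8)
    print(dpi, "DPI greyscale:", round(size / 1_000_000, 1), "MB")
```

Doubling the resolution from 300DPI to 600DPI doubles both the
width and the height in pixels, so the file is four times the
size.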


Nothing beats a good graphic editor for corrections, such as
Adobe Photoshop or one of the often free editors like The Gimp.
All are complex multi-layer editors that take some getting used
to. Even attempts to stop OCRing, like red text on a black
background, can be "fixed" by removing one of the colours.
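
One way to get that effect, shown here purely as a sketch, is to
keep only the red channel of each pixel and then invert it, so
that red text on black becomes black text on white. The function
name is made up, and a graphic editor offers the same thing
through its channel or colour tools.

```python
def red_channel_to_grey(pixels, invert=True):
    """Convert rows of (red, green, blue) pixels to grey using only the red value.

    Pure red text (255, 0, 0) on a black background (0, 0, 0) becomes
    white on black, and with invert=True, black on white.
    """
    result = []
    for row in pixels:
        grey_row = []
        for red, _green, _blue in row:
            grey_row.append(255 - red if invert else red)
        result.append(grey_row)
    return result


# One row: a red "text" pixel followed by a black "background" pixel.
print(red_channel_to_grey([[(255, 0, 0), (0, 0, 0)]]))
```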


Do not just save as type PDF from the scanner menu. The OCR
software will not even be used and the scans are saved as
graphics. This is the same as creating a WORD doc and just
inserting the scanned page as a graphic.


Some OCR packages can save as type PDF in the correct
character-recognised format. This option will be available only
from the OCR package and not from the scanner menu. If you are
not sure, open the result with Acrobat Reader and choose the
text selection tool. Now try to highlight some text. If all
you get is a dotted line, or nothing at all, with no text being
highlighted, it is a graphic.
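
The same question can be asked of the file itself with a very
crude check. A PDF that contains real text normally refers to at
least one font, while a PDF that only wraps scanned images
usually does not. The sketch below looks for the marker "/Font"
in the raw bytes. It is a heuristic and an assumption on my
part, not a proper PDF parser: newer PDFs can store this
information compressed, in which case the check will wrongly
report a graphic. The Acrobat Reader test above remains the
reliable one.

```python
def looks_like_text_pdf(data):
    """Crude check: True if the raw PDF bytes mention a font resource."""
    return b"/Font" in data


# Usage with a real file (the filename is only an example):
#
#     with open("article.pdf", "rb") as handle:
#         print(looks_like_text_pdf(handle.read()))
```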


Yes, you can print it, but this result is not going to help
anyone who wants to post it to a list server like SALGO. The
printout will be just as poor as viewing it on the PC screen.


Adobe Acrobat and Distiller are not cheap packages. There are,
however, shareware programs that can do a fine job of converting
the output of a document editor to PDF.


With a bit of practice you can become very proficient at
converting pages of documentation. Imagine how much that will
help SALGO and your kids with their school projects.


Your PC is only a tool that can be used for a lot more than
playing games.....


Was this of any help to anyone?


Peter
--
Peter Moss


After one hundred and fifty years and many thousands of
firearms control laws introduced throughout the world to
reduce crime and the supply of guns to criminals. The
list of successes should be long and illustrious.
Where is the list?

Post by GOSA » Mon, 2007-09-10 08:42

From: "Custos Te Nosce" <legitimate.defense@>
Date: Sun, 9 Sep 2007 14:14:11 +0100
Subject: Re: [SALGO] Scanners


FineReader is one of the best OCR packages out there. You can
download a free trial at
http://www.abbyy.com/finereader8/?param=45596


CutePDF Writer enables you to "print" to PDF. You need to
install both of the packages below, and both are free.
http://www.cutepdf.com/download/CuteWriter.exe 1,6 MiB
http://download.cutepdf.com/download/converter.exe 5MiB


Foxit Reader is a fast alternative to Adobe Reader, and if you buy a
serial number, you can use it to modify PDF documents.
http://downloads.foxitsoftware.com/foxi ... _setup.exe 2,1MiB


There are many programs that (claim to) do the same, but these do it right.