Optical character recognition: A guide to making your job easier

As a plan reviewer, you may be able to work with several sets at a time — but you only have one pair of eyes to find the specific page you need. This makes the process challenging, no matter how efficient an organizational system you have. You're used to having paper plan sets and the storage space required to house them. It's not easy to keep track of it all, much less keep it sorted properly.

What if all of those plan sets were made digital? You wouldn't have to spend so much on storage space. You could sort and search the documents far more easily. And, perhaps best of all, you could collaborate on the same digital plan set in real time with other plan reviewers.

Optical character recognition (OCR) turns those paper plan sets into digital, editable documents. When you mark up a digitized set, everyone can see the changes you make as you're making them.

What is optical character recognition?

OCR is a technology that turns the text in a physical document or scanned PDF file into digital, machine-readable text that you can edit in another program, such as Microsoft Word or Google Docs. With OCR capabilities, you can scan something like a paper document or plan set and turn the written text into something a computer can read.

How OCR works

In principle, an OCR system is simple: a scanner captures the physical page as a picture, and OCR software turns the text in that picture into digital text. The software first converts the scan into a black-and-white image, with all the text in black on a white background. It then uses one of two methods, pattern recognition or feature detection, to look for individual letters, numbers and symbols. Finally, it extracts the recognized characters into a machine-readable digital document that can be edited.
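The black-and-white step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular OCR product works: the tiny grayscale grid, the `binarize` helper and the fixed threshold of 128 are all assumptions made for the example.

```python
# Minimal binarization sketch. Grayscale values run from 0 (black) to 255 (white).
# The threshold of 128 is an illustrative assumption, not a recommended setting.

def binarize(gray_rows, threshold=128):
    """Return a grid of 1 (ink) and 0 (background) from rows of grayscale pixels."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]

# A made-up 3x4 scan: dark pixels form a rough vertical stroke.
scan = [
    [250, 245, 30, 240],
    [248, 25, 20, 251],
    [252, 247, 35, 249],
]

binary = binarize(scan)
for row in binary:
    print("".join("#" if cell else "." for cell in row))
```

Real scans usually need a threshold that adapts to lighting and paper color, which is why a single fixed value is only a starting point.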

The challenge an OCR program aims to overcome is reading a wide variety of fonts and text styles while still processing them accurately. This is true for printed text but is even more of an issue with handwritten text, where letter shapes vary from one writer to the next. In this instance, the OCR software needs to analyze the handwriting for recurring patterns in how each character is formed.

Another issue is separating the text from noise. Physical documents are rarely perfect and can contain dust, creases or other imperfections. OCR software needs to differentiate between the text and other artifacts to produce an accurate document.
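One simple way to picture noise removal is to erase ink pixels that stand completely alone, since a speck of dust rarely connects to anything. The sketch below assumes the page is already a grid of 1s (ink) and 0s (background); the `remove_specks` helper and the sample page are hypothetical, and real software uses more robust filters.

```python
# Minimal despeckling sketch: drop ink pixels that have no ink neighbors.
# Assumes a binary grid where 1 is ink and 0 is background.

def remove_specks(binary):
    """Return a copy of the grid with isolated ink pixels cleared."""
    rows, cols = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1:
                continue
            # Count ink in the up-to-8 surrounding cells, skipping the pixel itself.
            neighbors = sum(
                binary[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            )
            if neighbors == 0:
                cleaned[r][c] = 0
    return cleaned

# A made-up page: one dust speck at row 1 and a two-pixel stroke on the right.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]

cleaned_page = remove_specks(page)
for row in cleaned_page:
    print("".join("#" if cell else "." for cell in row))
```

The speck disappears while the stroke survives, because each stroke pixel has at least one ink neighbor.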

OCR software recognizes characters in one of two ways: pattern recognition or feature detection. Each works differently, though both have the same goal of making text editable.

Pattern recognition works by comparing each character on the scanned page with character images stored in the software. The software has a library of letters, numbers and symbols it recognizes and matches them to the original document. Whenever it finds a match, it creates that text in editable form. The OCR software typically includes a wide range of fonts and formats so it can better match the text to its digital counterpart.
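To make the matching idea concrete, here is a minimal sketch of pattern recognition on 3x3 characters. The three-letter `TEMPLATES` library, the `match_score` and `recognize` helpers, and the sample glyph are all invented for the example; a real library holds far more characters at far higher resolution.

```python
# Minimal pattern-recognition sketch: compare a scanned glyph with stored templates
# pixel by pixel and pick the template that agrees on the most pixels.

# Hypothetical template library: "1" is ink, "0" is background.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}

def match_score(glyph, template):
    """Count the pixels where the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def recognize(glyph):
    """Return the letter whose template best matches the glyph."""
    return max(TEMPLATES, key=lambda letter: match_score(glyph, TEMPLATES[letter]))

# An "L" with one stray ink pixel in the top row.
scanned = ["110", "100", "111"]
print(recognize(scanned))
```

Because the score tolerates a few wrong pixels, the stray mark doesn't stop the glyph from matching its closest template.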

Feature detection identifies text by applying rules about what makes up each character. For example, if it's looking for an A, the software knows how an A is drawn from its rules: two slanted lines joined by a crossbar. If the shape is a single open curve, as with the letter C, the software recognizes the letter because it knows that curve is what makes a C a C. This method is more flexible than pattern recognition because it doesn't depend on a stored font, and modern OCR systems often use machine learning to learn these features from examples.
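A rule-based approach can be sketched with a single, easy-to-compute feature. Instead of slants and curves, the example below counts enclosed holes in a glyph, which is a stand-in feature chosen for simplicity. The `count_holes` and `classify` helpers, the three-letter `RULES` table and the sample glyphs are assumptions for illustration only.

```python
# Minimal feature-detection sketch: classify a glyph by how many enclosed
# holes it has. Glyphs are drawn with "#" for ink and "." for background.

def count_holes(glyph):
    """Count background regions that are fully enclosed by ink."""
    rows, cols = len(glyph), len(glyph[0])
    seen = set()

    def flood(r, c):
        # Mark every background cell reachable from (r, c).
        stack = [(r, c)]
        while stack:
            r, c = stack.pop()
            if not (0 <= r < rows and 0 <= c < cols):
                continue
            if (r, c) in seen or glyph[r][c] == "#":
                continue
            seen.add((r, c))
            stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])

    # Background that touches the border is open space, not a hole.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                flood(r, c)

    holes = 0
    for r in range(rows):
        for c in range(cols):
            if glyph[r][c] == "." and (r, c) not in seen:
                holes += 1
                flood(r, c)
    return holes

# Hypothetical rules for a three-letter alphabet.
RULES = {0: "C", 1: "A", 2: "B"}

def classify(glyph):
    """Return the letter whose rule matches the glyph's hole count."""
    return RULES.get(count_holes(glyph), "?")

LETTER_A = [".###.", "#...#", "#####", "#...#", "#...#"]
LETTER_B = ["####.", "#...#", "####.", "#...#", "####."]
LETTER_C = [".####", "#....", "#....", "#....", ".####"]

for glyph in (LETTER_A, LETTER_B, LETTER_C):
    print(classify(glyph))
```

Because the rule looks at structure rather than exact pixels, the same hole count would still identify these letters if they were drawn in a different font or size.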

A brief history of OCR

Before OCR, the only option for turning text into digital form was to manually type it into a device. If you wanted to copy a newspaper article, for example, you would have to read the physical article and copy the text with a keyboard, similar to how scribes used to copy books by hand before the printing press was invented. Naturally, this was a painstaking process.

Some might argue that the first imagining of OCR was the Optophone, invented in 1913 by Dr. Edmund Fournier d'Albe. But Ray Kurzweil is credited with creating the first modern OCR system in 1974. Its original use was to enable blind people to read by converting printed text to speech. Kurzweil's company, Kurzweil Computer Products, Inc., was sold to Xerox in 1980.

OCR took off as a means of digitizing newspapers in the early 1990s. Computer scientists have continuously improved OCR accuracy until we arrived at today's solutions.

What is scene text recognition (STR)?

STR is a form of OCR that utilizes computer vision to read text against natural scenes instead of merely as black letters on a white background. It's a common technology in self-driving cars, for example. With STR, the car can read road signs, logos and billboards, among other things.

Who uses optical character recognition?

Plan reviewers, doctors, lawyers, retail clerks, IT personnel — almost everyone can take advantage of OCR.

Where is OCR commonly used?

OCR is used for a massive range of applications. Here are some examples:

  • Electronic plan review. When a plan set is scanned by OCR software such as e-PlanSoft™ goPost™, it's transformed into an editable document for easy, digital mark-ups.
  • Retail. Take a voucher, for example. A smartphone can scan the serial code on a voucher to identify it.
  • Banking. If you've ever deposited a check remotely, you've used OCR. The text on a check is scanned on the banking app so the bank can recognize it. No human element is required here besides taking a picture of the check — the rest of the process is completely computerized and automatic. Mortgage applications and payslips can also be scanned with OCR.
  • Travel. Self-check-in at a hotel typically relies on OCR. Here, you scan your passport so the system can read your details and give you access to your room.
  • Health care. OCR can capture the text in records, insurance payments, test results and the labels on X-rays.
  • Law enforcement. Special cameras can read plate numbers and send the information to the appropriate department.

What are the benefits of optical character recognition?

OCR, at its core, digitizes and localizes data. Rather than have all your information scattered in both digital and physical form, you can turn it all digital. This has several advantages:

  • Security. Physical documents can be damaged, destroyed, lost or stolen. When everything's in a central data repository, it's all protected under the same lock, and digitized information can't be physically damaged or misplaced the way paper documents can.
  • Searchability. When searching a tangible piece of paper, you can only use your eyes to scan the page. When looking for that document in the first place, you have to sort through whatever collection it's a part of. When all information is digital, it's also all searchable with a computer in a central data set.
  • Accelerated workflows. If people constantly look for information on printed pages, that's time unnecessarily wasted. OCR enables workers to have all their information in one place, minimizing searching time.

If you have a PDF reader, do you need an OCR tool?

To make a scanned PDF editable, you need an OCR tool. A PDF reader only displays the scan as a static image; the text is not editable unless the file is first processed with OCR technology.

How do you use optical character recognition software?

Most OCR solutions work behind the scenes. In e-PlanSoft™ goPost™, OCR happens automatically when the plan is formatted for e-PlanReview®.

Once project applications are submitted, they're tracked and managed within goPost™. PDF Scout™, an application in goPost™, scans the application for viability. This includes checking the resolution and PDF version and confirming that the file has no attachments and is in a proper state to be reviewed. If the plan set fails the test, the software tells you why so you can make the proper corrections. Any documents that don't pass PDF Scout's test cannot be submitted for review.
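To give a feel for what one such pre-submission check might look like, here is a minimal sketch that reads the version from a PDF's header line, which begins with `%PDF-` followed by the version number. It is not PDF Scout's actual logic: the `pdf_version` and `preflight` helpers and the minimum version of 1.4 are hypothetical choices for the example.

```python
import re

# Minimal preflight sketch: read the PDF version from the file header and
# report why a file would be rejected. The minimum version is an assumption.

def pdf_version(header_bytes):
    """Return (major, minor) from a header such as b'%PDF-1.7', or None."""
    match = re.match(rb"%PDF-(\d)\.(\d)", header_bytes)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))

def preflight(header_bytes, min_version=(1, 4)):
    """Return 'accepted' or a short reason for rejection."""
    version = pdf_version(header_bytes)
    if version is None:
        return "rejected: not a PDF"
    if version < min_version:
        return (
            f"rejected: PDF {version[0]}.{version[1]} is older than "
            f"{min_version[0]}.{min_version[1]}"
        )
    return "accepted"

# In practice the bytes would come from the start of the uploaded file.
print(preflight(b"%PDF-1.7\n"))
print(preflight(b"%PDF-1.3\n"))
print(preflight(b"plain text"))
```

Returning the reason along with the result mirrors the idea of telling the submitter what to correct before resubmitting.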

In goPost™, OCR technology reads the sheet numbers on the plan set and sorts the sheets automatically. This saves you from shuffling through page after page to put the set in order yourself.
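Sorting by sheet number can be sketched once the text on each page has been recognized. The example below assumes sheet numbers follow a pattern such as A-101 (a discipline letter, a hyphen and three digits); that pattern, the `SHEET_PATTERN` and `sheet_key` names, the file names and the sample text are all hypothetical and are not taken from goPost™.

```python
import re

# Minimal sketch: pull a sheet number out of each page's OCR text and sort by it.
# The "A-101" style pattern is an assumption about how sheets are numbered.
SHEET_PATTERN = re.compile(r"\b([A-Z]{1,2})-(\d{3})\b")

def sheet_key(ocr_text):
    """Return a sortable (discipline, number) pair for a page's OCR text."""
    match = SHEET_PATTERN.search(ocr_text)
    if match is None:
        return ("~", 0)  # pages with no recognizable sheet number sort last
    return (match.group(1), int(match.group(2)))

# Made-up OCR output for three scanned pages, in the order they were scanned.
pages = {
    "scan_001.pdf": "STRUCTURAL NOTES S-201",
    "scan_002.pdf": "SITE PLAN A-101",
    "scan_003.pdf": "FLOOR PLAN SHEET A-102",
}

ordered = sorted(pages, key=lambda name: sheet_key(pages[name]))
for name in ordered:
    print(name, pages[name])
```

Pages without a readable sheet number fall to the end of the list, where a reviewer can place them by hand.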

Optical character recognition puts everything at your fingertips

OCR is a popular technology with many different uses, and it's come a long way since its inception. Improvements in OCR accuracy and advances in related technologies mean it's become much more reliable and easier to use over time. It can be invaluable in saving you time and accelerating your workflow.

With e-PlanSoft's line of products, you can ensure high-quality digital plan sets. Request a demo today to learn more and get started.
