GovNews Blog


How to Solve the Biggest Problems with Data Capture

Posted by Troy Burke on Sep 8, 2016 8:30:00 AM

Data Capture

As I travel around the country talking to elected officials and government employees, one of the most common talking points is the challenge of managing their data.  Many of them still do a lot of manual data entry, while others struggle to get accurate indexing data from submissions made through electronic recording/filing.  This often leads to:

  • Disorganization
  • Inconsistencies
  • Inability for Users to Locate Documents

In 2015, the Property Records Industry Association (PRIA) released best practices to address some of these issues.  Visit the PRIA Resource Directory for more information.

 

What is data capture?

Data capture refers to collecting data electronically instead of using manual data entry.  This can be accomplished with:

  • Barcodes
  • Receiving XML or Text Files as Electronic Submissions
  • Using OCR/ICR Technologies
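To make the second option concrete, here is a minimal sketch of reading index data from an XML submission using Python's standard library. The element names (documentType, grantor, grantee, recordedDate) are made up for illustration and do not follow any particular e-recording schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical e-recording submission; element names are illustrative only.
SUBMISSION = """
<submission>
  <documentType>DEED</documentType>
  <grantor>SMITH JOHN A</grantor>
  <grantee>DOE JANE</grantee>
  <recordedDate>2016-09-08</recordedDate>
</submission>
"""


def read_index_data(xml_text):
    """Return the index fields of one submission as a dict of tag -> text."""
    root = ET.fromstring(xml_text.strip())
    # Each direct child of the root is treated as one index field.
    return {child.tag: (child.text or "").strip() for child in root}


index_data = read_index_data(SUBMISSION)
print(index_data)
```

Because the data arrives already structured, nothing has to be keyed by hand; the remaining work is checking that the values are what you expect.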


What obstacles do people face when using OCR/ICR for data capture?

Poor Quality Images – the lower the quality of the images, the less accurate data capture solutions will be.  Scanning images at 300 DPI or higher helps alleviate most issues.  If the images have already been scanned at a lower resolution, some vendors offer image clean-up tools that can thicken, thin, or smooth characters, deskew or despeckle documents, remove hole punches or borders, and more. The cleaner the documents, the better the results.

Incorrect Data Captured – if you’ve done everything possible to generate the best quality images and you are still not capturing information correctly, there are a few options.  First, fuzzy text search matches approximate results for a word, name, or number pattern despite spelling mistakes, number transpositions, or unclear OCR results. Second, you can validate captured data against an existing database to ensure proper results.  If all else fails, documents that don’t meet a certain OCR character confidence threshold can be sent for human review.
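The three safeguards above can be sketched together in a few lines of Python. This is only an illustration: the list of known names stands in for an existing database, the 0.85 confidence cutoff and the 0.8 similarity cutoff are assumed values, and the fuzzy matching comes from Python's standard difflib module rather than any particular vendor's search engine.

```python
import difflib

# Hypothetical reference data, e.g. names already present in an index database.
KNOWN_NAMES = ["JOHNSON ROBERT", "JOHNSTON ROBERTA", "SMITH JOHN", "SMYTHE JOAN"]

# Assumed cutoff for OCR confidence; tune it against your own documents.
REVIEW_THRESHOLD = 0.85


def validate_name(ocr_text, ocr_confidence, known_names=KNOWN_NAMES):
    """Return (value, needs_review) for one OCR-captured name."""
    candidate = ocr_text.strip().upper()
    # Low OCR confidence: skip automation and route to a person.
    if ocr_confidence < REVIEW_THRESHOLD:
        return candidate, True
    # Fuzzy match against known values to tolerate misread characters.
    matches = difflib.get_close_matches(candidate, known_names, n=1, cutoff=0.8)
    if matches:
        return matches[0], False
    # Nothing in the reference data is close enough, so ask a human.
    return candidate, True


# The OCR engine read the letter O as a zero; the reference list corrects it.
print(validate_name("J0HNSON ROBERT", 0.93))
```

Anything that comes back flagged for review goes to a person instead of straight into the index.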

Changing Indexing Standards – over time, systems and indexing standards can change.  PRIA’s recommendation is to adopt a “key it as you see it” approach.  By eliminating data manipulation and translation tables, vendors simply return the data exactly the way it is found on the document.

Unstructured Documents – form documents make it easy for solutions to collect the required data because it always falls in the same location.  The challenge is finding information in unstructured documents, where the desired text can fall anywhere within the document.  A rules-based solution doesn’t rely on the information always being in the same location; instead, it identifies keywords or clues to capture the text.
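As a rough illustration of the rules-based idea, the sketch below finds each field by a keyword clue rather than by its position on the page. The rules, field names, and sample text are all hypothetical; production rules would be far more extensive.

```python
import re

# Hypothetical rules: each field is located by a keyword clue, not a page position.
RULES = {
    "grantor": re.compile(r"Grantor:\s*([^,\n]+)", re.IGNORECASE),
    "grantee": re.compile(r"Grantee:\s*([^,\n]+)", re.IGNORECASE),
    "parcel": re.compile(r"Parcel\s+(?:No\.|Number)\s*([\d-]+)", re.IGNORECASE),
}

# Made-up OCR text in which the clues appear at arbitrary places.
SAMPLE_TEXT = """This deed is made on September 8, 2016.
Grantor: John A. Smith
The property is conveyed to Grantee: Jane Doe, of Example County.
Parcel No. 12-345-678 is described in Exhibit A.
"""


def capture_fields(text, rules=RULES):
    """Apply every rule to the text; fields with no matching clue come back as None."""
    captured = {}
    for field, pattern in rules.items():
        match = pattern.search(text)
        captured[field] = match.group(1).strip() if match else None
    return captured


fields = capture_fields(SAMPLE_TEXT)
print(fields)
```

A field that comes back as None is a natural candidate for human review.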

Document Classification – capturing data properly can be aided greatly by classifying documents before searching for the desired index data.  Once we know the document type, specific logic can be applied based on the required fields for each document type.  Machine learning can be used to train software to recognize the document type automatically and ultimately improve the data capture results.
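Here is a toy sketch of "classify first, then capture". It uses a small naive Bayes word-count classifier, chosen only because it fits in a few lines; it is not a description of how any particular product does it. The training samples, document types, and required-field lists are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training samples: (document text, document type).
TRAINING = [
    ("warranty deed grantor conveys to grantee the following parcel", "DEED"),
    ("quitclaim deed grantor releases interest to grantee", "DEED"),
    ("mortgage borrower promises to repay lender the loan amount", "MORTGAGE"),
    ("mortgage lender secures loan against the borrower property", "MORTGAGE"),
]

# Hypothetical required index fields for each document type.
REQUIRED_FIELDS = {
    "DEED": ["grantor", "grantee", "parcel"],
    "MORTGAGE": ["borrower", "lender", "loan_amount"],
}


def train(samples):
    """Count words per document type and documents per type."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the most likely document type under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
doc_type = classify("deed from grantor to grantee", model)
print(doc_type, REQUIRED_FIELDS[doc_type])
```

Once the type is known, only the fields required for that type need to be searched for, which keeps the capture rules short and specific.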

 

We suggest our customers use a combination of automated data capture followed by a human review.  Once the data is captured, you need to decide what to do with the information.

We’ll keep that topic for a future discussion.

 

Interested in learning more? Reach out to us to see our products in action and ask how we can customize our 'rules' to capture discrete data and streamline your workflow.

 

   
