Engineering Drawing OCR

OCR (Optical Character Recognition) is a technology for systematically reading and indexing text within images. As part of an Engineering Data and Drawing Management System such as Lunr, it can unlock a large portion of the data set that would not ordinarily be retrievable through search. Searching text-format documents such as Word files or text-based PDFs (as opposed to image-only PDFs) is a simple matter of loading each file into a database that supports full-text search. This approach does not work for image files, however, because they contain only pixels and no machine-readable text to index.

The Problem

  • Historical scanned drawings (usually TIFF or PDF) often lack adequate metadata for search and retrieval. As a result, people either spend a large amount of time browsing through archives to find the files or, worse, give up and redraw the diagram from scratch.

  • Operations and maintenance manuals, contracts, and similar files are often too large to index adequately using standard metadata tagging techniques. Searching by the file's project name or asset description may work in some cases, but broader searches by part description or contractor name are impossible.

The Solution

Our solution is to use Lunr's XRAY technology to OCR all image files as they are uploaded. The extracted text is then indexed, making a much greater portion of your document repository searchable.
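
For readers who want to see the general idea, the following is a minimal sketch of an "OCR on upload, then index" flow. It is not Lunr's XRAY implementation. The class DrawingIndex, the on_upload hook, and the fake_ocr stand-in are all hypothetical names chosen for illustration, and a real system would plug in an actual OCR engine and a full-text search database in place of the in-memory index used here.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set

# Hypothetical OCR hook: any function that takes image bytes and returns text.
OcrFn = Callable[[bytes], str]


class DrawingIndex:
    """Tiny in-memory inverted index mapping each token to the files containing it."""

    def __init__(self, ocr: OcrFn) -> None:
        self._ocr = ocr
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> List[str]:
        # Lowercase and split into alphanumeric runs.
        return re.findall(r"[a-z0-9]+", text.lower())

    def on_upload(self, filename: str, image_bytes: bytes) -> None:
        # Run OCR once at upload time, then index every extracted token.
        text = self._ocr(image_bytes)
        for token in self._tokens(text):
            self._postings[token].add(filename)

    def search(self, query: str) -> List[str]:
        # Return files whose extracted text contains every query token.
        tokens = self._tokens(query)
        if not tokens:
            return []
        matches = [self._postings.get(token, set()) for token in tokens]
        return sorted(set.intersection(*matches))


def fake_ocr(image_bytes: bytes) -> str:
    # Stand-in for a real OCR engine (assumption): treats the bytes as text.
    return image_bytes.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    index = DrawingIndex(fake_ocr)
    index.on_upload("a.tif", b"PUMP STATION 4 - Acme Contractors")
    index.on_upload("b.tif", b"Valve schedule, pump room")
    print(index.search("acme pump"))
```

Because OCR runs once at upload time, a later search only consults the index, so a query by contractor name or part description does not need to reopen the scanned files.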

