Engineering Drawing OCR
OCR (Optical Character Recognition) is a technology that recognizes text within images and converts it into machine-readable text that can then be indexed. When used as part of an Engineering Document Management System such as Lunr, OCR can unlock a large portion of the data set that would not otherwise be retrievable through search. Text-based documents, such as Word files or text-based PDFs (as opposed to image-only PDFs), can be made searchable by loading their contents into a database that supports full-text search. However, this approach does not work for image files, because they contain no machine-readable text to index.
The Problem
Historical scanned drawings (usually TIFF or PDF files) often lack adequate metadata for search and retrieval. As a result, people either spend a lot of time browsing through archives to find the files or, worse, give up and redraw the drawing from scratch.
Operations and maintenance manuals, contracts, and similar files are often too large to index adequately using standard metadata tagging techniques. Searching by the file's project name or asset description may work, but broader searches, such as by part description or contractor name, are impossible.
The Solution
Our solution is to use Lunr's XRAY technology to OCR all image files as they are uploaded. The text extracted during this process is then indexed, making a significantly larger portion of your document repository searchable.
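The OCR-on-upload idea can be illustrated with a minimal sketch. This is not XRAY's actual implementation or API. It assumes a simple in-memory inverted index, where each term maps to the documents containing it. The names `ocr_extract_text`, `on_upload`, and `FullTextIndex` are hypothetical. The OCR step is also stubbed so the example runs without an OCR engine installed. A production system would more likely store the extracted text in a database with built-in full-text search.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it on any non-alphanumeric character
    return re.findall(r"[a-z0-9]+", text.lower())


class FullTextIndex:
    """Hypothetical minimal inverted index: term -> set of document IDs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Record that every term in the text appears in this document
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        # Return documents containing every query term (AND semantics)
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results


def ocr_extract_text(image_bytes: bytes) -> str:
    # Hypothetical stand-in for a real OCR engine. It simply decodes the
    # bytes so this sketch runs without any OCR dependency.
    return image_bytes.decode("utf-8", errors="ignore")


def on_upload(index: FullTextIndex, doc_id: str, image_bytes: bytes) -> None:
    # Hypothetical upload hook: OCR the image, then index the extracted text
    index.add(doc_id, ocr_extract_text(image_bytes))


index = FullTextIndex()
on_upload(index, "DWG-1042", b"PUMP HOUSING ASSEMBLY  CONTRACTOR: ACME CIVIL")
on_upload(index, "OM-0007", b"Operation and maintenance manual, centrifugal pump")
print(sorted(index.search("pump")))     # ['DWG-1042', 'OM-0007']
print(index.search("acme contractor"))  # {'DWG-1042'}
```

Once the extracted text is indexed, a search by contractor name or part description can find a scanned drawing even though its file metadata never mentioned those terms.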