PrinterArchive

Glossary · Definition

OCR (Optical Character Recognition)

OCR is the process of recognising text in an image of a document so that it can be searched, selected, and edited rather than treated as a flat picture.

By PrinterArchive Editorial · Edited by PrinterArchive Editorial

When a page is scanned, the result is initially an image: a grid of pixels with no understanding of the letters it contains. Optical character recognition analyses that image, identifies character shapes, and produces a text layer that software can search and select.
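As a rough illustration of "identifies character shapes", the sketch below treats a scan as a small grid of ink and paper pixels and matches each cell against stored glyph bitmaps. Everything in it is an assumption made for the example: the 3x5 templates, the fixed glyph width, and the function names `split_glyphs`, `match_glyph`, and `recognise` are hypothetical. Production OCR engines use far more robust segmentation and recognition than this.

```python
# Toy illustration only: hypothetical 3x5 bitmaps ('#' = ink, '.' = paper).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

GLYPH_WIDTH = 3  # assumed fixed width, with one blank column between glyphs


def split_glyphs(image):
    """Cut the pixel grid into fixed-width glyph cells."""
    width = len(image[0])
    cells = []
    for start in range(0, width, GLYPH_WIDTH + 1):
        cells.append([row[start:start + GLYPH_WIDTH] for row in image])
    return cells


def match_glyph(cell):
    """Return the template character whose pixels differ least from the cell."""
    def distance(template):
        return sum(
            a != b
            for t_row, c_row in zip(template, cell)
            for a, b in zip(t_row, c_row)
        )
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))


def recognise(image):
    """Turn the pixel grid into a string, one character per glyph cell."""
    return "".join(match_glyph(cell) for cell in split_glyphs(image))


scanned = [
    "###.#...###",
    ".#..#....#.",
    ".#..#....#.",
    ".#..#....#.",
    ".#..###.###",
]
print(recognise(scanned))
```

Because each cell is assigned to the nearest template rather than requiring an exact match, a single stray pixel usually still resolves to the right letter, which hints at why cleaner scans tend to give better results.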

OCR is what makes a scanned PDF searchable. Without it, the document looks correct on screen but its words cannot be found, copied, or indexed. Accuracy depends on scan quality, contrast, language, and the legibility of the original.
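One common way to put a number on OCR accuracy is the character error rate: the edit distance between the recognised text and a hand-checked reference, divided by the length of the reference. The sketch below is a minimal version of that idea; the function names and the sample strings are illustrative assumptions, not part of any particular OCR tool.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, ocr_output):
    """Fraction of reference characters the OCR output got wrong."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


# Hypothetical example: the letter 'o' misread as the digit '0'.
print(character_error_rate("invoice", "inv0ice"))
```

A lower rate means fewer words that a search would miss; a single misread character is enough to stop a word from being found.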
