
To open a PDF in Word, open a Word document, select File > Open, find the PDF file, and select Open. Once you select the PDF file, a dialog box appears. The message simply states that Word will now convert your PDF to an editable document and load it, which takes slightly longer than opening a standard document. After the conversion, you can edit, copy, cut, or format the text.

It's important to understand that you will likely lose some of the original document's formatting. This includes footnotes, page breaks, columns, tables, margins, any associated track changes, or special formatting. You might lose special font effects as well. That's because of functional differences between the software originally used to create the PDF file and Microsoft Word.

Microsoft states that text documents don't lose as much of their formatting as PDF documents that include tables, graphs, footnotes, and charts. Documents with those elements also take slightly longer to load. For instance, a PDF that contains only plain text opens directly in Microsoft Word without losing its original formatting, because it doesn't contain any special effects or unique text blocks. Once the document is loaded, it's best to review the formatting carefully and adjust it as needed.

Optical character recognition (OCR) is a process that converts images of typed, handwritten, or printed text into machine-readable text. OCR technology can convert scanned documents, photos of documents, scene photos, or subtitles superimposed on an image into machine-encoded text. It is commonly used to digitize printed text from paper records such as passports, invoices, bank statements, business cards, and mail. Digitized text can be electronically edited, searched, stored more efficiently, and used in machine processes such as cognitive computing, machine translation, and text mining.

OCR is a field of research in pattern recognition, artificial intelligence, and computer vision. While early versions of OCR needed to be trained with images of each character and worked on one font at a time, advanced systems are now capable of producing highly accurate recognition for most fonts and support a variety of digital image file formats. Some OCR systems can even reproduce formatted output that closely resembles the original page, including images, columns, and other non-textual components.
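
As a rough illustration of the early OCR approach mentioned above, where a system is trained with an image of each character in a single font, here is a minimal template-matching sketch in Python. Everything in it is an assumption made for illustration: the tiny 3x5 pixel font, the three-character alphabet, and the function names are hypothetical, and it is not how Word or any modern OCR engine works. Each glyph in the input is compared pixel by pixel against the stored templates, and the closest template wins.

```python
# Hypothetical 3x5 "font": each glyph is five rows of three pixels ('#' = ink).
# A real system would store one template per character per font.
TEMPLATES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "7": ("###", "..#", ".#.", ".#.", ".#."),
}

GLYPH_W = 3  # glyph width in pixels
GLYPH_H = 5  # glyph height in pixels


def pixel_distance(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize_glyph(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(TEMPLATES[ch], glyph))


def split_glyphs(image):
    """Cut a one-line image into fixed-width glyphs.

    Assumes glyphs are GLYPH_W pixels wide and separated by one blank column.
    """
    width = len(image[0])
    step = GLYPH_W + 1
    return [tuple(row[x:x + GLYPH_W] for row in image)
            for x in range(0, width, step)]


def recognize_line(image):
    """Convert a one-line pixel image into a string."""
    return "".join(recognize_glyph(g) for g in split_glyphs(image))


def render(text):
    """Draw text in the template font (used here only to build sample input)."""
    return [".".join(TEMPLATES[ch][r] for ch in text) for r in range(GLYPH_H)]


if __name__ == "__main__":
    image = render("107")
    print("\n".join(image))
    print(recognize_line(image))  # prints 107
```

Because the match picks the nearest template instead of requiring an exact one, a glyph with a single stray pixel is still recognized. The limits are just as visible: a different font or character size would need its own set of templates, which is why this style of OCR worked on one font at a time.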
