Working with articles in PDF format can be extremely useful. Providing digital texts is a great way to make your course content more accessible to different learners. For some learners, however, simply having access to a digital version of a text may not be enough to ensure that the content is accessible. Many students rely on screen readers and text-to-speech software to access their readings. Low-quality PDFs, or PDFs that have not had optical character recognition (OCR) performed on them, often do not function properly with these programs. Fortunately, there are steps that can be taken before these documents are distributed to students to eliminate these potential problems.
Optical Character Recognition with Adobe Acrobat Pro XI
Step 1: How to Tell if a PDF Needs to Have an OCR Performed
The first step is to ensure that you are working with a good quality PDF, which can often be a challenge in a university setting. Many of the electronic versions of readings and articles that are passed out originate from low-quality scans. If the text on the page is blurry or blacked out, an OCR will not return accurate results. Because of this, it is important to work with documents that are clear and of high quality. If you are already working with a high-quality PDF, the next thing to do is to check whether an OCR is needed.
The simplest way to check if a PDF needs to have an OCR performed is to try to highlight text with your cursor. If you can highlight specific text in the PDF document, that means that the text is at least being recognized by your computer. See image below for reference:
Notice in the image above that we are able to select text by clicking and dragging over it. This means that the text in your document is at least being recognized. It is important to note, however, that just because the text is selectable does not necessarily mean your computer is recognizing it accurately, only that it acknowledges the text is there. You can test the accuracy by copying text from your PDF and pasting it into a Word document or Notepad window. If the text pastes without a problem, it means that you have a PDF with a good quality OCR! However, if you end up with misspelled words or stray symbols after pasting, it means that your text is being inaccurately recognized, and this will still cause problems with text-to-speech software.
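For those who want to run the same copy-and-paste check on many documents, the idea can be roughed out in a few lines of Python. This is only a minimal sketch and is not part of Acrobat or any OCR tool. The function names `suspicious_ratio` and `looks_garbled`, the word pattern, and the 20% threshold are all assumptions chosen for illustration, and the heuristic assumes English text.

```python
import re

# Assumption for illustration: a "plausible word" is made of letters only,
# optionally joined by internal apostrophes or hyphens (e.g. "text-to-speech").
WORD_RE = re.compile(r"^[A-Za-z]+(?:['’-][A-Za-z]+)*$")


def suspicious_ratio(pasted_text: str) -> float:
    """Return the fraction of tokens that do not look like ordinary words."""
    tokens = []
    for raw in pasted_text.split():
        # Strip surrounding punctuation so "text," and "(OCR)" count as words.
        token = raw.strip(".,;:!?\"'()[]“”‘’")
        if token:
            tokens.append(token)
    if not tokens:
        # Nothing usable was pasted, so treat the sample as fully suspicious.
        return 1.0
    # Plain numbers are accepted; anything else that fails the pattern is suspicious.
    bad = sum(1 for t in tokens if not WORD_RE.match(t) and not t.isdigit())
    return bad / len(tokens)


def looks_garbled(pasted_text: str, threshold: float = 0.2) -> bool:
    """Flag pasted text when more than `threshold` of its tokens look suspicious."""
    return suspicious_ratio(pasted_text) > threshold
```

Passing a pasted passage to `looks_garbled(...)` gives a rough yes or no. Reading the pasted text yourself remains the more reliable test, since this heuristic will also flag legitimate content such as formulas or web addresses.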
If, on the other hand, you are unable to select the text and the entire page is selected instead, or you are only able to drag a selection square across the page, this means your computer is not recognizing the text. See image below for reference:
As we can see in the image above, the text is not selectable. This means that an OCR should be performed.
Step 2: How to Perform an OCR on a PDF with Adobe Acrobat Pro XI
If you have access to the professional version of Adobe Acrobat, then an OCR can be performed quickly and easily. The first thing to do is to open the “Tools” sidebar by clicking on it in the top right corner of the screen. The image below highlights the “Tools” button in a red rectangle.
Once the “Tools” sidebar is open, click on the “Recognize Text” tab. From here you will be able to launch the OCR by clicking on the “In This File” option. The image below highlights this option in a red rectangle.
Other Options for OCR
If you do not have access to the professional version of Adobe Acrobat, there are other options available to you. One of the easiest is to use a web-based OCR application, and several free ones are available. These sites will perform an OCR on documents you upload and then let you download the result. However, several of these sites restrict the size of the document you can submit and will not let you upload files that are too large. Here are some web-based OCR options:
Using the methods outlined above will make most PDFs usable with screen readers. However, there are instances where some extra steps need to be taken. If the steps above do not produce a document with searchable and selectable text, please contact Timothy Swiffen, our Access Technologist (timothy [dot] swiffen [at] mcgill [dot] ca), for further assistance.