How to Process OCR for the Rotated Image in a PDF Using C# and VB.NET?

Optical character recognition (OCR) extracts text from scanned PDFs and images. With a few lines of C# code, a scanned PDF document containing a raster image can be converted into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or searchable PDF documents. The Syncfusion .NET OCR library uses the Tesseract OCR engine.
Using this library, a PDF document containing rotated images can be converted to a searchable and selectable document using C# and VB.NET.

Steps to process OCR for the rotated image in PDF programmatically:

Create a new C# console application project.
Install the Syncfusion.Pdf.OCR.WinForms NuGet package as a reference to your .NET Framework application from NuGet.org.

Note: The osd.traineddata file should be present in the Tessdata folder to enable the AutoDetectRotation property.

Note: The AutoDetectRotation property is supported only in .NET Framework applications.
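
Because rotation detection depends on osd.traineddata, it can help to confirm that the file is in place before enabling AutoDetectRotation. The following is a minimal sketch that uses only System.IO; the TessdataCheck class and HasOsdData method are hypothetical helper names, not part of the Syncfusion API, and the folder location you pass in is an assumption about your project layout.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Syncfusion library.
public static class TessdataCheck
{
    // Returns true when osd.traineddata exists directly inside the given folder.
    public static bool HasOsdData(string tessdataFolder)
    {
        if (string.IsNullOrEmpty(tessdataFolder))
        {
            return false;
        }
        return File.Exists(Path.Combine(tessdataFolder, "osd.traineddata"));
    }
}
```

For example, an application could call TessdataCheck.HasOsdData("Tessdata") and set AutoDetectRotation to true only when it returns true, assuming the Tessdata folder sits next to the executable.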

Include the following namespaces in the Program.cs file.

C#

using System.IO;
using Syncfusion.Pdf.Parsing;
using Syncfusion.OCRProcessor;

VB.NET

Imports System.IO
Imports Syncfusion.Pdf.Parsing
Imports Syncfusion.OCRProcessor

Use the following code sample to process OCR for the rotated image in the PDF.

C#

// Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    // Load the PDF document.
    PdfLoadedDocument ldoc = new PdfLoadedDocument("Input.pdf");
    // Language to process the OCR.
    processor.Settings.Language = Languages.English;
    // Enable the AutoDetectRotation.
    processor.Settings.AutoDetectRotation = true;
    // Process OCR by providing a loaded PDF document.
    string text = processor.PerformOCR(ldoc);
    // Write the text to the file.
    File.WriteAllText("ExtractedText.txt", text);
    // Close the document and release its resources.
    ldoc.Close(true);
}

VB.NET

'Initialize the OCR processor.
Using processor As OCRProcessor = New OCRProcessor()
    'Load the PDF document.
    Dim ldoc As PdfLoadedDocument = New PdfLoadedDocument("Input.pdf")
    'Language to process the OCR.
    processor.Settings.Language = Languages.English
    'Enable the AutoDetectRotation.
    processor.Settings.AutoDetectRotation = True
    'Process OCR by providing a loaded PDF document.
    Dim text As String = processor.PerformOCR(ldoc)
    'Write the text to the file.
    File.WriteAllText("ExtractedText.txt", text)
    'Close the document and release its resources.
    ldoc.Close(True)
End Using
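
The sample above writes only the recognized text to a file. If the searchable PDF itself is needed, a minimal sketch could save the loaded document after OCR, assuming that PerformOCR adds the recognized text to the loaded document as the introduction describes. The SearchablePdfSketch class name and the Output.pdf file name are hypothetical, and the sketch assumes Input.pdf is in the working directory.

```csharp
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

class SearchablePdfSketch
{
    static void Main()
    {
        // Initialize the OCR processor.
        using (OCRProcessor processor = new OCRProcessor())
        {
            // Load the scanned PDF (same input file name as in the sample above).
            PdfLoadedDocument ldoc = new PdfLoadedDocument("Input.pdf");
            processor.Settings.Language = Languages.English;
            processor.Settings.AutoDetectRotation = true;
            // Run OCR on the loaded document; the returned text is ignored here.
            processor.PerformOCR(ldoc);
            // Save the processed document. "Output.pdf" is a hypothetical name.
            ldoc.Save("Output.pdf");
            // Close the document and release its resources.
            ldoc.Close(true);
        }
    }
}
```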

A complete working sample can be downloaded from OCRSample.zip.

By executing the program, the text recognized from the rotated image is written to the ExtractedText.txt file.

[Image: output screenshot]

Take a moment to peruse the documentation, where you will find other options like performing OCR on an image, a region of the document, a rotated page, and large PDF documents with code examples.

Refer here to explore the rich set of Syncfusion Essential® PDF features.

Note: Starting with v16.2.0.x, if you reference Syncfusion® assemblies from the trial setup or from the NuGet feed, you must include a license key in your projects. Refer to the licensing documentation to learn how to generate and register the Syncfusion® license key in your application so that the components run without a trial message.

Conclusion

I hope you enjoyed learning about how to process OCR for the rotated image in a PDF using C# and VB.NET.

You can refer to our PDF feature tour page to learn about its other features. You can also explore our documentation to understand how to create and manipulate PDF documents.

Current customers can check out our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our other controls.

If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums or feedback portal. We are always happy to assist you!
