Perform OCR on PDF files with multiple languages using C#

The Syncfusion® .NET Optical Character Recognition (OCR) Library empowers developers to extract text from multilingual PDF files using C#. With minimal code, you can transform scanned PDFs containing raster images into searchable and selectable documents across various languages.

Steps to perform OCR on a multilingual scanned PDF file using C#:

  1. Create a new project: Create a new .NET Core console application.
    Console.png
  2. Install required packages: Install the Syncfusion.PDF.OCR.Net.Core NuGet package from NuGet.org as a reference in the console application.
    NuGetPackage.png
  3. Set up the environment: In the Program.cs file, include the following namespaces.
    C#
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;
using Syncfusion.Pdf.Graphics;
  4. Perform OCR: Use the following C# code sample in Program.cs to perform OCR on PDF files containing multiple languages.
    C#
// Load the scanned PDF document
using (PdfLoadedDocument loadedDocument = new PdfLoadedDocument("Input.pdf"))
// Create an OCR processor instance; the using statement disposes it when OCR is done
using (OCRProcessor processor = new OCRProcessor())
{
   // Set a Unicode font so that non-Latin characters (for example, Arabic and Greek)
   // are preserved in the output PDF; ARIALUNI.ttf must exist at this path
   processor.UnicodeFont = new PdfTrueTypeFont("ARIALUNI.ttf", 8);

   // Specify the OCR languages as Tesseract language codes joined with '+'
   processor.Settings.Language = "eng+deu+ara+ell+fra"; // English, German, Arabic, Greek, French

   // Set the path to the Tesseract language data folder; it must contain
   // the language data for every language listed above
   processor.TessDataPath = "Tessdata";

   // Perform OCR on the loaded PDF document
   processor.PerformOCR(loadedDocument);

   // Save the PDF document
   loadedDocument.Save("Output.pdf");
}
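
The language setting in the sample above is a single string of Tesseract language codes joined with "+", and OCR can only recognize a language whose data file is present in the Tessdata folder. The following is a minimal sketch, not part of the Syncfusion® API, of how that string could be built from a list and checked before running OCR. The class name OcrLanguageHelper and both method names are hypothetical, and the check assumes the folder follows the standard Tesseract naming of one "<code>.traineddata" file per language.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only; it is not a Syncfusion type.
public static class OcrLanguageHelper
{
    // Joins Tesseract language codes with '+', the format used for
    // processor.Settings.Language in the sample above.
    // Blank entries are dropped and duplicates are removed.
    public static string BuildLanguageString(IEnumerable<string> codes)
    {
        List<string> cleaned = codes
            .Select(code => code.Trim())
            .Where(code => code.Length > 0)
            .Distinct(StringComparer.Ordinal)
            .ToList();

        if (cleaned.Count == 0)
        {
            throw new ArgumentException("At least one language code is required.", nameof(codes));
        }

        return string.Join("+", cleaned);
    }

    // Returns the language codes that have no "<code>.traineddata" file
    // in the given folder (assumes standard Tesseract file naming).
    public static List<string> FindMissingLanguageData(string tessDataPath, string languageString)
    {
        return languageString
            .Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(code => !File.Exists(Path.Combine(tessDataPath, code + ".traineddata")))
            .ToList();
    }
}
```

With such a helper, the language string could be assigned as processor.Settings.Language = OcrLanguageHelper.BuildLanguageString(new[] { "eng", "deu", "ara", "ell", "fra" }), and a non-empty result from FindMissingLanguageData("Tessdata", processor.Settings.Language) would indicate which language data files still need to be copied into the folder before calling PerformOCR.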

A complete working sample is available for download from GitHub.

By executing the program, you will generate the following PDF document.

Output.png

Take a moment to peruse the documentation, where you will find other options, such as performing OCR on an image, a region of a document, a rotated page, and large PDF documents, along with code examples.

Conclusion
I hope you enjoyed learning how to perform OCR on PDF files with multiple languages using .NET Core.

You can refer to our ASP.NET Core PDF feature tour page to learn about its other features, and to the documentation to get started quickly with configuration. You can also explore our ASP.NET Core PDF example to understand how to create and manipulate data.

If you are a current customer, you can download our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to evaluate our other controls.

If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
