How can the performance of the OCR process be improved?
The Syncfusion .NET Optical Character Recognition (OCR) Library is used to extract text from scanned PDFs and images. With a few lines of C# code, a scanned PDF document containing a raster image is converted into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or searchable PDF documents. The .NET OCR Library uses a powerful Tesseract OCR engine.
The library lets you speed up OCR by setting its performance mode to rapid. However, it is important to note that the output may not be as accurate as in slow mode.
Steps to speed up the performance of OCR in a PDF document:
1. Create a new C# console application project.
2. Install the Syncfusion.Pdf.OCR.WinForms NuGet package as a reference to your .NET console application from NuGet.org.
3. Include the following namespaces in the Program.cs file.
C#
using System.Diagnostics;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;
VB.NET
Imports System.Diagnostics
Imports Syncfusion.OCRProcessor
Imports Syncfusion.Pdf.Parsing
4. Use the following code example to speed up the performance of OCR in a PDF document.
C#
//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    //Load an existing PDF document.
    PdfLoadedDocument document = new PdfLoadedDocument("../../Data/Input.pdf");
    //Set the OCR language.
    processor.Settings.Language = Languages.English;
    //Set the OCR performance.
    processor.Settings.Performance = Performance.Rapid;
    //Perform OCR with the input document.
    processor.PerformOCR(document);
    //Save the PDF document.
    document.Save("Sample.pdf");
    //Close the document.
    document.Close(true);
    //Open the saved PDF file in the default PDF viewer (requires System.Diagnostics).
    Process.Start("Sample.pdf");
}
VB.NET
'Initialize the OCR processor.
Using processor As OCRProcessor = New OCRProcessor()
    'Load an existing PDF document.
    Dim document As PdfLoadedDocument = New PdfLoadedDocument("../../Data/Input.pdf")
    'Set the OCR language.
    processor.Settings.Language = Languages.English
    'Set the OCR performance.
    processor.Settings.Performance = Performance.Rapid
    'Perform OCR with the input document.
    processor.PerformOCR(document)
    'Save the PDF document.
    document.Save("Sample.pdf")
    'Close the document.
    document.Close(True)
    'Open the saved PDF file in the default PDF viewer (requires System.Diagnostics).
    Process.Start("Sample.pdf")
End Using
A complete working sample can be downloaded from Improvise_the_OCR_performance_on_PDF.zip.
Executing the program saves the OCR-processed PDF document as Sample.pdf and opens it in the default PDF viewer.
Benchmark
In this context, accuracy means how correctly the OCR recognizes the data. In most cases, rapid and slow modes give almost the same accuracy. For a few documents with a complex structure, however, slow mode recognizes the data better than rapid mode. This is because slow mode performs page segmentation analysis using Tesseract, which improves recognition accuracy, while rapid mode does not.
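To see the speed difference on your own documents, the two modes can be timed side by side. The following is a minimal sketch, not part of the downloadable sample: the class name OcrTimingSketch and the helper TimeOcr are hypothetical, the input path is the one used in the steps above, and it assumes the Performance enumeration exposes a Slow member corresponding to the slow mode described in this article.

```csharp
using System;
using System.Diagnostics;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

// Hypothetical helper class for illustration only.
class OcrTimingSketch
{
    // Loads the input PDF, runs OCR with the given performance mode,
    // and returns the time spent inside PerformOCR.
    static TimeSpan TimeOcr(string inputPath, Performance mode)
    {
        using (OCRProcessor processor = new OCRProcessor())
        {
            PdfLoadedDocument document = new PdfLoadedDocument(inputPath);
            processor.Settings.Language = Languages.English;
            processor.Settings.Performance = mode;

            // Time only the OCR call, not document loading.
            Stopwatch watch = Stopwatch.StartNew();
            processor.PerformOCR(document);
            watch.Stop();

            // The result is not saved here; only the timing is of interest.
            document.Close(true);
            return watch.Elapsed;
        }
    }

    static void Main()
    {
        string input = "../../Data/Input.pdf";

        Console.WriteLine("Rapid: " + TimeOcr(input, Performance.Rapid));
        // Assumption: Performance.Slow is the enum member for slow mode.
        Console.WriteLine("Slow:  " + TimeOcr(input, Performance.Slow));
    }
}
```

Timings from a single run can be noisy, and the first call may include one-time startup costs, so it is worth repeating the measurement a few times before drawing conclusions. Checking accuracy still requires comparing the recognized text of both outputs by hand or against known reference text.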
Take a moment to peruse the documentation, where you will find other options, such as performing OCR on an image, a region of the document, a rotated page, or large PDF documents, along with code examples.
Refer here to explore the rich set of the Syncfusion Essential PDF features.
Note: Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or the NuGet feed, include a license key in your projects. Refer to this link to learn about generating and registering the Syncfusion license key in your application to use the components without a trial message.
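As a rough illustration of the registration mentioned in the note, the key is typically registered once at application startup, before any Syncfusion component is used. This sketch assumes the Syncfusion.Licensing assembly is referenced by the project; "YOUR LICENSE KEY" is a placeholder, not a real key.

```csharp
using Syncfusion.Licensing;

class Program
{
    static void Main()
    {
        // Register the license key before creating any Syncfusion objects.
        // "YOUR LICENSE KEY" is a placeholder for the key generated for your account.
        SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        // The OCR code from the steps above would follow here.
    }
}
```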
Conclusion
I hope you enjoyed learning about how the performance of the OCR process can be improved.
You can refer to our .NET PDF feature tour page to learn about its other features. You can also explore our documentation to understand how to create and manipulate data.
If you are a current customer, you can check out our components from the License and Downloads page. If you are new to Syncfusion, you can try our 30-day free trial to check out our other controls.
If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!