How to Perform OCR in ASP.NET Core Platform

The Syncfusion .NET Optical Character Recognition (OCR) Library extracts text from scanned PDFs and images. With a few lines of C# code, a scanned PDF document containing raster images can be converted into a searchable and selectable PDF document. The OCR result can be saved as text, structured data, or a searchable PDF document. The library uses the Tesseract OCR engine.
Using this library, you can perform OCR on scanned PDF documents in C# and VB.NET.

Steps to perform OCR on scanned PDF programmatically

  1. Create a new C# ASP.NET Core Web application project.
    (Screenshot: ASP.NET Core app creation)
  2. Install the Syncfusion.PDF.OCR.Net.Core NuGet package from NuGet.org as a reference in your ASP.NET Core application.
    (Screenshot: NuGet package)
  3. A default controller named HomeController.cs is added when the ASP.NET Core MVC project is created. Include the following namespaces in the HomeController.cs file.

C#

using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;

VB.NET

Imports Syncfusion.OCRProcessor
Imports Syncfusion.Pdf.Graphics
Imports Syncfusion.Pdf.Parsing
  4. Add a new button in Index.cshtml as follows.
@{
    Html.BeginForm("PerformOCR", "Home", FormMethod.Get);
    {
        <div>
            <input type="submit" value="Perform OCR" style="width:150px;height:27px" />
        </div>
    }
    Html.EndForm();
}
  5. Add a new action method named PerformOCR in HomeController.cs and use the following code sample to perform OCR in the ASP.NET Core application.

C#

//Load an existing PDF document.
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream); 
//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{                
    //Set the language used for OCR.
    processor.Settings.Language = Languages.English;
    //Set a Unicode font to preserve Unicode characters in the PDF document.
    FileStream fontStream = new FileStream("ARIALUNI.ttf", FileMode.Open, FileAccess.Read);
    processor.UnicodeFont = new PdfTrueTypeFont(fontStream, 8);
    //Perform OCR on the loaded PDF document.
    processor.PerformOCR(loadedDocument);
}
 
//Save the PDF document to a MemoryStream.
MemoryStream stream = new MemoryStream();
loadedDocument.Save(stream);
//Close the PDF document.
loadedDocument.Close(true);
//Reset the stream position to 0 so the response is read from the beginning.
stream.Position = 0;
//Download the PDF document in the browser.
FileStreamResult fileStreamResult = new FileStreamResult(stream, "application/pdf");
fileStreamResult.FileDownloadName = "OCR.pdf";
return fileStreamResult;

VB.NET

'Load an existing PDF document. 
Dim docStream As FileStream = New FileStream("Input.pdf", FileMode.Open, FileAccess.Read)
Dim loadedDocument As PdfLoadedDocument = New PdfLoadedDocument(docStream) 
'Initialize the OCR processor.
Using processor As OCRProcessor = New OCRProcessor()
 
    'Set the language used for OCR.
    processor.Settings.Language = Languages.English
    'Set a Unicode font to preserve Unicode characters in the PDF document.
    Dim fontStream As FileStream = New FileStream("ARIALUNI.ttf", FileMode.Open, FileAccess.Read)
    processor.UnicodeFont = New PdfTrueTypeFont(fontStream, 8)
    'Perform OCR on the loaded PDF document.
    processor.PerformOCR(loadedDocument)
End Using
 
'Save the PDF document to a MemoryStream.
Dim stream As MemoryStream = New MemoryStream()
loadedDocument.Save(stream)
'Close the PDF document.
loadedDocument.Close(True)
'Reset the stream position to 0 so the response is read from the beginning.
stream.Position = 0
'Download the PDF document in the browser.
Dim fileStreamResult As FileStreamResult = New FileStreamResult(stream, "application/pdf")
fileStreamResult.FileDownloadName = "OCR.pdf"
Return fileStreamResult
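
For context, the snippets above could sit inside the controller roughly as shown in the following sketch. The namespace OcrSample.Controllers is hypothetical, and the sketch assumes that Input.pdf and ARIALUNI.ttf are in the application's working directory. It also uses the controller's built-in File helper, which returns a FileStreamResult, in place of constructing the result by hand. Treat it as an illustration, not as the exact code from the downloadable sample.

C#

```csharp
using System.IO;
using Microsoft.AspNetCore.Mvc;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;

namespace OcrSample.Controllers // Hypothetical namespace; use your project's own.
{
    public class HomeController : Controller
    {
        public IActionResult Index()
        {
            return View();
        }

        //Invoked by the "Perform OCR" form in Index.cshtml.
        public IActionResult PerformOCR()
        {
            //Assumption: both files are in the application's working directory.
            //Both streams stay open until the document has been saved.
            using (FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
            using (FileStream fontStream = new FileStream("ARIALUNI.ttf", FileMode.Open, FileAccess.Read))
            {
                PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);

                using (OCRProcessor processor = new OCRProcessor())
                {
                    processor.Settings.Language = Languages.English;
                    processor.UnicodeFont = new PdfTrueTypeFont(fontStream, 8);
                    processor.PerformOCR(loadedDocument);
                }

                //The MemoryStream is not disposed here; the returned result disposes it after the response is written.
                MemoryStream stream = new MemoryStream();
                loadedDocument.Save(stream);
                loadedDocument.Close(true);
                stream.Position = 0;

                return File(stream, "application/pdf", "OCR.pdf");
            }
        }
    }
}
```

Keeping the input and font streams in using statements that end only after the document is saved releases the file handles without closing a stream the document may still need.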

A complete working sample can be downloaded from OCRSample.zip.

By executing the program, you will get a PDF document as follows.
(Screenshot: Output)

Take a moment to peruse the documentation, where you will find other options, with code examples, such as performing OCR on an image, on a region of a document, on a rotated page, and on large PDF documents.

Refer here to explore the rich set of Syncfusion Essential PDF features.

Note: Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or NuGet feed, include a license key in your projects. Refer to this link to learn about generating and registering the Syncfusion license key in your application to use the components without a trial message.
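
As a rough illustration of the registration step, the following sketch registers the key at application startup. It assumes a .NET 6 or later project that uses the minimal hosting model (Program.cs with top-level statements) and references the Syncfusion.Licensing package. "YOUR LICENSE KEY" is a placeholder. Projects that use Startup.cs would make the same call early in startup instead.

C#

```csharp
//Program.cs (assumes the .NET 6+ minimal hosting model)
using Syncfusion.Licensing;

var builder = WebApplication.CreateBuilder(args);

//Register the license key before any Syncfusion component is used.
//"YOUR LICENSE KEY" is a placeholder for the key generated from your account.
SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

builder.Services.AddControllersWithViews();

var app = builder.Build();

app.UseStaticFiles();
app.UseRouting();
app.MapControllerRoute(
    name: "default",
    pattern: "{controller=Home}/{action=Index}/{id?}");

app.Run();
```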
