How to perform OCR for a PDF document in Azure App Service?

The Syncfusion Essential® PDF is a .NET Core PDF library that supports OCR by using the Tesseract open-source engine. Using this library, you can perform OCR for a PDF document in Azure using .NET Core.

Steps to perform OCR for a PDF document in Azure programmatically:

  1. Create a new ASP.NET Core MVC application in Visual Studio.
  2. Install the Syncfusion.PDF.OCR.Net.Core NuGet package as a reference to your .NET Core application from NuGet.org.
  3. Copy the tessdata and Tesseractbinaries folders from the installed OCR NuGet package and paste them into the folder that contains the project file.
  4. Then, set the Copy to Output Directory property so that all files in the tessdata and Tesseractbinaries folders (including subfolders and their files) are copied to the output directory.
  5. Add a Perform OCR button in index.cshtml.
    @{Html.BeginForm("PerformOCR", "Home", FormMethod.Get);
        {
            <br />
            <div>
                <input type="submit" value="Perform OCR" style="width:150px;height:27px" />
            </div>
        }
        Html.EndForm();
    }
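
Step 4 can also be expressed in the project file instead of through the Properties window. The fragment below is a sketch, assuming an SDK-style .csproj, the folder names from step 3, and that the default globs pick the files up as None items. PreserveNewest is one reasonable value to use here, not one this article prescribes.

```xml
<!-- Assumption: SDK-style project; folder names as in step 3 -->
<ItemGroup>
  <None Update="tessdata\**\*">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
  <None Update="Tesseractbinaries\**\*">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
```

Declaring the folders with a wildcard means files added to them later are covered without setting the property on each file.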
    

 

  6. Include the following namespaces and code samples for performing OCR for a PDF document in Azure.

C#

using System.IO;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Mvc;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

 

//To get content root path of the project
//(IHostingEnvironment is marked obsolete from ASP.NET Core 3.0 onward, where IWebHostEnvironment replaces it)
private readonly IHostingEnvironment _hostingEnvironment;
public HomeController(IHostingEnvironment hostingEnvironment)
{
    _hostingEnvironment = hostingEnvironment;
}
 
public IActionResult PerformOCR()
{
    //Initialize the OCR processor with tesseract binaries folder path
    string binaries = Path.Combine(_hostingEnvironment.ContentRootPath, "Tesseractbinaries", "Windows");
    using (OCRProcessor processor = new OCRProcessor(binaries))
    //Open the input PDF read-only; the using block releases the file handle afterwards
    using (FileStream inputStream = new FileStream(Path.Combine(_hostingEnvironment.ContentRootPath, "Data", "Input.pdf"), FileMode.Open, FileAccess.Read))
    {
        //Set custom temp file path location
        processor.Settings.TempFolder = Path.Combine(_hostingEnvironment.ContentRootPath, "Data");
        //Load a PDF document
        PdfLoadedDocument lDoc = new PdfLoadedDocument(inputStream);
        //Set OCR language to process
        processor.Settings.Language = Languages.English;
        //Perform OCR with input document and tessdata (Language packs)
        string tessdataPath = Path.Combine(_hostingEnvironment.ContentRootPath, "tessdata");
        processor.PerformOCR(lDoc, tessdataPath);
        //Save the document to a memory stream, then close it
        MemoryStream stream = new MemoryStream();
        lDoc.Save(stream);
        lDoc.Close(true);
        return File(stream.ToArray(), System.Net.Mime.MediaTypeNames.Application.Pdf, "OCR_Azure.pdf");
    }
}
  7. Now, check that OCR works on the local machine.
  8. Right-click the project and select Publish.
  9. Create a new profile in the Publish window.
  10. Create an App Service using your Azure subscription and select a hosting plan.
  11. Configure the hosting plan.
  12. After creating the profile, click the Publish button.
  13. The published website opens in the browser, where you can perform OCR for a PDF document.
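
After publishing, it can be worth confirming that the folders the controller reads from were actually deployed with the site. The helper below is a minimal sketch of such a check. FindMissingOcrFolders is a hypothetical name, not part of the Syncfusion API, and the folder names are the ones assumed by the steps above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the Syncfusion API): returns the folders
// that the OCR steps expect under the content root but that are not on disk.
static List<string> FindMissingOcrFolders(string contentRoot)
{
    // Folder names taken from the steps in this article.
    string[] required =
    {
        Path.Combine(contentRoot, "Tesseractbinaries", "Windows"),
        Path.Combine(contentRoot, "tessdata"),
        Path.Combine(contentRoot, "Data")
    };

    var missing = new List<string>();
    foreach (string folder in required)
    {
        if (!Directory.Exists(folder))
        {
            missing.Add(folder);
        }
    }
    return missing;
}
```

Calling it with _hostingEnvironment.ContentRootPath at the start of PerformOCR and returning an error response when the list is not empty would report a missing folder by name before the OCR processor is created.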

A complete working sample can be downloaded from PerformOCR.zip.

Take a moment to peruse the documentation. It covers other options, with code examples, such as performing OCR on an image, on a region of a document, and on large PDF documents.

Refer here to explore a rich set of Syncfusion Essential® PDF features.

Note:

Starting with v16.2.0.x, if you reference Syncfusion® assemblies from the trial setup or the NuGet feed, include a license key in your projects. Refer to the link to learn about generating and registering the Syncfusion® license key in your application to use the components without a trial message.

 

Conclusion


I hope you enjoyed learning about how to perform OCR for a PDF document in Azure App Service.

You can refer to our ASP.NET Core PDF feature tour page to learn about its other features, and to the documentation to learn how to get started and configure it quickly. You can also explore our ASP.NET PDF example to understand how to create and manipulate data.

Current customers can check out our Document Processing Libraries from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our controls.

If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
