How to Perform OCR for a PDF Document in Azure Functions

The Syncfusion .NET Core PDF library supports OCR through the open-source Tesseract engine. This article shows how to use the library to perform OCR on a PDF document in Azure Functions with .NET Core.

Steps to perform OCR on the entire PDF document in Azure Functions

Step 1: Create an Azure Functions project.

AzureFunctions1.png

Step 2: Select the target framework for Azure Functions and choose HTTP trigger as the trigger type, as follows.

Project_Name.png

Additional_Info.png

Step 3: Install the Syncfusion.PDF.OCR.NET NuGet package as a reference to your .NET Core application from NuGet.org.

NuGet_Package.png

Step 4: Copy the tessdata folder from the bin -> Debug -> net6.0 -> runtimes folder and paste it into the folder that contains the project file.

Tessdata-path.png

Tessdata_Store.png

Step 5: Then, select the files in the tessdata folder and set Copy to Output Directory to Copy always.

Set_Copy_Always.png
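
The same setting can also be written directly in the project file. The fragment below is a minimal sketch, assuming the tessdata folder sits next to the .csproj file as described in Step 4. The CopyToPublishDirectory line is an added assumption, included so that the folder also ends up in the published output.

```xml
<!-- Sketch: copy every file under tessdata to the build and publish output. -->
<ItemGroup>
  <None Update="tessdata\**\*">
    <CopyToOutputDirectory>Always</CopyToOutputDirectory>
    <!-- Assumption: also include the folder when publishing to Azure. -->
    <CopyToPublishDirectory>Always</CopyToPublishDirectory>
  </None>
</ItemGroup>
```

After a build, the tessdata folder should appear in the output directory alongside the function assembly.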

Step 6: Include the following namespaces in the Function1.cs file to perform OCR for a PDF document using C#.


using System;
using System.IO;
using System.Threading.Tasks;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf;
using System.Net.Http;
using Syncfusion.Pdf.Parsing;
using System.Net.Http.Headers;
using System.Net;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

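Step 7: Add the following code sample in the Function1 class to perform OCR for a PDF document using the PerformOCR method of the OCRProcessor class in Azure Functions. The sample reads Input.pdf from a Data folder under the function app directory, so add that file to the project and copy it to the output directory in the same way as the tessdata folder.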


[FunctionName("Function1")]
public static async Task<HttpResponseMessage> Run([HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequestMessage req, TraceWriter log, ExecutionContext executionContext)
{
    MemoryStream ms = new MemoryStream();
    try
    {
        //Initialize the OCR processor.
        OCRProcessor processor = new OCRProcessor();
        //Open the input PDF read-only so that it also works when the deployed files are not writable.
        using (FileStream stream = new FileStream(Path.Combine(executionContext.FunctionAppDirectory, "Data", "Input.pdf"), FileMode.Open, FileAccess.Read))
        {
            //Load a PDF document.
            PdfLoadedDocument lDoc = new PdfLoadedDocument(stream);
            //Set OCR language to process.
            processor.Settings.Language = Languages.English;
            //Perform OCR with the input document and the tessdata folder path.
            string ocr = processor.PerformOCR(lDoc, Path.Combine(executionContext.FunctionAppDirectory, "tessdata"));
            //Save the PDF document to the memory stream.
            lDoc.Save(ms);
            //Close the document and release its resources.
            lDoc.Close(true);
        }
        ms.Position = 0;
    }
    catch (Exception ex)
    {
        //Create a new PDF document that reports the error and add a page to it.
        PdfDocument document = new PdfDocument();
        PdfPage page = document.Pages.Add();
        //Create PDF graphics for the page.
        PdfGraphics graphics = page.Graphics;
        //Set the standard font.
        PdfFont font = new PdfStandardFont(PdfFontFamily.Helvetica, 6);
        //Draw the exception details as text.
        graphics.DrawString(ex.ToString(), font, PdfBrushes.Black, new Syncfusion.Drawing.PointF(0, 0));
        ms = new MemoryStream();
        //Save the PDF document to the memory stream.
        document.Save(ms);
        //Close the document and release its resources.
        document.Close(true);
    }
    //Return the PDF as a downloadable attachment.
    HttpResponseMessage response = new HttpResponseMessage(HttpStatusCode.OK);
    response.Content = new ByteArrayContent(ms.ToArray());
    response.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
    {
        FileName = "Output.pdf"
    };
    response.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/pdf");
    return response;
}
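
Because the sample passes the tessdata path straight to PerformOCR, a folder that was not copied to the output only shows up as an exception inside the returned PDF. The sketch below shows one way to check the folder up front. TessdataGuard and its Resolve method are hypothetical names used for illustration and are not part of the Syncfusion API. The check relies only on the fact that Tesseract language files use the .traineddata extension.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the Syncfusion library.
static class TessdataGuard
{
    // Returns the tessdata path under the function app directory,
    // or throws with a descriptive message if the folder is unusable.
    public static string Resolve(string functionAppDirectory)
    {
        string tessdataPath = Path.Combine(functionAppDirectory, "tessdata");

        if (!Directory.Exists(tessdataPath))
        {
            throw new DirectoryNotFoundException(
                $"tessdata folder not found at '{tessdataPath}'.");
        }

        // Tesseract language data files are named like eng.traineddata.
        if (Directory.GetFiles(tessdataPath, "*.traineddata").Length == 0)
        {
            throw new FileNotFoundException(
                $"No .traineddata files found in '{tessdataPath}'.");
        }

        return tessdataPath;
    }
}
```

In the function, the result of TessdataGuard.Resolve(executionContext.FunctionAppDirectory) could be passed as the second argument of PerformOCR, so that a missing folder produces a clear message in the error PDF.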

Step 8: Now, run the function on the local machine and verify that it returns the OCR-processed PDF document.
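
One way to run this local check from code is a small console client that calls the function and saves the response. The sketch below assumes the function host is running locally on its default port 7071 and that the route is api/Function1, derived from the function name in Step 7. The LocalOcrCheck class name and the five-minute timeout are illustrative choices.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class LocalOcrCheck
{
    static async Task Main()
    {
        // Assumed local endpoint: default host port and the "Function1" name from Step 7.
        const string url = "http://localhost:7071/api/Function1";

        using HttpClient client = new HttpClient();
        // OCR can be slow on large documents, so allow more than the default 100 seconds.
        client.Timeout = TimeSpan.FromMinutes(5);

        using HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();

        // Save the returned PDF next to the executable for inspection.
        byte[] pdf = await response.Content.ReadAsByteArrayAsync();
        await File.WriteAllBytesAsync("Output.pdf", pdf);
        Console.WriteLine($"Saved Output.pdf ({pdf.Length} bytes)");
    }
}
```

Note that the function in Step 7 returns status 200 even when OCR fails, with the exception text drawn into the PDF, so open Output.pdf to confirm that it contains the processed document and not an error message.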

Steps to publish as Azure Functions

Step 1: Right-click the project and click Publish. Then, create a new profile in the Publish window and create the Azure Function App with a Consumption plan.

AzureFunctions.png

Azure.png

AzureFunctionApp.png

Step 2: After creating the profile, click Publish.

publish_app_service.png

Step 3: Now, the publish has succeeded.

published_app_service.png

Step 4: Now, go to the Azure portal and select your Function App. After running the service, click Get function URL > Copy. The copied URL includes the function key as a query string (the code parameter), which is required because the function uses AuthorizationLevel.Function. Then, paste the URL into a new browser tab. You will get a PDF document as follows.

Output.png

A complete working sample can be downloaded from GitHub.

Click here to explore the rich Syncfusion PDF library features.
