How to Perform OCR for a Region of the PDF Document using C# and VB.NET

The Syncfusion® .NET Optical Character Recognition (OCR) Library is used to extract text from scanned PDFs and images. With a few lines of C# code, you can convert a scanned PDF document containing a raster image into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or searchable PDF documents. The .NET OCR Library uses the Tesseract OCR engine.
Using this library, you can perform OCR on a particular region or several regions of a PDF document in C# and VB.NET.

Steps to perform OCR on a particular region of a PDF programmatically:

  1. Create a new C# Windows Forms application project.
  2. Install the Syncfusion.Pdf.OCR.WinForms NuGet package as a reference to your WinForms application from NuGet.org.

Download the Tesseract language data for the languages you want to recognize from the following link.
https://github.com/tesseract-ocr/tessdata
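
The code in this article sets the OCR language to "deu" (German), and the tessdata repository stores each language as a file named <language>.traineddata. As a minimal sketch, you could check that the file you downloaded is present before running OCR. The class name, method name, and folder location below are hypothetical and are not part of the Syncfusion® API; where the OCR library expects the files depends on your setup.

```csharp
using System.IO;

public static class TessDataCheck
{
    // Returns true when "<language>.traineddata" exists in the given folder.
    // The folder is supplied by the caller; this sketch assumes nothing about
    // where the OCR library itself looks for language data.
    public static bool HasLanguagePack(string tessDataFolder, string language)
    {
        string path = Path.Combine(tessDataFolder, language + ".traineddata");
        return File.Exists(path);
    }
}
```

For example, TessDataCheck.HasLanguagePack(@"C:\tessdata", "deu") returns true only if C:\tessdata\deu.traineddata exists (the path here is illustrative).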

  3. Add a new button and a label in the Form1.Designer.cs file.
private System.Windows.Forms.Button button1;
private System.Windows.Forms.Label label1;
private void InitializeComponent()
{
    this.button1 = new System.Windows.Forms.Button();
    this.label1 = new System.Windows.Forms.Label();
    this.SuspendLayout();
    // 
    // button1
    // 
    this.button1.Location = new System.Drawing.Point(298, 226);
    this.button1.Name = "button1";
    this.button1.Size = new System.Drawing.Size(159, 46);
    this.button1.TabIndex = 0;
    this.button1.Text = "Perform OCR";
    this.button1.UseVisualStyleBackColor = true;
    this.button1.Click += new System.EventHandler(this.button1_Click);
    // 
    // label1
    // 
    this.label1.AutoSize = true;
    this.label1.Location = new System.Drawing.Point(136, 193);
    this.label1.Name = "label1";
    this.label1.Size = new System.Drawing.Size(503, 20);
    this.label1.TabIndex = 1;
    this.label1.Text = "Click button to perform OCR on particular region of the PDF document";
    // 
    // Form1
    // 
    this.AutoScaleDimensions = new System.Drawing.SizeF(9F, 20F);
    this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
    this.ClientSize = new System.Drawing.Size(800, 450);
    this.Controls.Add(this.label1);
    this.Controls.Add(this.button1);
    this.Name = "Form1";
    this.Text = "Form1";
    this.ResumeLayout(false);
    this.PerformLayout();
}
  4. Include the following namespaces in the Form1.cs file.

C#

using Syncfusion.Pdf.Parsing;
using Syncfusion.OCRProcessor;
using System.Collections.Generic;
using System.Drawing;

VB.NET

Imports Syncfusion.Pdf.Parsing
Imports Syncfusion.OCRProcessor
Imports System.Collections.Generic
Imports System.Drawing
  5. Create the button1_Click event handler (the handler wired to the button in Form1.Designer.cs) and use the following code snippet to perform OCR on a region of the PDF document.

C#

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    //Load a PDF document.
    PdfLoadedDocument lDoc = new PdfLoadedDocument("Input.pdf");
    //Set the OCR language ("deu" is the Tesseract code for German).
    processor.Settings.Language = "deu";
    //Define the rectangular region to OCR (x, y, width, height).
    RectangleF rect = new RectangleF(0, 100, 950, 150);
    //Assign rectangles to the page.
    List<PageRegion> pageRegions = new List<PageRegion>();
    PageRegion region = new PageRegion();
    region.PageIndex = 0;
    region.PageRegions = new RectangleF[] { rect };
    pageRegions.Add(region);
    processor.Settings.Regions = pageRegions;
    //Process OCR by providing the PDF document.
    processor.PerformOCR(lDoc);
    //Save the OCR processed PDF document on the disk.
    lDoc.Save("OCRingRegionOfPDF.pdf");
    lDoc.Close(true);
}

VB.NET

'Initialize the OCR processor.
Using processor As New OCRProcessor()
    'Load a PDF document.
    Dim lDoc As New PdfLoadedDocument("Input.pdf")
    'Set the OCR language ("deu" is the Tesseract code for German).
    processor.Settings.Language = "deu"
    'Define the rectangular region to OCR (x, y, width, height).
    Dim rect As New RectangleF(0, 100, 950, 150)
    'Assign rectangles to the page.
    Dim pageRegions As New List(Of PageRegion)()
    Dim region As New PageRegion()
    region.PageIndex = 0
    region.PageRegions = New RectangleF() {rect}
    pageRegions.Add(region)
    processor.Settings.Regions = pageRegions
    'Process OCR by providing the PDF document.
    processor.PerformOCR(lDoc)
    'Save the OCR processed PDF document on the disk.
    lDoc.Save("OCRingRegionOfPDF.pdf")
    lDoc.Close(True)
End Using
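
The same PageRegion settings can describe several regions at once. The following is a minimal sketch, using only the types and properties shown in the snippets above, that processes two regions on the first page and one region on the second page. The class name, method names, rectangle values, and page indexes are illustrative assumptions; the input document is assumed to have at least two pages, and the project must reference the NuGet package installed in step 2.

```csharp
using System.Collections.Generic;
using System.Drawing;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

public static class MultiRegionOcr
{
    // Builds the region list: two rectangles on page 0 and one on page 1.
    // All coordinates are illustrative values, not taken from a real document.
    public static List<PageRegion> BuildRegions()
    {
        PageRegion first = new PageRegion();
        first.PageIndex = 0;
        first.PageRegions = new RectangleF[]
        {
            new RectangleF(0, 100, 950, 150),
            new RectangleF(0, 400, 950, 150)
        };

        PageRegion second = new PageRegion();
        second.PageIndex = 1;
        second.PageRegions = new RectangleF[] { new RectangleF(0, 0, 950, 200) };

        return new List<PageRegion> { first, second };
    }

    // Runs OCR on the configured regions and saves the result.
    public static void Run(string inputPath, string outputPath)
    {
        using (OCRProcessor processor = new OCRProcessor())
        {
            PdfLoadedDocument lDoc = new PdfLoadedDocument(inputPath);
            processor.Settings.Language = "deu";
            processor.Settings.Regions = BuildRegions();
            processor.PerformOCR(lDoc);
            lDoc.Save(outputPath);
            lDoc.Close(true);
        }
    }
}
```

Calling MultiRegionOcr.Run("Input.pdf", "OCRingRegionOfPDF.pdf") from the button's click handler would then process all three regions in one pass.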

A complete working sample can be downloaded from OCRingRegionOfPDF.zip.

Executing the program produces the OCR processed PDF document (OCRingRegionOfPDF.pdf), in which the text of the specified region is searchable and selectable.

Take a moment to peruse the documentation, where you will find other options, with code examples, such as performing OCR on an image, a region of the document, a rotated page, and large PDF documents.

Refer here to explore the rich set of Syncfusion Essential® PDF features.

Note: Starting with v16.2.0.x, if you reference Syncfusion® assemblies from the trial setup or from the NuGet feed, include a license key in your projects. Refer to this link to learn about generating and registering the Syncfusion® license key in your application to use the components without a trial message.

Conclusion
I hope you enjoyed learning how to perform OCR on a region of a PDF document using C# and VB.NET.

You can refer to our WinForms PDF feature tour page to learn about its other features, and to the documentation to quickly get started with configuration.

If you are a current customer, you can check out our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our other controls.

If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
