How to Perform OCR on Ink Annotation in WinForms PDF using C# and VB.NET?

The Syncfusion® .NET Optical Character Recognition (OCR) Library is used to extract text from scanned PDFs and images. With a few lines of C# code, a scanned PDF document containing a raster image is converted into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or searchable PDF documents. The .NET OCR Library uses the Tesseract OCR engine.
This article shows how to use the library to extract text from an ink annotation in a PDF document using C# and VB.NET: the annotation is flattened into the page, the page is exported as an image, and OCR is performed on that image.

Steps to perform OCR on ink annotation in PDF programmatically:

  1. Create a new C# Windows Forms application project.
  2. Install the Syncfusion.Pdf.OCR.WinForms NuGet package as a reference to your WinForms application from NuGet.org.

Download the Tesseract language data (tessdata) files for the languages you need from the following link. The sample in this article uses German ("deu").
https://github.com/tesseract-ocr/tessdata
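
Before running OCR, it can help to confirm that the language file is actually in the folder you point the engine at. Tesseract language files are named after the language code, so German is `deu.traineddata`. The following is a minimal sketch using only `System.IO`; the class and method names (`TessDataCheck`, `TrainedDataPath`, `HasLanguage`) are hypothetical helpers and not part of the Syncfusion® API.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Syncfusion API): checks that a
// Tesseract language file exists in a given tessdata folder.
public static class TessDataCheck
{
    // Builds the expected file path for a language code,
    // e.g. "deu" -> <tessDataFolder>/deu.traineddata.
    public static string TrainedDataPath(string tessDataFolder, string language)
    {
        return Path.Combine(tessDataFolder, language + ".traineddata");
    }

    // Returns true when the language file is present in the folder.
    public static bool HasLanguage(string tessDataFolder, string language)
    {
        return File.Exists(TrainedDataPath(tessDataFolder, language));
    }
}
```

A call such as `TessDataCheck.HasLanguage(tessDataFolder, "deu")`, where `tessDataFolder` is the folder you copied the downloaded files into, lets the application report a missing language file before the OCR call is made.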

  3. Add a new button in the Form1.Designer.cs file.
private System.Windows.Forms.Button button1;
private System.Windows.Forms.Label label1;
private void InitializeComponent()
{
    this.button1 = new System.Windows.Forms.Button();
    this.label1 = new System.Windows.Forms.Label();
    this.SuspendLayout();
    // 
    // button1
    // 
    this.button1.Location = new System.Drawing.Point(262, 205);
    this.button1.Name = "button1";
    this.button1.Size = new System.Drawing.Size(200, 39);
    this.button1.TabIndex = 0;
    this.button1.Text = "OCR Ink Annotation";
    this.button1.UseVisualStyleBackColor = true;
    this.button1.Click += new System.EventHandler(this.button1_Click);
    // 
    // label1
    // 
    this.label1.AutoSize = true;
    this.label1.Location = new System.Drawing.Point(160, 173);
    this.label1.Name = "label1";
    this.label1.Size = new System.Drawing.Size(438, 20);
    this.label1.TabIndex = 1;
    this.label1.Text = "Click button to view sample for OCRing ink annotation in PDF";
    // 
    // Form1
    // 
    this.AutoScaleDimensions = new System.Drawing.SizeF(9F, 20F);
    this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
    this.ClientSize = new System.Drawing.Size(800, 450);
    this.Controls.Add(this.label1);
    this.Controls.Add(this.button1);
    this.Name = "Form1";
    this.Text = "Form1";
    this.ResumeLayout(false);
    this.PerformLayout();
}
  4. Include the following namespaces in the Form1.cs file.

C#

using Syncfusion.OCRProcessor;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

VB.NET

Imports Syncfusion.OCRProcessor
Imports System.IO
Imports Syncfusion.Pdf 
Imports Syncfusion.Pdf.Parsing
  5. Create the button1_Click event handler (the handler wired up in Form1.Designer.cs) and use the following code sample to process OCR for the ink annotation in the PDF.

C#

//Load the input ink annotation PDF document.
PdfLoadedDocument loadedDocument = new PdfLoadedDocument("sample.pdf");
//Get the first page from the loaded document.
PdfLoadedPage lpage = loadedDocument.Pages[0] as PdfLoadedPage;
//Flatten the annotations on the first page.
loadedDocument.Pages[0].Annotations.Flatten = true;
MemoryStream ms = new MemoryStream();
//Save the flattened document.
loadedDocument.Save(ms);
//Load the flattened PDF.
PdfLoadedDocument flattenedDocument = new PdfLoadedDocument(ms);
//Export the first page as an image.
Bitmap image = flattenedDocument.ExportAsImage(0);

using (OCRProcessor processor = new OCRProcessor())
{
    //Set the OCR language to process.
    processor.Settings.Language = "deu";
    //Perform OCR on the bitmap image, using the language data in the TessDataPath folder.
    string ocrText = processor.PerformOCR(image, processor.TessDataPath);
    //Write the extracted text to a file.
    File.WriteAllText("OCRingInkAnnotation.txt", ocrText);
}
flattenedDocument.Close(true);
loadedDocument.Close(true);

VB.NET

'Load the input ink annotation PDF document.
Dim loadedDocument As PdfLoadedDocument = New PdfLoadedDocument("sample.pdf")
'Get the first page from the loaded document.
Dim lpage As PdfLoadedPage = CType(loadedDocument.Pages(0), PdfLoadedPage)
'Flatten the annotations of the first page.
loadedDocument.Pages(0).Annotations.Flatten = True
Dim ms As MemoryStream = New MemoryStream()
'Save the flattened document.
loadedDocument.Save(ms)
'Load the flattened PDF.
Dim flattenedDocument As PdfLoadedDocument = New PdfLoadedDocument(ms)
'Export the first page as an image.
Dim image As Bitmap = flattenedDocument.ExportAsImage(0)

Using processor As OCRProcessor = New OCRProcessor()
    'Set the OCR language to process.
    processor.Settings.Language = "deu"
    'Perform OCR on the bitmap image, using the language data in the TessDataPath folder.
    Dim text As String = processor.PerformOCR(image, processor.TessDataPath)
    'Write the extracted text to a file.
    File.WriteAllText("OCRingInkAnnotation.txt", text)
End Using
flattenedDocument.Close(True)
loadedDocument.Close(True)
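
For orientation, the C# steps above can be placed inside the `button1_Click` handler that Form1.Designer.cs wires up. The sketch below uses only the calls already shown in this article; the `try`/`finally` cleanup and the `MessageBox` messages are assumptions added for illustration, and the unused page variable is omitted. It assumes the `Form1` constructor generated by the project template, which calls `InitializeComponent()`.

```csharp
using System;
using System.Drawing;
using System.IO;
using System.Windows.Forms;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

public partial class Form1 : Form
{
    private void button1_Click(object sender, EventArgs e)
    {
        MemoryStream ms = new MemoryStream();
        PdfLoadedDocument loadedDocument = new PdfLoadedDocument("sample.pdf");
        PdfLoadedDocument flattenedDocument = null;
        try
        {
            // Flatten the ink annotation so it becomes part of the page content.
            loadedDocument.Pages[0].Annotations.Flatten = true;
            loadedDocument.Save(ms);

            // Reload the flattened PDF and export the first page as an image.
            flattenedDocument = new PdfLoadedDocument(ms);
            using (Bitmap image = flattenedDocument.ExportAsImage(0))
            using (OCRProcessor processor = new OCRProcessor())
            {
                processor.Settings.Language = "deu";
                string ocrText = processor.PerformOCR(image, processor.TessDataPath);
                File.WriteAllText("OCRingInkAnnotation.txt", ocrText);
            }
            // Illustrative feedback only; not part of the original sample.
            MessageBox.Show("Text written to OCRingInkAnnotation.txt");
        }
        catch (Exception ex)
        {
            // Illustrative error handling; adapt to your application.
            MessageBox.Show("OCR failed: " + ex.Message);
        }
        finally
        {
            // Close the documents and release the stream even if OCR fails.
            flattenedDocument?.Close(true);
            loadedDocument.Close(true);
            ms.Dispose();
        }
    }
}
```

Closing the documents in `finally` keeps the input file from staying locked when the OCR call throws, for example when the language data is missing.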

A complete working sample can be downloaded from the OCRingInkAnnotation.zip.

By executing the program, you will get a text file named OCRingInkAnnotation.txt that contains the extracted text.

Take a moment to peruse the documentation, where you will find other options like performing OCR on an image, region of the document, rotated page, and large PDF documents with code examples.

Refer here to explore the rich set of Syncfusion Essential® PDF features.

Note: Starting with v16.2.0.x, if you reference Syncfusion® assemblies from the trial setup or from the NuGet feed, include a license key in your projects. Refer to the Syncfusion® licensing documentation to learn about generating and registering the license key in your application to use the components without a trial message.
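
As a rough illustration of the registration step, the key is typically registered once at application startup, before any Syncfusion® component is created. The sketch below assumes the Syncfusion.Licensing assembly is referenced and that Program.cs follows the default WinForms template; "YOUR LICENSE KEY" is a placeholder, not a real key.

```csharp
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread]
    static void Main()
    {
        // Register the license key before any Syncfusion component is used.
        // "YOUR LICENSE KEY" is a placeholder; replace it with your own key.
        Syncfusion.Licensing.SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        Application.EnableVisualStyles();
        Application.SetCompatibleTextRenderingDefault(false);
        Application.Run(new Form1());
    }
}
```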

Conclusion
I hope you enjoyed learning how to perform OCR on an ink annotation in a WinForms PDF.
You can refer to our WinForms PDF feature tour page to learn about its other features. You can also explore our documentation to understand how to create and manipulate data.
Current customers can check out our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our other controls.
If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
