How to Extract Text From the WPF PDF Viewer Hosted in WinForms?

To extract text from a specific region of a PDF page, draw a rectangle annotation over that region and run OCR on the area it covers. Since the WinForms PDF Viewer does not support rectangle annotations directly, host the WPF PDF Viewer inside the WinForms app using ElementHost.
To do this, enable the rectangle annotation mode and capture the bounds of the drawn rectangle in the ShapeAnnotationChanged event. Then export the corresponding PDF page as a bitmap, crop the image to the annotation's bounds, and extract the text using an OCR engine.
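
The hosting itself is not shown in the steps below, so here is a minimal sketch of how it might look. It assumes that pdfViewer is a Syncfusion.Windows.PdfViewer.PdfViewerControl and that the project references the WindowsFormsIntegration assembly; the form name MainForm and the PDF path are hypothetical placeholders.

```csharp
using System.Windows.Forms;
using System.Windows.Forms.Integration; // ElementHost
using Syncfusion.Windows.PdfViewer;     // WPF PdfViewerControl

public class MainForm : Form
{
    // Assumption: the WPF viewer is kept in a field so later steps can reach it
    private readonly PdfViewerControl pdfViewer = new PdfViewerControl();

    public MainForm()
    {
        // ElementHost is the WinForms control that displays a WPF element
        var host = new ElementHost
        {
            Dock = DockStyle.Fill,
            Child = pdfViewer
        };
        Controls.Add(host);

        // Hypothetical path; replace with the document to be processed
        pdfViewer.Load("../../Data/Sample.pdf");
    }
}
```

Docking the host with DockStyle.Fill simply lets the viewer occupy the whole form; any other WinForms layout would work equally well.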

Steps to extract text from a predefined rectangle in the PDF

Step 1:

In a button click event handler, set the annotation mode to rectangle:

pdfViewer.AnnotationMode = Syncfusion.Windows.PdfViewer.PdfDocumentView.PdfViewerAnnotationMode.Rectangle; 

Step 2:

Subscribe to the ShapeAnnotationChanged event of the PDF Viewer. Once the rectangle annotation is drawn, the event is triggered. Within this event, check if the action indicates a new annotation was added. If so, capture the bounds of the rectangle.

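// Hook the ShapeAnnotationChanged event (for example, in the form's constructor)
pdfViewer.ShapeAnnotationChanged += PdfViewer_ShapeAnnotationChanged;

// Runs when a shape annotation in the viewer changes
private void PdfViewer_ShapeAnnotationChanged(object sender, Syncfusion.Windows.PdfViewer.ShapeAnnotationChangedEventArgs e)
{
   // Only react when a new annotation has just been drawn
   if (e.Action == Syncfusion.Windows.PdfViewer.AnnotationChangedAction.Add)
   {
       // bounds is a field of the containing class; it is reused when cropping in Step 4
       bounds = e.NewBounds;
       PdfLoadedDocument loadedDocument = pdfViewer.LoadedDocument;
   }
}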

Step 3:

Set the OCR language to the desired language. Then export the required page as a BitmapSource using the ExportAsImage API, and convert the BitmapSource to a Bitmap. Here, processor is the OCR processor instance that is also used in Step 5.

// Language used for OCR processing
processor.Settings.Language = Languages.English;
// Export the current page; note that 1 is subtracted from CurrentPageIndex
Bitmap image = GetBitmap(pdfViewer.ExportAsImage(pdfViewer.CurrentPageIndex - 1));

The BitmapSource is converted to a Bitmap as follows:

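// Required namespaces:
// using System.Drawing;
// using System.Drawing.Imaging;
// using System.Windows;               // Int32Rect
// using System.Windows.Media.Imaging; // BitmapSource

Bitmap GetBitmap(BitmapSource source)
{
   // Create a GDI+ bitmap with the same pixel dimensions as the WPF image
   Bitmap bmp = new Bitmap(
     source.PixelWidth,
     source.PixelHeight,
     System.Drawing.Imaging.PixelFormat.Format32bppPArgb);
   // Lock the bitmap memory so the pixels can be written directly
   BitmapData data = bmp.LockBits(
     new System.Drawing.Rectangle(System.Drawing.Point.Empty, bmp.Size),
     ImageLockMode.WriteOnly,
     System.Drawing.Imaging.PixelFormat.Format32bppPArgb);
   // Copy the raw pixels; this assumes the source uses a 32-bit pixel format
   source.CopyPixels(
     Int32Rect.Empty,
     data.Scan0,
     data.Height * data.Stride,
     data.Stride);
   bmp.UnlockBits(data);
   return bmp;
}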

Step 4:

Crop the region to be passed to OCR by cloning the bitmap with the annotation bounds captured in Step 2.

Bitmap clonedImage = image.Clone(bounds, System.Drawing.Imaging.PixelFormat.Format32bppArgb); 
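
Bitmap.Clone throws an OutOfMemoryException when the rectangle extends outside the source bitmap and an ArgumentException when its width or height is 0, so it can help to restrict the rectangle to the image first. The following is a minimal sketch of such a guard; CropHelper and ClampToImage are hypothetical names and not part of any Syncfusion API. It also assumes that bounds is a System.Drawing.Rectangle or RectangleF already expressed in the pixel coordinates of the exported image.

```csharp
using System;
using System.Drawing;

static class CropHelper
{
    // Hypothetical helper: restricts a requested crop rectangle to the
    // area actually covered by an image of the given size.
    public static Rectangle ClampToImage(RectangleF requested, Size imageSize)
    {
        // Round outward so the whole requested area stays covered
        Rectangle rounded = Rectangle.FromLTRB(
            (int)Math.Floor(requested.Left),
            (int)Math.Floor(requested.Top),
            (int)Math.Ceiling(requested.Right),
            (int)Math.Ceiling(requested.Bottom));

        // Intersect returns an empty rectangle when nothing overlaps
        return Rectangle.Intersect(rounded, new Rectangle(Point.Empty, imageSize));
    }
}
```

With this helper, the crop could be written as image.Clone(CropHelper.ClampToImage(bounds, image.Size), System.Drawing.Imaging.PixelFormat.Format32bppArgb), after checking that the returned rectangle has a non-zero width and height.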

Step 5:

Finally, run OCR on the cropped image. Any OCR engine can be used; here, the cropped bitmap and the path to the Tessdata folder are passed to PerformOCR.

string ocrText = processor.PerformOCR(clonedImage, "../../Tessdata/"); 

A complete working sample to extract text from a predefined rectangle in a PDF can be downloaded from GitHub.

Conclusion

I hope you enjoyed learning how to extract text from a predefined rectangle in the WPF PDF Viewer hosted in a WinForms app.
You can refer to our WinForms PDF Viewer page to learn about its other features. You can also explore our WinForms PDF Viewer documentation to understand how to present and manipulate data.

If you are a current customer, you can check out our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our other controls.

If you have any queries or require clarification, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
