How to identify a corrupted PDF document using C# and VB.NET?
Syncfusion Essential PDF is a .NET PDF library used to create, read, and edit PDF documents. Using this library, you can identify a corrupted PDF document using C# and VB.NET.
A corrupted PDF document can be detected in the following ways:
- Syntax-related corruptions
These can be found by loading the PDF document. Loading throws an exception when the document has errors in its cross-reference table or in its object offsets.
- Image related corruptions
These types of issues can be found by extracting images from the PDF document.
- Content and font related corruptions
These types of issues can be found by extracting text from the PDF document.
- Structure-related corruptions
These types of issues can be found by loading the PDF document and saving it with the IncrementalUpdate property disabled, so that the entire file is rewritten instead of appended to.
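Loading is the most expensive of these checks. A cheap pre-filter can reject files that are obviously not PDF documents, such as empty or truncated downloads, before they reach the library. The following is a minimal sketch that relies only on the PDF file-format convention that a document starts with a "%PDF-" header and ends with an "%%EOF" marker. The PdfQuickCheck class and the 1024-byte search window are assumptions for illustration and are not part of Syncfusion Essential PDF.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of Syncfusion Essential PDF.
public static class PdfQuickCheck
{
    // Bytes inspected at each end of the file. Assumption: readers commonly
    // tolerate a little extra data before "%PDF-" and after "%%EOF".
    private const int Window = 1024;

    // Returns true when "%PDF-" appears near the start and "%%EOF" near the end.
    public static bool LooksLikePdf(byte[] data)
    {
        if (data == null)
            return false;

        int headLength = Math.Min(Window, data.Length);
        string head = Encoding.ASCII.GetString(data, 0, headLength);

        int tailStart = Math.Max(0, data.Length - Window);
        string tail = Encoding.ASCII.GetString(data, tailStart, data.Length - tailStart);

        // string.Contains performs an ordinal comparison.
        return head.Contains("%PDF-") && tail.Contains("%%EOF");
    }

    // Convenience overload that reads the whole file into memory.
    public static bool LooksLikePdf(string path)
    {
        return LooksLikePdf(File.ReadAllBytes(path));
    }
}
```

A file that passes this check is not necessarily intact. It only filters out files that cannot be PDF documents, so it complements the loading-based checks rather than replacing them.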
Steps to identify a corrupted PDF document programmatically:
- Create a new C# Windows Forms application project.
- Install the Syncfusion.Pdf.WinForms NuGet package from NuGet.org as a reference in your .NET Framework application.
- Include the following namespaces in the Form1.cs file.
C#
using System;
using System.Drawing;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;
VB.NET
Imports System.Drawing
Imports System.IO
Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing
- Use the following code snippet to identify the corrupted PDF document.
C#
private bool IsCorrupted(string file)
{
    bool isCorrupt = false;
    PdfLoadedDocument ldoc = null;
    //Creates a memory stream that receives the re-saved document
    using (MemoryStream stream = new MemoryStream())
    {
        try
        {
            //Determine syntax issues
            ldoc = new PdfLoadedDocument(file);
            foreach (PdfLoadedPage lPage in ldoc.Pages)
            {
                //Determine content and font related issues
                ExtractText(lPage);
            }
            foreach (PdfLoadedPage lPage in ldoc.Pages)
            {
                //Determine image related corruptions
                ExtractImage(lPage);
            }
            //Determine structure related corruptions by rewriting the whole file
            ldoc.FileStructure.IncrementalUpdate = false;
            ldoc.Save(stream);
        }
        catch (Exception)
        {
            //Any exception while loading, extracting, or saving marks the file as corrupted
            isCorrupt = true;
        }
        finally
        {
            //Close the PDF document even when a check failed, to release its resources
            if (ldoc != null)
            {
                try { ldoc.Close(true); }
                catch (Exception) { isCorrupt = true; }
            }
        }
    }
    return isCorrupt;
}
VB.NET
Private Function IsCorrupted(file As String) As Boolean
    Dim isCorrupt As Boolean = False
    Dim ldoc As PdfLoadedDocument = Nothing
    'Creates a memory stream that receives the re-saved document
    Using stream As New MemoryStream()
        Try
            'Determine syntax issues
            ldoc = New PdfLoadedDocument(file)
            For Each lPage As PdfLoadedPage In ldoc.Pages
                'Determine content and font related issues
                ExtractText(lPage)
            Next
            For Each lPage As PdfLoadedPage In ldoc.Pages
                'Determine image related corruptions
                ExtractImage(lPage)
            Next
            'Determine structure related corruptions by rewriting the whole file
            ldoc.FileStructure.IncrementalUpdate = False
            ldoc.Save(stream)
        Catch ex As Exception
            'Any exception while loading, extracting, or saving marks the file as corrupted
            isCorrupt = True
        Finally
            'Close the PDF document even when a check failed, to release its resources
            If ldoc IsNot Nothing Then
                Try
                    ldoc.Close(True)
                Catch ex As Exception
                    isCorrupt = True
                End Try
            End If
        End Try
    End Using
    Return isCorrupt
End Function
- Add the following ExtractText() and ExtractImage() methods, which IsCorrupted() calls for each page to detect content, font, and image related corruptions.
C#
private void ExtractText(PdfLoadedPage lPage)
{
    //Extract text; the result is discarded because only exceptions matter here
    lPage.ExtractText();
}

private void ExtractImage(PdfLoadedPage lPage)
{
    //Extract images and dispose each one immediately to release memory
    Image[] images = lPage.ExtractImages();
    if (images != null)
    {
        foreach (Image img in images)
            img.Dispose();
    }
}
VB.NET
Private Sub ExtractText(lPage As PdfLoadedPage)
    'Extract text; the result is discarded because only exceptions matter here
    lPage.ExtractText()
End Sub

Private Sub ExtractImage(lPage As PdfLoadedPage)
    'Extract images and dispose each one immediately to release memory
    Dim images As Image() = lPage.ExtractImages()
    If images IsNot Nothing Then
        For Each img As Image In images
            img.Dispose()
        Next
    End If
End Sub
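IsCorrupted() only reports whether some check failed. When you also need to know which kind of corruption was detected, one option is to run each check as a named stage and stop at the first exception. The following minimal sketch uses only the .NET base class library. The CheckStage and StagedCorruptionCheck names are hypothetical and are not part of Syncfusion Essential PDF.

```csharp
using System;

// Hypothetical types for illustration; not part of Syncfusion Essential PDF.
public sealed class CheckStage
{
    public CheckStage(string name, Action run)
    {
        Name = name;
        Run = run;
    }

    // Label reported when this stage throws, e.g. "Syntax" or "Image".
    public string Name { get; }

    // The check itself; it signals corruption by throwing an exception.
    public Action Run { get; }
}

public static class StagedCorruptionCheck
{
    // Runs the stages in order and returns the name of the first one that throws,
    // or null when all stages complete. The caught exception is returned in error.
    public static string FirstFailingStage(out Exception error, params CheckStage[] stages)
    {
        error = null;
        foreach (CheckStage stage in stages)
        {
            try
            {
                stage.Run();
            }
            catch (Exception ex)
            {
                error = ex;
                return stage.Name;
            }
        }
        return null;
    }
}
```

In the Windows Forms project, each stage would wrap calls already used in IsCorrupted():

- A "Syntax" stage that assigns ldoc = new PdfLoadedDocument(file).
- "Text" and "Image" stages that call ExtractText() and ExtractImage() for each page.
- A "Structure" stage that sets ldoc.FileStructure.IncrementalUpdate = false and saves to a MemoryStream.

Later stages depend on the document created by the first, so the sketch stops at the first failure. The document still needs to be closed afterwards.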
The code above cannot detect every corrupted PDF document. A file that passes all of these checks may still be damaged in ways the checks do not exercise.
A complete working sample can be downloaded from PDFSample.zip.
Take a moment to peruse the documentation, where you can find features such as text extraction, image extraction, and incremental updates for PDF documents.
Refer here to explore the rich set of Syncfusion Essential PDF features.
Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, include a license key in your projects. Refer to this link to learn about generating and registering a Syncfusion license key in your application to use the components without the trial message.