How to identify a corrupted PDF document using C# and VB.NET?

Syncfusion Essential PDF is a .NET PDF library used to create, read, and edit PDF documents. Using this library, you can identify corrupted PDF documents in C# and VB.NET.

The following checks are used to detect the different kinds of corruption in a PDF document:

  • Syntax issue documents

This type of corruption can be found by loading the PDF document. Loading throws an exception when the document has issues in its cross-reference table structure or object offsets.

  • Image related corruptions

These types of issues can be found by extracting images from the PDF document.

  • Content and font related corruptions

These types of issues can be found by extracting text from the PDF document.

  • Structure related corruptions

These types of issues can be found by loading the PDF document and saving it with the IncrementalUpdate property disabled.
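
Before running the checks above, a cheap pre-filter can reject files that are not even shaped like a PDF. The following sketch is not part of Essential PDF: it uses only the .NET base class library, and the names PdfQuickCheck and LooksLikePdf are hypothetical. It assumes that a well-formed file starts with the %PDF- header and has a startxref keyword followed by an %%EOF marker within its last 1024 bytes. Passing this test does not prove the document is intact, so treat the result only as a hint.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Syncfusion Essential PDF.
public static class PdfQuickCheck
{
    // Returns false when the data lacks the basic PDF markers.
    public static bool LooksLikePdf(byte[] data)
    {
        if (data == null || data.Length < 16)
            return false;

        // Latin-1 maps every byte to one character, so binary content decodes safely.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Assumption: the header sits at the very start of the file.
        string head = latin1.GetString(data, 0, 5);
        if (head != "%PDF-")
            return false;

        // Assumption: the trailer markers sit within the last 1024 bytes.
        int tailLength = Math.Min(1024, data.Length);
        string tail = latin1.GetString(data, data.Length - tailLength, tailLength);
        int xrefPointer = tail.LastIndexOf("startxref", StringComparison.Ordinal);
        int endMarker = tail.LastIndexOf("%%EOF", StringComparison.Ordinal);
        return xrefPointer >= 0 && endMarker > xrefPointer;
    }

    // Convenience overload that reads the whole file into memory first.
    public static bool LooksLikePdf(string file)
    {
        return LooksLikePdf(File.ReadAllBytes(file));
    }
}
```

A file that fails this pre-filter can be reported as suspect immediately. A file that passes it still needs the load, extraction, and save checks, because the markers say nothing about the objects between them.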

Steps to identify the corrupted PDF document programmatically:

  1. Create a new C# Windows Forms application project.
  2. Install the Syncfusion.Pdf.WinForms NuGet package as a reference to your .NET Framework application from NuGet.org.
  3. Include the following namespaces in the Form1.Designer.cs file.

C#

using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;
using System;
using System.Drawing;
using System.IO;

 

VB.NET

Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing
Imports System.Drawing
Imports System.IO

 

  4. Use the following code snippet to identify the corrupted PDF document.

C#

private bool IsCorrupted(string file)
{
    bool isCorrupt = false;
    PdfLoadedDocument ldoc = null;
    //Creates an instance of memory stream
    MemoryStream stream = new MemoryStream();
    try
    {
        //Determine syntax issues
        ldoc = new PdfLoadedDocument(file);
        foreach (PdfLoadedPage lPage in ldoc.Pages)
        {
            //Determine content and font related issues
            ExtractText(lPage);
        }
        foreach (PdfLoadedPage lPage in ldoc.Pages)
        {
            //Determine image related corruptions
            ExtractImage(lPage);
        }
        //Determine structural related corruptions
        ldoc.FileStructure.IncrementalUpdate = false;
        //Save the PDF document
        ldoc.Save(stream);
    }
    catch (Exception)
    {
        //Any exception, including a missing file, is reported as corruption
        isCorrupt = true;
    }
    finally
    {
        //Close the PDF document, even when one of the checks failed
        if (ldoc != null)
            ldoc.Close(true);
        //Dispose the memory stream
        stream.Dispose();
    }
    return isCorrupt;
}

 

VB.NET

Private Function IsCorrupted(file As String) As Boolean
    Dim isCorrupt As Boolean = False
    Dim ldoc As PdfLoadedDocument = Nothing
    'Creates an instance of memory stream
    Dim stream As New MemoryStream()
    Try
        'Determine syntax issues
        ldoc = New PdfLoadedDocument(file)
        For Each lPage As PdfLoadedPage In ldoc.Pages
            'Determine content and font related issues
            ExtractText(lPage)
        Next
        For Each lPage As PdfLoadedPage In ldoc.Pages
            'Determine image related corruptions
            ExtractImage(lPage)
        Next
        'Determine structural related corruptions
        ldoc.FileStructure.IncrementalUpdate = False
        'Save the PDF document
        ldoc.Save(stream)
    Catch e As Exception
        'Any exception, including a missing file, is reported as corruption
        isCorrupt = True
    Finally
        'Close the PDF document, even when one of the checks failed
        If ldoc IsNot Nothing Then
            ldoc.Close(True)
        End If
        'Dispose the memory stream
        stream.Dispose()
    End Try
    Return isCorrupt
End Function

 

  5. Add the following code to the ExtractText() and ExtractImage() methods to detect corruption in the PDF document.

C#

private void ExtractText(PdfLoadedPage lPage)
{
    //Extract text
    string text = lPage.ExtractText();
    text = null;
}
 
private void ExtractImage(PdfLoadedPage lPage)
{
    //Extract images
    Image[] image = lPage.ExtractImages();
    if (image != null)
    {
        for (int i = 0; i < image.Length; i++)
            image[i].Dispose();
    }
    image = null;
}

VB.NET

Private Sub ExtractText(lPage As PdfLoadedPage)
    'Extract text
    Dim text As String = lPage.ExtractText()
    text = Nothing
End Sub
 
Private Sub ExtractImage(lPage As PdfLoadedPage)
    'Extract images
    Dim image As Image() = lPage.ExtractImages()
    If image IsNot Nothing Then
        For i As Integer = 0 To image.Length - 1
            image(i).Dispose()
        Next
    End If
    image = Nothing
End Sub

 

Note that the preceding code snippet cannot detect every corrupted PDF document.
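
Because IsCorrupted() returns only a Boolean value, it does not tell you which of the four kinds of corruption was hit. One possible way to surface that, shown here as a minimal sketch, is to run each check as a named stage and report the first stage that throws. The StagedCheck class and its FirstFailure method are hypothetical names introduced for illustration, and the sketch deliberately contains no Essential PDF calls, so the stages are supplied by the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Syncfusion Essential PDF.
public static class StagedCheck
{
    // Runs the stages in order and returns the name of the first one that throws.
    // Returns null when every stage completes without an exception.
    public static string FirstFailure(IEnumerable<KeyValuePair<string, Action>> stages)
    {
        foreach (KeyValuePair<string, Action> stage in stages)
        {
            try
            {
                stage.Value();
            }
            catch (Exception)
            {
                // Later stages are skipped because they usually depend on this one.
                return stage.Key;
            }
        }
        return null;
    }
}
```

For example, the stages could be named "load", "text", "images", and "save", with each delegate wrapping the corresponding call from IsCorrupted(). A result of "images" would then point to an image related corruption, while a null result means that none of the checks failed, which, as noted above, still does not guarantee an intact document.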

A complete working sample can be downloaded from PDFSample.zip.

Take a moment to peruse the documentation, where you can find features like text extraction, image extraction, and performing incremental updates on PDF documents.

Refer here to explore the rich set of Syncfusion Essential PDF features.

Note:

Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, include a license key in your projects. Refer to this link to learn about generating and registering a Syncfusion license key in your application to use the components without a trial message.

 
