How to identify the corrupted PDF document using C# and VB.NET?

Syncfusion Essential® PDF is a .NET PDF library used to create, read, and edit PDF documents. Using this library, you can identify the corrupted PDF document using C# and VB.NET.

The following methods are used to find corrupted PDF documents:

Syntax related corruptions

These types of issues can be found by loading the PDF document. Loading throws an exception when the document has issues in its cross-reference table structure or object offsets.

Image related corruptions

These types of issues can be found by extracting images from the PDF document.

Content and font related corruptions

These types of issues can be found by extracting text from the PDF document.

Structure related corruptions

These types of issues can be found by loading the PDF document and saving it with the IncrementalUpdate property disabled.
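
The four checks above can also be run as separate stages so that the caller learns which one failed. The following is a minimal sketch, assuming the Syncfusion.Pdf.WinForms package is referenced. CorruptionKind, PdfDiagnostics, and Diagnose are hypothetical names introduced for illustration and are not part of the Syncfusion API. Note that a missing or unreadable file is also reported as a syntax issue in this sketch.

```csharp
using System;
using System.Drawing;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

// Hypothetical type: names the check that failed first.
public enum CorruptionKind { None, Syntax, ContentOrFont, Image, Structure }

// Hypothetical helper class, not part of the Syncfusion API.
public static class PdfDiagnostics
{
    public static CorruptionKind Diagnose(string file)
    {
        PdfLoadedDocument ldoc;
        try
        {
            // Stage 1: loading fails on cross-reference table or object offset issues.
            ldoc = new PdfLoadedDocument(file);
        }
        catch (Exception)
        {
            return CorruptionKind.Syntax;
        }

        // Tracks the stage that is currently running so the catch block can report it.
        CorruptionKind stage = CorruptionKind.ContentOrFont;
        try
        {
            // Stage 2: text extraction exercises content streams and fonts.
            foreach (PdfLoadedPage page in ldoc.Pages)
                page.ExtractText();

            // Stage 3: image extraction exercises embedded image data.
            stage = CorruptionKind.Image;
            foreach (PdfLoadedPage page in ldoc.Pages)
            {
                Image[] images = page.ExtractImages();
                if (images != null)
                    foreach (Image image in images)
                        image.Dispose();
            }

            // Stage 4: a full (non-incremental) save rewrites the document structure.
            stage = CorruptionKind.Structure;
            ldoc.FileStructure.IncrementalUpdate = false;
            using (var stream = new MemoryStream())
                ldoc.Save(stream);

            return CorruptionKind.None;
        }
        catch (Exception)
        {
            return stage;
        }
        finally
        {
            // Release the document whether or not a stage failed.
            ldoc.Close(true);
        }
    }
}
```

A caller could log the returned value next to the file name to decide whether a document needs repair or only a re-save.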

Steps to identify the corrupted PDF document programmatically:

Create a new C# Windows Forms application project.
Install the Syncfusion.Pdf.WinForms NuGet package from NuGet.org as a reference in your .NET Framework application.
Include the following namespaces in the Form1.Designer.cs file.

C#

using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;
using System;
using System.Drawing;
using System.IO;

VB.NET

Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing
Imports System.Drawing
Imports System.IO

Use the following code snippet to identify the corrupted PDF document.

C#

private bool IsCorrupted(string file)
{
    bool isCorrupt = false;
    PdfLoadedDocument ldoc = null;
    // Creates an instance of memory stream
    MemoryStream stream = new MemoryStream();
    try
    {
        // Determine syntax issues
        ldoc = new PdfLoadedDocument(file);
        foreach (PdfLoadedPage lPage in ldoc.Pages)
        {
            // Determine content and font-related issues
            ExtractText(lPage);
        }
        foreach (PdfLoadedPage lPage in ldoc.Pages)
        {
            // Determine image-related corruptions
            ExtractImage(lPage);
        }
        // Determine structure-related corruptions
        ldoc.FileStructure.IncrementalUpdate = false;
        // Save the PDF document
        ldoc.Save(stream);
    }
    catch (Exception)
    {
        // Any exception raised by the checks above marks the document as corrupted
        isCorrupt = true;
    }
    finally
    {
        // Close the PDF document if it was loaded, even when a check failed
        if (ldoc != null)
            ldoc.Close(true);
        // Dispose of the memory stream
        stream.Dispose();
    }
    return isCorrupt;
}

VB.NET

Private Function IsCorrupted(file As String) As Boolean
    Dim isCorrupt As Boolean = False
    Dim ldoc As PdfLoadedDocument = Nothing
    ' Creates an instance of memory stream
    Dim stream As New MemoryStream()
    Try
        ' Determine syntax issues
        ldoc = New PdfLoadedDocument(file)
        For Each lPage As PdfLoadedPage In ldoc.Pages
            ' Determine content and font-related issues
            ExtractText(lPage)
        Next
        For Each lPage As PdfLoadedPage In ldoc.Pages
            ' Determine image-related corruptions
            ExtractImage(lPage)
        Next
        ' Determine structure-related corruptions
        ldoc.FileStructure.IncrementalUpdate = False
        ' Save the PDF document
        ldoc.Save(stream)
    Catch e As Exception
        ' Any exception raised by the checks above marks the document as corrupted
        isCorrupt = True
    Finally
        ' Close the PDF document if it was loaded, even when a check failed
        If ldoc IsNot Nothing Then
            ldoc.Close(True)
        End If
        ' Dispose of the memory stream
        stream.Dispose()
    End Try
    Return isCorrupt
End Function

Add the following code to the ExtractText() and ExtractImage() methods to detect content, font, and image corruptions in the PDF document.

C#

private void ExtractText(PdfLoadedPage lPage)
{
    // Extract text
    string text = lPage.ExtractText();
    text = null;
}

private void ExtractImage(PdfLoadedPage lPage)
{
    // Extract images
    Image[] image = lPage.ExtractImages();
    if (image != null)
    {
        for (int i = 0; i < image.Length; i++)
            image[i].Dispose();
    }
    image = null;
}

VB.NET

Private Sub ExtractText(lPage As PdfLoadedPage)
    'Extract text
    Dim text As String = lPage.ExtractText()
    text = Nothing
End Sub
 
Private Sub ExtractImage(lPage As PdfLoadedPage)
    'Extract images
    Dim image As Image() = lPage.ExtractImages()
    If image IsNot Nothing Then
        For i As Integer = 0 To image.Length - 1
            image(i).Dispose()
        Next
    End If
    image = Nothing
End Sub

Note that the preceding code snippet cannot detect every corrupted PDF document; some corruptions pass all of these checks.
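
As a usage illustration, a check such as IsCorrupted can be applied to every PDF file in a folder. The following is a minimal sketch that uses only the .NET base class library; CorruptionScanner and FindCorrupted are hypothetical names, and the check is passed in as a delegate so that the sketch does not depend on any particular implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class introduced for illustration.
public static class CorruptionScanner
{
    // Returns the paths of all *.pdf files under 'folder' (including subfolders)
    // that the supplied check reports as corrupted.
    public static List<string> FindCorrupted(string folder, Func<string, bool> isCorrupted)
    {
        var corrupted = new List<string>();
        foreach (string path in Directory.EnumerateFiles(folder, "*.pdf", SearchOption.AllDirectories))
        {
            if (isCorrupted(path))
                corrupted.Add(path);
        }
        return corrupted;
    }
}
```

Inside the form class, a call could look like `List<string> bad = CorruptionScanner.FindCorrupted(folderPath, IsCorrupted);`, where folderPath is an assumed variable holding the folder to scan.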

A complete working sample can be downloaded from PDFSample.zip.

Take a moment to peruse the documentation, where you can find features like text extraction, image extraction, and incremental updates for PDF documents.

Refer here to explore the rich set of Syncfusion Essential® PDF features.

Note:

Starting with v16.2.0.x, if you reference Syncfusion® assemblies from a trial setup or from the NuGet feed, include a license key in your projects. Refer to the link to learn about generating and registering the Syncfusion® license key in your application to use the components without a trial message.

Conclusion

I hope you enjoyed learning about how to identify the corrupted PDF document using C# and VB.NET.

You can refer to our WinForms PDF feature tour page to learn about its other features. You can also explore our WinForms PDF documentation to understand how to present and manipulate data.

Current customers can check out our WinForms components from the License and Downloads page. If you are new to Syncfusion, you can try our 30-day free trial to check out our WinForms PDF and other WinForms components.

If you have any queries or require clarification, please let us know in the comments below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
