How to identify a corrupted PDF document using C# and VB.NET?
Syncfusion Essential® PDF is a .NET PDF library used to create, read, and edit PDF documents. Using this library, you can identify corrupted PDF documents in C# and VB.NET.
The following methods are used to find corrupted PDF documents:
- Syntax issue documents
This kind of corruption can be found by loading the PDF document. Loading throws an exception when the document has issues in its cross-reference table structure or object offsets.
- Image related corruptions
These types of issues can be found by extracting images from the PDF document.
- Content and font related corruptions
These types of issues can be found by extracting text from the PDF document.
- Structure related corruptions
These types of issues can be found by loading the PDF document and saving it with the IncrementalUpdate property disabled, so that the whole document is rewritten rather than appended to.
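Because each of the four checks above maps to a distinct call, a failure can be attributed to the stage that threw. The following is a minimal sketch of that idea, assuming the Syncfusion.Pdf.WinForms package from the steps below is installed. The names CorruptionStage, PdfCorruptionProbe, and FindFailingStage are hypothetical and are not part of the Syncfusion API; only the PdfLoadedDocument calls come from the library.

```csharp
using System;
using System.IO;
using Syncfusion.Pdf.Parsing;

// Hypothetical type (not part of Syncfusion): the check that failed first.
public enum CorruptionStage { None, Load, TextExtraction, ImageExtraction, Save }

// Hypothetical helper (not part of Syncfusion).
public static class PdfCorruptionProbe
{
    public static CorruptionStage FindFailingStage(string file)
    {
        PdfLoadedDocument ldoc = null;
        // Tracks the check currently running so the catch block can report it
        CorruptionStage stage = CorruptionStage.Load;
        try
        {
            // Syntax issues: loading parses the file structure
            ldoc = new PdfLoadedDocument(file);

            // Content and font-related issues
            stage = CorruptionStage.TextExtraction;
            foreach (PdfLoadedPage page in ldoc.Pages)
                page.ExtractText();

            // Image-related issues
            stage = CorruptionStage.ImageExtraction;
            foreach (PdfLoadedPage page in ldoc.Pages)
            {
                var images = page.ExtractImages();
                if (images != null)
                    foreach (var image in images)
                        image.Dispose();
            }

            // Structure-related issues: save a full copy to memory
            stage = CorruptionStage.Save;
            ldoc.FileStructure.IncrementalUpdate = false;
            using (var stream = new MemoryStream())
                ldoc.Save(stream);

            return CorruptionStage.None;
        }
        catch (Exception)
        {
            // Report whichever check was running when the exception was thrown
            return stage;
        }
        finally
        {
            if (ldoc != null)
                ldoc.Close(true);
        }
    }
}
```

A result of CorruptionStage.None means only that none of these four operations threw; it does not guarantee that the file is intact.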
Steps to identify the corrupted PDF document programmatically:
- Create a new C# Windows Forms application project.
- Install the Syncfusion.Pdf.WinForms NuGet package as a reference to your .NET Framework application from NuGet.org.
- Include the following namespaces in the file that will contain the code below (for example, Form1.cs).
C#
using System;
using System.Drawing;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;
VB.NET
Imports System
Imports System.Drawing
Imports System.IO
Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing
- Use the following code snippet to identify the corrupted PDF document.
C#
private bool IsCorrupted(string file)
{
    bool isCorrupt = false;
    PdfLoadedDocument ldoc = null;
    // In-memory target for the test save; nothing is written to disk
    MemoryStream stream = new MemoryStream();
    try
    {
        // Determine syntax issues
        ldoc = new PdfLoadedDocument(file);
        foreach (PdfLoadedPage lPage in ldoc.Pages)
        {
            // Determine content and font-related issues
            ExtractText(lPage);
        }
        foreach (PdfLoadedPage lPage in ldoc.Pages)
        {
            // Determine image-related corruptions
            ExtractImage(lPage);
        }
        // Determine structure-related corruptions
        ldoc.FileStructure.IncrementalUpdate = false;
        // Save the PDF document
        ldoc.Save(stream);
    }
    catch (Exception)
    {
        // Any exception raised by the checks above is treated as corruption
        isCorrupt = true;
    }
    finally
    {
        // Close the PDF document even when a check failed, so the file is released
        if (ldoc != null)
            ldoc.Close(true);
        // Dispose of the memory stream
        stream.Dispose();
    }
    return isCorrupt;
}
VB.NET
Private Function IsCorrupted(file As String) As Boolean
    Dim isCorrupt As Boolean = False
    Dim ldoc As PdfLoadedDocument = Nothing
    ' In-memory target for the test save; nothing is written to disk
    Dim stream As New MemoryStream()
    Try
        ' Determine syntax issues
        ldoc = New PdfLoadedDocument(file)
        For Each lPage As PdfLoadedPage In ldoc.Pages
            ' Determine content and font-related issues
            ExtractText(lPage)
        Next
        For Each lPage As PdfLoadedPage In ldoc.Pages
            ' Determine image-related corruptions
            ExtractImage(lPage)
        Next
        ' Determine structure-related corruptions
        ldoc.FileStructure.IncrementalUpdate = False
        ' Save the PDF document
        ldoc.Save(stream)
    Catch ex As Exception
        ' Any exception raised by the checks above is treated as corruption
        isCorrupt = True
    Finally
        ' Close the PDF document even when a check failed, so the file is released
        If ldoc IsNot Nothing Then
            ldoc.Close(True)
        End If
        ' Dispose of the memory stream
        stream.Dispose()
    End Try
    Return isCorrupt
End Function
- Add the following ExtractText() and ExtractImage() helper methods. They run text and image extraction on a page so that corruption in those areas surfaces as an exception.
C#
private void ExtractText(PdfLoadedPage lPage)
{
    // Extract text
    string text = lPage.ExtractText();
    text = null;
}

private void ExtractImage(PdfLoadedPage lPage)
{
    // Extract images
    Image[] image = lPage.ExtractImages();
    if (image != null)
    {
        for (int i = 0; i < image.Length; i++)
            image[i].Dispose();
    }
    image = null;
}
VB.NET
Private Sub ExtractText(lPage As PdfLoadedPage)
    ' Extract text
    Dim text As String = lPage.ExtractText()
    text = Nothing
End Sub

Private Sub ExtractImage(lPage As PdfLoadedPage)
    ' Extract images
    Dim image As Image() = lPage.ExtractImages()
    If image IsNot Nothing Then
        For i As Integer = 0 To image.Length - 1
            image(i).Dispose()
        Next
    End If
    image = Nothing
End Sub
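The preceding code snippet cannot detect 100% of corrupted PDF documents. It only catches corruption that causes loading, text extraction, image extraction, or saving to throw an exception.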
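Since the check is not exhaustive, it may be most useful as a first-pass filter over a batch of files, with flagged files reviewed separately. The following is a minimal sketch of such a scan. PdfFolderScanner and FindCorruptedFiles are hypothetical names, not part of the Syncfusion API, and the check is passed in as a delegate so that the helper does not depend on any particular implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Syncfusion): applies a corruption check
// to every .pdf file directly inside a folder and collects the flagged paths.
public static class PdfFolderScanner
{
    public static List<string> FindCorruptedFiles(string folder, Func<string, bool> isCorrupted)
    {
        var corrupted = new List<string>();
        foreach (string path in Directory.EnumerateFiles(folder, "*.pdf"))
        {
            // The delegate decides; this helper only collects the results
            if (isCorrupted(path))
                corrupted.Add(path);
        }
        return corrupted;
    }
}
```

From inside the form, the IsCorrupted method shown earlier can be passed directly, for example `PdfFolderScanner.FindCorruptedFiles(folderPath, IsCorrupted)`, where folderPath is a placeholder for a folder of your choice.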
A complete working sample can be downloaded from PDFSample.zip.
Take a moment to peruse the documentation, where you can find features like text extraction, image extraction, and performing incremental updates on PDF documents.
Refer here to explore the rich set of Syncfusion Essential® PDF features.
Note:
Starting with v16.2.0.x, if you reference Syncfusion® assemblies from a trial setup or from the NuGet feed, include a license key in your projects. Refer to the link to learn about generating and registering the Syncfusion® license key in your application to use the components without a trial message.
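As an illustration of the registration step, the following sketch registers the key at application startup in Program.cs, before any Syncfusion component is used. The key string is a placeholder, and Form1 is assumed to be the main form of the Windows Forms project created in the steps above.

```csharp
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread]
    static void Main()
    {
        // Placeholder: replace with the license key generated for your account and version
        Syncfusion.Licensing.SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        Application.EnableVisualStyles();
        Application.SetCompatibleTextRenderingDefault(false);
        // Form1 is assumed to be the project's main form
        Application.Run(new Form1());
    }
}
```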
Conclusion
I hope you enjoyed learning how to identify a corrupted PDF document using C# and VB.NET.
You can refer to our WinForms PDF feature tour page to learn about its other features. You can also explore our WinForms PDF documentation to understand how to present and manipulate data.
Current customers can check out our WinForms components from the License and Downloads page. If you are new to Syncfusion, you can try our 30-day free trial to check out our WinForms PDF and other WinForms components.
If you have any queries or require clarifications, please let us know in the comments below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!