
How to compare text in two PDF documents?


Syncfusion Essential® PDF is a .NET PDF library used to create, read, and edit PDF documents. Using this library, you can compare the text in two PDF documents by extracting the text from each page and comparing it line by line. The resulting PDF document highlights the entire line of changed text.

Steps to compare the text in PDF documents programmatically:

  1. Create a new Windows Forms application project.
  2. Install the Syncfusion.Pdf.Base NuGet package from NuGet.org as a reference in your .NET Framework application.
  3. Include the following namespaces in the Form1.Designer.cs file.

C#

using System.Collections.Generic;
using System.Linq;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;

 

  4. Add a new button in Form1.Designer.cs to compare the PDF files as follows.
    label = new Label();
    button = new Button();
     
    // Label
    label.Location = new System.Drawing.Point(0, 40);
    label.Size = new System.Drawing.Size(426, 35);
    label.Text = "Click the button to view the compared PDF file generated by Essential PDF";
    label.TextAlign = System.Drawing.ContentAlignment.MiddleCenter;
     
    // Button
    button.Location = new System.Drawing.Point(180, 110);
    button.Size = new System.Drawing.Size(85, 26);
    button.Text = "Compare PDF";
    button.Click += new EventHandler(ComparePDF);
     
    // Form
    ClientSize = new System.Drawing.Size(450, 150);
    Controls.Add(label);
    Controls.Add(button);
    Text = "Create PDF";
    

 

  5. Add the following code in ComparePDF to compare text in two PDF documents.
    // Load the first PDF document
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument("../../Data/Source1.pdf");

    // Load the second PDF document
    PdfLoadedDocument loadedDocument1 = new PdfLoadedDocument("../../Data/Source2.pdf");

    // Creating the list to store text data in PDF documents
    List<TextData> textData = new List<TextData>();
    List<TextData> textData1 = new List<TextData>();
    List<TextData> maxContainsData = new List<TextData>();
    List<TextData> diff = new List<TextData>();

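    // Compare only the pages that exist in both documents
    int pageCount = System.Math.Min(loadedDocument.Pages.Count, loadedDocument1.Pages.Count);
    for (int i = 0; i < pageCount; i++)
    {
        // Start each page with an empty list so earlier pages' differences are not redrawn
        diff.Clear();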
        // Get the page from the first document
        PdfLoadedPage loadedPage = loadedDocument.Pages[i] as PdfLoadedPage;
        // Extract the text from the page of the first document
        string extractedText = loadedPage.ExtractText(out textData);

        // Extract the text from the page of the second document
        string extractedText1 = loadedDocument1.Pages[i].ExtractText(out textData1);

        int minCount = 0;

        // Compare the text data count
        if (textData.Count > textData1.Count)
            maxContainsData = textData;
        if (textData.Count < textData1.Count)
            maxContainsData = textData1;

        if (textData != textData1)
        {
            if (textData.Count == textData1.Count)
                minCount = textData.Count;
            else
            {
                minCount = System.Math.Min(textData.Count, textData1.Count);
                // Add every extra entry from the longer list to the diff
                for (int k = minCount; k < maxContainsData.Count; k++)
                    diff.Add(maxContainsData[k]);
            }
            for (int j = 0; j < minCount; j++)
            {
                if (textData[j].Text != textData1[j].Text && textData[j].Bounds != textData1[j].Bounds)
                {
                    // Add diff text to the list
                    diff.Add(textData[j]);
                }
            }
        }
        // Highlight the changed text
        foreach (TextData data in diff)
        {
            loadedPage.Graphics.DrawRectangle(PdfPens.Red, PdfBrushes.Transparent, data.Bounds);
        }
    }

    // Save and close the document
    loadedDocument.Save("ComparedPDF.pdf");
    loadedDocument.Close(true);
    loadedDocument1.Close(true);

    // Open the result in the default PDF viewer
    System.Diagnostics.Process.Start("ComparedPDF.pdf");
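
The designer code in step 4 wires the button with `new EventHandler(ComparePDF)`, so the method that holds the step 5 code has to match the `System.EventHandler` signature. The following is a minimal sketch of that shape, runnable as a console program without a form; the class name `HandlerSketch` and the `Calls` counter are illustrative assumptions, not part of the sample project.

```csharp
using System;

public static class HandlerSketch
{
    // Hypothetical counter, used here only to show that the handler ran
    public static int Calls;

    // Same signature the form's ComparePDF method needs in order to match EventHandler
    public static void ComparePDF(object sender, EventArgs e)
    {
        Calls++;
        Console.WriteLine("ComparePDF invoked");
    }

    public static void Main()
    {
        // Mirrors: button.Click += new EventHandler(ComparePDF);
        EventHandler click = new EventHandler(ComparePDF);
        click(null, EventArgs.Empty);
    }
}
```

In the Windows Forms project, the body of `ComparePDF` would contain the comparison code from step 5 instead of the counter.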

 

A complete working sample can be downloaded from PDFComparisonSample.zip.

Executing the program produces a PDF document in which the changed lines are outlined in red.
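
The comparison in step 5 pairs the extracted lines of both pages by position. The idea can be tried without any PDF library on plain strings. The sketch below is an illustration under that assumption, not the library's API: `LineDiffSketch` and `ChangedLineIndices` are hypothetical names, and the lists stand in for the text extracted from one page of each document.

```csharp
using System;
using System.Collections.Generic;

public static class LineDiffSketch
{
    // Returns the indices at which the two line lists differ when paired by position.
    // Lines that exist only in the longer list are reported as changed as well.
    public static List<int> ChangedLineIndices(IList<string> first, IList<string> second)
    {
        var changed = new List<int>();
        int common = Math.Min(first.Count, second.Count);

        // Compare the lines that both lists have
        for (int i = 0; i < common; i++)
        {
            if (!string.Equals(first[i], second[i], StringComparison.Ordinal))
                changed.Add(i);
        }

        // Everything past the shorter list has no counterpart
        int longest = Math.Max(first.Count, second.Count);
        for (int i = common; i < longest; i++)
            changed.Add(i);

        return changed;
    }

    public static void Main()
    {
        var page1 = new List<string> { "Invoice 1001", "Total: 10", "Thank you" };
        var page2 = new List<string> { "Invoice 1001", "Total: 12", "Thank you", "Page 2" };

        // Line 1 differs and line 3 exists only in the second list
        Console.WriteLine(string.Join(",", ChangedLineIndices(page1, page2))); // prints 1,3
    }
}
```

Because lines are paired by index, a single inserted line shifts every later line and marks all of them as changed. A sequence-alignment diff would avoid that, at the cost of more code.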

Note:

Starting with v16.2.0.x, if you reference Syncfusion® assemblies from a trial setup or from the NuGet feed, include a license key in your projects. Refer to the link to learn about generating and registering the Syncfusion® license key in your application to use the components without a trial message.

 

Conclusion

I hope you enjoyed learning about how to compare text in two PDF documents.

You can refer to our WinForms PDF feature tour page to learn about its other features. You can also explore our WinForms PDF documentation to understand how to create and manipulate PDF documents.

Current customers can check out our WinForms components from the License and Downloads page. If you are new to Syncfusion, you can try our 30-day free trial to check out our WinForms PDF and other WinForms components.

If you have any queries or require clarification, please let us know in the comments below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!

 

 
