
How to convert tables in a PDF document to a DataTable in C#


Syncfusion Essential® PDF is a feature-rich, high-performance .NET PDF library used to create, read, and edit PDF documents programmatically without Adobe dependencies. At present, the library does not support converting tables in a PDF document to a DataTable directly. However, you can achieve this by using Tabula together with the Syncfusion PDF library: Tabula extracts the tables to a CSV file, and the CSV data is then loaded into a DataTable.

Steps to convert tables in a PDF document to a DataTable programmatically using C#

1. Create a new C# console application project.


2. Include the following namespaces in the Program.cs file.

C#

using System;
using System.Data;
using System.Diagnostics;
using System.IO;
using System.Linq;

 

3. The following code example shows how to convert the PDF tables to a CSV file using the Tabula JAR in C#.

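//fileName (the input PDF) and outputpath (the folder that holds the PDF and the Tabula jar)
//are assumed to be defined by the caller
string csvName = Path.GetFileNameWithoutExtension(fileName);

ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jre1.8.0_261\bin\java.exe");

startInfo.WindowStyle = ProcessWindowStyle.Hidden;

//Sets the working directory
startInfo.WorkingDirectory = outputpath;

//Runs the Tabula jar to write the tables from all pages to a csv file
//(file names are quoted so that names containing spaces still work)
startInfo.Arguments = "-jar tabula-1.0.2-jar-with-dependencies.jar -p all -o \"" +
csvName + ".csv\" \"" + fileName + "\"";

//Waits until Tabula has finished writing the csv file
using (Process currentProcess = Process.Start(startInfo))
{
   currentProcess.WaitForExit();
}

string[] files = Directory.GetFiles(outputpath, csvName + ".csv");
if (files.Length > 0)
{
   DataTable res = ConvertCSVtoDataTable(files[0]);
   Console.WriteLine("Extracted table from PDF to DataTable");
   //DrawDataTabletoPDF is defined in the downloadable sample
   DrawDataTabletoPDF(res);
}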
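If the CSV file does not appear, it can help to see what Tabula reported. The following is a minimal sketch, not part of the sample, that builds the same `-p all -o` argument string and captures the exit code and standard error of the Java process. The class name `TabulaRunner` and its method names are hypothetical, introduced only for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the sample): runs the Tabula jar
// and returns the exit code plus anything written to standard error.
public static class TabulaRunner
{
    // Builds the argument string used in step 3, with file names quoted
    // so that names containing spaces are passed as single arguments.
    public static string BuildArguments(string jarName, string csvName, string pdfName)
    {
        return "-jar \"" + jarName + "\" -p all -o \"" + csvName + ".csv\" \"" + pdfName + "\"";
    }

    // Starts java with the given arguments and waits for it to exit.
    public static int Run(string javaPath, string workingDirectory, string arguments, out string errorText)
    {
        ProcessStartInfo startInfo = new ProcessStartInfo(javaPath, arguments)
        {
            WorkingDirectory = workingDirectory,
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Read the error stream before waiting so a full buffer cannot block the child process
            errorText = process.StandardError.ReadToEnd();
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A caller could log `errorText` whenever `Run` returns a non-zero exit code, which usually makes a missing jar or an unreadable PDF easier to spot than an absent CSV file alone.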

 

4. The following code example shows how to convert the CSV file to a DataTable using C#.

public static DataTable ConvertCSVtoDataTable(string strFilePath)
{
   DataTable dtCsv = new DataTable();

   //read all rows; ReadAllLines handles both \n and \r\n line endings
   string[] rows = File.ReadAllLines(strFilePath);

   for (int i = 0; i < rows.Length; i++)
   {
      //skip blank lines, such as a trailing empty line
      if (string.IsNullOrWhiteSpace(rows[i]))
      {
         continue;
      }

      //split each row with comma to get the individual values
      //(a simple split: quoted values that contain commas are not handled)
      string[] rowValues = rows[i].Split(',');

      if (dtCsv.Columns.Count == 0)
      {
         //add headers from the first non-empty row
         for (int j = 0; j < rowValues.Length; j++)
         {
            dtCsv.Columns.Add(rowValues[j].Trim());
         }
      }
      else
      {
         DataRow dr = dtCsv.NewRow();
         //copy only as many values as there are columns
         for (int k = 0; k < rowValues.Length && k < dtCsv.Columns.Count; k++)
         {
            dr[k] = rowValues[k];
         }
         //add other rows
         dtCsv.Rows.Add(dr);
      }
   }
   return dtCsv;
}
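Splitting each row on a comma works only when no cell contains a comma. CSV writers commonly wrap such cells in double quotes, so a quote-aware splitter may be needed for real tables. The following is a minimal sketch under that assumption; the class `CsvLineSplitter` and the method `SplitCsvLine` are hypothetical names that are not part of the sample, and quoted values that span several lines are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the sample): splits one CSV line into values,
// treating commas inside double quotes as data and "" as an escaped quote.
public static class CsvLineSplitter
{
    public static string[] SplitCsvLine(string line)
    {
        List<string> fields = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    // doubled quote inside a quoted value: keep one quote
                    current.Append('"');
                    i++;
                }
                else if (c == '"')
                {
                    // closing quote
                    inQuotes = false;
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                // opening quote
                inQuotes = true;
            }
            else if (c == ',')
            {
                // separator outside quotes ends the current value
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // add the last value on the line
        fields.Add(current.ToString());
        return fields.ToArray();
    }
}
```

In ConvertCSVtoDataTable, the call to Split(',') could be replaced with CsvLineSplitter.SplitCsvLine so that a row such as a,"b,c",d yields three values instead of four.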

 

A complete working sample can be downloaded from PdfSample.zip.

The sample converts the PDF tables to a .csv file and stores it in the Data folder of the sample. It then converts the CSV file data to a DataTable using the system assemblies.

Note:

1. If you get an issue while uploading the PDF file and the .csv file is not created in the Data folder, the problem is related to Tabula.

2. Ensure that the “tabula-1.0.2-jar-with-dependencies.jar” dependency is present in the Data folder.

3. Provide the Java installed location properly in the PdfToDataTable() method.

      ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jre1.8.0_261\bin\java.exe");
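Because the Java location differs between machines, it may be worth checking the path before starting the process. The following is a minimal sketch, not part of the sample: the class `JavaLocator` and the method `ResolveJavaPath` are hypothetical names, and the fallback to the JAVA_HOME environment variable assumes that the variable is set on the machine, which is a common convention rather than a guarantee.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the sample): returns a usable java.exe path or null.
public static class JavaLocator
{
    public static string ResolveJavaPath(string configuredPath)
    {
        // Prefer the explicitly configured path when the file exists
        if (!string.IsNullOrEmpty(configuredPath) && File.Exists(configuredPath))
        {
            return configuredPath;
        }

        // Assumption: JAVA_HOME, when set, points to the Java installation folder
        string javaHome = Environment.GetEnvironmentVariable("JAVA_HOME");
        if (!string.IsNullOrEmpty(javaHome))
        {
            string candidate = Path.Combine(javaHome, "bin", "java.exe");
            if (File.Exists(candidate))
            {
                return candidate;
            }
        }

        // Nothing found: the caller should report that Java is not available
        return null;
    }
}
```

The returned path could be passed to the ProcessStartInfo constructor, with a clear error message shown when the result is null.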

 

Note:

Starting with v16.2.0.x, if you reference Syncfusion® assemblies from the trial setup or the NuGet feed, include a license key in your projects. Refer to the link to learn about generating and registering the Syncfusion® license key in your application to use the components without a trial message.

 

Conclusion:

I hope you enjoyed learning how to convert tables in a PDF document to a DataTable in C#.

 

You can refer to our Flutter PDF feature tour page to learn about its other groundbreaking features and documentation, and how to quickly get started with configuration specifications. You can also explore our Flutter PDF examples to understand how to create and manipulate data.

If you are a current customer, you can check out our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to explore our other controls.

If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums or feedback portal. We are always happy to assist you!
