
How to convert tables in a PDF document to a DataTable in C#


Syncfusion Essential® PDF is a feature-rich, high-performance .NET PDF library used to create, read, and edit PDF documents programmatically without Adobe dependencies. The library does not currently support converting tables in a PDF document to a DataTable directly. However, you can achieve this using Tabula together with the Syncfusion PDF library: Tabula extracts the tables to a CSV file, and the CSV data is then loaded into a DataTable. Refer to the following steps.

Steps to convert the tables in a PDF document to a DataTable programmatically using C#

1. Create a new C# console application project.


2. Include the following namespaces in the Program.cs file.

C#

using System;
using System.Data;
using System.Diagnostics;
using System.IO;
using System.Linq;

 

3. The following code example shows how to convert the PDF tables to a CSV file using the Tabula JAR in C#.

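//fileName (the input PDF) and outputpath (the folder that holds the Tabula JAR and receives the CSV) are defined elsewhere in the sample
//Takes the file name without its extension, even when the name contains extra dots
string csvName = Path.GetFileNameWithoutExtension(fileName);

//Path to the installed Java runtime; change it to match your machine
ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jre1.8.0_261\bin\java.exe");

startInfo.WindowStyle = ProcessWindowStyle.Hidden;

//Sets the working directory
startInfo.WorkingDirectory = outputpath;

//Runs the Tabula JAR to extract the tables on all pages into a CSV file
//The file names are quoted so that names containing spaces are passed as single arguments
startInfo.Arguments = "-jar tabula-1.0.2-jar-with-dependencies.jar -p all -o \"" +
csvName + ".csv\" \"" + fileName + "\"";

using (Process currentProcess = Process.Start(startInfo))
{
   //Waits until Tabula has finished writing the CSV file
   currentProcess.WaitForExit();
}

string[] files = Directory.GetFiles(outputpath, csvName + ".csv");
if (files.Length > 0)
{
   DataTable res = ConvertCSVtoDataTable(files[0]);
   Console.WriteLine("Extracted table from PDF to DataTable");
   DrawDataTabletoPDF(res);
}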
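
If Tabula fails, the snippet above finishes silently and no CSV file appears. One way to surface such failures is to redirect the process's error stream and check its exit code. The following is a minimal sketch of that idea; the ProcessRunner class, its Run method, and the RunResult type are hypothetical names introduced for illustration and are not part of the sample.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the downloadable sample): runs a command
// and returns its exit code together with anything written to standard error.
public static class ProcessRunner
{
    public sealed class RunResult
    {
        public int ExitCode { get; set; }
        public string StandardError { get; set; }
    }

    public static RunResult Run(string executable, string arguments, string workingDirectory)
    {
        ProcessStartInfo startInfo = new ProcessStartInfo(executable, arguments)
        {
            WorkingDirectory = workingDirectory,
            UseShellExecute = false,       // required for stream redirection
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Read the error stream before waiting so that a full buffer
            // cannot block the child process.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            return new RunResult { ExitCode = process.ExitCode, StandardError = errors };
        }
    }
}
```

A call such as ProcessRunner.Run(javaPath, arguments, outputpath), where javaPath and arguments stand for the values passed to ProcessStartInfo above, could take the place of Process.Start. A non-zero ExitCode typically indicates that the command did not complete successfully, and StandardError then holds its message.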

 

4. The following code example shows how to convert the CSV file to a DataTable using C#.

public static DataTable ConvertCSVtoDataTable(string strFilePath)
{
   DataTable dtCsv = new DataTable();

   //read the full file text and split it into rows, skipping empty lines
   //(this also handles Windows line endings and a missing final newline)
   string[] rows = File.ReadAllText(strFilePath)
      .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

   for (int i = 0; i < rows.Length; i++)
   {
      //split each row with comma to get the individual values
      //(a simple split; quoted values that contain commas are not handled)
      string[] rowValues = rows[i].Split(',');
      if (i == 0)
      {
         //add headers
         foreach (string header in rowValues)
         {
            dtCsv.Columns.Add(header);
         }
      }
      else
      {
         //add other rows, ignoring any values beyond the header count
         DataRow dr = dtCsv.NewRow();
         for (int k = 0; k < rowValues.Length && k < dtCsv.Columns.Count; k++)
         {
            dr[k] = rowValues[k];
         }
         dtCsv.Rows.Add(dr);
      }
   }
   return dtCsv;
}
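
Splitting each row on commas breaks values that themselves contain commas; CSV writers usually wrap such values in double quotes. The following is a minimal sketch of a quote-aware line splitter, assuming the common CSV convention of quoted fields with "" as an escaped quote. CsvLine.Split is a hypothetical helper that is not part of the sample, and it does not handle line breaks inside quoted values.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the downloadable sample): splits one CSV
// line into fields while respecting double-quoted values.
public static class CsvLine
{
    public static string[] Split(string line)
    {
        List<string> fields = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // "" inside a quoted field is a literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // comma outside quotes ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field on the line
        return fields.ToArray();
    }
}
```

In ConvertCSVtoDataTable, CsvLine.Split(rows[i]) could then stand in for rows[i].Split(',').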

 

A complete working sample can be downloaded from PdfSample.zip.

In the sample, the PDF tables are converted to a .csv file, which is stored in the Data folder of the sample. The CSV file data is then converted to a DataTable using the system assemblies.

Note:

1. If you get an issue while uploading the PDF file and the .csv file is not created in the Data folder, the problem is most likely related to Tabula.

2. Ensure that the “tabula-1.0.2-jar-with-dependencies.jar” dependency is present in the Data folder.

3. Provide the correct Java installation path in the PdfToDataTable() method.

      ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jre1.8.0_261\bin\java.exe");
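
Because the Java path differs from machine to machine, it can help to look it up instead of hard-coding it. The following is a minimal sketch that assumes the conventional JAVA_HOME environment variable points to the Java installation folder; that assumption, along with the JavaLocator class and its Resolve method, is hypothetical and not part of the sample.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the downloadable sample): returns the path
// to java.exe under JAVA_HOME when it exists, otherwise the supplied fallback.
public static class JavaLocator
{
    public static string Resolve(string fallbackPath)
    {
        // Assumption: JAVA_HOME, if set, points to the Java installation folder.
        string javaHome = Environment.GetEnvironmentVariable("JAVA_HOME");
        if (!string.IsNullOrEmpty(javaHome))
        {
            string candidate = Path.Combine(javaHome, "bin", "java.exe");
            if (File.Exists(candidate))
            {
                return candidate;
            }
        }

        // JAVA_HOME is not set or does not contain bin\java.exe.
        return fallbackPath;
    }
}
```

The result of JavaLocator.Resolve, called with the hard-coded path as the fallback, could then be passed to the ProcessStartInfo constructor.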

 

 
