How to convert tables in a PDF document to an Excel file

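At present, there is no direct support for converting the tables in a PDF document to Excel. However, you can achieve this in two steps: use Tabula to extract the tables to a CSV file, and then use the Syncfusion XlsIO library to convert the CSV file to an Excel file. Refer to the following code.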

PDF to CSV conversion using Tabula source

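//Requires: using System.Diagnostics; using System.IO;
//fileName, csvName, and outputpath are expected to be members of the containing class
private byte[] PdfToExcel(string pdffilepath)
{
    //Use the PDF file name without its extension as the CSV file name
    csvName = Path.GetFileNameWithoutExtension(fileName);
    ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jdk1.8.0_131\bin\java.exe");
    startInfo.WindowStyle = ProcessWindowStyle.Hidden;
    //Sets the working directory
    startInfo.WorkingDirectory = outputpath;
    //Using the Java dependencies to create the CSV file; file names are quoted to allow spaces
    startInfo.Arguments = "-jar tabula-1.0.2-jar-with-dependencies.jar -p all -o \"" + csvName + ".csv\" \"" + fileName + "\"";
    //Run Tabula and wait for it to finish
    using (Process currentProcess = Process.Start(startInfo))
    {
        currentProcess.WaitForExit();
    }
    string[] files = Directory.GetFiles(outputpath, csvName + ".csv");
    //Return null when Tabula did not produce a CSV file
    return files.Length > 0 ? ConvertCSVToExcel(files[0]) : null;
}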
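
When the CSV file is not produced, the PdfToExcel() method above returns null without saying why. The following is a minimal sketch of one way to surface Tabula's own error output. TabulaRunner, BuildStartInfo, and Run are hypothetical names that are not part of the Syncfusion sample, and the sketch assumes that Tabula reports failures through a non-zero exit code and the standard error stream.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; the names below are not part of the Syncfusion sample.
public static class TabulaRunner
{
    // Builds the start info for the Tabula command line used in this article.
    public static ProcessStartInfo BuildStartInfo(string javaPath, string workingDirectory,
        string jarName, string pdfName, string csvName)
    {
        return new ProcessStartInfo(javaPath)
        {
            WorkingDirectory = workingDirectory,
            // Quote file names so that names containing spaces are passed as one argument
            Arguments = "-jar \"" + jarName + "\" -p all -o \"" + csvName + "\" \"" + pdfName + "\"",
            UseShellExecute = false,      // required for stream redirection
            RedirectStandardError = true, // capture the error messages written by the process
            CreateNoWindow = true         // hides the console window when the shell is not used
        };
    }

    // Runs the process and throws if it exits with a non-zero code (assumed to mean failure).
    public static void Run(ProcessStartInfo startInfo)
    {
        using (Process process = Process.Start(startInfo))
        {
            // Read the error stream before waiting so a full buffer cannot block the process
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "Tabula exited with code " + process.ExitCode + ": " + errors);
            }
        }
    }
}
```

A caller could build the start info with the same Java path, working directory, and file names used in PdfToExcel(), call TabulaRunner.Run, and log the exception message instead of showing a generic alert.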

CSV to Excel conversion

//Requires: using System.IO; using Syncfusion.XlsIO;
private byte[] ConvertCSVToExcel(string filePath)
{
    //Initialize the Excel engine; the using block disposes it even if an exception occurs
    using (ExcelEngine excelEngine = new ExcelEngine())
    {
        IApplication application = excelEngine.Excel;
        //Sets the default version
        application.DefaultVersion = ExcelVersion.Excel2013;
        //Load the CSV file
        IWorkbook workbook = application.Workbooks.Open(filePath);
        //Save the workbook in the Excel 2013 (.xlsx) format
        workbook.Version = ExcelVersion.Excel2013;
        using (MemoryStream stream = new MemoryStream())
        {
            workbook.SaveAs(stream);
            workbook.Close();
            //Returns the Excel file content
            return stream.ToArray();
        }
    }
}

Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/PDFToExcel-1833431670

In the sample, clicking Convert PDF to Excel converts the tables in the PDF file to a .csv file and stores it in the Data folder of the sample. Clicking Download as Excel then downloads the converted .csv file as an .xlsx file using the Syncfusion XlsIO library.

Note:

1. If the alert “PDF document cannot be converted to Excel” appears while uploading the PDF file and the .csv file is not created in the Data folder, the problem is related to Tabula.

2. Ensure that the “tabula-1.0.2-jar-with-dependencies.jar” dependency is present in the Data folder.

3. Provide the correct Java installation path in the PdfToExcel() method.

      ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jdk1.8.0_131\bin\java.exe");
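
The checks in the notes above can also be made in code before starting the conversion. The following is a minimal sketch, assuming that the Java installation directory is known (for example, from the JAVA_HOME environment variable, if it is set on the machine). TabulaPreflight, ResolveJavaPath, and Check are hypothetical names that are not part of the Syncfusion sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; the names below are not part of the Syncfusion sample.
public static class TabulaPreflight
{
    // Builds the java.exe path from a Java installation directory (for example, JAVA_HOME).
    // Returns null when no directory is given.
    public static string ResolveJavaPath(string javaHome)
    {
        if (string.IsNullOrWhiteSpace(javaHome))
        {
            return null;
        }
        return Path.Combine(javaHome, "bin", "java.exe");
    }

    // Returns a list of problems; an empty list means both checks passed.
    public static List<string> Check(string javaPath, string dataFolder, string jarName)
    {
        var problems = new List<string>();
        if (javaPath == null || !File.Exists(javaPath))
        {
            problems.Add("Java executable not found: " + javaPath);
        }
        if (!File.Exists(Path.Combine(dataFolder, jarName)))
        {
            problems.Add("Tabula jar not found in: " + dataFolder);
        }
        return problems;
    }
}
```

For example, passing Environment.GetEnvironmentVariable("JAVA_HOME") to ResolveJavaPath and then calling Check with the Data folder and the jar file name gives a list of messages that can be shown to the user in place of the generic alert.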
