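How to convert tables in a PDF document to an Excel file
At present, there is no direct support for converting the tables in a PDF document to Excel. However, you can achieve this by using Tabula together with the Syncfusion XlsIO library: Tabula extracts the tables to a CSV file, and XlsIO converts that CSV file to an Excel workbook. Refer to the following code.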
PDF to CSV conversion using Tabula source
    private byte[] PdfToExcel(string pdffilepath)
    {
        csvName = fileName.Split('.')[0];
        ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jdk1.8.0_131\bin\java.exe");
        startInfo.WindowStyle = ProcessWindowStyle.Hidden;
        //Sets the working directory
        startInfo.WorkingDirectory = outputpath;
        //Using the Java dependencies to create CSV file
        startInfo.Arguments = "-jar tabula-1.0.2-jar-with-dependencies.jar -p all -o " + csvName + ".csv " + fileName;
        Process currentProcess = Process.Start(startInfo);
        currentProcess.WaitForExit();
        string[] files = Directory.GetFiles(outputpath, csvName + ".csv");
        if (files.Length > 0)
        {
            return ConvertCSVToExcel(files[0]);
        }
        else
        {
            return null;
        }
    }
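The command line above is built by plain string concatenation, so a PDF name containing spaces would be split into several arguments. The following is a minimal sketch of one way to quote the file names and confirm that the CSV file was produced. The class `TabulaRunner` and its methods are hypothetical helpers, not part of the sample or of Tabula, and treating a non-zero exit code as failure is an assumption about how the Tabula jar reports errors.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class TabulaRunner
{
    // Wraps a value in double quotes so that a path containing spaces
    // is passed to Tabula as a single argument.
    public static string Quote(string value)
    {
        return "\"" + value + "\"";
    }

    // Builds the same command line as the sample (-p all, -o <csv> <pdf>),
    // but with every file name quoted.
    public static string BuildArguments(string jarName, string csvFile, string pdfFile)
    {
        return "-jar " + Quote(jarName) + " -p all -o " + Quote(csvFile) + " " + Quote(pdfFile);
    }

    // Runs Tabula in the given working directory. Returns true only if the
    // process exits with code 0 (assumed to mean success) and the CSV exists.
    public static bool Run(string javaExe, string workingDirectory,
                           string jarName, string csvFile, string pdfFile)
    {
        ProcessStartInfo startInfo = new ProcessStartInfo(javaExe);
        startInfo.WorkingDirectory = workingDirectory;
        startInfo.Arguments = BuildArguments(jarName, csvFile, pdfFile);
        startInfo.UseShellExecute = false;
        startInfo.CreateNoWindow = true;

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode == 0
                && File.Exists(Path.Combine(workingDirectory, csvFile));
        }
    }
}
```

With a helper like this, PdfToExcel() could call `TabulaRunner.Run(...)` and return null when it reports false, instead of searching the output folder afterwards.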
CSV to Excel conversion
    private byte[] ConvertCSVToExcel(string filePath)
    {
        //Initialize the Excel engine
        ExcelEngine excelEngine = new ExcelEngine();
        IApplication application = excelEngine.Excel;
        //Load the CSV file
        IWorkbook workbook = application.Workbooks.Open(filePath);
        IWorksheet sheet = workbook.Worksheets[0];
        //Sets the worksheet default version
        application.DefaultVersion = ExcelVersion.Excel2013;
        workbook.Version = ExcelVersion.Excel2013;
        string fileName = csvName + ".xlsx";
        MemoryStream stream = new MemoryStream();
        workbook.SaveAs(stream);
        workbook.Close();
        excelEngine.Dispose();
        //Returns the Excel stream
        return stream.ToArray();
    }
In the sample, clicking Convert PDF to Excel converts the tables in the PDF file to a .csv file and stores it in the Data folder of the sample. Clicking Download as Excel then converts that .csv file to an .xlsx file using the Syncfusion XlsIO library and downloads it.
1. If you get the alert “PDF document cannot be converted to Excel” while uploading the PDF file, and the .csv file is not created in the Data folder, the problem is related to Tabula.
2. Ensure that the “tabula-1.0.2-jar-with-dependencies.jar” dependency is present in the Data folder.
3. Provide the correct Java installation path in the PdfToExcel() method:
    ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jdk1.8.0_131\bin\java.exe");
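The two prerequisites listed above (the Tabula jar in the Data folder and a valid Java path) can be checked before the conversion is attempted. The following is a minimal sketch of such a check. `TabulaPrerequisites` and `FindMissing` are hypothetical names introduced here for illustration and are not part of the sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TabulaPrerequisites
{
    // Returns one message per missing prerequisite.
    // An empty list means both the Java executable and the Tabula jar were found.
    public static List<string> FindMissing(string javaExePath, string dataFolder, string jarName)
    {
        List<string> missing = new List<string>();

        if (!File.Exists(javaExePath))
        {
            missing.Add("Java executable not found: " + javaExePath);
        }

        string jarPath = Path.Combine(dataFolder, jarName);
        if (!File.Exists(jarPath))
        {
            missing.Add("Tabula jar not found: " + jarPath);
        }

        return missing;
    }
}
```

Calling `FindMissing` with the Java path used in PdfToExcel(), the Data folder path, and “tabula-1.0.2-jar-with-dependencies.jar” gives a more specific message than the generic conversion alert when one of the files is absent.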