How to convert tables in a PDF document to an Excel file
At present, there is no support for converting the tables in a PDF document to Excel. However, you can achieve this using the Tabula and Syncfusion XlsIO libraries. Refer to the following code.
PDF to CSV conversion using Tabula source
private byte[] PdfToExcel(string pdffilepath)
{
    //fileName, csvName, and outputpath are assumed to be fields of the sample class defined outside this snippet
    //Path.GetFileNameWithoutExtension keeps file names that contain extra dots intact
    csvName = Path.GetFileNameWithoutExtension(fileName);
    ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jdk1.8.0_131\bin\java.exe");
    startInfo.WindowStyle = ProcessWindowStyle.Hidden;
    //Sets the working directory
    startInfo.WorkingDirectory = outputpath;
    //Runs the Tabula JAR to extract the tables from all pages into a CSV file
    //The file names are quoted in case they contain spaces
    startInfo.Arguments = "-jar tabula-1.0.2-jar-with-dependencies.jar -p all -o \"" + csvName + ".csv\" \"" + fileName + "\"";
    using (Process currentProcess = Process.Start(startInfo))
    {
        currentProcess.WaitForExit();
    }
    string[] files = Directory.GetFiles(outputpath, csvName + ".csv");
    //Returns null when Tabula did not produce a CSV file
    return files.Length > 0 ? ConvertCSVToExcel(files[0]) : null;
}
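When Tabula fails, the method above only sees that no CSV file was created. The following is a minimal sketch of how the Java process could be started so that its exit code and error output are available for diagnosis. `ProcessRunner`, `Run`, and `Result` are hypothetical names introduced for illustration; they are not part of the sample, Tabula, or XlsIO.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of the sample or of Tabula.
public static class ProcessRunner
{
    public sealed class Result
    {
        public int ExitCode;
        public string StandardError;
    }

    // Runs a command-line tool without a console window and returns its
    // exit code together with anything it wrote to standard error.
    public static Result Run(string exePath, string arguments, string workingDirectory)
    {
        ProcessStartInfo startInfo = new ProcessStartInfo(exePath, arguments);
        startInfo.WorkingDirectory = workingDirectory;
        startInfo.UseShellExecute = false;      // required for stream redirection
        startInfo.RedirectStandardError = true;
        startInfo.CreateNoWindow = true;

        using (Process process = Process.Start(startInfo))
        {
            // Read the stream before waiting so that a full buffer
            // cannot block the child process.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            return new Result { ExitCode = process.ExitCode, StandardError = errors };
        }
    }
}
```

A caller could pass the Java path, the Tabula arguments, and the working directory used in PdfToExcel(), then log `StandardError` whenever `ExitCode` is not zero or the CSV file is missing.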
CSV to Excel conversion
private byte[] ConvertCSVToExcel(string filePath)
{
    //Initialize the Excel engine; the using block disposes it even if an exception is thrown
    using (ExcelEngine excelEngine = new ExcelEngine())
    {
        IApplication application = excelEngine.Excel;
        //Sets the default version before the workbook is opened
        application.DefaultVersion = ExcelVersion.Excel2013;
        //Load the CSV file
        IWorkbook workbook = application.Workbooks.Open(filePath);
        workbook.Version = ExcelVersion.Excel2013;
        //Save the workbook to a memory stream
        using (MemoryStream stream = new MemoryStream())
        {
            workbook.SaveAs(stream);
            workbook.Close();
            //Returns the Excel file content as a byte array
            return stream.ToArray();
        }
    }
}
In the sample, clicking Convert PDF to Excel converts the PDF file to a .csv file and stores it in the Data folder of the sample. Then, clicking Download as Excel downloads the converted .csv file as an .xlsx file using the Syncfusion XlsIO library.
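Outside the sample's download button, the byte array returned by PdfToExcel() can also be written straight to disk. Because PdfToExcel() returns null when no CSV file was produced, the caller should check the result first. The sketch below shows one way to do that; `ExcelOutput` and `TrySave` are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the sample.
public static class ExcelOutput
{
    // Writes the workbook bytes to the given .xlsx path.
    // Returns false when there is nothing to write, for example when
    // the conversion returned null.
    public static bool TrySave(byte[] excelBytes, string xlsxPath)
    {
        if (excelBytes == null || excelBytes.Length == 0)
        {
            return false;
        }
        File.WriteAllBytes(xlsxPath, excelBytes);
        return true;
    }
}
```

A false return value is a cue to look at the Tabula step, since that is where a missing CSV file originates.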
1. If the alert PDF document cannot be converted to Excel appears while uploading the PDF file and the .csv file is not created in the Data folder, the problem is related to Tabula.
2. Ensure that the “tabula-1.0.2-jar-with-dependencies.jar” dependency is present in the Data folder.
3. Provide the correct Java installation path in the PdfToExcel() method.
ProcessStartInfo startInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Java\jdk1.8.0_131\bin\java.exe");
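A hard-coded Java path breaks as soon as the JDK is updated or installed elsewhere. As a hedged alternative, the path could be derived from a Java home directory, such as the value of the conventional JAVA_HOME environment variable (which not every installer sets). `JavaLocator` and `FromJavaHome` are hypothetical names introduced for this sketch.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the sample.
public static class JavaLocator
{
    // Returns the path to java.exe under the given JDK or JRE home
    // directory, or null when the home is not set or the executable
    // is missing.
    public static string FromJavaHome(string javaHome)
    {
        if (string.IsNullOrEmpty(javaHome))
        {
            return null;
        }
        string candidate = Path.Combine(javaHome, "bin", "java.exe");
        return File.Exists(candidate) ? candidate : null;
    }
}
```

For example, `JavaLocator.FromJavaHome(Environment.GetEnvironmentVariable("JAVA_HOME"))` returns a usable path when the variable points at a valid installation; when it returns null, the application can fall back to a configured path or report that Java was not found.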