How to extract text from a PDF file in C# and VB.NET?
Syncfusion Essential PDF is a .NET PDF library used to create, read, and edit PDF documents. Using this library, you can extract text from a PDF document.
Essential PDF supports basic text extraction and layout-based extraction.
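The two modes can be contrasted with a minimal sketch. It assumes a file named input.pdf in the working directory (a placeholder path, not part of the original sample) and that the parameterless ExtractText() overload performs basic extraction while ExtractText(true) performs layout-based extraction.

```csharp
using System;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

class ExtractionModes
{
    static void Main()
    {
        // "input.pdf" is a placeholder path used only for illustration.
        using (FileStream fileStream = File.OpenRead("input.pdf"))
        {
            PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileStream);
            try
            {
                PdfPageBase page = loadedDocument.Pages[0];

                // Basic text extraction.
                string basicText = page.ExtractText();

                // Layout-based text extraction.
                string layoutText = page.ExtractText(true);

                Console.WriteLine("Basic extraction:");
                Console.WriteLine(basicText);
                Console.WriteLine("Layout-based extraction:");
                Console.WriteLine(layoutText);
            }
            finally
            {
                // Close the document even if extraction throws.
                loadedDocument.Close(true);
            }
        }
    }
}
```

Printing both results side by side is a quick way to judge which mode suits a given document.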
Steps to extract text from a PDF programmatically:
- Create a new C# console application project.
- Install the Syncfusion.Pdf.WinForms NuGet package as a reference to your .NET Framework application from NuGet.org.
- Include the following namespaces in the Program.cs file.
C#
using System.IO;
using System.Reflection;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;
VB.NET
Imports System.IO
Imports System.Reflection
Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing
- Call the ExtractText() method with its parameter set to true to perform layout-based text extraction on a page of the PDF document.
C#
//Extract text from first page
string extractedTexts = page.ExtractText(true);
- The following C# and VB.NET code snippets show how to extract text from the PDF document.
C#
//Load an existing PDF
Assembly assembly = typeof(Program).GetTypeInfo().Assembly;
Stream fileStream = assembly.GetManifestResourceStream("ConsoleApplication.input.pdf");
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileStream);
//Load first page
PdfPageBase page = loadedDocument.Pages[0];
//Extract text from first page
string extractedTexts = page.ExtractText(true);
//Close the document
loadedDocument.Close(true);
VB.NET
'Load an existing PDF
Dim assembly As Assembly = GetType(Program).GetTypeInfo().Assembly
Dim fileStream As Stream = assembly.GetManifestResourceStream("ConsoleApplication.input.pdf")
Dim loadedDocument As PdfLoadedDocument = New PdfLoadedDocument(fileStream)
'Load first page
Dim page As PdfPageBase = loadedDocument.Pages(0)
'Extract text from first page
Dim extractedTexts As String = page.ExtractText(True)
'Close the document
loadedDocument.Close(True)
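The snippets above read only the first page. To collect text from every page, the same calls can be placed in a loop. The following is a sketch under stated assumptions rather than part of the original sample: the ExtractAllText helper and the input.pdf path are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

class ExtractAllPages
{
    // Hypothetical helper: concatenates the layout-based text of every page.
    static string ExtractAllText(Stream pdfStream)
    {
        PdfLoadedDocument loadedDocument = new PdfLoadedDocument(pdfStream);
        try
        {
            StringBuilder builder = new StringBuilder();
            for (int i = 0; i < loadedDocument.Pages.Count; i++)
            {
                PdfPageBase page = loadedDocument.Pages[i];
                // One page's text per iteration, separated by a line break.
                builder.AppendLine(page.ExtractText(true));
            }
            return builder.ToString();
        }
        finally
        {
            // Close the document even if extraction throws.
            loadedDocument.Close(true);
        }
    }

    static void Main()
    {
        // "input.pdf" is a placeholder path used only for illustration.
        using (FileStream fileStream = File.OpenRead("input.pdf"))
        {
            Console.WriteLine(ExtractAllText(fileStream));
        }
    }
}
```

Reading from a file stream instead of an embedded resource keeps the sketch independent of the project's resource names.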
A complete working sample can be downloaded from Extract-Text-from-PDF-File.zip.
The input PDF document is as follows.
By executing the program, you will get the extracted text as in the following console window.
Refer to the documentation for details on basic and layout-based text extraction with Essential PDF. It also briefly covers OCR processing and image extraction, with code examples.
Refer here to explore the rich set of Syncfusion Essential PDF features.
An online sample that extracts text from a PDF document is also available.
Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, you must include a license key in your projects. Refer to this link to learn how to generate and register a Syncfusion license key in your application so that the components can be used without the trial message.
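As a rough illustration of the registration step, the key is typically registered once at application startup, before any Syncfusion component is used. The sketch below assumes the Syncfusion.Licensing assembly is referenced by the project; the key string is a placeholder that must be replaced with a key generated for your account.

```csharp
using Syncfusion.Licensing;

class Program
{
    static void Main()
    {
        // Placeholder value: substitute the license key generated for your account.
        SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        // The PDF loading and text extraction code runs after registration.
    }
}
```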