
How to extract text from a PDF file in C#, VB.NET?


Syncfusion Essential PDF is a .NET PDF library for creating, reading, and editing PDF documents. Using this library, you can extract text from a PDF document.

Essential PDF supports basic text extraction and layout-based extraction.

Steps to extract text from a PDF programmatically:

  1. Create a new C# console application project in Visual Studio.
  2. Install the Syncfusion.Pdf.WinForms NuGet package from NuGet.org as a reference in your .NET Framework application.
  3. Include the following namespaces in the Program.cs file.

C#

using System.IO;          // Stream
using System.Reflection;  // Assembly, GetTypeInfo()
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

 

VB.NET

Imports System.IO
Imports System.Reflection
Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing

 

  4. Call ExtractText() with the argument true to perform layout-based text extraction on a page of the PDF document.

C#

//Extract text from first page
string extractedTexts = page.ExtractText(true);

 

  5. The following C# and VB.NET code snippets show how to extract text from the first page of a PDF document.

C#

//Load an existing PDF
Assembly assembly = typeof(Program).GetTypeInfo().Assembly;
Stream fileStream = assembly.GetManifestResourceStream("ConsoleApplication.input.pdf");
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileStream);
 
//Load first page
PdfPageBase page = loadedDocument.Pages[0];
 
//Extract text from first page
string extractedTexts = page.ExtractText(true);
 
//Close the document
loadedDocument.Close(true);

 

VB.NET

'Load an existing PDF
Dim assembly As Assembly = GetType(Program).GetTypeInfo().Assembly
Dim fileStream As Stream = assembly.GetManifestResourceStream("ConsoleApplication.input.pdf")
Dim loadedDocument As PdfLoadedDocument = New PdfLoadedDocument(fileStream)
 
'Load first page
Dim page As PdfPageBase = loadedDocument.Pages(0)
 
'Extract text from first page
Dim extractedTexts As String = page.ExtractText(True)
 
'Close the document
loadedDocument.Close(True)
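
To compare the two extraction modes mentioned at the start of this article, the following C# sketch runs both on the same page. It is only an illustration under stated assumptions: the file name input.pdf and the class name ExtractionModes are hypothetical, and the document is opened from a file stream rather than from an embedded resource.

```csharp
using System;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

class ExtractionModes
{
    static void Main()
    {
        // Assumption: "input.pdf" is a hypothetical file in the working directory.
        using (FileStream fileStream = File.OpenRead("input.pdf"))
        {
            PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileStream);

            // Load the first page
            PdfPageBase page = loadedDocument.Pages[0];

            // Basic extraction: call ExtractText() without arguments
            string basicText = page.ExtractText();

            // Layout-based extraction: pass true, as in step 4
            string layoutText = page.ExtractText(true);

            // Close the document once both results are available
            loadedDocument.Close(true);

            Console.WriteLine("--- Basic extraction ---");
            Console.WriteLine(basicText);
            Console.WriteLine("--- Layout-based extraction ---");
            Console.WriteLine(layoutText);
        }
    }
}
```

Printing both results side by side makes it easier to judge which mode suits a given document.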

  

A complete working sample can be downloaded from Extract-Text-from-PDF-File.zip.

The input PDF document is as follows.

[Image: Input PDF text to be extracted]

By executing the program, you will get the extracted text as in the following console window.

[Image: Text extracted from PDF output]
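
The snippets in step 5 store the extracted text in a variable but do not print it, and they read only the first page. As a minimal sketch, assuming a hypothetical input.pdf in the working directory and a hypothetical class name ExtractAllPages, the text of every page could be collected and written to the console like this:

```csharp
using System;
using System.IO;
using System.Text;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

class ExtractAllPages
{
    static void Main()
    {
        // Assumption: "input.pdf" is a hypothetical file in the working directory.
        using (FileStream fileStream = File.OpenRead("input.pdf"))
        {
            PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileStream);
            StringBuilder allText = new StringBuilder();

            // Visit each page in order and append its layout-based text
            for (int i = 0; i < loadedDocument.Pages.Count; i++)
            {
                PdfPageBase page = loadedDocument.Pages[i];
                allText.AppendLine(page.ExtractText(true));
            }

            // Close the document
            loadedDocument.Close(true);

            // Write the collected text to the console window
            Console.WriteLine(allText.ToString());
        }
    }
}
```

A StringBuilder is used here so that long documents do not create a new string for every page.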

Refer to the documentation for details on basic and layout-based text extraction with Essential PDF. It also briefly covers OCR processing and image extraction, with code examples.

Refer here to explore the rich set of Syncfusion Essential PDF features.

An online sample that extracts text from a PDF document is also available.

Note:

Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, you must include a license key in your projects. Refer to the link to learn about generating and registering a Syncfusion license key in your application to use the components without a trial message.
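
As a rough sketch of what registration can look like in a console application, the key is registered once at startup, before any Syncfusion component is used. The string "YOUR LICENSE KEY" is a placeholder, not a real key, and the example assumes the Syncfusion.Licensing assembly is referenced by the project.

```csharp
using Syncfusion.Licensing;

class Program
{
    static void Main()
    {
        // Placeholder: replace with the key generated for your account.
        SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        // Load the PDF and extract text after this point.
    }
}
```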

 
