How to extract text from a PowerPoint presentation?
Syncfusion® Presentation is a .NET PowerPoint Library used to create, read, and edit PowerPoint files and to convert them to PDF and images programmatically, without Microsoft Office or interop dependencies. Using this library, you can extract text from a PPTX file in .NET using C#.
Steps to extract text from a PPTX file in .NET using C#:
1. Create a new C# .NET Core console application.
2. Install the Syncfusion.Presentation.Net.Core NuGet package as a reference to your .NET Core application from NuGet.org.
- The Syncfusion.Compression library is a dependency of the Presentation NuGet package. When you install the Presentation package, NuGet installs the Compression library automatically.
- Starting with v16.2.0.x, if you reference Syncfusion® assemblies from a trial setup or from the NuGet feed, you must include a license key in your projects. Refer to the Syncfusion® licensing documentation to learn how to generate and register the license key in your application so that the components run without a trial message.
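As a rough illustration of the licensing note above, the key is typically registered once at application startup, before any Presentation API is called. The sketch below assumes the Syncfusion.Licensing assembly is available to the project, and the string "YOUR LICENSE KEY" is a placeholder, not a real key.

```csharp
using Syncfusion.Licensing;

class Program
{
    static void Main()
    {
        // Placeholder value (assumption): replace with the key generated for your account.
        SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        // Open the presentation and extract text after this point.
    }
}
```

Registering the key in one place at startup keeps it out of the extraction code itself.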
In a PowerPoint presentation, most visible text is associated with shapes, including auto-shapes and placeholders. However, text may also appear inside tables, SmartArt, notes pages, layout slides, and master slides, each of which requires separate handling when extracting text programmatically. Use the following code sample to extract text from a PowerPoint presentation.
3. Include the following namespaces in the Program.cs file:
using System.Text;
using Syncfusion.Presentation;
4. Use the following code to load the PowerPoint file, iterate through all slides, extract text from each slide's components, and write the extracted text to a text file.
// Load the PowerPoint presentation
IPresentation presentation = Presentation.Open("Sample.pptx");
// Text collection to store the extracted text
StringBuilder textBuilder = new StringBuilder();
// Extract text from all slides
for (int i = 0; i < presentation.Slides.Count; i++)
{
    ISlide slide = presentation.Slides[i];
    textBuilder.AppendLine($"--- Slide {i + 1} ---");
    // Extract text from all shapes in the slide
    ExtractText(slide.Shapes as IShapes, textBuilder);
    // Extract text from the slide notes body
    if (slide.NotesSlide?.NotesTextBody != null)
    {
        foreach (IParagraph paragraph in slide.NotesSlide.NotesTextBody.Paragraphs)
        {
            textBuilder.AppendLine(paragraph.Text);
        }
    }
    // Extract text from the slide notes shapes
    if (slide.NotesSlide?.Shapes != null)
    {
        ExtractText(slide.NotesSlide.Shapes as IShapes, textBuilder);
    }
    // Extract text from the layout slide shapes
    if (slide.LayoutSlide?.Shapes != null)
    {
        ExtractText(slide.LayoutSlide.Shapes as IShapes, textBuilder, true);
        // Extract text from the master slide shapes
        if (slide.LayoutSlide.MasterSlide?.Shapes != null)
        {
            ExtractText(slide.LayoutSlide.MasterSlide.Shapes as IShapes, textBuilder, true);
        }
    }
    textBuilder.AppendLine();
}
string extractedText = textBuilder.ToString();
// Write the text collection to a text file
System.IO.File.WriteAllText("Sample.txt", extractedText);
// Close the presentation to release its resources
presentation.Close();
5. Add the following helper methods, which identify each shape's type and route it to the appropriate text-extraction logic, recursing into group shapes and SmartArt child nodes.
// Routes each shape to the extractor that matches its type
private static void ExtractText(IShapes shapes, StringBuilder textBuilder, bool ignorePlaceHolder = false)
{
    foreach (IShape shape in shapes)
    {
        if (shape is ITable)
            ExtractTextInTable(shape, textBuilder);
        else if (shape is ISmartArt)
            ExtractTextInSmartArt(shape, textBuilder);
        else if (shape is IGroupShape)
            // Recurse into the shapes nested inside the group
            ExtractText((shape as IGroupShape).Shapes, textBuilder, ignorePlaceHolder);
        else
            ExtractTextInShape(shape, textBuilder, ignorePlaceHolder);
    }
}
// Extracts text from every top-level node of a SmartArt diagram
private static void ExtractTextInSmartArt(IShape shape, StringBuilder textBuilder)
{
    ISmartArt smartArt = shape as ISmartArt;
    if (smartArt == null)
        return;
    foreach (ISmartArtNode node in smartArt.Nodes)
    {
        ExtractTextInSmartArtNode(node, textBuilder);
    }
}
// Extracts paragraph text from a shape, optionally skipping placeholders
private static void ExtractTextInShape(IShape shape, StringBuilder textBuilder, bool ignorePlaceHolder)
{
    if (shape.TextBody == null || (ignorePlaceHolder && (shape as ISlideItem).SlideItemType == SlideItemType.Placeholder))
        return;
    foreach (IParagraph paragraph in shape.TextBody.Paragraphs)
    {
        textBuilder.AppendLine(paragraph.Text);
    }
}
// Extracts the text of each cell in a table, row by row
private static void ExtractTextInTable(IShape shape, StringBuilder textBuilder)
{
    ITable table = shape as ITable;
    if (table == null)
        return;
    foreach (IRow row in table.Rows)
    {
        foreach (ICell cell in row.Cells)
        {
            textBuilder.AppendLine(cell.TextBody.Text);
        }
    }
}
// Extracts text from a SmartArt node and all of its descendants
private static void ExtractTextInSmartArtNode(ISmartArtNode node, StringBuilder textBuilder)
{
    if (node.TextBody != null)
    {
        foreach (IParagraph paragraph in node.TextBody.Paragraphs)
        {
            textBuilder.AppendLine(paragraph.Text);
        }
    }
    // Recursively extract text from child nodes
    foreach (ISmartArtNode childNode in node.ChildNodes)
    {
        ExtractTextInSmartArtNode(childNode, textBuilder);
    }
}
You can download the sample here.
Take a moment to peruse the documentation, where you can find basic presentation processing options along with features such as cloning and merging slides, encrypting and decrypting PowerPoint presentations, and, most importantly, PDF and image conversion, all with code examples.
Explore more about the rich set of Syncfusion® PowerPoint Framework features.
Conclusion
I hope you enjoyed learning how to extract text from a PPTX file in .NET Core.
You can refer to our .NET PowerPoint Library feature tour page to learn about its other features, and to the documentation for getting-started guidance and configuration details.
If you are a current customer, you can check out our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our other controls.
If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!