How to Extract Text from Images in .NET Core PowerPoint Using C#?

Syncfusion PowerPoint is a .NET Core PowerPoint library used to create, read, and edit PowerPoint presentations programmatically without Microsoft PowerPoint or interop dependencies. Using the Syncfusion Presentation library together with the Syncfusion OCR processor library, you can extract text from the images in a PowerPoint presentation using C#.

Steps to extract text from the images in a PowerPoint presentation using C#

Step 1: Create a new .NET console application project.

Step 2: Install the Syncfusion.Presentation.Net.Core and Syncfusion.PDF.OCR.Net.Core NuGet packages as references to your project from NuGet.org.

Note: Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, you must include a license key in your projects. Refer to the Syncfusion licensing documentation to learn how to generate and register a Syncfusion license key in your application so that the components run without a trial message.
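
As a minimal sketch of the registration mentioned in the note: the key is usually registered once at application startup, before any Syncfusion type is used. The `LicenseSetup` class name is hypothetical and the key string is a placeholder. The sketch assumes the Syncfusion.Licensing assembly is available to the project, which is normally the case when the packages from Step 2 are installed.

```csharp
using Syncfusion.Licensing;

internal static class LicenseSetup
{
    // Hypothetical helper: call once at startup, before opening a presentation
    // or creating an OCR processor.
    public static void Register()
    {
        // "YOUR LICENSE KEY" is a placeholder; replace it with the key
        // generated for your Syncfusion account.
        SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");
    }
}
```

Calling `LicenseSetup.Register()` as the first statement of `Main` keeps the registration in one place.
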
Step 3: Include the following namespaces in the Program.cs file.
using Syncfusion.Presentation;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Graphics;
using System.Collections.Generic;
using System.IO;

Step 4: Include the following code snippet in Program.cs to get the images from the PowerPoint presentation and add them to a list of memory streams.

//Open the existing PowerPoint presentation.
using (IPresentation pptxDoc = Presentation.Open(@"../../../Template.pptx"))
{
    List<MemoryStream> pictureStreamList = new List<MemoryStream>();
    //Retrieve each slide from the presentation.
    foreach (ISlide slide in pptxDoc.Slides) 
    {
        //Retrieve all the pictures from the slide.
        IPictures pictures = slide.Pictures;
        foreach (IPicture picture in pictures)
        {
            pictureStreamList.Add(new MemoryStream(picture.ImageData));
        }
    }
    //Extract text from images using OCR processor.
    ExtractTextFromImages(pictureStreamList);
}

Step 5: Include the following helper method in Program.cs to extract the text from each image stream.

/// <summary>
/// Extracts text from images using the OCR processor.
/// </summary>
/// <param name="pictureStreamList">List of picture streams.</param>
private static void ExtractTextFromImages(List<MemoryStream> pictureStreamList)
{
    //The tessdata folder inside the bin folder contains the language data files.
    string tessdataPath = Path.GetFullPath(@"runtimes/tessdata");
    int i = 1;
    //Get each picture and extract its text.
    foreach (MemoryStream imgStream in pictureStreamList)
    {
        //Initialize the OCR processor.
        using (OCRProcessor processor = new OCRProcessor())
        //Open the Unicode font file read-only; the stream is disposed after the OCR completes.
        using (FileStream fontStream = new FileStream(Path.GetFullPath("../../../ARIALUNI.ttf"), FileMode.Open, FileAccess.Read))
        {
            //Set the OCR language to process.
            processor.Settings.Language = Languages.English;

            //Set a Unicode font to preserve the Unicode characters.
            processor.UnicodeFont = new PdfTrueTypeFont(fontStream, 8);

            //Perform the OCR process for the image stream.
            string ocrText = processor.PerformOCR(imgStream, tessdataPath);

            //Write the recognized text to a text file (appends if the file already exists).
            using (StreamWriter writer = new StreamWriter(Path.GetFullPath(@"../../../OCRText_" + i + ".txt"), true))
            {
                writer.WriteLine(ocrText);
            }
        }
        //Dispose the image stream.
        imgStream.Dispose();
        i++;
    }
}
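
The helper above depends on the tessdata folder containing the language data for the selected OCR language. The following is a minimal sketch of a check you could run before starting the OCR loop. The `OcrPreflight` class and the `HasLanguageData` method are hypothetical names, not part of the Syncfusion API, and the sketch assumes the usual Tesseract naming convention in which English data is stored as `eng.traineddata`.

```csharp
using System;
using System.IO;

internal static class OcrPreflight
{
    // Hypothetical helper (not a Syncfusion API): returns true when the
    // tessdata folder contains the trained data file for the given
    // Tesseract language code, for example "eng" for English.
    public static bool HasLanguageData(string tessdataPath, string languageCode)
    {
        if (string.IsNullOrWhiteSpace(tessdataPath) || string.IsNullOrWhiteSpace(languageCode))
        {
            return false;
        }

        // Assumption: language files follow the "<code>.traineddata" convention.
        string dataFile = Path.Combine(tessdataPath, languageCode + ".traineddata");
        return File.Exists(dataFile);
    }
}
```

For example, calling `OcrPreflight.HasLanguageData(tessdataPath, "eng")` before the `foreach` loop lets the application report a missing language file up front rather than partway through processing.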

A complete working sample to extract the text from the images in PowerPoint presentation using C# can be downloaded from GitHub.

Conclusion

I hope you enjoyed learning about how to extract text from images in .NET Core PowerPoint using C#.

You can refer to our .NET PowerPoint feature tour page to learn about its other features and documentation, and how to quickly get started. You can also explore our .NET PowerPoint examples to understand how to create and manipulate presentations.
If you are a current customer, you can check out our components from the License and Downloads page. If you are new to Syncfusion, you can try our 30-day free trial to check out our other controls.
If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
