How to extract text from a PowerPoint presentation?
In a PowerPoint presentation, text is always associated with shapes. Text can be added to, modified in, and extracted from auto-shapes such as text boxes, rectangles, ovals, and partial circles. The following code sample extracts the text from every shape and table cell in a presentation and writes it to a text file.
//Load the PowerPoint presentation
IPresentation presentation = Presentation.Open("Sample.pptx");

//Text collection to store the extracted text
List<string> textCollection = new List<string>();

//Iterate over each slide in the presentation
foreach (ISlide slide in presentation.Slides)
{
    //Iterate over all the shapes in the slide to get the text
    foreach (IShape shape in slide.Shapes)
    {
        //Check whether the shape is a table
        if (shape is ITable)
        {
            ITable table = shape as ITable;
            //Iterate over all the cells in the table and get the text
            foreach (IRow row in table.Rows)
            {
                foreach (ICell cell in row.Cells)
                {
                    //Get the text from the cell body
                    string text = cell.TextBody.Text;
                    //Add the extracted text to the string collection
                    textCollection.Add(text);
                }
            }
        }
        else
        {
            //Iterate over all the paragraphs in the shape and get the text
            foreach (IParagraph paragraph in shape.TextBody.Paragraphs)
            {
                foreach (ITextPart textpart in paragraph.TextParts)
                {
                    //Get the text from the text part
                    string text = textpart.Text;
                    //Add the extracted text to the string collection
                    textCollection.Add(text);
                }
            }
        }
    }
}

//Write the text collection to a text file
System.IO.File.WriteAllLines("Sample.txt", textCollection);

//Close the presentation instance
presentation.Close();
You can download the sample here.