
How to extract text from a PDF document based on the text color?


The PDF component does not directly support extracting text from a PDF document based on the color of the text. However, this can be achieved with the ExtractText method overload that returns the text along with its format details: extract the text of each page together with its format data, then keep only the entries whose font color matches the required color.

Refer to the following code snippet.

PdfLoadedDocument pdf;

private void Form1_Load(object sender, System.EventArgs e)
{
    //Loads the PDF document
    pdf = new PdfLoadedDocument(@"Succinctly.pdf");
    textBox1.Text = "Red";
}

private void button1_Click(object sender, EventArgs e)
{
    List<TextData> TextFormat = new List<TextData>();
    string text = null;
    //Gets the color by using the name of the color
    Color color = Color.FromName(textBox1.Text);
    //Color.FromName does not throw for unknown names, so validate explicitly
    if (!color.IsKnownColor)
    {
        MessageBox.Show("Enter a valid color name");
        return;
    }
    for (int i = 0; i < pdf.Pages.Count; i++)
    {
        //Gets the PDF page
        PdfPageBase page = pdf.Pages[i];
        //Extracts text with its format
        string pageTexts = page.ExtractText(out TextFormat);
        for (int j = 0; j < TextFormat.Count; j++)
        {
            //Compares ARGB values so that named and unnamed colors can match
            if (TextFormat[j].FontColor.ToArgb() == color.ToArgb())
                text += TextFormat[j].Text;
        }
    }
    if (text != null)
        MessageBox.Show(text);
    else
        MessageBox.Show("The PDF document does not contain " + textBox1.Text + " color text");
}
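The color comparison in the snippet goes through ToArgb() rather than the == operator, and the color name typed by the user has to be validated separately. The following stand-alone console sketch illustrates both points using only System.Drawing.Color. It is not part of the PDF component; the ColorMatch class and its SameArgb and TryParseColorName helpers are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Drawing;

// Hypothetical helper class, used only for this illustration.
public static class ColorMatch
{
    // True when both colors have identical alpha, red, green, and blue channels.
    public static bool SameArgb(Color a, Color b)
    {
        return a.ToArgb() == b.ToArgb();
    }

    // Resolves a color name. Color.FromName does not throw for unknown
    // names, so IsKnownColor is used to detect them.
    public static bool TryParseColorName(string name, out Color color)
    {
        color = Color.FromName(name);
        return color.IsKnownColor;
    }

    public static void Main()
    {
        Color named = Color.FromName("Red");         // named (known) color
        Color raw = Color.FromArgb(255, 255, 0, 0);  // same channels, no name

        // The == operator also compares the name/known-color state.
        Console.WriteLine(named == raw);             // False
        // Comparing ARGB values ignores how the color was created.
        Console.WriteLine(SameArgb(named, raw));     // True

        Color parsed;
        Console.WriteLine(TryParseColorName("NotAColor", out parsed)); // False
    }
}
```

Because a color read from a document is unlikely to carry a .NET color name, comparing ARGB values is the safer choice when matching it against a named color such as Red. Note that this is an exact match: text drawn in a slightly different shade of red will not be picked up.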

