
How to Extract Individual Data in WinForms PDF Viewer?


You can extract individual questions and answers from a PDF document in WinForms PDF Viewer by calling the `ExtractText` method of `PdfViewerControl` and then parsing the extracted text, using a predefined set of markers to tell questions and answers apart.

For example, if the PDF document lays out its questions and answers as shown in the following screenshot, you can identify a question by checking whether the line starts with a numeric value, and an answer by checking whether the line starts with the term “Ans”.

Sample question and answer

 

Follow these steps to extract the questions and answers:

Steps to extract individual questions and answers from a PDF document

 

Step 1: Extract text from the PDF document using `PdfViewerControl`.

C#

string fileText = string.Empty;
 
//Initialize PdfViewerControl
PdfViewerControl pdfViewerControl = new PdfViewerControl();
// Load PDF document.
pdfViewerControl.Load(@"../../Data/sample.pdf");
 
//Extract text from the document
List<TextData> textData = new List<TextData>();
for (int i = 0; i < pdfViewerControl.PageCount; i++)
{   
    //Get text from a particular page at the index `i` 
    string text = pdfViewerControl.ExtractText(i, out textData);
    //Add new line for next page.
    fileText += "\n" + text;
}
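
Steps 2 and 3 work on one line of text at a time, so the extracted `fileText` has to be broken into lines before it is parsed. The following is a minimal sketch of a reusable helper for that, assuming the extracted text separates lines with `\n` or `\r\n`. The `TextLines` class and `SplitLines` method are hypothetical names used for illustration; they are not part of the Syncfusion API.

```csharp
using System;
using System.Collections.Generic;

static class TextLines
{
    // Splits extracted text into trimmed, non-empty lines.
    // Assumption: lines are separated by "\n" or "\r\n".
    public static List<string> SplitLines(string fileText)
    {
        var lines = new List<string>();
        char[] separators = { '\r', '\n' };
        foreach (string raw in fileText.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            string line = raw.Trim();
            // Skip lines that contained only whitespace
            if (line.Length > 0)
                lines.Add(line);
        }
        return lines;
    }
}
```

Calling `TextLines.SplitLines(fileText)` after the loop in Step 1 would give the list of lines that the question and answer checks can then iterate over.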

 

Step 2: Collect questions from the extracted text.

C#

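private void Form1_Load(object sender, System.EventArgs e)
{
    // Assumes fileText holds the text extracted in Step 1, and that
    // questionStartIndex (defined elsewhere in the sample) is the offset
    // from the "." to the first character of the question text.
    char[] newLines = { '\r', '\n' };
    foreach (string line in fileText.Split(newLines, System.StringSplitOptions.RemoveEmptyEntries))
    {
        string text = line.Trim();
        int questionNumber;
        // Check whether the line of text starts with a numeric value
        if (text.Length > 0 && char.IsDigit(text[0]))
        {
            // The question number is everything before the first "."
            int dotIndex = text.IndexOf('.');
            if (dotIndex > 0 && int.TryParse(text.Substring(0, dotIndex), out questionNumber))
            {
                // Add the text after the number to the question collection list
                QuestionCollection.Add(text.Substring(dotIndex + questionStartIndex));
            }
        }
    }
}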

 

Step 3: Collect answers from the extracted text.

C#

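private void Form1_Load(object sender, System.EventArgs e)
{
    // Assumes fileText holds the text from Step 1; answerStartIndex (defined
    // elsewhere in the sample) is where the answer begins after "Ans.".
    char[] newLines = { '\r', '\n' };
    foreach (string line in fileText.Split(newLines, System.StringSplitOptions.RemoveEmptyEntries))
    {
        string text = line.Trim();
        // Check whether the line of text starts with "Ans."
        if (text.StartsWith("Ans.", System.StringComparison.Ordinal))
            // Add the line of text to the answer collection list
            AnswerCollection.Add(text.Substring(answerStartIndex));
    }
}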
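
Collecting questions and answers into two separate lists loses the link between them if a question has no answer line. The following sketch pairs them up in a single pass instead, assuming each “Ans.” line belongs to the most recent numbered line before it. The `QaPairer` class and `Pair` method are hypothetical names and are not part of the linked sample.

```csharp
using System;
using System.Collections.Generic;

static class QaPairer
{
    // Returns (question, answer) pairs in document order.
    // A question with no "Ans." line after it gets an empty answer.
    public static List<KeyValuePair<string, string>> Pair(IEnumerable<string> lines)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        string pendingQuestion = null;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            int dot = line.IndexOf('.');
            int number;
            if (dot > 0 && char.IsDigit(line[0]) && int.TryParse(line.Substring(0, dot), out number))
            {
                // A new question starts: flush the previous one if it was never answered
                if (pendingQuestion != null)
                    pairs.Add(new KeyValuePair<string, string>(pendingQuestion, string.Empty));
                pendingQuestion = line.Substring(dot + 1).Trim();
            }
            else if (pendingQuestion != null && line.StartsWith("Ans.", StringComparison.Ordinal))
            {
                // Attach the answer text (after the 4-character "Ans." prefix) to the open question
                pairs.Add(new KeyValuePair<string, string>(pendingQuestion, line.Substring(4).Trim()));
                pendingQuestion = null;
            }
        }
        // Flush a trailing question without an answer
        if (pendingQuestion != null)
            pairs.Add(new KeyValuePair<string, string>(pendingQuestion, string.Empty));
        return pairs;
    }
}
```

Lines that are neither a numbered question nor an “Ans.” line are ignored in this sketch, so multi-line questions or answers would need extra handling.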

 

Note:

The sample uses a PDF document with the simple structure described above. If your PDF document is structured differently, you need to adjust the sample to match that structure.
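
One way to make such adjustments easier is to keep the question and answer markers in configurable patterns rather than hard-coding them. The following is a sketch only, swapping the sample's index-based string handling for regular expressions. The `QaPatterns` class is a hypothetical helper, and each pattern is assumed to capture the wanted text in a group named `text`.

```csharp
using System;
using System.Text.RegularExpressions;

sealed class QaPatterns
{
    private readonly Regex question;
    private readonly Regex answer;

    // Each pattern must capture the text of interest in a group named "text".
    public QaPatterns(string questionPattern, string answerPattern)
    {
        question = new Regex(questionPattern);
        answer = new Regex(answerPattern);
    }

    // Patterns for the layout used in this article: "1. question" and "Ans. answer".
    public static QaPatterns NumberedWithAns()
    {
        return new QaPatterns(@"^\s*\d+\.\s*(?<text>.+)$", @"^\s*Ans\.\s*(?<text>.+)$");
    }

    // Returns the question text, or null if the line is not a question.
    public string MatchQuestion(string line) { return Extract(question, line); }

    // Returns the answer text, or null if the line is not an answer.
    public string MatchAnswer(string line) { return Extract(answer, line); }

    private static string Extract(Regex regex, string line)
    {
        Match match = regex.Match(line);
        return match.Success ? match.Groups["text"].Value.Trim() : null;
    }
}
```

For a document that writes questions as `Q7) ...` and answers as `Answer: ...`, for instance, only the two pattern strings passed to the constructor would change, such as `@"^\s*Q\d+\)\s*(?<text>.+)$"` and `@"^\s*Answer:\s*(?<text>.+)$"`.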

 

Refer to the following sample link for the complete code snippet.

ExtractQuestionsAndAnswers

See Also:

Extract text from the predefined rectangle

 

Conclusion

I hope you enjoyed learning about how to extract individual data in WinForms PDF Viewer.

You can refer to our WinForms PDF Viewer feature tour page to learn about its other features, and to the documentation to get started quickly. You can also explore our WinForms PDF Viewer example to understand how to create and manipulate data.

If you are a current customer, you can check out our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our other controls.

If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
