How to Extract Individual Data in WinForms PDF Viewer?
You can extract individual questions and answers in WinForms PDF Viewer by calling the `ExtractText` method of `PdfViewerControl` and then applying string manipulation to the extracted text, using a predefined set of terms that identify questions and answers.
For example, if a PDF document lists questions and answers in the structure illustrated in the following screenshot, you can identify a question by checking whether the line starts with a numeric value followed by a period, and an answer by checking for the term “Ans.” at the beginning of the line.
The following steps show how to do this:
Steps to extract individual questions and answers from a PDF document
Step 1: Extract text from the PDF document using `PdfViewerControl`.
C#
    string fileText = string.Empty;

    // Initialize PdfViewerControl.
    PdfViewerControl pdfViewerControl = new PdfViewerControl();

    // Load the PDF document.
    pdfViewerControl.Load(@"../../Data/sample.pdf");

    // Extract text from the document, one page at a time.
    List<TextData> textData = new List<TextData>();
    for (int i = 0; i < pdfViewerControl.PageCount; i++)
    {
        // Get the text from the page at index `i`.
        string text = pdfViewerControl.ExtractText(i, out textData);

        // Start each page on a new line.
        fileText += "\n" + text;
    }
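Steps 2 and 3 work on one line of text at a time, so the accumulated `fileText` has to be split into lines first. The following is a minimal sketch of that intermediate step, assuming the extracted text separates lines with `\n` or `\r\n`. The `TextLines` class and the `SplitIntoLines` method are hypothetical names used for illustration and are not part of the Syncfusion API.

```csharp
using System;
using System.Collections.Generic;

public static class TextLines
{
    // Hypothetical helper: splits extracted text into trimmed, non-empty lines.
    public static List<string> SplitIntoLines(string fileText)
    {
        var lines = new List<string>();
        if (string.IsNullOrEmpty(fileText))
            return lines;

        // Assumption: lines are separated by "\r\n" or "\n".
        string[] parts = fileText.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string part in parts)
        {
            // Drop surrounding whitespace and skip lines that were only whitespace.
            string line = part.Trim();
            if (line.Length > 0)
                lines.Add(line);
        }
        return lines;
    }
}
```

Each element of the returned list can then be passed, one at a time, to the question and answer checks in the next steps.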
Step 2: Collect questions from the extracted text.
C#
    private void Form1_Load(object sender, System.EventArgs e)
    {
        // `text` is one line of the extracted text. `questionStartIndex` and
        // `QuestionCollection` are assumed to be declared elsewhere in the form
        // (they are not shown in this snippet).
        int questionNumber;

        // Check whether the line of text starts with a numeric value.
        if (text.Length > 0 && int.TryParse(text[0].ToString(), out questionNumber))
        {
            for (int i = 0; i < text.Length; i++)
            {
                if (text[i] == '.')
                {
                    // Add the line to the question collection list when
                    // everything before the period is a number.
                    if (int.TryParse(text.Substring(0, i), out questionNumber))
                        QuestionCollection.Add(text.Substring(i + questionStartIndex));

                    // Only the first period can end the question number.
                    break;
                }
            }
        }
    }
Step 3: Collect answers from the extracted text.
C#
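    private void Form1_Load(object sender, System.EventArgs e)
    {
        // `answer` is assumed to hold the leading characters of the current line.
        // `text`, `answerStartIndex`, and `AnswerCollection` are assumed to be
        // declared elsewhere in the form (they are not shown in this snippet).

        // Check whether the line of text starts with "Ans."
        if (answer == "Ans.")
        {
            // Add the line of text to the answer collection list.
            AnswerCollection.Add(text.Substring(answerStartIndex));
        }
    }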
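Taken together, Steps 2 and 3 amount to classifying each line as a question, an answer, or neither. The sketch below shows that logic on plain strings so it can be tried without a PDF document. `QaLineParser` and its members are hypothetical names, and the code assumes that questions look like `1. text` and answers look like `Ans. text`, as in the sample document.

```csharp
using System;
using System.Collections.Generic;

public static class QaLineParser
{
    // Assumption: answers start with this exact prefix.
    private const string AnswerPrefix = "Ans.";

    // Returns true and the question text when the line has the form "<digits>.<text>".
    public static bool TryGetQuestion(string line, out string question)
    {
        question = null;
        if (string.IsNullOrEmpty(line))
            return false;

        int dot = line.IndexOf('.');
        if (dot <= 0)
            return false;

        // Every character before the first period must be a digit.
        for (int i = 0; i < dot; i++)
        {
            if (!char.IsDigit(line[i]))
                return false;
        }

        question = line.Substring(dot + 1).Trim();
        return true;
    }

    // Returns true and the answer text when the line starts with "Ans.".
    public static bool TryGetAnswer(string line, out string answer)
    {
        answer = null;
        if (string.IsNullOrEmpty(line) ||
            !line.StartsWith(AnswerPrefix, StringComparison.Ordinal))
            return false;

        answer = line.Substring(AnswerPrefix.Length).Trim();
        return true;
    }

    // Sorts each line into the matching list; lines matching neither form are skipped.
    public static void Collect(IEnumerable<string> lines,
                               List<string> questions, List<string> answers)
    {
        foreach (string line in lines)
        {
            string value;
            if (TryGetQuestion(line, out value))
                questions.Add(value);
            else if (TryGetAnswer(line, out value))
                answers.Add(value);
        }
    }
}
```

Searching for the first period with `IndexOf` and trimming the remainder avoids the separate start-index constants, at the cost of assuming that the question number is always followed directly by a period.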
In the sample, we used a PDF document with the simple structure described above. If your PDF document is structured differently, you need to adjust the sample to match that structure.
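One way to limit the changes needed for a differently structured document is to keep the question and answer markers in regular expressions instead of hard-coded index arithmetic. The following is a sketch only, assuming each question and each answer still fits on a single line. The `QaPatterns` class and both default patterns are illustrative and would need adjusting to the actual document.

```csharp
using System.Text.RegularExpressions;

public sealed class QaPatterns
{
    private readonly Regex questionPattern;
    private readonly Regex answerPattern;

    // Each pattern is expected to capture the wanted text in a group named "text".
    public QaPatterns(string questionRegex, string answerRegex)
    {
        questionPattern = new Regex(questionRegex);
        answerPattern = new Regex(answerRegex);
    }

    // Patterns for the sample layout: "1. question" and "Ans. answer".
    public static QaPatterns SampleLayout()
    {
        return new QaPatterns(
            @"^\s*\d+\.\s*(?<text>.+)$",
            @"^\s*Ans\.\s*(?<text>.+)$");
    }

    // Returns the question text, or null when the line is not a question.
    public string MatchQuestion(string line)
    {
        return Extract(questionPattern, line);
    }

    // Returns the answer text, or null when the line is not an answer.
    public string MatchAnswer(string line)
    {
        return Extract(answerPattern, line);
    }

    private static string Extract(Regex pattern, string line)
    {
        Match match = pattern.Match(line ?? string.Empty);
        return match.Success ? match.Groups["text"].Value.Trim() : null;
    }
}
```

For instance, a document that marks questions as `Q1:` and answers as `A:` could be handled with `new QaPatterns(@"^Q\d+:\s*(?<text>.+)$", @"^A:\s*(?<text>.+)$")`, leaving the rest of the extraction code unchanged.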
Refer to the following sample link for the complete code snippet.
See Also:
Extract text from the predefined rectangle
I hope you enjoyed learning about how to extract individual data in WinForms PDF Viewer.
You can refer to our WinForms PDF Viewer feature tour page to learn about its other features, and to the documentation to get started quickly with configuration. You can also explore our WinForms PDF Viewer example to understand how to create and manipulate data.
If you are a current customer, you can check out our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our other controls.
If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!