How to convert an HTML document to plain text in C# and VB.NET?

Essential® DocIO converts HTML files to Word documents and vice versa. It can also convert an HTML document to plain text format and vice versa.

The Word library (DocIO) uses XmlReader to parse the content of the input HTML. The input HTML must therefore be well-formed XML (every opening tag needs a matching closing tag), even if you specify the XHTMLValidationType parameter as XHTMLValidationType.None.
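Because the input has to be well-formed XML, it can help to check a file before handing it to DocIO. The following is a minimal sketch that uses only System.Xml; the class and method names (HtmlWellFormedness, IsWellFormed) are hypothetical, and the reader settings are an assumption, since the article does not state which settings DocIO uses internally. Treat it as an approximation of the requirement, not as DocIO's own check.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of DocIO.
static class HtmlWellFormedness
{
    // Returns true when the markup parses as well-formed XML.
    // On failure, 'error' holds the parser's message.
    public static bool IsWellFormed(string html, out string error)
    {
        var settings = new XmlReaderSettings
        {
            // Assumption: skip the DOCTYPE so only the tag structure is checked.
            DtdProcessing = DtdProcessing.Ignore
        };

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(html), settings))
            {
                // Reading to the end forces the parser to visit every node.
                while (reader.Read()) { }
            }
            error = string.Empty;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

For example, markup containing an unclosed `<br>` inside a paragraph fails this check, while the same markup with `<br/>` passes.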

XHTML Validation

HTML content is validated against a document type definition (DTD), which is a set of markup declarations that define a document type for an SGML-family markup language (GML, SGML, XML, HTML).

XHTML validation types

The following XHTML validation types are supported in Essential® DocIO while importing HTML content.

XHTMLValidationType.None: Performs no schema validation, but the given HTML content must still meet the XHTML 1.0 format.

XHTMLValidationType.Transitional: Allows several attributes within the tags.

XHTMLValidationType.Strict: Does not allow attributes inside the tags.
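One way to use these validation types together is to try the strictest one first and relax it when loading fails. The sketch below relies only on the WordDocument constructor shown in this article. The helper name LoadWithFallback is hypothetical, and the code assumes that a validation failure surfaces as an exception from the constructor; the article does not name the exception type, so the sketch catches Exception broadly.

```csharp
using System;
using Syncfusion.DocIO;
using Syncfusion.DocIO.DLS;

// Hypothetical helper, not part of DocIO.
static class HtmlLoader
{
    // Tries Strict, then Transitional, then None, and returns the first
    // document that loads. Throws if all three attempts fail.
    public static WordDocument LoadWithFallback(string path)
    {
        XHTMLValidationType[] order =
        {
            XHTMLValidationType.Strict,
            XHTMLValidationType.Transitional,
            XHTMLValidationType.None
        };

        Exception lastError = null;
        foreach (XHTMLValidationType type in order)
        {
            try
            {
                return new WordDocument(path, FormatType.Html, type);
            }
            catch (Exception ex)
            {
                // Assumption: a failed validation throws from the constructor.
                lastError = ex;
            }
        }

        throw new InvalidOperationException(
            "The HTML file could not be loaded with any validation type.", lastError);
    }
}
```

If even XHTMLValidationType.None fails, the input is most likely not well-formed XML and needs to be fixed at the source.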


Steps to convert an HTML document to plain text in C#

  1. Create a new C# console application project.

  2. Install the Syncfusion.DocIO.WinForms NuGet package from NuGet.org as a reference in your .NET Framework application.

  3. Include the following namespaces in the Program.cs file:

C#

using Syncfusion.DocIO;
using Syncfusion.DocIO.DLS;

VB

Imports Syncfusion.DocIO
Imports Syncfusion.DocIO.DLS

  4. Use the following code to convert an HTML document to plain text.

C#

// Loads the HTML document with validation type None
WordDocument document = new WordDocument("Input.html", FormatType.Html, XHTMLValidationType.None);
// Saves the document as a plain text file
document.Save("HTMLtoText.txt", FormatType.Txt);
// Closes the document
document.Close();

VB

'Loads the HTML document with validation type None
Dim document As WordDocument = New WordDocument("Input.html", FormatType.Html, XHTMLValidationType.None)
'Saves the document as a plain text file
document.Save("HTMLtoText.txt", FormatType.Txt)
'Closes the document
document.Close()


A complete working example of converting an HTML document to plain text in C# is available for download.

The input HTML document is as follows:

[Figure: Input HTML document]

Executing the program produces the following plain text:

[Figure: Output text file]

Take a moment to peruse the documentation, where you can find basic Word document processing options along with features like mail merge, merging and splitting documents, finding and replacing text in a Word document, protecting Word documents, and, most importantly, PDF and image conversions, all with code examples.

Explore more about the rich set of Syncfusion® Word Framework features.

An online example to protect the Word document from editing using DocIO.

See Also:

Word to HTML and HTML to Word Conversions

Note:

Starting with v16.2.0.x, if you reference Syncfusion® assemblies from a trial setup or from the NuGet feed, include a license key in your projects. Refer to the link to learn about generating and registering a Syncfusion® license key in your application to use the components without a trial message.
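As a rough illustration of the registration step, the key is typically registered once at application startup, before any Syncfusion® component is used. The sketch below assumes the Syncfusion.Licensing package is referenced and uses a placeholder string in place of a real key; check the licensing documentation for the details that apply to your version.

```csharp
using Syncfusion.Licensing;

class Program
{
    static void Main()
    {
        // Placeholder: replace with the license key generated for your account and version.
        SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");

        // Load and convert documents after the key has been registered.
    }
}
```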


Conclusion
I hope you enjoyed learning how to convert an HTML document to plain text in C# and VB.NET.
You can refer to our WinForms Word feature tour page to learn about its other features, and to the documentation to get started quickly. You can also explore our WinForms PDF example to understand how to create and manipulate data.
Current customers can check out our components from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our other controls.
If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums or feedback portal. We are always happy to assist you!