How to remove HTML tags from an Excel document using C# and VB.NET?
The Syncfusion Excel (XlsIO) library is a .NET library used to create, read, and edit Excel documents. It can also convert Excel documents to PDF files.
This article explains how to remove HTML tags from an Excel document using C# and VB.NET.
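Include the following namespaces in the Program.cs file. System.IO is needed for FileStream and Path, and System.Text.RegularExpressions is needed for Regex.
C#
using System.IO;
using System.Text.RegularExpressions;
using Syncfusion.XlsIO;
VB.NET
Imports System.Text.RegularExpressions
Imports Syncfusion.XlsIO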
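Use the following code snippet to remove HTML tags from an Excel document. It iterates through every cell in the used range of the first worksheet and removes any text that matches the pattern <.*?>.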
C#
using (ExcelEngine excelEngine = new ExcelEngine())
{
    IApplication application = excelEngine.Excel;
    application.DefaultVersion = ExcelVersion.Xlsx;
    FileStream inputStream = new FileStream(Path.GetFullPath(@"Data/Input.xlsx"), FileMode.Open, FileAccess.Read);
    IWorkbook workbook = application.Workbooks.Open(inputStream);
    IWorksheet worksheet = workbook.Worksheets[0];

    // Get the used range
    IRange usedRange = worksheet.UsedRange;

    // Iterate through each cell in the used range
    for (int row = usedRange.Row; row <= usedRange.LastRow; row++)
    {
        for (int col = usedRange.Column; col <= usedRange.LastColumn; col++)
        {
            IRange cell = worksheet[row, col];
            string cellValue = cell.Value?.ToString();

            // Skip formula cells so that comparison operators such as < and > are not stripped,
            // then check whether the cell has a value that might contain HTML tags
            if (!cell.HasFormula && !string.IsNullOrEmpty(cellValue) && cellValue.Contains("<"))
            {
                // Remove HTML tags using regex
                string cleanValue = Regex.Replace(cellValue, "<.*?>", string.Empty);

                // Set the cleaned value back to the cell
                cell.Value = cleanValue;
            }
        }
    }

    #region Save
    // Saving the workbook
    FileStream outputStream = new FileStream(Path.GetFullPath("Output/Output.xlsx"), FileMode.Create, FileAccess.Write);
    workbook.SaveAs(outputStream);
    #endregion

    // Dispose streams
    outputStream.Dispose();
    inputStream.Dispose();
}
VB.NET
Using excelEngine As New ExcelEngine()
    Dim application As IApplication = excelEngine.Excel
    application.DefaultVersion = ExcelVersion.Xlsx
    Dim workbook As IWorkbook = application.Workbooks.Open("Input.xlsx")
    Dim worksheet As IWorksheet = workbook.Worksheets(0)

    ' Get the used range
    Dim usedRange As IRange = worksheet.UsedRange

    ' Iterate through each cell in the used range
    For row As Integer = usedRange.Row To usedRange.LastRow
        For col As Integer = usedRange.Column To usedRange.LastColumn
            Dim cell As IRange = worksheet(row, col)
            Dim cellValue As String = If(cell.Value, String.Empty)

            ' Skip formula cells, then check whether the cell might contain HTML tags
            If Not cell.HasFormula AndAlso Not String.IsNullOrEmpty(cellValue) AndAlso cellValue.Contains("<") Then
                ' Remove HTML tags using Regex
                Dim cleanValue As String = Regex.Replace(cellValue, "<.*?>", String.Empty)

                ' Set the cleaned value back to the cell
                cell.Value = cleanValue
            End If
        Next
    Next

    ' Saving the workbook
    workbook.SaveAs("Output.xlsx")
End Using
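The pattern <.*?> used above does not match a tag that spans a line break, and it leaves HTML entities such as &amp; in the cell text. The following is a minimal sketch of one way to handle both cases with standard .NET APIs only. The class HtmlTagStripper and the method StripHtml are hypothetical names introduced for illustration and are not part of XlsIO.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTagStripper
{
    // Hypothetical helper, not part of XlsIO.
    // RegexOptions.Singleline lets '.' match line breaks, so a tag that is
    // split across lines inside a cell is still removed.
    private static readonly Regex TagPattern = new Regex("<.*?>", RegexOptions.Singleline);

    public static string StripHtml(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Remove the tags first.
        string withoutTags = TagPattern.Replace(input, string.Empty);

        // Then decode entities such as &amp; and &lt; that remain in the text.
        return WebUtility.HtmlDecode(withoutTags);
    }

    public static void Main()
    {
        Console.WriteLine(StripHtml("<b>Total</b> &amp; <i>Tax</i>")); // prints "Total & Tax"
        Console.WriteLine(StripHtml("<a\nhref=\"x\">Link</a>"));       // prints "Link"
    }
}
```

Assuming this helper is added to the project, the line that calls Regex.Replace in the snippet above could call HtmlTagStripper.StripHtml(cellValue) instead. Like the original pattern, this sketch treats any text between < and > as a tag, so plain text such as "a < b and c > d" would also lose the part between the two symbols.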
You can get the complete sample for removing HTML tags from an Excel document from the link below:
I hope you enjoyed learning how to remove HTML tags from an Excel document using C# and VB.NET.
Take a moment to peruse the documentation, where you can find basic Excel document processing options along with features such as importing and exporting data, charts, formulas, conditional formatting, data validation, tables, pivot tables, and protecting Excel documents, as well as PDF, CSV, and image conversions with code examples.
You can refer to the XlsIO feature tour page to learn about its other features. Explore our user guide documentation and online demos to understand how to manipulate data in Excel documents.
If you are an existing user, you can access our latest components from the License and Downloads page. For new users, you can try our 30-day free trial to check out XlsIO and other Syncfusion components.
If you have any queries or require clarification, please let us know in the comments below or contact us through our support forums, support tickets, or feedback portal. We are always happy to assist you!