How to open HTML in ASP.NET Core Word and extract image from the URL?

You can convert HTML to a Word document and vice versa using the Syncfusion .NET Core Word library (Essential DocIO) without Microsoft Word or interop dependencies.

When converting HTML to a Word document using the .NET Core Word library, images referenced by URL in the input HTML file (<img src="https://">) are not imported into the Word document. Essential DocIO does not support downloading images from website URLs on the ASP.NET Core, Xamarin, and Blazor platforms. You can import these images using the ImageNodeVisited event in DocIO.

Get the image from a URL in the input HTML:

To import images referenced by URL in the input HTML, we suggest downloading each image in a handler for the ImageNodeVisited event in DocIO.

The following code example shows how to hook the ImageNodeVisited event while converting HTML to a Word document.

C#

using System.IO;
using Syncfusion.DocIO;
using Syncfusion.DocIO.DLS;

//Opens the input HTML file as a stream; the using block disposes it afterwards
using (FileStream docStream = new FileStream("Input.html", FileMode.Open, FileAccess.Read))
{
    //Creates a new instance of WordDocument
    WordDocument document = new WordDocument();

    //Hooks the ImageNodeVisited event to download the image from a website URL
    document.HTMLImportSettings.ImageNodeVisited += DownloadImage;

    //Opens the input HTML document
    document.Open(docStream, FormatType.Html);

    //Unhooks the ImageNodeVisited event after loading the HTML
    document.HTMLImportSettings.ImageNodeVisited -= DownloadImage;

    //Creates the output file; the using block flushes and disposes the stream
    using (FileStream outputStream = new FileStream("HtmlToWord.docx", FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite))
    {
        //Saves the Word document
        document.Save(outputStream, FormatType.Docx);
    }
    //Closes the Word document
    document.Close();
}

 

The following code example shows the event handler that downloads the image from a website URL.

C#

/// <summary>
/// Event handler to download the image from a website URL.
/// </summary>
private static void DownloadImage(object sender, ImageNodeVisitedEventArgs args)
{
    //Checks whether the image src is a website URL.
    if (args.Uri.StartsWith("https://"))
    {
        //Note: WebClient is marked obsolete in .NET 6 and later.
        using (WebClient client = new WebClient())
        {
            //Downloads the image as a byte array.
            byte[] image = client.DownloadData(args.Uri);
            //Sets the downloaded image as a stream. Do not dispose this stream here.
            args.ImageStream = new MemoryStream(image);
        }
    }
}
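
The handler above relies on WebClient, which is marked obsolete in .NET 6 and later, and it skips plain http:// sources. As a minimal sketch of an alternative, the download step could be moved into a small helper built on HttpClient. The ImageDownloader class and its IsWebUrl and TryDownload methods are hypothetical names introduced here for illustration; they are not part of DocIO.

```csharp
using System;
using System.IO;
using System.Net.Http;

// Hypothetical helper (not part of DocIO) that downloads an image referenced by URL.
public static class ImageDownloader
{
    // One shared HttpClient for the whole application, reused across downloads.
    private static readonly HttpClient Client = new HttpClient();

    // Returns true only for absolute http:// or https:// URLs.
    public static bool IsWebUrl(string src)
    {
        return Uri.TryCreate(src, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Downloads the image into a MemoryStream positioned at 0.
    // Returns null when src is not a web URL or the download fails.
    // The returned stream is intentionally left open for the caller.
    public static Stream TryDownload(string src)
    {
        if (!IsWebUrl(src))
            return null;
        try
        {
            // The event handler is synchronous, so this sketch blocks on the async call.
            byte[] bytes = Client.GetByteArrayAsync(src).GetAwaiter().GetResult();
            return new MemoryStream(bytes);
        }
        catch (HttpRequestException)
        {
            return null; // Network error or non-success status code.
        }
        catch (OperationCanceledException)
        {
            return null; // Request timed out.
        }
    }
}

// Assumed wiring inside the DownloadImage handler shown above:
//     Stream image = ImageDownloader.TryDownload(args.Uri);
//     if (image != null)
//         args.ImageStream = image;
```

Returning null on failure leaves args.ImageStream unset for that image, so one unreachable URL does not abort the whole conversion. Whether that is preferable to throwing depends on the application.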

 

Note:

Hook the ImageNodeVisited event before opening the input HTML document, and do not dispose of the image stream in the event handler; otherwise, the image will not be preserved. DocIO disposes of the image stream internally.

Take a moment to peruse the documentation, where you can find more information about HTML to Word conversion and vice versa.

Explore more about the rich set of Syncfusion Word Framework features.


Conclusion

I hope you enjoyed learning how to get images from URLs while opening HTML in .NET Core.

You can refer to our .NET Core Word library feature tour page to learn about its other features, and to the documentation to get started quickly. You can also explore our .NET PDF example to understand how to create and manipulate data in the .NET File Format Libraries.

If you are a current customer, you can check out our document processing libraries from the License and Downloads page. If you are new to Syncfusion, you can try our 30-day free trial to check out our .NET Core File Format Libraries and other .NET Core controls.

If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
