Insert HTML into OpenXML Word Document (.Net)

Using OpenXML SDK, I want to insert basic HTML snippets into a Word document.

How would you do this:

  • Manipulating the XML directly?
  • Using an XSLT?
  • Using AltChunk?

Moreover, C# or VB examples are more than welcome 🙂

Answers

I’m not sure what you are actually trying to achieve. OpenXML documents have their own HTML-like notation (WordprocessingML) for formatting elements (paragraphs, bold text, etc.). If you want to add some text with basic formatting to a document, I suggest using the OpenXML syntax and formatting the inserted text with that.
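As a rough illustration of formatting text with WordprocessingML directly, here is a minimal sketch using the OpenXML SDK object model. The class and method names (FormattedTextExample, BuildParagraph) are made up for this example; the paragraph it returns still has to be added to a document body by the caller.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class FormattedTextExample
{
    // Hypothetical helper: builds one paragraph made of a plain run
    // followed by a bold run.
    public static Paragraph BuildParagraph(string plainText, string boldText)
    {
        // Preserve leading/trailing spaces so the two runs do not collapse together.
        Run plain = new Run(
            new Text(plainText) { Space = SpaceProcessingModeValues.Preserve });

        // Run properties must come before the text inside a run.
        Run bold = new Run(
            new RunProperties(new Bold()),
            new Text(boldText) { Space = SpaceProcessingModeValues.Preserve });

        return new Paragraph(plain, bold);
    }
}
```

Usage would be something like body.AppendChild(FormattedTextExample.BuildParagraph("Hello, ", "world")), followed by saving the main document part.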

If you have an HTML snippet that you must include in the document as is, you can use the “external content” feature of OpenXML. With external content, you include the HTML document in the package and create a reference (altChunk) at the position in the document where you want it to appear. The disadvantage of this solution is that not all tools support the generated document (or support it properly), so I don’t recommend it unless you really cannot change the HTML source.
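A minimal sketch of the altChunk approach with the OpenXML SDK could look like the following. The names AltChunkExample and AppendHtml are hypothetical, and the sketch assumes the HTML string is a complete document that is safe to store as UTF-8. The consuming application (e.g. Word) does the actual HTML conversion when it opens the file.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class AltChunkExample
{
    // Hypothetical helper: embeds the HTML as an alternative format part
    // and references it from the end of the document body.
    public static void AppendHtml(string docFileName, string html)
    {
        using (WordprocessingDocument package = WordprocessingDocument.Open(docFileName, true))
        {
            MainDocumentPart mainPart = package.MainDocumentPart;

            // Store the HTML inside the package as its own part.
            AlternativeFormatImportPart chunk =
                mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html);
            using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunk.FeedData(stream);
            }

            // The altChunk element points at the part through its relationship id.
            AltChunk altChunk = new AltChunk { Id = mainPart.GetIdOfPart(chunk) };

            // Section properties, if present, must stay the last child of the body.
            Body body = mainPart.Document.Body;
            SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
            if (sectPr != null)
                body.InsertBefore(altChunk, sectPr);
            else
                body.AppendChild(altChunk);

            mainPart.Document.Save();
        }
    }
}
```

Note that the document still contains raw HTML after this call, which is exactly why tools other than Word may not render it.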

How to insert content (WordprocessingML) into an OpenXML Word document is an independent question IMHO, and the answer depends very much on how complex the modifications are and how big the document is. For a simple document, I would simply read the main document part from the package, obtain its stream and load it into an XmlDocument. You can insert additional content into the XmlDocument quite easily and then save it back to the package. If the document is big, or you need complex modifications in multiple places, XSLT is a good option.
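The XmlDocument route for a simple document might be sketched as follows. RawXmlExample and AppendParagraph are illustrative names, and the sketch assumes the main document part has the usual w:document/w:body structure. It only touches the part's stream and never the SDK's typed Document object, so the two views cannot get out of sync.

```csharp
using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

static class RawXmlExample
{
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: appends <w:p><w:r><w:t>text</w:t></w:r></w:p> to the body.
    public static void AppendParagraph(string docFileName, string text)
    {
        using (WordprocessingDocument package = WordprocessingDocument.Open(docFileName, true))
        {
            MainDocumentPart mainPart = package.MainDocumentPart;

            // Load the raw WordprocessingML of the main part.
            XmlDocument xml = new XmlDocument();
            using (Stream input = mainPart.GetStream(FileMode.Open, FileAccess.Read))
            {
                xml.Load(input);
            }

            XmlNamespaceManager ns = new XmlNamespaceManager(xml.NameTable);
            ns.AddNamespace("w", W);
            XmlNode body = xml.SelectSingleNode("/w:document/w:body", ns);

            XmlElement p = xml.CreateElement("w", "p", W);
            XmlElement r = xml.CreateElement("w", "r", W);
            XmlElement t = xml.CreateElement("w", "t", W);
            t.InnerText = text;
            r.AppendChild(t);
            p.AppendChild(r);

            // Keep w:sectPr (if any) as the last child of the body.
            XmlNode sectPr = body.SelectSingleNode("w:sectPr", ns);
            if (sectPr != null)
                body.InsertBefore(p, sectPr);
            else
                body.AppendChild(p);

            // Overwrite the part with the modified XML.
            using (Stream output = mainPart.GetStream(FileMode.Create, FileAccess.Write))
            {
                xml.Save(output);
            }
        }
    }
}
```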

Well, it is hard to give general advice, because what works best depends strongly on your input.

Here’s a simple example that inserts a paragraph into a DOCX document for each paragraph in an (X)HTML document, using OpenXML SDK v2.0 and an XPathDocument:

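    using System.Xml;
    using System.Xml.XPath;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    void ConvertHTML(string htmlFileName, string docFileName)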
    {
        // Create a Wordprocessing document. 
        using (WordprocessingDocument package = WordprocessingDocument.Create(docFileName, WordprocessingDocumentType.Document))
        {
            // Add a new main document part. 
            package.AddMainDocumentPart();

            // Create the Document DOM. 
            package.MainDocumentPart.Document = new Document(new Body());
            Body body = package.MainDocumentPart.Document.Body;

            XPathDocument htmlDoc = new XPathDocument(htmlFileName);

            XPathNavigator navigator = htmlDoc.CreateNavigator();
            XmlNamespaceManager mngr = new XmlNamespaceManager(navigator.NameTable);
            mngr.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");

            XPathNodeIterator ni = navigator.Select("//xhtml:p", mngr);
            while (ni.MoveNext())
            {
                body.AppendChild<Paragraph>(new Paragraph(new Run(new Text(ni.Current.Value))));
            }

            // Save changes to the main document part. 
            package.MainDocumentPart.Document.Save();
        }
    }

The example requires your input to be valid XML, otherwise you will get an exception when creating the XPathDocument.
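If you cannot be sure the input is well-formed, one option is to guard the load. This is only a sketch: XhtmlLoader and TryLoad are hypothetical names, and it simply reports failure instead of attempting to repair the markup.

```csharp
using System.Xml;
using System.Xml.XPath;

static class XhtmlLoader
{
    // Hypothetical helper: returns false instead of throwing when the
    // file is not well-formed XML.
    public static bool TryLoad(string htmlFileName, out XPathDocument document)
    {
        try
        {
            document = new XPathDocument(htmlFileName);
            return true;
        }
        catch (XmlException)
        {
            document = null;
            return false;
        }
    }
}
```

Real-world HTML that fails this check would need to be converted to XHTML first (or handled with a different approach) before the XPath-based conversion can run.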

Please note that this is a very basic example that does not take formatting, headings, lists, etc. into account.

Here is another (relatively new) alternative:

http://notesforhtml2openxml.codeplex.com/