Tuesday, December 1, 2015

Review of some products for working with MS Word documents using .NET / C#

I didn't want to use MS Automation because it is slow and not server-friendly, so I am not covering it here. The solutions below do NOT require MS Word to be installed, and they are server-friendly.

Open XML Word Processing SDK
This is my first choice among completely free solutions because it is relatively easy for basic things and it will likely always be supported by Microsoft. This is the Microsoft SDK (released as Open Source as of version 2.5) for creating and editing MS Word documents in DOCX format. It takes a bit of getting used to since it mirrors the internal DOCX format. The advantage is that pretty much everything you need can be done here. The disadvantage is that it may take more code than other packages. The link above has sample code for many common tasks. Here is a good place to start to familiarize yourself with the file formats, what resources are available, etc. To understand the DOCX file format, check this out. The DOCX format is much simpler than the XLSX format. I recommend giving it a try. The learning curve for the DOCX file format is pretty small and there are a lot of examples available. In the end it isn't so different from NPOI (see section below) for DOCX. Here is a list of how-tos.
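To give a feel for how the SDK mirrors the internal DOCX format, here is a minimal sketch that creates a one-paragraph document. The class name HelloDocx and its Create method are made up for illustration; the SDK types come from the DocumentFormat.OpenXml assembly.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, not part of the SDK.
public static class HelloDocx
{
    // Creates a DOCX file at 'path' containing a single paragraph of text.
    public static void Create(string path, string text)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = doc.AddMainDocumentPart();

            // Document > Body > Paragraph > Run > Text corresponds to
            // w:document > w:body > w:p > w:r > w:t in word/document.xml.
            mainPart.Document = new Document(
                new Body(
                    new Paragraph(
                        new Run(
                            new Text(text)))));

            mainPart.Document.Save();
        }
    }
}
```

Calling HelloDocx.Create("hello.docx", "Hello World") should produce a file that opens in Word. The nesting of the objects is the same nesting you would see in the XML inside the DOCX package, which is why the code is more verbose than the higher-level packages below.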

It is distributed on NuGet as DocumentFormat.OpenXml, but unofficially, by someone other than Microsoft (even though the listing shows the author as Microsoft). If you want the official binaries they can be downloaded here. If you want the source code you can get it on GitHub. This video shows how to build the source code if you want to go that route. If you want to use 2.6, it has been released on NuGet as OpenXMLSDK-MOT. This is the same as the official binaries and the source code on GitHub (from what I can tell). All options install an assembly called DocumentFormat.OpenXml, and if you install through NuGet a reference to it is added to your project.

If you want to check that the document you create is valid, check out this code example.
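As a rough idea of what validation looks like (this is my own minimal sketch, not necessarily the same as the linked example), the SDK's OpenXmlValidator class can list schema problems in a document. The DocxValidation class and its Validate method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

// Hypothetical helper class, not part of the SDK.
public static class DocxValidation
{
    // Returns one message per validation error; an empty list means
    // the validator found nothing to complain about.
    public static List<string> Validate(string path)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            OpenXmlValidator validator = new OpenXmlValidator();

            // ToList() forces the validation to run while the
            // document is still open.
            return validator.Validate(doc)
                .Select(e => e.Description + " at " +
                             (e.Path != null ? e.Path.XPath : "(unknown location)"))
                .ToList();
        }
    }
}
```

An empty list only means the file conforms to the schema the validator checks. It does not guarantee that Word will render the document the way you intended.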

Here is an example of how to do a search and replace in a DOCX file.
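For a quick idea of the approach, here is a simplified sketch of my own (the linked example may do it differently). It walks the text elements of the main document part and replaces matches in each one. DocxReplace and ReplaceText are hypothetical names.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, not part of the SDK.
public static class DocxReplace
{
    // Replaces 'search' with 'replacement' in every text element of the
    // main document body and returns how many elements were changed.
    public static int ReplaceText(string path, string search, string replacement)
    {
        int changed = 0;
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            Document document = doc.MainDocumentPart.Document;

            // Each Text element is a w:t node inside a run.
            foreach (Text text in document.Descendants<Text>())
            {
                if (text.Text.Contains(search))
                {
                    text.Text = text.Text.Replace(search, replacement);
                    changed++;
                }
            }

            document.Save();
        }
        return changed;
    }
}
```

The catch with this simple version is that Word often splits what looks like one word across several runs (for example after spell checking or formatting changes), so a search string that spans two Text elements will not be found. It also only touches the main body, not headers, footers, or footnotes.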

Here is the link to the Open Xml Developer website.

There are also tons of examples in the Open Xml Power Tools project, which is available on GitHub and from NuGet.

Open-Xml-Power-Tools - DocumentAssembler
This is an Open Source module in Open Xml Power Tools that lets you use a .DOCX file as a template, merge in an XML data source, and generate a well-formatted DOCX file with the data merged into the template. The template supports things such as tables, conditions, etc. using XPath-like syntax. Here is a video on how it works, but it is not a step-by-step tutorial. Here is a getting started video / tutorial that walks you through using the product. It is important to watch the ending where he talks about using <# #> instead of Content Controls in Word as the placeholders. It works similarly to most reporting tools, but uses an MS Word document as the report definition (template) file and produces an MS Word document as the output as well. You can download the entire Open-Xml-Power-Tools suite that includes DocumentAssembler from here. It is not meant to be used to create DOCX documents from scratch. It always uses a template file to generate new DOCX files. However, the rest of the Open-Xml-Power-Tools suite does have helpers for working with DOCX files in general. NOTE: It is built on top of (and requires) the Open XML Word Processing SDK (above).
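The C# side of a merge is short. Here is a minimal sketch, assuming a version of Open-Xml-Power-Tools that exposes DocumentAssembler.AssembleDocument with a WmlDocument template and an XElement data source (check the version you install). The TemplateMerge class and the file paths passed to it are hypothetical.

```csharp
using System.Xml.Linq;
using OpenXmlPowerTools;

// Hypothetical helper class, not part of Open-Xml-Power-Tools.
public static class TemplateMerge
{
    // Merges the XML data file into the DOCX template and writes the result.
    // Returns true if the template itself contained errors.
    public static bool Merge(string templatePath, string dataPath, string outputPath)
    {
        WmlDocument template = new WmlDocument(templatePath);
        XElement data = XElement.Load(dataPath);

        bool templateError;
        WmlDocument assembled =
            DocumentAssembler.AssembleDocument(template, data, out templateError);

        // The output is still written when templateError is true; as I
        // understand it, the problems are flagged inside the generated document.
        assembled.SaveAs(outputPath);
        return templateError;
    }
}
```

All of the interesting work is in the template document, so most of your time goes into authoring the <# #> placeholders in Word rather than writing code.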

DOCX
If you really don't want to learn anything about the DOCX format and want a more intuitive way to interact with DOCX files, then this may be a good option if you are okay with alpha software. It is a simple Open Source project available from codeplex.com or NuGet. It lets you interact with the Word document in an intuitive manner without understanding how Word documents work internally. The examples on his blog are good, and there are good examples in the source code. It is easy to use. It is still alpha, but has been around since 2009. It is growing in popularity.
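To show what "intuitive" means here, this is a minimal sketch, assuming the DocX package from NuGet and its Novacode namespace. The DocXHello class name is made up for illustration.

```csharp
using Novacode;

// Hypothetical helper class, not part of DocX.
public static class DocXHello
{
    // Creates a DOCX file with one bold paragraph and one plain paragraph.
    public static void Create(string path)
    {
        using (DocX document = DocX.Create(path))
        {
            // Formatting is chained onto the text that was just appended.
            Paragraph heading = document.InsertParagraph();
            heading.Append("Hello World").Bold().FontSize(14);

            document.InsertParagraph("A second, plain paragraph.");

            document.Save();
        }
    }
}
```

Notice there are no runs, parts, or XML element names in sight, which is the main appeal compared to the Open XML SDK.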

Free Spire.Doc
This is a free version, with limitations, of a professional package called Spire.Doc. It is quite powerful and also does conversions to PDF and many other formats. It does mail merges, etc. It is nice for small projects that stay below the limitations. The API is well thought out and works nicely.

NPOI
This is the .NET implementation of the popular POI Java Project. It is Open Source and totally free, and it also does MS Excel. The API is a bit low level at times, but works well. I recommend downloading it from GitHub so you can work with the examples. You can also install the binaries via NuGet. The documentation and examples for DOCX are not really there yet though. Look for XWPF if you want to use the DOCX file format. The problem is that it is not the exact same API as the POI Java Project, but it is similar. It is also not as mature and is thus missing some features. It is a bit more abstract than just using the Microsoft packages, but quite a bit of DOCX format knowledge is still needed. In general I found the experience a bit frustrating because of how the project is organized, the poor documentation, and the very few DOCX examples.
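Since DOCX examples are hard to find, here is a minimal sketch of creating a document with the XWPF classes, assuming the NPOI package from NuGet. The NpoiHello class name is made up for illustration, and I have stuck to calls that I expect to be stable across NPOI versions.

```csharp
using System.IO;
using NPOI.XWPF.UserModel;

// Hypothetical helper class, not part of NPOI.
public static class NpoiHello
{
    // Creates a DOCX file containing a single paragraph of text.
    public static void Create(string path)
    {
        XWPFDocument doc = new XWPFDocument();

        // As with the Open XML SDK, text lives in a run inside a paragraph.
        XWPFParagraph paragraph = doc.CreateParagraph();
        XWPFRun run = paragraph.CreateRun();
        run.SetText("Hello from NPOI");

        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            doc.Write(fs);
        }
    }
}
```

You still think in paragraphs and runs, so it is only a small step up in abstraction from the Microsoft SDK.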
