rtfx converts RTF files into a generic XML format. It focuses on preserving metadata such as style names rather than every bit of formatting. This makes it handy for converting RTF documents into a custom XML format (using XSL or an additional processing step).
[The name used to be 'rtfm', which I changed due to a naming conflict. And here I thought I was being so witty...]
RTF features supported: page breaks, section breaks, style names, lists (various types), tables, footnotes, info block, bold, italic, underline, superscript/subscript, hidden text, strikeout, text color, fonts.
The output format has two slight variations. The 'normal' format contains mainly content data, along with formatting that is relevant to the content (such as bold or underline). The 'presentation' format contains all of the above plus presentation details such as fonts. The differences are outlined here.
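As a rough illustration of the "additional processing step" mentioned above, the following Python sketch renames elements of an XML tree according to a style-name lookup table. The element and attribute names in the sample (`document`, `para`, `style`, `b`) are placeholders invented for this example and are not taken from rtfx's actual output; substitute whatever names the real output uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: these element and attribute names are invented for
# illustration and are NOT taken from real rtfx output.
SAMPLE = """<document>
  <para style="Heading 1">Introduction</para>
  <para style="Body Text">Some <b>bold</b> text.</para>
</document>"""

# Hypothetical mapping from a style name to a custom element name.
STYLE_TO_TAG = {"Heading 1": "title", "Body Text": "p"}


def remap_styles(root, style_to_tag, style_attr="style"):
    """Rename every element whose style attribute appears in the mapping."""
    for elem in root.iter():
        style = elem.get(style_attr)
        if style in style_to_tag:
            elem.tag = style_to_tag[style]
            # The style is now encoded in the tag, so drop the attribute.
            del elem.attrib[style_attr]
    return root


root = remap_styles(ET.fromstring(SAMPLE), STYLE_TO_TAG)
custom_xml = ET.tostring(root, encoding="unicode")
print(custom_xml)
```

Elements without a matching style (here the inline `b`) pass through untouched, so content-relevant formatting survives the remapping. An XSL stylesheet would do the same job declaratively.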
C:\> rtfx.exe sample.rtf output.xml
$ rtfx sample.rtf output.xml
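To convert a whole directory, the invocation above can be wrapped in a small script. This is a minimal sketch, not part of rtfx: the helper names `build_command` and `convert_all` are made up for this example, and it assumes that rtfx is on the `PATH` and reports failure through a non-zero exit status, which this page does not confirm.

```python
import subprocess
from pathlib import Path


def build_command(rtf_path, xml_path, exe="rtfx"):
    """Build the argument list: <exe> <input.rtf> <output.xml>."""
    return [exe, str(rtf_path), str(xml_path)]


def convert_all(src_dir, out_dir, exe="rtfx"):
    """Convert every *.rtf file in src_dir, writing *.xml files to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for rtf in sorted(Path(src_dir).glob("*.rtf")):
        xml = out_dir / (rtf.stem + ".xml")
        # Assumption: a failed conversion yields a non-zero exit status,
        # which check=True turns into a CalledProcessError.
        subprocess.run(build_command(rtf, xml, exe), check=True)
        written.append(xml)
    return written
```

For example, `convert_all("docs", "xml")` would run `rtfx docs/sample.rtf xml/sample.xml` for each RTF file found; on Windows, pass `exe="rtfx.exe"`.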
Source: rtfx-1.1.tar.gz
Source: rtfx-1.0.tar.gz
Binary: rtfx-1.0.zip
Source: rtfx-0.9.6.tar.gz
Binary: rtfx-0.9.6.zip
Source: rtfx-0.9.5.tar.gz
Binary: rtfx-0.9.5.zip
Source: rtfx-0.9.4.tar.gz
Binary: rtfx-0.9.4.zip
Source: rtfx-0.9.3.tar.gz [older versions require sablotron]
Binary: rtfx-0.9.3.zip
Source: rtfx-0.9.2.tar.gz
Binary: rtfx-0.9.2.zip
Source: rtfx-0.9.1.tar.gz
Binary: rtfx-0.9.1b.zip
Source: rtfm-0.9.tar.gz
Binary: rtfm-0.9.zip
You can get a snapshot of the source with git:
$ git clone git://thewalter.net/rtfx.git
Source code repository:
http://thewalter.net/git/cgit.cgi/rtfx/
See the license. (The included DOMC and libmba libraries are under an 'MIT' license; see the source code for details.) Contact me if you find bugs.