XML:Wrench :: How-To convert HTML files to XML


Converting HTML to XML

You can use XML:Wrench to convert an existing HTML file to XML. The converter will attempt to produce legal XML from the HTML while changing the content as little as possible. Whether the conversion is 100% successful depends on the original HTML file. After conversion, you can check the file using the XML check well-formed or validate options.
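If you also want to check a converted file outside XML:Wrench, a well-formedness test can be sketched in a few lines of Python. This is only an illustration using the standard library's xml.etree.ElementTree parser; the function name is_well_formed is hypothetical and is not part of XML:Wrench. It checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        # The parser reports the first well-formedness error it meets.
        return False, str(err)


if __name__ == "__main__":
    print(is_well_formed("<p>Line one<br />Line two</p>"))
    print(is_well_formed("<p>Line one<br>Line two</p>"))
```

The second call fails because the unclosed <br> leaves the end tag </p> without a matching start tag.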

To begin a conversion, select 'Convert HTML to XML' from the Tools menu. This will display a dialog box similar to the one below.

[Screenshot: the Convert HTML to XML dialog box]

The HTML standard does not care about the case of element and attribute names. The XML standard does: names are case-sensitive, so <B> and </b> do not match, and XML-based formats such as XHTML require element and attribute names to be in lower case.
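The idea of folding element names to lower case can be sketched as follows. This is a simplified illustration, not XML:Wrench's actual method: the function name lowercase_tag_names is hypothetical, it uses a regular expression rather than a real HTML parser, it leaves attribute names untouched, and it would also rewrite anything that merely looks like a tag inside comments or scripts.

```python
import re

# Matches the start of a start tag or end tag and captures the element name.
TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tag_names(html):
    """Return html with every element name converted to lower case."""
    return TAG_NAME.sub(
        lambda m: "<" + m.group(1) + m.group(2).lower(),
        html,
    )


if __name__ == "__main__":
    print(lowercase_tag_names("<P>Hello <B>world</B></P>"))
```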

Some attributes in HTML can be written without a value, for example the 'selected' or 'noshade' attributes. This is not legal XML, where every attribute must have a quoted value. Tick the Fix attributes box and these will be converted automatically.
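A rough sketch of this kind of attribute fix is shown below, using Python's standard html.parser module. It assumes the XHTML convention of repeating the attribute name as its value (selected becomes selected="selected"); the form XML:Wrench itself writes is not stated here. The names AttributeFixer and fix_attributes are hypothetical. As a side effect, html.parser reports element and attribute names in lower case, and an existing <x /> tag is re-emitted as <x></x>.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class AttributeFixer(HTMLParser):
    """Re-emit the markup so that every attribute has a quoted value."""

    def __init__(self):
        # Keep character and entity references as written in the source.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        fixed = []
        for name, value in attrs:
            if value is None:
                # Attribute had no value: assume the XHTML name="name" form.
                value = name
            fixed.append(f" {name}={quoteattr(value)}")
        self.out.append(f"<{tag}{''.join(fixed)}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def fix_attributes(html):
    """Return html with value-less attributes expanded and values quoted."""
    fixer = AttributeFixer()
    fixer.feed(html)
    fixer.close()
    return "".join(fixer.out)


if __name__ == "__main__":
    print(fix_attributes("<OPTION SELECTED VALUE=1>One</OPTION>"))
```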

In HTML, tags do not have to be paired up exactly. For example:

<i> This is italic <b>and</i> bold</b>.

is accepted by HTML browsers but forbidden in the XML standard, which requires elements to nest properly. In addition, many HTML pages contain mis-matched or incomplete tags. HTML browsers have tended to be very forgiving of illegal HTML and have done their best to cover up errors. If you tick the Fix mis-matched tags box, XML:Wrench will attempt to resolve any mis-matched tags.
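One common way to repair overlapping tags is to keep a stack of open elements, close the inner elements early when an outer end tag arrives, and then reopen them. The sketch below illustrates that general technique only; it is not a description of how XML:Wrench resolves mis-matched tags. The function name fix_mismatched is hypothetical, the tokenizer is a simple regular expression, and the sketch assumes that empty elements are already written in the <x /> form.

```python
import re

# One start tag or end tag: optional slash, element name, rest of the tag.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")


def fix_mismatched(html):
    """Return html with overlapping, stray and unclosed tags repaired."""
    out = []
    stack = []  # (name, original start tag text) for each open element
    pos = 0
    for m in TOKEN.finditer(html):
        out.append(html[pos:m.start()])  # text between tags, copied unchanged
        pos = m.end()
        closing, name, rest = m.group(1), m.group(2).lower(), m.group(3)
        if not closing:
            out.append(m.group(0))
            if not rest.rstrip().endswith("/"):
                # Not an empty-element tag such as <br />: remember it as open.
                stack.append((name, m.group(0)))
            continue
        if name not in [n for n, _ in stack]:
            continue  # stray end tag with no matching start tag: drop it
        # Close every element opened after the matching start tag.
        reopen = []
        while stack[-1][0] != name:
            inner_name, inner_start = stack.pop()
            out.append(f"</{inner_name}>")
            reopen.append((inner_name, inner_start))
        stack.pop()
        out.append(f"</{name}>")
        # Reopen the elements that were closed early, outermost first.
        for inner_name, inner_start in reversed(reopen):
            stack.append((inner_name, inner_start))
            out.append(inner_start)
    out.append(html[pos:])
    # Close anything still open at the end of the input.
    while stack:
        out.append(f"</{stack.pop()[0]}>")
    return "".join(out)


if __name__ == "__main__":
    print(fix_mismatched("<i> This is italic <b>and</i> bold</b>."))
```

Applied to the example above, the sketch closes <b> before </i> and reopens it afterwards, so the word "and" stays both italic and bold while " bold" stays bold only.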

HTML contains a number of singleton elements that have no content or may be used without content. These include such elements as P, IMG, BR and HR. If you select an element in the Convert singleton tags box, each occurrence will be converted to the XML empty-element form <x />; for example, <br> becomes <br />.
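The singleton conversion can be sketched with a regular expression. This is an illustrative approximation rather than XML:Wrench's own code: the function name convert_singletons and its names parameter are hypothetical, and the sketch also lower-cases the element name. P is left out of the default list because a <p> that does have content must keep its start tag; pass it explicitly only if the page uses <p> purely as a separator.

```python
import re


def convert_singletons(html, names=("br", "hr", "img")):
    """Rewrite the chosen singleton tags in the empty-element form <x />."""
    pattern = re.compile(
        r"<(%s)\b([^>]*?)\s*/?>" % "|".join(names),
        re.IGNORECASE,
    )
    # Group 1 is the element name, group 2 any attributes as written.
    return pattern.sub(
        lambda m: f"<{m.group(1).lower()}{m.group(2)} />",
        html,
    )


if __name__ == "__main__":
    print(convert_singletons('Line one<BR>Line two<hr><img src="a.png">'))
```

Tags that are already in the <x /> form are matched as well and written back unchanged, so running the function twice gives the same result.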