/----------------------------------------------------------------------\
| Title       : HTML to text converter and markup remover              |
|                                                                      |
| File name   : example.txt                                            |
| File size   : 8,263 bytes (approx)                                   |
| Create date : 6-Jan-2003                                             |
\----------------------------------------------------------------------/

HTML to text converter and markup removal
=========================================

------------------------------------------------------------------------
[1] [2] 30-day no-risk money-back guarantee [3] !
------------------------------------------------------------------------

Detagger is a utility that removes some or all of the tags from an HTML
file. Detagger makes it easy to extract text from your web pages for use
elsewhere, or to tidy up your HTML code to make clean, faster-loading
web pages.

Detagger can act as a full HTML to text converter, and has a number of
options for producing good-looking text files. For example, here is the
result of converting this page.

Detagger can also act as a markup remover, selectively removing and
editing the tags that make up the HTML code in your file.

The utility supports wildcards and drag and drop operation, and a
console version is available for batch operations, making Detagger well
suited to whatever mode of operation you prefer. An API version may
interest software developers.

Whether you're trying to collate text from multiple sources on the web,
or simply looking for some way to remove all the JavaScript, FONT tags
and comments from your HTML archives, Detagger is the tool for you.

There are a number of evaluation downloads available.

Detagger is produced by JafSoft [4], who are the authors of the
highly-praised AscToHTM [5] text-to-HTML converter and other text
conversion products.

Detagger as an HTML-to-text converter
=====================================

As an HTML-to-text converter, Detagger allows you to convert HTML
newsletters into a more compact and email-friendly format, helping
authors easily maintain HTML and text versions.

The program will output the document as text, preserving the marked-up
headings, lists and tables of the original document and turning them
into suitable text formats. Text will be laid out as faithfully as
possible to the original document, within the constraints of your chosen
page width.

There are many formatting options, which can be saved in "policy" files
so that they may be easily reloaded in later sessions.

Detagger allows you to:-

- Remove all the HTML tags, using the heading, paragraph and list tags
  etc. to decide how the text should be formatted

- Parse tables and lay out the text accordingly. Simple tables can also
  be converted into comma-delimited (CSV) or tab-delimited data, ready
  for import into spreadsheets.

- Replace hyperlinks by the display text. URLs may either be placed in
  the main text, or added as an entry in a reference table at the end
  of the text. (See the example for this page).

- Format the output to your desired page width (may not work when
  parsing complex tables)

- Format any "dialogue" intelligently. This is particularly useful when
  converting short stories

- Replace Image tags by an Image marker. This can be labelled with the
  Image URL or the ALT attribute text.

- Add custom headers and footers to the output. These can have data
  fields merged in, such as convert date, title etc. The evaluation
  version adds a standard header; in the registered version this is
  omitted and you can choose to add your own headers.

- Convert all HTML entities into the correct characters. You can choose
  to have 8-bit characters replaced by 7-bit alternatives where
  available, to give greatest compatibility of the output

- Support the creation of Unicode text files from advanced HTML
  character sets.
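
To make the hyperlink and entity options above more concrete, here is a
minimal sketch in Python of the general technique: display text is kept,
each URL is replaced by a numbered marker, and the URLs are listed in a
reference table at the end. This is not Detagger's code or its API. The
names LinkRefParser and html_to_text are hypothetical, the 72-column
width is an assumed default, and the sketch ignores paragraph breaks,
lists and tables entirely.

```python
import textwrap
from html.parser import HTMLParser


class LinkRefParser(HTMLParser):
    """Hypothetical helper: keep visible text, mark each link with [n]."""

    def __init__(self):
        # convert_charrefs=True means entities such as &amp; arrive decoded.
        super().__init__(convert_charrefs=True)
        self.chunks = []   # pieces of visible text, in document order
        self.refs = []     # distinct URLs, in order of first appearance
        self._href = None  # URL of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag != "a":
            return
        if self._href:
            # Reuse the existing number when the same URL appears again.
            if self._href not in self.refs:
                self.refs.append(self._href)
            self.chunks.append(" [%d]" % (self.refs.index(self._href) + 1))
        self._href = None

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html, width=72):
    """Return wrapped text followed by a numbered table of link URLs."""
    parser = LinkRefParser()
    parser.feed(html)
    parser.close()
    body = " ".join("".join(parser.chunks).split())  # collapse whitespace
    lines = textwrap.wrap(body, width=width)
    if parser.refs:
        lines.append("")
        lines.extend("[%d] %s" % (n, url)
                     for n, url in enumerate(parser.refs, 1))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = ('<p>Made by <a href="http://example.com/">JafSoft</a> '
              '&amp; friends.</p>')
    print(html_to_text(sample))
```

Run on the sample string, the sketch prints the sentence with "[1]"
after the link text and "&" in place of the entity, followed by a blank
line and the reference entry "[1] http://example.com/". A real converter
would also need to use heading, paragraph, list and table tags to decide
the layout, which this sketch does not attempt.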

Detagger as a markup remover and tag manipulator
================================================

As a markup remover, Detagger allows you to "tidy up" your HTML code in
a number of ways. You simply select classes of tags you want removed,
sections of code you want stripped out, or tag manipulations you want
performed.

Options include:-

- remove all non-HTML tags (e.g. MS Office tags)

- remove all non-standard tags

- remove the ... section

- remove all