$_$_TITLE The JafSoft text conversion FAQ
$_$_CHANGE_POLICY document subject : The JafSoft text conversion FAQ
$_$_CHANGE_POLICY Expect contents list : No
$_$_CHANGE_POLICY background colour : ffefff
$_$_CHANGE_POLICY headings colour : dd00dd
$_$_CHANGE_POLICY LINK Definition : "[a2h man]" = "AscToHTM Manual" + "http://www.jafsoft.com/doco/a2hdoco.html"
$_$_CHANGE_POLICY LINK Definition : "[a2r man]" = "AscToRTF Manual" + "http://www.jafsoft.com/doco/a2rdoco.html"
$_$_CHANGE_POLICY LINK Definition : "[pol man]" = "Policy Manual" + "http://www.jafsoft.com/doco/policy_manual.html"
$_$_CHANGE_POLICY LINK Definition : "[tag man]" = "Tag Manual" + "http://www.jafsoft.com/doco/tag_manual.html"

$_$_CONTENTS_LIST

$_$_BEGIN_IGNORE
** Master copy on VMS **
$_$_END_IGNORE


1.0 Introduction
================

This FAQ is clearly a work in progress.  Many of the subjects have no
answers as yet.  Nevertheless I intend fleshing this out as and when I get
time, and I welcome new questions (or prompts to write the answers to
questions listed here) from all users.

Direct all correspondence to *info@jafsoft.com*


1.1 Document conventions

Often the answer to a question involves setting a policy value (see the
"[Pol man]" for more about policy files).  The policy involved will be
displayed as :

      Policy name : value

The policy name is the text that will appear in the policy file.  This
must be *exactly* as shown; no variability in the spelling will be
tolerated by the program.  If you misspell the policy text (or if it's
been changed in a new version), the program will complain that it doesn't
recognize the policy.

In addition to adding lines to your policy file by hand, the Windows
version allows *most* (not all) policies to be set via property sheets.
You'll need to locate the equivalent policy on the property sheets.

More details on policies can be found in the Policy Manual which is
included in downloads, but may also be found online at
http://www.jafsoft.com/doco/policy_manual.html


1.2 Finding JafSoft software on the web

1.2.1 The home page

Currently http://www.jafsoft.com/.  Each product has its own page, e.g.

      http://www.jafsoft.com/asctohtm/
      http://www.jafsoft.com/asctortf/
      http://www.jafsoft.com/asctotab/
      http://www.jafsoft.com/addlinx/

These are listed on the products page

      http://www.jafsoft.com/products/

There is also a .co.uk mirror site.


1.2.2 Online documentation

Currently http://www.jafsoft.com/doco/docindex.html.

Documentation is usually included with all downloads, either as HTML or as
ready-to-convert text.  In Windows this will usually be found in the folder

      c:\Program Files\JafSoft\AscToHTM

Documentation available includes :

    - [a2h man].  Describes the text-to-HTML converter AscToHTM

    - [a2r man].  Describes the text-to-RTF converter AscToRTF

    - [pol man].  Describes the use of policy files by the software

    - [tag man].  Describes the use of a preprocessor and tagging system
      by the software

    - This FAQ.

If you plan to read one or more of these manuals you'd be best advised to
download one of the documentation .zip files.


1.2.3 Keeping track of updates

There are update pages at

      http://www.jafsoft.com/asctohtm/updates.html
and
      http://www.jafsoft.com/asctortf/updates.html

Registered users get update notifications by mail.  To date all updates
have been free to registered users, but we can't guarantee that will
always be the case.


1.2.4 Who is the author?

1.2.4.1 John A Fotheringham

That's me that is.  The program is wholly the responsibility of John A
Fotheringham, who maintains it in his spare time.
He doesn't make enough to make a living from it (in case you were
wondering).


1.2.4.2 JafSoft Limited

Although authoring shareware doesn't earn enough that I can give up my day
job, I have created a separate company to handle AscToHTM, AscToRTF and
all the shareware and other services I have to offer.

The company is called JafSoft Limited, and the web site is
http://www.jafsoft.com/


1.2.4.3 Contacting the author

Correspondence should be via email to *infosupport.com*.  Priority is
given to registered users and people who want to pay for development
[ :) ], however all correspondence will be answered.


1.2.5 Reporting errors and bugs

Despite the best of intentions, bugs do happen, and we're always grateful
to anyone who takes the time to report them to us.

Please feel free to report all errors and bugs to *infosupport.com*.

When you do so please include

    - a clear description of the problem

    - which version of the software you are using

    - a copy of the offending source file (if not too large <50k)

    - a copy of any policy file being used.

    - a copy of any .log file generated (save the status messages to file)

Please keep any source files small.  If the source file is large, try to
generate a smaller file that exhibits the same problem.


1.2.6 Requesting changes to the software

Feel free to send suggestions for enhancements/changes to
*infosupport.com*.  A surprising number of features have been added this
way although, naturally, I'm happy for people to think these were all my
own ideas.

Minor changes may slip into the next release if I think they enhance the
product.  Major changes to the software can be undertaken on a commercial
basis by contracting my services from Yezerski Roper Ltd.

This option is not for the faint hearted.  Don't let the software's $40
price tag persuade you that that's anything but a bargain; my hourly rate
is more than that amount, although I can do quite a lot in one hour :-)


1.3 Registration and updates

1.3.1 Registration

Registration can be completed online by visiting

      http://www.jafsoft.com/asctohtm/register_online.html
or [[BR]]
      http://www.jafsoft.com/asctortf/register_asctortf.html

Registration is usually completed via a third party registration service
(I use a couple) and an on-line download.  The registration service will
take your payment and then send you download instructions for a fully
registered copy.

The registration companies can accept payments using a number of methods,
but the commonest is credit card.

We do not ship software on media at this time.  We'd have to double the
price and stop our free upgrade policy if we did.  That said, one of the
registration companies will put the software onto CD and ship it to you
for an extra charge.  As yet I haven't set this up, but if interested
email *infosupport.com* with details.


1.3.2 Update policy

To date all updates have been free to registered users.  This has been
true for both minor and major updates.  Over time the price of the
software has risen, but no-one has ever had to pay extra.

I'd like to continue this policy, but I'm unable to actually guarantee
this, especially since I've discovered old registered versions circulating
on the Net.


1.4 Other related products by the same author

1.4.1 AscToTab

[AscToTab] is a subset of AscToHTM which is dedicated to creating tables
from plain text and tab-delimited source files.  The software is offered
as freeware under Windows and OpenVMS.

1.4.2 AscToRTF

AscToRTF is a text-to-RTF converter which uses the same analysis engine as
AscToHTM, but which creates Rich Text Format (RTF) files instead.  RTF is
a format better suited for import into Word and other word processors.

[AscToRTF] was released early in 2000 and has received a number of 5-star
reviews.


1.4.3 AddLinx

A registered user (see "[[GOTO requesting changes to the software]]")
contacted me and asked if I had a program that could add hyperlinks to an
*existing* HTML file.  At the time I didn't, but on examining the software
it seemed I had all the bits and pieces necessary to construct such a
tool.

Within 24 hours I sent him a first attempt at such a utility, and within a
few weeks [AddLinx] was born.  It's a very rough utility that I haven't
spent much time on.  It's available as postcard ware.


1.4.4 API versions of the software

For those wanting to programmatically integrate the conversion software
into their own products, an API has been produced and is available under
license.

AscToHTM and AscToRTF are written in C++, and an API is available which
provides a C++ header file defining the functions available.  The software
is then provided as a Windows library to be linked against.  In the past
clients have successfully integrated this with their Java software, on
Windows, Linux and Solaris platforms.

Although I'm not a Visual Basic programmer myself, I presume the software
could also be integrated with VB, although I'm less sure of how.

Contact *info@jafsoft.com* if interested.


1.4.5 Linux versions of the software

Linux versions of all programs are planned.  The core conversion software
is developed as a command line utility, and in this form it ports to Linux
reasonably easily.  I plan to offer AscToHTM and AscToRTF as Linux
shareware in the near future.


1.5 Document conversion consultancy

1.5.1 Do you offer consultancy?

We always like to offer a little help to users just starting out.  Once
you register you are free to send a typical sample file to the author, who
will offer some advice on problems you might encounter and policies you
may use.

However, for people wanting to do larger conversions
(see "[[GOTO What's the largest file anyone's ever converted with AscToHTM?]]")
or wanting significant amounts of our time, you will need to buy
assistance at consultancy rates.  Regrettably this is not cheap, although
we feel it's good value for money :)

Contact *info@jafsoft.com* with details.

See also "[[GOTO requesting changes to the software]]"


1.6 Y2K Compliance

From time to time I get asked if my products are Y2K compliant.  The short
answer is "yes it was" :-)


1.7 Status of this FAQ

Clearly it's not finished yet.  You might even say it's "under
construction" :)

I've decided to put this on the web in "unfinished" form so that it may be
of *some* benefit to people as soon as possible.  If you've a particularly
urgent need for a question to be answered contact *infosupport.com*, and
don't be surprised if your answer ends up in this document.


2.0 Getting the best results
============================

2.1 General

2.1.1 Three words: consistency, consistency, consistency

The software works by analysing your document to determine what "rules"
you've used for laying out your file.  On the output pass these "rules"
(also known as "policies") are used to determine how to categorize each
line, and inconsistencies can lead to lines being wrongly treated because
they "fail to obey policy".

You can greatly help this analysis by being consistent in your formatting.
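
By way of illustration only, here is a small invented fragment laid out
the way the analysis likes - blank lines around the table, a single
consistent bullet indent, and a heading "underlined" on the following
line:

$_$_BEGIN_PRE
Prices
======

The current prices are shown below.

    - AscToHTM is shareware ($40)
    - AscToTab is freeware

      Product      Price    Platform
      -------------------------------------
      AscToHTM     $40      Windows, OpenVMS
      AscToTab     free     Windows, OpenVMS
$_$_END_PRE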

Many of the decisions the software makes can be overridden by changing the
"analysis policies" (see "[[GOTO using policy files]]"), but if this
becomes necessary it can quickly become hard work (if only because you
need to familiarize yourself with these policies), so it's better to avoid
this if possible.

If you're writing a document with text conversion in mind, bear in mind
the following

    - *use of white space* (see "[[GOTO white space is your friend]]").
      In general white space can be used to separate paragraphs, tables
      and diagrams from normal text, and columns of data from each other
      inside tables.  The software *likes* white space :)

    - *use of tabs*.  The software will convert all tabs to spaces on
      input, assuming that one tab = 8 spaces.  This will work fine
      provided this tab size is correct, or your use of tabs and spaces is
      consistent.  It may not work otherwise, in which case you'll need to
      tell the software what your tab size is via an analysis policy.

    - *use of indentation*.  The software will calculate the pattern of
      indentation used in your file, and will output text accordingly.  If
      your use of indentation is inconsistent, then paragraphs will be
      wrongly broken and headings may not be correctly recognized.

    - *use of numbering*.  The software can spot numbered headings and
      numbered lists.  To avoid confusing the two, the indentation of a
      given type of heading is tested (although you can disable this
      test), together with the numbering sequence.  The software can
      tolerate small gaps in numbering, but large gaps will confuse it.

    - *use of line lengths*.  The software will attempt to determine your
      "page width" and text justification.  These are then used to spot
      short lines (which get a <BR> added) and centred text.  The centred
      text algorithm has problems and so is disabled by default.

      Try to avoid really long lines, or highly variable line lengths.  If
      you don't, the software is liable to insert <BR> tags where you
      don't want them, unless you set the "page width" and "short line
      length" analysis policies to correct this behaviour.

    - *avoid confusing the program*.  Numbered lists inside numbered
      sections all at the same level of indentation is a good example.
      The numbers become ambiguous and errors start to occur.  If you must
      have this, try to set the numbered list at a small offset to the
      heading, so that the indentation position will distinguish the two.


2.1.2 Make sure your files are "line-orientated"

The software reads files line-by-line.  On the first pass it will analyse
the distribution of line lengths to determine the "page width" of your
file.  This in turn is used to detect certain features such as centred
text and "short lines".

Some files, especially those created on a PC, do not include line breaks;
instead they only have a single break after each paragraph of text.
Whilst not a problem in itself, it does somewhat handicap the software's
ability to analyse the file.

Where possible, you should attempt to save files "with line breaks" to
give the software the best chance of understanding how your file is laid
out.


2.1.3 Make sure your use of tabs is consistent

The software converts all tabs in your source document on the assumption
that one tab equals 8 spaces.  In fact, the actual tab size is irrelevant
provided your use of tabs and spaces is consistent.  If it isn't, you may
find tables aren't being analysed correctly.

You can set the actual tab size used in your documents via the policy line

      Tab size : n

where n is the number of spaces per tab.


2.1.4 White space is your friend

The software attempts to categorize each line into one of a number of
types (e.g. heading, bullet point, part of a table etc).  Often this
analysis is influenced by adjacent lines.

For example a line of minus signs can be interpreted as "underlining" a
heading, or perhaps as part of a table or diagram.  Confusion can occur
where different features are close to each other (e.g. an underlined
heading immediately followed by a table).  In most cases the ambiguity can
be reduced or eliminated by adding 1 or 2 blank lines between the objects
being confused.

The same argument applies to table columns.  If two columns get merged
together, try increasing the "white space" between them by moving them
apart.

In almost all situations, adding white space to your document will help
reduce the likelihood of analysis errors.


2.1.5 Use a simple numbering system

I've seen documents with section numbers like "Section II-3.b".  I'm
sorry, but at present the software can't recognise such an exotic
numbering system.  Equally it can't cope with Appendices like A-1 etc. [*]

If possible, change your section numbers to plain numbers (like this
document), or "underline" all your headings with a row of dashes or equal
signs on the next line.  The software will understand that much better.

[*] From version 4 onwards, there is the ability to recognise headings
that start with the same word or phrase (such as Chapter, Appendix,
Section etc), so this may offer a solution to you.


2.1.6 Save policies into a policy file

The program offers a large number of "policies" to customize the
conversion.  These policies can be saved in a "policy file", which is
simply an ordinary text file (which you may edit by hand if you like).

By saving policies into files, you can reload these files the next time
you do a conversion, which means you won't need to adjust all the settings
again.  You can create multiple policy files for different conversions or
conversion types.
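
As a purely illustrative sketch, a small hand-edited policy file might
contain no more than the following (these policy names all appear
elsewhere in this FAQ; the values are invented):

      Document title       : My first converted document
      Expect contents list : No
      Tab size             : 4
      background colour    : ffffff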

Policy files are described at length in the "[Pol man]".


2.1.7 Add preprocessor commands to your source file

The program has its own built-in preprocessor.  This allows you to add
special "directives" and "tags" into your source file which tell the
program to perform special functions.  Examples include the addition of
include files into the source, the insertion of contents lists, adding
hyperlinks to sections and much much more.

An example is the following hyperlink, whereby
[[OT]]GOTO Using preprocessor commands[[CT]] is used to provide the link
to the named section, such as the one that appears in the next sentence.

For more details see "[[GOTO using preprocessor commands]]"

The preprocessor is described at length in the "[Tag Man]".


2.2 Using policy files

2.2.1 Saving "incremental" policies

When you choose to save your policies to file you will be asked whether
you want to save "incremental" policies, or "all" policies.

"Incremental" means only those policies loaded from file, or manually
adjusted, will be written to file.  This is recommended as it leaves the
program free to make all other adjustments itself.

"All" means that all policies will be written to file.  This is useful if
you want to document or review the policies used, but it is less useful if
you want to reload this policy file, as it will fully constrain the
program's behaviour.  While this may not be a problem when reconverting
the same file, it may well be unsuitable when converting new files.


2.2.2 Editing policy files by hand

Policy files are just text files with a ".pol" extension.  If you think of
them like the old Windows .ini files you'll get the idea.  This has been
done deliberately so that these files can be manually edited in a normal
text editor.  OpenVMS users actually have no other way of creating policy
files, but Windows users can change most (but not all) policies via the
GUI.  However I recommend that anyone who comes to regard themselves as a
"power" user learns how to edit these files.

The policy file consists of one policy per line, usually in the form

      Policy name : value

e.g.

      Document title : Here's my favourite URLs

When entering policy lines you must use the *exact* policy name indicated
in the documentation for the policy to be recognized.  If I've misspelt
anything then tough, you'll have to follow it (but tell me anyway).  The
one exception to this rule is that I've allowed both British and American
spellings of colour/color.

The allowed values will vary from policy to policy.  Most policy lines
accept a value of "(none)", effectively negating that policy.

The order of lines in the file is largely unimportant.  If you're editing
a .pol file generated by the program (see "[[GOTO generate a .pol file]]")
then you'll notice section headings of the form

      [Hyperlinks]

These are purely decorative.  That is, they have no significance, and you
can ignore them and move the policy lines around; there's no concept of
having to place policy lines in the "right" section.

As new versions of the software are released policies are moved from one
section to another as different groupings expand and appear.  As explained
above, this usually has no effect on the validity of the .pol file.


2.2.3 Using include files in policy files

Policy files may include other policy files as follows

      include file : ..\policies\Other_policy_file.pol

This can be useful if you have multiple policy files but want certain
features to be the same.  For example I use this to introduce the same
link dictionary commands into all my policy files.
You could equally put all your colour policies into one file.

The "include file" line will have to be manually edited into the .pol file
using a text editor... there is no support currently for setting this via
the program itself.

NOTE: If you "save" a policy file that has been loaded, then the include
file structure will be lost, and all the policies will be output into a
single file.


2.2.4 Using a default policy

You can make the program use the same policies by default each time it
runs.  To do this select the policies you want, and then save these to a
policy file.

Next select the _Settings->Use of Policy Files_ menu option.  Check the
"Use a default" flag, and select the file you just created.  Next time you
run the program these policies will be loaded and used for your
conversions.

Note, you can still reset the policies or load a different file using the
options on the Conversion Options menu.

To stop using a default just clear the "Use a default" flag (you don't
need to clear the policy file name).


2.3 Using preprocessor commands

2.3.1 What is the preprocessor?

The program has a built-in preprocessor.  This will recognize special
commands inserted into the source file.  These commands can be used to
correct analysis errors (e.g. to correctly delimit a table), or to add to
the output.  For example the TIMESTAMP tag can cause the text "this
document was converted on [[OT]]TIMESTAMP[[CT]]" to be output as "this
document was converted on [[TIMESTAMP]]".

Preprocessor commands are of two types

      *Directives*.  These begin with "$_$_" and must be on a line by
      themselves, with the "$_$_" being at the start of the line (i.e.
      there can be no leading spaces).

      *Tags*.  These take the form [[OT]]TAG ...[[CT]] and may occur
      anywhere within your text, but cannot be split over two lines.

Some commands may be expressed as either directives or tags.

A "[Tag Man]" is also available.


2.3.2 Delimiting tables, diagrams etc

The program will attempt to detect tables and diagrams, but sometimes it
gets the wrong range for the table, and also diagrams may be interpreted
as tables and vice versa.

To correct such mistakes, you can bracket the source lines as follows :-

$_$_BEGIN_PRE
$_$_BEGIN_TABLE
...
$_$_END_TABLE
$_$_END_PRE

or

$_$_BEGIN_PRE
$_$_BEGIN_DIAGRAM
...
$_$_END_DIAGRAM
$_$_END_PRE


2.3.3 How do I add my own HTML to the file?

You can embed raw HTML in your text file in one of three ways using the
preprocessor

a) Insert a one-line piece of HTML as follows

      $_$_HTML_LINE <your HTML here>

   The HTML_LINE and its arguments must all be on one line.

b) Insert a HTML tag as follows

      [[OT]]HTML <your HTML here>[[CT]]

   The HTML tag must all be on one line.

c) Insert a section of HTML between two directive lines

$_$_BEGIN_PRE
$_$_BEGIN_HTML
... lines of HTML, e.g. custom artwork or tables ...
$_$_END_HTML
$_$_END_PRE

For example to enter an anchor point in your text so that you can link to
it try

$_$_BEGIN_PRE
$_$_HTML_LINE <a name="anchor_name"></a>
$_$_END_PRE

To embed an image with a hyperlink you might try

$_$_BEGIN_PRE
$_$_BEGIN_HTML
<a href="http://www.jafsoft.com/"><img src="..."></a> AscToHTM home page
$_$_END_HTML
$_$_END_PRE

The "$_$_" has to be at the beginning of the line, i.e. not indented.

If you look at the program's HTML documentation, and the text used to
create it, you'll see examples of this and other preprocessor commands.
Indeed if you look at the [[SOURCE_FILE]] for this document you'll see
that's exactly how the image on the right was added to *this* document.

Future versions of the software will introduce in-line tagging so you can
place LINKPOINTs anywhere in your text.  Check your program's
documentation for details.


2.3.4 Using standard include files

The preprocessor command INCLUDE can be used to include standard pieces of
text into your source files.  For example

      $_$_INCLUDE ..\data\footer.inc

will include the file "footer.inc" into your source file at this location.
Note that the path given must be correct relative to the source file being
converted.

The contents of the include file simply get "read into" the source.  As
such they get included in the analysis of the whole document.

Include files can be useful to add standard disclaimers or navigation bars
to all your pages.  For example you could embed HTML to link back to your
home page (see "[[GOTO how do I add my own HTML to the file?]]")

Of course the same effect could be achieved by using a HTML footer file
(see "[[GOTO adding headers and footers]]") or by defining a "HTML
fragment" called HTML_FOOTER (see
"[[GOTO customizing the HTML created by the software]]").


2.3.5 Adding Title, keywords etc

If you want to add a title, keywords and a description to your HTML you
can do this by embedding special commands in the source file as follows

$_$_BEGIN_PRE
$_$_TITLE This is the title of my HTML page
$_$_DESCRIPTION This page is a wonderful page that everyone should visit
$_$_KEYWORDS wonderful, web, page, full, of keywords, that
$_$_KEYWORDS everyone, will, want, to search, for
$_$_END_PRE

The "$_$_" must be the first characters on the line.  You can spread the
keywords and description over several lines by adding extra $_$_KEYWORDS
and $_$_DESCRIPTION lines.

Note: Most of these commands have equivalent policies, allowing you to set
the title etc through an external policy file should you prefer.


2.3.6 Adjusting policies for individual files or parts of files

You can, if you wish, create one policy file for each file being
converted, however this is liable to become a maintenance nightmare.

If you don't want to maintain multiple policy files, or if you simply want
to adjust a few policies for a given source file, you can use the
$_$_CHANGE_POLICY command.  The effect will vary according to the type and
position of the command.  Some policies will affect the whole document,
others will only affect the document from that point onwards... it depends
on the nature of the particular policy.  See the "[Pol man]" for details.

For example placing

      $_$_CHANGE_POLICY background colour : #FF0000
      $_$_CHANGE_POLICY text colour : White

will change the document background colour to be red, and the text to be
white throughout the whole document.


2.4 Making the program run faster

You can make the program run faster in a number of ways by disabling
features that you know you don't want.


2.4.1 Review the "look for" options

As of V3.1, AscToHTM has a number of "look for" options, stating what the
program is looking for.  Disable the ones you don't want, although most of
them will not make a major difference to the program's speed.


2.4.2 Don't convert URLs

Probably the single most expensive function is the search for URLs to
convert into hyperlinks.  Every word (and every word fragment) has to be
checked individually.  The problem isn't helped by having to distinguish
URLs with commas in them from comma-separated lists of URLs.

If you know your document has *no* URLs to be converted, disable this
feature and watch the software run 10-20% faster.  However this is one
feature of the software that people like most.
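
More generally, the searches discussed in this chapter can be switched off
together from a policy file.  A purely illustrative sketch (each of these
"look for" policies is mentioned elsewhere in this FAQ; see the "[Pol man]"
for the full list of "look for" options):

      Look for definitions        : No
      Look for preformatted text  : No
      Look for short lines        : No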


2.4.3 Don't generate tables

The software will attempt to convert regions of pre-formatted text into
tables.  This can take a lot of analysis even if eventually it decides
"it's not a table after all!".

This only comes into effect if the program detects preformatted text, so
you should only disable this feature if your pre-formatted text is largely
non-tabular.  If that's the case you probably want to disable this anyway,
as the tables created may be inappropriate.


3.0 Conversion Questions
========================

3.1 General

3.1.1 How do I get rid of the "nag" lines?

Easy.  You register the software (see "[[GOTO registration and updates]]"),
or you remove them by hand.

"Nag" lines only appear in unregistered trial copies of the software.  If
you register, these are removed.


3.1.2 My file has had its case changed and letters replaced at random by numbers.  How do I fix that?

Easy.  You register the software (see "[[GOTO registration and updates]]").

The case is only adjusted in unregistered trial copies of the software,
either after the line limit is reached, or after the 30 day trial has
expired.  The case is adjusted so that you can still evaluate whether the
conversion has produced the right type of HTML, but since the text is now
all in the wrong case and has had letters substituted, the HTML is of
little use to you.  This is intended as an incentive to register.

That said, you *will* find pages on the web that have been converted in
this manner.


3.1.3 Why do I sometimes get <DD> markup?  How do I stop it?

The program is detecting a "definition".  Definitions are usually keywords
with a following colon ":" or hyphen "-", e.g. "text:"

You can see this more easily if you go to Output->Style and toggle the
"highlight definition term" option... the definition term (to the left of
the definition character) is then highlighted in bold.

If the definition spreads over 2 "lines", then a definition paragraph is
created, giving the effect you see.  If you have created your file using
an editor that doesn't output line breaks then only long paragraphs will
appear to the program as 2 or more "lines".  In such cases only the longer
paragraphs will be detected as "definition paragraphs"; the rest are
detected as "definition lines", even though they're displayed in a browser
as many lines.  If you view the file in NotePad you'll see how the program
sees it.

To stop this you have a number of options.

(1) _Analysis policies -> What to look for -> Look for definitions_

    Switch this off.  This will stop *all* attempts to spot "definition"
    lines.

(2) _Analysis policies -> Analysis -> recognize colon (:) characters_

    Switch this off.  This will stop anything with a colon (:) being
    recognized as a definition.

(3) _Output policies -> Style -> Use <DD> markup for paragraphs_

    Disable this.  The definitions will still be recognized, but the <DD>
    markup won't be used.


3.1.4 Why are some of my words being broken in two?

Sometimes AscToHTM will produce HTML with words broken - usually over two
lines.  This can happen if your text file has been edited using a program
(like NotePad) that doesn't place line breaks in the output.

AscToHTM is line-orientated (see 2.1.2).  Programs like NotePad place an
entire paragraph on a single "line", or on lines of a fixed length (e.g.
1000 characters).

AscToHTM places an implicit space at the end of each line it reads.  This
is to ensure you don't get the word at the end of one line merged with
that at the start of the next.  However, in files with fixed length
"lines", large paragraphs will be broken arbitrarily, with the result that
a space (and possibly a <BR>) will be inserted into the middle of a word.

You can avoid this by breaking your text into smaller paragraphs, passing
your file through an editor that wraps differently prior to conversion, or
selecting any "save with line breaks" option you have.


3.1.5 Why am I getting line breaks in the middle of my text?

The software will add a line break to "short" lines, or - sometimes - to
lines with hyperlinks in them.

You can edit your text to prevent the line being short, or you can use
policies to alter the calculation of short lines.  Use the "[Pol man]" to
read about the following policies

    - "Add <BR> to lines with URLs"
    - "Look for short lines"
    - "Short line length"
    - "Page width"


3.1.6 Why isn't the software preserving my line structure?

Do you mean line structure, or do you really mean paragraph structure?

The program looks for "short lines".  Short lines can mark the last line
in a paragraph, but more usually indicate an intentionally short line.
The calculation of what is a short line and what isn't can be complex, as
it depends on the length of the line compared to the estimated width of
the page.

You have a number of options :-

    - enable the "Preserve line structure" policy.  This will cause your
      output to exactly match the line structure of your input.

    - disable the search for short lines using the
      _Analysis policies -> What to look for_ tab

    - explicitly set the page width and/or short line length using the
      _Analysis policies -> analysis_ tab.

See also "[[GOTO how do I preserve one URL per line?]]"


3.1.7 Why am I getting lots of white space?

Usually because you had lots of white space in your original document.  If
that is the case, then you can set the policy

      Ignore multiple blank lines : Yes

to reduce this effect.

Some people complain that there are blank lines between paragraphs, or
between changes in indentation.  Often this is the vertical spacing
inserted by default in HTML.  This can only be controlled in later
versions of HTML which support [[TEXT HTML 4.0]] and Cascading Style
Sheets (CSS).

Occasionally certain combinations of features lead to an extra line of
space.


3.1.8 What's the largest file anyone's ever converted with AscToHTM?

Well, at time of writing, I know of a 56,000 line file (3Mb) which was
converted into a single (4Mb) HTML file.  Of course, it was also converted
into a suite of 300 smaller, linked, files weighing in at 5Mb of HTML.

This file represented 1,100 pages when printed out.  I *do* sometimes
wonder if anyone ever reads files that big though.


3.1.9 Does the software support Hebrew letters / Japanese / right-to-left alignment?

Since version [[text 4.1]] the short answer is "probably".

Although the software has no ability to *understand* documents written
this way, and was designed to cope with the ASCII character set, from
version [[text 4.0]] onwards it is possible to manually set the "charset"
used.  This tells the HTML browser how to interpret the characters.
Whether or not you see the page correctly then depends on the browsers and
fonts installed on the viewer's machine.

In version [[text 4.1]] some auto-detection of character sets has been
added.  This can usually detect which character encoding is being used.
You can switch this behaviour off should you wish, and you can also set
the correct charset by hand.  See the policies "Character encoding" and
"Auto-detect character encoding".


3.1.10 Why does the program hang after a conversion?

Under Windows the software usually tries to display the results files in
your browser or viewer of choice.  To prevent multiple instances of the
browser being launched, DDE is used.  DDE is a Windows mechanism that
allows requests to be passed from one program to another; in this case the
software is asking the browser to display the HTML just created.

Some users have reported problems with DDE - especially under Windows
Millennium.  When this occurs any program - including AscToHTM - will hang
whenever it attempts to use DDE... you notice it first with AscToHTM
because it uses DDE all the time.  When this happens you will need to use
the Task Manager to kill the program.

You can solve this problem by using the _Settings -> Viewers for results_
menu option to disable the use of DDE.  From version 4 onwards the
software will detect when this has happened, and will disable its use of
DDE next time it is run.  You can re-enable this (e.g. after a reboot has
cleared the problem) under the _Settings -> Viewers_ menu option.

Note, this is a workaround and not a solution.  When DDE stops working on
your system other programs will still have problems, e.g. when you click
on a hyperlink inside your email client.

Sadly I don't know a solution for the DDE problem.  Sometimes rebooting
helps - initially at least - sometimes stopping a few applications helps.
Sometimes it doesn't. :-(


3.2 What the software _can't_ do

3.2.1 Why doesn't it convert Word/Wordperfect/RTF/my favourite wp documents?

Because it wasn't designed to.  No, really.

The software is designed to convert *ASCII* text into HTML.  That is,
plain, unformatted documents.  Word and other wp packages use binary
formats that contain formatting codes embedded in the text (or in some
cases the text is embedded in the codes :-).  Even RTF, which is a text
file, is so heavily full of formatting information that it could not be
perceived as normal text (look at it in Notepad and you'll soon see what I
mean).

Why the omission?  Well, like I said, that was *never* the intention of
this program.  I always took the view that, in time, the authors of those
wp packages would introduce "export as HTML" options that would preserve
all the formatting, and in general this is what has happened.

To my mind writing such a program is "easy".  My software tackles the much
more difficult task of inferring structure where none is explicitly
marked.  In other words trying to "read" a plain text file and to
determine the structure intended by the author.

See also
"[[GOTO Do you have a html-to-text converter, rtf-to-html converter etc?]]".


3.2.2 How can I use DDE with Netscape 6.0?

You can't.  Unlike Netscape versions up to and including 4.7, Netscape 6.0
doesn't support DDE in its initial release under Windows.


3.2.3 Can I use AscToHTM to build a web site with a shopping cart?

By itself, no.  AscToHTM can only really produce relatively "static",
mostly-text web pages.  To add any dynamic content and graphics you'd
effectively need to add the relevant HTML yourself, so the answer is
essentially "no".

Adding a shopping cart is actually fairly tricky.  You either have to
install the software yourself, or sign up with an ISP that will do this
for you.  Most such systems require a database (of items being sold).

Having not dealt much with such systems myself I can't really advise on a
*web authoring* tool (which is what AscToHTM is) that would integrate
seamlessly with a shopping cart system.  My advice would be to identify an
ISP that offers shopping cart functionality and see what methods they
offer for web authoring.

I wish you luck.


3.2.4 How do I interrupt a conversion?

At present you can't.  The Windows version won't respond to stimulus while
a conversion is in progress, meaning that the windows will not refresh.
Normally this isn't a problem, but in large conversions this can be a
little disconcerting.

Fixing this is on the "to do" list.


3.3 Tables

3.3.1 How does the program detect and analyse tables?

Here's an overview of how the software works.  This will give you a
flavour for the complexity of the issues that need to be addressed.

The software first looks for pre-formatted regions of text.
It does this by

 1) Spotting lines that are clearly formatted, looking for large amounts
    of white space and any table-like characters like '|' and '+'.  It may
    also look for code-like lines and diagram-like lines according to the
    policies set.

 2) Each time a heavily formatted line is encountered an attempt is made
    to extend the preformatted region by "rolling it out" to adjacent, not
    so clearly formatted lines

 3) This "roll out" process is stopped whenever it encounters a line that
    is clearly not part of the formatted region.  This might be a section
    heading or a set of multiple blank lines (the default is 2).

Once a preformatted region is identified, analysis is performed to see
whether this is a table, diagram, code sample or something else.  This
decision depends on

 4) The mix of "graphics" characters as opposed to "text" characters

 5) The presence of "code-like" indicators like curly brackets,
    semi-colons and "++" and other special character sequences.  Note, the
    software doesn't understand code syntax, it just recognises commonly
    used character combinations.

 6) How well the data can be fitted into columns of a table (below)

If nothing fits then this text is output "as normal", except that the line
structure is preserved to hopefully retain the original meaning.

If the software decides a table is possible, it

 7) Characterizes the contents of each character position.  So for example
    a character position that contains mostly blank characters on each
    line is a good candidate for a column boundary.

 8) Infers from the character positions the likely column boundaries

Once a tentative set of column boundaries has been identified, the
following steps are repeated

 9) Place all text into cells using the current column boundaries

10) Measure how "good a fit" the text is to the columns, looking for
    values that span column boundaries, or columns that are mostly "empty"

11) Eliminate any apparently "spurious" columns.  For example "empty"
    columns may get merged with their neighbours.

Finally, having settled on a column structure the software

12) Tries to identify the table header, preferably by detecting a
    horizontal line near the top of the table.

13) Tries to work out column alignments etc.  If the cell contents are
    numeric the cell will be right aligned, otherwise the placement of the
    text compared to the detected boundaries will be observed

14) Identifies how many lines go into each row.  If blank lines or
    horizontal rules are present, these may be taken as row boundaries.

15) Places all text into cells, using the configuration found.

Naturally any one of these steps can go wrong, leading to less than
perfect results.  The program has mechanisms (via policies and
preprocessor commands) to

 a) Influence the attempt to look for tables

 b) Influence the attempt to extend tables (steps (1)-(3))

 c) Influence the decision as to what a preformatted region is
    (steps (4)-(6))

 d) Influence the column analysis (steps (7)-(11))

 e) Influence the header size and column alignment (steps (12)-(15))

Read the table sections in the "[Tag Man]" and "[Pol man]" for more
details.


3.3.2 Why am I getting tables?  How do I stop it?

The software will attempt to detect regions of "pre-formatted" text.  Once
detected it will attempt to place such regions in tables, or if that fails
sometimes in <PRE>...</PRE> markup.

Lines with lots of horizontal white space or "table characters" (such as
"|", "-", "+") are all candidates for being pre-formatted, especially
where several of these lines occur.  This often causes people's .sigs from
email to be placed in a table-like structure.

You can alter whether or not a series of lines is detected as preformatted
with the policies

      Look for preformatted text       : No
      Minimum automatic <PRE> size     : 4

The first disables the search for pre-formatted text completely.  The second 
policy states that only groups of 4 or more lines may be regarded as 
preformatted.  That would prevent most 3-line .sigs being treated that way.

If you have pre-formatted text, but don't want it placed in tables (either
because it's not tabular, or because the software doesn't get the table analysis
quite right), you can prevent pre-formatted regions being placed in tables via
the policy

      Attempt TABLE generation          : No
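
If only part of a file is affected (an email .sig, say), the same policy
can in principle be changed in-line using the $_$_CHANGE_POLICY command
(see section 2.3.6).  A hedged sketch only - whether a given policy can be
toggled part-way through a file depends on the policy, so check the
"[Pol man]" first:

$_$_BEGIN_PRE
$_$_CHANGE_POLICY Attempt TABLE generation : No
... the lines you don't want turned into a table ...
$_$_CHANGE_POLICY Attempt TABLE generation : Yes
$_$_END_PRE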


3.3.3 Why am I _not_ getting tables?

First read "[[GOTO how does the program detect and analyse tables?]]" for an 
overview of how tables are detected.

If you're not getting tables this is either because they are not being
detected, or because, having been detected, they are deemed not to be
"table-like".  Look
at the HTML code to see if there are any comments around your table indicating
how it's been processed.

If the table is not being detected this could be because

    - the lines don't look table-like.  Try increasing the white space, or 
      adding a vertical bar '|' as your column separator.
	  
    - some lines are table-like, but the "roll out" isn't including the adjacent
      less formatted lines.  Try changing the policy *Table extending factor*
	  	
    - The detected "table" is too small compared to the value in the policy
      *Minimum automatic <PRE> size*.
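
For example, to let smaller blocks qualify as pre-formatted text (and
hence as candidate tables) you might lower the minimum block size in your
policy file; a sketch with an invented value:

      Minimum automatic <PRE> size     : 3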
	  	
If all this fails, edit the source to add preprocessor commands around the table
as follows

$_$_BEGIN_PRE
$_$_BEGIN_TABLE
...
...(your table lines)
...
$_$_END_TABLE
$_$_END_PRE


3.3.4 Why do my tables have the wrong column structure?

First read "[[GOTO how does the program detect and analyse tables?]]" for
an introduction to how table columns are analysed.

The short answer is "the analysis went wrong".  *Why* it went wrong is
almost impossible to answer in a general way.  Some things to consider

    - Was the table extent correctly calculated?  If adjacent lines were
      wrongly sucked into the table this will affect the analysis.  Try
      adding blank lines around the table, adjusting the "Table extending factor"
      policy, or adding BEGIN_TABLE/END_TABLE preprocessor tags to correct any
      errors in calculating the extent.

Often the table extent is correct, but the analysis of the table has gone 
wrong.

    - Check the text doesn't mix tabs and spaces together in an inconsistent
      manner.  Either set the "Tab size" policy, or replace all tabs by spaces.

    - Look to see if some data just "happens" to line up the blanks.  In some
      small tables this can happen.  Consider adjusting the 
      "Minimum column separation" policy to a value greater than 1.

    - Consider adjusting the "Column merging factor" policy to reduce/increase
      the number of columns produced for the table.
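
As a purely illustrative sketch, these tuning policies are just ordinary
policy lines; the values below are invented, so check the "[Pol man]" for
the valid ranges (the "Column merging factor" policy takes a value whose
range is described there too):

      Tab size                  : 4
      Minimum column separation : 2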

If all this fails you can explicitly *tell* the software what the table
layout is by using either the TABLE_LAYOUT preprocessor command, or the
"Default TABLE
layout" policy.  Only use the policy if all tables in the same source file have
the same layout.


3.3.5 Where did all my table lines go?

The software removed them because it thought they would look wrong as
characters.  The lines are usually replaced by a non-zero BORDER value
and/or some <HR> tags placed in cells.


3.3.6 How can I get the program to recognize my table header?

One tip.  If you insert a line of dashes after the header like so...

$_$_BEGIN_PRE
        Basic Dimensions
   Hole No.    X         Y
   -------------------------
      1      3.2500    5.0150
      2      1.2500    3.1250
   etc.....
$_$_END_PRE

The program *should* recognize this as a heading, and modify the HTML
accordingly (placing it in bold).

Alternatively you can tell the program (via the policy options or
preprocessor commands) that the file has 2 lines of headers.


3.3.7 Why am I getting strange COLSPAN values in my headers?

(see the example table in 3.3.6)

The spanning of "Basic Dimensions" over the other lines can be hit and
miss.  Basically if you have a space where the column gap is expected the
text will be split into cells; if you don't then the text will be placed
in a cell with a COLSPAN value that spans several cells.

For example

$_$_BEGIN_PRE
           | space aligns with column "gap"
           v
        Basic Dimensions
   Hole No.    X         Y
   -------------------------
      1      3.2500    5.0150
      2      1.2500    3.1250
   etc.....
$_$_END_PRE

In this case you'd get "Basic" in column 1 and "Dimensions" spanning
columns 2 and 3.  If you edit this slightly as follows then the
"Basic Dimensions" will span all 3 columns

$_$_BEGIN_PRE
         | space no longer aligns with column "gap"
         v
    Basic Dimensions
   Hole No.    X         Y
   -------------------------
      1      3.2500    5.0150
      2      1.2500    3.1250
   etc.....
$_$_END_PRE

It's a bit of a black art.

Sometimes when the table is wrong, it's a good idea to set the BORDER size
to 0 (again via the policy options) to make things look not so bad.  It's
a fudge, but a useful one to know.


3.4 Headings

3.4.1 How does the program _recognize_ headings?

The program can attempt to recognize five types of headings:

      *Numbered headings*.  These are lines that begin with section
      numbers.  To reduce errors, numbers must be broadly in sequence and
      headings at the same level should have the same indentation.  Words
      like "Chapter" may appear before the number, but may confuse the
      analysis when present.

      *Capitalised headings*.  These are lines that are ALL IN UPPERCASE.

      *Underlined headings*.  These are lines which are followed by a line
      consisting solely of "underline" characters such as underscore,
      minus, equals etc.  The length of the "underline" line must closely
      match the length of the line it is underlining.

      *Embedded headings*.  These are headings embedded as the first
      sentence of the first paragraph in the section.  The heading will be
      a single all-UPPERCASE sentence.  Unlike the other headings, the
      program will place these as bold text, rather than using heading
      markup.  You will need to manually enable the search for such
      headings; it is not enabled by default.

      *Key phrase headings*.  These are lines in the source file that
      begin with user-specified words (e.g. "Chapter", "Appendix" etc.)
      The list of words and phrases to be spotted is case-sensitive and
      will need to be set via the "Heading key phrases" policy.

The program is biased towards finding numbered headings, but will allow
for a combination.  It's quite possible for the analysis to get confused,
especially when

    - headings are centred, rather than at fixed indents.  The policy
      "Check indentation for consistency" should be disabled if this is
      the case.

    - headings include the words Chapter, Part etc.  You should consider
      using the "Heading key phrase" policy and disabling the search for
      numbered headings in such cases.

    - The numbering system repeats (e.g. Part I, 1,2,3,... Part II,
      1,2,3...).
      Again, consider using "key phrase" and/or underlined heading
      detection as an alternative.

    - The file has numbered lists at a similar indentation to the numbered
      sections.  If possible move your numbered lists a few characters to
      the right of the indentation that headings are expected at.

    - The file has a large number of capitalised non-heading lines.
      Manually disable the search for capitalised headings if this happens

    - The numbering system is "exotic" (e.g. II.3.g)

To tell if the program is correctly detecting the headings

 a) Look at the HTML to see if <H1>, <H2>, <H3> etc. tags are being added
    to the correct text.

 b) If the headings are wrong, check the analysis policies are being set
    correctly by looking at the values shown under
    _Conversion Options -> Analysis policies -> headings_ after the
    conversion.

Depending on what is going wrong do one or more of the following :-

  i) Adjust the headings policy (e.g. to disable capitalised headings)

 ii) Edit the source to replace centred headings by headings at a fixed
     indentation.

iii) Edit the source so that numbered lists are at a different indentation
     to numbered sections.

 iv) If your numbering system is too exotic, edit your source so that all
     the headings are "underlined" and get the program to recognize
     underlined, rather than numbered, headings.

  v) If possible consider the use of the "Heading key phrase" policy
     instead.


3.4.2 Why are my headings coming out as hyperlinks?

This is a failure of analysis.  The program looks for a possible contents
list at the top of the file before the main document (sometimes in the
first section).  If your file has no contents list, but the program
wrongly expects one, then as it encounters the headings it will mark these
up as contents lines.

To prevent this, set the analysis policy

      Expect contents list : No

Or add a preprocessor line to the top of your file as follows

$_$_BEGIN_PRE
$_$_CHANGE_POLICY Expect contents list : No
$_$_END_PRE


3.4.3 Why are the numbers of my headings coming out as hyperlinks?

Either a failure of analysis, or an error in your document.

The software checks headings "obey policy" and are in sequence.  If you
get your numbering sequence wrong, or if you place the heading line at a
radically different indentation to all the others, then the software will
reject this as a heading line, in which case the number may well be turned
into a hyperlink.

If it's an error in your document, fix the error.

For example, a common problem is numbered lists inside sections.  If the
list numbers occur at the same level of indentation as the level 1 section
headings, then eventually a number on the list will be accepted as the
next "in sequence" header.  For example in a section number [[TEXT 3.11]],
any list containing the number 4 will have the "4" treated as the start of
the next chapter.  If section "3.12" is next, the change in section number
from 4 will be rejected as "too small", and so all sections will be
ignored until section [[TEXT 4.1]] is reached.

The solution here is to edit the source and indent the numbered list so
that it cannot be confused with the true headers.  Alternatively change it
to an alphabetic, roman numeral or bulleted list.

Another possible cause is that the software hasn't recognized this level
of heading as being statistically significant (e.g. if you only have 2
level 4 headings (n.n.n.n) in a large document).  In this case you'll need
to correct the headings policy, which is a sadly messy affair.


3.4.4 Why are various bullets being turned into headings, and the headings ignored?

The software can have problems distinguishing between

      1 This is chapter one

and

      1) This is list item number one.

To try and get it right it checks the sequence number, and the indentation
of the line.  However problems can still occur if a list item with the
right number appears at the correct indentation in a section.  If
possible, try to place chapter headings and list items at different
indentations.

In extreme cases, the list items will confuse the software into thinking
they are the headings.
In such a case you'd need to change the policy file to say what the
headings are, with lines of the form

      We have 2 recognized headings
      Heading level 0 = "" N at indent 0
      Heading level 1 = "" N.N at indent 0

(this may change in later versions).


3.4.5 Why are lines beginning with numbers being treated as headings?

The software can detect numbered headings.  Any lines that begin with
numbers are checked to see if they are the next heading.  This check
includes checking the number is (nearly) in sequence, and that the line is
(nearly) at the right indentation.

If the line meets these criteria, it is likely to become the next heading,
often causing the *real* heading to be ignored, and sometimes completely
upsetting the numbering sequence.

You can fix this by editing the source so that the "number" either occurs
at the end of the previous line, or has a different indentation to that
expected for headings.


3.4.6 Why are underlined headings not recognized?

The software prefers numbered headings to underlined or capitalised
headings.  If you have both, you may need to switch the underlined
headings on via the policy

      Expect underlined headings : Yes


3.4.7 Why are only _some_ of my underlined headings recognized?

If the program is looking for underlined headings (see
"[[GOTO Why are underlined headings not recognized?]]") then the only
reason for this is that the "underlining" is of a radically different
length to the line being underlined.  Problems can also occur for long
lines that get broken.

Edit your source to

    - place the whole heading on one line

    - make the underlining the *same* length


3.4.8 How do I control the header level of underlined headings?

The level of heading associated with an underlined heading depends on the
underline character as follows:-

      '****'                    level 1
      '====','////'             level 2
      '----','____','~~~~'      level 3
      '....'                    level 4

The actual *markup* that each heading gets may depend on your policies.
In particular level 3 and level 4 headings may be given the same size
markup to prevent the level 4 heading becoming smaller than the text it is
heading.  However the _logical_ difference will be maintained, e.g. in a
generated contents list, or when choosing the level of heading at which to
split large files into many HTML pages.


3.4.9 Why are only the first few headings working?

A couple of possible reasons :-

    - a numbered list is confusing the software.  This is the same problem
      as "[[GOTO why are the numbers of my headings coming out as hyperlinks?]]"

    - Some of your headings are "failing" the checks applied.  See the
      discussion in "[[GOTO how does the program recognize headings?]]"

One of the reasons for "failure" is that - for consistency - headings must
be in sequence and at the same indentation.  This is an attempt to prevent
errors in documents that have numbers at the start of a line by chance
being treated as the wrong headings.

If some headings aren't close enough to the calculated indent then they
won't be recognised as headings.  If a few headings are discarded then
later headings that *are* at the correct indentation are discarded as
being "out of sequence".

If you're authoring from scratch then the easiest solution is to edit all
the headings to have the same indent.  Alternatively disable the policy
"Check indentation for consistency".


3.5 Hyperlinks

3.5.1 Why doesn't it correctly parse my hyperlinks?

The software attempts to recognize all URLs, but the problem is that -
especially near the end of the URL - punctuation characters can occur.
The software then has difficulty distinguishing a comma-separated list of
URLs from a URL with a series of commas in it (as beloved at C|Net).

This algorithm is being improved over time, but there's not much more you
can do than manually fix it, and report the problem to the author, who
will pull out a bit more hair in exasperation :)


3.5.2 Why doesn't it recognize my favourite newsgroup?

To avoid errors the program will only recognize newsgroups in the "big 7"
hierarchies.  Otherwise filenames like "command.com" might become unwanted
references to fictional newsgroups.

This means that uk.telecom won't be recognized, although if you place
"news:" in front of it like this

      news:uk.telecom

then it is recognized.

If you want to make "uk." recognized as a valid news hierarchy, then set
the policy

      recognized USENET groups : uk

Then any word beginning "uk." may become a newsgroup link.


3.5.3 Why are only some of my section references becoming hyperlinks?

The program will only convert numbers that match known numbered sections
into hyperlinks.

If the number is a genuine section heading, then the chances are that this
level of heading has not been detected.  This has happened in large
documents which contained only 2 level 5 headings.  In such documents you
may need to manually add the extra level to your policy file.

Another limit is that the program won't convert level 1 heading
references, because the error rate is usually too high.  For example if I
say "1, 2, 3" it's unlikely I want this to become hyperlinks to chapters
1, 2 and 3.


3.5.4 Why are some numbers becoming hyperlinks?

In a numbered document numbers of the form n.n may well become hyperlinks
to that section of the document.  This can cause "Windows [[TEXT 3.1]]" to
become a hyperlink to section 3.1 if such a section exists in your
document.

You can either insert some character (such as "V" to make "V3.1"), place
the number inside a protective pre-processor TEXT tag as follows

      [[OT]]TEXT [[TEXT 3.1]][[CT]]

or disable this feature entirely via the policy

      Cross-refs at level : 3

(which means only "level 3" headings such as n.n.n will be turned into
links), or

      Cross-refs at level : (none)

which should disable the behaviour.


3.5.5 Why are some long hyperlinks not working?

The software will sometimes break long lines to make the HTML more
readable.  If this happens in the middle of a hyperlink, the browser reads
the end of line as a space in the URL.

You can fix this by editing the output text so that the HREF="" part of
the file is all on the same line.

This "feature" may be fixed in later versions of AscToHTM.


3.5.6 How do I preserve one URL per line?

Some files contain lists of URLs, with one URL per line.  By default the
software will not normally preserve this structure because long lines are
usually concatenated into a single paragraph.

You can change this behaviour using the option on the
_Output policies -> Hyperlinks_ policy sheet.

See also "[[GOTO why isn't the software preserving my line structure?]]"


3.6 Policy files

3.6.1 How many policies are there?  Where can I read more about individual policies?

First time I looked it was nearly 200; recently the number has been
approaching 250.  They kind of sneak up on you, I guess.

The "[Pol man]" gives a pretty comprehensive description of what each one
does and where it can be found.  Last time I checked that file was 5000
lines of text before conversion to HTML.

People complain that there are too many policies, but then they say
"couldn't you add an option to ...", and so it goes.

Organizing these policies in a logical manner is a fairly difficult problem,
and if anyone has any bright ideas I'm listening.  In recent versions I added
overview policies to make things easier to locate or to switch off en masse.


3.6.2 My policy file used to work, but now it doesn't.  Why?

Make sure you're using an "incremental" policy file, rather than a full one.
You can do this by viewing the .pol file in a text editor.  An "incremental"
policy file will only contain lines for the policies you've changed.  A full
policy file will contain all possible policies.

If you load a "full" policy file you prevent the program intelligently
adjusting to the particular file being converted.  If this happens either
edit out the lines you don't want from your policy file, or reset the
policies to their defaults and create a new policy file from scratch.

NOTE:	There used to be a bug whereby sometimes a policy file would
	inadvertently get saved as a "full" file.  That should be fixed now.


3.6.3 xxxx Policy is not taking effect.  What shall I do?

(see 1.7)


3.7 Bullets and lists

3.7.1 Why is the indentation wrong on follow-on paragraphs?

The program can't distinguish between indented paragraphs and paragraphs that
are intended as follow-on paragraphs from some bullet point or list item.
This means that whilst the first paragraph (the one with the bullet point) is
indented as a result of being placed inside appropriate list markup, the
second and subsequent paragraphs are just treated as indented text.

The bullet point will be indented as one level deeper than the text position
of the bullet.  The follow-on paragraph will be indented according to its own
indentation position compared to the prevailing indentation pattern.  Ideally
this will be one level deeper than the text position of the bullet.
Occasionally the two result in different indentations.

The solutions are either to

a) Review your *indent position(s)* policy with a view to adjusting the
   values to give the right amount of indentation to the follow-on
   paragraphs.  Sometimes adding an extra level to match the indentation of
   the follow-on paragraph is all that's necessary.

b) Edit your source text slightly, adjusting the indent of either the list
   items or follow-on paragraphs until the two match.


3.7.2 Why is the numbering wrong on some of my list items?

HTML doesn't allow the numbering to be marked up explicitly.  Instead you can
only use a START attribute in the <OL> tag to get the right first number,
which is incremented each time a <LI> tag is seen.

Some browsers don't implement the START attribute, and so they always restart
numbering at 1.  There's not much I can do about this problem.

I've also seen a bug in Opera V3.5 where any tag placed between the <OL> and
the <LI> causes the numbering to increment.  That shouldn't be a problem
here, as that's illegal HTML markup - and we try very hard not to generate
any of that!


3.7.3 Some of my text has gone missing.  What happened?

There's a bug (in Opera) where a tag placed between the <OL> and the <LI> tag
causes all that text to not be displayed.  That shouldn't be a problem here,
as that's illegal HTML markup - and we try very hard not to generate any of
that!

If there's any other problem of this sort please email *infosupport.com*
with details.


3.8 Contents List generation

3.8.1 How do I add a contents list to my file?

There are a number of ways:-

- If the file already has a contents list this may be detected if the
  sections are numbered, and the contents lines will be turned into links to
  the sections concerned.

- You can force the addition of a contents list using the policies under the
  menu at _Conversion Options -> Output Policies -> Contents List_

  A hyperlinked contents list will be generated from the headings that the
  program detects.  This list will be placed at the top of the first file.

- If you don't want the generated list to be placed at the top of the file,
  insert the preprocessor command $_$_CONTENTS_LIST at the location(s) you
  want.

  This command takes arguments that allow a limited number of formatting
  options.  It can also be limited in scope, so you can, if you wish, add a
  $_$_CONTENTS_LIST to each chapter in your document.


3.8.2 Why doesn't my contents list show all my headings?

First read "[[GOTO how does the program recognize headings?]]".

If you're generating a contents list from the observed headings, then any
missing headings are either because

a) The program didn't recognize the headings

b) The policy *Maximum level to show in contents* has been set to a value
   that excludes the desired heading.

If you're converting an in-situ contents list, then only (a) is likely to
apply, in which case you need to ensure the program recognizes your headings.


3.8.3 Some of my contents hyperlinks don't work!

There used to be a problem whereby the software would add hyperlinks to
sections that didn't exist, or would point to the wrong file when a large
file was being split into many smaller files.

Both problems should now be fixed, so if you encounter this problem, contact
*infosupport.com*.


3.9 Emphasis

3.9.1 Why didn't my emphasis markup work?

Emphasis markup can be achieved by placing asterisks (*) or underscores (_)
in pairs around words or phrases.  The matching pair can be over a few lines,
but cannot span a blank line.  Asterisks and underscores can be nested.

Asterisks generate *bold markup*, underscores generate _italic markup_, and
combining these generates _*bold, italic markup*_.

If you wrap a phrase in underscores, and replace all the spaces by
underscores [[TEXT _like_this_]] then the result will be underlined
_like_this_ and not in italics.

The algorithm copes reasonably well with normal punctuation, but if you use
some unanticipated punctuation, it may not be *recognized*!&%@!

You can have a _phrase that spans a couple of lines that contains *another
phrase of a different type* in the middle of it_, but you can't have two
phrases of the same type nested that way.  Be reasonable :-)

Phrases that span a blank line are not permitted.  You'll need to end the
markup before the blank line, and re-start it afterwards.  This is to reduce
the chances of false matches.


3.10 Link Dictionary

3.10.1 What is the Link Dictionary?

The link dictionary allows you to add hyperlinks to particular words or
phrases.  You can choose the phrase to be matched, the text to be displayed
and the URL to be linked to.

This can help when building a site by converting multiple text files.
For example the whole www.jafsoft.com site is built from text files, and
extensive use of a link dictionary is made to add links from one page to
another.


3.10.2 My links aren't coming out right.  Why?

Known problems include

- if the "match text" matches part of the URL the program may get confused.
  Try to keep them different.

- if the "match text" of one link is a substring of another the program will
  get confused

- if a link is repeated on the same line only the first occurrence is
  converted (fixed post V3.0)

- if the "match text" spans two lines it won't be detected.

One tip is to place brackets round the [match text] in your source file...
this not only makes the chances of a false match less likely, but also makes
it clearer in the source files where the hyperlinks will be.


3.10.3 I can't enter links into the Link Dictionary.  What gives?

The Link Dictionary support in the Windows version of the software is a
little quirky.  Apologies for that.  The way it should work is that you click
on the "add new link definition" button.  I realize now that this is
counterintuitive, and will probably address this in the next release.

If you save your policy, each link appears as a line of the form

	Link definition: "match text" = "display text" + "URL"

e.g.

	Link definition: "jaf" = "John Fotheringham" + "http://www.jafsoft.com/"

The whole definition must fit on one line.  You may find it easier to open
your .pol file in a text editor and add these by hand.


3.11 Batch conversion

For more information see the section "Processing several files at once" in
the main documentation.

The software supports wildcards, and console versions are available to
registered users which are better suited for batch conversions.

In the shareware versions no more than 5 files may be converted at once.
This limit is absent in the registered version (see
"[[GOTO what's the most files I can convert at one go?]]").


3.11.1 How do I convert a few files at once?

If you only want a few files converted, then the simplest way is to drag and
drop those files onto the program.  You can either drag files onto the
program's icon on the desktop, or onto the program itself.

If you drag files onto the program's icon there is a limit with this approach
of around 10 files.  This limit arises because the filenames are concatenated
to make a command string, and this seems to have a Windows-imposed limit of
255 characters.  This problem may be solved in later versions.  The same
limit doesn't seem to apply when you drag files onto the open program.

Alternatively you can browse to select the files you want converting.


3.11.2 How do I convert _lots_ of files at once?

If you want to convert many files in the same directory, then just type a
wildcard like "*.txt" into the name of the files to be converted.

Registered users of the software can get a console version of the software.
This can accept wildcards on the command line, and is more suited for batch
conversion, e.g. from inside Windows batch files (for example it won't grab
focus when executed).

If you want to convert many files in different directories, either invoke the
console version multiple times using a different wildcard for each directory,
converting one directory at a time, or investigate the use of a steering
command file when running from the command line.  See the main documentation
for details.


3.11.3 What's the most files I can convert at one go?

The largest number of files converted at one time using the wildcard function
was reported to be around 2000.
A week later someone contacted me with around 3000 files to be converted.  A
few weeks after that someone was claiming 7000.  If you'd like to claim a
higher number, let me know.

Theoretically the only limit is your disk space.  The program operates on a
flat memory model so that the memory used is largely independent of the
number of files converted, or the size of the files being converted.  Such
conversions are a testament to the program's stability and efficient use of
system resources.

That said, if possible we recommend you break the conversion into smaller
runs to reduce your risks :-)


3.12 File splitting

3.12.1 Why isn't file splitting working for me?

The program can only split into files at headings it recognises (see
"[[GOTO how does the program recognize headings?]]").  You first need to
check that the program is correctly determining where the headings are, and
what type they are.  Headings can be numbered, capitalised or underlined.

To tell if the program is correctly detecting the headings

a) Look at the HTML to see if <H1>, <H2> etc. tags are being added to the
   correct text.

b) If the headings are wrong, check the analysis policies are being set
   correctly.  If necessary set them yourself under
   _Conversion Options -> Analysis policies -> headings_

Once the headings are being correctly diagnosed, you can switch on file
splitting using the policies under

	_Conversion Options -> output policies -> file generation_

Note that the "split level" is set to 1 to split at "chapter" headings, 2 to
split at "chapter and major section" headings etc.  Underlined headings tend
to start at level 2, depending on the underline character (see
"[[GOTO How do I control the header level of underlined headings?]]")

Hopefully this will give you some pointers, but if you still can't get it to
work, please mail a copy of the source file (and any policy file you're
using) to *infosupport.com* and I'll see what I can advise.


3.13 Miscellaneous questions

3.13.1 How do I suppress the Next/Previous navigation bar when splitting a large document?

Prior to version 4 there was a bug which meant the policy "Add navigation
bar" was being ignored when splitting files (the only time it was used).
This is now fixed.

However also available in version 4 is a new "HTML fragments" feature that
allows you to customize some of the HTML generated by the software.  This
includes the navigation bars so that, for example, if you wanted to suppress
just the top navigation bar, you could define the fragment NAVBAR_TOP to be
empty.

See "[[GOTO customizing the HTML created by the software]]" and the
"[Tag Man]" for more details.


3.13.2 Why am I getting regions of <PRE> text?

The software attempts to detect pre-formatted text in your files and, when
it finds some, attempts to turn these into tables.  In many cases having
detected some pre-formatted text it recognises that it cannot make a table
and so resorts to using <PRE>...</PRE> markup instead (in RTF it uses a
courier font), giving a "mal-formed table" error message.  These <PRE>
sections actually work quite well for some documents, but in other cases
they would be better not handled this way.

Happily the solution is simple.  On the menu go to

	_Conversion Options -> Analysis policies -> What to look for_

and disable "pre-formatted regions of text".
        
        
        3.13.3 Do you have an HTML-to-text converter, RTF-to-HTML converter etc?
        
        No.
        
        My converters convert from *plain ASCII text* into HTML or RTF.  Their
        "unique selling point" is that they intelligently work out the structure
        of the text file.
        
        However *other* people provide other converters.
        
        There are a number of html->text converters and, on top of that, Netscape
        has a good "save as text" feature.  Or you can import the HTML into
        Word and use Word's save-as-text features (although in my opinion these
        are inferior to Netscape's).
        
        If you visit my ZDNet listing at http://www.hotfiles.com/?000M96 and click
        on the "related links" you'll see a number of converters listed.
        
        There are at least two RTF-to-HTML converters called RTF2HTML and RTFtoHTML
        and of course Word for Windows offers this capability (it doesn't suit 
        everyone though).
        
        In fact, here are a few such products:-
        
        RTFtoHTML can be found at http://www.sunpack.com/RTF/ [[BR]]
        RTF2HTML can be found at http://www.xwebware.com/products/rtf2html/ [[BR]]
        RTF-2-HTML can be found at http://www.easybyte.com/rtf2html.com [[BR]]
        IRun RTF converter (free) can be found at http://www.pilotltd.com/irun/index.html [[BR]]
        Yet another Word converter can be found at http://www.yawcpro.com/
        
        
        4.0 Adding value to the HTML generated
        ======================================
        
        4.1 Adding Title and Description and Keyword META tags
        
        There are policies that allow Title, Description and keywords to be added to 
        your pages.  
        
        The title will default to "Converted from <filename>", but a number of 
        policies allow the title to be made to adopt the first section title, or 
        any text that you provide.
        
        Alternatively you can use preprocessor commands embedded in the source file
        as follows
        
        $_$_BEGIN_PRE
        	$_$_TITLE This is my lovely HTML page
        	$_$_DESCRIPTION This page was converted from text
        	$_$_DESCRIPTION and this description was added using preprocessor
        	$_$_DESCRIPTION commands
        	$_$_KEYWORDS Converted, from, text
        $_$_END_PRE
        
        This approach is in many ways simpler, as it avoids the need for policy 
        files, and keeps all your source in one file.
        
        
        4.2 Adding other META tags
        
        The program doesn't have a mechanism to explicitly add other META tags,
        however you can still achieve this by using the "script file" feature
        that allows text to be copied into the <HEAD> section of the document.
        
        You can also use the HEAD_SCRIPT "HTML fragment" in the same way.
        See "[[GOTO customizing the HTML created by the software]]".
        
        Originally intended as a way of adding JavaScript to a document, in fact
        you can place anything you like in such a section, including <META> tags.
        Indeed the "script" file need not contain any JavaScript at all, so in
        that respect it is mis-named.
        
        See "[[GOTO Adding JavaScript]]"
        
        
        4.3 Adding Headers and Footers
        
        The software will allow you to add headers and footers to each file generated.
        You can do this either through policies or by defining some "HTML fragments".
        The "HTML fragments" method is preferred.  If both policies and fragments
        are defined then the fragments will be used.
        
        You can define the "HTML fragments" HTML_HEADER and HTML_FOOTER (see 
        "[[GOTO customizing the HTML created by the software]]"). 
        
        Alternatively, the policies concerned are
        
        	HTML header file : c:\include\header.inc
        	HTML footer file : c:\include\footer.inc
        	
        The value is the name of the file to be used (you must supply a full or
        relative path so that the file may be located).
        
        Whether defined by file or as an "HTML fragment", these fragments will be
        copied into each HTML page generated, after the <BODY> tag and before
        the </BODY> tag respectively.
        
        If a large file is being split into many smaller HTML files these headers and
        footers will be copied into *every* HTML page generated.  This is different to using
        an $_$_INCLUDE statement, which only gets executed once.
        
        These files can be useful to add a standard title in the header, and links
        to other parts of the site (home, contacts etc) in the footer, or whatever.
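
        As an illustration (the file name and the HTML shown are just examples,
        not anything the program requires), a footer file such as
        c:\include\footer.inc might simply contain a few lines of HTML like

        $_$_BEGIN_PRE
              <HR>
              <!-- illustrative footer: adjust the links to suit your own site -->
              <A HREF="index.html">Home</A> | <A HREF="contacts.html">Contacts</A>
        $_$_END_PRE

        which would then be copied in just before the </BODY> tag of each page
        generated.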
        
        
        4.4 Adding Javascript
        
        There's a limit to how much you can usefully do with JavaScript in a page
        generated from text.  That said the program will allow you to embed
        JavaScript (or indeed anything else, such as META tags) into the
        <HEAD>...</HEAD> section of the document.
        This is the recommended location for including JavaScript as this ensures it 
        is all read before anything is drawn.
        
        You can specify the "script" code to be copied either by defining a
        HEAD_SCRIPT "HTML fragment" (see "[[GOTO customizing the HTML created by the software]]"), 
        or by using the policy 
        
        	HTML script file : ..\scripts\myscript.js
        	
        This should point to a file that contains all the scripting required.  The 
        program will simply copy this text into the header of each HTML page generated.
        
        Using the HEAD_SCRIPT fragment has the advantage that you can place the 
        necessary text into your source file, which avoids the need to manage
        individual policy files and script files.  This would be done as follows
        
        $_$_BEGIN_PRE
              $_$_DEFINE_HTML_FRAGMENT HEAD_SCRIPT
              ..
              <SCRIPT> tags and any JavaScript code
              ..
              $_$_END_BLOCK
        $_$_END_PRE
        
        See the "[Tag Man]" for more about using "HTML fragments".
        
        NOTE:	For the JavaScript to have an effect, you may need to embed 
              	further HTML into the body of the source text.  
        
        See "[[GOTO how do I add my own HTML to the file?]]".
        
        
        4.5 Adding colour/color
        
        A number of policies allow you to choose your document colours.  These can be
        found under the Windows menu
        
        	_Conversion Options -> Output policies -> Document colours_
        	
        and
        
        	_Conversion Options -> Output policies -> Tables_
        	
        All colours should be specified in HTML format, i.e. as 6-character hex values
        in the form rrggbb.  A few colours like "Red", "White" and "Black" may be
        entered by name.  Wherever possible the program will use the name so as to
        make the HTML more understandable.
        
        If you don't want *any* colours added to your HTML (not even the default white
        background) you can use the policy *Suppress all colour markup*.
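
        For example, an incremental policy file might contain colour settings
        along these lines.  The policy names below are illustrative guesses at
        the spelling - check them against the full list mentioned below before
        relying on them :-

        	background colour : ffffff
        	headings colour   : 0000dd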
        
        For a full list of colour policies, see the "[Pol man]".
        
        
        4.6 Adding images to the HTML
        
        See "[[GOTO how do I add my own HTML to the file?]]" which includes an
        example which is used to add an image to the HTML version of this document.
        
        
        4.7 Adding hyperlinks to keywords and phrases
        
        Use the "[[GOTO Link Dictionary]]".
        
        
        4.8 Splitting large documents into sections
        
        The program can only split into files at headings it recognizes.  So first 
        you need to check that the program is correctly determining where your 
        headings are, and what type they are.  See "[[GOTO how does the program recognize headings?]]"
        
        Once the headings are being correctly diagnosed, you can switch on file
        splitting using the policies under
        
              _Conversion Options -> output policies -> file generation_
        
        Note that the "split level" is set to 1 to split at "chapter" headings, 2 to 
        split at "chapter and major section" headings etc.
        
        Underlined headings tend to start at level 2, depending on the underline
        character (see "[[GOTO How do I control the header level of underlined headings?]]")).
        
        Hopefully this will give you some pointers, but if you still can't get it to
        work, please mail me a copy of the source file (and any policy file you're
        using) and I'll see what I can advise.
        
        
        4.9 Customizing the HTML created by the software
        
        From version 4 onwards AscToHTM will allow you to define "HTML fragments"
        that can be used in place of the standard HTML generated by the program in
        certain situations.
        
        These fragments can be placed in a separate file, which is pointed to by
        the policy "HTML fragments file", or can be included in the source file
        itself.  For example the fragment
        
        $_$_BEGIN_PRE
              $_$_DEFINE_HTML_FRAGMENT HTML_HEADER
              <CENTER>Head of each page</CENTER>
              <HR>
              $_$_END_BLOCK
        $_$_END_PRE

        defines a centred header and a horizontal ruler that will be placed at
        the top of each page.  This could include a navigational link to your
        home page, and would be useful when splitting a large document into
        smaller pages - you'd get the same header on each page.

        See the "HTML fragments" chapter in the "[Tag Man]" for more details.


        5.0 Diagnosing problems for yourself
        ====================================

        The program offers a number of diagnostic aids.  These can be awkward to
        use, but if you want to get a better idea of what's going on these can
        sometimes help.

        The various diagnostic options can be accessed via the menu option

        	_Conversion Options -> Output policies -> File generation_


        5.1 Generate a .lis file

        The program can be made to generate listing files.  A fragment is shown
        below.

        $_$_BEGIN_PRE
           56:  103 |1.2.4 Who is the author?
           57:    1 |
           58:  104 |1.2.4.1 John A Fotheringham
           59:    1 |
           60:      |That's me that is.  The program is wholly the responsibility
           61:      |Fotheringham, who maintains it in his spare time.
           62:    1 |
           63:    1 |
        $_$_END_PRE

        These show the source lines in truncated form.  Each line is numbered,
        and markers show how the line has been analysed.  In this case the line
        with "Who is the author?" has been allocated a line type of 103 ("header
        level 3") and is followed by a line of type 1 ("blank").  A complete
        list of line types and codes is included at the end of the file.

        Three files are generated; a ".lis1" file which is a listing from the
        Analysis pass, a ".lis" file which is a listing from the output pass and
        a ".stats" file which lists statistics collected during the analysis.
        Ignore this last file.

        The ".lis1" and ".lis" files have similar formats, but represent the
        file as analysed before and after the application of program policies.
        Thus more lines will be marked as headings in the ".lis1" file, but only
        those that "pass policy" - i.e. are in sequence and at the right
        indentation - will be marked as headings in the ".lis" file.

        Understanding these files is a black art, but a quick look can sometimes
        help you understand how the program has interpreted particular lines
        that have gone wrong, and give you a clue as to which policies may be
        used to correct this behaviour.


        5.2 Generate a .log file

        The program will display messages during conversion.  You can filter
        these messages (e.g. to suppress certain types) by using the menu option

        	_Settings -> Diagnostics_

        These messages can also be output to a .log file by using the options
        under

        	_Conversion Options -> Output policies -> File generation_

        This log file will contain *all* messages, including those suppressed by
        filtering.  In the Windows version you can also choose to save the
        messages displayed to file.

        Looking through the .log file can sometimes reveal problems that the
        program has detected and reported.


        5.3 Generate a .pol file

        The program operates in three passes.

        - The first pass analyses the file, and sets various policies
          automatically (assuming these haven't previously been loaded from a
          policy file).

        - The second pass calculates the output file structure.

        - The third pass actually generates the output files.

        You can use the options under _Conversion Options_ to review the
        policies that have been set.  Alternatively you can save these policies
        to file, using the menu option

        	_Conversion options -> Save policies to file_

        selecting the "save all policies" option.  Be careful not to overwrite
        any existing "incremental" file.

        This file will list all policies used, which you may review...
        particularly looking for any analysis policies that seem to have been
        incorrectly set.


        5.4 Understanding error messages

        In the fullness of time an [Error Manual] will be produced.

        (see 1.7)


        5.5 Diagnosing table problems

        See "[[GOTO how does the program detect and analyse tables?]]" and other
        topics in the "[[GOTO Tables]]" section of this document.


        6.0 Future directions
        =====================

        6.1 RTF generation

        The text analysis engine that lies at the core of AscToHTM is now
        available in a text-to-RTF converter.  This is called [AscToRTF], but we
        prefer the name "rags to Rich Text" :-)

        The initial release of this software was in March 2000.  For more
        details visit the [AscToRTF] home page.


        6.2 Multi-lingual user interface

        AscToHTM (and AscToRTF) support several languages in the user interface.
        These translations have been provided by volunteers, and so far only
        extend to parts of the user interface.  All the programs' documentation
        and support remain in English.

        The software also supports the use of "language skins", i.e. the loading
        of text files containing all the user interface text.  This will
        hopefully allow people to convert the program into more languages.  We'd
        welcome copies of skins developed, and will consider them for future
        distribution.  Please send them to *infojafsoft.com*

        For more details visit http://www.jafsoft.com/products/translations.html


        6.3 Improved standards support

        *Standards support is now a stated aim of the program*.  However, due to
        the complexities of generating standards-compliant HTML from arbitrarily
        structured text we don't feel we can *guarantee* standards-compliance.
        If you find the program generating faulty HTML, please report it to
        *infosupport.com*.

        If you want to validate your HTML, visit http://validator.w3.org/

        Please note, if you embed your own HTML into your source files, this may
        well upset the balance in terms of compliance.

        Note:	When the program detects that it has violated standards, error
        	messages will be displayed.  You should report such violations
        	to *infosupport.com*.


        6.4 Targeting particular HTML versions

        Internally the program is aware of the features and limitations of
        various versions of HTML as follows

        $_$_BEGIN_TABLE
        $_$_TABLE_MIN_COLUMN_SEPARATION 2
        [[TEXT HTML 3.2]]
        [[TEXT HTML 4.0]] Transitional
        [[TEXT HTML 4.0]] Strict   (not yet supported)
        $_$_END_TABLE

        For example certain HTML entities are only supported under newer
        versions of HTML.  Bearing in mind we're converting text files, there's
        a limit as to how advanced the HTML can be (for example I can't work out
        which text to animate :-)

        If you want to target a particular form of HTML, use the policy

        	HTML version to be targeted : [[TEXT HTML 4.0]] Transitional

        and the program will adjust to do the best it can.

        Note:	"[[TEXT HTML 4.0]] Strict" is not yet supported as the program
        	uses a number of "deprecated" tags that are still allowed in
        	"[[TEXT HTML 4.0]] Transitional".


        6.5 CSS and Font support

        Font support will be introduced shortly.

        Due to the program's history, the HTML currently being generated is more
        akin to [[TEXT HTML 3.2]].  Over time we plan to offer proper
        [[TEXT HTML 4.0]] and CSS support, although obviously this will be
        limited to what can be sensibly applied to converted text.


        $_$_DEFINE_HTML_FRAGMENT HTML_HEADER

        FAQ for JafSoft text conversion utilities

        You can download these files as a .zip file (~100k)


        $_$_END_BLOCK
        $_$_DEFINE_HTML_FRAGMENT HTML_FOOTER

        Valid HTML 4.0! Converted from a single text file by AscToHTM
        © 1997-2003 John A Fotheringham
        Converted by AscToHTM
        $_$_END_BLOCK