dtd2html

dtd2html is a Perl program that generates HTML documents describing an SGML document type definition (DTD) and allowing hypertext navigation of the DTD.



Overview

dtd2html generates various HTML files for hypertext navigation of an SGML DTD. The files generated are as follows:

DTD-HOME.html

This file is the home page of the HTML document. It contains the basic links for navigating through the DTD. The name of this file can be changed with the -homefile option. User text may be added to this page via the Description File.

TOP-ELEM.html

This file lists the top-most elements of the DTD, and contains the links to element pages describing each top-most element. The name of this file can be changed with the -topfile option.

ALL-ELEM.html

This file contains a list of all elements defined in the DTD. This page allows quick access to any individual element description page. The name of this file can be changed with the -allfile option.

ENTS.html (Optional)

This file contains a list of the general entities defined in the DTD. It is only generated if the -ents option is specified when the program is invoked. The name of this file can be changed with the -entsfile option.

DTD-TREE.html (Optional)

This file contains the content hierarchy tree(s) of the top-most element(s) in the DTD. It is only generated if the -tree option is specified when the program is invoked. The name of this file can be changed with the -treefile option.

element.html

For each element defined in the DTD, an element description file is generated with a filename of the element name suffixed by ".html". User text may be added to this page via the Description File.

element.attr.html

For each element defined in the DTD, a file is generated describing the attributes defined for the element. User text may be added to this page via the Description File.

element.cont.html

For each element defined in the DTD, a file is generated listing the element's content model declaration as declared in the DTD.

Once all the files are generated, one only needs to add a link to the DTD-HOME page on the Web server being used.

Note
If you have a Web client that can load local files, then linking the DTD-HOME page to the Web server is unnecessary.

More information on the content of each file is in the HTML File Descriptions section.


Usage

dtd2html is invoked from a command-line shell, with the following syntax:

% dtd2html [options] filename

filename is the SGML DTD to be parsed for generating the HTML files. The following is the list of options available:

-allfile filename

Set the name of the file listing all elements in the DTD to filename. The default name is "ALL-ELEM.html".

-catalog filename

Use filename as the file for mapping public identifiers and external entities to system files. If -catalog is not specified, "catalog" is used as the default filename. See Resolving External Entities for more information.

-contnosort

List the base content of the element.html page exactly as written in the content model declaration. Normally, the elements are listed in sorted order, without group delimiters, group connectors, or occurrence indicators.

-descfile filename

Use filename as the source for element descriptions in the DTD. If this argument is not specified, no description file is used. See Description File for more information.

-docurl URL

Use URL for location of documentation on dtd2html. The default URL is "http://www.oac.uci.edu/indiv/ehood/dtd2html.html".

-dtdname string

Set the name of the DTD to string. If not specified, dtd2html determines the name of the DTD by its filename with the extension stripped off. If reading from standard input, then this argument should be specified. Otherwise, "Unknown" is used. The string " DTD" will be appended to the name of the DTD. If the -qref option is specified, then the string " DTD Quick Reference" is appended to represent the title of the quick reference document.

-elemlist

Generate a blank description file to standard output. See Description File for more information.

-ents

Generate a general entities page. The general entity types listed are: replaceable character data, CDATA, SDATA, and PI (processing instruction). Note: For large DTDs, this list may be quite long and of little use.

-entsfile filename

Set the filename for the general entities page to filename. The default name is "ENTS.html".

-entslist

Generate a blank description file to standard output containing ONLY general entity entries. This differs from -elemlist in that -elemlist outputs ONLY entries for elements and attributes. See Description File for more information.

-help

Print out a terse description of all options available. No HTML files are generated and all other options are ignored when this option is specified.

-homefile filename

Set the filename for the HTML home page for the DTD to filename. The default name is "DTD-HOME.html".

-keepold

This option is only valid if -updateel is specified. It tells dtd2html to preserve any old descriptions when updating a description file.

-level #

Set the prune level of the content hierarchy tree to #. This option is only valid if -tree is specified.

-modelwidth #

Set the maximum output width for content model declarations to # for element.cont.html pages. Default value is 65.

-nodocurl

Do not insert a hyperlink to the dtd2html documentation in the DTD-HOME page.

-noreport

This option is only valid if -updateel is specified. It tells dtd2html not to output a report when updating a description file.

-outdir path

Set destination of generated HTML files to path. Defaults to the current working directory.

-qref

Output a quick reference document of the DTD to standard output (STDOUT). When this option is specified, only the quick reference document is generated; the tree page is not generated and the -outdir option is ignored. See Quick Reference Mode for more information on the -qref option.

-qrefdl

Output a quick reference document of the DTD using the <DL> (definition list) HTML tag. When this option is specified, only the quick reference document is generated; the tree page is not generated and the -outdir option is ignored. See Quick Reference Mode for more information. This option overrides the behavior of the -qref option.

-qrefhtag htag

Use htag as the header tag for the element names when the -qref option is specified. Defaults to '<H2>'.

-reportonly

This option is only valid if -updateel is specified. It tells dtd2html to generate only the report when updating a description file.

-topfile filename

Set the name of the file listing the top-most elements in the DTD to filename. The default name is "TOP-ELEM.html".

-tree

Generate the content hierarchy of the top-most elements defined in the DTD.

-treelink

Create links to the tree page in the HTML pages, even if -tree is not specified.

-treefile filename

Set the name of the file containing the content hierarchy tree(s) of the DTD to filename. The default name is "DTD-TREE.html". This option is only valid if -tree is specified.

-treeonly

Create only the tree page. This option implies -tree.

-treetop string

Set the top-most elements to string, a comma-separated list of the elements that dtd2html should treat as top-most when printing the content hierarchy tree(s) and/or listing elements in the TOP-ELEM page. Normally, dtd2html computes the top-most elements of the DTD; this option overrides that computation.

-updateel file

Perform an update of the description file specified by file. This option allows one to update a description file with entries for any new elements/attributes added to the DTD without affecting descriptions already defined. See Updating Description File for more information.

-verbose

Print status messages to standard error on what dtd2html is doing. This option generates much output, and is used mainly for debugging purposes.
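The naming rule under -dtdname can be summarized in a few lines. The following Python sketch is not dtd2html's Perl source; the function name dtd_title is hypothetical, and it assumes that "extension stripped off" means only the last extension is removed:

```python
from pathlib import Path
from typing import Optional


def dtd_title(filename: Optional[str] = None,
              dtdname: Optional[str] = None,
              qref: bool = False) -> str:
    """Build the document title following the -dtdname rules."""
    if dtdname is None:
        # No -dtdname: use the DTD's filename with its extension stripped,
        # or "Unknown" when the DTD is read from standard input.
        dtdname = Path(filename).stem if filename else "Unknown"
    suffix = " DTD Quick Reference" if qref else " DTD"
    return dtdname + suffix


print(dtd_title("sgm/html.dtd"))         # html DTD
print(dtd_title("html.dtd", qref=True))  # html DTD Quick Reference
print(dtd_title(dtdname="HTML 2.0"))     # HTML 2.0 DTD
```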


HTML File Descriptions

All HTML files/pages generated contain hypertext links at the end of the page to the DTD-HOME, TOP-ELEM, ALL-ELEM, ENTS (optional), and DTD-TREE (optional) pages, unless stated otherwise.

DTD-HOME

This page is the root of the HTML document. It contains the links to the other main pages as described above.

One can add documentation to the home page via the Description File or by manually editing the file.

TOP-ELEM

This page contains the list of all top-most elements defined in the DTD. A top-most element is an element that cannot be contained by any other element, or that can only be contained by itself.
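The definition above can be checked mechanically by inverting the "may contain" relation. A minimal Python sketch, assuming a hypothetical dict that maps each element to the set of elements its content model allows; inclusions and exclusions are ignored here, and dtd2html's actual computation may differ:

```python
from typing import Dict, List, Set


def top_most(model: Dict[str, Set[str]]) -> List[str]:
    """Return elements that no other element may contain (self-containment allowed)."""
    # Invert the "contains" relation: child -> set of elements that may contain it.
    parents: Dict[str, Set[str]] = {elem: set() for elem in model}
    for parent, children in model.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)
    # Top-most: no parents at all, or the only parent is the element itself.
    return sorted(elem for elem in model if parents[elem] <= {elem})


model = {
    "book": {"chapter", "title"},
    "chapter": {"section", "title"},
    "section": {"section", "para"},  # section nests, but chapter also contains it
    "note": {"note", "para"},        # note is only contained by itself
    "title": set(),
    "para": set(),
}
print(top_most(model))  # ['book', 'note']
```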

ALL-ELEM

This page contains an alphabetic list of all elements defined in the DTD.

ENTS

This page contains an alphabetic list of the general entities defined in the DTD. The general entity types listed are: replaceable character data, CDATA, SDATA, and PI (processing instruction). Note: For large DTDs, this list may be quite long and of little use. Also, entities are not handled when updating a description file.

DTD-TREE

This page contains the content hierarchy tree(s) of the top-most elements of the DTD. The maximum depth of the tree can be set via the -level command-line option.

The tree shows the overall content hierarchy for an element, including the content hierarchies of its descendants. An element is pruned if it already occurs at a higher (or equal) level, or if the maximum depth has been reached. The string "..." is appended to an element that has been pruned because it already occurs at a higher (or equal) level; its content can be found by locating the complete tree of the element (i.e., an occurrence of the element without "..."). Elements pruned because the maximum depth has been reached do not have "..." appended.

Example:

     |__section+)
         |_(effect?, ...
         |__title, ...
         |__toc?, ...
         |__epc-fig*,
         |   |_(effect?, ...
         |   |__figure,
         |   |   |_(effect?, ...
         |   |   |__title, ...
         |   |   |__graphic+, ...
         |   |   |__assoc-text?)
Note

Pruning is necessary to avoid a combinatorial explosion. It is common for DTDs to define content hierarchies of infinite depth, and even with a maximum depth set, the generated tree can become very large.
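One way to get the pruning behavior described above is to find the shallowest level of each element with a breadth-first search, then expand an element only at its first occurrence at that level. This Python sketch is an illustration under that assumption, not dtd2html's algorithm; the model dict and function names are hypothetical, and the output uses plain indentation instead of dtd2html's |__ connectors and occurrence indicators:

```python
from collections import deque
from typing import Dict, List, Set


def min_levels(model: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Breadth-first search: the shallowest level at which each element occurs."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        elem = queue.popleft()
        for child in model.get(elem, []):
            if child not in level:
                level[child] = level[elem] + 1
                queue.append(child)
    return level


def content_tree(model: Dict[str, List[str]], root: str, max_level: int) -> List[str]:
    level = min_levels(model, root)
    expanded: Set[str] = set()
    lines: List[str] = []

    def walk(elem: str, depth: int) -> None:
        indent = "  " * depth
        children = model.get(elem, [])
        if not children or depth == max_level:
            lines.append(indent + elem)           # leaf, or depth limit: no "..."
        elif elem in expanded or depth > level[elem]:
            lines.append(indent + elem + " ...")  # shown at a higher or equal level
        else:
            expanded.add(elem)
            lines.append(indent + elem)
            for child in children:
                walk(child, depth + 1)

    walk(root, 0)
    return lines


model = {
    "doc": ["section", "title"],
    "section": ["title", "para"],
    "title": ["emph"],
    "para": ["emph"],
}
print("\n".join(content_tree(model, "doc", max_level=5)))
```

Here title is expanded directly under doc (level 1), so its deeper occurrence under section is printed as "title ..."; with max_level=1, section and title are cut off at the depth limit and get no "...".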

Since the tree output is static, the inclusion and exclusion sets of elements are treated specially. Inclusion and exclusion elements inherited from ancestors are not propagated down to determine which elements are printed; instead, special markup is shown at an element when inclusion or exclusion elements from its ancestors apply. Inclusions and exclusions are not propagated because of the pruning: an element may occur in multiple contexts, each with different ancestral inclusions and exclusions in effect, and the one occurrence without "..." may be the only place where the element's content hierarchy is shown.

Example:

    D1
     |  {+} idx needbegin needend newline
     | 
     |_(head,
     |   | {A+} idx needbegin needend newline
     |   |  {-} needbegin needend
     |   | 
     |   |_(((#PCDATA |
     |   |____((acro |
     |   |       | {A+} idx needbegin needend newline
     |   |       | {A-} needbegin needend
     |   |       | 
     |   |       |_(((#PCDATA |
     |   |       |____((super | ...
     |   |       |______sub)))*)) ...

Ignoring the lines starting with {}'s, one gets the content hierarchy of an element as defined by the DTD, without regard to where it may occur in the overall structure. The {} lines give additional information about the element within a specific context. For example, when an ACRO element occurs within D1,HEAD, it can contain IDX and NEWLINE elements, in addition to its normal content, due to inclusions from ancestors. However, it cannot contain NEEDBEGIN and NEEDEND, regardless of its defined content, since one or more ancestors exclude them.

Note
Exclusions override inclusions. If an element occurs in an inclusion set and an exclusion set, the exclusion takes precedence. Therefore, in the above example, NEEDBEGIN, NEEDEND are excluded from ACRO.
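The precedence rule amounts to applying exclusions after inclusions. A small Python sketch that treats the content as plain sets (it ignores SGML's finer rules on how exceptions interact with required content); the base content of ACRO is read off the example tree above, and the function name is hypothetical:

```python
from typing import Iterable, Set


def allowed_in_context(base: Iterable[str],
                       inclusions: Iterable[str],
                       exclusions: Iterable[str]) -> Set[str]:
    """Elements an element may contain in a given context.

    Inclusions ({+} and {A+}) are added to the base content; exclusions
    ({-} and {A-}) are removed last, so they take precedence.
    """
    return (set(base) | set(inclusions)) - set(exclusions)


# ACRO inside D1,HEAD, using the sets from the example tree.
acro = allowed_in_context(
    base=["#PCDATA", "super", "sub"],
    inclusions=["idx", "needbegin", "needend", "newline"],
    exclusions=["needbegin", "needend"],
)
print(sorted(acro))  # ['#PCDATA', 'idx', 'newline', 'sub', 'super']
```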

Explanation of {}'s keys:

{+}
The list of inclusion elements defined by the current element. Since this is part of the content model of the element, the inclusion subelements are printed as part of the content hierarchy of the current element after the base content model. Subelements that are inclusions will have {+} appended to the subelement entry.
{A+}
The list of inclusion elements due to ancestors. This is listed as a reference for determining the content of an element within a given context. None of the ancestral inclusion elements are printed as part of the content hierarchy of the element.
{-}
The list of exclusion elements defined by the current element. Since this is part of the content model of the element, any subelement in the content model that would be excluded will have {-} appended to the subelement listing.
{A-}
The list of exclusion elements due to ancestors. This is listed as a reference for determining the content of an element within a given context. None of the ancestral exclusion elements have any effect on the printing of the content hierarchy of the current element.

element

The element page describes the content of element. The element page is divided into the following sections:

element.attr

The element.attr page describes the attributes of element. The element.attr page is divided into the following sections:

This page is not created if no attributes are defined for element.

element.cont

The element.cont page gives the element's content model declaration as defined in the DTD. The element.cont page is divided into the following sections:

The content models are reformatted for better readability. The maximum width used when reformatting is set by the -modelwidth option. Each element listed in the content model is a hyperlink to that element's page.

Here's an example of how dtd2html formats content model declarations:

    (((#PCDATA|
       ((acro|book|emph|location|not|parm|term|var))|
       ((super|sub))|
       ((link|xref))|
       ((computer|cursor|display|keycap|softkey|user))|
       ((footnote|ineqn|ingraphic|fillin))|
       ((nobreak)))*))

This page is not created if element is defined with empty content.
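By default (without -contnosort), the base content list on an element page drops group delimiters, connectors, and occurrence indicators and sorts the names. A rough Python approximation of that extraction, applied to the content model shown above; it assumes the input holds only the base model (no inclusion or exclusion sets) and that keywords such as #PCDATA are not listed, and the function name is hypothetical:

```python
import re
from typing import List

# An SGML name starts with a letter; keywords such as #PCDATA start with "#".
NAME_RE = re.compile(r"#?[A-Za-z][A-Za-z0-9.-]*")


def base_content_list(model: str) -> List[str]:
    """Sorted, de-duplicated element names from a content model declaration."""
    names = {tok for tok in NAME_RE.findall(model) if not tok.startswith("#")}
    return sorted(names)


model = """(((#PCDATA|
   ((acro|book|emph|location|not|parm|term|var))|
   ((super|sub))|
   ((link|xref))|
   ((computer|cursor|display|keycap|softkey|user))|
   ((footnote|ineqn|ingraphic|fillin))|
   ((nobreak)))*))"""
print(base_content_list(model))
```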


Description File

dtd2html supports the ability to add documentation to the HTML files generated from a DTD through the -descfile option. Documentation can be added to the element pages, the attribute pages, and/or the ENTS page.

Basic Syntax

The basic syntax of the description file is as follows:

    <?DTD2HTML identifier>
    <P>
    Description of identifier here.
    </P>
    <?DTD2HTML identifier>
    <P>
    Description of identifier here.
    </P>
    ...

The line <?DTD2HTML identifier> signifies the beginning of a description entry for identifier. All text up to the next <?DTD2HTML ...> instruction or end-of-file is used as the identifier description.
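The entry rule above (an entry runs until the next <?DTD2HTML ...> instruction or end of file) can be sketched in Python with a regular expression. This is an illustration, not dtd2html's parser: it ignores special instructions and comments, and the regex is an assumption about what an identifier may contain:

```python
import re
from typing import Dict

# <?DTD2HTML identifier> opens an entry; special instructions (#...) are not matched.
ENTRY_RE = re.compile(r"<\?DTD2HTML\s+([^#>\s][^>\s]*)\s*>")


def parse_descriptions(text: str) -> Dict[str, str]:
    """Map each identifier to the text up to the next entry or end of input."""
    matches = list(ENTRY_RE.finditer(text))
    entries: Dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries[m.group(1)] = text[m.end():end].strip()
    return entries


desc = "<?DTD2HTML a >\n<P>\nAnchor.\n</P>\n<?DTD2HTML a*href>\n<P>URI.</P>\n"
print(parse_descriptions(desc))
```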

The identifier can be one of the following formats:

element

An element name in the DTD. The following description text will go at the top of the element's page.

element*

An element in the DTD followed by a `*'. The following description text will go at the top of the element's attribute page.

element*attribute

An element in the DTD followed by a `*' which is followed by an attribute name of the element. The following description text will go below the attribute heading of the element's attribute page.

element+

An element in the DTD followed by a `+'. The following description text goes after each listing of the element in ALL-ELEM and in element pages. Because the description text appears inside an <LI> element, it is best to keep it to a single sentence.

*attribute

A `*' followed by an attribute name. The following description text applies to every attribute named attribute, unless a specific description is given to the attribute via an element*attribute identifier. This identifier allows descriptions of commonly shared attributes to be written in one place.

entity&

A general entity name followed by a `&'. The following description text goes after the entity's listing in the ENTS page. Because the description text appears inside an <LI> element, it is best to keep it to a single sentence.

identifier,identifier,...

A sequence of identifiers separated by commas (`,'). This allows a description to be shared among multiple identifiers. Note: there should be NO whitespace between the identifiers and the commas.

If the special element, -HOME-, is specified in the description file, then its description text will be put on the DTD-HOME page.
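The identifier formats can be told apart by their `*', `+', and `&' markers. The following Python sketch classifies an identifier, shares one description across a comma-separated list, and applies the element*attribute-before-*attribute precedence described above; the kind labels and function names are hypothetical:

```python
from typing import Dict, Optional, Tuple


def classify(identifier: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (kind, name, attribute); the kind labels are illustrative only."""
    if identifier == "-HOME-":
        return ("home", None, None)
    if identifier.endswith("&"):
        return ("entity", identifier[:-1], None)
    if identifier.endswith("+"):
        return ("element-list-item", identifier[:-1], None)
    if identifier.startswith("*"):
        return ("any-attribute", None, identifier[1:])
    if identifier.endswith("*"):
        return ("attribute-page", identifier[:-1], None)
    if "*" in identifier:
        element, attribute = identifier.split("*", 1)
        return ("attribute", element, attribute)
    return ("element", identifier, None)


def add_entry(entries: Dict[str, str], identifier: str, text: str) -> None:
    """Share one description among comma-separated identifiers (no spaces allowed)."""
    for ident in identifier.split(","):
        entries[ident] = text


def attribute_description(entries: Dict[str, str], element: str,
                          attribute: str) -> Optional[str]:
    """An element*attribute entry takes precedence over a shared *attribute entry."""
    return entries.get(f"{element}*{attribute}", entries.get(f"*{attribute}"))
```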

Special Instructions

dtd2html provides special instructions that may be used in a description file to control how dtd2html processes the file.

Special instructions follow a similar syntax to descriptive instructions:

    <?DTD2HTML #instruction argument>

The following special instructions are defined:

#include argument

The include directive tells dtd2html to treat the argument as a filename to read that contains description entries. Example:

    <?DTD2HTML #include ents.dsc>

The example instructs dtd2html to open a file called ents.dsc and read it for description entries.
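A #include can be handled by splicing the included file's text in place of the instruction before the entries are parsed. A minimal Python sketch; resolving a relative name against the including file's directory and rejecting include loops are assumptions, not documented dtd2html behavior:

```python
import re
from pathlib import Path
from typing import Tuple

INCLUDE_RE = re.compile(r"<\?DTD2HTML\s+#include\s+([^>\s]+)\s*>")


def expand_includes(path, stack: Tuple[Path, ...] = ()) -> str:
    """Return the file's text with each #include instruction replaced by the
    (recursively expanded) text of the named file."""
    path = Path(path)
    key = path.resolve()
    if key in stack:
        raise ValueError(f"#include loop involving {path}")
    text = path.read_text()

    def splice(match) -> str:
        # Assumption: relative names are resolved against the including file's directory.
        return expand_includes(path.parent / match.group(1), stack + (key,))

    return INCLUDE_RE.sub(splice, text)
```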

Comments

SGML comments are also supported in the description file. Comments are skipped by dtd2html. The syntax for a comment is the following:

    <!-- This is a comment -->
WARNING

dtd2html can only handle a comment that fits on a single line (to keep parsing simple). Therefore, given the following, dtd2html adds the comment text after the first line of the comment to an identifier's description:

    <!-- This is a comment
         that spans more than one line.
      -->

If you want to put line breaks in the description file without them being added to an identifier's description, use the SGML short comment: <!>.
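The single-line limitation can be reproduced with a line-based filter that drops any line opening a comment, as well as short comments. This Python sketch is an assumption about how such a filter might work, not dtd2html's code; it shows why the second and later lines of a multi-line comment end up in a description:

```python
from typing import List


def strip_comment_lines(text: str) -> str:
    """Drop lines that open a comment ("<!--") or are the short comment "<!>".

    Only single-line comments are understood: for a comment spanning
    several lines, only the first line is dropped.
    """
    kept: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("<!--") or stripped == "<!>":
            continue
        kept.append(line)
    return "\n".join(kept)
```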

Example

    <!-- Include external descriptions -->
    <!>
    <?DTD2HTML #include ents.dsc>
    <!>
    <!-- A short description -->
    <!>
    <?DTD2HTML a+ >
    Anchor; source and/or destination of a link
    <!>
    <!-- A shared description -->
    <!>
    <?DTD2HTML h1,h2,h3,h4,h5,h6 >
    <p>
    The six heading elements,
    &lt;H1&gt; through &lt;H6&gt;, denote section headings.
    Although the order and occurrence of headings is not constrained by
    the HTML DTD, documents should not skip levels (for example, from H1
    to H3), as converting such documents to other representations is
    often problematic.
    </p>
    <!>
    <!-- Element and attribute descriptions -->
    <!>
    <?DTD2HTML a >
    <p>
    The &lt;A&gt; element indicates a hyperlink anchor.
    At least one of the NAME and HREF attributes should be present.
    </p>
    <?DTD2HTML a* >
    <?DTD2HTML a*href >
    <p>
    Gives the URI of the head anchor of a hyperlink.
    </p>
    <?DTD2HTML a*methods >
    <p>
    Specifies methods to be used in accessing the
    destination, as a whitespace-separated list of names.
    The set of applicable names is a function of the scheme
    of the URI in the HREF attribute. For similar reasons as
    for the <a href="title.html">TITLE</a>
    attribute, it may be useful to include the
    information in advance in the link. For example, the
    HTML user agent may choose a different rendering as a
    function of the methods allowed; for example, something
    that is searchable may get a different icon.
    </p>

Description File Notes


Quick Reference Mode

dtd2html supports the ability to generate a quick reference document of a DTD with the -qref option. The document generated is sent to standard output (STDOUT). Therefore, one should redirect STDOUT to a file. Example:

    % dtd2html -qref html.dtd > htmlqref.html

No other output/files are generated while in quick reference mode.

The format of the quick reference document is as follows:

Definition List, <DL>, Format

An alternative format for the quick reference document may be generated with the -qrefdl command-line option. The format of the document shares the same properties as those of the -qref option, with the following exceptions:

Each element is still wrapped in an <A NAME> anchor to allow cross-referencing.

Quick Reference Notes

Quick Reference Tips


Updating Description File

As a DTD changes, one can automatically update its description file to reflect the changes via the -updateel command-line option. The updated description file is sent to standard output (STDOUT); therefore, one should redirect STDOUT to a file. Example:

    % dtd2html -updateel html.desc html.dtd > html-new.desc

When updating a description file, a report is prepended to the new description file. The report is contained in SGML comment declaration statements. Here's an example of what the report looks like:

    <!-- Element Description File Update                                      -->
    <!-- Source File:  sgm/html.desc                                          -->
    <!-- Source DTD:  sgm/html.2.0/html.dtd                                   -->
    <!-- Deleting Old?  Yes                                                   -->
    <!-- Date:  Mon Jun 27 00:25:41 EDT 1994                                  -->
    <!-- New identifiers:                                                     -->
    <!--    br, dl*, dl*compact, form, form*, form*action, form*enctype,      -->
    <!--    form*method, img*ismap, input, input*, input*align,               -->
    <!--    input*checked, input*maxlength, input*name, input*size,           -->
    <!--    input*src, input*type, input*value, option, option*,              -->
    <!--    option*selected, option*value, select, select*,                   -->
    <!--    select*multiple, select*name, select*size, strike, textarea,      -->
    <!--    textarea*, textarea*cols, textarea*name, textarea*rows            -->
    <!-- Old identifiers:                                                     -->
    <!--    dir*, dir*compact, key, link*name, menu*, menu*compact, ol*,      -->
    <!--    ol*compact, u, ul*, ul*compact                                    -->
    <!--                                                                      -->
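The New identifiers and Old identifiers lists in the report can be read as two set differences between the identifiers the DTD calls for and those present in the description file. A Python sketch under that reading; it assumes an element* identifier exists only for elements that have attributes (consistent with br and dl* in the report above), skips element+ identifiers, and, like -updateel, ignores entities. All names are hypothetical:

```python
from typing import Dict, Iterable, List, Set, Tuple


def dtd_identifiers(attributes: Dict[str, Iterable[str]]) -> Set[str]:
    """Identifiers a description file could hold for each element and its attributes."""
    ids: Set[str] = set()
    for element, attrs in attributes.items():
        ids.add(element)
        attrs = list(attrs)
        if attrs:
            ids.add(f"{element}*")
            ids.update(f"{element}*{a}" for a in attrs)
    return ids


def update_report(dtd_ids: Set[str], desc_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """(new, old): identifiers missing from the description file, and stale ones."""
    return sorted(dtd_ids - desc_ids), sorted(desc_ids - dtd_ids)


dtd = {"br": [], "dl": ["compact"], "ul": []}
desc = {"dl", "ul", "ul*", "ul*compact", "u"}
new, old = update_report(dtd_identifiers(dtd), desc)
print(new)  # ['br', 'dl*', 'dl*compact']
print(old)  # ['u', 'ul*', 'ul*compact']
```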

Updating Notes


Resolving External Entities

The mapping of external entities to system files may be defined via the -catalog command-line option. The catalog lets you map public identifiers, or entity names, to system identifiers (files).

Catalog Syntax

The syntax of a catalog is a subset of SGML catalogs (as defined in SGML Open Draft Technical Resolution 9401:1994).

A catalog contains a sequence of the following types of entries:

PUBLIC public_id system_id

This maps public_id to system_id.

ENTITY name system_id

This maps a general entity whose name is name to system_id.

ENTITY %name system_id

This maps a parameter entity whose name is name to system_id.

Syntax Notes

Example catalog file:

            -- ISO public identifiers --
    PUBLIC "ISO 8879-1986//ENTITIES General Technical//EN"            iso-tech.ent
    PUBLIC "ISO 8879-1986//ENTITIES Publishing//EN"                   iso-pub.ent
    PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN"  iso-num.ent
    PUBLIC "ISO 8879-1986//ENTITIES Greek Letters//EN"                iso-grk1.ent
    PUBLIC "ISO 8879-1986//ENTITIES Diacritical Marks//EN"            iso-dia.ent
    PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN"                iso-lat1.ent
    PUBLIC "ISO 8879-1986//ENTITIES Greek Symbols//EN"                iso-grk3.ent
    PUBLIC "ISO 8879-1986//ENTITIES Added Latin 2//EN"                ISOlat2
    PUBLIC "ISO 8879-1986//ENTITIES Added Math Symbols: Ordinary//EN" ISOamso

            -- HTML public identifiers and entities --
    PUBLIC "-//IETF//DTD HTML//EN"                                    html.dtd
    PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML"          ISOlat1.ent
    ENTITY "%html-0"                                                  html-0.dtd
    ENTITY "%html-1"                                                  html-1.dtd

Environment Variables

The following envariables (i.e., environment variables) are supported:

P_SGML_PATH

This is a colon (semi-colon for MSDOS users) separated list of paths for finding catalog files or system identifiers. For example, if a system identifier is not an absolute pathname, then the paths listed in P_SGML_PATH are used to find the file.

SGML_CATALOG_FILES

This envariable is a colon (semi-colon for MSDOS users) separated list of catalog files to read. If a file in the list is not an absolute path, then the file is searched for in the paths listed in P_SGML_PATH and SGML_SEARCH_PATH.

SGML_SEARCH_PATH

This is a colon (semi-colon for MSDOS users) separated list of paths for finding catalog files or system identifiers. This envariable serves the same function as P_SGML_PATH. If both are defined, paths listed in P_SGML_PATH are searched first before any paths in SGML_SEARCH_PATH.

The use of P_SGML_PATH is for compatibility with earlier versions. SGML_CATALOG_FILES and SGML_SEARCH_PATH are supported for compatibility with James Clark's nsgmls(1).

Note
When searching for a file via the P_SGML_PATH and/or SGML_SEARCH_PATH, if the file is not found in any of the paths, then the current working directory is searched.
Note

The file specified by -catalog is read first before any files specified by SGML_CATALOG_FILES.
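The search order described in this section can be sketched in Python as follows. The function names are hypothetical, and whether the -catalog file is itself looked up through the search paths is an assumption:

```python
import os
from typing import List, Mapping, Optional


def search_dirs(env: Mapping[str, str]) -> List[str]:
    """P_SGML_PATH entries first, then SGML_SEARCH_PATH entries."""
    dirs: List[str] = []
    for var in ("P_SGML_PATH", "SGML_SEARCH_PATH"):
        # os.pathsep is ":" on Unix-like systems and ";" on Windows/MS-DOS.
        dirs += [d for d in env.get(var, "").split(os.pathsep) if d]
    return dirs


def find_file(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Locate a system identifier or catalog file; the current directory is tried last."""
    if os.path.isabs(name):
        return name if os.path.exists(name) else None
    for d in search_dirs(env) + [os.getcwd()]:
        candidate = os.path.join(d, name)
        if os.path.exists(candidate):
            return candidate
    return None


def catalog_files(cli_catalog: str = "catalog",
                  env: Mapping[str, str] = os.environ) -> List[str]:
    """Catalogs in read order: the -catalog file, then SGML_CATALOG_FILES."""
    names = [cli_catalog] + [f for f in env.get("SGML_CATALOG_FILES", "").split(os.pathsep) if f]
    found = [find_file(n, env) for n in names]
    return [f for f in found if f is not None]
```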


Availability

This program is part of the perlSGML package; see <URL:http://www.oac.uci.edu/indiv/ehood/perlSGML.html>


Author

Earl Hood <ehood@medusa.acs.uci.edu>