Introduction


In this topic:

What are Standard Format Markers?

History of USFM

Unification Notes

Software Notes

Markup Additions/Extensions

Paratext Stylesheets

See also: Release Notes | Syntax Notes

What are Standard Format Markers?

In general terms, a markup language is a special notation for identifying the components and structure of an electronic document. It combines extra information about the text with the text itself, and that extra information is what the markup expresses. Markup can also include information about the intended presentation of the text, or instructions for how a software process should handle the text. In a good markup system, the markup is easy to distinguish from the text itself.

Standard Format Markers (SFMs) have been used for many years within the Bible translation community as a method for identifying the unique textual elements that exist within an electronic scripture document. SFMs start with a backslash character "\" and end with the next space. Over time, many different local "standards" for SFM use were developed and adapted to support the varied requirements of Bible translation and publishing projects around the globe.
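
As a rough illustration of the marker syntax just described, the Python sketch below splits SFM-encoded text into (marker, content) pairs. It is a simplified sketch, not a conforming USFM parser. The function name split_markers is hypothetical, and it assumes that marker names consist of ASCII letters and digits, optionally followed by "*" for a closing marker (for example \nd*). The space that ends an opening marker is treated as part of the marker; any text before the first marker is discarded.

```python
import re

# Assumption: a marker is a backslash followed by ASCII letters/digits,
# optionally ending in "*" (the closing form of a character marker).
MARKER_RE = re.compile(r"\\([A-Za-z0-9]+\*?)")


def split_markers(text: str) -> list[tuple[str, str]]:
    """Split SFM text into (marker name, following content) pairs.

    Text appearing before the first marker is ignored in this sketch.
    """
    pairs = []
    matches = list(MARKER_RE.finditer(text))
    for i, match in enumerate(matches):
        marker = match.group(1)
        start = match.end()
        # An opening marker ends with the next space, which is not content.
        # A closing marker ("...*") is not followed by a delimiting space.
        if not marker.endswith("*") and text[start:start + 1].isspace():
            start += 1
        # Content runs up to the next marker or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pairs.append((marker, text[start:end]))
    return pairs
```

For example, split_markers(r"\v 3 The \nd Lord\nd* said") returns [("v", "3 The "), ("nd", "Lord"), ("nd*", " said")].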

History of USFM

The divergent use of SFMs led to a variety of problems, most notably the difficulty of sharing text or related text processing tools among entities, departments, or partner organizations. Separately maintaining duplicate tools and procedures, which were required for managing the flow of the text through its life cycle, became costly and very difficult to support.

In March 2002, a working group was established within the United Bible Societies with the mandate of "crafting a unified specification for SFM use across 4 UBS areas". Having one SFM standard would provide numerous benefits:

· Allow more thought and effort to be put into developing just one set of tools and utilities to be shared by all projects:
  o Tools for text checking and analysis.
  o Tools for developing supporting textual resources such as concordances and indexes.
  o Tools for streamlining the publishing process.
· Eliminate or minimize duplication of effort in providing these tools.
· Allow better sharing of both tools and data.
· Allow Paratext 6 users to use one tested and proven stylesheet.
· Prepare projects for a smoother transition to other markup formats or future technologies.

Ideally, one goal of an SFM standard would be to mark common scriptural element types rather than formatting (presentation) information. USFM has attempted to "unify" a long history of SFM-type scripture markup "standards", which varied in how strictly they tolerated format-oriented markers. The primary focus in USFM development was on unification, not on creating new markup. As a result, USFM inherits both the positive (and some negative) aspects of pre-existing SFM marker use. The USFM working group did not wish to create an unmanageable conversion task for legacy SFM-encoded texts.

Unification Notes

· Markers used within a broader text "environment" were named using a reserved initial letter, rather than being delimited by an opening and closing tag:
  o \i - Introductions
  o \f - Footnotes
  o \x - Cross references
  o \e - Explanatory (study) material, for extended notes, sidebars, and bridge materials
· Related marker types were often consolidated using "numbered" marker definitions.
  o Example: \mt#, \ms#, \s#, \li#, etc., where # represents a number that can indicate a level or relative weighting.
· Marker definition "collisions" (the same marker used to mark different content) were resolved.
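
The numbered-marker convention can be illustrated with a small Python sketch that splits a marker name such as mt2 into its base name and level. The function name marker_level and the NUMBERED_FAMILIES set are hypothetical. The set holds only the families listed above, and because that list ends in "etc.", a real tool would need the full marker inventory. Treating an unnumbered marker (for example \mt) as level 1 is also an assumption of this sketch.

```python
import re
from typing import Optional, Tuple

# Hypothetical and deliberately incomplete: only the families named in the text.
NUMBERED_FAMILIES = {"mt", "ms", "s", "li"}

# A base name (letters) followed by an optional level number (digits).
NUMBERED_RE = re.compile(r"([A-Za-z]+)([0-9]*)")


def marker_level(marker: str) -> Optional[Tuple[str, int]]:
    """Return (base, level) for a numbered marker name, or None otherwise.

    The marker is given without its leading backslash, e.g. "mt2".
    Assumption: an unnumbered form such as "mt" is treated as level 1.
    """
    match = NUMBERED_RE.fullmatch(marker)
    if match is None or match.group(1) not in NUMBERED_FAMILIES:
        return None
    base, digits = match.groups()
    return base, (int(digits) if digits else 1)
```

Restricting the lookup to a known set of families avoids reporting a "level" for markers such as \v that are not part of a numbered family.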

Software Notes

· Translation editors that support USFM-encoded text may provide a formatted view of the text using a style definition for each USFM marker. These "stylesheets" most often group the formatting definitions into paragraph styles and character styles.
· In USFM, character-level markup can be nested (embedded) within a paragraph element or within another character element. Depending on how the markers are written, a nested marker does not necessarily cancel out the attributes of the enclosing marker. Paratext (a UBS translation editor) cannot render all of the display variations that marker nesting may imply.

Markup Additions/Extensions

Over the course of its development, it has become clear that the USFM standard is unlikely to include markup for every potential (real or perceived) need that a project may have. There are a number of reasons for this, including:

· The intention of keeping the USFM marker inventory manageable from a typical end user's perspective.
Some may argue that the more than 180 existing markers are already challenging enough to choose from and use correctly. For this reason, a "draft" stylesheet was created for Paratext editing which lists only the markers essential for editing a typical translation's first draft.

· The intention to encourage content-oriented markup.
Although USFM contains format-oriented (presentation) markers (see History of USFM), their use is generally discouraged wherever a suitable alternative is available.

· The intention to maintain a stable target for tool developers.
The USFM marker inventory cannot continue to evolve indefinitely without reducing the benefit it offers to developers working on text checking, analysis, conversion, and publishing tools. A stable target is needed for these tools to work against.

Since USFM 1.0, requests for additions to the standard have been received and considered by the USFM committee (see Release Notes for some details). The committee has chosen to specifically exclude new marker submissions that specify formatting from becoming part of USFM. In other words, the USFM standard is open to considering new markup needs, but closed to new format-oriented markup. At times this position has frustrated users who are otherwise quite satisfied with USFM but would benefit greatly from adding markup to their text that is not part of the standard.

Option: The \z namespace

To address the occasional need for local markup additions/extensions, the USFM committee has agreed to document a recommended practice.

USFM officially recommends that any additional user-generated markup begin with \z (e.g. \zMyMarker). Markers in this namespace are not considered part of the USFM standard and will not generally be supported in USFM-aware applications; the \z namespace serves as a kind of "private use area". It is the responsibility of the user or tool builder to support specific \z markup in ways that meet a local need. Other USFM processing tools cannot be expected to handle \z markup or its associated text, and are free to ignore them when they are encountered in the text.

The USFM committee felt that this was a reasonable approach because applications such as Paratext and Publishing Assistant already provide a mechanism for adding user-generated markers to project stylesheets, so that checking and formatting tasks function properly.
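
A tool that chooses to ignore \z markup, as permitted above, might filter it out before further processing. The Python sketch below shows one possible approach. It assumes that private-use markers are paragraph-level and start their own line; the name drop_private_lines is hypothetical, and inline \z character markup elsewhere in a line is deliberately left untouched.

```python
import re

# Matches a marker in the \z namespace at the start of a string,
# e.g. \zMyMarker or its closing form \zMyMarker*.
# The allowed name characters are an assumption of this sketch.
Z_MARKER_RE = re.compile(r"\\z[A-Za-z0-9]*\*?")


def drop_private_lines(lines: list[str]) -> list[str]:
    # Keep every line whose first marker is not in the \z namespace.
    # Assumption: \z markers handled here are paragraph-level, one per line.
    return [line for line in lines if not Z_MARKER_RE.match(line.lstrip())]
```

Dropping such lines is only one acceptable choice; a checking tool might instead report them to the user rather than discard them.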

Paratext Stylesheets

The most recent full and draft USFM stylesheet files for use with the Paratext translation editor are always available from http://ubs-icap.org/usfm.