Washington University
Construction and Pedagogical
Use of Digital Archives
22 May-3 June 2006
David L. Gants
University of New Brunswick
DTD Key Points
Note: In the examples in this set of related
documents, a Courier face means the characters
should be typed as is, while an italic Times Roman indicates a variable.
1. Basic Rules Refresher
To be well-formed, an XML document must obey these rules:
- The XML declaration, if present, must begin the document;
- Every opening <tag> must have an accompanying closing </tag>;
- All elements must be nested hierarchically;
- Empty tags must end with />, for example, <tag/>;
- The document must contain exactly one root element that completely contains all other elements;
- All attribute values must be within quotes;
- The characters "<" and "&" are reserved and must be used only to begin tags and entity references respectively;
- The only native XML entity references are &amp;, &lt;, &gt;, &apos;, and &quot;.
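As an illustration of the rules above, here is a minimal sketch of a well-formed document. The element and attribute names (letter, salutation, p, name, pagebreak, date) are invented for this example and do not come from any particular DTD.

```xml
<?xml version="1.0"?>
<!-- One root element, <letter>, contains everything else. -->
<letter date="2006-05-22">
  <salutation>Dear Reader,</salutation>
  <!-- Elements nest hierarchically: <name> opens and closes inside <p>. -->
  <p>Greetings from <name>Smith &amp; Sons</name>.</p>
  <!-- An empty element ends with "/>". -->
  <pagebreak/>
  <!-- A literal "<" in text is written as an entity reference. -->
  <p>Note that 3 &lt; 5.</p>
</letter>
```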
The XML declaration looks like this:
<?xml version="1.0"?>
The Cascading Stylesheet declaration follows
the XML declaration but precedes the root element.
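A stylesheet is normally attached with an xml-stylesheet processing instruction. The sketch below shows where it sits; the filename letter.css and the element names are assumptions made for the example.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="letter.css"?>
<letter>
  <p>This paragraph is displayed using the rules in letter.css.</p>
</letter>
```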
Observe the following naming conventions when creating tags:
- Do not include whitespace in tag names;
- Do not include reserved XML characters or characters that have special meaning in processing languages like Perl;
- Tag names are case-sensitive: <tag> is different from <TAG>.
2. Document Type Definitions (DTD)
In order to be valid, an XML document must have an accompanying DTD.
The DTD may be included within the XML document, after the XML declaration and
before the root element:
<!DOCTYPE Root_element_name [
Element, Attribute and Entity Declarations...
]>
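A small complete document with its DTD held inside the doctype declaration might look like the following sketch. The names note, to, and body are invented for the example. Note that the name after DOCTYPE matches the root element.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Reader</to>
  <body>This document carries its own DTD.</body>
</note>
```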
The DTD may exist as a private, external file referred to in a doctype declaration from within the XML
document, after the XML declaration and before the root element. The keyword
"SYSTEM" indicates a private DTD used by a single author or group:
<!DOCTYPE Root_element_name
SYSTEM "Filename_URL">
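For instance, assuming a private DTD saved as letter.dtd in the same directory as the document (a hypothetical filename), the reference could be sketched as:

```xml
<?xml version="1.0"?>
<!DOCTYPE letter SYSTEM "letter.dtd">
<letter>
  <p>This document is checked against the declarations in letter.dtd.</p>
</letter>
```

The example is valid only if letter.dtd declares both letter and p.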
The TEI was created as a public consortium and distributes its guidelines and software for wider use. When using public DTDs such as those created by the TEI, the doctype declaration employs the keyword "PUBLIC" and includes the specific name of the DTD:
<!DOCTYPE Root_element_name
PUBLIC "DTD_name" "Filename_URL">
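The sketch below shows the general shape of such a declaration for a TEI Lite document. The public identifier and URL are given as commonly cited for the XML version of TEI Lite and are assumptions here; check them against the copy in your template before relying on them.

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN"
  "http://www.tei-c.org/Lite/DTD/teixlite.dtd">
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample document</title></titleStmt>
      <publicationStmt><p>Unpublished.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The text goes here.</p>
    </body>
  </text>
</TEI.2>
```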
The sample TEI-Lite template contains the complete doctype declaration. For more complex projects, the TEI has created a tripartite set of DTDs:
- Core: standard components of the TEI main DTD in all its forms; these are always included without any special action by the encoder
- Base: basic building blocks for specific text types; exactly one base must be selected by the encoder (unless one of the 'combined' bases is used)
- Additional: extra tags useful for particular purposes. All additional tag sets are compatible with all bases and with each other; an encoder may therefore add them to the selected base in any combination desired.
These auxiliary DTDs are invoked by declaring the appropriate parameter entity within square brackets and with the replacement text "INCLUDE". The CEWBJ uses this protocol.
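As a rough sketch of this mechanism, the doctype declaration below selects the prose base and two additional tag sets. The parameter entity names and the public identifier follow TEI P4 conventions as best recalled and should be treated as assumptions; the selection shown is illustrative only and is not taken from the CEWBJ.

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [
  <!-- Use the XML version of the DTD. -->
  <!ENTITY % TEI.XML     "INCLUDE">
  <!-- Base: exactly one, here the tag set for prose. -->
  <!ENTITY % TEI.prose   "INCLUDE">
  <!-- Additional: any combination. -->
  <!ENTITY % TEI.linking "INCLUDE">
  <!ENTITY % TEI.figures "INCLUDE">
]>
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample document</title></titleStmt>
      <publicationStmt><p>Unpublished.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The text goes here.</p>
    </body>
  </text>
</TEI.2>
```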
When declaring elements and attributes, use the following symbols to indicate
relationships:
?   Optional, 0 or 1
+   Required, 1 or more
*   Optional and repeatable, 0 or more
|   Or
,   Sequential
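The fragment below sketches how these symbols combine in content models. All element names are invented for the example, and the declarations for title, line, p, and list are left out for brevity.

```xml
<!-- A poem has an optional title, then one or more stanzas,
     then any number of notes, in that order. -->
<!ELEMENT poem   (title?, stanza+, note*)>
<!-- A stanza holds one or more lines. -->
<!ELEMENT stanza (line+)>
<!-- A note holds either a paragraph or a list, but not both. -->
<!ELEMENT note   (p | list)>
```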
3. Element Declarations
Element declarations take the following form:
<!ELEMENT Tagname (Content Model)>
The Content Model may contain:
- Specific tagnames or ANY (essentially any and all tagnames);
- #PCDATA (essentially simple characters);
- EMPTY (indicating an empty tag).
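A short set of declarations covering each kind of content model might be sketched as follows. The element names are invented for the example. Note that the keywords EMPTY and ANY are written without parentheses.

```xml
<!-- Specific tagnames: a heading followed by one or more paragraphs. -->
<!ELEMENT chapter (heading, p+)>
<!-- Simple characters only. -->
<!ELEMENT heading (#PCDATA)>
<!-- Characters mixed with <emph> elements, in any order. -->
<!ELEMENT p       (#PCDATA | emph)*>
<!ELEMENT emph    (#PCDATA)>
<!-- An empty tag, written <pb/> in the document. -->
<!ELEMENT pb      EMPTY>
<!-- Any declared elements and characters. -->
<!ELEMENT misc    ANY>
```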
4. Attribute Declarations
Attribute declarations take the following form:
<!ATTLIST Element-name
Attribute_name Type Default-value>
Attribute types can have the following values:
- CDATA: Character data that is not markup;
- Enumerated: A list of values from which only one may be chosen;
- ID: A unique name not shared by any other ID type in the document;
- IDREF: The value of an ID type attribute of an element in the document;
- IDREFS: Multiple IDs of elements, separated by whitespace;
- ENTITY: The name of an entity declared in the DTD;
- ENTITIES: Names of multiple entities declared in the DTD, separated by whitespace.
Default types can have the following values:
- #REQUIRED: The attribute must be included and given a value;
- #IMPLIED: The attribute is optional;
- #FIXED: The attribute always has the value declared in quotes immediately after the #FIXED keyword; it may be omitted from the document, but it cannot be given any other value.
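The following sketch declares several attributes for a single element, one per line, to show how types and defaults pair up. The element name, attribute names, and values are all invented for the example.

```xml
<!ELEMENT note (#PCDATA)>
<!ATTLIST note
  id     ID                    #REQUIRED
  target IDREF                 #IMPLIED
  type   (editorial|authorial) #IMPLIED
  resp   CDATA                 #IMPLIED
  source CDATA                 #FIXED "first-edition">
```

Under these declarations, <note id="n1" type="editorial">...</note> is valid, while a note with no id, or with a source other than "first-edition", is not.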
There are two predefined XML attributes, both of which begin "xml:".
You must declare them for each element in which you plan to use them:
xml:space   This signals that the application should preserve whitespace within the element rather than collapse it.
xml:lang    This identifies the language used within the element.
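A minimal sketch of declaring and then using both attributes follows; the element name stanza is an assumption made for the example. In a valid document xml:space must be declared as an enumerated type whose only possible values are default and preserve.

```xml
<?xml version="1.0"?>
<!DOCTYPE stanza [
  <!ELEMENT stanza (#PCDATA)>
  <!ATTLIST stanza
    xml:space (default|preserve) #IMPLIED
    xml:lang  CDATA              #IMPLIED>
]>
<stanza xml:space="preserve" xml:lang="fr">Sous le pont Mirabeau
    coule la Seine</stanza>
```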
5. Entity Declarations
Document-level entity declarations take the following form:
<!ENTITY Name
"Replacement text">
Within a DTD, parameter entities take the following form:
<!ENTITY % Name
"Replacement text">