## 3. Document Markup Languages

Why do we need document markup? Bryan ([Bry88], p. 5) defines it as follows: "Markup is the term used to describe codes added to electronically prepared text to define the structure of the text or the format in which it is to appear." There are two types of markup: specific and generalized. Specific markup describes the format of a document, whereas generalized markup describes its structure (headings, citations, etc.). For example, Rich Text Format (RTF) is a specific markup language, while TeX, LaTeX, SGML and HTML are generalized markup languages.
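The practical benefit of generalized markup can be illustrated with a minimal Python sketch. The element names (`heading`, `paragraph`) and the two style mappings are hypothetical and chosen only for illustration: the document records structure alone, and the format is decided separately.

```python
# Generalized markup: the document records only structure.
# The element names used here are hypothetical.
document = [
    ("heading", "Document Markup Languages"),
    ("paragraph", "Markup separates structure from format."),
]

# Two interchangeable style mappings turn each structural element into a format.
plain_style = {
    "heading": lambda text: text.upper(),
    "paragraph": lambda text: text,
}
html_style = {
    "heading": lambda text: f"<h1>{text}</h1>",
    "paragraph": lambda text: f"<p>{text}</p>",
}


def render(doc, style):
    """Format each structural element using the given style mapping."""
    return "\n".join(style[kind](text) for kind, text in doc)


print(render(document, plain_style))
print(render(document, html_style))
```

With specific markup, the formatting decisions would be embedded in the text itself, so changing the appearance would mean editing the document rather than swapping the style mapping.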

### 3.1. Standard Generalized Markup Language (SGML)

SGML is an international standard (ISO 8879) for document markup. An SGML document contains a document type definition (DTD) and a set of elements that are defined in the DTD ([Bry88], p. 20). Each element has a name, which is used as a tag in the SGML document.
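The relationship between a DTD and the elements of a document can be sketched in Python. This is a deliberately simplified model, not an SGML parser: the element names are hypothetical, and the toy DTD only lists which elements may appear inside which, whereas real SGML content models also constrain order and repetition.

```python
# Toy "DTD": each element name maps to the names it may contain.
# "#PCDATA" stands for plain character data.
TOY_DTD = {
    "report": {"title", "para"},
    "title": {"#PCDATA"},
    "para": {"#PCDATA", "cite"},
    "cite": {"#PCDATA"},
}


def is_valid(node, dtd=TOY_DTD):
    """Return True if every element is declared and correctly nested.

    A node is either a string (character data) or a (name, children) tuple.
    """
    if isinstance(node, str):
        return True
    name, children = node
    if name not in dtd:
        return False
    for child in children:
        child_name = "#PCDATA" if isinstance(child, str) else child[0]
        if child_name not in dtd[name] or not is_valid(child, dtd):
            return False
    return True


good = ("report", [("title", ["Markup"]), ("para", ["See ", ("cite", ["Bry88"])])])
bad = ("report", [("cite", ["Bry88"])])  # cite may not appear directly in report

print(is_valid(good))  # True
print(is_valid(bad))   # False
```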

### 3.2. HyperText Markup Language (HTML)

HTML is an SGML-based markup language for WWW documents. Formally, HTML is defined by a DTD, that is, a set of definitions that declare the HTML tags and how they are to be interpreted.
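As a small illustration of HTML tags as named elements, the following sketch uses Python's standard `html.parser` module to list the start tags of a sample page. Note that this module is a lenient tokenizer: it does not read or enforce the HTML DTD, and the sample document is invented for the example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Invented sample page, used only for this illustration.
sample = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Heading</h1>"
    '<p>See <a href="http://example.org/">a link</a>.</p>'
    "</body></html>"
)

collector = TagCollector()
collector.feed(sample)
collector.close()
print(collector.tags)
# ['html', 'head', 'title', 'body', 'h1', 'p', 'a']
```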

### 3.3. HyTime

HyTime is an international standard (ISO/IEC 10744) for hypermedia documents. It is based on SGML, but it can reference data in almost any format. Only the hypertext link information is required to be in SGML format ([DeR94], p. 5).

### 3.4. TeX and LaTeX

TeX and LaTeX are also generalized markup languages in the sense that LaTeX macros describe only the structure of a document. The definitions of the macros can be changed later, so that the same document is formatted differently.

HyperTeX adds limited hypertext capability to TeX by means of the \special command, so that a document can contain, for example, URLs. A HyperTeX-aware DVI viewer then displays the formatted document with its URLs and can call a WWW browser to follow a URL.
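The following Python sketch shows roughly what such a link looks like in the TeX source. It assumes the `html:` form of the \special argument used by HyperTeX; the helper name `hypertex_link` and the URL are hypothetical, and the sketch does no escaping of characters that are special to TeX.

```python
def hypertex_link(url, text):
    r"""Wrap text in \special commands that mark it as a link to url.

    Assumes the HyperTeX convention of passing HTML anchor tags
    through \special{html:...}; no TeX escaping is performed.
    """
    return r'\special{html:<a href="%s">}%s\special{html:</a>}' % (url, text)


print(hypertex_link("http://example.org/", "the example site"))
```

TeX itself ignores the content of \special and simply passes it into the DVI file, which is why the link only becomes active in a viewer that understands it.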

### 3.5. Rich Text Format (RTF)

The difference between SGML and RTF is that SGML describes the structure of a document, whereas RTF describes mainly the physical characteristics of the text (typeface, size, etc.). However, RTF also includes certain tags that describe document structure. The author can define a set of styles for the document (heading 1, heading 3, abstract, etc.); these are written at the beginning of the RTF file, and each has a special tag in the RTF markup. An RTF file contains all text formatting, pictures and formulas, and the format is a standard defined by Microsoft.
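To make the mix of format and structure concrete, here is a minimal Python sketch that builds a small RTF file containing one style definition. The helper name `make_rtf` and the font choice are assumptions for illustration, and the sketch assumes the text contains no backslashes or braces, which would need escaping in real RTF.

```python
def make_rtf(heading, body):
    r"""Build a small RTF document with one paragraph style, 'heading 1'.

    Assumes heading and body contain no '\', '{' or '}' characters.
    """
    header = (
        r"{\rtf1\ansi\deff0"
        r"{\fonttbl{\f0 Times New Roman;}}"
        # The style sheet sits at the beginning of the file.
        r"{\stylesheet{\s1\b\fs32 heading 1;}}"
    )
    # \s1 tags the paragraph with style 1 (structure); \b and \fs32 repeat
    # the physical format: bold, 16 pt (RTF sizes are in half-points).
    heading_par = r"\pard\s1\b\fs32 " + heading + r"\par"
    # \plain resets character formatting; \fs24 is 12 pt.
    body_par = r"\pard\plain\fs24 " + body + r"\par"
    return header + "\n" + heading_par + "\n" + body_par + "\n}"


print(make_rtf("Document Markup Languages", "RTF mixes format and structure."))
```

The heading paragraph carries both the style tag and the explicit formatting codes, which reflects the point above: RTF is primarily a description of appearance, with structural information added alongside it.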

### 3.6. OpenMath

The OpenMath consortium is an international group of researchers designing a protocol for exchanging mathematical information between applications [Abb95]. For example, a general-purpose computer algebra system could call a special-purpose application to execute an algorithm that is implemented only in that application. OpenMath tries to preserve the semantic information of a formula in addition to its structural information. TeX, by contrast, describes only the visual appearance of a formula, not its semantic structure. A similar visual representation of mathematical formulas has been planned for SGML.

MathLink is a communications protocol for exchanging Mathematica expressions and data between Mathematica and external applications. The difference between MathLink and OpenMath is that MathLink does not define the semantic information of a formula.
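The distinction between semantic and visual representations can be sketched in Python. The tuple encoding and the operator names (`plus`, `divide`) are hypothetical and are not taken from OpenMath; the point is only that a semantic form can be both computed with and rendered, whereas the rendered TeX string conveys appearance alone.

```python
from fractions import Fraction

# A semantic expression: (operator, left, right) with integer leaves.
# The operator names are hypothetical, not OpenMath symbols.
expr = ("divide", ("plus", 1, 2), 4)


def to_tex(e):
    """Produce a visual (TeX) form of the expression."""
    if isinstance(e, int):
        return str(e)
    op, a, b = e
    if op == "plus":
        return f"{to_tex(a)} + {to_tex(b)}"
    if op == "divide":
        return rf"\frac{{{to_tex(a)}}}{{{to_tex(b)}}}"
    raise ValueError(f"unknown operator: {op}")


def evaluate(e):
    """Compute the exact value, which requires knowing what each operator means."""
    if isinstance(e, int):
        return Fraction(e)
    op, a, b = e
    x, y = evaluate(a), evaluate(b)
    if op == "plus":
        return x + y
    if op == "divide":
        return x / y
    raise ValueError(f"unknown operator: {op}")


print(to_tex(expr))    # \frac{1 + 2}{4}
print(evaluate(expr))  # 3/4
```

An application that receives only the TeX string would have to guess the meaning of the notation before it could compute anything, which is the gap a semantic exchange format is meant to close.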

OpenMath will include SGML compatibility [Abb95], so that OpenMath objects can also be included in SGML documents.