An Introduction to Markup Languages

This guide provides an introduction to a variety of markup languages, including TEI, VRA Core, and EAD.

Markup Languages

Markup languages are designed for the processing, definition and presentation of texts.  Electronic markup falls into three general categories:  presentational, procedural and descriptive.

Presentational markup consists of binary codes embedded within a document, as in traditional word processors. These codes are usually hidden from human users, including authors and editors. Procedural markup is embedded in the text and provides instructions for programs that will process the text. Examples of procedural markup include troff, TeX, and PostScript. Descriptive markup labels the parts of a document but doesn't necessarily provide specific instructions as to how they should be processed. HTML and XML are well-known examples of descriptive markup.

XML Markup Languages

XML, or Extensible Markup Language, is widely used in web development and has many applications for digital scholarship. XML shares many similarities with HTML, but each was designed with a different goal. HTML was designed to display data and focuses on how the data looks. XML, on the other hand, was designed to describe data and focuses on what the data is. This separation of data from presentation simplifies data sharing, data transport, platform changes, and data availability.
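
As a minimal sketch of what separating data from presentation can look like in practice, the Python example below parses one small XML record and renders it two different ways without changing the record itself. The element names (book, title, author), the sample values, and the two rendering functions are assumptions made up for illustration; they do not come from any particular markup standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element names and values are invented for illustration.
SOURCE = """<book>
  <title>An Example Title</title>
  <author>A. Author</author>
</book>"""


def as_plain_text(book):
    """Render the record as a one-line citation."""
    return f"{book.findtext('author')}, {book.findtext('title')}"


def as_html(book):
    """Render the same record as an HTML fragment."""
    return f"<h1>{book.findtext('title')}</h1><p>{book.findtext('author')}</p>"


# The data is parsed once; each function decides how to present it.
book = ET.fromstring(SOURCE)
print(as_plain_text(book))
print(as_html(book))
```

The XML record only says what each piece of data is. How it looks is decided later, by whichever program consumes it.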

Another strength of XML is that it has no predefined tags, unlike HTML, which relies on a fixed set (e.g., <p>, <h1>, <table>). XML document authors can define their own tags and document structure, which provides a great deal of flexibility. Alternatively, authors can make use of existing XML-based markup languages. This guide will highlight a number of XML-based markup languages commonly used in digital scholarship.
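
To illustrate author-defined tags, here is a short sketch that uses Python's standard xml.etree.ElementTree module to build a document from tag names chosen by the author. The names letter, sender, and recipient, the date attribute, and all of the values are hypothetical; XML itself assigns them no meaning.

```python
import xml.etree.ElementTree as ET

# Tag and attribute names are invented for this example;
# nothing in XML defines "letter", "sender", or "recipient".
letter = ET.Element("letter", attrib={"date": "1900-01-01"})
ET.SubElement(letter, "sender").text = "A. Author"
ET.SubElement(letter, "recipient").text = "B. Reader"

# Serialize the element tree to a string of XML.
xml_text = ET.tostring(letter, encoding="unicode")
print(xml_text)
```

Because the vocabulary is the author's own, other people and programs can only interpret it if its meaning is documented somewhere, which is one reason shared XML-based languages are useful.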

Working with XML documents:

<?xml version="1.0" encoding="UTF-8"?>


<title type="main">This is an XML document</title>

<p>Creating XML documents is fun and easy as long as you follow some simple rules</p>
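
Two of the rules an XML parser enforces are that a document has exactly one root element enclosing everything else, and that every start tag has a matching end tag. The sketch below checks these rules with Python's standard library. The is_well_formed helper and the <document> root name are assumptions introduced for illustration, not part of the sample above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two top-level elements with no enclosing root: not well-formed.
fragment = '<title type="main">This is an XML document</title><p>Some text</p>'

# The same elements wrapped in a hypothetical <document> root element.
wrapped = "<document>" + fragment + "</document>"

# A <p> start tag that is never closed.
unclosed = "<document><p>Some text</document>"

for label, text in [("fragment", fragment), ("wrapped", wrapped), ("unclosed", unclosed)]:
    print(label, is_well_formed(text))
```

A check like this only confirms that the markup is well-formed. It says nothing about whether the document follows the rules of a particular XML-based language, which is a separate validation step.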


XML and HTML are both descended from SGML, or Standard Generalized Markup Language, which accounts for their similarities. However, XML and HTML were designed with different purposes in mind.


XML                                               | HTML
Designed to describe data                         | Designed to display data
No predefined tags; authors create their own tags | Uses predefined tags such as <p> and <h1>