Change Tracking Markup Specification

Because change tracking supports collaboration among many authors, the changes captured during an editing session must be saved in the document. When change tracking markup is represented in an XML file, it uses a dedicated namespace, atict, identified by the URL http://www.arbortext.com/namespace/atict. The onus is on the user to avoid conflicts between this namespace and any other namespace used in the document.
An equivalent representation using processing instructions (PIs) is used for the SGML file representation. In particular, the start tag <atict:add> appears as <?Pub Tag atict:add> and the end tag </atict:add> appears as <?Pub /atict:add>. Attributes appear within the start tag PI as name/value pairs.
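The mapping from XML tags to SGML processing instructions can be sketched in Python. The helper names pi_start_tag and pi_end_tag are hypothetical. Writing each attribute as name="value" with double quotes is an assumption, since the text above only says that attributes appear as name/value pairs.

```python
from typing import Optional


def pi_start_tag(gi: str, attrs: Optional[dict[str, str]] = None) -> str:
    """Return the SGML PI that stands in for an atict start tag."""
    parts = [f"<?Pub Tag {gi}"]
    # Assumption: each attribute is written as name="value".
    for name, value in (attrs or {}).items():
        parts.append(f'{name}="{value}"')
    return " ".join(parts) + ">"


def pi_end_tag(gi: str) -> str:
    """Return the SGML PI that stands in for an atict end tag."""
    return f"<?Pub /{gi}>"


if __name__ == "__main__":
    print(pi_start_tag("atict:add"))                   # <?Pub Tag atict:add>
    print(pi_start_tag("atict:add", {"user": "aed"}))  # <?Pub Tag atict:add user="aed">
    print(pi_end_tag("atict:add"))                     # <?Pub /atict:add>
```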
A table of change tracking information appears at the beginning of the document. The actual tracked changes are marked up in-line in the document in such a way that both old and new content is preserved. The following sections define the markup. All markup is automatically available (pre-compiled) in all doctypes.

1   Types of change tracking markup

There are nine new elements used to mark up changes to an XML document:
Name            Element identifier   Content model
add             atict:add            #ANY
delete          atict:del            #ANY *
change markup   atict:chgm           #ANY *
add markup      atict:addm           #EMPTY
delete markup   atict:delm           #EMPTY
join †          atict:join1          #EMPTY
                atict:join2          #EMPTY
split           atict:split1         #EMPTY
                atict:split2         #EMPTY
† The region between joined tags is read-only.
* The content is read-only.
The split and join change track records each consist of two singletons. The two parts are linked by the ref attribute, which is defined in Section 2.
The generic add and generic delete markup may appear anywhere within a document. The remaining markup applies only to document instance tags. It must appear in a chain of atict elements, either immediately after the start tag (atict-schain) or immediately before the end tag (atict-echain), according to the following (informal) production model:
   modified-tag == open-tag atict-schain | atict-echain end-tag
   open-tag     == unpaired-tag | start-tag
   tag-pair     == unpaired-tag | start-tag end-tag
   unpaired-tag == <gi attrs/>
   start-tag    == <gi attrs>
   end-tag      == </gi>

   atict-schain == atict-dels? atict-chgm* atict-adds?
   atict-echain == atict-dele? atict-adde?
   atict-dels   == <atict:delm/> | <atict:join2/>
   atict-adds   == <atict:addm/> | <atict:split2/>
   atict-chgm   == <atict:chgm> tag-pair </atict:chgm>
   atict-dele   == <atict:join1/>
   atict-adde   == <atict:split1/>

In addition, the markup always follows a strict reverse chronological order after any given tag. Examples appear below.
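Here is a minimal Python sketch of how the two chain productions could be checked. It assumes the chain has already been extracted as a list of atict local names, for example ["delm", "chgm", "addm"], in document order. Each name is written followed by a space so that the productions translate directly into regular expressions. The function names are hypothetical.

```python
import re

# atict-schain == atict-dels? atict-chgm* atict-adds?
_SCHAIN = re.compile(r"(?:(?:delm|join2) )?(?:chgm )*(?:(?:addm|split2) )?")
# atict-echain == atict-dele? atict-adde?   (document order, before end-tag)
_ECHAIN = re.compile(r"(?:join1 )?(?:split1 )?")


def _tokens(names: list[str]) -> str:
    """Join local names as "name name ... " for matching."""
    return "".join(name + " " for name in names)


def is_valid_start_chain(names: list[str]) -> bool:
    """True if the atict elements following a start tag obey atict-schain."""
    return _SCHAIN.fullmatch(_tokens(names)) is not None


def is_valid_end_chain(names: list[str]) -> bool:
    """True if the atict elements preceding an end tag obey atict-echain."""
    return _ECHAIN.fullmatch(_tokens(names)) is not None


if __name__ == "__main__":
    print(is_valid_start_chain(["delm", "chgm", "chgm", "addm"]))  # True
    print(is_valid_start_chain(["addm", "chgm"]))  # False: addm must come last
    print(is_valid_end_chain(["join1", "split1"]))  # True
```

The check is purely structural. It does not verify that the chgm entries are really in reverse chronological order.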

2   Change tracking attributes

The following are common attributes for all change tracking markup.
Name                Attribute    Value
reference number    ref          number
user id             user         unique user identifier
timestamp           time         system time (sec)
subtype             subtype      string
generic attribute   attr[1–9]    any
The reference number will be used to link two change track records or two parts of a change track record together. For example, the <atict:split1> and <atict:split2> tags are two parts of a single change track split record, so they will have the same reference number. Reference numbers are otherwise unique in a document or file entity. That is, the same reference number cannot be used to identify two distinct change track records.
The user id identifies the user who made the change that caused this markup to be inserted. It serves as a key into the atict:user table. The timestamp is a number representing the time of the change or group of changes. All changes that can be undone by a single undo operation are given the timestamp of the start of the change. The time is expressed in seconds since Jan. 1, 1970.
The <atict:add> and <atict:del> markup can be caused by many different operations. The subtype attribute is a string that identifies more specifically the operation that made the change. This is the same string used in tooltips for the undo button.
The generic attributes “attr1” to “attr9” are available to customize change tracking applications. Attributes to control formatting may be added in the future.
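The uniqueness rule for reference numbers can be illustrated with a short Python sketch that uses only the standard library. It groups change track records by ref and reports any value shared by more than one record, except for the two halves of one split or join. It also reports a half whose partner is missing.

The following are assumptions:
- The helper names are hypothetical.
- atict:info and atict:user are skipped because they are info tables, not change track records.
- The time value is interpreted as UTC seconds since the epoch.

```python
from collections import defaultdict
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

NS = "{http://www.arbortext.com/namespace/atict}"
# Change track records (the info tables atict:info and atict:user are excluded).
RECORDS = {"add", "del", "chgm", "addm", "delm",
           "join1", "join2", "split1", "split2"}
# Two-part records whose halves share one reference number.
PAIRS = [{"split1", "split2"}, {"join1", "join2"}]


def check_refs(root: ET.Element) -> list[str]:
    """Report ref values that do not identify exactly one record."""
    by_ref: dict[str, list[str]] = defaultdict(list)
    for elem in root.iter():
        if not elem.tag.startswith(NS) or "ref" not in elem.attrib:
            continue
        name = elem.tag[len(NS):]
        if name in RECORDS:
            by_ref[elem.attrib["ref"]].append(name)
    problems = []
    for ref, names in by_ref.items():
        is_half = any(names[0] in pair for pair in PAIRS)
        if len(names) == 1 and not is_half:
            continue  # a one-part record such as add or chgm
        if len(names) == 2 and set(names) in PAIRS:
            continue  # the two halves of one split or join record
        problems.append(f"ref {ref} is used by {names}")
    return problems


def change_time(elem: ET.Element) -> datetime:
    """Convert the time attribute (seconds since Jan. 1, 1970) to UTC."""
    return datetime.fromtimestamp(int(elem.attrib["time"]), tz=timezone.utc)


if __name__ == "__main__":
    doc = ET.fromstring(
        '<sec xmlns:atict="http://www.arbortext.com/namespace/atict">'
        '<p>First part. <atict:split1 ref="1"/></p>'
        '<p><atict:split2 ref="1"/>Rest.</p>'
        '<atict:add ref="1" user="aed" time="9789765"><newpage/></atict:add>'
        "</sec>")
    print(check_refs(doc))  # ref 1 is shared by the split and an unrelated add
    print(change_time(doc[2]).isoformat())
```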

3  Change tracking info tables

Change tracking info tables appear at the top of a document and contain detailed information about the change tracking markup in the document. Two info tables are currently defined: the generic table (atict:info) and the user table (atict:user).
Table name    Attribute    Value
atict:info    tracking     on | off
              ref          number
atict:user    user         string (login id)
              fullname     string
Change tracking info table markup consists of empty elements in the atict namespace. The generic identifier (GI) of each element is the name of the table, and its attribute/value pairs come from the table above. Table entries appear at the beginning of the document, immediately after the start tag of the document element. For example:
<?xml version="1.0" encoding="utf-8"?>
<book xmlns:atict="http://www.arbortext.com/namespace/atict"
      isbn="0-13-636928-8">
  <atict:info tracking="on" ref="0"/>
  <atict:user user="eap" fullname="The King"/>
  <atict:user user="rdm" fullname="Red D. Mix"/>
  <title> ... </title>
  ...
</book>

Any changes to the document element's tag (its name or attributes) are stored immediately after the change tracking info table markup.
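The following short Python sketch reads the info tables with xml.etree.ElementTree. It assumes, as in the example above, that the table entries are direct children of the document element. The names read_info_tables and author_name are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.arbortext.com/namespace/atict}"


def read_info_tables(root):
    """Return (tracking_on, {login id: full name}) from the info tables."""
    info = root.find(NS + "info")
    tracking_on = info is not None and info.get("tracking") == "on"
    users = {u.get("user"): u.get("fullname") for u in root.findall(NS + "user")}
    return tracking_on, users


def author_name(change, users):
    """Full name of the user who made a change, falling back to the login id."""
    login = change.get("user")
    return users.get(login, login)


SAMPLE = """<book xmlns:atict="http://www.arbortext.com/namespace/atict"
      isbn="0-13-636928-8">
  <atict:info tracking="on" ref="0"/>
  <atict:user user="eap" fullname="The King"/>
  <atict:user user="rdm" fullname="Red D. Mix"/>
  <title>Title</title>
</book>"""

if __name__ == "__main__":
    tracking, users = read_info_tables(ET.fromstring(SAMPLE))
    print(tracking)                                          # True
    print(author_name(ET.Element("x", user="rdm"), users))   # Red D. Mix
```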

4   Change tracking markup

The following information is provided for each specific type of change markup defined in this document.
• conditions under which the markup is applied
• exceptions to those conditions
• placement of change tracking markup relative to changed material
• content of the markup, if it is paired
• additional atict attributes specific to this markup, if any
• restrictions on any content (e.g. read-only)
• example markup usage (before and after)
• special notes where needed
For clarity, the examples include whitespace that might not be present in the ASCII representation of the markup.

ADD

The <atict:add>...</atict:add> markup indicates additions to the document and surrounds the material that has been added. The added material may come from any of the following:
• an insert_equation, insert_graphic, insert_string, insert_table, insert_tag, newline, paste, read, or substitute command
• the insert(), tbl_insert(), or insert_tag() functions
• typing, or using the insert_tag panel
This list is not exhaustive.
Note that if the insert_tag command or insert_tag() function is used with a pending region that is not deleted as a result of the insert, the <atict:addm/> markup is used instead.
The added material may include elements and CDATA, but the elements must be well nested. The region within the generic add is modifiable.
The subtype indicates the type of action which caused the add. This is the same string that will be used in undo operation tooltips.
Example: Suppose a document contained two paragraphs, and a user wanted to add a newpage tag between them. The following markup would result:
  <p>Existing data.</p>
    <atict:add user="aed" time="9789765" subtype="markup">
      <newpage/>
    </atict:add>
  <p>Existing data.</p>

The newpage element has been added in this example by user "aed".
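The following Python sketch shows how a tool might record an addition like the one above by wrapping a new element in atict:add. tracked_insert is a hypothetical helper, not an Arbortext function. The optional now argument lets a caller pass the start time of a multi-step operation, so that all changes undone by one undo share one timestamp.

```python
import time
import xml.etree.ElementTree as ET
from typing import Optional

ATICT_NS = "http://www.arbortext.com/namespace/atict"
ET.register_namespace("atict", ATICT_NS)  # serialize with the atict prefix


def tracked_insert(parent: ET.Element, index: int, new: ET.Element,
                   user: str, subtype: str,
                   now: Optional[float] = None) -> ET.Element:
    """Insert new at parent[index], wrapped in a generic atict:add."""
    add = ET.Element(f"{{{ATICT_NS}}}add", {
        "user": user,
        # seconds since Jan. 1, 1970; pass `now` to share one timestamp
        "time": str(int(time.time() if now is None else now)),
        "subtype": subtype,
    })
    add.append(new)
    parent.insert(index, add)
    return add


if __name__ == "__main__":
    doc = ET.fromstring("<sec><p>Existing data.</p><p>Existing data.</p></sec>")
    tracked_insert(doc, 1, ET.Element("newpage"), "aed", "markup", now=9789765)
    print(ET.tostring(doc, encoding="unicode"))
```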

DELETE

The <atict:del>...</atict:del> markup indicates material that has been deleted from the document. The following operations can cause a generic delete:
• the insert() function with a pending region
• the delete_character and delete_mark commands
• an insert_string, insert_tag, paste, or read command with a pending region
The substitute and translate commands cause both a generic delete and a generic add, as do certain spelling checker operations. This list is not exhaustive.
Note that a "pending delete", where a region is highlighted and then a new object is added, is treated as two tracked operations: a generic delete followed by a generic add.
The <atict:del>...</atict:del> markup surrounds the material marked for deletion, but this material is not actually removed from the document. The surrounded material may include elements and CDATA, and it is always well balanced. If needed, an unbalanced region is subdivided into balanced components before they are marked up as deleted.
The content of the generic delete <atict:del>...</atict:del> is read-only.
The subtype indicates the type of action which caused the delete. This is the same string that will be used in undo operation tooltips.
Example: Suppose this is the original document content.
  <p>Existing data.</p>
  <p>This paragraph should be deleted.</p>
  <p>Existing data.</p>

After user "aed" deleted the second paragraph the markup became:
  <p>Existing data.</p>
    <atict:del user="aed" time="9789767" subtype="region">
      <p>This paragraph should be deleted.</p>
    </atict:del>
  <p>Existing data.</p>
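Because added and deleted material are both kept in the document, two views can be derived from it. The latest version drops deleted material and keeps additions; the original version does the reverse. The following is a minimal Python sketch using xml.etree.ElementTree. It handles only generic add and delete and leaves other atict markup untouched. The function names are hypothetical, and the code is not the Arbortext view selection implementation.

```python
import copy
import xml.etree.ElementTree as ET

NS = "{http://www.arbortext.com/namespace/atict}"
ADD, DEL = NS + "add", NS + "del"


def _add_text(parent, index, text):
    """Append text at the position just before parent[index]."""
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        parent[index - 1].tail = (parent[index - 1].tail or "") + text


def _drop(parent, index):
    """Remove parent[index] with its content, keeping the text after it."""
    _add_text(parent, index, parent[index].tail or "")
    del parent[index]


def _unwrap(parent, index):
    """Replace parent[index] by its content (text, children, then tail)."""
    elem = parent[index]
    kids = list(elem)
    _add_text(parent, index, elem.text or "")
    del parent[index]
    for offset, kid in enumerate(kids):
        parent.insert(index + offset, kid)
    _add_text(parent, index + len(kids), elem.tail or "")


def _resolve(parent, keep, drop):
    i = 0
    while i < len(parent):
        tag = parent[i].tag
        if tag == drop:
            _drop(parent, i)
        elif tag == keep:
            _unwrap(parent, i)  # do not advance: revisit the unwrapped content
        else:
            _resolve(parent[i], keep, drop)
            i += 1


def latest_view(root):
    """Copy of root with additions kept and deletions removed."""
    view = copy.deepcopy(root)
    _resolve(view, keep=ADD, drop=DEL)
    return view


def original_view(root):
    """Copy of root with deletions kept and additions removed."""
    view = copy.deepcopy(root)
    _resolve(view, keep=DEL, drop=ADD)
    return view


if __name__ == "__main__":
    doc = ET.fromstring(
        '<p xmlns:atict="http://www.arbortext.com/namespace/atict">'
        'ab<atict:del>cd</atict:del>ef<atict:add>gh</atict:add>ij</p>')
    print(ET.tostring(latest_view(doc), encoding="unicode"))    # <p>abefghij</p>
    print(ET.tostring(original_view(doc), encoding="unicode"))  # <p>abcdefij</p>
```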

From this point on, the common atict attributes are omitted from the examples for clarity.

CHANGE MARKUP

The <atict:chgm>...</atict:chgm> element is used to indicate that an existing tag has been changed in the document. The markup indicates that either the attributes changed ("delete_lms", "modify_tag"), or the tag name changed ("change_tag").
The <atict:chgm>...</atict:chgm> markup appears immediately after the new start tag of the modified element. Its content is the start and end tags as they appeared before the modification. The content is read-only.
Example: The following listing is of the original material in a document.
  <p id="I23">Existing paragraph.</p>
  <p>Existing data.</p>

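Subsequently, the id attribute was moved from the first paragraph to the second, which was renamed to quote. This was done in three steps:
1. "delete_lms" was applied to the first tag.
2. "modify_tag" was applied to the second tag.
3. The id attribute was added to the second tag.
The last two steps modified the same tag, so they are recorded in a single atict:chgm holding that tag's original form (see the change markup optimization in Section 5). The result is the following markup: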
  <p>
    <atict:chgm><p id="I23"></p></atict:chgm>
    Existing paragraph.</p>
  <quote id="I23">
    <atict:chgm><p></p></atict:chgm>
    Existing data.</quote>

If the first <p> tag were to be given a new id value of "I24", the result would be the following markup:
  <p id="I24">
    <atict:chgm><p></p></atict:chgm>
    <atict:chgm><p id="I23"></p></atict:chgm>
    Existing paragraph.</p>
  <quote id="I23">
    <atict:chgm><p></p></atict:chgm>
    Existing data.</quote>

Note how the latest change immediately follows the <p> tag and older changes appear further from the modified tag (reverse chronological order).
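Because the chain is in reverse chronological order, the tag history of an element can be read from its atict:chgm chain. This Python sketch returns the current tag followed by each recorded earlier form, newest first. tag_history is a hypothetical name. The sketch assumes that the chain ends at the first non-atict child or the first non-whitespace text.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.arbortext.com/namespace/atict}"


def tag_history(elem):
    """Return [(gi, attrs), ...] for elem, newest first, ending with the original tag."""
    history = [(elem.tag, dict(elem.attrib))]
    if (elem.text or "").strip():
        return history  # content starts right after the start tag: no chain
    for child in elem:
        if not child.tag.startswith(NS):
            break  # the first document element ends the chain
        if child.tag == NS + "chgm" and len(child):
            old = child[0]  # the recorded tag pair, e.g. <p id="I23"></p>
            history.append((old.tag, dict(old.attrib)))
        if (child.tail or "").strip():
            break  # character content ends the chain
    return history


if __name__ == "__main__":
    p = ET.fromstring(
        '<p xmlns:atict="http://www.arbortext.com/namespace/atict" id="I24">'
        '<atict:chgm><p></p></atict:chgm>'
        '<atict:chgm><p id="I23"></p></atict:chgm>'
        'Existing paragraph.</p>')
    for gi, attrs in tag_history(p):
        print(gi, attrs)
```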

ADD MARKUP

The <atict:addm/> element indicates that the preceding start or unpaired tag has been added to the document. The tag may have been added explicitly ("insert_tag") or implicitly, via any of the following:
• atd-file-driven context fix-ups
• templates
• the parser-driven context fixing code
When the element’s markup is added to the document, the <atict:addm/> singleton is placed after the element’s start tag. It is always the last atict markup in the atict chain for that tag.
The addm tag uses the common atict attributes; there are no additional attributes.
Example: Suppose the document contains a paragraph with plain text, such as the following:
  <p>Click on the next link.</p>

If the user adds an emphasis tag around the word "next" while automatic change tracking is in effect, the following markup will appear:
  <p>Click on the <em><atict:addm/>next</em> link.</p>

Note: Adding markup and changing attributes may be merged into one markup step if desired. The addm markup design supports attributes on the inserted tag.

DELETE MARKUP

The <atict:delm/> element indicates that the preceding tag has been deleted from the current document. What has been deleted is the tag (and its matching end tag, if any), not the element content. This can happen as a result of the acl delete_tag command and of some internal operations that delete markup.
The markup for the element remains in the document, but the <atict:delm/> singleton appears after the element’s start tag. It is always the first atict markup in the atict chain for that tag, since no further changes to a tag are possible after it has been deleted.
Example: Suppose the document contains a paragraph with some emphasized text, such as the following:
  <p>Click on the <em>next</em> link.</p>

If the user deletes the emphasis tag while automatic change tracking is in effect, the following markup will appear:
  <p>Click on the <em><atict:delm/>next</em> link.</p>
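Add markup and delete markup are mirror images:
- Accepting an addm, or rejecting a delm, removes only the singleton.
- Rejecting an addm, or accepting a delm, also removes the tag and keeps its content.

The following Python sketch uses xml.etree.ElementTree and rests on two assumptions. First, the marked tag is a paired element, not an unpaired tag whose chain follows it as siblings. Second, its chain ends at the first non-atict child or non-whitespace text. All names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.arbortext.com/namespace/atict}"
ADDM, DELM = NS + "addm", NS + "delm"


def _add_text(parent, index, text):
    """Append text at the position just before parent[index]."""
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        parent[index - 1].tail = (parent[index - 1].tail or "") + text


def _remove(parent, index):
    """Remove the singleton parent[index], keeping the text after it."""
    _add_text(parent, index, parent[index].tail or "")
    del parent[index]


def _unwrap(parent, index):
    """Replace parent[index] by its content (text, children, then tail)."""
    elem = parent[index]
    kids = list(elem)
    _add_text(parent, index, elem.text or "")
    del parent[index]
    for offset, kid in enumerate(kids):
        parent.insert(index + offset, kid)
    _add_text(parent, index + len(kids), elem.tail or "")


def _pop_markers(elem):
    """Remove addm/delm from elem's leading atict chain; return their tags."""
    found = []
    if (elem.text or "").strip():
        return found  # content starts right after the start tag
    i = 0
    while i < len(elem) and elem[i].tag.startswith(NS):
        ends_chain = bool((elem[i].tail or "").strip())
        if elem[i].tag in (ADDM, DELM):
            found.append(elem[i].tag)
            _remove(elem, i)
        else:
            i += 1
        if ends_chain:
            break
    return found


def _resolve(parent, unwrap_marker):
    i = 0
    while i < len(parent):
        if unwrap_marker in _pop_markers(parent[i]):
            _unwrap(parent, i)  # do not advance: revisit the unwrapped content
        else:
            _resolve(parent[i], unwrap_marker)
            i += 1


def accept_tag_changes(root):
    """Accept addm/delm below root: added tags stay, deleted tags go."""
    _resolve(root, DELM)


def reject_tag_changes(root):
    """Reject addm/delm below root: added tags go, deleted tags stay."""
    _resolve(root, ADDM)


if __name__ == "__main__":
    src = ('<p xmlns:atict="http://www.arbortext.com/namespace/atict">'
           'Click on the <em><atict:delm/>next</em> link.</p>')
    accepted = ET.fromstring(src)
    accept_tag_changes(accepted)
    print(ET.tostring(accepted, encoding="unicode"))  # <p>Click on the next link.</p>
```

In both functions the content of the element is never touched, which matches the rule that delm and addm concern only the tag.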

JOIN

The <atict:join1/> and <atict:join2/> singletons indicate that two adjacent elements of the same type have been joined into one element holding their combined content. This is done with the acl "join" command. It may also happen as a result of deleting an unbalanced selection.
Both singletons have a numeric attribute called “ref” so that they are mutually cross-referenced. The attribute is required and is generated automatically when the markup is created.
The markup for the two joined elements is not modified; instead, the first element’s end tag is preceded by <atict:join1/> and the second element’s start tag is followed by <atict:join2/>. The <atict:join2/> markup is always the first in the atict chain following a tag.
The region between the two joined elements is read-only, as are the end tag of the first element and the start tag of the second element.
Example: Suppose the cursor is placed before the last paragraph in the following example:
  <p>This is the first paragraph.  </p>
  <p>This is the second.  </p>
  <p>This is the last paragraph.</p>

When the join command is issued the following markup will result:
  <p>This is the first paragraph.  </p>
  <p>This is the second.  <atict:join1 ref="1"/></p>
  <p><atict:join2 ref="1"/>This is the last paragraph.</p>

A subsequent join command with the cursor after the first paragraph will result in:
  <p>This is the first paragraph.  <atict:join1 ref="2"/></p>
  <p><atict:join2 ref="2"/>This is the second.  <atict:join1 ref="1"/></p>
  <p><atict:join2 ref="1"/>This is the last paragraph.</p>

The regions between the joined paragraphs are read-only. They are well balanced and are considered deleted in the latest version; accepting the join causes the intervening data to disappear. If the region between joined elements is not empty at the time of the join, it is marked up as a delete. If that delete is rejected before the join is accepted, the join can no longer be accepted.
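Since the ref attribute links the two halves, the joined elements can be located and checked mechanically. This Python sketch maps each join ref to the element holding <atict:join1/> and the element holding <atict:join2/>. It rejects pairs that are incomplete or whose two elements are of different types. join_pairs is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.arbortext.com/namespace/atict}"


def join_pairs(root):
    """Map each join ref to (first element, second element); check the pair."""
    parent_of = {kid: par for par in root.iter() for kid in par}
    halves = {}
    for elem in root.iter():
        if elem.tag in (NS + "join1", NS + "join2"):
            part = elem.tag[len(NS):]
            halves.setdefault(elem.get("ref"), {})[part] = parent_of[elem]
    pairs = {}
    for ref, found in halves.items():
        if set(found) != {"join1", "join2"}:
            raise ValueError(f"join ref {ref} is missing a half")
        first, second = found["join1"], found["join2"]
        if first.tag != second.tag:
            raise ValueError(f"join ref {ref} links <{first.tag}> and <{second.tag}>")
        pairs[ref] = (first, second)
    return pairs


if __name__ == "__main__":
    doc = ET.fromstring(
        '<sec xmlns:atict="http://www.arbortext.com/namespace/atict">'
        '<p>This is the first paragraph. <atict:join1 ref="2"/></p>'
        '<p><atict:join2 ref="2"/>This is the second. <atict:join1 ref="1"/></p>'
        '<p><atict:join2 ref="1"/>This is the last paragraph.</p></sec>')
    for ref, (first, second) in sorted(join_pairs(doc).items()):
        print(ref, first.tag, "+", second.tag)
```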

SPLIT

The <atict:split1/> and <atict:split2/> singletons indicate that a single element has been split into two adjacent elements of the same type. This is done with the acl "split" command. It may also happen as a result of a “newline” command.
Both singletons have a numeric attribute called “ref” so that they are mutually cross-referenced. The attribute is required and is generated automatically when the markup is created.
To indicate the split, the first element’s end tag is preceded by <atict:split1/> and the second element’s start tag is followed by <atict:split2/>. The <atict:split2/> markup is always the last in the atict chain following a tag.
Example: Suppose the cursor is placed at the start of the second sentence.
  <item><p>This is the first part.  This is the rest.</p></item>

When the split command is issued the following markup will result:
  <item><p>This is the first part.  <atict:split1 ref="1"/></p>
  <p><atict:split2 ref="1"/>This is the rest.</p></item>

Another split, this time of the item, results in the following markup:
  <item><p>This is the first part.  <atict:split1 ref="1"/></p>
    <atict:split1 ref="2"/></item>
  <item><atict:split2 ref="2"/>
    <p><atict:split2 ref="1"/>This is the rest.</p></item>
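Rejecting a split amounts to merging the two elements back together. The Python sketch below handles only the case where both elements are siblings. In the nested example above, the item split (ref 2) must therefore be rejected before the paragraph split (ref 1), which matches the reverse chronological order of the changes. reject_split is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.arbortext.com/namespace/atict}"
SPLIT1, SPLIT2 = NS + "split1", NS + "split2"


def _add_text(parent, index, text):
    """Append text at the position just before parent[index]."""
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        parent[index - 1].tail = (parent[index - 1].tail or "") + text


def _remove_child(parent, child):
    """Remove child from parent, keeping the text that follows it."""
    index = list(parent).index(child)
    _add_text(parent, index, child.tail or "")
    del parent[index]


def reject_split(root, ref):
    """Undo the split identified by ref by merging the two elements again."""
    parent_of = {kid: par for par in root.iter() for kid in par}
    halves = {e.tag: e for e in root.iter()
              if e.tag in (SPLIT1, SPLIT2) and e.get("ref") == ref}
    if len(halves) != 2:
        raise ValueError(f"split ref {ref} is incomplete")
    first, second = parent_of[halves[SPLIT1]], parent_of[halves[SPLIT2]]
    container = parent_of.get(first)
    if container is None or parent_of.get(second) is not container:
        raise ValueError(f"split ref {ref}: the two elements are not siblings")
    _remove_child(first, halves[SPLIT1])
    _remove_child(second, halves[SPLIT2])
    # Move the second element's leading text and children into the first.
    if len(first):
        first[-1].tail = (first[-1].tail or "") + (second.text or "")
    else:
        first.text = (first.text or "") + (second.text or "")
    first.extend(list(second))
    _remove_child(container, second)


if __name__ == "__main__":
    doc = ET.fromstring(
        '<list xmlns:atict="http://www.arbortext.com/namespace/atict">'
        '<item><p>This is the first part.  <atict:split1 ref="1"/></p>'
        '<atict:split1 ref="2"/></item>'
        '<item><atict:split2 ref="2"/>'
        '<p><atict:split2 ref="1"/>This is the rest.</p></item></list>')
    reject_split(doc, "2")  # newest change first: merge the two items
    reject_split(doc, "1")  # then merge the two paragraphs
    print(ET.tostring(doc, encoding="unicode"))
    # <list><item><p>This is the first part.  This is the rest.</p></item></list>
```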

5  Markup Optimizations

There are a number of cases where markup optimization is possible:
• any operation within an addition
• addition adjacent to an addition of the same type
• deletion adjacent to a deletion of the same type
• change markup to the same tag
• add markup followed by delete markup
• split followed by a join
• any operation that results in an empty add
Optimizations apply only if the new change is made by the same user who created the pre-existing change. The assumption is that consecutive changes of this sort are all parts of one operation. There are three distinct types of optimization.
1. When a modification of a document would lead to redundant nested markup, such as an addition within an addition by the same user, the redundant markup is removed. This is called merging nested change tracking markup.
2. Two adjacent changes of the same type can sometimes be combined into one. This is called merging adjacent change tracking markup.
3. Two changes applied to the same object may sometimes cancel each other out, such as when markup is first added and then deleted. This is called cancelling change tracking markup.
When multiple changes are merged, the subtype and the timestamp of the oldest operation are retained in the existing change tracking markup.
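The merging rule for adjacent changes can be sketched in Python on a simplified model in which a document fragment is a list of adjacent segments. The Change record and merge_adjacent are hypothetical. The sketch covers only the merging of adjacent generic adds or deletes by the same user, keeping the oldest timestamp and subtype. It does not cover merging of nested markup or cancelling.

```python
from dataclasses import dataclass, replace

MERGEABLE = {"add", "del"}  # generic add and generic delete


@dataclass(frozen=True)
class Change:
    kind: str     # "add", "del", or "text" for unchanged content
    user: str
    time: int     # seconds since Jan. 1, 1970
    subtype: str
    content: str  # plain text stands in for the marked-up content


def merge_adjacent(segments: list[Change]) -> list[Change]:
    """Merge adjacent add/del segments of the same kind by the same user."""
    merged: list[Change] = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (prev is not None and seg.kind in MERGEABLE
                and seg.kind == prev.kind and seg.user == prev.user):
            oldest = prev if prev.time <= seg.time else seg
            merged[-1] = replace(prev, content=prev.content + seg.content,
                                 time=oldest.time, subtype=oldest.subtype)
        else:
            merged.append(seg)
    return merged


if __name__ == "__main__":
    segments = [
        Change("text", "", 0, "", "Existing "),
        Change("add", "aed", 200, "typing", "new "),
        Change("add", "aed", 100, "paste", "words"),
        Change("add", "rdm", 300, "typing", "!"),
    ]
    for seg in merge_adjacent(segments):
        print(seg)
```

The addition by "rdm" stays separate because optimizations apply only to changes by the same user.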

6  Markup Ambiguity

When a modification of the document can be represented by more than one type of change tracking markup, the following priorities will be used to determine the markup to use:
• avoid duplication of content
• minimize the number of markup tags
• prefer add and delete regions over add and delete markup
These priorities were chosen because the choice of markup affects the behavior the user sees. For example, suppose content were duplicated by using add and del markup where addm and delm could be used. Then any changes made to the content after the markup change would be lost if the markup change were rejected. Without duplicated content, changes to the content can be accepted or rejected independently of the change to the markup. From the user’s point of view, this is a (beneficial) change of behavior.
Specific markup is not mandated for any remaining ambiguous situations.

7   Additional notes

The markup discussed here will be used for both the external representation of a change tracked document (the result of a "write" command) and the internal representation (docfrags). There is no compelling reason to use different schemes for these two representations.
Old tags are retained in the document, with all of their old attributes intact, because the view selection code must work on the document without modifying the docfrag list. The downside is that parsers which simply ignore the namespace elements may be confused by the remaining document tags when they validate the changed document against its DTD.
In the first release the subtype field will not be supported.