Change Tracking Markup Specification
Because change tracking is needed for collaboration among many authors,
the changes captured during an editing session must be saved in the
document. When change tracking markup is represented in an XML file, it is
placed in its own namespace. The onus is on the user to avoid conflicts with
the namespace prefix used for change tracking: atict. This namespace is
defined by the URL
http://www.arbortext.com/namespace/atict.
An equivalent representation using processing instructions (PIs) will be used for the SGML file
representation. In particular, the start tag <atict:add> will appear as
<?Pub Tag atict:add> while the end tag will appear as <?Pub Tag /atict:add>.
Attributes will appear within the start tag PI as name/value pairs.
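The XML-to-PI mapping described above can be sketched in a few lines of Python. This is only an illustration: the function name to_pub_pi is hypothetical, the attribute text is copied unchanged on the assumption that "name/value pairs" means the same name="value" syntax, and empty elements such as <atict:addm/> are deliberately left untouched because their PI form is not stated here.

```python
import re

# Matches atict start tags and end tags, but not self-closing tags.
_ATICT_TAG = re.compile(r"<(/?)(atict:\w+)(\s[^<>]*?)?(?<!/)>")


def to_pub_pi(xml_text):
    """Rewrite atict start/end tags as <?Pub Tag ...> processing instructions."""
    def _rewrite(match):
        slash, name, attrs = match.group(1), match.group(2), match.group(3) or ""
        return f"<?Pub Tag {slash}{name}{attrs}>"

    return _ATICT_TAG.sub(_rewrite, xml_text)


print(to_pub_pi('<atict:add user="aed">new text</atict:add>'))
```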
A table of change tracking information appears at the beginning of the document.
The actual tracked changes are marked up in-line in the document in such a
way that both old and new content is preserved. The following sections define
the markup. All markup is automatically available (pre-compiled) in all doctypes.
1 Types of change tracking markup
There are nine new elements used to mark up changes
to an XML document:
| Name | Element identifier | Content model |
|---|---|---|
| add | atict:add | #ANY |
| delete | atict:del | #ANY * |
| change markup | atict:chgm | #ANY * |
| add markup | atict:addm | #EMPTY |
| delete markup | atict:delm | #EMPTY |
| join † | atict:join1 | #EMPTY |
| | atict:join2 | #EMPTY |
| split | atict:split1 | #EMPTY |
| | atict:split2 | #EMPTY |

† region between joined tags is read-only.
* content is read-only.
The split and join change track records are made
up of two singletons each. The two parts are linked by the ref attribute which
is defined below.
The generic add and generic delete change tracking
markup may appear anywhere within a document. The remaining markup applies
only to document instance tags and must follow the start tag in a chain of
atict elements according to the following (informal) production model:
modified-tag == open-tag atict-schain | atict-echain end-tag
open-tag == unpaired-tag | start-tag
tag-pair == unpaired-tag | start-tag end-tag
unpaired-tag == <gi attrs/>
start-tag == <gi attrs>
end-tag == </gi>
atict-schain == atict-dels? atict-chgm* atict-adds?
atict-echain == atict-dele? atict-adde?
atict-dels == <atict:delm/> | <atict:join2/>
atict-adds == <atict:addm/> | <atict:split2/>
atict-chgm == <atict:chgm> tag-pair </atict:chgm>
atict-dele == <atict:join1/>
atict-adde == <atict:split1/>
In addition, the markup always follows a strict
reverse chronological order after any given tag. Examples appear below.
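As a rough illustration of the atict-schain production, the following Python sketch checks whether a sequence of atict element names following a start tag is in a permitted order. The function name and the single-letter encoding are assumptions made for this example; the check covers only the ordering rule (atict-dels? atict-chgm* atict-adds?), not attributes or content.

```python
import re

# One letter per production: D = atict-dels, C = atict-chgm, A = atict-adds.
_CODES = {"delm": "D", "join2": "D", "chgm": "C", "addm": "A", "split2": "A"}
_SCHAIN = re.compile(r"D?C*A?")


def is_valid_start_chain(names):
    """Return True if the local names form a valid chain after a start tag."""
    try:
        encoded = "".join(_CODES[name] for name in names)
    except KeyError:
        # The name is not allowed in a start-tag chain (for example join1).
        return False
    return _SCHAIN.fullmatch(encoded) is not None


print(is_valid_start_chain(["delm", "chgm", "chgm", "addm"]))
print(is_valid_start_chain(["addm", "chgm"]))
```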
2 Change tracking attributes
The following are common attributes for all change tracking markup.
| Name | Attribute | Value |
|---|---|---|
| reference number | ref | number |
| user id | user | unique user identifier |
| timestamp | time | system time (sec) |
| subtype | subtype | string |
| generic attribute | attr[1–9] | any |
The reference number will be used to link two change
track records or two parts of a change track record together. For example,
the <atict:split1> and <atict:split2> tags are two parts of a single
change track split record, so they will have the same reference number. Reference
numbers are otherwise unique in a document or file entity. That is, the same
reference number cannot be used to identify two distinct change track records.
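The reference number rule can be checked mechanically. The sketch below is an assumption-laden illustration, not part of the specification: it takes hypothetical (element name, ref) pairs that a caller has already collected from a document, and reports any reference number that is shared by anything other than the two halves of one join or one split.

```python
from collections import defaultdict

_HALVES = {"join1", "join2", "split1", "split2"}
_VALID_PAIRS = (["join1", "join2"], ["split1", "split2"])


def find_ref_problems(records):
    """Return (ref, names) tuples for reference numbers that break the rule."""
    by_ref = defaultdict(list)
    for name, ref in records:
        by_ref[ref].append(name)

    problems = []
    for ref, names in by_ref.items():
        names.sort()
        if names in _VALID_PAIRS:
            continue  # two parts of a single join or split record
        if len(names) == 1 and names[0] not in _HALVES:
            continue  # an ordinary record with a unique reference number
        problems.append((ref, names))
    return problems


print(find_ref_problems([("add", "1"), ("join1", "2"), ("join2", "2")]))
```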
The user id identifies the user who made the change
that triggered this markup to be inserted. It is an index in the atict:user
table. The timestamp is a number which represents the time of the change or
group of changes. All changes that can be undone by a single undo operation
are given the timestamp of the start of the change. The time is expressed
in seconds since Jan. 1, 1970.
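For readers who want to display a timestamp, a minimal Python sketch follows. The function name is hypothetical, and interpreting the value as UTC is an assumption; the text above says only that it is system time in seconds.

```python
from datetime import datetime, timezone


def change_time(time_attr):
    """Convert a time attribute (seconds since Jan. 1, 1970) to a datetime."""
    # Assumption: the count is interpreted as UTC.
    return datetime.fromtimestamp(int(time_attr), tz=timezone.utc)


print(change_time("9789765").isoformat())
```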
The <atict:add> and <atict:del> markup
can be caused by many different operations. The subtype attribute is a string
that identifies the operation performing the change more specifically. This
is the same string that will be used in tooltips for the undo button.
The generic attributes “attr1” to “attr9” are available
to customize change tracking applications. Attributes to control formatting
may be added in the future.
3 Change tracking info tables
Change tracking info tables appear at the top of
a document and contain detailed information pertaining to the change tracking
markup in the document. Currently two info tables are defined, the user table
and the generic table.
| Table Name | Attribute | Value |
|---|---|---|
| atict:info | tracking | on or off |
| | ref | number |
| atict:user | user | string (login id) |
| | fullname | string |
Change tracking info table markup consists of empty
elements in the atict namespace whose gi name is the name of the table, and
whose attribute value pairs come from the table above. Table entries appear
at the beginning of the document. For example:
<?xml version="1.0" encoding="utf-8"?>
<book xmlns:atict="http://www.arbortext.com/namespace/atict"
isbn="0-13-636928-8">
<atict:info tracking="on" ref="0"/>
<atict:user user="eap" fullname="The King"/>
<atict:user user="rdm" fullname="Red D. Mix"/>
<title> ... </title>
...
</book>
Any changes to the top level tag (name or attribute)
will be stored following the change tracking information markup.
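Reading the info tables back out of a document might look like the following Python sketch. It assumes, as in the example above, that the table entries are direct children of the top-level element; the function name and the SAMPLE document are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATICT = "http://www.arbortext.com/namespace/atict"

SAMPLE = """<book xmlns:atict="http://www.arbortext.com/namespace/atict"
      isbn="0-13-636928-8">
<atict:info tracking="on" ref="0"/>
<atict:user user="eap" fullname="The King"/>
<atict:user user="rdm" fullname="Red D. Mix"/>
<title>Sample</title>
</book>"""


def read_info_tables(xml_text):
    """Return (tracking_enabled, {login id: full name}) from the info tables."""
    root = ET.fromstring(xml_text)
    info = root.find(f"{{{ATICT}}}info")
    tracking = info is not None and info.get("tracking") == "on"
    users = {
        entry.get("user"): entry.get("fullname")
        for entry in root.findall(f"{{{ATICT}}}user")
    }
    return tracking, users


print(read_info_tables(SAMPLE))
```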
4 Change tracking markup
The following information is provided for each specific type of change markup
defined in this document.
• conditions under which the markup is
applied
• exceptions to those conditions
• placement of change tracking markup
relative to changed material
• content of markup, if it is paired
• additional atict attributes specific
to this markup, if any
• restrictions on any content (e.g. read-only)
• example markup usage (before and after)
• special notes where needed
The examples will include whitespace for clarity that might not be present
in the ASCII representation of the markup.
ADD
The <atict:add>...</atict:add> is used to
indicate additions to the document. The markup surrounds the material that
has been added. The added material may come from an insert_equation, insert_graphic,
insert_string, insert_table, insert_tag, newline, paste, read, or substitute
command. It may be the result of the insert(), tbl_insert() or insert_tag()
functions. Or it may be the result of typing or of using the insert_tag panel.
The list given here is not intended to be exhaustive.
Note that if the insert_tag command or insert_tag() function is used with
a pending region which is not deleted as a result of the insert then the <atict:addm/>
markup is used instead.
The added material may include elements and CDATA,
but the elements must be well nested. The region within the generic add is
modifiable.
The subtype indicates the type of action which
caused the add. This is the same string that will be used in undo operation
tooltips.
Example: Suppose a document contained two paragraphs,
and a user wanted to add a newpage tag between them. The following markup
would result:
<p>Existing data.</p>
<atict:add user="aed" time="9789765" subtype="markup">
<newpage/>
</atict:add>
<p>Existing data.</p>
The newpage element has been added in this example
by user "aed".
DELETE
The <atict:del>...</atict:del> markup is
used to indicate material that has been deleted from the document. This can
result from the use of the insert() function with a pending region. The delete_character
and delete_mark commands cause a generic delete. It may also be caused by
an insert_string, insert_tag, paste, or read command with a pending region.
The substitute or translate commands will cause a generic delete and a generic
add. Likewise for certain operations of the spelling checker. The list given
here is not intended to be exhaustive.
Note that a "pending delete", where a region is
highlighted and then a new object is added, is treated as two tracked operations:
a generic delete followed by a generic add.
The <atict:del>...</atict:del> markup surrounds
the material marked for deletion but this material is not actually removed
from the document. The surrounded material may include elements and CDATA
and it is always well balanced. If needed, an unbalanced region is subdivided
into balanced components before they are marked up as deleted.
The content of the generic delete <atict:del>...</atict:del> is read
only.
The subtype indicates the type of action which
caused the delete. This is the same string that will be used in undo operation
tooltips.
Example: Suppose this is the original document
content.
<p>Existing data.</p>
<p>This paragraph should be deleted.</p>
<p>Existing data.</p>
After user "aed" deleted the second paragraph the
markup became:
<p>Existing data.</p>
<atict:del user="aed" time="9789767" subtype="region">
<p>This paragraph should be deleted.</p>
</atict:del>
<p>Existing data.</p>
From this point on the common atict attributes
will not be used in the examples for the sake of clarity.
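Because both old and new content are preserved, a tool can resolve generic adds and deletes in either direction. The Python sketch below illustrates one plausible reading, which is an assumption rather than something defined in this document: accepting keeps added content and drops deleted content, and rejecting does the reverse. The function name and the SAMPLE document are hypothetical.

```python
from xml.dom import minidom

ATICT = "http://www.arbortext.com/namespace/atict"

SAMPLE = (
    '<doc xmlns:atict="http://www.arbortext.com/namespace/atict">'
    "<p>Existing data.</p>"
    '<atict:add user="aed" time="9789765" subtype="markup"><newpage/></atict:add>'
    '<atict:del user="aed" time="9789767" subtype="region">'
    "<p>This paragraph should be deleted.</p></atict:del>"
    "<p>Existing data.</p></doc>"
)


def resolve_generic_changes(xml_text, accept):
    """Accept (True) or reject (False) every generic add and delete."""
    doc = minidom.parseString(xml_text)
    for local_name, keep_content in (("add", accept), ("del", not accept)):
        for wrapper in list(doc.getElementsByTagNameNS(ATICT, local_name)):
            parent = wrapper.parentNode
            if keep_content:
                # Unwrap: move the children up in place of the wrapper.
                while wrapper.firstChild is not None:
                    parent.insertBefore(wrapper.firstChild, wrapper)
            parent.removeChild(wrapper)
    return doc.documentElement.toxml()


print(resolve_generic_changes(SAMPLE, accept=True))
```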
CHANGE MARKUP
The <atict:chgm>...</atict:chgm> element
is used to indicate that an existing tag has been changed in the document.
The markup indicates that either the attributes changed ("delete_lms", "modify_tag"),
or the tag name changed ("change_tag").
The <atict:chgm>...</atict:chgm> markup appears
after the new start tag of the modified element. The content is the start
and end tag as it appeared before being modified. The content is read only.
Example: The following listing is of the original
material in a document.
<p id="I23">Existing paragraph.</p>
<p>Existing data.</p>
Subsequently, the id attribute was moved from the
first paragraph to the second, which was renamed to quote. This was done in
three steps: first a "delete_lms" was applied to the first tag, then a "modify_tag"
was applied to the second tag, and finally the id attribute was added. The result is
the following markup:
<p>
<atict:chgm><p id="I23"></p></atict:chgm>
Existing paragraph.</p>
<quote id="I23">
<atict:chgm><p></p></atict:chgm>
Existing data.</quote>
If the first <p> tag were to be given a new
id value of "I24", the result would be the following markup:
<p id="I24">
<atict:chgm><p></p></atict:chgm>
<atict:chgm><p id="I23"></p></atict:chgm>
Existing paragraph.</p>
<quote id="I23">
<atict:chgm><p></p></atict:chgm>
Existing data.</quote>
Note how the latest change immediately follows
the <p> tag and older changes appear further from the modified tag (reverse
chronological order).
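Since the chgm chain is stored newest first, the history of a tag can be recovered by reading the chain and reversing it. A minimal Python sketch, with a hypothetical function name and sample document, assuming each chgm element holds exactly one old tag:

```python
import xml.etree.ElementTree as ET

ATICT = "http://www.arbortext.com/namespace/atict"

SAMPLE = (
    '<doc xmlns:atict="http://www.arbortext.com/namespace/atict">'
    '<p id="I24">'
    "<atict:chgm><p></p></atict:chgm>"
    '<atict:chgm><p id="I23"></p></atict:chgm>'
    "Existing paragraph.</p></doc>"
)


def tag_history(element):
    """Return (name, attributes) for each version of the tag, oldest first."""
    newest_first = [(element.tag, dict(element.attrib))]
    for chgm in element.findall(f"{{{ATICT}}}chgm"):
        old_tag = chgm[0]  # the tag as it appeared before the change
        newest_first.append((old_tag.tag, dict(old_tag.attrib)))
    return newest_first[::-1]


paragraph = ET.fromstring(SAMPLE)[0]
for name, attrs in tag_history(paragraph):
    print(name, attrs)
```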
INSERT MARKUP
The <atict:addm/> element is used to indicate
that the preceding start or unpaired tag has been added to the document. The
tag may have been added explicitly ("insert_tag") or implicitly (via the atd
file driven context fix ups, templates, or the parser driven context fixing
code).
After the element’s markup is added to the
document the <atict:addm/> singleton is placed after the element’s
start tag. It is always the last atict markup to follow a tag in the atict
chain for that tag.
The addm tag uses the common atict attributes;
there are no additional attributes.
Example: Suppose the document contains a paragraph
with plain text, such as the following:
<p>Click on the next link.</p>
If the user adds an emphasis tag around the word
"next" while automatic change tracking is in effect, the following markup
will appear:
<p>Click on the <em><atict:addm/>next</em> link.</p>
Note: Adding markup and changing attributes may
be merged into one markup step if desired. The addm markup design supports
attributes on the inserted tag.
DELETE MARKUP
The <atict:delm/> element is used to indicate
that the preceding tag has been deleted from the current document. Only the
tag (and its mate, if any) is deleted, not the element content. This can
happen as a result of the acl delete_tag command
and by some internal operations that delete markup.
The markup for the element still remains in the document; however, the <atict:delm/>
singleton appears after the element’s start tag. It is always the first
atict markup to follow a tag in the atict chain for that tag, since after a
tag is deleted no further changes to that tag are possible.
Example: Suppose the document contains a paragraph with some emphasized text,
such as the following:
<p>Click in the <em>next</em> link.</p>
If the user deletes the emphasis tag while automatic
change tracking is in effect, the following markup will appear:
<p>Click in the <em><atict:delm/>next</em> link.</p>
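The addm and delm singletons carry enough information to show a document either as it was or as it is now. The Python sketch below assumes two views, named "original" and "latest" for this example only: a tag marked with addm is absent from the original view, a tag marked with delm is absent from the latest view, and in both cases the element content stays. It does not handle a marked top-level element.

```python
from xml.dom import minidom

ATICT = "http://www.arbortext.com/namespace/atict"
NS_DECL = 'xmlns:atict="http://www.arbortext.com/namespace/atict"'

ADDED = f"<p {NS_DECL}>Click on the <em><atict:addm/>next</em> link.</p>"
DELETED = f"<p {NS_DECL}>Click in the <em><atict:delm/>next</em> link.</p>"


def render_view(xml_text, view):
    """Return the markup as seen in the "original" or "latest" view."""
    doc = minidom.parseString(xml_text)
    absent_marker = "addm" if view == "original" else "delm"
    for local_name in ("addm", "delm"):
        for marker in list(doc.getElementsByTagNameNS(ATICT, local_name)):
            tagged = marker.parentNode
            tagged.removeChild(marker)
            if local_name == absent_marker:
                # The tag does not exist in this view: keep only its content.
                outer = tagged.parentNode
                while tagged.firstChild is not None:
                    outer.insertBefore(tagged.firstChild, tagged)
                outer.removeChild(tagged)
    return doc.documentElement.toxml()


print(render_view(ADDED, "original"))
print(render_view(DELETED, "latest"))
```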
JOIN
The <atict:join1/> and <atict:join2/> singletons
indicate that two adjacent elements of the same type have been joined into
one element holding their combined content. This is done with the acl "join"
command. It may also happen as a result of a delete of an unbalanced selection.
Both elements have a numerical attribute called
“ref” so that they are mutually cross referenced. The attribute
is required and is automatically generated when the markup is created.
The markup for the two joined elements is not modified, however the first
element’s end tag is preceded by <atict:join1/> while the second
element’s start tag is followed by <atict:join2/>. The <atict:join2/>
markup is always the first in the atict chain following a tag.
The region between two joined elements is read only, as is the end tag of
the first element and the start tag of the second element.
Example: Suppose the cursor is placed before the last paragraph in the following
example:
<p>This is the first paragraph. </p>
<p>This is the second. </p>
<p>This is the last paragraph.</p>
When the join command is issued the following markup
will result:
<p>This is the first paragraph. </p>
<p>This is the second. <atict:join1 ref="1"/></p>
<p><atict:join2 ref="1"/>This is the last paragraph.</p>
A subsequent join command with the cursor after
the first paragraph will result in:
<p>This is the first paragraph. <atict:join1 ref="2"/></p>
<p><atict:join2 ref="2"/>This is the second. <atict:join1 ref="1"/></p>
<p><atict:join2 ref="1"/>This is the last paragraph.</p>
The regions between the joined paragraphs are read
only. They are well balanced and considered deleted in the latest version.
Accepting the join causes the intervening data to disappear. If the region
between joined elements is not empty at the time of the join, it will be marked
up as a delete. If the delete is rejected before the join is accepted, the
join will not be acceptable.
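Accepting a join can be pictured as removing everything from join1 through join2, including the end tag and start tag between them. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the two joined elements are assumed to be siblings, and any intervening region is simply discarded, as the paragraph above describes.

```python
from xml.dom import minidom

ATICT = "http://www.arbortext.com/namespace/atict"

SAMPLE = (
    '<doc xmlns:atict="http://www.arbortext.com/namespace/atict">'
    '<p>This is the first paragraph. <atict:join1 ref="2"/></p>'
    '<p><atict:join2 ref="2"/>This is the second. <atict:join1 ref="1"/></p>'
    '<p><atict:join2 ref="1"/>This is the last paragraph.</p>'
    "</doc>"
)


def accept_joins(xml_text):
    """Merge every pair of elements linked by join1/join2 with the same ref."""
    doc = minidom.parseString(xml_text)
    for join1 in list(doc.getElementsByTagNameNS(ATICT, "join1")):
        ref = join1.getAttribute("ref")
        join2 = next(
            node
            for node in doc.getElementsByTagNameNS(ATICT, "join2")
            if node.getAttribute("ref") == ref
        )
        first, second = join1.parentNode, join2.parentNode
        # Drop the read-only region between the two joined elements.
        while first.nextSibling is not second:
            first.parentNode.removeChild(first.nextSibling)
        first.removeChild(join1)
        second.removeChild(join2)
        # Move the second element's content into the first, then drop it.
        while second.firstChild is not None:
            first.appendChild(second.firstChild)
        second.parentNode.removeChild(second)
    return doc.documentElement.toxml()


print(accept_joins(SAMPLE))
```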
SPLIT
The <atict:split1/> and <atict:split2/> singletons
indicate that a single element has been split into two adjacent elements of
the same type. This is done with the acl "split" command. It may also happen
as a result of a “newline” command.
Both elements have a numerical attribute called “ref” so that
they are mutually cross referenced. The attribute is required and is automatically
generated when the markup is created.
To indicate the split the first element’s
end tag is preceded by <atict:split1/> while the second element’s
start tag is followed by <atict:split2/>. The <atict:split2/> markup
is always the last in the atict chain following a tag.
Example: Suppose the cursor is placed at the start of the second sentence.
<item><p>This is the first part. This is the rest.</p></item>
When the split command is issued the following
markup will result:
<item><p>This is the first part. <atict:split1 ref="1"/></p>
<p><atict:split2 ref="1"/>This is the rest.</p></item>
Another split, this time of the item, results in
the following markup:
<item><p>This is the first part. <atict:split1 ref="1"/></p>
<atict:split1 ref="2"/></item>
<item><atict:split2 ref="2"/>
<p><atict:split2 ref="1"/>This is the rest.</p></item>
5 Markup Optimizations
There are a number of cases where markup optimization
is possible:
• any operation within an addition
• addition adjacent to an addition of
the same type
• deletion adjacent to a deletion of
the same type
• change markup to the same tag
• add markup followed by delete markup
• split followed by a join
• any operation that results in an empty
add
Optimizations apply only if the new change is being
done by the same user that created the pre-existing change. The assumption
is that consecutive changes of this sort are all parts of one operation. There
are three distinct types of optimizations.
1. When a modification of a document would
lead to redundant nested markup, such as an addition within an addition by
the same user, the redundant markup is removed. This is called merging nested
change tracking markup.
2. Two adjacent changes of the same type can
sometimes be combined into one. This is called merging adjacent change tracking
markup.
3. Two changes applied to the same object
may sometimes cancel each other out, such as when markup is first added and
then deleted. This is called cancelling change tracking markup.
When multiple changes are merged, the subtype of the oldest operation is retained
in the existing change tracking markup. Likewise the time stamp of the oldest
operation is retained.
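The merging of adjacent changes, with the oldest subtype and timestamp retained, can be sketched as follows. The Change record and the function name are hypothetical, the input list is assumed to hold changes that are adjacent in document order, and the sketch merges whenever type and user match, although the text above says only that such changes can sometimes be combined.

```python
from dataclasses import dataclass


@dataclass
class Change:
    kind: str      # "add" or "del"
    user: str
    time: int      # seconds since Jan. 1, 1970
    subtype: str
    content: str


def merge_adjacent(changes):
    """Combine neighbouring changes of the same type made by the same user."""
    merged = []
    for change in changes:
        previous = merged[-1] if merged else None
        if previous is not None and (previous.kind, previous.user) == (change.kind, change.user):
            # Keep the subtype and timestamp of the oldest operation.
            oldest = previous if previous.time <= change.time else change
            merged[-1] = Change(previous.kind, previous.user, oldest.time,
                                oldest.subtype, previous.content + change.content)
        else:
            merged.append(change)
    return merged


edits = [
    Change("add", "aed", 100, "typing", "Hello "),
    Change("add", "aed", 105, "paste", "world"),
    Change("add", "rdm", 110, "typing", "!"),
]
print(merge_adjacent(edits))
```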
6 Markup Ambiguity
When a modification of the document can be represented
by more than one type of change tracking markup, the following priorities
will be used to determine the markup to use:
• avoid duplication of content
• minimize the number of markup tags
• prefer add and delete regions over
add and delete markup
These priorities have been established because
they cause a change of behavior when implemented. For example, if content
were duplicated by using add and del markup instead of addm and delm when
possible, the changes to the content done after the markup change would be
lost if the markup change were rejected. By not duplicating the content, changes
to the content can be accepted or rejected independently of the change to
the markup. From the user’s point of view this is a (beneficial) change
of behavior.
Specific markup is not mandated for any remaining
ambiguous situations.
7 Additional notes
The markup discussed here will be used for both the external representation
of a change tracked document (the result of a "write" command) and the internal
representation (docfrags). There is no compelling reason to use different
schemes for these two representations.
The reason that old tags are retained in the document
as tags with all of their old attributes intact is that the view selection
code must work on the document without modifying the docfrag list. The downside
to this is that parsers which simply ignore the namespace elements may get
confused by the remaining document tags when they try to validate the changed
document against its DTD.
In the first release the subtype field will not
be supported.