teic / tei

The Text Encoding Initiative Guidelines

License: Other

tei's Issues

<index>

The current <index> model of TEI suffers from several
limitations, including:

1. index entries are given as attribute values, so they
cannot contain additional markup
2. there is no support for ranges
3. there is no support for cross-references like “see” or
“see also”

The first limitation is easily fixed by substituting <label>
elements for the “level<n>” attributes.

As to the second limitation, <index> could be changed
from a ‘milestone-like’ element to an element containing
all of the material to be indexed in a subelement, e.g.:

<index id="index.lemmatization.arabic">
  <indexLabel level="1">lemmatization</indexLabel>
  <indexLabel level="2">arabic</indexLabel>
  <indexContent>The students understand procedures
  for Arabic lemmatisation and are beginning to build
  parsers.</indexContent>
</index>

(For the “id” attribute on <index>, see below.)

The third limitation could be removed by adding a
pointer child to <index>:

<index>
  <indexLabel level="1">arabic lemmatization</indexLabel>
  <indexRef>see <ptr
  target="index.lemmatization.arabic"
  type="index"/></indexRef>
</index>
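
A minimal sketch of how a processor might resolve such a “see” reference, assuming well-formed versions of the two entries. The element names indexLabel, indexContent and indexRef come from the proposal and are not existing TEI elements; the function names labels and resolve_see are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed versions of the two proposed entries. The element names
# indexLabel, indexContent and indexRef are proposals, not existing TEI.
DOC = """<div>
  <index id="index.lemmatization.arabic">
    <indexLabel level="1">lemmatization</indexLabel>
    <indexLabel level="2">arabic</indexLabel>
    <indexContent>The students understand procedures
    for Arabic lemmatisation and are beginning to build
    parsers.</indexContent>
  </index>
  <index>
    <indexLabel level="1">arabic lemmatization</indexLabel>
    <indexRef>see <ptr target="index.lemmatization.arabic" type="index"/></indexRef>
  </index>
</div>"""


def labels(index):
    """Return the label texts of one <index>, ordered by level."""
    found = index.findall("indexLabel")
    return [lab.text for lab in sorted(found, key=lambda lab: int(lab.get("level")))]


def resolve_see(root):
    """Map each cross-referencing entry to the labels of its target entry."""
    by_id = {i.get("id"): i for i in root.iter("index") if i.get("id")}
    result = {}
    for index in root.iter("index"):
        ptr = index.find("indexRef/ptr")
        if ptr is not None:
            result[", ".join(labels(index))] = labels(by_id[ptr.get("target")])
    return result


see_map = resolve_see(ET.fromstring(DOC))
```

Here see_map links the entry “arabic lemmatization” to the two-level entry it points at, which is the information a “see” line in a printed index needs.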

Alternatively, TEI could simply adopt DocBook’s index
model.

Original comment by: @nolda

<roleDesc> should be allowed as child of <castGroup>

The <castGroup> content model needs to allow <roleDesc>.

The attached PNG is an extract from the cast list of
Margaret Cavendish’s A_Piece_of_a_Play. In it, two
<castGroup>s each contain two <castItem>s. However,
there is only one role description for each <castGroup>.

Ideally, only one <roleDesc> should be permitted in a
<castGroup>, but it should be permitted either before
or after the series of <castItem>s or <castGroup>s
(which may or may not all have <label>s depending on
whether feature request 1022100 is enacted).

In order to make that clearer, here I have expressed
that idea using straight RelaxNG compact syntax (i.e.,
not a TEI syntax pattern, as there are no references to
the globally included elements nor the TEI class and
pattern indirection system). This presumes both a
desire for <roleDesc> as described here, and for
<label> as requested in 1022100.

# maybe label caststuff pairs
mlcp = (
    ( label, ( castItem | castGroup ) )+
  |
    ( castItem | castGroup )+
)
castGroup = element castGroup {
  head?,
  (
      ( roleDesc, mlcp )
    |
      ( mlcp, roleDesc? )
  ),
  trailer?
}
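
The content model above can be checked against a few child sequences with a small sketch. This is only an illustration under stated assumptions: each child element is reduced to a one-letter code and the model is restated as a regular expression; the names CODES, MLCP, CAST_GROUP and is_valid_cast_group are hypothetical.

```python
import re

# One-letter codes for the child elements named in the proposal.
CODES = {"head": "h", "roleDesc": "r", "label": "l",
         "castItem": "i", "castGroup": "g", "trailer": "t"}

# mlcp: either label/cast pairs throughout, or no labels at all.
MLCP = r"(?:(?:l[ig])+|[ig]+)"
# head?, ((roleDesc, mlcp) | (mlcp, roleDesc?)), trailer?
CAST_GROUP = re.compile(rf"h?(?:r{MLCP}|{MLCP}r?)t?")


def is_valid_cast_group(children):
    """Check a sequence of child element names against the proposed model."""
    # Unknown element names map to "?", which the pattern never accepts.
    encoded = "".join(CODES.get(name, "?") for name in children)
    return CAST_GROUP.fullmatch(encoded) is not None
```

With this sketch, a group of two castItems followed by one roleDesc is accepted, as is one roleDesc followed by two castItems, while a roleDesc on both sides or a mix of labelled and unlabelled items is rejected.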

Original comment by: @sydb

Letters/memos module

The elements currently available to encode
correspondence are insufficient and require workarounds
that make the encoding of a simple postscript a major
chore.

The ways in which letters are written vary fairly
significantly over time and across cultures. Modern
business letters require the ability to encode letterhead,
addresses, attention lines, subjects, reference lines,
persons copied, and enclosures (this list comes off the
top of my head; if it were developed systematically, it
might be quite a bit longer).

I am currently working on a 19th century book of
anecdotes that incorporates letters quite frequently as
part of an ongoing narrative. I would appreciate
elements that would isolate the addressee in an opening
<salute> and that would indicate the sender in the
closing one. I would like a way to indicate the position
held by either of these parties, without getting involved
in the depths of names&dates.

Memoranda are also rather difficult to encode. I used to
work with United Nations human rights documents that
included large amounts of memoranda in the immense
bureaucratic morasses they called reports.

Original comment by: sf_user_finkend

"accept" attribute

In linguistics, examples, glosses, or parts thereof are
often prefixed with symbols like “*”, “?”, “?*”, “(?)”, “#”
etc., specifying the type or degree of (un)acceptability.
(Unfortunately, there is no consensus as to the
inventory and exact interpretation of these symbols.)

These specifications could be easily represented in TEI
by an optional “accept” attribute on <mentioned>,
<gloss>, and <seg>. The value of the attribute should
directly give the symbol instead of some fixed
meta-value like “yes” or “no”.

Original comment by: @nolda

origDate without attributes tei.datable

Section 1.2 says that <origDate> has no attributes other
than those globally available, but the formal
description lists tei.datable.

Besides this: when msHeading was removed, were there
any suggestions where to put the information of
origDate instead? As phrase-level element it can be
used everywhere but where do I find (and put) the
information about “the whole manuscript”, the way many
catalogues provide the information? (And German
catalogues HAVE TO!)

Original comment by: @schassan

Two new elements for bibliographic entries

The group that has been working on bibliographic
entries for TEI
makes the following proposal. Please see the attached
text file.

Original comment by: sf_user_paultremblay

msDesc: watermarks missing in formal definitions

Hi,

concerning the documentation of the manuscript
description element:
the element watermarks is missing both from 1.6.1
Support, list of subcategories (although it is mentioned in
example 1.65), and from the formal definition of support.

Greetings, Torsten Schassan

Original comment by: @schassan

New <figure> element and figContent class

The current <figure> element has a content model which
permits all sorts of textual things within it. P4 makes
clear that the intention is that these should be
transcribed from the image whose presence the <figure>
denotes. The image itself is indicated (in pure P4) by an
entity attribute pointing to an external graphic entity, or
(in practice) by a URL attribute. There is no wrapper
element combining a graphic with its heading etc. There
is no way of embedding graphic content expressed using
SVG or even the TEI tags for trees within a <figure>,
though there is a <figDesc> “surrogate” for the graphic
information.

I propose to address these concerns as follows:

1. Introduce a new core element called <graphic> with a
URL attribute (or whatever the TEI Workgroup on
Standoff finally decides it should be called).

2. Introduce a new wrapper element to hold text
content of a graphic, called <figText>, with content
model like the current <figure> content model.

3. Introduce a new class tei.figContent, with members
<graphic>, <eTree>, <eg> (or whatever we decide to
call that thing), and <figText>.

4. Redefine the content model of <figure> to be
(%(tei.figContent), figDesc?, head?)

What I don’t know is how to add an SVG element into
the mix. Ideally, I would like to add <svg:something> to
the tei.figContent class. Suggestions?

Original comment by: @lb42

<publisher> children

According to P4, <publisher> “provides the name of the
organization responsible for the publication or
distribution of a bibliographic item” and <pubPlace>
“contains the name of the place where a bibliographic
item was published”.

Consider the following <biblStruct> example, which
includes two pairs of publishers and publication places:

<biblStruct>
<monogr>
<editor>
<persName>
<forename>Johan</forename>
<forename>F.</forename>
<forename>A.</forename>
<forename>K.</forename>
<nameLink>van</nameLink>
<surname>Benthem</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Alice</forename>
<nameLink>ter</nameLink>
<surname>Meulen</surname>
</persName>
</editor>
<title lang="eng" level="m">Handbook of Logic and
Language</title>
<imprint>
<pubPlace>Amsterdam</pubPlace>
<publisher>Elsevier</publisher>
<pubPlace>Cambridge, Mass.</pubPlace>
<publisher>MIT Press</publisher>
<date>1997</date>
</imprint>
</monogr>
</biblStruct>

As the example shows, the <pubPlace>-<publisher>
pairs cannot be explicitly stated.

As a consequence, bibliographic stylesheets have to
rely on some document order convention in order to
determine which <pubPlace> applies to which
<publisher>:

Benthem, Johan F. A. K. van and Alice ter Meulen
(eds.) (1997). Handbook of Logic and Language.
Amsterdam: Elsevier and Cambridge, Mass.: MIT Press.
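
The document-order convention a stylesheet has to fall back on can be sketched as follows. This is an illustration only, assuming the convention “each <pubPlace> applies to the next <publisher>”; the function name pair_by_order is hypothetical, and other conventions are equally possible, which is exactly the problem.

```python
import xml.etree.ElementTree as ET

# The <imprint> from the example above, with flat siblings.
FLAT_IMPRINT = """<imprint>
  <pubPlace>Amsterdam</pubPlace>
  <publisher>Elsevier</publisher>
  <pubPlace>Cambridge, Mass.</pubPlace>
  <publisher>MIT Press</publisher>
  <date>1997</date>
</imprint>"""


def pair_by_order(imprint):
    """Guess (place, publisher) pairs from sibling order alone."""
    pairs = []
    pending_place = None
    for child in imprint:
        if child.tag == "pubPlace":
            pending_place = child.text
        elif child.tag == "publisher":
            # Assumed convention: a place belongs to the next publisher.
            pairs.append((pending_place, child.text))
            pending_place = None
    return pairs


flat_pairs = pair_by_order(ET.fromstring(FLAT_IMPRINT))
```

If an encoder wrote the publisher before its place, the same function would silently produce wrong pairs, since nothing in the markup states the pairing.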

The xml-biblio group (cf. the archives of the
[email protected] mailing list)
proposes to reformulate the above-mentioned
meta-language definition of <publisher> in a way that
covers the following alternative markup of the example:

<biblStruct>
<monogr>
<editor>
<persName>
<forename>Johan</forename>
<forename>F.</forename>
<forename>A.</forename>
<forename>K.</forename>
<nameLink>van</nameLink>
<surname>Benthem</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Alice</forename>
<nameLink>ter</nameLink>
<surname>Meulen</surname>
</persName>
</editor>
<title lang="eng" level="m">Handbook of Logic and
Language</title>
<imprint>
<publisher>
<placeName>Amsterdam</placeName>
<orgName>Elsevier</orgName>
</publisher>
<publisher>
<placeName>Cambridge, Mass.</placeName>
<orgName>MIT Press</orgName>
</publisher>
<date>1997</date>
</imprint>
</monogr>
</biblStruct>

(P4’s content model for <publisher> already allows for
<placeName> and <orgName> children.)
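
With the nested markup, a processor no longer needs any ordering convention. A minimal sketch, assuming each <publisher> carries one <placeName> and one <orgName> child as in the example; the function name explicit_pairs is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <imprint> from the alternative markup, with nested children.
NESTED_IMPRINT = """<imprint>
  <publisher>
    <placeName>Amsterdam</placeName>
    <orgName>Elsevier</orgName>
  </publisher>
  <publisher>
    <placeName>Cambridge, Mass.</placeName>
    <orgName>MIT Press</orgName>
  </publisher>
  <date>1997</date>
</imprint>"""


def explicit_pairs(imprint):
    """Read (place, publisher) pairs directly from each <publisher>."""
    return [(pub.findtext("placeName"), pub.findtext("orgName"))
            for pub in imprint.findall("publisher")]


nested_pairs = explicit_pairs(ET.fromstring(NESTED_IMPRINT))
```

The pairing is now carried by the element structure, so the order of the children inside each <publisher> does not matter.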

As an alternative, P5 could introduce a wrapper element
for <pubPlace> and <publisher>, e.g. “<publication>”.

Original comment by: @nolda

<theorem>

Please add a general purpose element for theorems,
definitions, and similar displayed text blocks.

This element, say <theorem>, should have the following
content model:

(head?, p+)

Besides the global attributes, its attributes include:

* “type” (e.g., “theorem” or “definition”)
* “typeN” (the theorem number, as in “Definition 3”)

The “n” attribute of <theorem>, however, should be
reserved for specifying running numbers of displayed
blocks in general.

Consider the following example:

<theorem n="12" type="definition" typeN="3">
<head>Multiplication</head>
<p>…</p>
</theorem>

which could be rendered as:

(12) Definition 3 (Multiplication)
…

Original comment by: @nolda

metadata element for header

I would like to suggest adding a metadata element to
the header (maybe under profileDesc?). This element
would hold a series of keys and values, e.g.:

<profileDesc>
<metadata>
<metadataItem>
<key>Provenance</key>
<value>Oxyrhynchus</value>
</metadataItem>
<metadataItem>
<key>Location</key>
<value>Sackler Library</value>
</metadataItem>
</metadata>
</profileDesc>

This would allow people to use their inhouse scheme,
which may not always be compatible with what TEI
already has, or extensions like MASTER.

I attach a file (a work in progress) which is an
attempt to encode a Graeco-Roman papyrus manuscript.
You will see at the bottom of the file my <div
type=“metadata”> hack to include metadata that I need
for the digital library system I am using (Greenstone).
It seems to me that this data should be in the header.
I did try using the existing conventions, but gave up.

Tim Finney

Original comment by: @tfinney

default value of status= of <teiHeader>

The status= attribute of the <teiHeader> element
currently defaults to “new”. It should not have any
default value.

Most encoders do not explicitly specify a status= of
<teiHeader>, as it is not required, and for many people
isn’t useful. Then sometime later the encoder comes
along and updates the header, without even thinking of
status=. Now, because the default for status= is “new”,
an XML parser will report that the header has not been
modified, when in fact it has. This could easily be
avoided by not having a default value for status=.
(Note that status=, about which the Guidelines say very
little, would appear not to be needed when
date.created= and date.updated= are used.)
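
The effect of the default can be reproduced with a small sketch. It assumes a heavily simplified internal DTD subset standing in for the real declaration (the actual DTD declares more attributes and a content model) and an invented date.updated value; Python's expat-based parser supplies attribute defaults declared in the internal subset.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the current declaration (an assumption: the
# real DTD is larger). The date.updated value is invented.
WITH_DEFAULT = """<!DOCTYPE teiHeader [
  <!ATTLIST teiHeader status (new | update) "new">
]>
<teiHeader date.updated="2004-09-01"/>"""

# The same header, with the default removed as proposed.
NO_DEFAULT = """<!DOCTYPE teiHeader [
  <!ATTLIST teiHeader status (new | update) #IMPLIED>
]>
<teiHeader date.updated="2004-09-01"/>"""

# The encoder never wrote status=, yet the parser reports one.
status_with_default = ET.fromstring(WITH_DEFAULT).get("status")
# Without a default, the attribute is simply absent.
status_no_default = ET.fromstring(NO_DEFAULT).get("status")
```

In the first case the parser reports status “new” for a header that has visibly been updated; in the second it reports nothing, which is the behaviour requested here.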

Original comment by: @sydb

allow <ptr> inside <biblStruct>

I propose that <ptr> be allowed in the <biblStruct>,
<analytic>, and <monogr> elements.

It is often necessary to point to other elements when
constructing bibliographic entries within a
<biblStruct> element. For example,
if you are trying to describe a review, you need to
point to the work being reviewed:

<!-- the article of the review -->
<biblStruct>
  <analytic>
    <author>
      …
    </author>
    <title>This book is worth reading</title>
    <!-- point to book being reviewed with its authors -->
    <!-- not valid TEI! -->
    <ptr target="book-reviewed"/>
  </analytic>
</biblStruct>

<!-- the book being reviewed -->
<biblStruct id="book-reviewed">
  <monogr>
    …
  </monogr>
</biblStruct>

Right now, it is impossible to use either <ptr> or
<ref> inside the <analytic>, <monogr>, or <biblStruct>
elements. This makes pointing
impossible and unnecessarily restricts the scope of
<biblStruct>.

Original comment by: sf_user_paultremblay

Permit <email> in <address>

A new <email> element should be permitted as a child of
<address>.

Original comment by: @sydb

<figBody>

Please add a <figBody> child to <figure>, which in turn
can have <tree>, <eTree> or similar elements as
children.

Original comment by: @nolda

What's this all about

Where do I put edw77 then?

Original comment by: @lb42

<castList>s need <label>s, too

If the “label-item pair” model of a list is retained,
then <castList>s must be allowed to have <label>s, too.
For that matter, <castItem>s and <castGroup>s should be
allowed to have <label>s, as well.

Original comment by: @sydb

A place for out-of-line elements

There exist content objects that I would like to have
in my TEI file that are neither part of the original
work being encoded per se, nor are really part of the
“metadata” that is the <teiHeader>, either.

Examples include:
- the <castList> of a drama if no cast list appeared in
  the source;
- the keyed list of names that occur in the document;¹
- <timeline> elements;
- <linkGrp> or <joinGrp> elements;
- <note> elements (other than those in the <notesStmt>)
  that one chooses not to encode in-line or in-place.²

These elements belong inside the main <TEI.2> element,
but do not belong inside the <teiHeader> or the
<front>, <body>, or <back> elements. It is arguable
whether they belong inside the <text> element or outside.

This suggestion is for an element (called <hyperDiv>;
the name <ldb> has also been suggested, for “link data
block”; alternatively, it might just be an <ab>,
although that would make appropriate constraints
difficult or impossible) which would occur as an
optional single child of <text> before <front>, whose
express purpose would be to hold this sort of thing.

Notes
---
¹ Were a project to decide to make key= of <name> an
  IDREF or XPointer, so that it could point to a database
  stored in the same instance.
² By “in-line” I mean the standard OHCO method of
  encoding <note>s at their anchor point in the text. By
  “in-place” I mean encoding a note where it appeared on
  the source page.

Original comment by: @sydb

On msItem Content Model for manuscript description module

From our experiences working with S:Laurentius Digital
Manuscript Library, we would like to see a change in
the msItem content model such that it more reflects the
intellectual content of an object rather than its
physical structure. The means of achieving this would
be to make <locus>…</locus> a repeatable element.

We have in our collection at least one manuscript in which
a given content is split across multiple
page ranges (with other content occurring between these).

For further discussions, you may contact
[email protected]

Original comment by: sf_user_*anonymous

measure should be typed?

I would like to suggest that the measure element be added
to the typed class so that it has both type and subtype attributes.

The additional subtype attribute will be useful in dealing with
currency and probably other measurements as well.

For example:

<p>This book costs <measure type="currency" subtype="USD"
reg="$8.00">eight dollars</measure>.</p>

In the above example, the currency symbol “$” in the reg attribute
is not a precise indicator of the type of currency, since it is used
for both Canadian and US dollars.

I think this is a general enough case that it might merit a change in
P5.

John

Original comment by: @johnwalsh

Add namespace information to <tagUsage>

The <tagsDecl> element in the header is used to record
the usage of XML elements present in a document. With
the advent of multi-namespaced documents in TEI P5, it
will be necessary to distinguish element names by
namespace.

Proposal: either
(a) add a ns attribute to <tagsDecl> the value of which
is a full name space (not a prefix). Default is
http://www.tei-c.org/ns/1.0
This requires that <tagsDecl> be made repeatable,
which makes it possible to get things wrong.
or
(b) add a new <nameSpace> element with attribute
NAME, as child of <tagsDecl> and parent of <tagUsage>

On balance, (b) seems preferable. Existing documents
could be accommodated unchanged if we added a rule
that says any <tagUsage> not wrapped in a
<nameSpace> is assumed ipso facto to be in the TEI
namespace.
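
Option (b), together with the fallback rule, could be processed roughly as in the sketch below. The <nameSpace> element is the proposal's, not an existing TEI element; writing its attribute as lower-case name, the SVG sample data, and the function name usage_by_namespace are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Default taken from the proposal: bare <tagUsage> means the TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical header fragment using the proposed <nameSpace> wrapper.
TAGS_DECL = """<tagsDecl>
  <tagUsage gi="p" occurs="12"/>
  <nameSpace name="http://www.w3.org/2000/svg">
    <tagUsage gi="circle" occurs="3"/>
  </nameSpace>
</tagsDecl>"""


def usage_by_namespace(tags_decl):
    """Return (namespace, element name) for every <tagUsage>."""
    usages = []
    for child in tags_decl:
        if child.tag == "tagUsage":
            # Not wrapped in <nameSpace>: assumed to be a TEI element.
            usages.append((TEI_NS, child.get("gi")))
        elif child.tag == "nameSpace":
            for usage in child.findall("tagUsage"):
                usages.append((child.get("name"), usage.get("gi")))
    return usages


usages = usage_by_namespace(ET.fromstring(TAGS_DECL))
```

Because unwrapped <tagUsage> elements fall back to the TEI namespace, an existing header without any <nameSpace> element is read exactly as before.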

Original comment by: @lb42

<tagUsage> should not be all or nothing

Currently P4 says “A <tagsDecl> … must … contain
exactly one occurrence of a <tagUsage> element for each
distinct element marked within the outermost <text>
element”. This restriction (that there be exactly n
<tagUsage> elements) is unhelpful. There would be
nothing wrong with specifying <tagUsage gi="castList"
occurs="0"/> in a drama that did not have a cast list,
or <tagUsage gi="emph" occurs="0"
render="rend.italic"/> in a document in a project with
many files that do have italicized words encoded as
<emph>, where there happen to be none in this file.
Furthermore, there is no reason to insist that a
project list all the element types used in <text> just
to use the <tagsDecl> in order to specify the default
rendition of a single element.

Thus this restriction should be softened to “contain at
most one occurrence of a <tagUsage> element for each
distinct …”
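
The softened rule is easy to check mechanically. A minimal sketch, assuming the rule is read as “no gi value may be declared twice”; the sample header and the function name duplicated_gis are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical header: declares elements that do not occur in this file.
SAMPLE = """<tagsDecl>
  <tagUsage gi="castList" occurs="0"/>
  <tagUsage gi="emph" occurs="0" render="rend.italic"/>
</tagsDecl>"""


def duplicated_gis(tags_decl):
    """Return the element names declared by more than one <tagUsage>."""
    counts = Counter(u.get("gi") for u in tags_decl.iter("tagUsage"))
    return sorted(gi for gi, n in counts.items() if n > 1)


duplicates = duplicated_gis(ET.fromstring(SAMPLE))
```

Under the softened wording this sample is acceptable: it lists only two element types, both with zero occurrences, and none is declared twice.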

Original comment by: @sydb

inline, displayed, and floating elements

Please adopt a policy for specifying whether some
element is to be rendered

* inline
* as a displayed block without a running number
* as a displayed block with a running number like “(12)”
* or as a floating block

These specifications are crucial for authors and cannot
be left to stylesheets.

I would like to propose a policy along the following lines.

For each phrase-level or block element a default
rendition is defined:

* inline (e.g., for <formula> or <mentioned>)
* displayed without running number (e.g. for a putative
  <listDisplayed> element; see below)
* floating (e.g., for <figure> or <table>)

Authors can switch to a different rendition by specifying
one of the following “rend” values:

* “inline”
* “displayed”
* “floating”

Running numbers for displayed elements are given as an
“n” value, with the special value “generated” for
numbers which are to be automatically determined by
the processing stylesheet.

So, for example:

* <formula> is rendered inline
* <formula rend="displayed"> is rendered as an
  unnumbered displayed block
* <formula rend="displayed" n="12"> and <formula
  rend="displayed" n="generated"> are rendered as
  numbered displayed blocks
* <formula rend="floating"> is rendered as a floating
  block
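
The policy can be summarised as a small lookup, sketched here under stated assumptions: the default table only contains the elements named in this request, the function name resolve_rendition is hypothetical, and ignoring “n” on anything that is not a displayed block is my reading of the proposal, not something it states.

```python
# Default renditions for the elements named in the proposal.
DEFAULT_RENDITION = {
    "formula": "inline",
    "mentioned": "inline",
    "listDisplayed": "displayed",
    "figure": "floating",
    "table": "floating",
}


def resolve_rendition(element, rend=None, n=None):
    """Return (mode, number) for an element under the proposed policy."""
    mode = rend if rend is not None else DEFAULT_RENDITION[element]
    if mode not in ("inline", "displayed", "floating"):
        raise ValueError(f"unknown rend value: {rend!r}")
    # Assumption: running numbers only apply to displayed blocks.
    # The value "generated" is passed through for the stylesheet to fill in.
    number = n if mode == "displayed" else None
    return mode, number
```

For example, resolve_rendition("formula") gives an inline formula, while resolve_rendition("formula", "displayed", "generated") gives a displayed block whose number the stylesheet is expected to generate.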

At least in linguistic documents, there can be numbered
displayed blocks on different ‘display levels’, e.g.:

(12) a. …
     b. i. …
        ii. …

For configurations of this type, a <listDisplayed>
element could be defined, whose children are
automatically rendered as numbered displayed blocks on
a subordinate level:

Original comment by: @nolda

msDesc: layout attributes elsewhere

Hi again,

it would be useful to allow the use of the attributes
of layout (columns, ruledLines and writtenLines) in
subdivisions of layout (p?, span?).

This would make it possible to specify in more detail
where in the codex a given layout aspect applies.

Right now I have to encode:

<layout columns="2-3">
  <p>1r-v zweispaltig, 2r-10v dreispaltig.</p>
</layout>

With the change, it would be possible to write this:

<layout columns="2-3">
  <span columns="2">1r-v zweispaltig</span>
  <span columns="3">2r-10v dreispaltig</span>
</layout>

An even more complicated example: (the short form
without locus and dimensions)

<layout writtenLines="35-43">Schriftraum:
1r-202v: 19-19,5 × 10 cm, 35-37 Zeilen;
205r-398v: 20,5-21,5 × 10-12,5cm, 38-43 Zeilen.
</layout>

Here, it would be useful to be able to encode:

<layout>Schriftraum:
  <p writtenLines="35-37">1r-202v: 19-19,5 × 10 cm,
  35-37 Zeilen;</p>
  <p writtenLines="38-43">205r-398v: 20,5-21,5 ×
  10-12,5cm, 38-43 Zeilen.</p>
</layout>

Additionally, it would be helpful to have separate
elements both for the description of the written
space and for the manner of ruling.

Greetings, Torsten Schassan

Original comment by: @schassan

No rend= on header elements

The rend= attribute exists to record the rendition of
the source. Since the elements in <teiHeader> are
metadata elements created by the encoder, rend= makes
no sense (except on <rendition>, see 1022072). Thus
rend= should not be permitted on elements that do not
occur as a descendant of <text>, but rather only as a
descendant of <teiHeader>, e.g. <encodingDesc>.

Original comment by: @sydb

Proposal for file tree layout of future TEI releases

With P5, the TEI is committed to a completely new
release of the Guidelines, including Schemas,
Documentation, Stylesheets and other files, while at
the same time also continuing to maintain the current
release tree of P4. To facilitate the task of managing,
releasing, explaining and using this complex system, I
would like to see a well thought out and documented
layout of the TEI subtree on the file system.

As a model for this, I would like to point to a
document from the Debian GNU/Linux distribution, which
details how XML/SGML applications are to be handled in
the distribution, which is available at
http://debian-xml-sgml.alioth.debian.org/xml-policy/xml-dir-layout-file-placement.html

The gist of this would be something like the outline
below, although the details will of course have to be
thought out more clearly.

(…/xml/)/tei/
    custom/
        myTEI/
            schema       (customized versions from Roma etc. go here)
            stylesheet
    doc/                 (guidelines go here, maybe also versioned)
    misc/
    schema/
        dtd/
            P4
            P5
        rnc/
            P4
            P5
        (and other schemas)
    stylesheet/
        xsl/
            P4
            P5

Original comment by: @cwittern

<eTree> etc.

Please replace the “label” attribute of <eTree>,
<eLeaf>, and <triangle> by <label> (or <eLabel>)
elements, thereby allowing for specifying labels
containing markup.

Original comment by: @nolda

contents of msItem less restricted

Hi,

I think it is not very useful and against (catalogue)
writing habits to allow contents of msItem only in such
a restricted way: one might for example want to
describe smaller parts of a text with nested msItems
and afterwards make some notes on the text. Or: isn’t
it reasonable to provide filiation of a text right
after its title? Quotes might go everywhere.

Additionally: There might be more than one rubric,
incipit or explicit that the cataloguer wants to
provide, e.g. for the text and for glosses and I don’t
think that this information should be provided using
another msItem.

What I propose is something like this:

Original comment by: @schassan

Create tei.glossy class

Section 6.3.4 of TEI P4 introduces three elements which
don’t seem to have much in common, except that they
are often typographically distinguished in running text.
They are <term>, <gloss>, and <mentioned>.

Leaving aside the last of these, which I think really
ought to be discussed along with its oft-confused friend
<soCalled>, I would like to propose (for P5) a more
rational way of grouping together <term> and <gloss>
and a small number of other similar phrase level
elements. My proposal is to establish a new club for
them and a few select others, tentatively called the
tei.glossy class. The core module would populate this
class with <term>, <gloss> and the following other
elements:

- <desc> - currently defined in both the new “gaiji”
  module which replaces the old WSD, and the new tagdoc
  module which replaces the old TSD
- <equiv> and <altIdent> - also defined in the new
  tagdoc module
- <trans> - defined in the dictionary module

Making these elements all available in the core would
(a) make life a lot easier when trying to build schemas -
you wouldn’t have to load the dictionary module just to
record that a phrase is a translation rather than original
(b) reduce the clutter of near synonyms in the
Guidelines - you wouldn’t be tempted to make up your
own “translated” element

Putting them into a class would
(a) make clearer the conceptual structure of the
Guidelines
(b) enable you to add your own near synonym if you
really want to!

As this is a proposal for P5, I’m posting this as a SourceForge
feature request as well as to TEI-L. Feel free to
send your comments to whichever forum you feel more
comfortable with…

Lou

Original comment by: @lb42

<mentioned>

Please add an optional “type” attribute to <mentioned>,
specifying the ontological or linguistic type of the
mentioned entity.

Sample values could include:

* term (typically rendered in quotes)
* symbol (typically rendered as is or in quotes, too)
* graphs (typically rendered in italics)
* graphemes (typically rendered as “<…>”)
* phones (typically rendered as “[…]”)
* phonemes (typically rendered as “/…/”)

Original comment by: @nolda

<extent>

According to the examples in P4, § 6.10, <extent> can
be used for specifying the number of pages of a
bibliographic item like a book. Another reasonable usage
of <extent> would be, for instance, the specification of
the total number of volumes of a multi-volume book.

As <extent> does not have a “type” attribute, measure
strings like “pp.” or “vols” have to be included in its
content. This is unfortunate in cases where
bibliographic data is to be stored in a language- and
style-neutral way.

In addition, <biblScope type=“pages”> cannot simply be
substituted for <extent type=“pages”> because of their
distinct semantics. <biblScope> defines the
‘scope’—some part—of a bibliographic item (say, a
collection) with respect to some subitem (e.g., an
article). <extent>, on the other hand, measures the
whole (e.g., the collection).

So the xml-biblio group (cf. the archives of the
[email protected] mailing list)
proposes to add a “type” attribute to <extent> with
suggested values similar to those of <biblScope>’s
“type” attribute.

Here is an example, using P4’s <biblStruct> model:

<biblStruct>
<analytic>
<author>
<persName>
<forename>Edward</forename>
<forename>L.</forename>
<surname>Keenan</surname>
</persName>
</author>
<author>
<persName>
<forename>Dag</forename>
<surname>Westerståhl</surname>
</persName>
</author>
<title lang="eng" level="a">Generalized Quantifiers in
Linguistics and Logic</title>
</analytic>
<monogr>
<title lang="eng" level="m">Handbook of Logic and
Language</title>
<editor>
<persName>
<forename>Johan</forename>
<forename>F.</forename>
<forename>A.</forename>
<forename>K.</forename>
<nameLink>van</nameLink>
<surname>Benthem</surname>
</persName>
</editor>
<editor>
<persName>
<forename>Alice</forename>
<nameLink>ter</nameLink>
<surname>Meulen</surname>
</persName>
</editor>
<imprint>
<pubPlace>Amsterdam</pubPlace>
<publisher>Elsevier</publisher>
<pubPlace>Cambridge, Mass.</pubPlace>
<publisher>MIT Press</publisher>
<date>1997</date>
</imprint>
<extent type="pages">1247</extent>
<biblScope
type="pages">837–893</biblScope>
</monogr>
</biblStruct>
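
With a typed <extent>, the unit wording can be left to the stylesheet. A minimal sketch of that idea: the label tables, the “volumes” value and the function name render_extent are hypothetical; only type="pages" comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical label tables; a real stylesheet would choose its own wording.
UNIT_LABELS = {
    "en": {"pages": "pp.", "volumes": "vols"},
    "de": {"pages": "S.", "volumes": "Bde."},
}


def render_extent(extent, lang):
    """Format a typed <extent> with the unit label for one language."""
    unit = UNIT_LABELS[lang][extent.get("type")]
    return f"{extent.text} {unit}"


extent = ET.fromstring('<extent type="pages">1247</extent>')
english = render_extent(extent, "en")
german = render_extent(extent, "de")
```

The stored data contains only the number and its type, so the same record can be rendered for an English or a German bibliography without editing the content of <extent>.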

Original comment by: @nolda

type= of <add>, <del>, <addSpan>, <delSpan>

Tone Merete Bruvik points out that <addSpan> has
a type=, but <add> does not. So I poked around a
bit and found the following in P4:2004.

* <add> does not have type=.

* <addSpan> does, but it is only listed in
  the reference section, not in chapter 18.¹

* <del> does; a note says that type= “should
  not be used to record the manner in which
  the deletion is signalled in the source.
  This should be recorded using the global
  rend= attribute, with values such as
  ‘subpunction’, ‘overstrike’, ‘erasure’,
  ‘bracketed’.”

* <delSpan> does, but the ‘sample values
  include’ has entries about the manner in
  which the deletion is signalled in the
  source.

This needs to be straightened out. I think that
all 4 should bear a type= attribute (listed both
in the prose and the reference section) and the
<note> that is included with <del> should
replace the “sample values include” list of
<delSpan>, although I am certainly open to
other possibilities. (In particular this would
perhaps be considered by some to be an odd
semantic for rend=, as it is being specified on
an empty element, but is supposed to apply to
all the text between this empty element and the
one pointed to by its to= attribute.)

Notes
---
¹ I.e., the <tagDesc tagDoc="ADDSPAN"> in file
  //TEI/P4/Odds/p2ph.odd does not list “del”
  in the value of its atts= attribute.

Original comment by: @sydb

rendition in rend= of <rendition>, not content

In the <tagsDecl>, the default rendition should be
formally specified
on the rend= attribute of <rendition>; the content of
<rendition> should be a prose description for humans.
(This allows systems where there is a rend= value that
says “take that element’s value and use it” to work
nicely; besides, it makes more sense.)

Original comment by: @sydb

<locus> not available in <p>?

I have looked up the prerelease of the chapter on Manuscript
Description for P5 (at http://www.tei-c.org/Activities/MS/FASC-ms.pdf),
and in 1.2.3 “References to manuscript locations”,
Example 1.10 indicates that the element ‘locus’ should be available
in the element ‘p’, but when I test this with P5, it looks like
that is not the case. Is the error in the chapter or in P5?

Original comment by: sf_user_*anonymous

msPart examples need update

Hi,

the examples for msPart need an update in the
documentation. Both 1.135 and 1.136 are slightly wrong:

1.135: msPart has altIdentifier as mandatory first child.
The following text mentions idno which has been
included in altIdentifier.
1.136 still has only idno as identifier although it
should be child of altIdentifier.

Original comment by: @schassan

Expand <dateline> content model?

I’m wondering why, in P4 and still in P5, the
<dateline> element is not more inclusive. As defined
(compact Relax NG):

it doesn’t allow for optional name/place elements like
persName and placeName. Is there a reason not to
define its content as macro.phraseSeq, which would
allow for those (and others)?

Original comment by: @DavidSewell

phrase-level element stamp

In analogy to the new phrase-level element “watermark”
I propose to introduce a new element “stamp” which
should contain information about stamps, whether ink stamps
recording ownership or blind stamps on the binding, etc.

Alternative: an element stamp as child to binding and
decoNote.

Reason: A lot of work has been done in Germany to
identify and categorize any stamp that could be found
on bindings etc. There exists an online database
showing the results (http://www.hist-einband.de). There
should be a possibility to refer to the contents of the
DB from a manuscript description which means that there
should be an element containing the relevant information.

Definition:

stamp = element stamp { stamp.content, stamp.attributes }
stamp.content = text
stamp.attributes =
  tei.global.attributes,
  tei.datable.attributes,
  stamp.attributes.type,
  stamp.attributes.subtype,
  [ a:defaultValue = "stamp" ] attribute TEIform { text }?

stamp.attributes.type =
  attribute type { stamp.attributes.type.content }?
stamp.attributes.type.content = datatype.Key

tei.datable |= stamp

tei.phrase |= stamp

Original comment by: @schassan

Manuscript encoding

Henrik Ibsen’s Writings has made several changes and
additions to the TEI DTDs regarding the encoding of
manuscript changes and manuscript phenomena. In the
following we wish to present some of the modifications
and encourage the inclusion of these themes in the P5
revision discussion.

- <clarification>

We’ve created a new element to record the clarification
phenomenon in manuscripts. Ibsen and his copyists sometimes
clarify words or letters, either by writing over the
already written word or letters or by repeating them
offline. We encode these instances of repetition for the
purpose of clarification like this:

<clarification hand="HI" place="inline">Henrik</clarification>
<clarification hand="HI" place="offline">Henrik</clarification>

We believe this is a well known phenomenon for
manuscript transcribers, and we think it would be a
useful addition in a revised chapter on “Transcription of
Primary Sources”.

- TEI elements for manuscript changes

We’ve made it possible to include almost all kinds of
elements in <app> (i.e. in <lem>/<rdg> of course),
<add>, <del> etc. and to use <app>, <add>, <del>,
<gap/> etc. almost globally in our manuscript
transcriptions in order to record manuscript changes in a
way that reflects our view of the changes. E.g. we allow
<div> inside <add> and <del> to make possible the
inclusion or deletion of a complete scene, thus reflecting
the change in the document structure more clearly.

- The hand attribute

We have allowed the hand attribute in <app>, <hi> and
<emph>. Including hand in <app> makes it unnecessary
to use hand both in <add> and <del> when another
hand has revised the text of the manuscript. Using the
hand attribute in <hi> (and similarly in <emph>) allows
us to record e.g. the red pencil underlining throughout a
manuscript otherwise written in black ink.

- A new element for substitutions/revision of the <app>
element

In recent years there have been discussions on TEI-L
and elsewhere of the need to modify and revise the
<app> element to make it more useful for manuscript
changes. Several alternatives have been discussed. At
Henrik Ibsen’s Writings we have also discussed this.
Although we have managed to use the existing <app>
element for the manuscript changes in our material
(though with some modifications of what <lem> and
<rdg> can contain and the inclusion of some attributes),
we would like to encourage the expansion of this part of
the chapter on “Transcription of Primary Sources”. At the
very least, an element for encoding substitutions should be
included. We dislike the double role of the <app>
structure and would prefer to have one structure for
manuscript changes and one for the critical apparatus.
Because we use the <app> element for manuscript changes,
we have had to construct a new element, <tcApp>,
for our critical apparatus and text-critical notes.

Please contact Hilde Bøe ([email protected]) at
Henrik Ibsen’s Writings, if you have any questions or
remarks.

Original comment by: sf_user_hildeboe

simplify encoding of chronological phrases

TEI P4 has four elements, <date>, <dateRange>, <time>,
and <timeRange> for encoding and normalizing text that
describes a point or period in time. (Not counting
<timeline> and <when> which are special purpose
elements for establishing synchronous points.)
The only difference between a date and a time is the
level of precision. The quick description of the
difference between a <date> or <time> and a <dateRange>
or <timeRange> is that the *Range describes a period
greater than the level of precision used.
Furthermore, in its discussion of attributes for these
elements, P4 conflates accuracy and precision (and
also, IIRC, confidence in the accuracy :-), and does
not address whether ranges are inclusive or exclusive.
Thus I am suggesting that this mix of elements and
attributes needs some attention for P5. Some first
suggestions follow.

Since it is easy to indicate a range with the
international standard representation of dates and
times (ISO 8601:2000), the *Range elements are
unnecessary and should be dropped from P5. The
following example (from P4 6.4.4; the source is Virginia
Woolf’s Mrs. Dalloway) demonstrates an encoding of a
range without <dateRange>.
| Those five years —
| <date value="1918/1923">1918 to 1923</date>
| — had been, he suspected,
| somehow very important.
The Guidelines should simply state that the range
specified on value= is inclusive. E.g.
| <date value="1067/1776-07-03">After 1066 but before
| American independence</date>
(Which, of course, could also be encoded
| After <date value="1066">1066</date> but before
| <date value="1776-07-04">American independence</date>
with the same accuracy and precision.)
| <date value="1869-10-02/1948-01-30">during the life
| of the Mahatma</date>
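
To make the inclusive-range reading concrete, here is a minimal Python sketch of how a processor might interpret a value= like the ones above. The function names split_range and overlaps_year are hypothetical, and the sketch assumes four-digit positive years and a single "/" separator; it is not a full ISO 8601 parser.

```python
def split_range(value: str) -> tuple[str, str]:
    """Split a value= string into (start, end).

    A plain date such as "1066" is treated as a range that starts and
    ends at itself; "start/end" is read as inclusive at both ends.
    """
    start, sep, end = value.partition("/")
    if not start or (sep and not end):
        raise ValueError(f"not a date or date range: {value!r}")
    return (start, end if sep else start)


def overlaps_year(value: str, year: int) -> bool:
    """Return True if the (inclusive) range touches the given year.

    Assumption: both ends begin with a four-digit year.
    """
    start, end = split_range(value)
    return int(start[:4]) <= year <= int(end[:4])
```

With this reading, "1918/1923" touches 1918 and 1923 themselves, which is the behaviour the Guidelines would need to state explicitly.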

The exact= attribute of <*Range> should become the
accuracy= attribute of <date> and <time>. The precision
is indicated by the precision of value=.

Since a date and time indicate the same thing (albeit
with varying precision) and the normalized
representation (ISO 8601) can include both, the
Guidelines should explicitly state that <time> and
<date> are technically interchangeable.

The Guidelines should be explicit about whether a “T”
is to be specified between the date and time fields of
an ISO 8601 value=. That is, whether the content of
value= is an ISO 8601 date followed by
whitespace followed by an ISO 8601 time (e.g.
“2004-09-03 15:24Z”) or a combined ISO 8601 date and time (e.g.
“2004-09-03T15:24Z”). I prefer the latter myself.
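
As an illustration of why a single form is easier on software, the following Python sketch rewrites the whitespace form into the “T” form and then parses it. The function names are hypothetical, and the sketch assumes minute precision with a “Z” or numeric UTC offset, as in the two examples above.

```python
from datetime import datetime, timezone


def to_t_form(value: str) -> str:
    """Join a whitespace-separated date and time with "T"."""
    parts = value.split()
    if len(parts) == 1:
        return parts[0]          # already a single field
    if len(parts) == 2:
        return "T".join(parts)   # date + time -> combined form
    raise ValueError(f"expected a date and an optional time: {value!r}")


def parse_minute_utc(value: str) -> datetime:
    """Parse values such as "2004-09-03T15:24Z" (either separator).

    strptime's %z accepts "Z" as UTC on Python 3.7 and later.
    """
    return datetime.strptime(to_t_form(value), "%Y-%m-%dT%H:%M%z")
```

If only the “T” form were permitted, the first function would be unnecessary and every value= could be handed straight to a parser.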

The Guidelines should explicitly prohibit the notation
“24:00” to represent midnight in the value of value=.
(This notation is permitted by ISO 8601, one of the
few indications that it was written by committee :-)

We can imagine two different uses of the value=
attribute of <date> (or <time>, I suppose):
1. regularize the content of <date> into a format that
can easily be parsed and searched
2. normalize the content of <date> to a date along an
agreed-upon timeline (aka calendar system)
It might make sense, then, to split these into two
separate attributes, as one may reasonably want
different values for these purposes. For example, one
might like to regularize the format of the Julian
dates in early modern printing, but may well rather not
be bothered trying to figure out what the Gregorian or
proleptic Gregorian (i.e., normalized) value would be.
| <docDate norm="1548-04-07" reg="1548-03-28">The.xxviii.day
| of <name>Marche</name>
| <lb/>the yere of our lorde.
| <lb/>M.D.XLVIII.</docDate>
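
The relationship between the reg= and norm= values in this example can be checked mechanically. The Python sketch below converts a Julian calendar date to the proleptic Gregorian calendar via the Julian Day Number; the function names are hypothetical, and it assumes a year that begins on 1 January, so it ignores the question of when the civil year started in early modern England.

```python
from datetime import date


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian calendar date to the proleptic Gregorian one.

    Python's date ordinals count from 0001-01-01 (Gregorian) = 1,
    which is Julian Day Number 1721426, hence the fixed offset.
    """
    return date.fromordinal(julian_to_jdn(year, month, day) - 1721425)
```

For the example above, julian_to_gregorian(1548, 3, 28).isoformat() gives "1548-04-07", matching the norm= value.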

Original comment by: @sydb

<respStmt> needs to be more restrictive.

The <respStmt> element in P4 is far too permissive. At
a minimum it should be
element respStmt {
  attlist.respStmt,
  tei.Incl*,
  ( name, tei.Incl* ),
  (
    ( resp, tei.Incl* )+
    |
    ( ( name, tei.Incl* )+, ( resp, tei.Incl* ) )
  )
}
although perhaps even the tei.Incl class is too
permissive here. See attached paper for detailed
discussion.
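
As a rough way to see which child sequences the content model above would accept, here is a small Python sketch that encodes it as a regular expression over child element names. It deliberately ignores attlist.respStmt and every tei.Incl* slot, and the function name is hypothetical; it illustrates the shape of the model and is not a validator.

```python
import re

# name, then either one or more resp, or one or more further
# name elements followed by exactly one resp (tei.Incl* omitted).
_MODEL = re.compile(r"name(?: resp)+|name(?: name)+ resp")


def matches_proposed_model(children: list[str]) -> bool:
    """Check a list of child element names against the sketch model."""
    return _MODEL.fullmatch(" ".join(children)) is not None
```

Under this reading, a lone name, a lone resp, or a resp placed before the name are all rejected, whereas the P4 definition is described above as far more permissive.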

Original comment by: @sydb

class attribute in <biblStruct>

I propose adding a “class” attribute to elements within
the <biblStruct> element as well as to the <biblStruct>
element itself. This attribute would be of type IDREFS,
and would point to an ID in a <taxonomy> element.

It is often necessary to classify works when creating a
bibliographic entry. For example, it is necessary to
know that an article appears in a magazine:

<biblStruct>
  <analytic>
    <title>article title</title>
  </analytic>
  <monogr>
    <!-- need to indicate a magazine rather than a
    journal -->
    <title level="j">Title of magazine</title>
    …
  </monogr>
</biblStruct>

TEI does not have elements necessary to classify works,
such as <genre> or <container-type>.

A way to classify works is by using the <taxonomy> in
the header. The <taxonomy> is meant to “classify
texts,” so it seems proper to use it to classify
bibliographic entries. In order to be able to point to
the actual <taxonomy> element, one would need an
attribute. I propose that “class” be this attribute.
The above entry would then look like this:

<!-- in header -->
<taxonomy>
  <category id="magazine">
    <catDesc>magazine</catDesc>
  </category>
</taxonomy>

…

<biblStruct>
  <analytic>
    <title>article title</title>
  </analytic>
  <monogr class="magazine">
    <title level="j">Title of magazine</title>
    …
  </monogr>
</biblStruct>
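
One benefit of an IDREFS-typed attribute is that the pointers can be checked. The Python sketch below shows one way a tool might verify that every class value resolves to a category id. The wrapping <doc> element, the sample data, and the function name are hypothetical, and the attribute names (id, class) follow the proposal above rather than any released TEI schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <doc> wrapper only exists so that the
# header and body fragments share one root element.
SAMPLE = """
<doc>
  <taxonomy>
    <category id="magazine"><catDesc>magazine</catDesc></category>
  </taxonomy>
  <biblStruct>
    <analytic><title>article title</title></analytic>
    <monogr class="magazine">
      <title level="j">Title of magazine</title>
    </monogr>
  </biblStruct>
</doc>
"""


def unresolved_classes(xml_text: str) -> list[str]:
    """Return class references that match no <category id="...">."""
    root = ET.fromstring(xml_text)
    known = {cat.get("id") for cat in root.iter("category")}
    missing = []
    for element in root.iter():
        # IDREFS: a whitespace-separated list of identifiers.
        for ref in element.get("class", "").split():
            if ref not in known:
                missing.append(ref)
    return missing
```

For the sample, unresolved_classes(SAMPLE) returns an empty list; adding an undeclared value to class= would make that value appear in the result.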

Original comment by: sf_user_paultremblay

Metrical encoding in verse

- <lg>

At Henrik Ibsen’s Writings we perform detailed metrical
encoding in all verse texts. For the many verse
dramas, one of the main goals is to mark up the main
verse structures clearly, i.e. the starting and ending
points of the different meters occurring in the text. As
<lg> is defined in the TEI DTD, it seems to be intended for
poems only, not for verse dramas. The element may
contain verse lines, headings, closers and so on, but not
dramatic elements like speeches and stage directions. To avoid
heavy fragmentation and linking, or milestones, we have
decided to modify our DTD to allow <sp>, <stage> and
<div> inside <lg>. This makes <lg> more parallel to the
<div> element: we use the <lg> element to mark up
verse structures and the <div> element to mark up the
drama structures (acts and scenes). We would suggest
a similar change to the TEI DTD.

- additional attributes for metrical analysis

The attributes for metrical analysis in TEI P4 are the
met, real and rhyme attributes. These are intended for
metrical structure, deviation from the metrical structure
and rhyme scheme respectively. An attribute for
deviation from the rhyme scheme is not included in the
TEI DTD. We have therefore split the real attribute into
two: realMet (for deviation from metrical
structure) and realRhyme (for deviation from rhyme
scheme). In addition we have attributes for notation of
anacrusis and of deviations from it, respectively
the an and the realAn attribute. These attributes may
have the values “single”, “double” and “no”. We would
suggest that these attributes be included in TEI P5.

Please contact Stine Brenna Taugbøl
([email protected]) at Henrik Ibsen’s Writings, if
you have any questions or remarks.

Original comment by: sf_user_hildeboe

layoutDesc and layoutNote?

Wouldn’t it be consistent with the naming scheme in the rest
of the document to have layoutNote as a child of
layoutDesc?

Original comment by: @schassan

<interp>

Please make <interp> non-empty, thereby allowing for
specifying ‘values’ containing additional markup.

Original comment by: @nolda

Prosopography module

At present, the elements you need to record information
about real people in a P5 document are scattered across
several modules.

The Corpora module defines <partic> and <particDesc>
for participants in a transcribed text. These allow for
any number of so-called “demographic” subelements,
such as <birth>, <occupation>, <residence> etc.

The Names and Dates module defines <persName> and
various subcomponents for names of people, but
resolutely eschews any attempt to describe reality: it’s
onomastic, rather than prosopographic, by design.

The new Manuscript Description module defines a
<listPerson> element and a <person> element to fill it.
This contains some of the same elements as <partic>
but also adds some. For example, <birth> is in the
corpora module, but <death> is in the MS one. (Not
really surprising, since the people manuscript describers
are interested in are usually dead, whereas the people
corpus encoders are concerned with usually aren’t.)

We could resolve this confusion by creating a new
standalone module, or by grouping all related elements
in one place. More precisely:

(a) we could define an entirely new module concerned
with prosopographic information, defining <listPerson>
and <person> and their children, and remove <partic>
and <particDesc> from the corpora module

(b) we could do essentially the same but add the new
elements to the core module

Original comment by: @lb42

<note> needs to be permitted anywhere

The placement of <note> elements seems to be too
restrictive. E.g., you can place a <note> as a child of
<head>, but not as a child of <opener>, <closer>, or
<dateline>; you can place a <note> as a child of
<body>, but not of <front> or <back>. Since a <note>
may well be used to comment on the encoding of a
document, rather than the textual features of the
document being encoded, <note> should be permitted just
about anywhere. The same is true for <anchor>, since
one may want to put only the <anchor> at the spot of
interest, and the <note> elsewhere.

Original comment by: @sydb

Need for tag documentation

I am familiar with TEI lite.

When the standard is converted to a Schema, I strongly
recommend that you provide a very detailed tag library.
Schemas are very difficult to read and interpret. A good
model for documentation would be the EAD tag library at
http://www.loc.gov/ead/tglib/index.html.

Each tag is defined, and possible parent and child tags
for the given tag are listed. Examples are also provided.

The EAD Tag Library also has a good explanation of
linking tags and attributes.

The DTD for P4 was relatively easy to search and
interpret.

The Schema and DTD for P5 are much more difficult.

I would also suggest that you provide detailed
information about which schema validators will work,
how to install and configure them, and where they
can be obtained.

Thanks

Original comment by: sf_user_*anonymous

If transcribe, then <pb/> should allow wit

In order to create a collated edition, the page breaks in the various
manuscripts need to be noted. For this reason, the <pb/> tag
(which has no span) should be allowed to carry the
wit (witness) attribute when the transcription module
is included, as follows.

<witlist>
  <witness name="C1">ms C1</witness>
  <witness name="C2">ms C2</witness>
  <witness name="W1">ms W1</witness>
  <witness name="Sam">1867 printed edition</witness>
</witlist>

Thanks!

Will Tuladhar-Douglas
[email protected]

Original comment by: sf_user_*anonymous

msDescription: head or p?

In the current draft chapter on manuscript
description there might be an inconsistency, and
despite the changes I still have a need concerning the
former msHeading:

In 1.1 Overview it is said that msDesc may
contain msIdentifier, head, msContents and so forth.
The example and the formal description of msDesc state:

head seems to be missing there, but maybe this only needs
clarification.

Anyway: In Germany, according to the cataloguing rules,
we must supply the main author(s) (and sometimes the title(s) of
the major work(s)) as the first information in a catalogue
entry.

In the form proposed, the element head shall not contain
author, origDate or origPlace, as was the practice in
MASTER.
The alternative p elements don’t make clear their function
for the catalogue entry in general. Therefore,
either the p elements should have a required type attribute,
to provide something with the function of the former
msHeading in P5, or something like msHeading has to
be kept.

On the other hand: If we keep something like msHeading,
I see the need for a group mechanism of author and
title within it.

The problem: especially in catalogue entries for
manuscripts consisting of msParts you may face the
situation that you have more than one
author/title-combination in msHeading. How do we make
clear, which author belongs to which title?

Example: taken from Sankt Gallen, Stiftsbibliothek,
Codex 658 (as can be seen at www.unifr.ch/cesg)
Title: (1) Robertus Monachus: Geschichte des 1.
Kreuzzugs (bebildert); (2) Ottokar von Steiermark:
Österreichische Reimchronik: Fall Akkons.

Suggestion: Not a real one, but as far as I can see the
use of the attribute ‘n’ is not very
satisfactory. Nor do I want to rely on the order of
elements in the file.

Open to discussion,

Original comment by: @schassan

Allow the distributor element in imprint

The distributor element is only allowed within the
publicationStmt element.

However, the distributor is often needed when creating
a bibliographic entry with a biblStruct element. The
MLA and Chicago styles (and probably others) require
the distributor of a book.

I therefore request that the distributor element be
allowed in the imprint element.

Original comment by: sf_user_paultremblay
