Catalan has to be the official language of XML -- there are so many X's. It's the only language that can support all the project.
Different vision for doc. types:
- well-formed is too much
- validity is too little.
Uses of XML
- interchange
- marking up text
- existing data import
- closed loop editing
- value-adding
- verifying
- etc, etc.
Can't say that industry sectors are clumped. Some shared perspectives.
Trade-offs
- atomic vs. mixed content
- trees or linked structures
- does data exist independently of markup (e.g. book)
- symbolic vs. data proc.
- natural output format vs. none
- active vs. static
Publishing
- Mixed content
- Linked structures
- Independent Existence
- Symbolic processing (not numeric; no need for types)
- Natural output/delivery format
- Static
Use these features to select a schema language.
Visual vs. Automatic Verification
Validation = auto
- difficult to create/maintain results
- can't detect pigeonhole errors
Visual verification, e.g. used in authoring stylesheets
- verifying the abstraction, so can miss errors
- only catch visible errors
Status Quo
Axis of complexity/power.
WF to the left, Valid to the right
+ NS, +XLink, etc moves towards the right, and possibly XML 2.0.
SGML '86
Original form of SGML had more to the left AND the right.
Lefthand:
- tag omissions
- short tags
- short references
- delimiter remapping -- rare
- data tags -- rare
More and more implied markup the further you move to the left.
Righthand:
- DTD (stronger)
- Lexical types
- Architectural Forms
Lack of use of AFs may be due to complexity, or just not useful for publishing.
Don't believe there's a universal schema language. Marketing phenomenon.
Look at UML: 9 diagrams to model constraints.
SGML '98
Broke need for DTD?
'Amply Tagged' -- weaker than WF. Valid is still stronger.
W3C Future PSVI
Schema valid further to the right. Industry tradeoffs will determine how useful that is.
Progression of Logical Phases
Feasibly Tagged
Inferably
Impliably
Amply Tagged
WF
Feasibly Valid
...
...
Minimally Valid
Valid
ISO MDTS
MDTS
Modular Document Type Specification. 2 years, fully baked. Some parts out soon.
Driven by publishing requirements, but a lot of things are publishing. Targeting high-end publishing initially, but might be applicable elsewhere.
- Validation Candidate Selection -- e.g. namespace association
- RELAX NG module
- Schematron module -- first draft in a few weeks. May borrow some ideas from XCSL, which is better in some areas
- Integrity Module -- still room for that. Schematron not really suited for that. Needs to be declarative, which Schematron isn't.
- Character Repertoire Module -- request from a European publisher. Testing character content.
- ...Others + Extensions -- biggest extension will be allowing XSD
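The "testing character content" idea behind the Character Repertoire Module can be sketched roughly as below. This is only an illustration, not the MDTS design: the function name `out_of_repertoire` and the Latin-1 repertoire are assumptions made up for the example.

```python
import unicodedata
import xml.etree.ElementTree as ET


def out_of_repertoire(xml_text, allowed):
    """Return (char, Unicode name) pairs for text characters rejected by `allowed`.

    `allowed` is a hypothetical predicate standing in for a declared repertoire.
    Only element text is checked; attribute values are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    found = set()
    for chunk in root.itertext():
        for ch in chunk:
            if not allowed(ch):
                found.add(ch)
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in sorted(found)]


def latin1(ch):
    """Assumed repertoire for the example: code points up to U+00FF."""
    return ord(ch) <= 0xFF


doc = "<p>na\u00efve caf\u00e9 costs 5\u20ac</p>"
print(out_of_repertoire(doc, latin1))  # the euro sign falls outside Latin-1
```

A real module would presumably declare repertoires per element or per context rather than for the whole document.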
Still very early. No sense in 'opposing' XSD. Not all modules may get in.
With modules, and schema translation, the emphasis on particular languages is lessened.
Modular schema framework might be able to help with 'cut-and-paste' issues.
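Validation Candidate Selection by namespace association, from the module list above, might look something like this sketch. The namespace URIs, the validator labels and the function names are all hypothetical; a real framework would hand whole subtrees to the chosen validator rather than just listing element names.

```python
import xml.etree.ElementTree as ET


def namespace_of(tag):
    """ElementTree spells qualified names as '{uri}local'; return the uri or ''."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def select_candidates(xml_text, validators):
    """Group element local names under the validator tied to their namespace."""
    root = ET.fromstring(xml_text)
    selected = {}
    for elem in root.iter():
        label = validators.get(namespace_of(elem.tag), "unvalidated")
        local = elem.tag.rsplit("}", 1)[-1]
        selected.setdefault(label, []).append(local)
    return selected


# Hypothetical association of namespaces with schema languages.
VALIDATORS = {
    "urn:example:book": "relax-ng",
    "urn:example:meta": "schematron",
}

doc = (
    '<b:book xmlns:b="urn:example:book" xmlns:m="urn:example:meta">'
    "<m:title>T</m:title><b:chapter/><note/></b:book>"
)
print(select_candidates(doc, VALIDATORS))
```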
What's Wrong with XSD?
Wanted to avoid competitive feeling. "Richness is the appropriate thing"
No publishing support.
- mixed content
- character repertoire
- lexical typing is under-utilized
- intrusion of DBMS ideas: nil, suffixation limits extensibility
Bad architecture
- no subsetting = monolithic = difficult to implement
- pretension of universality
- PSVI -- no objection, provided it's not called XML. It's about how to enrich data inside a process.
Design problems (dates, integrity, complexity)
"BUT good for a lucrative and large niche. No need to knee-jerk (either way!)"
Gets lost in spec!! "shows that it's at least not memorable"
Supporting Amply Tagged
Editors Concrete Syntax (ECS)
- SGML concrete syntax
- basically XML, but without end tags, quoting, or case differences. WF without the "terseness is of minimal importance" principle
Made a syntax that uses this, available next week.
Already widespread: HTML. Still no DTD.
- Good for colouring editors
Supporting Inferencing
Named Information Items (NII) -- may go into MDTS
Simple format for sets of declarations.
May do it through augmenting schemas, e.g. appinfo.
When Validity Not Enough
Adam Smith: often a document isn't marked up linearly. Each editor has a limited number of tags, so markup is not a linear process.
Incremental validity, islands of validity.
Weakly Valid
E.g. where a level is missing (e.g. body). A patent exists (based on tree considerations). SGML has tag omissions (prior art?)
Minimally Valid
All required elements
Or the document has all the required elements until a choice is forced
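A rough sketch of the first reading of Minimally Valid (all required elements are present, with nothing else checked). The `REQUIRED` table is a made-up stand-in for a schema, and ordering, choices and repetition are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: element name -> child elements that must be present.
REQUIRED = {
    "article": {"title", "body"},
    "body": {"para"},
}


def missing_required(xml_text, required=REQUIRED):
    """Return (parent, missing child) pairs; an empty list means minimally valid."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        present = {child.tag for child in elem}
        for name in sorted(required.get(elem.tag, set()) - present):
            problems.append((elem.tag, name))
    return problems


print(missing_required("<article><title>T</title><body/></article>"))
```

The second reading (required elements up to the point where a choice is forced) would need a real content-model walker, which this sketch does not attempt.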
Impliably
Document is missing parts, added from schemas
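One way to picture "missing parts, added from schemas" is a pass that fills in absent children the schema says must exist. The `IMPLIED` table and the function name are assumptions for illustration, and positions are handled crudely.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema knowledge: children each element must have, in order.
IMPLIED = {"html": ["head", "body"]}


def imply_missing(xml_text, implied=IMPLIED):
    """Insert empty elements for schema-required children that are absent."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so newly inserted ones are not revisited.
    for elem in list(root.iter()):
        for position, name in enumerate(implied.get(elem.tag, [])):
            if elem.find(name) is None:
                elem.insert(position, ET.Element(name))
    return ET.tostring(root, encoding="unicode")


print(imply_missing("<html><body><p/></body></html>"))
```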
Inferably Valid
Heuristics to ignore errors, or generate placeholders.
Feasibly Valid
Subsequence valid -- conforms to the content model up to a point. The elements present are ordered in such a way that further elements could be introduced to correct the document.
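For the simplest case, where a content model is a plain ordered sequence (an assumption; choices and repetition are left out), the subsequence idea reduces to a few lines. The `MODEL` list and the function name are invented for the example.

```python
def is_feasibly_valid(children, model):
    """True if `children` is a subsequence of `model`.

    Elements may be missing, but the ones present must appear in model order,
    so that inserting further elements could still yield a valid sequence.
    """
    remaining = iter(model)
    # `name in remaining` advances the iterator until it finds a match,
    # so each child has to match somewhere after the previous one.
    return all(name in remaining for name in children)


# Hypothetical content model: title, author, abstract, body (in that order).
MODEL = ["title", "author", "abstract", "body"]

print(is_feasibly_valid(["title", "body"], MODEL))   # True: gaps can be filled
print(is_feasibly_valid(["body", "title"], MODEL))   # False: order cannot be fixed by adding
```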
Implementation Options
- strength reducing the schemas
- Schematron phasing
- logic systems/languages. Develop a rule base for the schema. Because it is logic, it can be queried and prompted for suggestions (or used to limit options in an editor)
When WF is Too Much
Constraining and unpleasant for use with editors.
- Wiki at low end
- ECS
- SGML minimization
Operators may not think in terms of trees.
"Will be very surprised if XHTML has any success. The XML rules are too strict..."
Tag grammar: transform document into a tag-document
<a_start/><x/><a_end/>
Can then validate with other tools.
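A minimal sketch of the tag-document transform, assuming the input is already parseable and that start and end tags become empty marker elements. The `_start`/`_end` suffixes follow the example above; the `tags` wrapper element and the function name are assumptions, and text, attributes and namespaces are dropped.

```python
import xml.etree.ElementTree as ET


def to_tag_document(xml_text, wrapper="tags"):
    """Flatten a tree into a sequence of start/end marker elements."""
    root = ET.fromstring(xml_text)
    out = ET.Element(wrapper)  # single root keeps the result well-formed

    def walk(elem):
        # Elements with no children and no text stay as one empty element.
        if len(elem) == 0 and not (elem.text or "").strip():
            ET.SubElement(out, elem.tag)
            return
        ET.SubElement(out, elem.tag + "_start")
        for child in elem:
            walk(child)
        ET.SubElement(out, elem.tag + "_end")

    walk(root)
    return ET.tostring(out, encoding="unicode")


flat = to_tag_document("<a><x/></a>")
print(flat)  # markers a_start, x, a_end inside the wrapper
```

Because the result is a flat sequence, nesting rules become ordering rules over marker elements, which an ordinary content-model validator can check.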
