NewTechnology


Automated translation in Formalized Mathematics by -- GrzegorzBancerek - 09 Dec 2005

I don't know if the whole translation can be re-programmed inside XSLT, however here are some useful steps towards it:

have a BNF description

of the current format file at http://fm.uwb.edu.pl/proof-read/pub.frd -- see enclosed file frdgrm.bnf extracted from the above article and partially cleaned

devise a XML (RNC) equivalent

probably using "fm" namespace for elements and attributes - still to do, should be easy;

OK, first draft with some questions uploaded here; I guess the various headers should be merged into one element FMFormat (started, but questions emerged ...). Provided that the plan is to have extensions anyway (e.g. HTML (MATHML?), proofs, ...), I wouldn't fear to make the merged FMFormat element as rich as can be meaningful in the future.

discusssion of the mapping level
Discouraging note
The Pattern-based suggestion which follows after this note is quite maintanance-costly, unless the FMFormat info (in some human notation) is kept directly in the .miz file (which would obviously be strongly opposed by the SUM chairman :-); if the FMFormat info is not kept directly there, then any Pattern->FMFormat table (even if in prels) will be outdated quickly by revisions; the old way (i.e. Mizar format -> FMFormat mapping) is probably more easy to maintain by hand, since the formats are usually not touched by revisions (their "absolute names" do not change as quickly as those of Patterns). See perhaps the middle-way suggestion
The pattern based suggestion
  • the file pub.frd (and actually the whole FM translation) seems to do the mapping on the level of Mizar formats (http://merak.pb.bialystok.pl/mizardoc/xml/Mizar.html#Format). As discussed some time ago, this can limit usage of the same notation for different purposes - e.g. as Grzegorz says: format "a ^ b" (concatenation) is now tex-ed as "a{^\smallfrown}b", while format "a |^ b" (power) as "a^{b}" - i.e. if someone wanted to used "a ^ b" in Mizar for power, it would be tex-ed incorrectly. The solution is to do the translation one level higher, i.e. at the Pattern level (http://merak.pb.bialystok.pl/mizardoc/xml/Mizar.html#Pattern). So the new RNC spec should not map Mizar formats to FMFormats (i.e. TEX or HTML display) as the current BNF does, but Mizar patterns to FMFormats. This way the mapping of Mizar formats to FMFormats will generally become many-to-many. Handling it in XSLT will be practically same in both cases, and the reimplemntation is probably a good point for removing this bottleneck.
  • the switch to the pattern mapping will be done by calculating for each pattern which of the current Mizar format it uses - done by some simple operation on prels
  • in order to reduce the number of special files, and also to make the online FM processing as easy as the HTML creation, the FMFormat info should be kept directly with its Pattern; i.e. inside the prel .dno files, and then during accommodation transferred to the environment (similarly to the .frx file); temporarily, I will probably still start by having an XML equivalent of the "global" pub.frd file (with patterns specified absolutely)

A middle-way suggestion
Perhaps keep the Format->FMFormat mapping as the basic maintainable level, and still allow an optional FMFormat slot directly in the Pattern element. That FMFormat slot of the Pattern element would (if present) override the default Format->FMFormat mapping. Since these would only be exceptions from the default mapping, their number and revision-based maintanence need would be lower (or perhaps even the SUM chairman might agree to keeping the FMFormat info in .miz file, if it's just a couple of exceptions). This means that the RNC spec should define the purely FMFormat element (i.e. just the Tex or HTML display info), and then also allow different mapping methods (e.g., from a Mizar Format element, or Pattern, or just none when included in a Mizar Pattern).

write a XSL stylesheet

operating on the XML spec in the same (or at least reasonably similar) way to the one described in the article

the product should be both a TeX-only version, and also a natural language (NL) HTML version

I'd vote against having special TeX symbols as images in the HTML (though it is of course a possible alternative), and perhaps extend the XML spec by HTML (MathML??) equivalents

Notes

  • when the above is done (a first hack should be reasonably easy), play also with NL translation patterns for the Mizar proofs (again, extend the XML spec)
  • optionally, explore if the TeX XSL production can be extended to produce linked PDF (however I'd rather concentrate on HTML, in the spirit of e.g. http://www.joma.org/)

-- JosefUrban - 16 Mar 2007

  • frdgrm.rnc: draft of the XML syntax in compact Relax NG
WebForm
TopicClassification Select one...
ImplementationDate? N/A
Topic attachments
I Attachment Action Size Date Who Comment
elsebnf frdgrm.bnf manage 2.7 K 2007-03-16 - 02:22 JosefUrban BNF grammar of translation patterns
elsernc frdgrm.rnc manage 3.9 K 2007-03-16 - 23:00 JosefUrban draft of the XML syntax in compact Relax NG
Topic revision: r5 - 2009-10-31 - 09:14:58 - GrzegorzBancerek
 
Mizar TWiki
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback