Mizar Translations
this is now mostly a collection of e-mails I (
JosefUrban) exchanged with various people about translation of Mizar to various formats. People who want to work in this area are encouraged to search for info here, and possibly ask new questions here, so that the info is kept public (send me an e-mail asking for the password for the
GuestAccount? to be able to edit this page). Feel free to refactor this to a more readable form.
E-mails with Lucas Herrenbrueck
Lucas Herrenbrueck to bancerek, urban
Dear Mr. Bancerek, dear Mr. Urban,
I am working with Prof. Michael Kohlhase (he sends his greetings) on the Mizar-OMDoc translation. He has recommended me to ask you about Mizar details I do not understand. Currently, my translation object is the "abcmiz_0" document. I have a few questions:
Looking at abcmiz_0-absref,
1) how do I learn the 'content' of a Const - there is only a number given. What does this number reference?
2) do I need to infer what kind of qualified variables are used in a Let or Given statement? If yes, how?
3) are the seemingly redundant
,
etc. important, even though they hold no information?
4) the difference between hence and thus seems not to be reflected in the XML version. Is that the intention?
And 5), is there translation relevant information in all the other "abcmiz_0"-files - except .miz and .xml?
Thank you so much for your help!
Lucas Herrenbrueck
Josef Urban to Lucas, bancerek, urban
Hi Lucas,
I put a reasonably new xml for abcmiz_0 at
http://lipa.ms.mff.cuni.cz/~urban/abcmiz_0.tar.gz. You'll probably want the file abcmiz_0.xml1 (the xml with added absolute references), a bit of documentation of the format is in the file Mizar.html. All of this is very easy to produce from Mizar if needed for other articles.
>
I am working with Prof. Michael Kohlhase (he sends his greetings) on the Mizar-OMDoc translation. He has recommended me to ask you about Mizar details I do not understand. Currently, my translation object is the "abcmiz_0" document. I have a few questions:
>
Looking at abcmiz_0-absref,
>
1) how do I learn the 'content' of a Const - there is only a number given. What does this number reference?
It's Mizar's level-based numbering of constants. This numbering is not one-to-one, because when a level (say proof) is finished, all the constants introduced on that level disappear, and their numbers are re-used. One way to make them unique is to add their level (attribute plevel of each level) to them. In the abcmiz_0.xml1 file mentioned above, there is also the 'vid' attribute, which refers to the number of identifier used to diaplsy that constant. the numbering table is in the .idx file.
>
2) do I need to infer what kind of qualified variables are used in a Let or Given statement? If yes, how?
I don't understand what you mean by a "kind of qualified variables". And I do not know what you want to achieve.
>
3) are the seemingly redundant , etc. important, even though they hold no information?
They are not redundant, they may be empty, but that's an info needed by Mizar or other tools too.
>
4) the difference between hence and thus seems not to be reflected in the XML version. Is that the intention?
It's not explicit, however can be safely inferred. If the Conclusion is justified by a linked Inference, it was "hence" (well technically not any more after the recent introduction of "thus then", but no sane human is using that syntax).
>
And 5), is there translation relevant information in all the other "abcmiz_0"-files - except .miz and .xml?
Depends on what kind of translation you are after. For a semantic one, the .xml1 file is quite sufficient (look e.g. at the tptp translator at
http://kti.ms.mff.cuni.cz/cgi-bin/viewcvs.cgi/xsl4mizar/mizpl.xsltxt?view=markup). For a more "presentational" one some more of the files (enclosed in the abcmiz_0.tar.gz) are needed - see e.g.
http://kti.ms.mff.cuni.cz/cgi-bin/viewcvs.cgi/xsl4mizar/MHTML for how the HTML presentation can be done. I guess if there is sufficient demand, some fixes to addabsref.xsl could make the .xml1 file even less dependent on the external files.
Best,
Josef
E-mails with Immanuel Normann
From: Immanuel Normann To: urban
Subject: XML-izing Mizar
Dear Josef Urban,
I have just read your paper about XML-izing Mizar. In fact I am interested in semantic processing of the MML and the XML format would be a definetely great help
for my aims. I already have downloaded the mizar system. Unfortunately I couldn't find any instruction how to XML-ize Mizar articles eventually - neither in the
mizar docs nor in your paper. Since your purpose is to open Mizar to third party systems for semantic processing via this XML format it would be nice to provide
some easy instructions how to XML-ize Mizar practically.
Looking forward to get some hints in this direction - Thanks in advance,
Immanuel
urban to Normann
Hi Immanuel,
thanks for reading my paper!
On Mon, 26 Jun 2006, Immanuel Normann wrote:
>
I have just read your paper about XML-izing Mizar. In fact I am interested in semantic processing of the MML and the XML format would be a definetely great
>
help for my aims. I already have downloaded the mizar system. Unfortunately I couldn't find any instruction how to XML-ize Mizar articles eventually -
>
neither in the mizar docs nor in your paper.Since your purpose is to open Mizar to third party systems for semantic processing via this XML format it would
>
be nice to provide some easy instructions how to XML-ize Mizar practically.
It's in the paper, p. 8 says:
This semantic description of an article is now automatically constructed
during the verification, so the way to produce such description
for the whole MML from installed Mizar distribution is e.g.
following:
mkdir tmp; cp -r $MIZFILES/mml tmp; cd tmp/mml
for i in `cat $MIZFILES/mml.lar`; do accom $i; verifier $i; done
This will create .xml files in the tmp/mml directory.
If you want to add absolute references to the .xml files (I speak about this in the paper), get also
http://kti.ms.mff.cuni.cz/cgi-bin/viewcvs.cgi/*checkout*/xsl4mizar/addabsrefs.xsl from my CVS and run additionally something like:
for i in `cat $MIZFILES/mml.lar`; do xsltproc addabsrefs.xsl $i > $i.rich; done
Best,
Josef
Normann to Urban
Hello Josef,
thanks for the immediate response! Obviously I have overlooked in my first reading the paragraph on page 8 which describes how to xml-ize the miz-files. Actually
because I did not understand why "accom" and "verifier" is needed for a translation into XML. Instead I was reading the subsection "2.2 XML-izing Mizar" three
times where I expected some command line "miz2xml foo.miz". Anyway now I have the XML ouput.
However this calls for more questions: For each miz-file there are several xml-files constructed. Apart from the foo.xml for foo.miz what are the others good
for?
In general it is hard to figure out how a miz file is mapped to its corresponding xml-file. What is your suggestion to learn this quickly? From your paper I can
only learn a little cutout, I would say, and the relaxng documentation helps only a little bit as a lot of documentation is missing inside. Is there any other
documentation in this respect that I might have overlooked?
Moreover, the main problem for me is to understand the Mizar type system (with modes, cluster, adjectives, vocabularies, requierments, etc) well. Can this be
translated to pure first order logic? Would it be complicated to export mizar files to xml-files where formulas are encoded in FOL? I think it is quite hard for
a non-expert to Mizar to reconstruct the complete context of a given mathematical statement in Mizar. The translation to XML does not help much in this sense.
The problem is that too much (also in the XML versions) is kept implicit or encrypted. So is there something that could ease the access to mizar content to
non-expert?
If this is not the case: Could you please tell me which is the definitve reference manual for the type system.
Best,
Immanuel
Urban to Normann
Hi Immanuel,
On Tue, 27 Jun 2006, Immanuel Normann wrote:
>
However this calls for more questions: For each miz-file there are several xml-files constructed. Apart from the foo.xml for foo.miz what are the others good
>
for?
They mostly encode so called "environment" of a Mizar article. This is what the "accom" utility is for: It extracts from the Mizar internal database (this is
the "prel" directory in the Mizar distro) the parts of other Mizar articles (e.g. theorems, definitions, etc.), which are mentioned in the "environment"
declarations of a new article. When you run "verifier" it then does not use the Mizar database any more - it relies completely on the "environment" files
created by "accom".
>
In general it is hard to figure out how a miz file is mapped to its corresponding xml-file.
Well, for the XML element names ("Proof", "Now", "Consider", "JustifiedTheorem", ...) the mapping should be fairly obvious - at least I intentionally chose
verbosity for them, with the hope that they are obvious enough. Additionally, watch for positional attributes ("line","col"), which (usually) map pieces of XML
to positions (line and column numbers) in the original miz file.
What can be the problem are some XMl attributes, usually encoding by one letter a "kind" of some Mizar object, e.g.
. I don't know if all of
them are explicitly explained in the relaxng doc, but if not, let me know and I'll add a short explanation there. Note that I cannot be too verbose, because
all the RNC documentation resides directly in the Mizar sources (this has a lot of advantages), and writing novels there would not be exactly welcome by other
Mizar people.
> What is your suggestion to learn this quickly? From your paper I can only learn a little cutout, I would say, and the relaxng documentation helps only a
> little bit as a lot of documentation is missing inside. Is there any other documentation in this respect that I might have overlooked?
Other problem can be that the relaxng is a documentation of a format for the Mizar logic, and unless you e.g. know what are Mizar "structures", "selectors" or
"definienda", this can be really hard to understand. I don't know what is the quickiest way how to learn this. One is definitely Freek's tutorial:
http://www.cs.ru.nl/~freek/courses/pa-2006/doc/mizman.ps.gz, another (a bit outdated probably) might be the "Mizar Lecture Notes":
http://markun.cs.shinshu-u.ac.jp/kiso/projects2/proofchecker/mizar/Mizar4/en_wframe/index.html . Another "learn by clicking" way might be exploring Grzegorz
Bancerek's MML Query (http://merak.pb.bialystok.pl/) - e.g. take one article which (you think
you understand on the miz-file level, and learn what kind of
MML query items are generated from it (note that MML Query does not include proofs however yet).
> Moreover, the main problem for me is to understand the Mizar type system (with modes, cluster, adjectives, vocabularies, requierments, etc) well. Can this be
> translated to pure first order logic?
It's done by me in the MPTP (http://wiki.mizar.org/cgi-bin/twiki/view/Mizar/MpTP) project, so yes :-).
> Would it be complicated to export mizar files to xml-files where formulas are encoded in FOL?
No, but in MPTP 0.2, I exported to sort-extended TPTP, for a number of reasons (read http://kti.ms.mff.cuni.cz/~urban/MPTP2/mptp2.ps if you want to know more).
In MPTP 0.1., it was indeed direct FOL. As Geoff Sutcliffe is slowly adopting Petr Pudlak's XML syntax for TPTP, I might export to XML instead of Prolog in
some time. Btw. you can already now transform TPTP to XML by Petr Pudlak's tools.
> I think it is quite hard for a non-expert to Mizar to reconstruct the complete context of a given mathematical statement in Mizar. The translation to XML
> does not help much in this sense.
Right, the XML is still Mizar logic. On the other hand, I guess that e.g. in various XML encoding of ontologies, you don't explicitly mention e.g. the isa
hierarchy of a concept at each point either - that would be very heavy, so the context is also given by some logic which you have to learn. In the MPTP
translation, the sort hierarchy is encoded fairly explicitly.
> The problem is that too much (also in the XML versions) is kept implicit or encrypted. So is there something that could ease the access to mizar content to
> non-expert?
> If this is not the case: Could you please tell me which is the definitve reference manual for the type system.
After a short look, section 1.3.4 in Freek's manual could tell you something useful. Don't expect "defnitive reference manuals" for anyhting though :-).
Best,
Josef
Normann to Urban
Hi Josef,
thankvery much for the extentsive hints and explainations! Concerning the translation to FOL, Michael Kohlhase gave me also the hint to read the MPTP paper right
after I have sent you my last email. Ok, that's great now I have really enough to start.
Thanks again.
Immanuel
Normann to Urban
Dear Josef,
I have just implemented a dfg-syntax parser in Haskell. (see compiled binary attached) and applied it to the 12530 files from the directory mptp0.2/dfg/. It
turned out that 815 of them are not conform to dfg-syntax (or more precisely as far my parser sees it ;-). It seems in all of the error cases that there is a
digit (always a "0" or a "1") where an identifier is expected.
For more details you can look in the log file (see attachment). For each of the dfg-files a little parse report is recorded.
If you fix this little bug in the translation from mizar to dfg I would be interested in the fixed version of mptp0.2/dfg.
Best,
Immanuel
P.S.: If you want to apply the parser your self: just put it in the directory where your dfg-files are and execute it. In writes then to the file "log" the parse
report. If you are also interested in the parser source I can send it to you.
Urban to Normann
Hi Immanuel,
sorry for the late reply - I was out of office.
The error (as far as your parser sees it
is in the list_of_symbols DFG declaration, specifically, in the
functions[(0,0),...
part. I just quickly tried with SPASS (which was the only reason for using DFG in MPTP0.1), and it parses the file OK. The MPTP0.2 dfg files were produced
using Geoff Sutcliffe's tptp2X tool (BabelFish? for ATP systems, serving e.g. the CASC competition) in this way from the TPTP format (which is the default in
MPTP0.2):
tptp2X -f dfg arytm_0__t12_arytm_0.p
I haven't checked the DFG specification, but even if your parser is right, I'd probably suggest that you relax the "identifier" to something like "number or
identifier". The other option is that you complain to Geoff, and to the SPASS people that they do not implement their own standard
(btw., I wish you good
luck with any bug reports to the SPASS team, they are extremely responsive ;-)).
If I may, I'd also suggest to you to switch from DFG to TPTP, which is really quite actively developed, and today's standard. If you are looking for something
Haskell-ish, perhaps Koen Classen's Paradox and Equinox systems (http://www.cs.chalmers.se/~koen/folkung/) have an implemented TPTP parser.
Best,
Josef
Normann to Urban
Hi Josef,
actually I just wanted to notify you that some files are not compliant to the dfg syntax specification. And of course I relaxed my parser accordingly to parse
these files (in fact this was done just by adding the keyword "digit" at some place of my parser code so absolutely no effort). And the reason why I focused on
your dfg version of mptp instead of the tptp version was simply your article which also decided for dfg.
Btw as I am writing about compliance, there occured another pathological singular phenomenon:
...exists([],...)... in gate_1__t2_gate_1.dfg. Probably this a translation artefact. I haven't considered quantifiers without quantifying anything in my analysis
tools and so this formula crashed my analysis. My parser accepted this formula (since it is dfg syntax compliant), but my postproccessing procedures could not
deal with it. And it was hard to find the flaw. One formula of over 1500000 million formulas - that was exactly the so called needle in the haystack.
Thanks for the link to the TPTP parser. I will have a look at it. Right now I am fine with my dfg parser.
Best,
Immanuel
Immanuel Normann to Josef
Dear Josef,
I have just read again your paper "Translating Mizar for First Order Theorem Provers" which is well written with good examples. Now as I am comparing myself some Mizar articles with the corresponding dfg version some questions come up:
What is the reason for redundant blow up of formula of the form "and(true,P(X))"?
For instance in relat_1_t10_relat_1.dfg one can find the redundant formulae:
formula(
true,
dt_k1_xboole_0 ).
formula(
forall([A],
implies(
true,
true)),
dt_k1_zfmisc_1 ).
formula(
forall([A, B],
implies(
and(
true,
true),
true)),
dt_k4_xboole_0 ).
Where does this come from?
Second question I couldn't answer myself: Each dfg file seem to contain exactly one theorem. So one article generates usually many dfg-files. However, these dfg-files corresponding to a single article differ in their axioms in general. Can I assume that only those axioms are included in the def-file which were actually used for the proof of dfg-files theorem?
Thanks for some enlightning reply in advance,
Immanuel
Josef Urban to Immanuel
Dear Immanuel,
On 1/24/07, Immanuel Normann <i.normann@iu-bremen.de> wrote:
> Dear Josef,
>
> I have just read again your paper "Translating Mizar for First Order
> Theorem Provers" which is well written with good examples.
Thanks, but please note that the first MPTP version decribed in that article is now outdated. The second version is more correct and complete, but it uses an intermediate extended TPTP format, which is only afterwards transformed by Prolog utilities to pure first-order TPTP. These things are described in http://kti.ms.mff.cuni.cz/~urban/MPTP/mptp2.ps .
> Now as I am
> comparing myself some Mizar articles with the corresponding dfg version
> some questions come up:
>
> What is the reason for redundant blow up of formula of the form
> "and(true,P(X))"?
> For instance in relat_1_t10_relat_1.dfg one can find the redundant formulae:
>
> formula(
> true,
> dt_k1_xboole_0 ).
>
> formula(
> forall([A],
> implies(
> true,
> true)),
> dt_k1_zfmisc_1 ).
>
> formula(
> forall([A, B],
> implies(
> and(
> true,
> true),
> true)),
> dt_k4_xboole_0 ).
>
> Where does this come from?
Just "true" usually means that there was a "trivially true" type constraint, usually for typing of functions. Generally, the types of functions are translated to predicates, e.g.
let m,n be natural number,
func m + n -> natural number;
would be translated as
implies(and(natural(M),natural(N)), natural(+(M,N))).
but when the type is just "set", this contraint says nothing, because every object in Mizar is "set". To make ATPs know about this, such constraints are translated just as "true" (instead of e.g. "set(X)"). In first MPTP version, these formulas were not simplified. In the second version, there is a simplification implemented.
> Second question I couldn't answer myself: Each dfg file seem to contain
> exactly one theorem. So one article generates usually many dfg-files.
> However, these dfg-files corresponding to a single article differ in
> their axioms in general. Can I assume that only those axioms are
> included in the def-file which were actually used for the proof of
> dfg-files theorem?
You would have to tell me the exact names of the files that you write about. Generally, if you are already talking about files with ATP problems, then yes: each of the problems corresponds to one Mizar theorem, and only the axioms needed in Mizar for proving that theorem is present there (possibly with some redundancy).
> Thanks for some enlightning reply in advance.
No problem, however I am on vacation till end of February, so my replys can be quite delayed.
Best,
Josef
Normann to Josef
Josef Urban wrote:
> Second question I couldn't answer myself: Each dfg file seem to
> contain
> exactly one theorem. So one article generates usually many dfg-files.
> However, these dfg-files corresponding to a single article differ in
> their axioms in general. Can I assume that only those axioms are
> included in the def-file which were actually used for the proof of
> dfg-files theorem?
>
>
> You would have to tell me the exact names of the files that you write about. Generally, if you are already talking about files with ATP problems, then yes:
> each of the problems corresponds to one Mizar theorem, and only the axioms needed in Mizar for proving that theorem is present there (possibly with some
> redundancy).
I am not looking at one particular file. My general insterst currently is: How can I figure out whether a dfg-file contains only those axioms Mizar really needed
to prove the theorem of that file (i.e. there are no redundant axioms w.r.t. the proof found by Mizar.) You say those files with ATP problems. How can I know
whether a problem is an ATP problem? Are there any other problems with this non-redudancy property?
Cheers,
Immanuel
Urban to Normann
On 1/26/07, Immanuel Normann <i.normann@iu-bremen.de> wrote:
Josef Urban wrote:
>> Second question I couldn't answer myself: Each dfg file seem to
>> contain
>> exactly one theorem. So one article generates usually many dfg-files.
>> However, these dfg-files corresponding to a single article differ in
>> their axioms in general. Can I assume that only those axioms are
>> included in the def-file which were actually used for the proof of
>> dfg-files theorem?
>>
>>
>> You would have to tell me the exact names of the files that you write
>> about. Generally, if you are already talking about files with ATP
>> problems, then yes: each of the problems corresponds to one Mizar
>> theorem, and only the axioms needed in Mizar for proving that theorem
>> is present there (possibly with some redundancy).
>
>> I am not looking at one particular file. My general insterst currently
>> is: How can I figure out whether a dfg-file contains only those axioms
>> Mizar really needed to prove the theorem of that file ( i.e. there are no
>> redundant axioms w.r.t. the proof found by Mizar.)
The Mizar export is only approximative: pinning doen the exact set of axioms is only possible by actually running an ATP, and looking at what axioms it used.
> You say those files
> with ATP problems. How can I know whether a problem is an ATP problem?
> Are there any other problems with this non-redudancy property?
IN the MPTP 0.1 distro, there were I think only files with theorems definitions, etc. The problems were created by running the main perl program. Maybe some generated problems are somewhere available from my web page.
Best,
Josef
(I am on vacations till end of February, so my answers will be slow)
-- GuestAccount? - 01 Jan 2009