Mizar project wishlist
(originally a note submitted to the 30 Years of Mizar
MKM 2004 Affiliated Workshop -
http://merak.pb.bialystok.pl/mkm2004 )
My main wishes for Mizar as a project are in this note.
Go Big
MML now contains 855 articles by about 150 authors,
it grows at cca 50-100 articles a year.
Wikipedia
http://en.wikipedia.org has now over 300 000
articles accumulated over 3 years by about 6000 active volunteers.
In 2002 Debian GNU/Linux had about 9000 packages working together
on 11 platforms and about 1000 active developers. In 1995 the numbers
were 250 packages, 60 developers and 2 platforms
http://telemetrybox.org/tokyo.
In the internet age, very large projects can be done very quickly,
and not in 300 years as Piotr Rudnicki once estimated.
The essential part of the trick is to get volunteers interested.
To succeed in that, you have to give something back to the volunteers.
In the internet-driven collaborative projects, it is the satisfaction
(pride, egoboo) from doing a useful, technically or
scientifically challenging work
owned by all the authors,
and discussing, justifying and taking the decisions about how
the work should be done.
The construction of MML lacks these properties. My opinion is, that
if it continues to lack these properties for longer time, it will
be overrun by another formalization project, despite
its initial advantage.
Licensing
My attempts to find out what the Mizar and MML license is have
been unsuccessful. I doubt that anyone else knows. MML should
be released under GPL
http://www.gnu.org/copyleft/gpl.html
or under the GFDL
http://www.gnu.org/copyleft/fdl.html, maybe under both.
The software should be GPL.
I doubt that there is some strong intention leading to the current
obscure licensing situation, it is rather the laziness to deal with it.
If there is any fear of ``theft'' of MML or Mizar (I have heard such words
quite often) the lack of clear license is the best way to make such a
thing possible. On the other hand, very large business corporations
today rely on GPL and employ expensive lawyers to fight for it.
Open Source
Even if Mizar somehow succeeded in formalizing large part of mathematics
without going open-source, it would only stimulate creation
of open-source substitutes. The recent history of useful software
systems is full of such examples. Mizar as a piece of software is
complicated (sometimes quite unnecessarily), but not as complicated
to survive a larger reverse-engineering effort.
My own work with Mizar started by reverse-engineering it, recently
e.g. Paul Cairns and Jeremy Gow tried that too.
Hiding the source of a strictly academic and very hardly commercial
project that tries to be collaborative, is in my opinion nonsense,
or at least very uninformed attitude.
Many people explicitely refuse to contribute any work to a project
dependent on some secret part, and I
(and anyone else who reads the Mizar forum)
already know of several people who have
so far refused Mizar for exactly that reason. All
the ``make the world better''
motivation of the volunteers is safely stopped by that.
There are people who want to do interesting research on MML,
which could eventually lead to easier searching and formalization.
Just in the last two years I have heard about the team of Andrea
Asperti, Michael Kohlhase, Paul Cairns and Jeremy Gow, Hazel Duncan.
The Mizar sources are the
only real documentation of the system, and without them serious work
with the MML is almost impossible.
Leadership
Even if Mizar and MML are GPL-ed in some near future, it does not
mean that automatically there will be hundreds of new contributors.
Each successful collaborative project started from small, and the
strength of the project idea and the communication and managing
skills of the project founders decided about its growth.
While I have no doubts about the strength and attractiveness
of the idea of formalization of mathematics, I think there could
(and should) be a lot of improvement when it comes to discussing
and managing the project. As said above, the volunteers grouped
around the successful projects are usually self-motivated and working
out of their interest and for their fun, and definitely not
to suffer any excessive dictatorship of the leaders, lack
of enthusiasm for their own ideas, or simply lack of communication.
Mizar is still very far from being the ultimate system for
formalization of mathematics, and it needs a lot of new good ideas
and experimenting, much faster and parallel than is the current progress.
People should be allowed to experiment with the software, to
create new tools for transforming MML, to suggest and implement
changes and even fork the system or the MML if necessary.
All these things should be discussed in a supportive way and possibly
in real time, not with one week delay. The current practice is
very far from that.
Modularization
The Mizar verifier and especially its checker part is now quite a
monolithic piece of software. One way how to make more experimenting
and improvements possible is to make it more modular. It should
be as easy as possible to add some semantic strengthening or
some syntactic sugar or support for new constructs.
There should be very clear semantic interface for such new modules
or plugins. E.g. if some new computer algebra module is added, it
should be possible to print the steps it does and their standard
Mizar justification, and to check those detailed steps without
the module.
Gray zone
I do not agree with the idea that every theorem in MML
has to be proved. It delays formalization of many mathematical
fields and excludes many mathematicians that would be otherwise
interested. I would prefer a more ``gap-filling'' approach.
After all, anyone who has ever written some formalization very likely
used that method, and Mizar is quite suitable for it.
Yes, it is a bit dangerous, but not much. First of all, Mizar does not
protect from writing nonsense anyway. It is not hard to find in MML
a lot of nonsense. Second, if there is an error, the computer
processing very often allows an easy fix. In CARD\_FIL, I have
first defined GCH for the type Cardinal instead of Aleph (infinite
Cardinal), and thus it was simply false. When I noticed it about
a year later and changed Cardinal to Aleph, everything worked without
the necessity of any further changes. From time to time a bug
is found in Mizar and sometimes such a bug is really abused in
some articles. Again, it is usually quite easy to fix such misuse.
Even if some proof relies on a fact which is not true, the proof
is usually still valuable - it is simply a proof of something else,
and it is still a piece of formalized mathematics that can be
reused.
Refactoring of MML
It should be much more common and easier to edit existing
MML articles. The complete MML (i.e. with proofs)
should be published on the web in a version system, with
very easy editing access and with the possibility of immediate
online verification of the changes. The online verification should
also immediately create the hyperlinks to symbol definitions,
theorems etc.,
and announce whether something got possibly broken in later
articles.
The current habit of ``one article or nothing'' is unnecessarily
demanding and creating quite artificial barrier for people who
just spotted a possible improvement in MML or want to add only
a smaller bit. There is now a very small group doing
any of these things, without the discipline imposed by the version
system and common peer review, and this is simply unjust and again
deterring other possible self-motivated editors.
This facility would be also very useful for the ``Gray zone''
formalization. While there will be people who are good at
introducing formal definitions in an advantageous way and
sketching whole theories in the defined notions, the task of filling
the missing proofs can be taken up by others, sometimes e.g.
as a student exercise or a task for automated theorem provers.
Better dependence resolving
The MML processing is now in some sense very fragile. One added definition
in an article in the middle of the MML can cause that all the rest
of MML has to be regenerated. This is bad enough for the revisions
that are being done today, and would be very bad for the online
refactoring suggested above.
Mizar notation is overloaded, and that is often useful, because
short mathematical symbols are a scarce resource. But there
is no good way how to refer in an absolute way to the Mizar
functors, predicates and other constructors. This is bad not just for
all kinds of semantic presentation and searching systems and semantic
communication with other systems, but also for authoring articles
in more advanced domains. In such articles clashes of notation become
more frequent,
and it is increasingly hard for the users to decipher the meaning
of their text.
The best answer that
can be now given to the question ``Which + is this?'' is something
like ``It is the fourth functor defined in article FOO''.
This kind of answer is actually used often by the Mizar system,
and it causes a lot of the processing fragility (the numbering changes
when something is added, moved to other article or removed completely).
Many overloaded programming languages create the absolute names
in smarter ways, e.g. by appending the function name with the
names of parameter types, parent object (if it is a method) and e.g. the
name of the module. This allows also smarter rebuilding and linking
when some module is changed.
It probably would not be too difficult to create the absolute names
in a similar automated way in Mizar, and it would be definitely
better than the current naming scheme. My opinion is that formalization
of mathematics should go one step further and force the authors
to create
unique pretty-printing names for the objects that are defined.
I believe that good unique mathematical terminology is simply one of
the subgoals
of the field of formalization of mathematics, and the fact that
Mizar does not care for this is in fact rather surprising.
Integration with TeX?, richer syntax
Many other proof assistants already have some simple
pragmas for including
TeX? text. Many programming languages
have that capability too. Mizar has now a complicated translator
to
TeX? which is not directly usable by the authors and it omits
all the comments. This can and should be simplified, so that
the users could watch the
TeX? output immediately after processing,
and improve by hand its ugly parts.
First, simple pragma denoting the
TeX? part in the Mizar text should be added.
If we really believe in formalization of mathematics,
we should make e.g. complete textbooks based on
formalized mathematics possible. Yes, part of the additional
textbook syntax could and should be captured by new Mizar
keywords, e.g. ``example'', ``exercise'', ``corollary'', etc.
But it is fairly hard to imagine formalization of e.g. historical
notes in the textbooks.
Second, Mizar should really try to serve as a semantic anchor even
for articles which do not attempt full-scale formalization.
A goal which is not unrealistic, provided the ``Gray zone'' idea
gets some support, is to have in some short time the possibility
to express the main results of normal mathematical articles in
the formalized language. This alone would be of great importance
to normal mathematics, it would be a big step towards semantic
searching and classification of normal mathematics. Our goal
should be that such a description becomes wide-spread,
e.g. in the same way as is now
the ``keywords'' annotation of articles.
Since we should make such annotation as easy as possible, we definitely
should forget about forcing such mathematicians into the
``environment creation hell'' which is one of the sore points
of formalization in current Mizar. This is another reason,
why we should provide the unique names for Mizar objects and guarantee
their stability as much as possible.
Conclusion
I have described what could (in my opinion) be done with Mizar,
so that it became the wide-scale collaborative effort leading
to the formalization of mathematics.
The ideas are a bit high-level and some of them can probably
be applied also to other proof assistants, my lower-level wishes
both for the Mizar software and MML are not here.
There are not many new ideas here,
most of them are just copied from the successful internet collaborative
projects that have arisen during the last ten years and surprised the
world with their vitality and power.
There will obviously be the
objection that it is too much work for the small Mizar team.
The readers who think so should read this article again and try
to understand that the main idea (practically working in many other
projects) is to involve more enthusiasts on democratic principles,
so that they could identify with that work and help to do it.
Obviously, this does not come for free. Fair cooperation and
communication with others is the key to success in large
collaborative efforts. To put it in a very straightforward way,
Mizar has two options. The first is to remain to be the project solely
and rigidly steered by the Bialystok Mizar team, advancing at a pace
that makes prof. Rudnicki's estimate more than justified.
All the glory, grants, awards and also responsibility for the project
will belong to that team.
The second option is to give Mizar and MML to the rest of the world,
so that they could play with it, experiment with it, contribute to it
and participate in making decisions about it. Yes, it is possible
(and probable) that someone else would eventually do major
contributions to the Mizar system or MML, and also get some of
the Mizar based grants and the fame for doing useful and important work.
In my opinion, letting others to build
on one's (scientific) results is not just a extremely
successful feature of the last-decade's
internet collaborative projects, it should be the basic creed
of all true scientists.
--
JosefUrban - 21 Aug 2004
to top