*(Disclaimer: this post is intended as an enthusiastic introduction to a topic I knew absolutely nothing about till yesterday. Correct me harshly as needed.)*

We programmers like to instantly pigeonhole problems as "trivial" and "impossible". The AI-Box has been proposed as candidate for the simplest "impossible" problem. Today I'd like to talk about a genuinely hard problem that many people quickly file as "trivial": formalizing mathematics. Not as in, teach the computer to devise and prove novel conjectures. More like, type in a proof for some simple mathematical fact, e.g. the irrationality of the square root of 2, and get the computer to verify it.

Now, if you're unfamiliar with the entire field, what's your best guess as to the length of the proof? I'll grant you the generous assumption that the verifier already has an extensive library of axioms, lemmas and theorems in elementary math, just not this particular fact. Can you do it in one line? Two lines? (Here's a one-liner for humans: when p/q is in lowest terms, p^{2}/q^{2} is also in lowest terms and hence cannot reduce to 2.)

While knowledgeable readers chuckle to themselves and the rest frantically calibrate, I will take a moment to talk about the dream...

Since the cutting edge of math will obviously always outrun any codification effort, why formalize proofs at all? Robert S. Boyer (of Boyer-Moore fame) spelled out his vision of the benefits in the QED Manifesto in 1993: briefly, a machine-checkable repository of humanity's mathematical knowledge could help math progress in all sorts of interesting ways, from educating kids to publishing novel results. It would be like a good twin of Cyc where the promises actually make sense.

But there could be more than that. If you're a programmer like me, what word first comes to your mind upon imagining such a repository? Duh, "refactoring". Our industry has a long and successful history of using tools for automated refactorings, mechanically identifying opportunities to extract common code and then shove it around with guaranteed conservation of program behavior. Judging from our experiences in shortening large ad-hoc codebases, this could imply a radical simplification in all areas of math at once!

**Import Group Theory. Analyze. Duplicate Code Spotted: 119 Instances. Extract? Yes or No.**

...Or so goes the dream.

You ready with that estimate? Here's a page comparing formal proofs that sqrt(2) is irrational for 17 different proof checkers. The typical short proof (e.g. Mizar) runs about a page long, but a lot of its complexity is hidden in innocent-looking "by" clauses or "tactics" that are, in effect, calls to a sophisticated formula-matching algorithm "prove this from that", so you can't exactly verify this kind of proof by hand the way the software does it. The more explicit proofs are many times longer. And yes, all of those were allowed to use the verifiers' standard libraries, weighing in at several megabytes in some cases.

Things aren't so gloomy. We haven't won, but we're making definite progress. We don't have Atiyah-Singer yet, but the Prime number theorem is done and so is the Jordan curve, so the world's formalized math knowledge already contains proofs I can't recite offhand. Still, the situation looks as if a few wannabe AI researchers could substantially help humanity by putting their mind to the proof-encoding problem instead; it's obviously much easier than general AI, but lies in the same rough direction.

In conclusion, ponder this: human beings have no evolutionary reason to be good at math. (As opposed to, say, hunting or deceiving each other.) Chances are that, on an absolute scale, we suck about as hard as it's possible to suck and still be called "mathematicians": what do you mean, you can't do 32-bit multiplication in your head? It's still quite conceivable that a relatively dumb prune-search program, not your ominous paperclipper, can beat humans at math as decisively as they beat us at chess.

At one stage I had high hopes for Raph Levien's "ghilbert", which was a reworking of Metamath to iron out a couple of problems and make it easier to work on problems in a distributed, cooperative way, but now even its domain doesn't seem to work.

Metamath does seem to me closest to the ideal. I'm still trying to make sense of this stuff now.

The December 2008 issue of AMS Notices is entirely devoted to formal proof: http://www.ams.org/notices/200811/

As these papers show, the state of the art is still extremely primitive from an user-friendliness point of view. Formalized math proofs still read too little like math and too much like computer code.

The issue is exacerbated since most proof assistants in common use today are "procedural" proof assistants. A proof in a procedural proof assistant is a linear list of complex proof "tactics" which involve manipulation of the proof state: hypotheses and goals are typically left implicit. These scripts are unreadable unless one replays them step-by-step in the proof assistant.

The most interesting project seems to be http://www.vdash.org - which is intended to be a formal math wiki. The website features a lot of background information but the wiki itself is not up and running, and discussions in the mailing list seem to have died off.

Another useful site is http://www.formalmath.org : it is a "user-friendly" presentation of IsarMathLib, a library for the Isabelle proof assistant based on Zermelo-Frankel set theory.

More expansion on the possibilities of such a solved computational might be in order here; even mathematicians will have to crank their imaginations a bit to think through the specific advantages afforded by the formalized-computer-mathematics future.

Isn't all this being worked on already? Just put "automath", "mizar", or "automated reasoning" into Google Books, and you'll find a vast quantity of material. Did you have something else in mind that no-one is doing?

As far as I know, none of these "proof assistants" is actually used as such. Although Automath and Mizar are decades old, the research is still all about how to build such systems and make verified depositories of known mathematics in them.

Yes, this is all being worked on; I didn't mean to imply this stuff was new. Two examples of useful applications: proving the four-color theorem in Coq, and proving the correctness of microprocessors (e.g. floating point division after the notorious Pentium bug) in ACL2.

Do these proof checkers check a formal proof or generate one?

Later edit: I thought about it a little bit and decided

no waydo they generate the proof. Being able to check a proof is impressive enough.Happily disagree. Mathematical ability isn't the ability to check a proof, or even the ability to generate the proof. Its the ability to notice that the sqrt(2) is irrational in the first place. (and the ability to build programs that surpass our abilities)

It's only quite recently that any computer could credibly be argued to be playing chess as well as the best humans. So "as decisively as they beat us at chess" doesn't seem all that decisive, even though I think it's generally believed now that the best programs are better than the best humans even on stock hardware.

Computers tend to be pretty bad at NP-hard problems, and even the "simple" task of determining whether or not a statement of propositional logic is a contradiction is NP-complete... formalized proofs are, in general, going to be bloody huge computational nightmares.

The NP-complete problem you describe corresponds to automated theorem proving, i.e. searching for a proof of a new mathematical theorem: this is indeed a difficult search problem. Verifying that an existing, human-created proof is correct is a different problem which is definitely tractable.

The real challenge is putting in the grunt-work of formalizing the undergrad math curriculum: one estimate put the required effort at about 140 man-years.