schedule

# Preserving numerical algorithms

A typical result in Numerical Analysis is an algorithm to solve a mathematical problem. The main ingredients are the precise statement of the problem, an algorithm to solve it, and an assessment of the errors. From algorithm, there is a long journey to a program: the method needs to be more user-friendly (handling of error cases, etc.), has to be implemented in a programming language, and tested. The resulting program is not only a practical product, but also a very precise description of the algorithm. If the reader is a computer, nothing is left to interpretation as long as the translator is correct. In my talk, I would like to introduce an effort to preserve mathematical software: the histoRicalg project, and summarize some of its results.

In two decades from 1970 to 1990, there was an upsurge in the writing and publication of mathematical software. Much of this has been collected in subroutine libraries or published in print or in machine readable form. Traditionally, scientific results are published in journals, and journal papers are archived in libraries. However, as programs grew larger, they were no longer printed with the papers, but sent to interested readers on tape, diskettes, or other forms of data storage.

The knowledge accumulated in these codes is extensive and valuable. Its preservation, however, is more difficult than that of papers. Storage technologies come and go, programming languages come into fashion, then become forgotten. To preserve the algorithms, one needs to constantly copy their source code from old computers to newer ones and preserve the tools needed to compile them.

In this poster, I plan to introduce one effort to preserve mathematical codes: the histoRicalg project. Its primary objective is to find, document, and preserve the originals of the numerical codes that have found their way into the R statistical programming system. In particular, I will summarize my experience with attempts to run the originals of these codes on a modern Linux PC.

### PRESERVING NUMERICAL ALGORITHMS

A typical result in Numerical Analysis is an algorithm to solve a mathematical problem. The main ingredients are the precise statement of the problem, an algorithm to solve it, and an assessment of the errors. From algorithm, there is a long journey to a program: the method needs to be more user-friendly (handling of error cases, etc.), has to be implemented in a programming language, and tested. The resulting program is not only a practical product, but also a very precise description of the algorithm. If the reader is a computer, nothing is left to interpretation as long as the translator is correct.

In two decades from 1970 to 1990, there was an upsurge in the writing and publication of mathematical software. Much of this has been collected in subroutine libraries or published in print or in machine readable form. Traditionally, scientific results are published in journals, and journal papers are archived in libraries. However, as programs grew larger, they were no longer printed with the papers, but sent to interested readers on tape, diskettes, or other forms of data storage.

The knowledge accumulated in these codes is extensive and valuable. Its preservation, however, is more difficult than that of papers. Storage technologies come and go, programming languages come into fashion, then become forgotten. To preserve the algorithms, one needs to constantly copy their source code from old computers to newer ones and preserve the tools needed to compile them.

I plan to introduce one effort to preserve mathematical codes: the histoRicalg project. Its primary objective is to find, document, and preserve the originals of the numerical codes that have found their way into the R statistical programming system. In particular, I will summarize my experience with attempts to run the originals of these codes on a modern Linux PC.

# Present status:

• SLATEC codes: Fortran 77. Modifications only needed for machine dependent parts; code is well preserved. LINPACK and EISPACK parts are largely superseded by LAPACK.
• PORT Library: Bell Labs has changed owners and the web site is gone. Fortran 77. What code we have runs without changes. Contains an interesting collection of optimization algorithms.
• CMLIB: large overlap with SLATEC. Fortran 77. So far UNCMIN has been tested.
• NSWC: large overlap with SLATEC. Fortran 77. Needs changes in machine dependent parts. Large collection of special functions routines. So far, I have checked machine parameters: integer parameters match x86_64-linux-gnu.
• NUMAL: one of the early attempts at a mathematical software collection. Algol 60. Needs a translator to run. I have created Ubuntu packages for two translators (MARST and JFF-ALGOL).
• CALGO: Fortran 77 and Algol 60 parts, not all available in machine readable form. Prof. Tim Hopkins, ACM Algorithms Editor, has informed us of translations to Fortran 90 and extensive testing in progress.

# Language translator status

• Fortran 77 is "too big to fail": it is well supported after 40 years (gfortran, f2c)
• Algol 60 is far less so, but a translator to C is available
• Algol 68 and Algol W also have translators
• BASIC, Pascal are not as old and there are workable tools
• Many less widely used languages far more problematic (APL, PL/I, ...)

# Case study: UNCMIN

• Minimization code used, e.g., in R as function nlm()
• Well known: Kahaner Moler Nash (1989) has a subset code, while CMLIB has a version of the full original (code). R apparently has this code translated to C via f2c and hand editing.
• Recent bug reports have appeared for R. Is there a bug in the original?

# Outlook and difficulties

• An additional issue: abandonware. What is to be done if the code is copyrighted, but we can not contact the author anymore?
• Preservation: fix code that violates (sometimes newer) standards
• Are there bugs? If so, were they present in the original version? And which version is the original?