A chapter from the documentation of a class R of 'real numbers of arbitrary precision' that I created for the programming language Ruby
Ulrich Mutze 2013-10-03 (part of rnum.rb)

Motivation and rationale
------------------------

Experimentation with class R provided many experiences which sharpened my long-standing diffuse ideas concerning a coding framework for mathematics. In mathematics we have 'two worlds': the discrete one and the 'continuous' one. For the first world, Ruby is fit without modifications. Its automatic switch from a Fixnum representation of integer numbers to a Bignum representation eliminates all limitations on coding those problems of discrete mathematics for which a coding approach is reasonable from a logical point of view.

The 'continuous world' is the one for which the real numbers lay the ground. The prime example of this world is 'real analysis', for which the concept of convergence is considered central. When a computationally oriented scientist describes to a pure mathematician his experience that all mathematically and physically relevant structures seem to have natural codable counterparts, this pure mathematician will probably admit that this holds for the trivial part of the story, but he will probably insist that all deeper questions, those concerning convergence, closedness, and completeness, are a priori outside the scope of numerical methods.

According to my understanding, this mathematician's point of view is misleading. It is, however, suggested by what mathematicians actually do. For them it is natural, when having to work with a number whose square is known to be 2, to 'construct' this number as a limit of objects (e.g. finite decimal fractions) with the intention to use this 'exact solution of the equation x^2 = 2' in further constructions and deductions within the framework of real analysis.
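The automatic switch mentioned above can be observed directly in any Ruby interpreter; for instance, an integer power that would overflow a machine word is computed exactly:

```ruby
# Ruby promotes integer results to arbitrary precision automatically,
# so 2**100 is computed exactly rather than overflowing.
n = 2**100
puts n          # the exact 31-digit value
puts n.class    # Integer (formerly Bignum)
```

This is why, for the discrete world, no class analogous to R is needed.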
For a computationally oriented scientist an alternative view is more natural and more promising: do everything (i.e. solving x^2 = 2, and the further constructions and deductions mentioned above) with finite precision and consider this 'whole thing' (and not only x) as a function of this precision. When we consider the behavior of the 'whole thing' for growing precision, we have a single limit (if we need to consider a limit at all), and the question never arises whether two limits may be interchanged. Such questions are typically not easy, and a major part of the technical skills of mathematicians is devoted to them. I'm quite sure that I spent more than a year of my life struggling with such questions.

To have a realistic example, let us consider that the 'whole thing' mentioned above is the task to produce all tables, figures, diagrams, and animations that will go into a presentation or publication. Then it is natural to consider a large structured Ruby program as the means to perform this task. (My experience is restricted to C++ programs instead of Ruby programs for this situation.) Let us assume that all data that normally would be represented as Float objects are now represented as R objects. As discussed already, this means that e.g. instead of

  x = 2.0

we have to write

  x = R.c(2.0)

or

  x = R.c2

or we have to add to the statement x = 2.0 the conversion statement

  x = R.c(x)

No modification or conversion is needed for the integer numbers! Now consider having an initial statement

  R.prec = "float"

in the main program flow. By executing the program we get curves in diagrams, moving particles in animations, etc. If some of these curves are more jagged than expected, or some table values turn out to be NaN or Infinite, we may try

  R.prec = 40

and

  R.prec = 80 ...

If the results stabilize somewhere, practitioners are sure that 'the result is now independent of numerical errors'.
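The idea of treating precision as a parameter of the whole task, rather than of individual operations, can be sketched in plain Ruby. Here BigDecimal from the standard library stands in for class R, and the function whole_thing is a hypothetical stand-in for the complete program; only the pattern, not the API, is meant to match rnum.rb:

```ruby
require 'bigdecimal'

# A minimal sketch, assuming BigDecimal as a stand-in for class R:
# the 'whole thing' (here: solving x^2 = 2 and measuring the residual
# of that solution) is evaluated as a function of the precision.
def whole_thing(prec)
  x = BigDecimal(2).sqrt(prec)  # 'exact solution' of x^2 = 2 at this precision
  (x * x - 2).abs               # residual of the whole computation
end

# Rerun the identical task at growing precision and watch for stabilization.
[10, 40, 80].each do |prec|
  puts "prec=#{prec}: residual=#{whole_thing(prec).to_s('E')}"
end
```

The residual shrinks as the precision grows; in a real project the quantities inspected would be the curves, tables, and animations themselves.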
Of course, the mathematician's objection that behavior for a few finite values says nothing about the limit also applies here. But we are in a very comfortable position to cope with this objection: since we deal with the solution of a task, we are allowed to assume that we know the experience-based conventions and 'best practices' concerning solutions of tasks in the problem field under consideration. So the judgement whether the behavior of the solution as a function of precision supports a specific conclusion or not has a firm basis when the context is taken into account. It is an illusory hope that some abstract framework such as 'Mathematics' could replace the guidance provided by problem-field related knowledge and experience.

Let me finally describe an experience which suggested to me the concept of precision as a parameter of the whole task (or project) instead of a parameter of individual arithmetic operations: in an industrial project I had to assess the influence of lens aberrations on the efficiency of coupling laser light into the core of an optical fiber. Although the situation is clearly a wave-optical one, the idea was to test whether a ray-tracing simulation would reproduce the few available measured data. In case of success one would rely on ray-tracing for inexpensive optimization and tolerance analysis. At that time, all computation in my company was done in FORTRAN, and floating point numbers were represented by 4 bytes (just as float in C). The simulation did not really reproduce the measured curve but wiggled around it in a remarkably symmetrical manner. Just at this time our FORTRAN compiler was upgraded to provide 8-byte numbers (corresponding to C's double). What I had suspected turned out to be true: the ray-tracing simulation now reproduced the measurements with magical precision. Congratulations to the optical lab for having got such highly consistent data!
Soon I turned to programming in C and enjoyed many advantages over FORTRAN, but one paradise was lost: there was no longer a meaningful comparison between 4-byte and 8-byte precision, since the available C compiler worked with 8 bytes internally in both cases. When we later got the type long double in addition, this was a disappointment, since it was identical to double for the MS compiler and only 10 bytes (?) for GNU. So my desire was to have a C++ compiler which implemented, and consistently used,

  float:       4 bytes
  double:      8 bytes
  long double: 16 bytes.

(Notice that such a fixation leaves some freedom in the actual setting of the least significant bit of a result of a computation.) I then would write all my programs with a floating point type named R which gets its real meaning in a flexible manner by a statement like

  typedef long double R;

I was rather sure for a long time that I would never meet a practical situation in which a simulation would show objectionable numerical errors when done with 16-byte numbers. I had to learn, however, that this was a silly idea: let us consider a system of polyspherical elastic particles (the shape of each particle is a union of overlapping spheres) which are placed in a mirror-symmetrical container, and let the initial positions (placements) and velocities be arranged symmetrically (with respect to the same mirror). Then the exact motion can be seen to preserve this symmetry. However, each simulation will lose this symmetry after a few particle-to-particle collisions. (The particles start to rotate after collisions, which influences further collisions much more than for mono-spherical particles.) Increasing the number of bytes per number will only allow a few more symmetrical collisions, and the computation time needed to see these will increase. Of course, this deviation from symmetry is not objectionable from a 'real world' point of view.
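One mechanism behind such symmetry loss is easy to demonstrate in isolation: floating point addition is not associative, so summing the same contributions in mirrored order, as the two halves of a mirror-symmetric simulation effectively do, can already differ in the last bit. This minimal Ruby illustration is of course not the particle simulation itself:

```ruby
# Floating point addition is not associative: the same three
# contributions summed left-to-right and in mirrored order give
# slightly different totals, which is enough to seed a growing
# asymmetry over many collisions.
a, b, c = 0.1, 0.2, 0.3
left  = (a + b) + c   # summation order on one side of the mirror
right = (c + b) + a   # the mirrored summation order
puts left == right    # false
puts (left - right).abs
```

More precision only pushes the discrepancy to a lower bit; it does not remove it.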
It teaches the positive fact that the tolerances for making the particles, and for placing and boosting them, are unrealistically tight for realizing a symmetrical motion of the system. This shows that there are computational tasks in which not all computed system properties will stabilize with increasing computational precision. With class R such computational phenomena are within the scope of Ruby programming!

Addition 2025-05-20

For me, due to my usage of C+- (in which the meaning of CpmRoot::R can conveniently be set to the quadmath type float128 or a Boost.Multiprecision type), such phenomena are everyday experience in C++ programming. The quadmath type float128 turns out to be not considerably slower than double, and the support for real functions, even rather exotic ones, seems to be excellent with Boost.Multiprecision.