Okay, see the attached files. As I create future updates, I'll just post them to this thread. I suppose I'll start versioning them, so consider this version 0.0 for now.

I haven't been able to stress test this library, so YMMV. I've calculated a 700x10000 system with 8192 bits of precision, and gotten excellent results. But I did that before I automated things, so I don't know if the same can be accomplished with this version of the library.

BTW, in the case of that 700x10000 system, I'm not sure how much precision was actually needed. I suspect it might be half what I used, perhaps even less. This requires investigation, because that system used 1.2 GB of RAM to solve, and my VM starts swapping before I get to 1.6 GB. Once swapping begins, it crawls at 10%-20% of full speed.

So I'm currently preparing to try an 800x800 system at 6400 bits of precision, just to test it out.

A quick note on the attachments: these are meant to run in a unix/linux environment (whether native or in a VM), so I've attached a gzipped tar file. However, I've also included a Windows zip file, for Windows folks who prefer to manipulate their files outside the VM, or who just want to read the source code or whatever.

I've been chastised before for not stating certain things I took to be implicit.

1. This code is only for base e, though it can be modified for other bases (and I'll do so eventually).

2. The solver only gives you the coefficients for the non-constant terms. Therefore, the first element of the vector (e.g., coeffs_100_accel[0]) is the coefficient for z^1. You'll need to insert your own coefficient for the constant term, typically a value of -1. I don't usually need the constant term, because when I've graphed the slog, I've used Runge-Kutta to integrate the first derivative of the slog, which would eliminate the constant term anyway. (This code will be in a later version of the library, by the way.)

3. There are typos in the comments, e.g., in the usage instructions that Prepare_Vec2 can spit out (I added the afix argument later, when I converted it from a global variable to a parameter). There are probably typos in the "instructions" as well. Consider this a draft, and be gentle when pointing out inconsistencies and/or errors.
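To make the indexing in point 2 concrete, here's a small hypothetical helper (not part of the attached library — `coeffs` stands for whatever vector the solver returned, e.g. `coeffs_100_accel`) showing the off-by-one and the hand-inserted constant term:

```python
# The solver's output starts at the z^1 coefficient, so coeffs[0] is the
# coefficient of z^1, coeffs[1] of z^2, and so on. The constant term is
# prepended by hand, typically -1 (so that slog(0) = -1).
def eval_slog(coeffs, z, constant=-1.0):
    total = constant
    for k, a_k in enumerate(coeffs, start=1):
        total += a_k * z ** k
    return total
```

With the default constant of -1, `eval_slog(coeffs, 0.0)` is -1 regardless of the coefficients, as expected for the slog at zero.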

10/18/2007, 06:02 AM (This post was last modified: 10/18/2007, 06:03 AM by jaydfox.)

Another important note: the base algorithm here, though simple at heart, needs to be attributed to Andrew Robbins. There's been talk that someone else may have devised the basic algorithm earlier, and either didn't publish it, published it in a different form, or only described it. But for now, I continue to call it "Andrew Robbins' slog".

It's a clever approach, because I had independently tried my hand at solving a system of equations for tetration (base e) directly, and it involved nonlinear equations that became damned hard to solve beyond about 12 coefficients. This was probably fine for getting within about 0.1% accuracy (I'm actually curious to go back and check), but that's about it.

Andrew's approach of solving the slog, amazingly enough, yields a system of linear equations, trivial to solve on today's hardware with free math libraries.
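To see why it comes out linear: writing slog(z) = -1 + sum_k a_k z^k and imposing slog(e^z) = slog(z) + 1, the identity (e^z)^k = e^(kz) = sum_n k^n z^n / n! means matching coefficients of z^n gives equations that are linear in the a_k. Here is a minimal, unaccelerated sketch of that system in mpmath (this is not the attached library — just the bare idea, truncated to N unknowns):

```python
# Matching coefficients of z^n in slog(e^z) = slog(z) + 1, with
# slog(z) = -1 + sum_{k>=1} a_k z^k, gives the linear system
#   n = 0:   sum_k a_k            = 1
#   n >= 1:  sum_k a_k * k^n / n! = a_n
# truncated to N unknowns a_1 .. a_N.
from mpmath import mp, mpf, matrix, lu_solve, factorial, exp

def slog_coeffs(N, bits):
    mp.prec = bits
    A = matrix(N, N)
    b = matrix(N, 1)
    # Row n = 0: the coefficients sum to 1 (equivalently, slog(1) = 0).
    for k in range(1, N + 1):
        A[0, k - 1] = mpf(1)
    b[0] = mpf(1)
    # Rows n >= 1: rising powers going down, rising bases going to the right.
    for n in range(1, N):
        fact = factorial(n)
        for k in range(1, N + 1):
            A[n, k - 1] = mpf(k) ** n / fact
        A[n, n - 1] -= 1      # move the a_n from the right-hand side
    sol = lu_solve(A, b)
    return [sol[i] for i in range(N)]

def slog(z, a):
    # Truncated series, with the constant term -1 inserted by hand.
    return -1 + sum(a[k] * mpf(z) ** (k + 1) for k in range(len(a)))
```

Even at a modest size like N = 30, the solution satisfies the functional equation slog(e^z) = slog(z) + 1 to high accuracy for small z inside the radius of convergence.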

The only potential difficulty is that it takes a very large system (thousands of rows and columns) to get even marginal accuracy (i.e., better than double precision). I devised the acceleration technique independently, though who knows if I'm the first. It ekes out significantly more precision for a small increase in computation (fairly insignificant, in fact, for systems larger than 500x500). So I consider this source code a "collaborative effort" of sorts. Hats off to Andrew!

10/20/2007, 04:11 PM (This post was last modified: 10/20/2007, 04:14 PM by jaydfox.)

Hmm, I just solved two 800x800 systems, one at 6400 bits, and one at 8192 bits. I did a careful analysis of memory usage (both within Windows and within the linux session), and I think I can squeeze about 9% more memory, or about 10% more bits of precision (enough for about 9000 bits, give or take).

However, when I did a comparison of the two solutions, I found something unexpected. Even with this fairly large system, the precision was really good. With 1792 bits of precision difference, we can essentially treat the 8192-bit solution as "exact". By subtracting, I found that the absolute precision in bits was in excess of 5300 bits. That means that out of 6400 bits, only about 1100 bits are "junk". This means I should be able to solve the same system at, e.g., 3200 bits, and hopefully get at least 1600-2000 bits of precision out of my results. More hopefully, I should be able to solve a 1000x1000 system, reduced to perhaps 5120 bits of precision, and still get an answer that's perfectly well suited to analysis of the residue. I'll be trying that tonight, overnight (it took about 500 minutes to calculate the 800x800@8192 system).

In addition to the absolute precision test, I even did a "relative" precision test, by multiplying each coefficient by the radius of convergence to the appropriate power, simulating the error in each term for a point on the radius. Even then, the relative precision is still about 5300 bits in absolute magnitude. The following graph shows the absolute (blue) and relative (red) precisions for each term:
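The comparison itself can be sketched in a few lines (a hypothetical helper, not from the attached library — `low` and `high` stand for the coefficient lists from the lower- and higher-precision runs, with the high one treated as "exact"):

```python
import math

# Subtract the two solutions and convert the error in each coefficient to
# bits of agreement. For the "relative" test, the error in term k is first
# scaled by radius**k, simulating its contribution at a point |z| = radius
# on the circle of convergence.
def precision_bits(low, high, radius=1.0):
    bits = []
    for k, (a, b) in enumerate(zip(low, high), start=1):
        err = abs(a - b) * radius ** k
        bits.append(float("inf") if err == 0 else -math.log2(err))
    return bits
```

Calling it once with radius=1.0 gives the absolute precision per term, and once with the actual radius of convergence gives the relative version.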

By the way, I hadn't quite expected behavior this good. When I was originally solving my accelerated systems, I was calculating the custom vector with the same RealField, and hence getting horrible results unless I used very high precision. For example, for a 300x300 system, I needed somewhere north of 3000 bits to get non-garbage (e.g., first coefficient was 3234134987.12342 or whatever).

But by calculating the custom vector at very high precision in a separate step, apparently not much precision is needed for the system itself. I was surprised by this, as I expected that solving a system this large over a RealField would require ridiculously high precision. I suppose the very nature of the system (rising powers going down, rising bases going to the right) makes it inherently stable to solve?

Anyway, the good news for us is that we should be able to solve very large systems without the need to resort to exact (rational) math, which is both slow and memory intensive. The only drawback is the possibility of systems that evaluate as uninvertible with real math (50x50 does, even for ridiculous precision levels, at least with maxima within SAGE). But if that happens, just add or subtract a few terms and try again, I guess.

Quick update: it appears as though the relative error is fairly predictable, at least in the range of system sizes I'm working with. For a 900x900 system with 5120 bits of precision, the relative precision was about 3880 bits, give or take. That leaves about 1240 bits of garbage. The ratio is nearly the same, as 1100/800 and 1240/900 are both about 11/8, or interestingly enough, roughly the distance of the primary fixed points to the origin.
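That fixed-point distance is easy to check (assuming mpmath; the primary fixed point of e^z is -W(-1), with W the Lambert W function, since e^z = z rearranges to (-z)e^(-z) = -1):

```python
from mpmath import mp, lambertw, exp

mp.prec = 64
# Primary fixed point of e^z: from e^z = z we get (-z) * e^(-z) = -1,
# hence z = -W(-1) on the principal branch (its conjugate is the other one).
L = -lambertw(-1)        # roughly 0.3181 - 1.3372i
distance = abs(L)        # roughly 1.3746, versus 11/8 = 1.375
```

So the garbage-bits ratio of about 11/8 really does sit right next to the distance of the primary fixed points from the origin.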

For a 1000x1000 system, then, I'd predict about 1375 bits of garbage in my resulting coefficients, such that I'd need about 3380 bits for my solver routine to get about 2000 bits of precision per term (at the radius of convergence). For 1000 terms, I'd need an extra 10 bits to handle the accumulation of errors (which is trivial at this stage), so we'll just call it 3400 bits that I need.
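The bookkeeping above amounts to a one-line estimate. The 11/8 ratio and the 10-bit accumulation margin are the empirical figures from these posts, so this is just a convenience, not anything rigorous:

```python
import math

# Rough precision budget: garbage bits appear to scale like (11/8) * N,
# plus a small margin for error accumulation across the N terms.
def bits_needed(N, target_bits, margin=10):
    garbage = math.ceil(11 * N / 8)
    return target_bits + garbage + margin

# e.g. a 1000x1000 system aiming for 2000 good bits per term:
# bits_needed(1000, 2000) gives 3385 -- "call it 3400".
```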

Memory-wise, however, I think I can get 4608 bits, or 9x512. I can run another system at 4352 bits (256 bits less) for comparison purposes, to make sure I know "precisely" how much relative precision I have. The 4352 bit system should hopefully get roughly 2950 bits of relative precision, even after subtracting out 10 bits for accumulation errors. And of course, I'd use the 4608 bit solution in practice, for about 3200 bits.

I could probably take this process as far as a 1200x1200 system, depending on how much relative precision I'm willing to give up. However, I think I'm going to call an end to this "arms race" after I calculate the 1000x1000 systems. It's time for me to stop calculating, and time to start analyzing and filling in the holes in my knowledge. I also need to provide an updated version of this library, especially graphing code. My original Runge-Kutta graphing code is slow and difficult to modify. It's also a few hundred lines of spaghetti code, to logarithmicize each interval manually. That needs to be automated in a loop. It also needs code to calculate a denser grid around zero, to expose more detail near minus infinity when we logarithmicize.