## Digging away at numbers.

For the discussion of math. Duh.

Moderators: gmalivuk, Moderators General, Prelates

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 4060
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

### Digging away at numbers.

I had wondered if this were a Logic Puzzle, but I'm not sure if there's a Logical Answer. And while I'm thinking of trying some coding-based tests, it's not strictly Coding or programming at heart. Maybe here?

I have just had what boils down to a trivial but slightly repetitive task: take a set of fixed-length (top-padded) 'number' strings with a known terminator, interspersed within some dissimilar strings, and 'eat away' the number part by search-and-replacing 'digit+terminator' with just 'terminator', over various iterations. (Yes, coding-fans, a regexp would have done this in one pass, but I take my fun where I can find it, and it was a fun few minutes, feeling disproportionately more productive than the reality.)

As a worst-case scenario, running the process once for each digit in sequence, and doing that (as a block) once for each digit in the original run, would guarantee the completion of the process. Say (as a brief example) the sequence 1234# was subjected to s/1#/#/, s/2#/#/, s/3#/#/ and s/4#/#/ in each pass-group (using regexp terms here); then on each group pass the current end-digit is removed. It is during the fourth pass that the last digit goes (only a fifth full pass would report zero substitutions made, or the 2, 3 and 4 attempts and then the fifth-pass 1 attempt would indicate "finished", in a Platonic Cave setup where only by noting the shadow of nothing happening can one understand the state of the world out there).

But if that same sequence is applied to 4321#, a single group-cycle is enough to reach a final state: the most efficient outcome we could design (or hope for), given the process.

Incidentally, 1111# would take four (group) cycles, as the s/1#/#/ does not repeat over the section(s) of string it has just acted upon: it takes off one 1 and moves on to find the next most evident 1# in the list.
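For concreteness, that behaviour can be sketched as a toy model in Python (naming is mine): each s/d#/#/ op, applied to a single number string, simply removes the trailing digit if it matches, firing once per pass-group.

```python
def group_pass(s, order):
    """One pass-group: apply each digit-op once, in the given order.

    Each op models s/d#/#/ acting on a single number string: it removes
    the trailing digit iff that digit equals d (one substitution per op
    per pass, matching the non-repeating search-and-replace)."""
    for d in order:
        if s and s[-1] == d:
            s = s[:-1]
    return s

def passes_until_empty(s, order):
    """Count how many full pass-groups it takes to eat the string away."""
    count = 0
    while s:
        count += 1
        s = group_pass(s, order)
    return count

print(passes_until_empty("1234", "1234"))  # worst case: 4 passes
print(passes_until_empty("4321", "1234"))  # best case: 1 pass
print(passes_until_empty("1111", "1234"))  # repeated digits: 4 passes
```

The three prints reproduce the 1234#, 4321# and 1111# observations above.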

Faced with a selection of numbers (some initially in view, some not), one tactic might be to apply a 'digging out' set of digits that copies (in last-to-first order) the digits of one of the extant examples. Using the 1234# example: extract 4, 3, 2 and 1 and you have guaranteed the nullification of that example. You're left with (from our other examples) 432# and 111#, which you may or may not see in the same viewing window, and these obviously need further weeding out: a minimum of six 'digs' into the data, the order of which might or might not sufficiently resolve other sets. For instance, the (initially) 4131#, now 413#, would appreciate at least one 1 acted upon between the 3 and the 4 of the remaining 2, 3, 4 portion of the sequence: 1, 2, 1, 3, 1, 4 would work for that, as would 2, 1, 3, 1, 4, 1. Others would work too, and all possibilities are potentially useful for other 'new' sequences. 4434# would be out of luck, though, without another 4# finder-and-replacer in the mix.
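Claims like these are easy to check mechanically. A sketch (naming mine, same toy model of a 'dig' as removing a string's trailing digit if it matches):

```python
def dig(s, sequence):
    """Apply a dig sequence once over one string: each op in the
    sequence removes the trailing digit iff it matches."""
    for d in sequence:
        if s and s[-1] == d:
            s = s[:-1]
    return s

# The leftovers after the 4, 3, 2, 1 extraction, plus the 413# and 4434# cases,
# run against the 1, 2, 1, 3, 1, 4 ordering:
for leftover in ["432", "111", "413", "4434"]:
    result = dig(leftover, "121314")
    print(leftover, "->", result if result else "(nullified)")
```

As expected, 432, 111 and 413 are nullified, while 4434 is left holding a 443 for want of a second 4-dig.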

These sequences, however, are slightly longer, and they are only one (or maybe a few) subset runs of all possible sequences. For a start, they are overwhelmingly zero-padded. (Actually, the top end is non-zero, but that can easily be dealt with as a standard closing sequence of digit deletion, if fortuitously expedited deletions haven't already removed the foremost digits by advancing back through the digits at an unpessimistic rate.) One might well expect that a slightly modified version of Benford's Law applies to the digit frequencies (remembering that zero-padding).

Is there, then, an optimal ordering of digits in a 'group' set to dig away more optimally and optimistically (more than the one digit per group-cycle for most numbers, across most cycles), reaching completion ahead of the worst-case scenario?

Is there also a case for taking a poll of the current 'digit+terminator' digraphs (either across the whole document, as a 'free move', or just from the samples currently visible/discernible in the current viewport on the document) and aiming at the most frequent of these?
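As a sketch of that polling tactic (assumptions mine: every string is visible, the whole document counts as one poll, and ties break by first encounter), a greedy loop might look like:

```python
from collections import Counter

def greedy_dig(strings):
    """Repeatedly poll the current end-digits and dig at the most
    common 'digit+terminator' digraph until everything is eaten away.

    Returns the list of digs chosen. Assumes full visibility; a
    viewport-limited poll would sample only a subset of strings."""
    strings = list(strings)
    digs = []
    while any(strings):
        poll = Counter(s[-1] for s in strings if s)
        d = poll.most_common(1)[0][0]  # most frequent current end-digit
        digs.append(d)
        strings = [s[:-1] if s and s[-1] == d else s for s in strings]
    return digs

print(greedy_dig(["1234", "4321", "1111"]))
```

Against the three running examples this empties everything in nine digs, versus the twelve a digit-per-string worst case would allow.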

Running 1..9 would (at least on the first iteration) likely satisfy both of these concerns. But once the low-hanging fruit are eaten away, the dynamics change (the post-Benford frequencies get revised), and there's nothing to say that the early removal of a rarer end-digit might not expose yet more of a commoner digit, fresh to harvest, such that delaying that common replacement yields net greater progress over the same number of steps taken.

And while sticking to 'one of each per group, whatever order, for multiple groups' is guaranteed to reach the destination in no more group iterations than there are digits involved, this does not sound optimal. If 9s are (à la one version of Benford) only a sixth as frequent as 1s, then shouldn't we be trying to switch out 1s six times more frequently than the 9s, across any given subsequence of digit-deletions? (Not to mention the zero being over-represented, at least towards the end of a process.)
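That hunch is testable: build the repeating op-cycle with each digit appearing in rough proportion to an assumed frequency profile, and count total digs (ops attempted) until everything is gone. A sketch, where the weights are invented placeholders rather than measured frequencies:

```python
def digs_until_empty(strings, op_cycle):
    """Cycle through op_cycle, each op removing matching end-digits
    across all strings, counting every op attempted until all gone."""
    strings = list(strings)
    digs = 0
    while any(strings):
        for d in op_cycle:
            if not any(strings):
                break
            strings = [s[:-1] if s and s[-1] == d else s for s in strings]
            digs += 1
    return digs

# Invented weighting: 1s six times as frequent as 9s, per the question.
weighted = "1" * 6 + "2" * 4 + "3" * 3 + "4" * 2 + "5" * 2 + "6789"
flat = "123456789"

sample = ["1211", "3121", "1111", "9121"]
flat_count = digs_until_empty(sample, flat)
weighted_count = digs_until_empty(sample, weighted)
print("flat cycle digs:    ", flat_count)
print("weighted cycle digs:", weighted_count)
```

Whether the weighted cycle actually wins depends on the sample; the point is only that the comparison is cheap to run.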

Not that any of this has any overwhelming real-world application (that I know of; my own little toy project that raised this aside), but how would you address the semi-blind* guidance towards a hopeful "better than worst-case-but-guaranteed" early finish?

I have a notion that, instead of straw-polling the vulnerable end of the sequence each turn (which gets the most done that turn, but doesn't guarantee the next turn is entered into with a better two-turn total of 'things done' than an alternative would give, and onwards), one should perhaps try to back-track from the front end. E.g. look at the number of sequential zeros padding the most-padded sequence in your sight and push that many of that kind of operation to the end of the queue; then consider which digit would best make that final 'exposal' of the zero-runs across the most examples, and make that the instruction to follow immediately before the padder-removals. Continue onwards towards the terminator-cap (for the digit removal you now think you ought to be starting with), but with a nod to all the sequences you aren't seeing, so smattering a Benfordesque frequency profile of non-obvious digits into the mix.

If any of that makes sense.

* i.e. you can see/sample at least some of the still-extant sequences between number-choices. You could simulate the many different sequences of digging out each possible last digit at each stage and discover which precise sequence gives the best metric, but by then you've obtained the result you wanted each time you completed the task in each new way you tried (or, with breadth-first searching, have dragged countless divided sub-branches, and all their many necessary resources in storage and time taken, down so many abandoned paths), and thus have no need to triumphantly run this discovered sequence 'for the first time'.

Xanthir
My HERO!!!
Posts: 5400
Joined: Tue Feb 20, 2007 12:49 am UTC
Contact:

### Re: Digging away at numbers.

I think you want to order the diggings in increasing order of commonness; that is, assuming Benford's law (+zero padding) holds, you want to go from 9 to 0.

This is because, regardless of ordering, you're guaranteed to remove one digit. But once you've done so, what's the most likely next digit? The same distribution applies; you're most likely to see the most-common digit. In particular, given that you've removed any digit N, it's most likely that the next digit will be a *more common* one, so ideally you should have the more-common checks come afterwards.

You're also right that having repeats of some digits, roughly in line with their frequency, might be worthwhile. Would need testing at that point; longer sequences mean more work done per run, and thus more potential for *wasted* work if the next digit is one that only shows up earlier in the dig sequence.
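A rough harness for that kind of testing (everything here is my own sketch: the strings are zero-padded random numbers of mixed magnitude, a stand-in for the real data, not a measured distribution) compares average pass counts for the common-first and rare-first orderings:

```python
import random

def passes_for(s, order):
    """Pass-groups needed to empty one string, each digit-op in `order`
    firing once per pass and removing the trailing digit if it matches."""
    passes = 0
    while s:
        for d in order:
            if s and s[-1] == d:
                s = s[:-1]
        passes += 1
    return passes

def average_passes(strings, order):
    return sum(passes_for(s, order) for s in strings) / len(strings)

random.seed(1)
# Zero-padded numbers of varying magnitude: Benford-ish lead digits
# plus plenty of padding zeros, as in the original problem.
sample = [f"{random.randrange(10 ** random.randint(1, 6)):06d}"
          for _ in range(500)]

common_first = average_passes(sample, "0123456789")
rare_first = average_passes(sample, "9876543210")
print(f"0->9 average passes: {common_first:.2f}")
print(f"9->0 average passes: {rare_first:.2f}")
```

The padding zeros are what tilt it: a rare-first order leaves the common ops late in the pass, so a removal tends to expose a digit whose op hasn't fired yet, chaining several removals into one pass.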
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))