A fun problem: Biggest difference between elements

A place to discuss the science of computers and programs, from algorithms to computability.

Formal proofs preferred.

Moderators: phlip, Moderators General, Prelates

wraith
Posts: 98
Joined: Mon Aug 06, 2007 6:58 am UTC
Contact:

A fun problem: Biggest difference between elements

I stumbled upon this problem recently and couldn't solve it on my own. However I know the solution now, and it is a legitimate one (initially I doubted that this would be possible).

You have an array of real numbers in some random order. Devise an algorithm that calculates the biggest difference between adjacent numbers in their sorted order. The algorithm needs to work in linear time, so it's needless to say that sorting the input array is out of the question.

Example:

Input: {2, 1, 100, 102, 8, 5}
Output: 92
(If they were sorted, the biggest difference between adjacent numbers would be between 8 and 100)

Have fun
It's only when you look at an ant through a magnifying glass on a sunny day that you realize how often they burst into flames

Flumble
Yes Man
Posts: 2016
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: A fun problem: Biggest difference between elements

I thought of a very simple algorithm
Spoiler:
keep track of the bounds of the currently largest gap, so start with low=min(array[0..1]), high=max(array[0..1])
for every next element:
if the value is between the bounds, either move the low upwards or high downwards, depending on whether the value is closer to low or to high
if the value is greater than high and the distance is larger than the distance between the bounds, make the new bounds (low,high)=(high,value)
if the value is less than low and the distance is larger than the distance between the bounds, make the new bounds (low,high)=(value,low)

but it fails for arrays like

Code: Select all

`[6, 106, 205, 303, 400, 56, 155, 254]`

...so arrays of the form: have a larger gap (A) in front of a smaller gap (B), then have A split into multiple gaps all smaller than B, making it the largest gap.
Since keeping track of all the gaps (e.g. using an ordered data structure, so basically sorting the list) requires at least O(n log n) time, I need to throw more math at the problem.

*space reserved for posterity after more thinking*

scarecrovv
It's pronounced 'double u'
Posts: 674
Joined: Wed Jul 30, 2008 4:09 pm UTC
Location: California

Re: A fun problem: Biggest difference between elements

This might be excessively complicated, but it works.
Spoiler:
If you sort n items into n+1 bins then at least one bin is guaranteed to be empty. Since there is at least one empty bin then you know that the longest interval must be at least as long as a bin, and you can therefore just find runs of consecutive empty bins and then check the distance between the maximum of the bin before the run and the minimum of the bin after the run. This requires a lot of steps, but each can be done in O(n) or less.

Code: Select all

`#!/usr/bin/pythonimport collectionsdef biggest_difference(inputs):    # Make sure we've got floats.    xs = [float(x) for x in inputs]    # Going to sort items into bins, like the beginning of a radix sort but    # without sorting the individual bins later. Each bin just remembers the    # minimum and maximum value. Both are None for an empty bin.    Bin = collections.namedtuple('Bin', ['lo', 'hi'])    # More bins than items guarantees at least one empty bin, and it won't be    # the first or last.    # O(1)    n_bins = len(xs) + 1    # Find the minimum and maximum values, these will be the start of the first    # bin and end of the last bin.    # O(n)    minimum = min(xs)    maximum = max(xs)    # Determine the width of each bin.    # O(1)    bin_width = (maximum - minimum) / n_bins    # Create list of empty bins    # O(n)    bins = [Bin(lo=None, hi=None)] * n_bins    # Add items to bins.    # O(n)    for x in xs:        # Figure out which bin this item goes in.        bin_idx = int((x - minimum) / bin_width)        # The above calculation will put the maximum item in a bin past the end        # of the list. Shove it into the last bin instead.        if bin_idx == n_bins:            assert x == maximum            bin_idx -= 1        # Add the item to the bin.        prev_bin = bins[bin_idx]        if prev_bin.lo is None:            assert prev_bin.hi is None            bins[bin_idx] = Bin(lo=x, hi=x)        else:            assert prev_bin.hi is not None            if x < prev_bin.lo:                bins[bin_idx] = Bin(lo=x, hi=prev_bin.hi)            elif prev_bin.hi < x:                bins[bin_idx] = Bin(lo=prev_bin.lo, hi=x)    assert bins[0].lo == minimum    assert bins[n_bins - 1].hi == maximum    # Find all the runs of consecutive empty bins. There must be at least one.    # O(n)    EmptyBinRun = collections.namedtuple('EmptyBinRun', ['start', 'end'])    empty_bin_runs = []    this_empty_bin_run = None    for this_bin, idx in zip(bins, range(n_bins)):        if this_empty_bin_run is None:            if this_bin.lo is None:                assert this_bin.hi is None                this_empty_bin_run = EmptyBinRun(start=idx, end=idx)        else:            if this_bin.lo is None:                assert this_bin.hi is None                this_empty_bin_run = EmptyBinRun(start=this_empty_bin_run.start,                                                 end=idx)            else:                assert this_bin.hi is not None                empty_bin_runs.append(this_empty_bin_run)                this_empty_bin_run = None    if this_empty_bin_run is not None:        empty_bin_runs.append(this_empty_bin_run)    assert 0 < len(empty_bin_runs)    # Find the length of the longest empty bin run.    # O(n)    longest_empty_bin_run_length = 0    for empty_bin_run in empty_bin_runs:        length = empty_bin_run.end - empty_bin_run.start + 1        if longest_empty_bin_run_length < length:            longest_empty_bin_run_length = length    assert 0 < longest_empty_bin_run_length    # Find all the empty bin runs that are tied for longest.    # O(n)    longest_empty_bin_runs = []    for empty_bin_run in empty_bin_runs:        length = empty_bin_run.end - empty_bin_run.start + 1        assert length <= longest_empty_bin_run_length        if length == longest_empty_bin_run_length:            longest_empty_bin_runs.append(empty_bin_run)    assert 0 < len(longest_empty_bin_runs)    assert len(longest_empty_bin_runs) <= len(empty_bin_runs)    # For each of the longest empty bin runs, find the distance between the    # highest element of the previous bin and the lowest element of the next    # bin.    # O(n)    interval_lengths = []    for empty_bin_run in empty_bin_runs:        interval_lengths.append(bins[empty_bin_run.end + 1].lo -                                bins[empty_bin_run.start - 1].hi)    # Find the maximum interval length    # O(n)    return max(interval_lengths)input1 = [2, 1, 100, 102, 8, 5]input2 = [6, 106, 205, 303, 400, 56, 155, 254]print(biggest_difference(input1))print(biggest_difference(input2))`

edit: well it almost works.
Last edited by scarecrovv on Fri May 01, 2015 3:26 pm UTC, edited 1 time in total.

Nitrodon
Posts: 497
Joined: Wed Dec 19, 2007 5:11 pm UTC

Re: A fun problem: Biggest difference between elements

If I understand your algorithm correctly, it fails with the inputs {1, 10, 18, 26} and {1, 12, 22, 23, 25} (or any rearrangements thereof).
Spoiler:
The largest difference may include one fewer empty bin than the longest run. Fortunately, the algorithm still runs in O(n) time after accounting for this, even if the longest run is of length 1.

scarecrovv
It's pronounced 'double u'
Posts: 674
Joined: Wed Jul 30, 2008 4:09 pm UTC
Location: California

Re: A fun problem: Biggest difference between elements

Nitrodon wrote:If I understand your algorithm correctly, it fails with the inputs {1, 10, 18, 26} and {1, 12, 22, 23, 25} (or any rearrangements thereof).
Spoiler:
The largest difference may include one fewer empty bin than the longest run. Fortunately, the algorithm still runs in O(n) time after accounting for this, even if the longest run is of length 1.

So it does. Good catch.

edit: Actually due to a bug in my implementation it gets the right answer for your second sequence, but not your first one. Investigating...

edit 2: Here's a fixed version. I've learned my lesson and written a unit test. Also, despite my algorithm theoretically having a better asymptotic time complexity than python's built in sort function, a "naive" implementation actually performs far better in practice on inputs my computer can handle. I'm sure that for sufficiently large inputs my algorithm would win, but my computer can't handle inputs that big.
Spoiler:

Code: Select all

`#!/usr/bin/pythonimport timeimport randomimport collectionsdef biggest_difference(inputs):    assert 1 < len(inputs)    # Make sure we've got floats.    xs = [float(x) for x in inputs]    # Going to sort items into bins, like the beginning of a radix sort but    # without sorting the individual bins later. Each bin just remembers the    # minimum and maximum value. Both are None for an empty bin.    Bin = collections.namedtuple('Bin', ['lo', 'hi'])    # More bins than items guarantees at least one empty bin, and it won't be    # the first or last. However, while there will be at least one empty bin,    # and the longest interval must be at least as long as an empty bin, the    # start and end of the longest interval may be near opposing ends of    # adjacent bins, in which case there may be no complete bins contained in    # the interval. Doubling the number of bins cuts the length of the bins in    # half, ensuring that even an interval like this will contain a bin.    # O(1)    n_bins = 2 * (len(xs) + 1)    # Find the minimum and maximum values, these will be the start of the first    # bin and end of the last bin.    # O(n)    minimum = min(xs)    maximum = max(xs)    # Determine the width of each bin.    # O(1)    bin_width = (maximum - minimum) / n_bins    # Create list of empty bins    # O(n)    bins = [Bin(lo=None, hi=None)] * n_bins    # Add items to bins.    # O(n)    for x in xs:        # Figure out which bin this item goes in.        bin_idx = int((x - minimum) / bin_width)        # The above calculation will put the maximum item in a bin past the end        # of the list. Shove it into the last bin instead.        if bin_idx == n_bins:            assert x == maximum            bin_idx -= 1        # Add the item to the bin.        prev_bin = bins[bin_idx]        if prev_bin.lo is None:            assert prev_bin.hi is None            bins[bin_idx] = Bin(lo=x, hi=x)        else:            assert prev_bin.hi is not None            if x < prev_bin.lo:                bins[bin_idx] = Bin(lo=x, hi=prev_bin.hi)            elif prev_bin.hi < x:                bins[bin_idx] = Bin(lo=prev_bin.lo, hi=x)    assert bins[0].lo == minimum    assert bins[n_bins - 1].hi == maximum    # Find all the runs of consecutive empty bins. There must be at least one.    # O(n)    EmptyBinRun = collections.namedtuple('EmptyBinRun', ['start', 'end'])    empty_bin_runs = []    this_empty_bin_run = None    for this_bin, idx in zip(bins, range(n_bins)):        if this_empty_bin_run is None:            if this_bin.lo is None:                assert this_bin.hi is None                this_empty_bin_run = EmptyBinRun(start=idx, end=idx)        else:            if this_bin.lo is None:                assert this_bin.hi is None                this_empty_bin_run = EmptyBinRun(start=this_empty_bin_run.start,                                                 end=idx)            else:                assert this_bin.hi is not None                empty_bin_runs.append(this_empty_bin_run)                this_empty_bin_run = None    if this_empty_bin_run is not None:        empty_bin_runs.append(this_empty_bin_run)    assert 0 < len(empty_bin_runs)    # For each of the empty bin runs, find the distance between the highest    # element of the previous bin and the lowest element of the next bin.    # O(n)    interval_lengths = []    for empty_bin_run in empty_bin_runs:        interval_lengths.append(bins[empty_bin_run.end + 1].lo -                                bins[empty_bin_run.start - 1].hi)    # Find the maximum interval length    # O(n)    return max(interval_lengths)def inefficient_check_implementation(inputs):    # Make sure we've got floats.    xs = [float(x) for x in inputs]    # Find all the interval lengths by sorting.    interval_lengths = []    prev_x = None    for x in sorted(xs):        if prev_x is not None:            interval_lengths.append(x - prev_x)        prev_x = x    # Find the maximum interval length    return max(interval_lengths)efficient_time = {}inefficient_time = {}def test(n):    xs = [random.random() for i in range(n)]    start = time.time()    efficient_answer = biggest_difference(xs)    if n not in efficient_time:        efficient_time[n] = 0    efficient_time[n] += time.time() - start    start = time.time()    inefficient_answer = inefficient_check_implementation(xs)    if n not in inefficient_time:        inefficient_time[n] = 0    inefficient_time[n] += time.time() - start    assert efficient_answer == inefficient_answerprint('Basic tests')input1 = [2, 1, 100, 102, 8, 5]input2 = [6, 106, 205, 303, 400, 56, 155, 254]input3 = [1, 10, 18, 26]input4 = [1, 12, 22, 23, 25]print(biggest_difference(input1), inefficient_check_implementation(input1))print(biggest_difference(input2), inefficient_check_implementation(input2))print(biggest_difference(input3), inefficient_check_implementation(input3))print(biggest_difference(input4), inefficient_check_implementation(input4))print('Exhaustive tests')for i in range(1000):    test(2)    test(3)    test(4)    test(5)    test(10)    test(100)    test(1000)print('Testing performance')efficient_time = {}inefficient_time = {}for i in range(1):    test(10)    test(100)    test(1000)    test(10000)    test(100000)    test(1000000)    test(10000000)assert sorted(efficient_time.keys()) == sorted(inefficient_time.keys())for n in sorted(efficient_time.keys()):    print(n, efficient_time[n], inefficient_time[n], efficient_time[n] / inefficient_time[n])`

Output:

Code: Select all

`Basic tests92.0 92.097.0 97.09.0 9.011.0 11.0Exhaustive testsTesting performance10 0.0013508796691894531 1.3589859008789062e-05 99.40350877192982100 0.0020041465759277344 7.867813110351562e-05 25.4727272727272731000 0.008816242218017578 0.0009288787841796875 9.49127310061601610000 0.09358835220336914 0.00989985466003418 9.453507694530742100000 0.9028651714324951 0.1184701919555664 7.621032400885491000000 12.46615982055664 1.7965278625488281 6.93902949151606710000000 135.46248650550842 22.369660139083862 6.055634536388456`

Column 1 is the input size. Column 2 is my algorithm's time. Column 3 is the naive algorithm's time. Column 4 is the ratio of the two. It gets better with larger inputs, but python's built in sort is just so fast compared to my complex algorithm that it beats my algorithm by a factor of 6 even on a list of 10,000,000 elements.

Qaanol
The Cheshirest Catamount
Posts: 3058
Joined: Sat May 09, 2009 11:55 pm UTC

Re: A fun problem: Biggest difference between elements

Without reading any of the other posts, I’ve found a solution that uses 2n extra space. It uses that space very inefficiently though, so I expect better is possible:

Spoiler:
Given an array of n elements, traverse it once to find its min and max. If max == min, return 0. Otherwise:

Calculate the average gap size g = (max-min) / (n-1). The largest gap must be at least g.

We are going to “bin” the elements by which multiple of g they round down to, so that any two elements in the same bin are less than g apart and thus cannot form the largest gap.

Allocate two arrays of size n. Call one binMax, which will store the largest element in each bin, and initialize its entries to min. Call the other binMin, which will store the smallest element in each bin, and initialize its entries to max.

Go through the array a second time. For each element x, calculate b = floor((x-min) / g). This is the bin into which x falls. Update binMax[b] and binMin[b] with x as appropriate.

The largest gap cannot be within a bin, so it must be between some binMax[i] and the next binMin[j] for which binMin[j]<=binMax[j].

Traverse the bins once to find the largest such gap, and return it.
wee free kings

Derek
Posts: 2178
Joined: Wed Aug 18, 2010 4:15 am UTC

Re: A fun problem: Biggest difference between elements

This one is tough, but I think I have a solution:

Spoiler:
Given N unsorted elements, we can find the maximum and the minimum in O(N) time.

The largest gap must be at least (max - min)/(N - 1). This is pretty easy to see, but if you want to prove it I think you can use some variation of the mean value theorem, or perhaps the pigeon hole principle.

My strategy is to bucket the elements into O(N) buckets which cover a range of less than (max - min)/(N - 1) each. This can be done in O(1) time for each element, as I will show in the code. Then the elements on either side of the largest gap must be in separate buckets. Then the minimum and maximum of each bucket can be found in O(bucket size) time, or O(N) time total. Finally, we go through the list of buckets, finding the gap between the maximum of each bucket and the minimum of the next largest non-empty bucket. This takes O(N) time (since there are O(N) buckets). Then we return the maximum of those gaps, which is up to O(N). All put together, this is O(N).

Code: Select all

`import mathdef largest_gap(elems):   N = len(elems)   if N == 0:      return 0   lower = float(min(elems))   upper = float(max(elems))   diff = upper - lower   if diff == 0:      return 0      # Normalize elements so that min = 0 and max = N.   # This simplifies the math of bucketing.   normalization = float(N)/diff   normed_elems = [(e - lower)*normalization for e in elems]      # The largest gap is now >= N/(N-1) > 1      # Create buckets. The extra bucket holds upper   # (depending on how rounding works).   buckets = [[] for i in range(N+1)]      # Bucket elems.   for e in normed_elems:      bucket = buckets[int(math.floor(e))]      bucket.append(e)      # Compute gaps between buckets.   gaps = []   prev_bucket = None   for bucket in buckets:      if len(bucket) == 0:         continue      if prev_bucket:         gaps.append(min(bucket) - max(prev_bucket))      prev_bucket = bucket      return max(gaps)/normalization`

I've tested this on all the example inputs shown in this thread and it works.

EDIT: So yeah, I have the same algorithm as Scarecrovv and Qaanol. My code is shorter than Scarecrovv's though

Qaanol
The Cheshirest Catamount
Posts: 3058
Joined: Sat May 09, 2009 11:55 pm UTC

Re: A fun problem: Biggest difference between elements

On further reflection, it is in fact possible to make do with only n extra space, by cleverly re-using the original array.
wee free kings

Derek
Posts: 2178
Joined: Wed Aug 18, 2010 4:15 am UTC

Re: A fun problem: Biggest difference between elements

Qaanol wrote:On further reflection, it is in fact possible to make do with only n extra space, by cleverly re-using the original array.

Yeah, but that's kind of a dick thing to do for a function. Unless you're call-by-value, but who passes an array with call-by-value?

Qaanol
The Cheshirest Catamount
Posts: 3058
Joined: Sat May 09, 2009 11:55 pm UTC

Re: A fun problem: Biggest difference between elements

Okay fine then, it’s possible with only n extra space and no changes to the array, by making one extra pass over the data.
wee free kings