A fun problem: Biggest difference between elements

A place to discuss the science of computers and programs, from algorithms to computability.

Formal proofs preferred.

Moderators: phlip, Moderators General, Prelates

User avatar
wraith
Posts: 98
Joined: Mon Aug 06, 2007 6:58 am UTC
Contact:

A fun problem: Biggest difference between elements

Postby wraith » Thu Apr 30, 2015 9:19 am UTC

I stumbled upon this problem recently and couldn't solve it on my own. However I know the solution now, and it is a legitimate one (initially I doubted that this would be possible).

You have an array of real numbers in some random order. Devise an algorithm that calculates the biggest difference between adjacent numbers in their sorted order. The algorithm needs to work in linear time, so it's needless to say that sorting the input array is out of the question.

Example:

Input: {2, 1, 100, 102, 8, 5}
Output: 92
(If they were sorted, the biggest difference between adjacent numbers would be between 8 and 100)

Have fun :)
It's only when you look at an ant through a magnifying glass on a sunny day that you realize how often they burst into flames

User avatar
Flumble
Yes Man
Posts: 1996
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: A fun problem: Biggest difference between elements

Postby Flumble » Thu Apr 30, 2015 2:05 pm UTC

I thought of a very simple algorithm
Spoiler:
keep track of the bounds of the currently largest gap, so start with low=min(array[0..1]), high=max(array[0..1])
for every next element:
if the value is between the bounds, either move the low upwards or high downwards, depending on whether the value is closer to low or to high
if the value is greater than high and the distance is larger than the distance between the bounds, make the new bounds (low,high)=(high,value)
if the value is less than low and the distance is larger than the distance between the bounds, make the new bounds (low,high)=(value,low)

but it fails for arrays like

Code: Select all

[6, 106, 205, 303, 400, 56, 155, 254]

...so arrays of the form: have a larger gap (A) in front of a smaller gap (B), then have A split into multiple gaps all smaller than B, making it the largest gap.
Since keeping track of all the gaps (e.g. using an ordered data structure, so basically sorting the list) requires at least O(n log n) time, I need to throw more math at the problem.

*space reserved for posterity after more thinking*

User avatar
scarecrovv
It's pronounced 'double u'
Posts: 674
Joined: Wed Jul 30, 2008 4:09 pm UTC
Location: California

Re: A fun problem: Biggest difference between elements

Postby scarecrovv » Fri May 01, 2015 6:09 am UTC

This might be excessively complicated, but it works.
Spoiler:
If you sort n items into n+1 bins then at least one bin is guaranteed to be empty. Since there is at least one empty bin then you know that the longest interval must be at least as long as a bin, and you can therefore just find runs of consecutive empty bins and then check the distance between the maximum of the bin before the run and the minimum of the bin after the run. This requires a lot of steps, but each can be done in O(n) or less.

Code: Select all

#!/usr/bin/python

import collections

def biggest_difference(inputs):
    # Make sure we've got floats.
    xs = [float(x) for x in inputs]

    # Going to sort items into bins, like the beginning of a radix sort but
    # without sorting the individual bins later. Each bin just remembers the
    # minimum and maximum value. Both are None for an empty bin.
    Bin = collections.namedtuple('Bin', ['lo', 'hi'])

    # More bins than items guarantees at least one empty bin, and it won't be
    # the first or last.
    # O(1)
    n_bins = len(xs) + 1

    # Find the minimum and maximum values, these will be the start of the first
    # bin and end of the last bin.
    # O(n)
    minimum = min(xs)
    maximum = max(xs)

    # Determine the width of each bin.
    # O(1)
    bin_width = (maximum - minimum) / n_bins

    # Create list of empty bins
    # O(n)
    bins = [Bin(lo=None, hi=None)] * n_bins

    # Add items to bins.
    # O(n)
    for x in xs:
        # Figure out which bin this item goes in.
        bin_idx = int((x - minimum) / bin_width)

        # The above calculation will put the maximum item in a bin past the end
        # of the list. Shove it into the last bin instead.
        if bin_idx == n_bins:
            assert x == maximum
            bin_idx -= 1

        # Add the item to the bin.
        prev_bin = bins[bin_idx]
        if prev_bin.lo is None:
            assert prev_bin.hi is None
            bins[bin_idx] = Bin(lo=x, hi=x)
        else:
            assert prev_bin.hi is not None
            if x < prev_bin.lo:
                bins[bin_idx] = Bin(lo=x, hi=prev_bin.hi)
            elif prev_bin.hi < x:
                bins[bin_idx] = Bin(lo=prev_bin.lo, hi=x)

    assert bins[0].lo == minimum
    assert bins[n_bins - 1].hi == maximum

    # Find all the runs of consecutive empty bins. There must be at least one.
    # O(n)
    EmptyBinRun = collections.namedtuple('EmptyBinRun', ['start', 'end'])
    empty_bin_runs = []
    this_empty_bin_run = None
    for this_bin, idx in zip(bins, range(n_bins)):
        if this_empty_bin_run is None:
            if this_bin.lo is None:
                assert this_bin.hi is None
                this_empty_bin_run = EmptyBinRun(start=idx, end=idx)
        else:
            if this_bin.lo is None:
                assert this_bin.hi is None
                this_empty_bin_run = EmptyBinRun(start=this_empty_bin_run.start,
                                                 end=idx)
            else:
                assert this_bin.hi is not None
                empty_bin_runs.append(this_empty_bin_run)
                this_empty_bin_run = None
    if this_empty_bin_run is not None:
        empty_bin_runs.append(this_empty_bin_run)
    assert 0 < len(empty_bin_runs)

    # Find the length of the longest empty bin run.
    # O(n)
    longest_empty_bin_run_length = 0
    for empty_bin_run in empty_bin_runs:
        length = empty_bin_run.end - empty_bin_run.start + 1
        if longest_empty_bin_run_length < length:
            longest_empty_bin_run_length = length
    assert 0 < longest_empty_bin_run_length

    # Find all the empty bin runs that are tied for longest.
    # O(n)
    longest_empty_bin_runs = []
    for empty_bin_run in empty_bin_runs:
        length = empty_bin_run.end - empty_bin_run.start + 1
        assert length <= longest_empty_bin_run_length
        if length == longest_empty_bin_run_length:
            longest_empty_bin_runs.append(empty_bin_run)
    assert 0 < len(longest_empty_bin_runs)
    assert len(longest_empty_bin_runs) <= len(empty_bin_runs)

    # For each of the longest empty bin runs, find the distance between the
    # highest element of the previous bin and the lowest element of the next
    # bin.
    # O(n)
    interval_lengths = []
    for empty_bin_run in empty_bin_runs:
        interval_lengths.append(bins[empty_bin_run.end + 1].lo -
                                bins[empty_bin_run.start - 1].hi)

    # Find the maximum interval length
    # O(n)
    return max(interval_lengths)

input1 = [2, 1, 100, 102, 8, 5]
input2 = [6, 106, 205, 303, 400, 56, 155, 254]

print(biggest_difference(input1))
print(biggest_difference(input2))

edit: well it almost works.
Last edited by scarecrovv on Fri May 01, 2015 3:26 pm UTC, edited 1 time in total.

Nitrodon
Posts: 497
Joined: Wed Dec 19, 2007 5:11 pm UTC

Re: A fun problem: Biggest difference between elements

Postby Nitrodon » Fri May 01, 2015 2:29 pm UTC

If I understand your algorithm correctly, it fails with the inputs {1, 10, 18, 26} and {1, 12, 22, 23, 25} (or any rearrangements thereof).
Spoiler:
The largest difference may include one fewer empty bin than the longest run. Fortunately, the algorithm still runs in O(n) time after accounting for this, even if the longest run is of length 1.

User avatar
scarecrovv
It's pronounced 'double u'
Posts: 674
Joined: Wed Jul 30, 2008 4:09 pm UTC
Location: California

Re: A fun problem: Biggest difference between elements

Postby scarecrovv » Fri May 01, 2015 3:25 pm UTC

Nitrodon wrote:If I understand your algorithm correctly, it fails with the inputs {1, 10, 18, 26} and {1, 12, 22, 23, 25} (or any rearrangements thereof).
Spoiler:
The largest difference may include one fewer empty bin than the longest run. Fortunately, the algorithm still runs in O(n) time after accounting for this, even if the longest run is of length 1.

So it does. Good catch.

edit: Actually due to a bug in my implementation it gets the right answer for your second sequence, but not your first one. Investigating...

edit 2: Here's a fixed version. I've learned my lesson and written a unit test. Also, despite my algorithm theoretically having a better asymptotic time complexity than python's built in sort function, a "naive" implementation actually performs far better in practice on inputs my computer can handle. I'm sure that for sufficiently large inputs my algorithm would win, but my computer can't handle inputs that big.
Spoiler:

Code: Select all

#!/usr/bin/python

import time
import random
import collections

def biggest_difference(inputs):
    assert 1 < len(inputs)

    # Make sure we've got floats.
    xs = [float(x) for x in inputs]

    # Going to sort items into bins, like the beginning of a radix sort but
    # without sorting the individual bins later. Each bin just remembers the
    # minimum and maximum value. Both are None for an empty bin.
    Bin = collections.namedtuple('Bin', ['lo', 'hi'])

    # More bins than items guarantees at least one empty bin, and it won't be
    # the first or last. However, while there will be at least one empty bin,
    # and the longest interval must be at least as long as an empty bin, the
    # start and end of the longest interval may be near opposing ends of
    # adjacent bins, in which case there may be no complete bins contained in
    # the interval. Doubling the number of bins cuts the length of the bins in
    # half, ensuring that even an interval like this will contain a bin.
    # O(1)
    n_bins = 2 * (len(xs) + 1)

    # Find the minimum and maximum values, these will be the start of the first
    # bin and end of the last bin.
    # O(n)
    minimum = min(xs)
    maximum = max(xs)

    # Determine the width of each bin.
    # O(1)
    bin_width = (maximum - minimum) / n_bins

    # Create list of empty bins
    # O(n)
    bins = [Bin(lo=None, hi=None)] * n_bins

    # Add items to bins.
    # O(n)
    for x in xs:
        # Figure out which bin this item goes in.
        bin_idx = int((x - minimum) / bin_width)

        # The above calculation will put the maximum item in a bin past the end
        # of the list. Shove it into the last bin instead.
        if bin_idx == n_bins:
            assert x == maximum
            bin_idx -= 1

        # Add the item to the bin.
        prev_bin = bins[bin_idx]
        if prev_bin.lo is None:
            assert prev_bin.hi is None
            bins[bin_idx] = Bin(lo=x, hi=x)
        else:
            assert prev_bin.hi is not None
            if x < prev_bin.lo:
                bins[bin_idx] = Bin(lo=x, hi=prev_bin.hi)
            elif prev_bin.hi < x:
                bins[bin_idx] = Bin(lo=prev_bin.lo, hi=x)

    assert bins[0].lo == minimum
    assert bins[n_bins - 1].hi == maximum

    # Find all the runs of consecutive empty bins. There must be at least one.
    # O(n)
    EmptyBinRun = collections.namedtuple('EmptyBinRun', ['start', 'end'])
    empty_bin_runs = []
    this_empty_bin_run = None
    for this_bin, idx in zip(bins, range(n_bins)):
        if this_empty_bin_run is None:
            if this_bin.lo is None:
                assert this_bin.hi is None
                this_empty_bin_run = EmptyBinRun(start=idx, end=idx)
        else:
            if this_bin.lo is None:
                assert this_bin.hi is None
                this_empty_bin_run = EmptyBinRun(start=this_empty_bin_run.start,
                                                 end=idx)
            else:
                assert this_bin.hi is not None
                empty_bin_runs.append(this_empty_bin_run)
                this_empty_bin_run = None
    if this_empty_bin_run is not None:
        empty_bin_runs.append(this_empty_bin_run)
    assert 0 < len(empty_bin_runs)

    # For each of the empty bin runs, find the distance between the highest
    # element of the previous bin and the lowest element of the next bin.
    # O(n)
    interval_lengths = []
    for empty_bin_run in empty_bin_runs:
        interval_lengths.append(bins[empty_bin_run.end + 1].lo -
                                bins[empty_bin_run.start - 1].hi)

    # Find the maximum interval length
    # O(n)
    return max(interval_lengths)

def inefficient_check_implementation(inputs):
    # Make sure we've got floats.
    xs = [float(x) for x in inputs]

    # Find all the interval lengths by sorting.
    interval_lengths = []
    prev_x = None
    for x in sorted(xs):
        if prev_x is not None:
            interval_lengths.append(x - prev_x)
        prev_x = x

    # Find the maximum interval length
    return max(interval_lengths)

efficient_time = {}
inefficient_time = {}
def test(n):
    xs = [random.random() for i in range(n)]

    start = time.time()
    efficient_answer = biggest_difference(xs)
    if n not in efficient_time:
        efficient_time[n] = 0
    efficient_time[n] += time.time() - start

    start = time.time()
    inefficient_answer = inefficient_check_implementation(xs)
    if n not in inefficient_time:
        inefficient_time[n] = 0
    inefficient_time[n] += time.time() - start

    assert efficient_answer == inefficient_answer

print('Basic tests')
input1 = [2, 1, 100, 102, 8, 5]
input2 = [6, 106, 205, 303, 400, 56, 155, 254]
input3 = [1, 10, 18, 26]
input4 = [1, 12, 22, 23, 25]

print(biggest_difference(input1), inefficient_check_implementation(input1))
print(biggest_difference(input2), inefficient_check_implementation(input2))
print(biggest_difference(input3), inefficient_check_implementation(input3))
print(biggest_difference(input4), inefficient_check_implementation(input4))

print('Exhaustive tests')
for i in range(1000):
    test(2)
    test(3)
    test(4)
    test(5)
    test(10)
    test(100)
    test(1000)

print('Testing performance')
efficient_time = {}
inefficient_time = {}
for i in range(1):
    test(10)
    test(100)
    test(1000)
    test(10000)
    test(100000)
    test(1000000)
    test(10000000)

assert sorted(efficient_time.keys()) == sorted(inefficient_time.keys())

for n in sorted(efficient_time.keys()):
    print(n, efficient_time[n], inefficient_time[n], efficient_time[n] / inefficient_time[n])

Output:

Code: Select all

Basic tests
92.0 92.0
97.0 97.0
9.0 9.0
11.0 11.0
Exhaustive tests
Testing performance
10 0.0013508796691894531 1.3589859008789062e-05 99.40350877192982
100 0.0020041465759277344 7.867813110351562e-05 25.472727272727273
1000 0.008816242218017578 0.0009288787841796875 9.491273100616016
10000 0.09358835220336914 0.00989985466003418 9.453507694530742
100000 0.9028651714324951 0.1184701919555664 7.62103240088549
1000000 12.46615982055664 1.7965278625488281 6.939029491516067
10000000 135.46248650550842 22.369660139083862 6.055634536388456

Column 1 is the input size. Column 2 is my algorithm's time. Column 3 is the naive algorithm's time. Column 4 is the ratio of the two. It gets better with larger inputs, but python's built in sort is just so fast compared to my complex algorithm that it beats my algorithm by a factor of 6 even on a list of 10,000,000 elements.

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3057
Joined: Sat May 09, 2009 11:55 pm UTC

Re: A fun problem: Biggest difference between elements

Postby Qaanol » Fri May 01, 2015 10:33 pm UTC

Without reading any of the other posts, I’ve found a solution that uses 2n extra space. It uses that space very inefficiently though, so I expect better is possible:

Spoiler:
Given an array of n elements, traverse it once to find its min and max. If max == min, return 0. Otherwise:

Calculate the average gap size g = (max-min) / (n-1). The largest gap must be at least g.

We are going to “bin” the elements by which multiple of g they round down to, so that any two elements in the same bin are less than g apart and thus cannot form the largest gap.

Allocate two arrays of size n. Call one binMax, which will store the largest element in each bin, and initialize its entries to min. Call the other binMin, which will store the smallest element in each bin, and initialize its entries to max.

Go through the array a second time. For each element x, calculate b = floor((x-min) / g). This is the bin into which x falls. Update binMax[b] and binMin[b] with x as appropriate.

The largest gap cannot be within a bin, so it must be between some binMax[i] and the next binMin[j] for which binMin[j]<=binMax[j].

Traverse the bins once to find the largest such gap, and return it.
wee free kings

Derek
Posts: 2176
Joined: Wed Aug 18, 2010 4:15 am UTC

Re: A fun problem: Biggest difference between elements

Postby Derek » Sat May 02, 2015 1:11 am UTC

This one is tough, but I think I have a solution:

Spoiler:
Given N unsorted elements, we can find the maximum and the minimum in O(N) time.

The largest gap must be at least (max - min)/(N - 1). This is pretty easy to see, but if you want to prove it I think you can use some variation of the mean value theorem, or perhaps the pigeon hole principle.

My strategy is to bucket the elements into O(N) buckets which cover a range of less than (max - min)/(N - 1) each. This can be done in O(1) time for each element, as I will show in the code. Then the elements on either side of the largest gap must be in separate buckets. Then the minimum and maximum of each bucket can be found in O(bucket size) time, or O(N) time total. Finally, we go through the list of buckets, finding the gap between the maximum of each bucket and the minimum of the next largest non-empty bucket. This takes O(N) time (since there are O(N) buckets). Then we return the maximum of those gaps, which is up to O(N). All put together, this is O(N).

Code: Select all

import math

def largest_gap(elems):
   N = len(elems)
   if N == 0:
      return 0
   lower = float(min(elems))
   upper = float(max(elems))
   diff = upper - lower
   if diff == 0:
      return 0
   
   # Normalize elements so that min = 0 and max = N.
   # This simplifies the math of bucketing.
   normalization = float(N)/diff
   normed_elems = [(e - lower)*normalization for e in elems]
   
   # The largest gap is now >= N/(N-1) > 1
   
   # Create buckets. The extra bucket holds upper
   # (depending on how rounding works).
   buckets = [[] for i in range(N+1)]
   
   # Bucket elems.
   for e in normed_elems:
      bucket = buckets[int(math.floor(e))]
      bucket.append(e)
   
   # Compute gaps between buckets.
   gaps = []
   prev_bucket = None
   for bucket in buckets:
      if len(bucket) == 0:
         continue
      if prev_bucket:
         gaps.append(min(bucket) - max(prev_bucket))
      prev_bucket = bucket
   
   return max(gaps)/normalization


I've tested this on all the example inputs shown in this thread and it works.

EDIT: So yeah, I have the same algorithm as Scarecrovv and Qaanol. My code is shorter than Scarecrovv's though :P

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3057
Joined: Sat May 09, 2009 11:55 pm UTC

Re: A fun problem: Biggest difference between elements

Postby Qaanol » Sat May 02, 2015 4:21 am UTC

On further reflection, it is in fact possible to make do with only n extra space, by cleverly re-using the original array.
wee free kings

Derek
Posts: 2176
Joined: Wed Aug 18, 2010 4:15 am UTC

Re: A fun problem: Biggest difference between elements

Postby Derek » Sat May 02, 2015 7:07 am UTC

Qaanol wrote:On further reflection, it is in fact possible to make do with only n extra space, by cleverly re-using the original array.

Yeah, but that's kind of a dick thing to do for a function. Unless you're call-by-value, but who passes an array with call-by-value?

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3057
Joined: Sat May 09, 2009 11:55 pm UTC

Re: A fun problem: Biggest difference between elements

Postby Qaanol » Sat May 02, 2015 8:33 am UTC

Okay fine then, it’s possible with only n extra space and no changes to the array, by making one extra pass over the data.
wee free kings


Return to “Computer Science”

Who is online

Users browsing this forum: No registered users and 5 guests