Memoizing a generator in Python

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

User avatar
LucasBrown
Posts: 293
Joined: Thu Apr 15, 2010 2:57 am UTC
Location: Poway, CA

Memoizing a generator in Python

Postby LucasBrown » Wed Jun 08, 2016 11:50 pm UTC

I'd like to memoize a generator in Python. Specifically, I have an incremental sieve of Eratosthenes that gets called from scratch a lot, and I'd like to avoid recomputing (up to some arbitrary finite number of) the numbers it yields. Memoizing a pure function is easy enough, and there are any number of ways to do it, but since this is a generator, and since Python doesn't seem to allow copying generators, things get a bit trickier.

What I want to do is something like the following:
Spoiler:

Code: Select all

nsprimes = 10**6    # We'll cache this many values.

def primegen(limit=None):
    ...?

for p in primegen():
    ...
            break
    ...

# While running that loop we yielded 120 primes from primegen().
# The cache of primes now contains 120 values.

for p in primegen(17):
    ...
    # While running this loop, we don't recompute any primes,
    # since they've already been cached.
    ...

for p in primegen(10**8):
    ...
    # There are over 5,000,000 primes < 10**8.  So while
    # running this loop, we produce the first 120 primes
    # quickly, since they're already cached, then compute
    # and cache the next 998,980 primes, and then
    # compute but don't cache the remaining primes < 10**8.
    ...

for p in primegen(10**6):
    ...
    # This time, we don't recompute anything, since
    # there are < 1,000,000 primes under 1,000,000,
    # and since these values are already cached.
    ...

for p in primegen(10**8):
    ...
    # Again, we don't recompute the first million primes
    # since they've already been cached, but since the loop
    # will cycle over more than 5,000,000 primes, we need
    # to recompute the primes after the first million.
    ...


If I had infinite memory, then I wouldn't have to care about the cache size: I'd be able to do

Code: Select all

def primer():
   # Generates the primes.
   ...

smallprimes = []    # We'll cache the primes here.
_primer = primer()

def primegen(limit=None):
    # Generates primes < limit, or all of them if limit == None.
    # Caches the results along the way.
    p = 0
    for p in smallprimes:
        if (limit is None) or (p < limit): yield p
        else: return
    for p in _primer:
        smallprimes.append(p)
        if (limit is None) or (p < limit): yield p
        else: return


But since memory is a finite resource, I have to worry about the cache size. If we could copy generators, then this would still be a fairly easy thing to do: a solution might resemble

Code: Select all

def primer():
    # Generates the primes.
    ...

nsprimes = 10**6    # We'll cache this many values.
smallprimes = []    # We'll cache the primes here.
_primer = primer()

def primegen(limit=None):
    # Generates primes < limit, or all of them if limit == None.
    # Caches the first nsprimes results along the way.
    p = 0
    for p in smallprimes:
        if (limit is None) or (p < limit): yield p
        else: return
    for p in _primer.copy():
        smallprimes.append(p)
        if (limit is None) or (p < limit): yield p
        else: return

Unfortunately, generators don't seem to be copyable: they don't support the .copy() attribute, using the copy module also throws errors, and I even found a page in the Python docs that indicates that generators can't be copied.

There's this recipe, but that uses a bytecode hack and isn't portable across versions. The pickle module might provide a solution, but that also seems rather hacky to me, and I avoid it on general principles because the pickle module isn't secure.

Is there a "natural" way to do what I want to do?

User avatar
You, sir, name?
Posts: 6974
Joined: Sun Apr 22, 2007 10:07 am UTC
Location: Chako Paul City
Contact:

Re: Memoizing a generator in Python

Postby You, sir, name? » Thu Jun 09, 2016 12:17 am UTC

Had a look at the solutions here? http://stackoverflow.com/questions/4566 ... -generator

They don't copy the generator, but use itertools.tee to achieve the memoization.
I edit my posts a lot and sometimes the words wrong order words appear in sentences get messed up.


Return to “Coding”

Who is online

Users browsing this forum: No registered users and 13 guests