diff options
Diffstat (limited to 'Doc/library/itertools.rst')
-rw-r--r-- | Doc/library/itertools.rst | 547 |
1 files changed, 547 insertions, 0 deletions
diff --git a/Doc/library/itertools.rst b/Doc/library/itertools.rst new file mode 100644 index 0000000..9f9cb24 --- /dev/null +++ b/Doc/library/itertools.rst @@ -0,0 +1,547 @@ + +:mod:`itertools` --- Functions creating iterators for efficient looping +======================================================================= + +.. module:: itertools + :synopsis: Functions creating iterators for efficient looping. +.. moduleauthor:: Raymond Hettinger <python@rcn.com> +.. sectionauthor:: Raymond Hettinger <python@rcn.com> + + +.. versionadded:: 2.3 + +This module implements a number of iterator building blocks inspired by +constructs from the Haskell and SML programming languages. Each has been recast +in a form suitable for Python. + +The module standardizes a core set of fast, memory efficient tools that are +useful by themselves or in combination. Standardization helps avoid the +readability and reliability problems which arise when many different individuals +create their own slightly varying implementations, each with their own quirks +and naming conventions. + +The tools are designed to combine readily with one another. This makes it easy +to construct more specialized tools succinctly and efficiently in pure Python. + +For instance, SML provides a tabulation tool: ``tabulate(f)`` which produces a +sequence ``f(0), f(1), ...``. This toolbox provides :func:`imap` and +:func:`count` which can be combined to form ``imap(f, count())`` and produce an +equivalent result. + +Likewise, the functional tools are designed to work well with the high-speed +functions provided by the :mod:`operator` module. + +The module author welcomes suggestions for other basic building blocks to be +added to future versions of the module. + +Whether cast in pure python form or compiled code, tools that use iterators are +more memory efficient (and faster) than their list based counterparts. Adopting +the principles of just-in-time manufacturing, they create data when and where +needed instead of consuming memory with the computer equivalent of "inventory". + +The performance advantage of iterators becomes more acute as the number of +elements increases -- at some point, lists grow large enough to severely impact +memory cache performance and start running slowly. + + +.. seealso:: + + The Standard ML Basis Library, `The Standard ML Basis Library + <http://www.standardml.org/Basis/>`_. + + Haskell, A Purely Functional Language, `Definition of Haskell and the Standard + Libraries <http://www.haskell.org/definition/>`_. + + +.. _itertools-functions: + +Itertool functions +------------------ + +The following module functions all construct and return iterators. Some provide +streams of infinite length, so they should only be accessed by functions or +loops that truncate the stream. + + +.. function:: chain(*iterables) + + Make an iterator that returns elements from the first iterable until it is + exhausted, then proceeds to the next iterable, until all of the iterables are + exhausted. Used for treating consecutive sequences as a single sequence. + Equivalent to:: + + def chain(*iterables): + for it in iterables: + for element in it: + yield element + + +.. function:: count([n]) + + Make an iterator that returns consecutive integers starting with *n*. If not + specified *n* defaults to zero. Does not currently support python long + integers. Often used as an argument to :func:`imap` to generate consecutive + data points. Also, used with :func:`izip` to add sequence numbers. Equivalent + to:: + + def count(n=0): + while True: + yield n + n += 1 + + Note, :func:`count` does not check for overflow and will return negative numbers + after exceeding ``sys.maxint``. This behavior may change in the future. + + +.. function:: cycle(iterable) + + Make an iterator returning elements from the iterable and saving a copy of each. + When the iterable is exhausted, return elements from the saved copy. Repeats + indefinitely. Equivalent to:: + + def cycle(iterable): + saved = [] + for element in iterable: + yield element + saved.append(element) + while saved: + for element in saved: + yield element + + Note, this member of the toolkit may require significant auxiliary storage + (depending on the length of the iterable). + + +.. function:: dropwhile(predicate, iterable) + + Make an iterator that drops elements from the iterable as long as the predicate + is true; afterwards, returns every element. Note, the iterator does not produce + *any* output until the predicate first becomes false, so it may have a lengthy + start-up time. Equivalent to:: + + def dropwhile(predicate, iterable): + iterable = iter(iterable) + for x in iterable: + if not predicate(x): + yield x + break + for x in iterable: + yield x + + +.. function:: groupby(iterable[, key]) + + Make an iterator that returns consecutive keys and groups from the *iterable*. + The *key* is a function computing a key value for each element. If not + specified or is ``None``, *key* defaults to an identity function and returns + the element unchanged. Generally, the iterable needs to already be sorted on + the same key function. + + The operation of :func:`groupby` is similar to the ``uniq`` filter in Unix. It + generates a break or new group every time the value of the key function changes + (which is why it is usually necessary to have sorted the data using the same key + function). That behavior differs from SQL's GROUP BY which aggregates common + elements regardless of their input order. + + The returned group is itself an iterator that shares the underlying iterable + with :func:`groupby`. Because the source is shared, when the :func:`groupby` + object is advanced, the previous group is no longer visible. So, if that data + is needed later, it should be stored as a list:: + + groups = [] + uniquekeys = [] + data = sorted(data, key=keyfunc) + for k, g in groupby(data, keyfunc): + groups.append(list(g)) # Store group iterator as a list + uniquekeys.append(k) + + :func:`groupby` is equivalent to:: + + class groupby(object): + def __init__(self, iterable, key=None): + if key is None: + key = lambda x: x + self.keyfunc = key + self.it = iter(iterable) + self.tgtkey = self.currkey = self.currvalue = [] + def __iter__(self): + return self + def __next__(self): + while self.currkey == self.tgtkey: + self.currvalue = next(self.it) # Exit on StopIteration + self.currkey = self.keyfunc(self.currvalue) + self.tgtkey = self.currkey + return (self.currkey, self._grouper(self.tgtkey)) + def _grouper(self, tgtkey): + while self.currkey == tgtkey: + yield self.currvalue + self.currvalue = next(self.it) # Exit on StopIteration + self.currkey = self.keyfunc(self.currvalue) + + .. versionadded:: 2.4 + + +.. function:: ifilter(predicate, iterable) + + Make an iterator that filters elements from iterable returning only those for + which the predicate is ``True``. If *predicate* is ``None``, return the items + that are true. Equivalent to:: + + def ifilter(predicate, iterable): + if predicate is None: + predicate = bool + for x in iterable: + if predicate(x): + yield x + + +.. function:: ifilterfalse(predicate, iterable) + + Make an iterator that filters elements from iterable returning only those for + which the predicate is ``False``. If *predicate* is ``None``, return the items + that are false. Equivalent to:: + + def ifilterfalse(predicate, iterable): + if predicate is None: + predicate = bool + for x in iterable: + if not predicate(x): + yield x + + +.. function:: imap(function, *iterables) + + Make an iterator that computes the function using arguments from each of the + iterables. If *function* is set to ``None``, then :func:`imap` returns the + arguments as a tuple. Like :func:`map` but stops when the shortest iterable is + exhausted instead of filling in ``None`` for shorter iterables. The reason for + the difference is that infinite iterator arguments are typically an error for + :func:`map` (because the output is fully evaluated) but represent a common and + useful way of supplying arguments to :func:`imap`. Equivalent to:: + + def imap(function, *iterables): + iterables = map(iter, iterables) + while True: + args = [next(i) for i in iterables] + if function is None: + yield tuple(args) + else: + yield function(*args) + + +.. function:: islice(iterable, [start,] stop [, step]) + + Make an iterator that returns selected elements from the iterable. If *start* is + non-zero, then elements from the iterable are skipped until start is reached. + Afterward, elements are returned consecutively unless *step* is set higher than + one which results in items being skipped. If *stop* is ``None``, then iteration + continues until the iterator is exhausted, if at all; otherwise, it stops at the + specified position. Unlike regular slicing, :func:`islice` does not support + negative values for *start*, *stop*, or *step*. Can be used to extract related + fields from data where the internal structure has been flattened (for example, a + multi-line report may list a name field on every third line). Equivalent to:: + + def islice(iterable, *args): + s = slice(*args) + it = iter(range(s.start or 0, s.stop or sys.maxint, s.step or 1)) + nexti = next(it) + for i, element in enumerate(iterable): + if i == nexti: + yield element + nexti = next(it) + + If *start* is ``None``, then iteration starts at zero. If *step* is ``None``, + then the step defaults to one. + + .. versionchanged:: 2.5 + accept ``None`` values for default *start* and *step*. + + +.. function:: izip(*iterables) + + Make an iterator that aggregates elements from each of the iterables. Like + :func:`zip` except that it returns an iterator instead of a list. Used for + lock-step iteration over several iterables at a time. Equivalent to:: + + def izip(*iterables): + iterables = map(iter, iterables) + while iterables: + result = [next(it) for it in iterables] + yield tuple(result) + + .. versionchanged:: 2.4 + When no iterables are specified, returns a zero length iterator instead of + raising a :exc:`TypeError` exception. + + Note, the left-to-right evaluation order of the iterables is guaranteed. This + makes possible an idiom for clustering a data series into n-length groups using + ``izip(*[iter(s)]*n)``. For data that doesn't fit n-length groups exactly, the + last tuple can be pre-padded with fill values using ``izip(*[chain(s, + [None]*(n-1))]*n)``. + + Note, when :func:`izip` is used with unequal length inputs, subsequent + iteration over the longer iterables cannot reliably be continued after + :func:`izip` terminates. Potentially, up to one entry will be missing from + each of the left-over iterables. This occurs because a value is fetched from + each iterator in- turn, but the process ends when one of the iterators + terminates. This leaves the last fetched values in limbo (they cannot be + returned in a final, incomplete tuple and they are cannot be pushed back into + the iterator for retrieval with ``next(it)``). In general, :func:`izip` + should only be used with unequal length inputs when you don't care about + trailing, unmatched values from the longer iterables. + + +.. function:: izip_longest(*iterables[, fillvalue]) + + Make an iterator that aggregates elements from each of the iterables. If the + iterables are of uneven length, missing values are filled-in with *fillvalue*. + Iteration continues until the longest iterable is exhausted. Equivalent to:: + + def izip_longest(*args, **kwds): + fillvalue = kwds.get('fillvalue') + def sentinel(counter = ([fillvalue]*(len(args)-1)).pop): + yield counter() # yields the fillvalue, or raises IndexError + fillers = repeat(fillvalue) + iters = [chain(it, sentinel(), fillers) for it in args] + try: + for tup in izip(*iters): + yield tup + except IndexError: + pass + + If one of the iterables is potentially infinite, then the :func:`izip_longest` + function should be wrapped with something that limits the number of calls (for + example :func:`islice` or :func:`takewhile`). + + .. versionadded:: 2.6 + + +.. function:: repeat(object[, times]) + + Make an iterator that returns *object* over and over again. Runs indefinitely + unless the *times* argument is specified. Used as argument to :func:`imap` for + invariant parameters to the called function. Also used with :func:`izip` to + create an invariant part of a tuple record. Equivalent to:: + + def repeat(object, times=None): + if times is None: + while True: + yield object + else: + for i in range(times): + yield object + + +.. function:: starmap(function, iterable) + + Make an iterator that computes the function using arguments tuples obtained from + the iterable. Used instead of :func:`imap` when argument parameters are already + grouped in tuples from a single iterable (the data has been "pre-zipped"). The + difference between :func:`imap` and :func:`starmap` parallels the distinction + between ``function(a,b)`` and ``function(*c)``. Equivalent to:: + + def starmap(function, iterable): + iterable = iter(iterable) + while True: + yield function(*next(iterable)) + + +.. function:: takewhile(predicate, iterable) + + Make an iterator that returns elements from the iterable as long as the + predicate is true. Equivalent to:: + + def takewhile(predicate, iterable): + for x in iterable: + if predicate(x): + yield x + else: + break + + +.. function:: tee(iterable[, n=2]) + + Return *n* independent iterators from a single iterable. The case where ``n==2`` + is equivalent to:: + + def tee(iterable): + def gen(next, data={}, cnt=[0]): + for i in count(): + if i == cnt[0]: + item = data[i] = next() + cnt[0] += 1 + else: + item = data.pop(i) + yield item + it = iter(iterable) + return (gen(it.__next__), gen(it.__next__)) + + Note, once :func:`tee` has made a split, the original *iterable* should not be + used anywhere else; otherwise, the *iterable* could get advanced without the tee + objects being informed. + + Note, this member of the toolkit may require significant auxiliary storage + (depending on how much temporary data needs to be stored). In general, if one + iterator is going to use most or all of the data before the other iterator, it + is faster to use :func:`list` instead of :func:`tee`. + + .. versionadded:: 2.4 + + +.. _itertools-example: + +Examples +-------- + +The following examples show common uses for each tool and demonstrate ways they +can be combined. :: + + >>> amounts = [120.15, 764.05, 823.14] + >>> for checknum, amount in izip(count(1200), amounts): + ... print 'Check %d is for $%.2f' % (checknum, amount) + ... + Check 1200 is for $120.15 + Check 1201 is for $764.05 + Check 1202 is for $823.14 + + >>> import operator + >>> for cube in imap(operator.pow, range(1,5), repeat(3)): + ... print cube + ... + 1 + 8 + 27 + 64 + + >>> reportlines = ['EuroPython', 'Roster', '', 'alex', '', 'laura', + ... '', 'martin', '', 'walter', '', 'mark'] + >>> for name in islice(reportlines, 3, None, 2): + ... print name.title() + ... + Alex + Laura + Martin + Walter + Mark + + # Show a dictionary sorted and grouped by value + >>> from operator import itemgetter + >>> d = dict(a=1, b=2, c=1, d=2, e=1, f=2, g=3) + >>> di = sorted(d.iteritems(), key=itemgetter(1)) + >>> for k, g in groupby(di, key=itemgetter(1)): + ... print k, map(itemgetter(0), g) + ... + 1 ['a', 'c', 'e'] + 2 ['b', 'd', 'f'] + 3 ['g'] + + # Find runs of consecutive numbers using groupby. The key to the solution + # is differencing with a range so that consecutive numbers all appear in + # same group. + >>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28] + >>> for k, g in groupby(enumerate(data), lambda t:t[0]-t[1]): + ... print map(operator.itemgetter(1), g) + ... + [1] + [4, 5, 6] + [10] + [15, 16, 17, 18] + [22] + [25, 26, 27, 28] + + + +.. _itertools-recipes: + +Recipes +------- + +This section shows recipes for creating an extended toolset using the existing +itertools as building blocks. + +The extended tools offer the same high performance as the underlying toolset. +The superior memory performance is kept by processing elements one at a time +rather than bringing the whole iterable into memory all at once. Code volume is +kept small by linking the tools together in a functional style which helps +eliminate temporary variables. High speed is retained by preferring +"vectorized" building blocks over the use of for-loops and generators which +incur interpreter overhead. :: + + def take(n, seq): + return list(islice(seq, n)) + + def enumerate(iterable): + return izip(count(), iterable) + + def tabulate(function): + "Return function(0), function(1), ..." + return imap(function, count()) + + def iteritems(mapping): + return izip(mapping.iterkeys(), mapping.itervalues()) + + def nth(iterable, n): + "Returns the nth item or raise StopIteration" + return islice(iterable, n, None).next() + + def all(seq, pred=None): + "Returns True if pred(x) is true for every element in the iterable" + for elem in ifilterfalse(pred, seq): + return False + return True + + def any(seq, pred=None): + "Returns True if pred(x) is true for at least one element in the iterable" + for elem in ifilter(pred, seq): + return True + return False + + def no(seq, pred=None): + "Returns True if pred(x) is false for every element in the iterable" + for elem in ifilter(pred, seq): + return False + return True + + def quantify(seq, pred=None): + "Count how many times the predicate is true in the sequence" + return sum(imap(pred, seq)) + + def padnone(seq): + """Returns the sequence elements and then returns None indefinitely. + + Useful for emulating the behavior of the built-in map() function. + """ + return chain(seq, repeat(None)) + + def ncycles(seq, n): + "Returns the sequence elements n times" + return chain(*repeat(seq, n)) + + def dotproduct(vec1, vec2): + return sum(imap(operator.mul, vec1, vec2)) + + def flatten(listOfLists): + return list(chain(*listOfLists)) + + def repeatfunc(func, times=None, *args): + """Repeat calls to func with specified arguments. + + Example: repeatfunc(random.random) + """ + if times is None: + return starmap(func, repeat(args)) + else: + return starmap(func, repeat(args, times)) + + def pairwise(iterable): + "s -> (s0,s1), (s1,s2), (s2, s3), ..." + a, b = tee(iterable) + next(b, None) + return izip(a, b) + + def grouper(n, iterable, padvalue=None): + "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')" + return izip(*[chain(iterable, repeat(padvalue, n-1))]*n) + + + |