summaryrefslogtreecommitdiffstats
path: root/Doc/lib/libsets.tex
blob: e90e527d1ffa553be952784bf4b7e342820f11cb (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
\section{\module{sets} ---
         Unordered collections of unique elements}

\declaremodule{standard}{sets}
\modulesynopsis{Implementation of sets of unique elements.}
\moduleauthor{Greg V. Wilson}{gvwilson@nevex.com}
\moduleauthor{Alex Martelli}{aleax@aleax.it}
\moduleauthor{Guido van Rossum}{guido@python.org}
\sectionauthor{Raymond D. Hettinger}{python@rcn.com}

\versionadded{2.3}

The \module{sets} module provides classes for constructing and manipulating
unordered collections of unique elements.  Common uses include membership
testing, removing duplicates from a sequence, and computing standard math
operations on sets such as intersection, union, difference, and symmetric
difference.

Like other collections, sets support \code{\var{x} in \var{set}},
\code{len(\var{set})}, and \code{for \var{x} in \var{set}}.  Being an
unordered collection, sets do not record element position or order of
insertion.  Accordingly, sets do not support indexing, slicing, or
other sequence-like behavior.

Most set applications use the \class{Set} class which provides every set
method except for \method{__hash__()}. For advanced applications requiring
a hash method, the \class{ImmutableSet} class adds a \method{__hash__()}
method but omits methods which alter the contents of the set. Both
\class{Set} and \class{ImmutableSet} derive from \class{BaseSet}, an
abstract class useful for determining whether something is a set:
\code{isinstance(\var{obj}, BaseSet)}.

The set classes are implemented using dictionaries.  Accordingly, the
requirements for set elements are the same as those for dictionary keys;
namely, that the element defines both \method{__eq__} and \method{__hash__}.
As a result, sets
cannot contain mutable elements such as lists or dictionaries.
However, they can contain immutable collections such as tuples or
instances of \class{ImmutableSet}.  For convenience in implementing
sets of sets, inner sets are automatically converted to immutable
form, for example, \code{Set([Set(['dog'])])} is transformed to
\code{Set([ImmutableSet(['dog'])])}.

\begin{classdesc}{Set}{\optional{iterable}}
Constructs a new empty \class{Set} object.  If the optional \var{iterable}
parameter is supplied, updates the set with elements obtained from iteration.
All of the elements in \var{iterable} should be immutable or be transformable
to an immutable using the protocol described in
section~\ref{immutable-transforms}.
\end{classdesc}

\begin{classdesc}{ImmutableSet}{\optional{iterable}}
Constructs a new empty \class{ImmutableSet} object.  If the optional
\var{iterable} parameter is supplied, updates the set with elements obtained
from iteration.  All of the elements in \var{iterable} should be immutable or
be transformable to an immutable using the protocol described in
section~\ref{immutable-transforms}.

Because \class{ImmutableSet} objects provide a \method{__hash__()} method,
they can be used as set elements or as dictionary keys.  \class{ImmutableSet}
objects do not have methods for adding or removing elements, so all of the
elements must be known when the constructor is called.
\end{classdesc}


\subsection{Set Objects \label{set-objects}}

Instances of \class{Set} and \class{ImmutableSet} both provide
the following operations:

\begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result}
  \lineiii{len(\var{s})}{}{cardinality of set \var{s}}

  \hline
  \lineiii{\var{x} in \var{s}}{}
         {test \var{x} for membership in \var{s}}
  \lineiii{\var{x} not in \var{s}}{}
         {test \var{x} for non-membership in \var{s}}
  \lineiii{\var{s}.issubset(\var{t})}{\code{\var{s} <= \var{t}}}
         {test whether every element in \var{s} is in \var{t}}
  \lineiii{\var{s}.issuperset(\var{t})}{\code{\var{s} >= \var{t}}}
         {test whether every element in \var{t} is in \var{s}}

  \hline
  \lineiii{\var{s}.union(\var{t})}{\var{s} \textbar{} \var{t}}
         {new set with elements from both \var{s} and \var{t}}
  \lineiii{\var{s}.intersection(\var{t})}{\var{s} \&\ \var{t}}
         {new set with elements common to \var{s} and \var{t}}
  \lineiii{\var{s}.difference(\var{t})}{\var{s} - \var{t}}
         {new set with elements in \var{s} but not in \var{t}}
  \lineiii{\var{s}.symmetric_difference(\var{t})}{\var{s} \^\ \var{t}}
         {new set with elements in either \var{s} or \var{t} but not both}
  \lineiii{\var{s}.copy()}{}
         {new set with a shallow copy of \var{s}}
\end{tableiii}

Note, the non-operator versions of \method{union()},
\method{intersection()}, \method{difference()}, and
\method{symmetric_difference()} will accept any iterable as an argument.
In contrast, their operator based counterparts require their arguments to
be sets.  This precludes error-prone constructions like
\code{Set('abc') \&\ 'cbs'} in favor of the more readable
\code{Set('abc').intersection('cbs')}.
\versionchanged[Formerly all arguments were required to be sets]{2.3.1}

In addition, both \class{Set} and \class{ImmutableSet}
support set to set comparisons.  Two sets are equal if and only if
every element of each set is contained in the other (each is a subset
of the other).
A set is less than another set if and only if the first set is a proper
subset of the second set (is a subset, but is not equal).
A set is greater than another set if and only if the first set is a proper
superset of the second set (is a superset, but is not equal).

The subset and equality comparisons do not generalize to a complete
ordering function.  For example, any two disjoint sets are not equal and
are not subsets of each other, so \emph{all} of the following return
\code{False}:  \code{\var{a}<\var{b}}, \code{\var{a}==\var{b}}, or
\code{\var{a}>\var{b}}.
Accordingly, sets do not implement the \method{__cmp__} method.

Since sets only define partial ordering (subset relationships), the output
of the \method{list.sort()} method is undefined for lists of sets.

The following table lists operations available in \class{ImmutableSet}
but not found in \class{Set}:

\begin{tableii}{c|l}{code}{Operation}{Result}
  \lineii{hash(\var{s})}{returns a hash value for \var{s}}
\end{tableii}

The following table lists operations available in \class{Set}
but not found in \class{ImmutableSet}:

\begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result}
  \lineiii{\var{s}.update(\var{t})}
         {\var{s} \textbar= \var{t}}
         {return set \var{s} with elements added from \var{t}}
  \lineiii{\var{s}.intersection_update(\var{t})}
         {\var{s} \&= \var{t}}
         {return set \var{s} keeping only elements also found in \var{t}}
  \lineiii{\var{s}.difference_update(\var{t})}
         {\var{s} -= \var{t}}
         {return set \var{s} after removing elements found in \var{t}}
  \lineiii{\var{s}.symmetric_difference_update(\var{t})}
         {\var{s} \textasciicircum= \var{t}}
         {return set \var{s} with elements from \var{s} or \var{t}
          but not both}

  \hline
  \lineiii{\var{s}.add(\var{x})}{}
         {add element \var{x} to set \var{s}}
  \lineiii{\var{s}.remove(\var{x})}{}
         {remove \var{x} from set \var{s}; raises KeyError if not present}
  \lineiii{\var{s}.discard(\var{x})}{}
         {removes \var{x} from set \var{s} if present}
  \lineiii{\var{s}.pop()}{}
         {remove and return an arbitrary element from \var{s}; raises
	  KeyError if empty}
  \lineiii{\var{s}.clear()}{}
         {remove all elements from set \var{s}}
\end{tableiii}

Note, the non-operator versions of \method{update()},
\method{intersection_update()}, \method{difference_update()}, and
\method{symmetric_difference_update()} will accept any iterable as
an argument.
\versionchanged[Formerly all arguments were required to be sets]{2.3.1}

Also note, the module also includes a \method{union_update()} method
which is an alias for \method{update()}.  The method is included for
backwards compatability.  Programmers should prefer the
\method{update()} method because it the one supported by the builtin
\class{set()} and \class{frozenset()} types.

\subsection{Example \label{set-example}}

\begin{verbatim}
>>> from sets import Set
>>> engineers = Set(['John', 'Jane', 'Jack', 'Janice'])
>>> programmers = Set(['Jack', 'Sam', 'Susan', 'Janice'])
>>> managers = Set(['Jane', 'Jack', 'Susan', 'Zack'])
>>> employees = engineers | programmers | managers           # union
>>> engineering_management = engineers & managers            # intersection
>>> fulltime_management = managers - engineers - programmers # difference
>>> engineers.add('Marvin')                                  # add element
>>> print engineers
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
>>> employees.issuperset(engineers)           # superset test
False
>>> employees.union_update(engineers)         # update from another set
>>> employees.issuperset(engineers)
True
>>> for group in [engineers, programmers, managers, employees]:
...     group.discard('Susan')                # unconditionally remove element
...     print group
...
Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
Set(['Janice', 'Jack', 'Sam'])
Set(['Jane', 'Zack', 'Jack'])
Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack'])
\end{verbatim}


\subsection{Protocol for automatic conversion to immutable
            \label{immutable-transforms}}

Sets can only contain immutable elements.  For convenience, mutable
\class{Set} objects are automatically copied to an \class{ImmutableSet}
before being added as a set element.

The mechanism is to always add a hashable element, or if it is not
hashable, the element is checked to see if it has an
\method{__as_immutable__()} method which returns an immutable equivalent.

Since \class{Set} objects have a \method{__as_immutable__()} method
returning an instance of \class{ImmutableSet}, it is possible to
construct sets of sets.

A similar mechanism is needed by the \method{__contains__()} and
\method{remove()} methods which need to hash an element to check
for membership in a set.  Those methods check an element for hashability
and, if not, check for a \method{__as_temporarily_immutable__()} method
which returns the element wrapped by a class that provides temporary
methods for \method{__hash__()}, \method{__eq__()}, and \method{__ne__()}.

The alternate mechanism spares the need to build a separate copy of
the original mutable object.

\class{Set} objects implement the \method{__as_temporarily_immutable__()}
method which returns the \class{Set} object wrapped by a new class
\class{_TemporarilyImmutableSet}.

The two mechanisms for adding hashability are normally invisible to the
user; however, a conflict can arise in a multi-threaded environment
where one thread is updating a set while another has temporarily wrapped it
in \class{_TemporarilyImmutableSet}.  In other words, sets of mutable sets
are not thread-safe.


\subsection{Comparison to the built-in \class{set} types
            \label{comparison-to-builtin-set}}

The built-in \class{set} and \class{frozenset} types were designed based
on lessons learned from the \module{sets} module.  The key differences are:

\begin{itemize}
\item \class{Set} and \class{ImmutableSet} were renamed to \class{set} and
      \class{frozenset}.
\item There is no equivalent to \class{BaseSet}.  Instead, use
      \code{isinstance(x, (set, frozenset))}.
\item The hash algorithm for the built-ins performs significantly better
      (fewer collisions) for most datasets.
\item The built-in versions have more space efficient pickles.
\item The built-in versions do not have a \method{union_update()} method.
      Instead, use the \method{update()} method which is equivalent.
\item The built-in versions do not have a \method{_repr(sort=True)} method.
      Instead, use the built-in \function{repr()} and \function{sorted()}
      functions:  \code{repr(sorted(s))}.
\item The built-in version does not have a protocol for automatic conversion
      to immutable.  Many found this feature to be confusing and no one
      in the community reported having found real uses for it.
\end{itemize}