author: Antoine Pitrou <solipsis@pitrou.net> 2011-11-25 15:33:53 (GMT)
committer: Antoine Pitrou <solipsis@pitrou.net> 2011-11-25 15:33:53 (GMT)
commit: fd9ebd4a361805607baea3e038652f207575ced8
tree: 4ab36059698f9ebb40ea9164f83571b1f380e1f8
parent: 5a53f368e61a5535571362e36c451827ee7d3a27
Clarify concatenation behaviour of immutable strings, and remove explicit
mention of the CPython optimization hack.
-rw-r--r-- Doc/faq/programming.rst | 26
-rw-r--r-- Doc/library/stdtypes.rst | 21
2 files changed, 38 insertions, 9 deletions
diff --git a/Doc/faq/programming.rst b/Doc/faq/programming.rst
index d1a3daf..f157a94 100644
--- a/Doc/faq/programming.rst
+++ b/Doc/faq/programming.rst
@@ -989,6 +989,32 @@ What does 'UnicodeDecodeError' or 'UnicodeEncodeError' error mean?
See the :ref:`unicode-howto`.
+What is the most efficient way to concatenate many strings together?
+--------------------------------------------------------------------
+
+:class:`str` and :class:`bytes` objects are immutable, therefore concatenating
+many strings together is inefficient as each concatenation creates a new
+object. In the general case, the total runtime cost is quadratic in the
+total string length.
+
+To accumulate many :class:`str` objects, the recommended idiom is to place
+them into a list and call :meth:`str.join` at the end::
+
+   chunks = []
+   for s in my_strings:
+       chunks.append(s)
+   result = ''.join(chunks)
+
+(another reasonably efficient idiom is to use :class:`io.StringIO`)
+
+To accumulate many :class:`bytes` objects, the recommended idiom is to extend
+a :class:`bytearray` object using in-place concatenation (the ``+=`` operator)::
+
+   result = bytearray()
+   for b in my_bytes_objects:
+       result += b
+
+
Sequences (Tuples/Lists)
========================
diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index af1e44a..5b54b09 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -964,15 +964,18 @@ Notes:
If *k* is ``None``, it is treated like ``1``.
(6)
- .. impl-detail::
-
- If *s* and *t* are both strings, some Python implementations such as
- CPython can usually perform an in-place optimization for assignments of
- the form ``s = s + t`` or ``s += t``. When applicable, this optimization
- makes quadratic run-time much less likely. This optimization is both
- version and implementation dependent. For performance sensitive code, it
- is preferable to use the :meth:`str.join` method which assures consistent
- linear concatenation performance across versions and implementations.
+ Concatenating immutable strings always results in a new object. This means
+ that building up a string by repeated concatenation will have a quadratic
+ runtime cost in the total string length. To get a linear runtime cost,
+ you must switch to one of the alternatives below:
+
+ * if concatenating :class:`str` objects, you can build a list and use
+ :meth:`str.join` at the end;
+
+ * if concatenating :class:`bytes` objects, you can similarly use
+ :meth:`bytes.join`, or you can do in-place concatenation with a
+ :class:`bytearray` object. :class:`bytearray` objects are mutable and
+ have an efficient overallocation mechanism.
.. _string-methods:
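
The idioms documented by this change can be sketched as a small runnable example. The function names and sample data here are hypothetical, chosen only for illustration, and are not part of the commit. The sketch shows list accumulation followed by ``str.join``, the ``io.StringIO`` alternative mentioned in the FAQ entry, and in-place extension of a ``bytearray``.

```python
import io


def join_strings(parts):
    """Accumulate str objects in a list, then join once at the end."""
    chunks = []
    for s in parts:
        chunks.append(s)
    return "".join(chunks)


def join_with_stringio(parts):
    """Alternative idiom: write each piece into an in-memory text buffer."""
    buf = io.StringIO()
    for s in parts:
        buf.write(s)
    return buf.getvalue()


def join_bytes(parts):
    """Extend a mutable bytearray in place, then convert to bytes."""
    result = bytearray()
    for b in parts:
        result += b  # in-place concatenation on a mutable object
    return bytes(result)


# Hypothetical sample data for illustration only.
words = ["spam", "eggs", "ham"]
blobs = [b"\x00\x01", b"\x02", b"\x03\x04"]

assert join_strings(words) == join_with_stringio(words) == "spameggsham"
assert join_bytes(blobs) == b"".join(blobs)
```

The final conversion with ``bytes(result)`` is optional; it is included here only so the result compares equal in type to what ``bytes.join`` returns.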