author: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> 2023-10-25 14:08:10 (GMT)
committer: GitHub <noreply@github.com> 2023-10-25 14:08:10 (GMT)
commit: ed05bf600687ee506c37a5ee3bb63f37c42d3161 (patch)
tree: 12f1193ffd0ff0af3588c9e46bf3863f2f4c4560
parent: 3d67b69820e2e612b4c0b108da0ab7b864b74103 (diff)
[3.12] gh-108590: Improve sqlite3 docs on encoding issues and how to handle those (GH-108699) (#111324)
Add a guide for how to handle non-UTF-8 text encodings.
Link to that guide from the 'text_factory' docs.

(cherry picked from commit 1262e41842957c3b402fc0cf0a6eb2ea230c828f)

Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Corvin <corvin@corvin.dev>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
-rw-r--r-- Doc/library/sqlite3.rst 83
1 files changed, 50 insertions, 33 deletions
diff --git a/Doc/library/sqlite3.rst b/Doc/library/sqlite3.rst
index 87ea465..3cb2f45 100644
--- a/Doc/library/sqlite3.rst
+++ b/Doc/library/sqlite3.rst
@@ -1123,6 +1123,10 @@ Connection objects
f.write('%s\n' % line)
con.close()
+ .. seealso::
+
+ :ref:`sqlite3-howto-encoding`
+
.. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250)
@@ -1189,6 +1193,10 @@ Connection objects
.. versionadded:: 3.7
+ .. seealso::
+
+ :ref:`sqlite3-howto-encoding`
+
.. method:: getlimit(category, /)
Get a connection runtime limit.
@@ -1410,39 +1418,8 @@ Connection objects
and returns a text representation of it.
The callable is invoked for SQLite values with the ``TEXT`` data type.
By default, this attribute is set to :class:`str`.
- If you want to return ``bytes`` instead, set *text_factory* to ``bytes``.
- Example:
-
- .. testcode::
-
- con = sqlite3.connect(":memory:")
- cur = con.cursor()
-
- AUSTRIA = "Österreich"
-
- # by default, rows are returned as str
- cur.execute("SELECT ?", (AUSTRIA,))
- row = cur.fetchone()
- assert row[0] == AUSTRIA
-
- # but we can make sqlite3 always return bytestrings ...
- con.text_factory = bytes
- cur.execute("SELECT ?", (AUSTRIA,))
- row = cur.fetchone()
- assert type(row[0]) is bytes
- # the bytestrings will be encoded in UTF-8, unless you stored garbage in the
- # database ...
- assert row[0] == AUSTRIA.encode("utf-8")
-
- # we can also implement a custom text_factory ...
- # here we implement one that appends "foo" to all strings
- con.text_factory = lambda x: x.decode("utf-8") + "foo"
- cur.execute("SELECT ?", ("bar",))
- row = cur.fetchone()
- assert row[0] == "barfoo"
-
- con.close()
+ See :ref:`sqlite3-howto-encoding` for more details.
.. attribute:: total_changes
@@ -1601,7 +1578,6 @@ Cursor objects
COMMIT;
""")
-
.. method:: fetchone()
If :attr:`~Cursor.row_factory` is ``None``,
@@ -2580,6 +2556,47 @@ With some adjustments, the above recipe can be adapted to use a
instead of a :class:`~collections.namedtuple`.
+.. _sqlite3-howto-encoding:
+
+How to handle non-UTF-8 text encodings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values
+with the ``TEXT`` data type.
+This works well for UTF-8 encoded text, but it might fail for other encodings
+and invalid UTF-8.
+You can use a custom :attr:`~Connection.text_factory` to handle such cases.
+
+Because of SQLite's `flexible typing`_, it is not uncommon to encounter table
+columns with the ``TEXT`` data type containing non-UTF-8 encodings,
+or even arbitrary data.
+To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2)
+encoded text, for example a table of Czech-English dictionary entries.
+Assuming we now have a :class:`Connection` instance :py:data:`!con`
+connected to this database,
+we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`:
+
+.. testcode::
+
+ con.text_factory = lambda data: str(data, encoding="latin2")
+
+For invalid UTF-8 or arbitrary data stored in ``TEXT`` table columns,
+you can use the following technique, borrowed from the :ref:`unicode-howto`:
+
+.. testcode::
+
+ con.text_factory = lambda data: str(data, errors="surrogateescape")
+
+.. note::
+
+ The :mod:`!sqlite3` module API does not support strings
+ containing surrogates.
+
+.. seealso::
+
+ :ref:`unicode-howto`
+
+
.. _sqlite3-explanation:
Explanation
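
The two ``text_factory`` recipes added by this patch can be tried outside the doctest context with a short script. The following is a minimal sketch, not part of the commit: the in-memory database, the ``dictionary`` table, and the sample words are assumptions made for illustration. It also assumes that ``CAST(? AS TEXT)`` on a ``bytes`` parameter is an acceptable way to get non-UTF-8 bytes into a ``TEXT`` value for testing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table standing in for the Czech-English dictionary in the guide.
con.execute("CREATE TABLE dictionary(czech TEXT, english TEXT)")

# Casting a BLOB parameter to TEXT stores the raw Latin-2 bytes as a TEXT value.
czech_word = "žlutý"
con.execute(
    "INSERT INTO dictionary VALUES (CAST(? AS TEXT), ?)",
    (czech_word.encode("latin2"), "yellow"),
)

# Recipe 1: decode every TEXT value as ISO-8859-2 (Latin-2).
con.text_factory = lambda data: str(data, encoding="latin2")
row = con.execute("SELECT czech, english FROM dictionary").fetchone()

# Recipe 2: decode as UTF-8, mapping undecodable bytes to lone surrogates.
con.text_factory = lambda data: str(data, errors="surrogateescape")
raw = b"caf\xe9"  # not valid UTF-8: 0xE9 starts a sequence that never completes
text = con.execute("SELECT CAST(? AS TEXT)", (raw,)).fetchone()[0]

# The original bytes can be recovered by encoding with the same error handler.
round_tripped = text.encode("utf-8", errors="surrogateescape")

con.close()
```

With the first factory, ``row`` should hold the decoded Czech word alongside its English entry. With the second, ``text`` contains a lone surrogate in place of the undecodable byte, so, as the note in the patch says, such a string cannot be passed back to the module as a query parameter; encode it to ``bytes`` first if it needs to be written again.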