diff options
author | Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> | 2023-10-25 14:08:10 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-10-25 14:08:10 (GMT) |
commit | ed05bf600687ee506c37a5ee3bb63f37c42d3161 (patch) | |
tree | 12f1193ffd0ff0af3588c9e46bf3863f2f4c4560 | |
parent | 3d67b69820e2e612b4c0b108da0ab7b864b74103 (diff) | |
download | cpython-ed05bf600687ee506c37a5ee3bb63f37c42d3161.zip cpython-ed05bf600687ee506c37a5ee3bb63f37c42d3161.tar.gz cpython-ed05bf600687ee506c37a5ee3bb63f37c42d3161.tar.bz2 |
[3.12] gh-108590: Improve sqlite3 docs on encoding issues and how to handle those (GH-108699) (#111324)
Add a guide for how to handle non-UTF-8 text encodings.
Link to that guide from the 'text_factory' docs.
(cherry picked from commit 1262e41842957c3b402fc0cf0a6eb2ea230c828f)
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Corvin <corvin@corvin.dev>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
-rw-r--r-- | Doc/library/sqlite3.rst | 83 |
1 files changed, 50 insertions, 33 deletions
diff --git a/Doc/library/sqlite3.rst b/Doc/library/sqlite3.rst index 87ea465..3cb2f45 100644 --- a/Doc/library/sqlite3.rst +++ b/Doc/library/sqlite3.rst @@ -1123,6 +1123,10 @@ Connection objects f.write('%s\n' % line) con.close() + .. seealso:: + + :ref:`sqlite3-howto-encoding` + .. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250) @@ -1189,6 +1193,10 @@ Connection objects .. versionadded:: 3.7 + .. seealso:: + + :ref:`sqlite3-howto-encoding` + .. method:: getlimit(category, /) Get a connection runtime limit. @@ -1410,39 +1418,8 @@ Connection objects and returns a text representation of it. The callable is invoked for SQLite values with the ``TEXT`` data type. By default, this attribute is set to :class:`str`. - If you want to return ``bytes`` instead, set *text_factory* to ``bytes``. - Example: - - .. testcode:: - - con = sqlite3.connect(":memory:") - cur = con.cursor() - - AUSTRIA = "Österreich" - - # by default, rows are returned as str - cur.execute("SELECT ?", (AUSTRIA,)) - row = cur.fetchone() - assert row[0] == AUSTRIA - - # but we can make sqlite3 always return bytestrings ... - con.text_factory = bytes - cur.execute("SELECT ?", (AUSTRIA,)) - row = cur.fetchone() - assert type(row[0]) is bytes - # the bytestrings will be encoded in UTF-8, unless you stored garbage in the - # database ... - assert row[0] == AUSTRIA.encode("utf-8") - - # we can also implement a custom text_factory ... - # here we implement one that appends "foo" to all strings - con.text_factory = lambda x: x.decode("utf-8") + "foo" - cur.execute("SELECT ?", ("bar",)) - row = cur.fetchone() - assert row[0] == "barfoo" - - con.close() + See :ref:`sqlite3-howto-encoding` for more details. .. attribute:: total_changes @@ -1601,7 +1578,6 @@ Cursor objects COMMIT; """) - .. method:: fetchone() If :attr:`~Cursor.row_factory` is ``None``, @@ -2580,6 +2556,47 @@ With some adjustments, the above recipe can be adapted to use a instead of a :class:`~collections.namedtuple`. +.. _sqlite3-howto-encoding: + +How to handle non-UTF-8 text encodings +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values +with the ``TEXT`` data type. +This works well for UTF-8 encoded text, but it might fail for other encodings +and invalid UTF-8. +You can use a custom :attr:`~Connection.text_factory` to handle such cases. + +Because of SQLite's `flexible typing`_, it is not uncommon to encounter table +columns with the ``TEXT`` data type containing non-UTF-8 encodings, +or even arbitrary data. +To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2) +encoded text, for example a table of Czech-English dictionary entries. +Assuming we now have a :class:`Connection` instance :py:data:`!con` +connected to this database, +we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`: + +.. testcode:: + + con.text_factory = lambda data: str(data, encoding="latin2") + +For invalid UTF-8 or arbitrary data in stored in ``TEXT`` table columns, +you can use the following technique, borrowed from the :ref:`unicode-howto`: + +.. testcode:: + + con.text_factory = lambda data: str(data, errors="surrogateescape") + +.. note:: + + The :mod:`!sqlite3` module API does not support strings + containing surrogates. + +.. seealso:: + + :ref:`unicode-howto` + + .. _sqlite3-explanation: Explanation |