author | Raymond Hettinger <rhettinger@users.noreply.github.com> | 2022-07-10 07:40:27 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-07-10 07:40:27 (GMT) |
commit | ef61b259e35a0249840184b59f43d8a7f9b095bc (patch) | |
tree | 794e963854b81c51bf1c881594f631d169e320f3 /Doc | |
parent | 264b3ddfd561d97204ffb30be6a7d1fb0555e560 (diff) | |
GH-77265: Document NaN handling in statistics functions that sort or count (#94676)
* Document NaN handling in functions that sort or count
* Update Doc/library/statistics.rst
Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
* Update Doc/library/statistics.rst
Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
* Fix trailing whitespace and rewrap text
Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/statistics.rst | 29 |
1 files changed, 29 insertions, 0 deletions
diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 347a1be..5aef6f6 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -35,6 +35,35 @@
    and implementation-dependent.  If your input data consists of mixed types,
    you may be able to use :func:`map` to ensure a consistent result, for
    example: ``map(float, input_data)``.
 
+Some datasets use ``NaN`` (not a number) values to represent missing data.
+Since NaNs have unusual comparison semantics, they cause surprising or
+undefined behaviors in the statistics functions that sort data or that count
+occurrences.  The functions affected are ``median()``, ``median_low()``,
+``median_high()``, ``median_grouped()``, ``mode()``, ``multimode()``, and
+``quantiles()``.  The ``NaN`` values should be stripped before calling these
+functions::
+
+    >>> from statistics import median
+    >>> from math import isnan
+    >>> from itertools import filterfalse
+
+    >>> data = [20.7, float('NaN'),19.2, 18.3, float('NaN'), 14.4]
+    >>> sorted(data)  # This has surprising behavior
+    [20.7, nan, 14.4, 18.3, 19.2, nan]
+    >>> median(data)  # This result is unexpected
+    16.35
+
+    >>> sum(map(isnan, data))  # Number of missing values
+    2
+    >>> clean = list(filterfalse(isnan, data))  # Strip NaN values
+    >>> clean
+    [20.7, 19.2, 18.3, 14.4]
+    >>> sorted(clean)  # Sorting now works as expected
+    [14.4, 18.3, 19.2, 20.7]
+    >>> median(clean)  # This result is now well defined
+    18.75
+
+
 Averages and measures of central location
 -----------------------------------------
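The added documentation demonstrates the sorting problem but not the counting one that affects ``mode()`` and ``multimode()``. The following is a minimal sketch of that case and is not part of the commit. It assumes the NaNs are separately created float objects, and ``strip_nans`` is a hypothetical helper name introduced here for illustration only.

```python
from collections import Counter
from math import isnan
from statistics import median

# NaN compares unequal to everything, including itself, so ordering
# comparisons against it are always False.
x = float("nan")
assert x != x
assert not (x < 1.0) and not (x > 1.0)

# Two separately created NaNs are distinct objects that never compare
# equal, so a Counter tallies them under separate keys.
a, b = float("nan"), float("nan")
counts = Counter([a, b, 1.0, 1.0])
nan_keys = [key for key in counts if isnan(key)]


def strip_nans(values):
    """Return a new list without NaN entries (hypothetical helper)."""
    return [v for v in values if not isnan(v)]


data = [20.7, float("nan"), 19.2, 18.3, float("nan"), 14.4]
clean = strip_nans(data)
print(len(nan_keys), clean, median(clean))
```

Because each NaN lands under its own key, occurrence counts that include NaNs may not reflect how many values are actually missing, which is one reason to strip them before counting.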