author | Raymond Hettinger <rhettinger@users.noreply.github.com> | 2019-04-23 07:06:35 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2019-04-23 07:06:35 (GMT) |
commit | 9013ccf6d8037f6ae78145a42d194141cb10d332 (patch) | |
tree | 9a1bf5b8739569012d9d3ecbf50b739936b730e2 /Doc/library | |
parent | d437012cdd4a38b5b3d05f139d5f0a28196e4769 (diff) | |
bpo-36546: Add statistics.quantiles() (#12710)
Diffstat (limited to 'Doc/library')
-rw-r--r-- | Doc/library/statistics.rst | 54 |
1 files changed, 51 insertions, 3 deletions
diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 8bb2bdf..b62bcfd 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -48,6 +48,7 @@ or sample.
 :func:`median_grouped`   Median, or 50th percentile, of grouped data.
 :func:`mode`             Single mode (most common value) of discrete or nominal data.
 :func:`multimode`        List of modes (most common values) of discrete or nomimal data.
+:func:`quantiles`        Divide data into intervals with equal probability.
 =======================  ===============================================================
 
 Measures of spread
@@ -499,6 +500,53 @@ However, for reading convenience, most of the examples show sorted sequences.
       :func:`pvariance` function as the *mu* parameter to get the variance of a
       sample.
 
+.. function:: quantiles(dist, *, n=4, method='exclusive')
+
+   Divide *dist* into *n* continuous intervals with equal probability.
+   Returns a list of ``n - 1`` cut points separating the intervals.
+
+   Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. Set
+   *n* to 100 for percentiles which gives the 99 cuts points that separate
+   *dist* in to 100 equal sized groups. Raises :exc:`StatisticsError` if *n*
+   is not least 1.
+
+   The *dist* can be any iterable containing sample data or it can be an
+   instance of a class that defines an :meth:`~inv_cdf` method.
+   Raises :exc:`StatisticsError` if there are not at least two data points.
+
+   For sample data, the cut points are linearly interpolated from the
+   two nearest data points. For example, if a cut point falls one-third
+   of the distance between two sample values, ``100`` and ``112``, the
+   cut-point will evaluate to ``104``. Other selection methods may be
+   offered in the future (for example choose ``100`` as the nearest
+   value or compute ``106`` as the midpoint). This might matter if
+   there are too few samples for a given number of cut points.
+
+   If *method* is set to *inclusive*, *dist* is treated as population data.
+   The minimum value is treated as the 0th percentile and the maximum
+   value is treated as the 100th percentile. If *dist* is an instance of
+   a class that defines an :meth:`~inv_cdf` method, setting *method*
+   has no effect.
+
+   .. doctest::
+
+      # Decile cut points for empirically sampled data
+      >>> data = [105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110,
+      ...         100, 75, 105, 103, 109, 76, 119, 99, 91, 103, 129,
+      ...         106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86,
+      ...         111, 75, 87, 102, 121, 111, 88, 89, 101, 106, 95,
+      ...         103, 107, 101, 81, 109, 104]
+      >>> [round(q, 1) for q in quantiles(data, n=10)]
+      [81.0, 86.2, 89.0, 99.4, 102.5, 103.6, 106.0, 109.8, 111.0]
+
+      >>> # Quartile cut points for the standard normal distibution
+      >>> Z = NormalDist()
+      >>> [round(q, 4) for q in quantiles(Z, n=4)]
+      [-0.6745, 0.0, 0.6745]
+
+   .. versionadded:: 3.8
+
+
 Exceptions
 ----------
 
@@ -606,7 +654,7 @@ of applications in statistics.
        <http://www.iceaaonline.com/ready/wp-content/uploads/2014/06/MM-9-Presentation-Meet-the-Overlapping-Coefficient-A-Measure-for-Elevator-Speeches.pdf>`_
        between two normal distributions, giving a measure of agreement.
        Returns a value between 0.0 and 1.0 giving `the overlapping area for
-       two probability density functions
+       the two probability density functions
        <https://www.rasch.org/rmt/rmt101r.htm>`_.
 
     Instances of :class:`NormalDist` support addition, subtraction,
@@ -649,8 +697,8 @@ of applications in statistics.
 For example, given `historical data for SAT exams
 <https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
 are normally distributed with a mean of 1060 and a standard deviation of 192,
-determine the percentage of students with scores between 1100 and 1200, after
-rounding to the nearest whole number:
+determine the percentage of students with test scores between 1100 and
+1200, after rounding to the nearest whole number:
 
 .. doctest::
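
The interpolation example in the added documentation (a cut point one-third of the way between ``100`` and ``112`` evaluating to ``104``) can be checked with a short sketch. It assumes a Python version that ships ``statistics.quantiles`` (3.8 or later). The helper ``interpolate`` is a hypothetical name introduced only for illustration and is not part of the module. The data is passed positionally, so the sketch does not depend on the name of the first parameter.

```python
from statistics import quantiles


def interpolate(lo, hi, numerator, denominator):
    """Hypothetical helper: the point numerator/denominator of the way from lo to hi."""
    return lo + (hi - lo) * numerator / denominator


# One-third of the distance between 100 and 112, as in the documentation text.
cut = interpolate(100, 112, 1, 3)
print(cut)  # 104.0

sample = [100, 112]

# method='inclusive' treats the minimum as the 0th percentile and the maximum
# as the 100th, so the two tertile cut points fall inside the interval.
inclusive_cuts = quantiles(sample, n=3, method='inclusive')
print(inclusive_cuts)  # [104.0, 108.0]

# method='exclusive' (the default) assumes the population may extend beyond
# the sample; with only two data points the cut points land on the data itself.
exclusive_cuts = quantiles(sample, n=3, method='exclusive')
print(exclusive_cuts)  # [100.0, 112.0]
```

With only two samples the two methods disagree noticeably, which is one way to read the documentation's remark that the choice might matter if there are too few samples for a given number of cut points.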