author: Raymond Hettinger <rhettinger@users.noreply.github.com> 2019-03-11 06:43:33 (GMT)
committer: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> 2019-03-11 06:43:33 (GMT)
commit: cc353a0cd95d9b0c93ed0b60ba762427a94c790d
tree: 84a1730bcd3d2d9e87b582ed8efa7c464b36863d /Doc/library/statistics.rst
parent: 491ef53c1548c2b593d3c35d1e7bf25ccb443019
Various refinements to the NormalDist examples and recipes (GH-12272)
Diffstat (limited to 'Doc/library/statistics.rst')
-rw-r--r-- Doc/library/statistics.rst 49
1 files changed, 26 insertions, 23 deletions
diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 3e14434..81119da 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -510,10 +510,9 @@ of applications in statistics.
.. classmethod:: NormalDist.from_samples(data)
- Class method that makes a normal distribution instance
- from sample data. The *data* can be any :term:`iterable`
- and should consist of values that can be converted to type
- :class:`float`.
+ Makes a normal distribution instance computed from sample data. The
+ *data* can be any :term:`iterable` and should consist of values that
+ can be converted to type :class:`float`.
If *data* does not contain at least two elements, raises
:exc:`StatisticsError` because it takes at least one point to estimate
@@ -536,11 +535,10 @@ of applications in statistics.
the given value *x*. Mathematically, it is the ratio ``P(x <= X <
x+dx) / dx``.
- Note the relative likelihood of *x* can be greater than `1.0`. The
- probability for a specific point on a continuous distribution is `0.0`,
- so the :func:`pdf` is used instead. It gives the probability of a
- sample occurring in a narrow range around *x* and then dividing that
- probability by the width of the range (hence the word "density").
+ The relative likelihood is computed as the probability of a sample
+ occurring in a narrow range divided by the width of the range (hence
+ the word "density"). Since the likelihood is relative to other points,
+ its value can be greater than `1.0`.
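As a rough illustration of the "density" wording above, the sketch below approximates the density as the probability of a narrow range divided by its width and compares it with ``pdf()``. The distribution parameters and the width ``dx`` are hypothetical values chosen only to make the density exceed ``1.0``; they do not come from this commit.

```python
from statistics import NormalDist

# Hypothetical narrow distribution, chosen only so the density exceeds 1.0.
narrow = NormalDist(mu=0.0, sigma=0.1)

x = 0.0
dx = 1e-6  # assumed width of the narrow range starting at x

# Probability of a sample landing in [x, x + dx), divided by the range width.
approx_density = (narrow.cdf(x + dx) - narrow.cdf(x)) / dx

# The value reported by pdf() at the same point.
exact_density = narrow.pdf(x)

print(approx_density, exact_density)
```

Both values should agree closely and be greater than ``1.0``, which is fine because a density is not itself a probability.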
.. method:: NormalDist.cdf(x)
@@ -568,7 +566,8 @@ of applications in statistics.
>>> temperature_february * (9/5) + 32 # Fahrenheit
NormalDist(mu=41.0, sigma=4.5)
- Dividing a constant by an instance of :class:`NormalDist` is not supported.
+ Dividing a constant by an instance of :class:`NormalDist` is not supported
+ because the result wouldn't be normally distributed.
Since normal distributions arise from additive effects of independent
variables, it is possible to `add and subtract two independent normally
@@ -581,8 +580,10 @@ of applications in statistics.
>>> birth_weights = NormalDist.from_samples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5])
>>> drug_effects = NormalDist(0.4, 0.15)
>>> combined = birth_weights + drug_effects
- >>> f'mean: {combined.mean :.1f} standard deviation: {combined.stdev :.1f}'
- 'mean: 3.1 standard deviation: 0.5'
+ >>> round(combined.mean, 1)
+ 3.1
+ >>> round(combined.stdev, 1)
+ 0.5
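A minimal sketch of what the addition above is doing, assuming the two inputs are independent: the means add and the variances add. The names ``means_add`` and ``variances_add`` are illustrative only.

```python
from math import isclose
from statistics import NormalDist

birth_weights = NormalDist.from_samples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5])
drug_effects = NormalDist(0.4, 0.15)
combined = birth_weights + drug_effects

# For independent normal variables, the mean of the sum is the sum of the means.
means_add = isclose(combined.mean, birth_weights.mean + drug_effects.mean)

# The variances (not the standard deviations) add.
variances_add = isclose(
    combined.variance,
    birth_weights.variance + drug_effects.variance,
)

print(means_add, variances_add)
```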
.. versionadded:: 3.8
@@ -595,14 +596,15 @@ of applications in statistics.
For example, given `historical data for SAT exams
<https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
are normally distributed with a mean of 1060 and a standard deviation of 195,
-determine the percentage of students with scores between 1100 and 1200:
+determine the percentage of students with scores between 1100 and 1200, after
+rounding to the nearest whole number:
.. doctest::
>>> sat = NormalDist(1060, 195)
>>> fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
- >>> f'{fraction * 100 :.1f}% score between 1100 and 1200'
- '18.4% score between 1100 and 1200'
+ >>> round(fraction * 100.0, 1)
+ 18.4
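The ``+ 0.5`` and ``- 0.5`` terms act as a continuity correction for scores rounded to whole numbers. The sketch below compares the corrected and uncorrected fractions; the variable names are illustrative and not part of the documentation.

```python
from statistics import NormalDist

sat = NormalDist(1060, 195)

# With the half-point adjustment, the range covers every value that
# rounds to a whole-number score from 1100 through 1200.
with_correction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)

# Without it, the half-unit intervals at both ends are left out.
without_correction = sat.cdf(1200) - sat.cdf(1100)

# Extra probability mass picked up by the correction.
extra = with_correction - without_correction

print(round(with_correction * 100.0, 1), extra > 0)
```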
What percentage of men and women will have the same height in `two normally
distributed populations with known means and standard deviations
@@ -616,18 +618,19 @@ distributed populations with known means and standard deviations
To estimate the distribution for a model that isn't easy to solve
analytically, :class:`NormalDist` can generate input samples for a `Monte
-Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_ of the
-model:
+Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
.. doctest::
+ >>> def model(x, y, z):
+ ... return (3*x + 7*x*y - 5*y) / (11 * z)
+ ...
>>> n = 100_000
- >>> X = NormalDist(350, 15).samples(n)
- >>> Y = NormalDist(47, 17).samples(n)
- >>> Z = NormalDist(62, 6).samples(n)
- >>> model_simulation = [x * y / z for x, y, z in zip(X, Y, Z)]
- >>> NormalDist.from_samples(model_simulation) # doctest: +SKIP
- NormalDist(mu=267.6516398754636, sigma=101.357284306067)
+ >>> X = NormalDist(10, 2.5).samples(n)
+ >>> Y = NormalDist(15, 1.75).samples(n)
+ >>> Z = NormalDist(5, 1.25).samples(n)
+ >>> NormalDist.from_samples(map(model, X, Y, Z)) # doctest: +SKIP
+ NormalDist(mu=19.640137307085507, sigma=47.03273142191088)
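The output above is skipped in the doctest because the samples are random. One possible way to make such a simulation repeatable is the ``seed`` keyword of ``samples()``, sketched below. The ``simulate`` helper and the seed values are hypothetical and not part of this commit.

```python
from statistics import NormalDist

def model(x, y, z):
    return (3*x + 7*x*y - 5*y) / (11 * z)

def simulate(seed, n=100_000):
    """Hypothetical helper: run the simulation with fixed seeds."""
    # A different seed per input keeps the three sample streams distinct.
    X = NormalDist(10, 2.5).samples(n, seed=seed)
    Y = NormalDist(15, 1.75).samples(n, seed=seed + 1)
    Z = NormalDist(5, 1.25).samples(n, seed=seed + 2)
    return NormalDist.from_samples(map(model, X, Y, Z))

first = simulate(seed=12345)
second = simulate(seed=12345)

# The same seeds give the same fitted distribution on every run.
print(first == second)
```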
Normal distributions commonly arise in machine learning problems.