author | Raymond Hettinger <rhettinger@users.noreply.github.com> | 2019-03-07 07:23:55 (GMT) |
---|---|---|
committer | Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> | 2019-03-07 07:23:55 (GMT) |
commit | 1f58f4fa6a0e3c60cee8df4a35c8dcf3903acde8 (patch) | |
tree | 000fd3c9b3cdbb73af20de9c9ad76a589f007b78 /Doc/library/statistics.rst | |
parent | 318d537daabf2bd5f781255c7e25bfce260cf227 (diff) | |
Refine statistics.NormalDist documentation and improve test coverage (GH-12208)
Diffstat (limited to 'Doc/library/statistics.rst')
-rw-r--r-- | Doc/library/statistics.rst | 52 |
1 files changed, 24 insertions, 28 deletions
diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index be0215a..157500e 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -479,7 +479,7 @@ measurements as a single entity.
 
 Normal distributions arise from the `Central Limit Theorem
 <https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range
-of applications in statistics, including simulations and hypothesis testing.
+of applications in statistics.
 
 .. class:: NormalDist(mu=0.0, sigma=1.0)
 
@@ -492,19 +492,19 @@ of applications in statistics, including simulations and hypothesis testing.
 
     .. attribute:: mean
 
-       A read-only property representing the `arithmetic mean
+       A read-only property for the `arithmetic mean
        <https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of a normal
        distribution.
 
     .. attribute:: stdev
 
-       A read-only property representing the `standard deviation
+       A read-only property for the `standard deviation
        <https://en.wikipedia.org/wiki/Standard_deviation>`_ of a normal
        distribution.
 
     .. attribute:: variance
 
-       A read-only property representing the `variance
+       A read-only property for the `variance
        <https://en.wikipedia.org/wiki/Variance>`_ of a normal
        distribution. Equal to the square of the standard deviation.
 
@@ -584,8 +584,8 @@ of applications in statistics, including simulations and hypothesis testing.
     Dividing a constant by an instance of :class:`NormalDist` is not supported.
 
     Since normal distributions arise from additive effects of independent
-    variables, it is possible to `add and subtract two normally distributed
-    random variables
+    variables, it is possible to `add and subtract two independent normally
+    distributed random variables
     <https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_
     represented as instances of :class:`NormalDist`. For example:
 
@@ -607,15 +607,15 @@ of applications in statistics, including simulations and hypothesis testing.
 
 For example, given `historical data for SAT exams
 <https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
-are normally distributed with a mean of 1060 and standard deviation of 192,
+are normally distributed with a mean of 1060 and a standard deviation of 192,
 determine the percentage of students with scores between 1100 and 1200:
 
 .. doctest::
 
     >>> sat = NormalDist(1060, 195)
-    >>> fraction = sat.cdf(1200) - sat.cdf(1100)
+    >>> fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
     >>> f'{fraction * 100 :.1f}% score between 1100 and 1200'
-    '18.2% score between 1100 and 1200'
+    '18.4% score between 1100 and 1200'
 
 What percentage of men and women will have the same height in `two normally
 distributed populations with known means and standard deviations
@@ -644,20 +644,12 @@ model:
 
 Normal distributions commonly arise in machine learning problems.
 
-Wikipedia has a `nice example with a Naive Bayesian Classifier
-<https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge
-is to guess a person's gender from measurements of normally distributed
-features including height, weight, and foot size.
+Wikipedia has a `nice example of a Naive Bayesian Classifier
+<https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge is to
+predict a person's gender from measurements of normally distributed features
+including height, weight, and foot size.
 
-The `prior probability <https://en.wikipedia.org/wiki/Prior_probability>`_ of
-being male or female is 50%:
-
-.. doctest::
-
-    >>> prior_male = 0.5
-    >>> prior_female = 0.5
-
-We also have a training dataset with measurements for eight people. These
+We're given a training dataset with measurements for eight people. The
 measurements are assumed to be normally distributed, so we summarize the data
 with :class:`NormalDist`:
 
@@ -670,8 +662,8 @@ with :class:`NormalDist`:
     >>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
     >>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9])
 
-We observe a new person whose feature measurements are known but whose gender
-is unknown:
+Next, we encounter a new person whose feature measurements are known but whose
+gender is unknown:
 
 .. doctest::
 
@@ -679,19 +671,23 @@ is unknown:
     >>> wt = 130  # weight
     >>> fs = 8  # foot size
 
-The posterior is the product of the prior times each likelihood of a
-feature measurement given the gender:
+Starting with a 50% `prior probability
+<https://en.wikipedia.org/wiki/Prior_probability>`_ of being male or female,
+we compute the posterior as the prior times the product of likelihoods for the
+feature measurements given the gender:
 
 .. doctest::
 
+    >>> prior_male = 0.5
+    >>> prior_female = 0.5
     >>> posterior_male = (prior_male * height_male.pdf(ht) *
     ...                   weight_male.pdf(wt) * foot_size_male.pdf(fs))
 
     >>> posterior_female = (prior_female * height_female.pdf(ht) *
     ...                     weight_female.pdf(wt) * foot_size_female.pdf(fs))
 
-The final prediction is awarded to the largest posterior -- this is known as
-the `maximum a posteriori
+The final prediction goes to the largest posterior. This is known as the
+`maximum a posteriori
 <https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP:
 
 .. doctest::
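
The SAT hunk changes the doctest's expected output from 18.2% to 18.4% by widening the interval by half a point on each side. The sketch below reproduces both figures side by side. It assumes the change is meant as a continuity correction for integer-valued scores (the commit itself does not say so), and the variable names `plain`, `corrected`, `before` and `after` are illustrative, not part of the patch.

```python
from statistics import NormalDist

# Parameters copied from the doctest line in the hunk
# (the prose in the same hunk states a standard deviation of 192).
sat = NormalDist(1060, 195)

# Removed line: probability mass strictly between the two endpoints.
plain = sat.cdf(1200) - sat.cdf(1100)

# Added line: each endpoint is pushed out by 0.5, so that the whole-number
# scores 1100 and 1200 are counted as full unit-wide bins.
corrected = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)

before = f'{plain * 100:.1f}%'
after = f'{corrected * 100:.1f}%'
print(f'without correction: {before}, with correction: {after}')
```

The two formatted values correspond to the removed and added expected-output lines of the doctest, which is a quick way to check that the new expected string in the patch follows from the new expression.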