  • Eren Özkan

A Brief Explanation of the Central Limit Theorem



The Central Limit Theorem is a fundamental theorem of statistics. It states that the means of random samples drawn from any population with a finite mean and variance will be approximately normally distributed, provided the sample size is large enough. The shape of the population's own distribution does not matter here. It is sufficient that it has a finite mean and variance.


Before we get into the central limit theorem, let's talk briefly about what the normal distribution is.



You've probably heard of the Galton box. The Galton box is a great illustration of the normal distribution. You drop a series of balls from the top, they bounce off rows of pegs, and they collect at the bottom in a bell shape that closely follows the normal distribution graph, in other words, a Gaussian distribution. We see this distribution in many different places in nature. For example, if we graph the heights of a large number of random people, we can see a distribution that is close to a normal distribution.
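To make the Galton box concrete, here is a minimal simulation sketch. It assumes an idealized box in which every ball bounces left or right with equal probability at each row of pegs; the function name `galton_box` and the parameters `num_balls`, `num_rows`, and `seed` are made up for this illustration.

```python
import random
from collections import Counter


def galton_box(num_balls=5000, num_rows=12, seed=0):
    """Idealized Galton box (assumption: a 50/50 bounce at every peg)."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(num_balls):
        # The bin a ball lands in equals its number of rightward bounces.
        position = sum(rng.randint(0, 1) for _ in range(num_rows))
        bins[position] += 1
    return bins


bins = galton_box()
for position in range(13):
    # Crude text histogram: one '#' for every 25 balls in the bin.
    print(f"{position:2d} {'#' * (bins[position] // 25)}")
```

The printed bars should pile up around the middle bins and thin out toward the edges, which is the bell shape described above.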



Let's build a simple demonstration of the central limit theorem. Say we have a fair die, so the probability of rolling any face is 1/6. We roll the die 10 times and take the average of the results. Then we repeat this process 3000 times. Here is a short Python program that does this.


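import random

import matplotlib.pyplot as plt
import numpy as np


def random_dice():
    # Roll a fair six-sided die: each face has probability 1/6.
    return random.randint(1, 6)


def dice():
    # Repeat 3000 times: roll the die 10 times and record the mean.
    mean_list = []
    for _ in range(3000):
        summary = 0
        for _ in range(10):
            summary += random_dice()
        mean = summary / 10
        mean_list.append(mean)
    return mean_list


dice_list = dice()

# plt.subplots returns (figure, axes), in that order.
fig, ax = plt.subplots(figsize=[9, 5])
# Means are multiples of 0.1 between 1 and 6; center one bin on each value.
ax.hist(dice_list, bins=np.arange(0.95, 6.1, 0.1), edgecolor='black')
ax.set_xlabel('Mean of 10 Dice Rolls')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of the Mean of 10 Rolls of a Fair Die')
plt.show()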

And the result...




Looks like a normal distribution graph, doesn't it?
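One way to check this beyond eyeballing the graph is to compare the simulated means with what theory predicts for a fair die: a population mean of 3.5 and a variance of 35/12, so the mean of 10 rolls should have a standard deviation of about sqrt((35/12)/10). The sketch below repeats the experiment with only the standard library; `sample_means` and the fixed seed are illustrative choices, not part of the program above.

```python
import math
import random
import statistics


def sample_means(num_samples=3000, sample_size=10, seed=42):
    """Mean of `sample_size` fair-die rolls, repeated `num_samples` times."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]


means = sample_means()

population_mean = 3.5  # (1 + 2 + ... + 6) / 6
population_var = sum((x - population_mean) ** 2 for x in range(1, 7)) / 6  # 35/12
expected_std = math.sqrt(population_var / 10)  # spread of the mean of 10 rolls

print(f"mean of sample means: {statistics.fmean(means):.3f} (theory: {population_mean})")
print(f"std of sample means:  {statistics.pstdev(means):.3f} (theory: {expected_std:.3f})")
```

If the simulation behaves as the theorem suggests, both printed pairs should agree closely, though the exact digits depend on the seed.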


We can do the same with a loaded die if we want. For example, let's say the probabilities of faces 1 through 6 coming up are 0.2, 0.3, 0.3, 0.1, 0.05, and 0.05. If we run the same procedure on this die...

def cheating_dice():
    # Loaded die: faces 1..6 come up with these probabilities.
    probs = [0.2, 0.3, 0.3, 0.1, 0.05, 0.05]
    return random.choices(range(1, 7), weights=probs)[0]


def cheating_dice_mean():
    # Repeat 3000 times: roll the loaded die 10 times and record the mean.
    mean_list = []
    for _ in range(3000):
        summary = 0
        for _ in range(10):
            summary += cheating_dice()
        mean = summary / 10
        mean_list.append(mean)
    return mean_list

...we see that it gives the same bell-shaped graph again, this time centered on a lower value than for the fair die.





Let's calculate the expected value for this loaded die with the formula below.


The formula for calculating the average (expected value) of a discrete probability distribution is to multiply each outcome by its probability and add up the results:



(0.2*1) + (0.3*2) + (0.3*3) + (0.1*4) + (0.05*5) + (0.05*6) = 2.65

So our expected value is 2.65.
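The same arithmetic can be written as a few lines of Python. This is only a sketch of the calculation above; the variable names are chosen for the example.

```python
faces = range(1, 7)
probs = [0.2, 0.3, 0.3, 0.1, 0.05, 0.05]  # loaded-die probabilities from above

# A valid probability distribution must sum to 1 (up to rounding error).
assert abs(sum(probs) - 1.0) < 1e-9

# Expected value: each face multiplied by its probability, then summed.
expected_value = sum(face * p for face, p in zip(faces, probs))
print(round(expected_value, 2))  # 2.65
```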



If we calculate the average for the die we throw 1000 times, we find the value 2.6587. As you can see, these two values are very close to each other.
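A comparison like this can be reproduced with a short sketch. It assumes the same loaded-die probabilities as above and uses a fixed seed, so its numbers will not match 2.6587 exactly; `average_of_rolls` is a name invented for this example.

```python
import random

FACES = range(1, 7)
PROBS = [0.2, 0.3, 0.3, 0.1, 0.05, 0.05]  # loaded-die probabilities from above
EXPECTED_VALUE = 2.65


def average_of_rolls(num_rolls, seed=1):
    """Average face value over `num_rolls` throws of the loaded die."""
    rng = random.Random(seed)
    rolls = rng.choices(FACES, weights=PROBS, k=num_rolls)
    return sum(rolls) / num_rolls


averages = {n: average_of_rolls(n) for n in (10, 100, 1000, 10000)}
for n, avg in averages.items():
    print(f"{n:>6} rolls: average = {avg:.4f}, gap to 2.65 = {abs(avg - EXPECTED_VALUE):.4f}")
```

With more rolls the average typically settles closer to 2.65, although any single run can wobble.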


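The central limit theorem describes what we see here. It shows that regardless of the distribution we start from, as long as it has a finite mean and variance, the sample means will be approximately normally distributed around the expected value when the sample size is large enough.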
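The "large enough" part can be illustrated with a rough sketch. It measures the skewness (asymmetry) of the sample means of the loaded die for a few sample sizes; a normal distribution has skewness 0. The helper names, the chosen sample sizes, and the use of skewness as a yardstick are all assumptions made for this example.

```python
import random
import statistics

PROBS = [0.2, 0.3, 0.3, 0.1, 0.05, 0.05]  # loaded-die probabilities from above


def skewness(values):
    """Population skewness: third central moment divided by std cubed."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def sample_means(sample_size, num_samples=3000, seed=7):
    """Means of `num_samples` samples, each of `sample_size` loaded-die rolls."""
    rng = random.Random(seed)
    return [
        sum(rng.choices(range(1, 7), weights=PROBS, k=sample_size)) / sample_size
        for _ in range(num_samples)
    ]


skew_by_size = {n: skewness(sample_means(n)) for n in (1, 10, 50)}
for n, skew in skew_by_size.items():
    print(f"sample size {n:>2}: skewness of sample means = {skew:.3f}")
```

A single roll (sample size 1) keeps the lopsided shape of the loaded die, while the means of larger samples should show a skewness much closer to 0.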
