Random Variables
A random variable is a quantity that is produced by a random process.
👍 In probability, a random variable can take on one of many possible values, e.g. events from the state space. A specific value or set of values for a random variable can be assigned a probability.
A random variable is often denoted as a capital letter, e.g. X, and values of the random variable are denoted as a lowercase letter and an index, e.g. x1, x2, x3.
The values that a random variable can take are called its domain, and the domain of a random variable may be discrete or continuous.
Various types of Random Variables:
- Discrete RV - data are drawn from a finite (or countably infinite) set of states
- Boolean RV - data are drawn from the set of {True, False}
- Continuous RV - data are drawn from a range of real-valued numerical values.
💪 Two important properties of a probability distribution are the expected value and the variance. Mathematically, these are referred to as the first moment and the second central moment of the distribution; others include the skewness (3rd moment) and the kurtosis (4th moment).
The expected value is the average or mean value of a random variable X, that is, the probability-weighted average of its possible values. It is not necessarily the most likely value; the outcome with the highest probability is the mode. The expected value is typically denoted as a function of the uppercase letter E with square brackets, for example, E[X] for the expected value of X, or E[f(x)] where the function f() is used to sample a value from the domain of X.
The variance is the spread of the values of a random variable around the mean. It is often denoted as a function Var, e.g. Var(X) is the variance of the random variable X, or Var(f(x)) for the variance of values drawn from the domain of X using the function f(). The square root of the variance, which puts the spread back into the units of the variable, is referred to as the standard deviation. The variance between two variables is called the covariance, and it summarizes the linear relationship for how two random variables change together.
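The definitions above can be sketched in a few lines of plain Python. The fair six-sided die and the function names (`expected_value`, `variance`, `covariance`) are assumptions made for illustration; they do not come from a particular library.

```python
import math

# Hypothetical example: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

def expected_value(values, probs):
    """E[X]: the probability-weighted average of the possible values."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(X) = E[(X - E[X])^2]: the average squared distance from the mean."""
    mu = expected_value(values, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

def covariance(xs, ys):
    """Sample covariance of paired observations (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

mean = expected_value(values, probs)   # about 3.5
var = variance(values, probs)          # about 2.9167 (35/12)
std = math.sqrt(var)                   # standard deviation, same units as X

# Two made-up paired samples that move together, so the covariance is positive.
cov = covariance([1, 2, 3], [2, 4, 6])

print(mean, var, std, cov)
```

Note that the mean of the die, 3.5, is not a value the die can actually show, which is one way to see that the expected value and the most likely value are different things.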
The structure of the probability distribution will vary depending on whether the random variable is discrete or continuous.
Discrete Probability Distributions
A discrete probability distribution summarizes the probabilities for a discrete random variable.
- PMF - The probability mass function defines the probability distribution for a discrete random variable.
- CDF - The cumulative distribution function gives the probability that a discrete random variable will have a value less than or equal to a specific discrete value.
The sum of probabilities in the PMF equals one.
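As a small sketch of how a PMF and a CDF relate, the snippet below builds the CDF as a running sum of the PMF. The loaded four-sided die and its probabilities are made up for illustration.

```python
from itertools import accumulate

# Hypothetical PMF for a loaded four-sided die: outcome -> probability.
pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

# The CDF at x is the sum of the PMF over all outcomes <= x.
outcomes = sorted(pmf)
cdf = dict(zip(outcomes, accumulate(pmf[k] for k in outcomes)))

# The probabilities in a valid PMF sum to one (up to floating-point rounding).
total = sum(pmf.values())

print(cdf)
print(total)
```

The last CDF value always matches the PMF total, so both should come out at (approximately) 1.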
Discrete probability distributions are widely used in machine learning, most notably in the modeling of binary and multiclass classification problems, but also in evaluating the performance of binary classification models, such as the calculation of confidence intervals, and in the modeling of the distribution of words in text for natural language processing.
Knowledge of discrete probability distributions is also required when choosing the activation function in the output layer of deep learning neural networks for classification tasks, as well as when selecting an appropriate loss function.
There are two types of discrete random variables most commonly used in machine learning, which are binary and categorical.
- Binary Random Variable: x in {0, 1}
- Categorical Random Variable: x in {1, 2, ..., K}, where K is the total number of unique outcomes
The relationship between the outcomes or events of a discrete random variable and their probabilities is called the discrete probability distribution, which is summarized by a probability mass function. For outcomes that can be ordered, the probability of an event equal to or less than a given value is defined by the cumulative distribution function. The inverse of the CDF is called the percent-point function (PPF); given a probability, it returns the smallest outcome whose cumulative probability is at least that probability.
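A minimal sketch of the inverse-CDF idea for a discrete variable, assuming the common convention that the function returns the smallest outcome whose cumulative probability reaches the requested probability. The outcomes, probabilities, and the name `ppf` are illustrative choices.

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical ordered outcomes and their probabilities.
outcomes = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

# Cumulative probabilities, i.e. the CDF evaluated at each outcome.
cum = list(accumulate(probs))

def ppf(q):
    """Return the smallest outcome x with CDF(x) >= q, for 0 < q <= 1."""
    idx = bisect_left(cum, q)
    # Guard against rounding in the last cumulative sum.
    return outcomes[min(idx, len(outcomes) - 1)]

print(ppf(0.05), ppf(0.5), ppf(0.95))
```

Because the cumulative sums are floats, a query that falls exactly on a CDF step may be affected by rounding; exact fractions would avoid that at the cost of some verbosity.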
The most common discrete probability distributions are the Bernoulli and Multinoulli distributions for binary and categorical discrete random variables respectively, and the Binomial and Multinomial distributions, which generalize each to multiple independent trials.
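The way the Binomial and Multinomial distributions build on single trials can be sketched by simulation. Everything here (the helper names, the seed, the probabilities) is an illustrative assumption using only Python's standard `random` module.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable; the value is arbitrary

def bernoulli(p):
    """One binary trial: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

def binomial(n, p):
    """Number of successes in n independent Bernoulli trials."""
    return sum(bernoulli(p) for _ in range(n))

def multinoulli(probs):
    """One categorical trial: index of the chosen category."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def multinomial(n, probs):
    """Per-category counts over n independent categorical trials."""
    counts = [0] * len(probs)
    for _ in range(n):
        counts[multinoulli(probs)] += 1
    return counts

successes = binomial(100, 0.3)
counts = multinomial(100, [0.2, 0.3, 0.5])

print(successes)
print(counts)
```

With 100 trials the results should land near 30 successes and counts near 20, 30, and 50, though any single run will vary.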
Continuous Probability Distributions
A continuous probability distribution summarizes the probability for a continuous random variable.
- PDF - The probability density function defines the probability distribution for a continuous random variable. Note the difference between PDF and PMF: a PMF returns a probability directly, while a PDF returns a density, and probabilities come from integrating it over a range of values.
- CDF - Like a discrete probability distribution, the continuous probability distribution also has a cumulative distribution function, which defines the probability of a value less than or equal to a specific numerical value from the domain.
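To make the PDF versus CDF distinction concrete, here is a sketch that uses the normal distribution as an example continuous distribution (chosen only for illustration). The function names are hypothetical; the formulas are the standard normal density and its CDF written with `math.erf`.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x (a density, not a probability)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# A density can exceed 1 when the distribution is narrow.
density_at_mean = normal_pdf(0.0, sigma=0.1)

# Probabilities come from the CDF: P(-1 <= X <= 1) for a standard normal.
prob_within_one_sigma = normal_cdf(1.0) - normal_cdf(-1.0)

print(density_at_mean)
print(prob_within_one_sigma)
```

The first value is larger than 1, which would be impossible for a PMF, while the second is a genuine probability of roughly 0.68.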
Bernoulli Distribution
The Bernoulli distribution covers the case where an event has a binary outcome, either 0 or 1.
With success probability p: P(x=1) = p and P(x=0) = 1 - p
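The Bernoulli probabilities can be written as a tiny PMF, from which the mean and variance follow directly. The value p = 0.3 and the name `bernoulli_pmf` are assumptions for illustration.

```python
def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli variable with success probability p."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli outcome must be 0 or 1")
    return p if x == 1 else 1 - p

p = 0.3  # hypothetical success probability

# Mean and variance computed from the PMF; they should match p and p * (1 - p).
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(mean, var)
```

For any p, the mean works out to p and the variance to p * (1 - p), so the variance is largest when p = 0.5.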