Logistic Regression as Maximum Likelihood
The goal of maximum likelihood estimation is to find the modeling hypothesis that best explains the observed data:

maximize sum i to n log(P(xi ; h))

where:

- h is the modeling hypothesis
- maximize ... indicates that we choose h to maximize the (log-)likelihood function
In the case of logistic regression, a Binomial probability distribution is assumed for the data sample, where each example is one outcome of a Bernoulli trial. The Bernoulli distribution has a single parameter: the probability of a successful outcome (p).
> The probability distribution that is most often used when there are two classes is the binomial distribution. This distribution has a single parameter, p, that is the probability of an event or a specific class.
>
> — Page 283, Applied Predictive Modeling, 2013.
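To make the Bernoulli distribution concrete, here is a minimal sketch of its probability mass function (the name `bernoulli_pmf` is illustrative, not from the original article):

```python
# probability mass function of the Bernoulli distribution:
# P(y) = p^y * (1 - p)^(1 - y), where y is 0 or 1
def bernoulli_pmf(y, p):
    return p ** y * (1 - p) ** (1 - y)

# with p = 0.8, a success (y=1) has probability 0.8, a failure (y=0) has 0.2
for y in (1, 0):
    print('P(y=%d) = %.1f' % (y, bernoulli_pmf(y, 0.8)))
```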
The expected value (mean) of the Bernoulli distribution can be calculated as follows:

mean = P(y=1) * 1 + P(y=0) * 0 = p

In statistics and probability analysis, the expected value is calculated by multiplying each of the possible outcomes by the likelihood each outcome will occur and then summing all of those values (see https://www.investopedia.com/terms/e/expected-value.asp).
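As a quick numeric check of the weighted-sum definition, the mean of a Bernoulli variable collapses to p itself (the value 0.3 below is an arbitrary example):

```python
# expected value of a Bernoulli variable: each outcome times its probability
p = 0.3
mean = 1 * p + 0 * (1 - p)  # outcomes are 1 (with prob p) and 0 (with prob 1 - p)
print(mean)  # the weighted sum recovers p
```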
One can treat the prediction as a weighted mean over the two outcomes; thus the likelihood of a single example is:

likelihood(y, yhat) = yhat * y + (1 - yhat) * (1 - y)

Taking the logarithm gives the log-likelihood:

log-likelihood = log(yhat) * y + log(1 - yhat) * (1 - y)

Summing the log-likelihood function across all examples gives the quantity to maximize:

maximize sum i to n log(yhat_i) * y_i + log(1 - yhat_i) * (1 - y_i)

Optimization frameworks conventionally minimize a cost function, so we invert the sign and minimize the negative log-likelihood:

minimize sum i to n -(log(yhat_i) * y_i + log(1 - yhat_i) * (1 - y_i))
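The minimization objective can be sketched as follows (the helper name `neg_log_likelihood` and the toy labels/predictions are illustrative):

```python
from math import log

# negative log-likelihood summed over a set of examples
def neg_log_likelihood(ys, yhats):
    return -sum(y * log(yhat) + (1 - y) * log(1 - yhat)
                for y, yhat in zip(ys, yhats))

ys = [1, 1, 0, 0]
confident = [0.9, 0.8, 0.1, 0.2]  # predictions close to the labels
uncertain = [0.6, 0.5, 0.4, 0.5]  # predictions near chance
print(neg_log_likelihood(ys, confident))
print(neg_log_likelihood(ys, uncertain))
```

Predictions that agree with the labels yield a smaller negative log-likelihood, which is exactly what a minimizer seeks.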
**Computing the negative of the log-likelihood function for the Bernoulli distribution is equivalent to calculating the cross-entropy function.**
```python
# test of the Bernoulli likelihood function

# likelihood function for the Bernoulli distribution
def likelihood(y, yhat):
    return yhat * y + (1 - yhat) * (1 - y)

# test for y=1
y, yhat = 1, 0.9
print('y=%.1f, yhat=%.1f, likelihood: %.3f' % (y, yhat, likelihood(y, yhat)))
y, yhat = 1, 0.1
print('y=%.1f, yhat=%.1f, likelihood: %.3f' % (y, yhat, likelihood(y, yhat)))
# test for y=0
y, yhat = 0, 0.1
print('y=%.1f, yhat=%.1f, likelihood: %.3f' % (y, yhat, likelihood(y, yhat)))
y, yhat = 0, 0.9
print('y=%.1f, yhat=%.1f, likelihood: %.3f' % (y, yhat, likelihood(y, yhat)))
```
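The cross-entropy equivalence noted above can be checked numerically: treating the label and prediction as the two-class distributions P = [y, 1 - y] and Q = [yhat, 1 - yhat], the cross-entropy H(P, Q) matches the per-example negative log-likelihood. A minimal sketch (the function name is illustrative):

```python
from math import log

# cross-entropy between two discrete distributions: H(P, Q) = -sum P(x) * log(Q(x))
def cross_entropy(p, q):
    return -sum(pi * log(qi) for pi, qi in zip(p, q))

y, yhat = 1, 0.9
nll = -(y * log(yhat) + (1 - y) * log(1 - yhat))  # negative log-likelihood
ce = cross_entropy([y, 1 - y], [yhat, 1 - yhat])  # same quantity, cross-entropy form
print('nll=%.6f, cross-entropy=%.6f' % (nll, ce))
```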