The Wilson Score - Ranking Algorithm

The Secret for Zhihu Rank

March 5, 2023

Wilson Score sorting algorithm, Wilson Score, is used for quality sorting. The data contains positive and negative reviews. Taking into account the number of comments and praise rate, the higher the score, the higher the quality.

n = u + v \\ p = \frac{u}{v} \\ S = \frac{(p + \frac{z_{α}^{2}}{2n} - \frac{z_{α}^{2}}{2n}\sqrt{4n(1-p)p+z_{α}^{2}} )}{(1+\frac{z_{α}^{2}}{n})}

$u$ represents the number of positive examples (good reviews), $v$ represents the number of negative examples (bad reviews), $n$ represents the total number of instances (total number of reviews), $p$ represents the positive rate, and $z$ is the score of the normal distribution. Number of bits (parameter), $S$ represents the final Wilson score. $z$ generally takes a value of 2, which is 95% confidence.

Here is the quantile table for the normal distribution:

Algorithm traits:

Properties: The range of score $S$ is $[0,1)$ , effect: normalized, suitable for sorting
Properties: When the number of positive examples $u$ is $0$ , $p$ is $0$ , and the score $S$ is $0$ ; effect: no positive comments, the score is the lowest;
Properties: When the number of negative examples $v$ is $0$ , $p$ is $1$ , degenerates to $1/(1 + z^2 / n)$ , and the score $S$ is always less than $1$ ; effect: the score has permanent comparability;
Properties: When $p$ remains unchanged, the larger $n$ is, the reduction speed of the numerator is smaller than the reduction speed of the denominator, and the greater the score $S$ , and vice versa; Effect: The positive rate $p$ is the same, the total number of instances is $n The more$ , the more $S$ you score;
Properties: When $n$ tends to infinity, it degenerates to $p$ , and the score $S$ is determined by $p$ ; Effect: When the total number of comments $n$ is more, the positive rate $p$ will bring the score $S The improvement of$ is more obvious;
Properties: When the quantile $z$ is larger, the total number of reviews $n$ is more important, and the praise rate $p$ is less important, and vice versa; Effect: The larger $z$ is, the total number of reviews $n$ is more important. The degree of discrimination is low; the smaller $z$ is, the more important the praise rate $p$ is;

Click here to get source code

Python Implementation.

def wilson_score(pos, total, p_z=2.):
    pos_rat = pos * 1. / total * 1.  # 正例比率
    score = (pos_rat + (np.square(p_z) / (2. * total))
             - ((p_z / (2. * total)) * np.sqrt(4. * total * (1. - pos_rat) * pos_rat + np.square(p_z)))) / \
            (1. + np.square(p_z) / total)
    return score

Distribution plot of Wilson score algorithm

Let's try to put this theory into use with following example.

Example: Suppose Doctor A has 100 reviews, 1 negative review and 99 positive reviews. Doctor B has 2 reviews, both of which are good reviews. Which one should be ranked first?

When $z=2$ , that is, 95% confidence level, doctor A's score is 0.9440, doctor B's score is 0.3333, and doctor A ranks in front.

PS: Rating Level Question: What should we do if we have a five-star evaluation system or a hundred percent evaluation system?

Just change the Wilson score formula from Bernoulli distribution to normal distribution.

S = \frac{(μ + \frac{z_{α}^{2}}{2n} - \frac{z_{α}^{2}}{2n}\sqrt{4nσ^{2}+z_{α}^{2}} )}{(1+\frac{z_{α}^{2}}{n})} \\ μ ∈(0,1)

Note: The mean and variance are both normalized values.

Python Implementation.

def wilson_score_norm(mean, var, total, p_z=2.):
    # The mean variance needs to be normalized to fit the quantiles of the normal distribution
    score = (mean + (np.square(p_z) / (2. * total))
             - ((p_z / (2. * total)) * np.sqrt(4. * total * var + np.square(p_z)))) / \
            (1 + np.square(p_z) / total)
    return score

Example of normalization:

def test_of_values():
    max = 5.  # Maximum evaluation value.
    min = 1.  # Minimum evaluation value.
    values = np.array([1., 2., 3., 4., 5.])  # Example

    norm_values = (values - min) / (max - min)  # Normalize
    total = norm_values.size
    mean = np.mean(norm_values)
    var = np.var(norm_values)
    return total, mean, var

PS: About the z parameter, that is, the normal quantile. The normal quantile affects the distribution of Wilson scores, and the value of the $z$ parameter is based on the magnitude of the number of samples. For example: the same 100 samples, 90 positive reviews, $z$ value is 2 or 6, the scores are very different, and the number of samples accommodated (or distinguished) by the system is also very different (the same is 0.82 points and 90% positive reviews, $z=2$ requires 100 samples, $z=6$ requires 1000 samples), generally speaking, the larger the magnitude of the number of samples, the larger the value of z.

print 'score: %s' % wilson_score(90, 90 + 10, p_z=2.)
print 'score: %s' % wilson_score(90, 90 + 10, p_z=6.)
print 'score: %s' % wilson_score(900, 900 + 100, p_z=6.)

# take 2-100：score: 0.823802352689
# take 6-100：score: 0.606942322627
# take 6-1000：score: 0.828475631056

References:

NOTE & BLOG