Normalized Entropy (NE), also known as normalized cross-entropy, is a metric widely used in machine learning and information theory to evaluate model performance, particularly in classification and CTR (Click-Through Rate) prediction tasks. Below is a detailed explanation of its principles, applications, and implementation:
1. Definition and Formula
Normalized Entropy measures the relative uncertainty reduction of a model compared to the intrinsic randomness of the dataset. It is derived by dividing the model's cross-entropy loss by the entropy of the dataset's baseline (background) distribution.
• Formula: for $N$ samples with true labels $y_i \in \{0, 1\}$, predicted probabilities $p_i$, and average empirical CTR $p$,

$$\mathrm{NE} = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]}{-\left[p \log p + (1 - p)\log(1 - p)\right]}$$
• Cross-Entropy Loss (numerator): the average log loss, which quantifies the difference between the predicted probabilities and the true labels.
• Background Entropy (denominator): the entropy of the dataset's empirical CTR (e.g., the average click probability). For a dataset with average CTR $p$, the background entropy is $-\left[p \log p + (1 - p)\log(1 - p)\right]$.
Example: if the baseline CTR is 10%, the background entropy is $-\left[0.1 \log 0.1 + 0.9 \log 0.9\right] \approx 0.325$ nats, representing the inherent uncertainty of the labels. A lower NE indicates better model performance, because the model removes more of that inherent uncertainty.
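The formula above translates directly into a few lines of NumPy. The snippet below is a minimal sketch of this computation; the function name `normalized_entropy` and the clipping constant are illustrative choices, not part of any standard API.

```python
import numpy as np

def normalized_entropy(y_true, y_pred, eps=1e-15):
    """Cross-entropy of the predictions divided by the entropy of the empirical CTR."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 to avoid log(0).
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)

    # Numerator: average cross-entropy (log loss) of the model.
    ce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

    # Denominator: entropy of the background (average) CTR.
    p = np.clip(y_true.mean(), eps, 1 - eps)
    background = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return ce / background

# Example: a model whose predictions track the labels reasonably well.
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])   # empirical CTR = 10%
preds  = np.array([0.05, 0.1, 0.08, 0.12, 0.07, 0.09, 0.11, 0.06, 0.1, 0.6])
print(normalized_entropy(labels, preds))             # < 1.0 means better than the baseline
```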
2. Key Applications
2.1 Model Evaluation in CTR Prediction
• NE is extensively used in ad ranking systems (e.g., Facebook's ad CTR prediction) to assess how well models distinguish between clicked and non-clicked ads.
• Advantage over AUC: unlike AUC, NE accounts for calibration errors. For instance, if the predicted probabilities are globally scaled (e.g., multiplied by 0.5), AUC remains unchanged because the ranking is preserved, but NE reflects the miscalibration, as the sketch below illustrates.
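The following sketch demonstrates that behavior using scikit-learn's `roc_auc_score` and `log_loss`; the synthetic data and the scaling factor of 0.5 are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

rng = np.random.default_rng(0)
labels = rng.binomial(1, 0.1, size=10_000)            # ~10% CTR
# Reasonably calibrated predictions: close to 0.1 on average for negatives.
preds = np.clip(0.1 + 0.05 * rng.normal(size=labels.size) + 0.3 * labels, 1e-6, 1 - 1e-6)
scaled = 0.5 * preds                                   # global miscalibration

p = labels.mean()
background = -(p * np.log(p) + (1 - p) * np.log(1 - p))  # background entropy

for name, q in [("original", preds), ("scaled x0.5", scaled)]:
    ne = log_loss(labels, q) / background              # NE = cross-entropy / background entropy
    print(f"{name:>12}:  AUC = {roc_auc_score(labels, q):.4f}   NE = {ne:.4f}")
# AUC is identical for both runs, because scaling preserves the ranking;
# NE is worse for the scaled predictions, exposing the miscalibration.
```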
2.2 Feature Engineering
• In decision trees and gradient boosting (e.g., GBDT), entropy measures of this kind help evaluate feature importance by quantifying the entropy reduction (information gain) obtained after splitting a node, as sketched below.
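As a rough illustration of that idea, the sketch below computes the entropy reduction of a single binary split; the helper names (`binary_entropy`, `entropy_reduction`) are made up for this example and are not tied to any particular library.

```python
import numpy as np

def binary_entropy(labels):
    """Entropy (in nats) of a binary label array."""
    p = np.mean(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def entropy_reduction(labels, mask):
    """Parent entropy minus the size-weighted entropy of the two child nodes."""
    left, right = labels[mask], labels[~mask]
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    return binary_entropy(labels) - (w_left * binary_entropy(left) + w_right * binary_entropy(right))

labels = np.array([1, 1, 1, 0, 0, 0, 0, 0])
feature = np.array([5.0, 4.5, 6.0, 1.0, 0.5, 2.0, 1.5, 0.8])
print(entropy_reduction(labels, feature > 3.0))   # large reduction: the split is informative
```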
3. Implementation Steps
3.1 Data Normalization
Before calculating NE, features are normalized to ensure comparability:
• For positive indicators (higher values are better): $x'_{ij} = \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$
• For negative indicators (lower values are better): $x'_{ij} = \dfrac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$
This step aligns with the entropy-based weight calculation used in entropy weight methods; a minimal sketch follows below.
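A minimal sketch of these two normalizations, assuming a 2-D array with samples as rows and features as columns (the function name `min_max_normalize` is illustrative):

```python
import numpy as np

def min_max_normalize(X, positive=True):
    """Column-wise min-max scaling; flip the direction for negative indicators."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero
    return (X - col_min) / span if positive else (col_max - X) / span

X = np.array([[2.0, 30.0], [4.0, 10.0], [6.0, 20.0]])
print(min_max_normalize(X, positive=True))    # higher raw values map closer to 1
print(min_max_normalize(X, positive=False))   # lower raw values map closer to 1
```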
3.2 Entropy and Weight Calculation
• Probability Proportion: compute the proportion of sample $i$ under feature $j$ (using the normalized values $x'_{ij}$ from step 3.1):

$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$$

• Entropy Value:

$$E_j = -\sum_{i=1}^{n} p_{ij} \ln p_{ij}, \qquad \text{with } p_{ij} \ln p_{ij} := 0 \text{ when } p_{ij} = 0$$

• Normalized Entropy: dividing by $\ln n$, the maximum possible entropy over $n$ samples, bounds the value in $[0, 1]$:

$$e_j = \frac{E_j}{\ln n}$$

This step highlights the entropy reduction relative to the dataset's randomness; a sketch of the calculation follows below.
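A minimal sketch of this calculation, assuming the normalized matrix `X_norm` from the previous step (the variable and function names are illustrative):

```python
import numpy as np

def feature_normalized_entropy(X_norm, eps=1e-12):
    """Per-feature normalized entropy e_j in [0, 1] for an (n_samples, n_features) matrix."""
    n = X_norm.shape[0]
    # Probability proportion of each sample under each feature.
    P = X_norm / (X_norm.sum(axis=0, keepdims=True) + eps)
    # Entropy per feature, treating 0 * log(0) as 0, then normalize by ln(n).
    E = -np.sum(np.where(P > 0, P * np.log(P + eps), 0.0), axis=0)
    return E / np.log(n)

X_norm = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.5]])
e = feature_normalized_entropy(X_norm)
print(e)       # one value per feature; lower entropy => more discriminating feature
print(1 - e)   # "degree of divergence" used to derive entropy weights
```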
4. Advantages and Limitations
Advantages
• Objectivity: NE avoids subjective biases by relying on data-driven entropy values.
• Sensitivity to Calibration: Reflects both ranking quality and probability calibration, unlike AUC.
Limitations
• Dependency on Data Quality: Sensitive to outliers and data preprocessing steps (e.g., normalization).
• Complexity: Requires careful handling of zero probabilities to avoid numerical instability (e.g., clipping with a small $\epsilon$, as in the snippet below).
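To make the zero-probability issue concrete, here is a tiny illustration; the $\epsilon$ value is an arbitrary choice.

```python
import numpy as np

preds = np.array([0.0, 0.3, 1.0])          # raw predictions containing exact 0 and 1
labels = np.array([1, 0, 0])

# Naive cross-entropy blows up: log(0) = -inf.
naive = -np.mean(labels * np.log(preds) + (1 - labels) * np.log(1 - preds))
print(naive)                               # inf (with runtime warnings)

eps = 1e-15
clipped = np.clip(preds, eps, 1 - eps)     # standard fix: clip away from 0 and 1
safe = -np.mean(labels * np.log(clipped) + (1 - labels) * np.log(1 - clipped))
print(safe)                                # finite value
```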
5. Case Study: Facebook's Ad Ranking
In Facebook's GBDT+LR model, NE evaluates the combined model's performance:
• GBDT generates non-linear features (leaf node indices of decision trees), which are fed into LR for probability prediction.
• NE improved from 80.21% to 96.25% after introducing the normalized-entropy-based enhancement, demonstrating its effectiveness in capturing informative features; a generic sketch of the GBDT+LR pattern follows below.
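The general GBDT+LR pattern can be sketched with scikit-learn. This is a generic illustration of the leaf-index-as-feature idea, not Facebook's actual pipeline; the dataset and hyperparameters are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 1: GBDT learns non-linear feature combinations.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X_train, y_train)

# Leaf indices of every tree become categorical features, one-hot encoded.
enc = OneHotEncoder(handle_unknown="ignore")
train_leaves = enc.fit_transform(gbdt.apply(X_train).reshape(len(X_train), -1))
test_leaves = enc.transform(gbdt.apply(X_test).reshape(len(X_test), -1))

# Stage 2: logistic regression produces the click probabilities.
lr = LogisticRegression(max_iter=1000)
lr.fit(train_leaves, y_train)
preds = lr.predict_proba(test_leaves)[:, 1]

# Evaluate with NE: cross-entropy of the predictions over the background entropy.
eps = 1e-15
p_clip = np.clip(preds, eps, 1 - eps)
ce = -np.mean(y_test * np.log(p_clip) + (1 - y_test) * np.log(1 - p_clip))
ctr = np.clip(y_test.mean(), eps, 1 - eps)
ne = ce / -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
print(f"NE on the test set: {ne:.4f}")
```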