Normalized Entropy (NE), also known as normalized cross-entropy, is a metric widely used in machine learning and information theory to evaluate model performance, particularly in classification and CTR (Click-Through Rate) prediction tasks. Below is a detailed explanation of its principles, applications, and implementation:
1. Definition and Formula
Normalized Entropy measures the relative uncertainty reduction of a model compared to the intrinsic randomness of the dataset. It is derived by dividing the model's cross-entropy loss by the entropy of the dataset's baseline (background) distribution.
• Formula: for $N$ samples with true labels $y_i \in \{0, 1\}$, predicted probabilities $p_i$, and average empirical CTR $p$,

$$\mathrm{NE} = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]}{-\left[p \log p + (1 - p)\log(1 - p)\right]}$$
• Cross-Entropy Loss (numerator): the average log loss, which quantifies the difference between the predicted probabilities and the true labels.
• Background Entropy (denominator): the entropy of the dataset's empirical CTR (e.g., the average click probability). For a dataset with average CTR $p$, the background entropy is $-\left[p \log p + (1 - p)\log(1 - p)\right]$.
Example: if the baseline CTR is 10%, the background entropy is $-\left[0.1 \log 0.1 + 0.9 \log 0.9\right] \approx 0.325$ nats, representing the inherent uncertainty of the labels. A lower NE indicates better model performance, because the model removes more of that inherent uncertainty.
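The formula above translates directly into a few lines of NumPy. The snippet below is a minimal sketch of this computation; the function name `normalized_entropy` and the clipping constant are illustrative choices, not part of any standard API.

```python
import numpy as np

def normalized_entropy(y_true, y_pred, eps=1e-15):
    """Cross-entropy of the predictions divided by the entropy of the empirical CTR."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 to avoid log(0).
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)

    # Numerator: average cross-entropy (log loss) of the model.
    ce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

    # Denominator: entropy of the background (average) CTR.
    p = np.clip(y_true.mean(), eps, 1 - eps)
    background = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return ce / background

# Example: a model whose predictions track the labels reasonably well.
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])   # empirical CTR = 10%
preds  = np.array([0.05, 0.1, 0.08, 0.12, 0.07, 0.09, 0.11, 0.06, 0.1, 0.6])
print(normalized_entropy(labels, preds))             # < 1.0 means better than the baseline
```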
2. Key Applications
2.1 Model Evaluation in CTR Prediction
• NE is extensively used in ad ranking systems (e.g., Facebook's ad CTR prediction) to assess how well models distinguish between clicked and non-clicked ads.
• Advantage over AUC: unlike AUC, NE accounts for calibration errors. For instance, if the predicted probabilities are globally scaled (e.g., multiplied by 0.5), AUC remains unchanged because the ranking is preserved, but NE reflects the miscalibration, as the sketch below illustrates.
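The following sketch demonstrates that behavior using scikit-learn's `roc_auc_score` and `log_loss`; the synthetic data and the scaling factor of 0.5 are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

rng = np.random.default_rng(0)
labels = rng.binomial(1, 0.1, size=10_000)            # ~10% CTR
# Reasonably calibrated predictions: close to 0.1 on average for negatives.
preds = np.clip(0.1 + 0.05 * rng.normal(size=labels.size) + 0.3 * labels, 1e-6, 1 - 1e-6)
scaled = 0.5 * preds                                   # global miscalibration

p = labels.mean()
background = -(p * np.log(p) + (1 - p) * np.log(1 - p))  # background entropy

for name, q in [("original", preds), ("scaled x0.5", scaled)]:
    ne = log_loss(labels, q) / background              # NE = cross-entropy / background entropy
    print(f"{name:>12}:  AUC = {roc_auc_score(labels, q):.4f}   NE = {ne:.4f}")
# AUC is identical for both runs, because scaling preserves the ranking;
# NE is worse for the scaled predictions, exposing the miscalibration.
```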
2.2 Feature Engineering
• In decision trees and gradient boosting (e.g., GBDT), entropy measures of this kind help evaluate feature importance by quantifying the entropy reduction (information gain) obtained after splitting a node, as sketched below.
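As a rough illustration of that idea, the sketch below computes the entropy reduction of a single binary split; the helper names (`binary_entropy`, `entropy_reduction`) are made up for this example and are not tied to any particular library.

```python
import numpy as np

def binary_entropy(labels):
    """Entropy (in nats) of a binary label array."""
    p = np.mean(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def entropy_reduction(labels, mask):
    """Parent entropy minus the size-weighted entropy of the two child nodes."""
    left, right = labels[mask], labels[~mask]
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    return binary_entropy(labels) - (w_left * binary_entropy(left) + w_right * binary_entropy(right))

labels = np.array([1, 1, 1, 0, 0, 0, 0, 0])
feature = np.array([5.0, 4.5, 6.0, 1.0, 0.5, 2.0, 1.5, 0.8])
print(entropy_reduction(labels, feature > 3.0))   # large reduction: the split is informative
```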
3. Implementation Steps
3.1 Data Normalization
Before calculating NE, features are normalized to ensure comparability:
• For positive indicators (higher values are better): $x'_{ij} = \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$
• For negative indicators (lower values are better): $x'_{ij} = \dfrac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$
This step aligns with the entropy-based weight calculation used in entropy weight methods; a minimal sketch follows below.
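A minimal sketch of these two normalizations, assuming a 2-D array with samples as rows and features as columns (the function name `min_max_normalize` is illustrative):

```python
import numpy as np

def min_max_normalize(X, positive=True):
    """Column-wise min-max scaling; flip the direction for negative indicators."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero
    return (X - col_min) / span if positive else (col_max - X) / span

X = np.array([[2.0, 30.0], [4.0, 10.0], [6.0, 20.0]])
print(min_max_normalize(X, positive=True))    # higher raw values map closer to 1
print(min_max_normalize(X, positive=False))   # lower raw values map closer to 1
```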
3.2 Entropy and Weight Calculation
• Probability Proportion: compute the proportion of sample $i$ under feature $j$ (using the normalized values $x'_{ij}$ from step 3.1):

$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$$

• Entropy Value:

$$E_j = -\sum_{i=1}^{n} p_{ij} \ln p_{ij}, \qquad \text{with } p_{ij} \ln p_{ij} := 0 \text{ when } p_{ij} = 0$$

• Normalized Entropy: dividing by $\ln n$, the maximum possible entropy over $n$ samples, bounds the value in $[0, 1]$:

$$e_j = \frac{E_j}{\ln n}$$

This step highlights the entropy reduction relative to the dataset's randomness; a sketch of the calculation follows below.
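A minimal sketch of this calculation, assuming the normalized matrix `X_norm` from the previous step (the variable and function names are illustrative):

```python
import numpy as np

def feature_normalized_entropy(X_norm, eps=1e-12):
    """Per-feature normalized entropy e_j in [0, 1] for an (n_samples, n_features) matrix."""
    n = X_norm.shape[0]
    # Probability proportion of each sample under each feature.
    P = X_norm / (X_norm.sum(axis=0, keepdims=True) + eps)
    # Entropy per feature, treating 0 * log(0) as 0, then normalize by ln(n).
    E = -np.sum(np.where(P > 0, P * np.log(P + eps), 0.0), axis=0)
    return E / np.log(n)

X_norm = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.5]])
e = feature_normalized_entropy(X_norm)
print(e)       # one value per feature; lower entropy => more discriminating feature
print(1 - e)   # "degree of divergence" used to derive entropy weights
```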
4. Advantages and Limitations
Advantages
• Objectivity: NE avoids subjective biases by relying on data-driven entropy values.
• Sensitivity to Calibration: Reflects both ranking quality and probability calibration, unlike AUC.
Limitations
• Dependency on Data Quality: Sensitive to outliers and data preprocessing steps (e.g., normalization).
• Complexity: Requires careful handling of zero probabilities to avoid numerical instability (e.g., clipping with a small $\epsilon$, as in the snippet below).
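To make the zero-probability issue concrete, here is a tiny illustration; the $\epsilon$ value is an arbitrary choice.

```python
import numpy as np

preds = np.array([0.0, 0.3, 1.0])          # raw predictions containing exact 0 and 1
labels = np.array([1, 0, 0])

# Naive cross-entropy blows up: log(0) = -inf.
naive = -np.mean(labels * np.log(preds) + (1 - labels) * np.log(1 - preds))
print(naive)                               # inf (with runtime warnings)

eps = 1e-15
clipped = np.clip(preds, eps, 1 - eps)     # standard fix: clip away from 0 and 1
safe = -np.mean(labels * np.log(clipped) + (1 - labels) * np.log(1 - clipped))
print(safe)                                # finite value
```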
5. Case Study: Facebook's Ad Ranking
In Facebook's GBDT+LR model, NE evaluates the combined model's performance:
• GBDT generates non-linear features (leaf node indices of decision trees), which are fed into LR for probability prediction.
• NE improved from 80.21% to 96.25% after introducing the normalized-entropy-based enhancement, demonstrating its effectiveness in capturing informative features; a generic sketch of the GBDT+LR pattern follows below.
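The general GBDT+LR pattern can be sketched with scikit-learn. This is a generic illustration of the leaf-index-as-feature idea, not Facebook's actual pipeline; the dataset and hyperparameters are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 1: GBDT learns non-linear feature combinations.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X_train, y_train)

# Leaf indices of every tree become categorical features, one-hot encoded.
enc = OneHotEncoder(handle_unknown="ignore")
train_leaves = enc.fit_transform(gbdt.apply(X_train).reshape(len(X_train), -1))
test_leaves = enc.transform(gbdt.apply(X_test).reshape(len(X_test), -1))

# Stage 2: logistic regression produces the click probabilities.
lr = LogisticRegression(max_iter=1000)
lr.fit(train_leaves, y_train)
preds = lr.predict_proba(test_leaves)[:, 1]

# Evaluate with NE: cross-entropy of the predictions over the background entropy.
eps = 1e-15
p_clip = np.clip(preds, eps, 1 - eps)
ce = -np.mean(y_test * np.log(p_clip) + (1 - y_test) * np.log(1 - p_clip))
ctr = np.clip(y_test.mean(), eps, 1 - eps)
ne = ce / -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
print(f"NE on the test set: {ne:.4f}")
```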