Normalized Entropy (NE) is a metric widely used in machine learning and information theory to evaluate model performance, particularly in tasks like classification and CTR (Click-Through Rate) prediction. Below is a detailed explanation of its principles, applications, and implementation:


1. Definition and Formula

Normalized Entropy measures the relative uncertainty reduction of a model compared to the intrinsic randomness of the dataset. It is derived by normalizing the cross-entropy loss with the entropy of the dataset's baseline (background) distribution.
• Formula:

  $$\mathrm{NE} = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\right]}{-\left[p \log p + (1 - p)\log(1 - p)\right]}$$

  where $N$ is the number of samples, $y_i \in \{0, 1\}$ is the true label, $\hat{p}_i$ is the predicted click probability, and $p$ is the dataset's average empirical CTR.
• Cross-Entropy Loss: Quantifies the difference between predicted probabilities and true labels.

• Background Entropy: The entropy of the dataset's empirical CTR (i.e., the average click probability). For a dataset with average CTR $p$, the background entropy is $-\left[p \log p + (1 - p)\log(1 - p)\right]$.

Example: If the baseline CTR is 10% ($p = 0.1$), the background entropy is $-(0.1 \ln 0.1 + 0.9 \ln 0.9) \approx 0.325$ nats; this is the inherent uncertainty of the dataset. A lower NE indicates better model performance, as the model removes more of this uncertainty than the trivial baseline that always predicts the average CTR.
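
A minimal NumPy sketch of this definition (the function name, toy labels, and predictions are illustrative, not taken from the original source):

```python
import numpy as np

def normalized_entropy(y_true, y_pred, eps=1e-15):
    """Average log loss of the predictions divided by the entropy of the
    background CTR (the average label). Lower is better; 1.0 matches the
    trivial baseline that always predicts the average CTR."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)

    # Numerator: cross-entropy (log loss) of the model's predictions.
    ce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

    # Denominator: entropy of the empirical average CTR.
    p = np.clip(y_true.mean(), eps, 1 - eps)
    background = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return ce / background

labels = np.array([1, 0, 0, 1, 0, 0, 0, 0, 1, 0])
preds = np.array([0.8, 0.1, 0.2, 0.7, 0.1, 0.3, 0.1, 0.2, 0.6, 0.1])
print(normalized_entropy(labels, preds))  # ~0.38: better than the 1.0 baseline
```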


2. Key Applications

2.1 Model Evaluation in CTR Prediction

• NE is extensively used in ad ranking systems (e.g., Facebook's ad CTR prediction) to assess how well models distinguish between clicked and non-clicked ads.

• Advantage over AUC: Unlike AUC, NE accounts for calibration errors. For instance, if predicted probabilities are globally scaled (e.g., multiplied by 0.5), AUC remains unchanged, but NE reflects the miscalibration, as the sketch below demonstrates.
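
The following toy sketch scales a set of predicted probabilities by 0.5: AUC is unchanged because the ranking is preserved, while NE degrades (the data and the `ne` helper are illustrative):

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

labels = np.array([1, 0, 0, 1, 0, 0, 0, 0, 1, 0])
preds = np.array([0.8, 0.1, 0.2, 0.7, 0.1, 0.3, 0.1, 0.2, 0.6, 0.1])
scaled = 0.5 * preds  # global miscalibration: same ranking, halved probabilities

def ne(y, p):
    """Normalized Entropy: average log loss / entropy of the average CTR."""
    ctr = y.mean()
    background = -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
    return log_loss(y, p) / background

print(roc_auc_score(labels, preds), roc_auc_score(labels, scaled))  # identical AUC
print(ne(labels, preds), ne(labels, scaled))                        # NE gets worse
```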

2.2 Feature Engineering

• In decision trees and gradient boosting (e.g., GBDT), NE helps evaluate feature importance by measuring the entropy reduction achieved when splitting nodes; a toy example follows below.
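
A toy illustration of entropy reduction at a single split (the feature values, threshold, and labels below are made up for the example):

```python
import numpy as np

def entropy(labels):
    """Empirical entropy (in nats) of a binary label vector."""
    p = np.mean(labels)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def information_gain(labels, split_mask):
    """Entropy reduction from splitting the node into two children."""
    left, right = labels[split_mask], labels[~split_mask]
    n = len(labels)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - child

clicks = np.array([1, 1, 0, 0, 1, 0, 0, 0])
feature = np.array([5, 7, 1, 2, 6, 1, 3, 2])
print(information_gain(clicks, feature > 4))  # ~0.66 nats: the split is informative
```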


3. Implementation Steps

3.1 Data Normalization
Before calculating NE, features are normalized to ensure comparability:
• For positive indicators (higher values are better): $x'_{ij} = \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$

• For negative indicators (lower values are better): $x'_{ij} = \dfrac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$

This step aligns with the entropy-based weight calculation used in entropy weight methods.
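
A possible NumPy implementation of this column-wise min-max normalization (the function name and sample matrix are illustrative):

```python
import numpy as np

def min_max_normalize(X, positive_mask):
    """Column-wise min-max normalization.
    positive_mask[j] is True for 'higher is better' features and
    False for 'lower is better' features."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    pos = (X - lo) / span                   # (x - min) / (max - min)
    neg = (hi - X) / span                   # (max - x) / (max - min)
    return np.where(positive_mask, pos, neg)

X = np.array([[3.0, 40.0],
              [5.0, 20.0],
              [9.0, 30.0]])
print(min_max_normalize(X, positive_mask=np.array([True, False])))
```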

3.2 Entropy and Weight Calculation

  1. Probability Proportion: Compute the proportion of sample $i$ under feature $j$: $p_{ij} = \dfrac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$

  2. Entropy Value: $H_j = -\sum_{i=1}^{n} p_{ij} \ln p_{ij}$

  3. Normalized Entropy: $E_j = \dfrac{H_j}{\ln n}$, which lies in $[0, 1]$ (where $n$ is the number of samples).

    This step expresses each feature's entropy relative to the maximum possible entropy of the dataset, so the entropy reduction $1 - E_j$ can be compared across features and converted into weights.
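
A sketch of these steps, plus the usual conversion of normalized entropy into feature weights, assuming an already-normalized matrix (the names and the small epsilon guard against $\log 0$ are illustrative):

```python
import numpy as np

def entropy_weights(X_norm, eps=1e-12):
    """Entropy-weight-style calculation on a normalized (n_samples, n_features)
    matrix with values in [0, 1]."""
    n = X_norm.shape[0]
    # 1. Probability proportion of each sample under each feature.
    p = (X_norm + eps) / (X_norm + eps).sum(axis=0)
    # 2. Entropy of each feature, then 3. normalize by ln(n) so it lies in [0, 1].
    e = -(p * np.log(p)).sum(axis=0) / np.log(n)
    # Features with lower normalized entropy are more informative and get larger weights.
    w = (1 - e) / (1 - e).sum()
    return e, w

X_norm = np.array([[0.00, 1.0],
                   [0.33, 0.0],
                   [1.00, 0.5]])
e, w = entropy_weights(X_norm)
print("normalized entropy per feature:", e)
print("weights:", w)
```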


4. Advantages and Limitations

Advantages
• Objectivity: NE avoids subjective biases by relying on data-driven entropy values.

• Sensitivity to Calibration: Reflects both ranking quality and probability calibration, unlike AUC.

Limitations
• Dependency on Data Quality: Sensitive to outliers and to data preprocessing steps (e.g., normalization).

• Complexity: Requires careful handling of zero probabilities to avoid numerical instability (e.g., clipping or adding a small $\epsilon$), as in the snippet below.
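
For example, one common guard is to clip predicted probabilities away from 0 and 1 before taking logarithms (the epsilon value below is a typical choice, not one prescribed by the source):

```python
import numpy as np

eps = 1e-15
p = np.array([0.0, 0.3, 1.0])
p_safe = np.clip(p, eps, 1 - eps)  # keeps log(p) and log(1 - p) finite
```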


5. Case Study: Facebook's Ad Ranking

In Facebook's GBDT+LR model, NE evaluates the combined model's performance:
• GBDT generates non-linear features (the leaf node indices of its decision trees), which are fed into LR for probability prediction; a minimal sketch of this pattern follows after this list.

• The reported metric improved from 80.21% to 96.25% after introducing the normalized-entropy-based enhancement, demonstrating its effectiveness in capturing informative features.
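
A minimal scikit-learn sketch of the GBDT+LR pattern, evaluated with NE; the synthetic dataset, hyperparameters, and helper code are illustrative and do not reproduce Facebook's production setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic, imbalanced "click" data standing in for ad impressions.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GBDT learns non-linear feature combinations; each tree's leaf index is a categorical feature.
gbdt = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
leaves_train = gbdt.apply(X_train)[:, :, 0]
leaves_test = gbdt.apply(X_test)[:, :, 0]

# One-hot encode the leaf indices and fit LR on them for the final click probability.
enc = OneHotEncoder(handle_unknown="ignore").fit(leaves_train)
lr = LogisticRegression(max_iter=1000).fit(enc.transform(leaves_train), y_train)
p = lr.predict_proba(enc.transform(leaves_test))[:, 1]

# Evaluate with NE: average log loss divided by the background entropy of the test CTR.
ctr = y_test.mean()
background = -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
print("NE:", log_loss(y_test, p) / background)
```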

