Description: Overall privacy budget allocation and usage.
Remaining Budget: 1.5 ε
Per Query Budget: 0.1 ε
Non-Ideal Range: > 20
Further reading: For a comprehensive overview of differential privacy concepts and guidance on setting privacy budgets, please refer to this primer: [link]
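The budget figures above (a per-query budget of 0.1 ε deducted from an overall budget) can be tracked with basic sequential composition. The sketch below is illustrative only; the class and method names are invented, not from any particular DP library, and it assumes an overall budget of ε = 2.0 as stated elsewhere on this card.

```python
# Minimal budget accountant under basic (sequential) composition.
# Names are illustrative, not from any specific DP library.
class BudgetAccountant:
    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, query_epsilon):
        """Deduct a query's epsilon; refuse the query if it would overspend."""
        if self.spent + query_epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += query_epsilon

    @property
    def remaining(self):
        return self.total - self.spent

acct = BudgetAccountant(total_epsilon=2.0)
for _ in range(5):               # five queries at the per-query budget of 0.1
    acct.charge(0.1)
print(round(acct.remaining, 6))  # 1.5 epsilon left, matching "Remaining Budget"
```

Rejecting over-budget queries up front, rather than clamping them, keeps the stated guarantee intact even when callers misbehave.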
Description: Epsilon (ε) represents the privacy budget. It is the primary privacy parameter controlling information leakage. A value of 2.0 suggests a moderate balance between privacy and accuracy. Lower epsilon values provide stronger privacy but lower utility.
Non-Ideal Range: > 20
Description: Delta (δ) represents the probability of privacy failure. A smaller delta indicates a lower chance of privacy leakage. In this case, δ = 1e-7 ensures very low leakage probability.
Typical Range: 1e-8 - 1e-5
Impact: Ensures extremely low probability of privacy failure
Description: Used in zero-concentrated differential privacy (zCDP), where the privacy parameter is ρ (rho). It provides conservative privacy guarantees. A value of 3.85 reflects the trade-off between privacy and accuracy under zCDP mechanisms.
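A ρ-zCDP guarantee can be translated into a familiar (ε, δ) guarantee via the standard conversion ε = ρ + 2√(ρ ln(1/δ)) (Bun & Steinke, 2016). The sketch below applies it to the card's ρ = 3.85 together with the δ = 1e-7 listed above; the function name is illustrative.

```python
import math

def zcdp_to_dp(rho, delta):
    """Standard conversion: rho-zCDP implies (eps, delta)-DP with
    eps = rho + 2 * sqrt(rho * ln(1/delta)) (Bun & Steinke, 2016)."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

# Using this card's rho = 3.85 and delta = 1e-7:
print(round(zcdp_to_dp(3.85, 1e-7), 2))  # ≈ 19.6
```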
Description: Defines the basic unit of privacy protection, i.e., what is being protected. The Unit of Privacy here measures privacy per user per day, meaning that the privacy guarantee applies to the data submitted by an individual user on a single day. This ensures that any data a user contributes during a single day is protected, without aggregating contributions across multiple days or users.
Non-Ideal Range: Any unit other than an individual, or a time window shorter than a month.
Further reading: The most common unit of privacy is “one person” — meaning the privacy guarantee protects the whole person, forever. But other definitions are possible; Apple's implementation of differential privacy uses a “person-day” unit of privacy, meaning that the guarantee applies to the data submitted by one person on a single day. [link]
Description: Captures the trade-off between privacy and utility. Utility is measured as relative error, i.e., the noisy answer's deviation from the true answer, normalized by the true answer.
Typical Range: Query-dependent; lower error values mean higher accuracy.
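As a concrete illustration of the utility metric, relative error compares a noisy release against the true value (the numbers below are made up for illustration):

```python
def relative_error(true_value, noisy_value):
    """Relative error: |noisy - true| / |true|, the utility metric on this card."""
    return abs(noisy_value - true_value) / abs(true_value)

# E.g., a true count of 1000 released as 1012 has 1.2% relative error.
print(relative_error(1000, 1012))  # → 0.012
```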
Description: The Laplace mechanism ensures ε-differential privacy by adding noise drawn from a Laplace distribution with scale equal to the query's sensitivity divided by ε.
Other mechanisms: Gaussian Mechanism, Exponential Mechanism, etc.
Further reading: [link]
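The Laplace mechanism described above can be sketched in a few lines. This is a stdlib-only illustration (the function name is ours), sampling Laplace noise via the inverse CDF; a production system would use a vetted DP library rather than hand-rolled sampling.

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rnd=random):
    """Release true_value plus Laplace noise with scale sensitivity/epsilon,
    which satisfies epsilon-differential privacy (illustrative sketch)."""
    scale = sensitivity / epsilon
    u = rnd.random() - 0.5  # uniform on [-0.5, 0.5)
    # Inverse-CDF sampling of Laplace(0, scale).
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_value + noise

# A counting query has sensitivity 1: one person changes the count by at most 1.
noisy_count = laplace_mechanism(true_value=1234, sensitivity=1.0, epsilon=0.1)
```

Note how the noise scale grows as ε shrinks: stronger privacy means noisier answers, which is exactly the privacy/utility trade-off measured by relative error.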
Description: Scales the standard deviation of the noise added to clipped gradients during training (as in DP-SGD). A noise multiplier of 1.2 provides substantial privacy protection; higher multipliers give stronger privacy at the cost of accuracy.
Further reading: For more insights into setting and understanding algorithm hyperparameters in differential privacy, refer to this paper: [link].
Description: Limits each individual's contribution to bound sensitivity, e.g., by clipping the L2 norm of a per-example gradient. A threshold of 1.0 means each contribution is clipped to norm at most 1.0.
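The noise multiplier and clipping threshold above work together in a DP-SGD-style step: per-example gradients are clipped to the threshold, summed, and noised with standard deviation noise_multiplier × threshold. The sketch below is illustrative (function name and list-based gradients are ours, not any library's API):

```python
import math
import random

def clip_and_noise(per_example_grads, clip_norm=1.0, noise_multiplier=1.2,
                   rnd=random):
    """One DP-SGD-style aggregation step (illustrative sketch): clip each
    per-example gradient to L2 norm <= clip_norm, sum them, then add
    Gaussian noise with std = noise_multiplier * clip_norm."""
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * factor  # clipped contribution, norm <= clip_norm
    sigma = noise_multiplier * clip_norm
    return [s + rnd.gauss(0.0, sigma) for s in summed]

grads = [[3.0, 4.0], [0.3, 0.4]]   # per-example gradients
noisy_sum = clip_and_noise(grads)  # clipped to norm <= 1.0, then noised
```

Clipping is what makes the sensitivity of the sum equal to clip_norm, so the Gaussian noise scale has a well-defined privacy meaning.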
Description: Defines the size of data batches, e.g., 512 records per batch.
Description: The number of passes through the dataset during training; 5 epochs allow the model to train adequately while maintaining privacy. Each query consumes budget, so under basic sequential composition the total cost is the sum of per-query costs, e.g., 5 queries at ε = 0.4 each compose to a total privacy budget of ε = 2.0.
Description: Controls the magnitude of model updates during training. A lower learning rate like 0.01 ensures stability and better convergence.
Description: The central model assumes trust in the data curator: sensitive data is collected centrally into a single dataset. In this setting, the analyst may be malicious, but a trusted data curator holds the dataset and correctly executes the DP mechanisms the analyst specifies.
Types: Local, Central, Shuffle
Description: Empirical metrics help measure real-world privacy risks.
Status: Not currently defined
Recommendation: Consider adding empirical privacy guarantees.
Description: Provides a semantic interpretation of privacy guarantees.
Status: Not currently defined
Recommendation: Consider adding privacy semantic interpretations.
Description: Includes metadata for the released data, e.g.: software used; implementation and vetting details; average-case guarantees; guidance on normal parameter ranges; public information used; links to additional materials such as arXiv whitepapers and source code; publication date; version number/ID; goals of the release; and the data creation pipeline / contextual integrity considerations.
Further reading: [link]