Designing Fair Ranking Systems: Mitigating Bias in Data
Designing Fair Ranking Schemes
Abstract
Items from a database are often ranked based on a combination of criteria. The weight given to each criterion in the combination can greatly affect the fairness of the produced ranking, for example, preferring men over women. A user may have the flexibility to choose combinations that weigh these criteria differently, within limits. In this paper, we develop a system that helps users choose criterion weights that lead to greater fairness.
Introduction
A prominent family of ranking schemes are score-based rankers, which compute the score of each individual from some database D, sort the individuals in decreasing order of score, and finally return either the full ranked list, or its highest-scoring sub-set, the top-k. Many score-based rankers compute the score as a linear combination of attribute values, with non-negative weights.
We desire a ranking scheme that is fair, in the sense that it mitigates pre-existing bias with respect to a protected feature embodied in the data.
Numerous fairness definitions have been considered in the recent literature. A useful dichotomy is between individual fairness and group fairness, also known as statistical parity.
We focus on group fairness in this paper.
Preliminaries
Data model: We consider a dataset D of n items, each with d scalar scoring attributes.
Ranking model: We focus on the class of linear scoring functions that use a weight vector w.
Problem Statement
A given query f, with a weight vector w, may not satisfy the required fairness constraints. Our problem is to propose a scoring function f′, with a weight vector similar to that of f, that does satisfy the constraints, if one exists.
High-level idea: From the system’s viewpoint, the challenge is to propose similar weight vectors that satisfy the fairness constraints, in interactive time. To accomplish this, our solution will operate with an offline phase and then an online phase. In the offline phase, we will process the (representative sample) dataset, and develop data structures that will be useful in the online phase.
The Two-Dimensional Case
Each item in a 2-dimensional dataset can be represented as a point in R2, and each ranking function f can be represented as a ray starting from the origin.
Offline Processing
- Offline processing is for identifying and indexing the satisfactory functions, for efficient answering of online queries.
Online Processing
- Having the sorted list of 2D satisfactory regions constructed in the offline phase allows us to design an efficient algorithm for online answering of queries.
The Multi-Dimensional Case
When there are three or more attributes, there are many ways in which we can perturb a given weight vector.
Conclusion
- In this paper, we studied the problem of designing fair ranking schemes. Our system computes the score of each item as a weighted linear combination of its attribute values, and assists users in choosing attribute weights that lead to fair ranked outcomes. Creating proper indexes in an offline manner enables interactive response to user queries.
- We designed techniques for a general fairness model that takes an ordering of the items as input and decides whether it meets the fairness requirements.