Characterizing the Demographic Effects Relative to Race, Gender, and Skin Tone on the Accuracy of Deep Convolutional Neural Network Based Face Recognition Systems
Abstract
Automated face recognition (AFR) is an important tool used in national security
and public safety operations. Numerous cases exist in which AFR was successfully
used to search large repositories containing millions of face images, for example
to identify a suspect in a mass shooting, a victim of a crime, or a person
fraudulently attempting to acquire a driver’s license under an assumed identity.
However, concerns have been raised about how this technology impacts privacy
and civil liberties. Studies from the Georgetown Law Center on Privacy and
Technology and the American Civil Liberties Union (ACLU) have noted different
rates of face recognition accuracy for individuals with darker skin. The New York
Times article “Facial Recognition Is Accurate if You’re a White Guy” also raises
concerns about differences in face recognition accuracy of persons with different
skin types. Since AFR is a powerful technology widely deployed in many security
and privacy applications, identifying and addressing its challenges is necessary.
Current state-of-the-art face recognition systems have difficulty
recognizing faces with high inter-class similarity (often leading to false matches)
and high intra-class variation (often leading to false non-matches). We hypothesize
that facial searches leveraging demographics are one of the best avenues for
performance improvement in face recognition applications. In this dissertation, we
study how face recognition accuracy varies with demographic
factors such as race, gender, and skin tone. The first step characterizes
variations in face recognition false match and false non-match errors relative to
race. The initial results from this study suggest that African Americans have a
higher false match rate and Caucasians have a higher false non-match rate at a
given threshold. The second step characterizes variations in face recognition error
rates relative to gender. Our results show higher false match and false
non-match rates for women than for men, with African American women being the
most disadvantaged group assessed.
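
To make the two error measures concrete, the following is a minimal sketch of how false match rate (FMR) and false non-match rate (FNMR) could be computed per demographic group at a fixed threshold. It assumes similarity scores where higher means more alike, and that a comparison counts as a match when its score is at or above the threshold. The function name, the tuple layout, and the group labels "A" and "B" are hypothetical, and the scores are made-up toy values, not data from this study.

```python
from collections import defaultdict


def error_rates_by_group(comparisons, threshold):
    """Return {group: (FMR, FNMR)} at the given threshold.

    comparisons: iterable of (group, score, is_genuine) tuples, where
    is_genuine is True for same-person pairs and False for impostor pairs.
    Assumption: a pair is declared a match when score >= threshold.
    """
    counts = defaultdict(
        lambda: {"false_match": 0, "impostor": 0, "false_non_match": 0, "genuine": 0}
    )
    for group, score, is_genuine in comparisons:
        c = counts[group]
        if is_genuine:
            c["genuine"] += 1
            if score < threshold:  # same person, but rejected
                c["false_non_match"] += 1
        else:
            c["impostor"] += 1
            if score >= threshold:  # different people, but accepted
                c["false_match"] += 1

    rates = {}
    for group, c in counts.items():
        # A rate is None when the group has no pairs of that kind.
        fmr = c["false_match"] / c["impostor"] if c["impostor"] else None
        fnmr = c["false_non_match"] / c["genuine"] if c["genuine"] else None
        rates[group] = (fmr, fnmr)
    return rates


# Toy, made-up comparisons for two hypothetical groups.
toy = [
    ("A", 0.9, True), ("A", 0.4, True), ("A", 0.7, False), ("A", 0.2, False),
    ("B", 0.8, True), ("B", 0.3, False),
]
print(error_rates_by_group(toy, threshold=0.5))
```

Because both rates are evaluated at the same threshold for every group, a shift in one group's impostor or genuine score distribution shows up directly as a difference in that group's FMR or FNMR.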
Many recent news reports and research articles suggest that dark skin tone is
a cause of increased facial recognition errors. To address the dearth of data available
to support meaningful research on how face recognition accuracy varies with skin
tone, we took two approaches to annotating images with Fitzpatrick skin tone ratings.
The first approach used human raters to assess skin
tone from a face image. These human ratings were used to isolate the effect of
skin tone on false match rates. The results of these experiments do not support
a general conclusion that darker skin tone causes an increased false match rate.
In the second approach, we propose and provide an implementation of automated
skin type labeling of face images. Automating skin tone
assessment is challenging when the image contains no color calibration
source. These characterizations contribute foundational
knowledge for solving problems in the domain of biometrics and face recognition.