The Impact of Culturally Different Types of Faking on Test Properties
Abstract
Personality measures have become a key component of the human capital
strategies of organizations (Hough & Oswald, 2008). Multinationals often conduct
employee selection across countries, and opportunities have expanded for using
personality tests in selection across cultures. However, few studies have yet
investigated applicant faking behavior in cross-cultural contexts, despite research
suggesting that psychometric assessment tools are vulnerable to such faking (Salgado, 2016).
One of the main causes of the limited number of cross-cultural faking studies is the
difficulty of data collection. Simulation is one of the more promising methods, and
current research on similar problems often uses simulation (e.g., Jin & Wang, 2014;
Johnson & Bolt, 2010; Plieninger, 2017). However, existing simulation methods for
research on faking focus mainly on scale-level investigation and are unsuitable for
exploring the effect of faking on cross-cultural assessments; item-level investigation is
necessary for scrutinizing test equivalence across countries.
The present study explored the effect of culturally different types of faking,
which were hypothesized based on findings of past studies on faking and culture.
Item-level Monte Carlo simulations assumed that two countries had different types of
fakers. The simulations manipulated five parameters (sample size, faking severity,
faking style, percentage of items faked, and faking prevalence) to
investigate occurrences of differential item functioning (DIF), changes in IRT
parameters, and mean shifts.
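
The following is a minimal illustrative sketch (in Python) of how such an item-level faking manipulation could be generated. It assumes a dichotomous 2PL IRT response model, omits the faking style parameter for brevity, and uses hypothetical variable names; it is not the study's actual data-generating procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_2pl(theta, a, b):
    """Generate dichotomous item responses under a 2PL IRT model."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (rng.random(p.shape) < p).astype(int)

def apply_faking(resp, prevalence, pct_items_faked, severity):
    """Distort a share of responses toward the keyed (desirable) direction.

    prevalence      -- proportion of respondents who fake
    pct_items_faked -- proportion of items each faker distorts
    severity        -- probability a targeted response is raised to 1
    """
    faked = resp.copy()
    n_persons, n_items = resp.shape
    fakers = rng.random(n_persons) < prevalence
    target_items = rng.choice(n_items, size=int(pct_items_faked * n_items),
                              replace=False)
    for i in np.where(fakers)[0]:
        for j in target_items:
            if rng.random() < severity:
                faked[i, j] = 1  # endorse the socially desirable option
    return faked

# Two simulated "countries" drawn from the same latent trait distribution;
# only country B contains fakers in this illustrative condition.
n, n_items = 1000, 20
a = rng.uniform(0.8, 2.0, n_items)   # discrimination parameters
b = rng.normal(0.0, 1.0, n_items)    # location parameters
resp_A = simulate_2pl(rng.normal(0, 1, n), a, b)
resp_B = apply_faking(simulate_2pl(rng.normal(0, 1, n), a, b),
                      prevalence=0.30, pct_items_faked=0.50, severity=0.80)
```

In this sketch, sample size, faking severity, percentage of items faked, and faking prevalence can be crossed as factors, mirroring the simulation design described above.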
The simulations revealed considerable numbers of items exhibiting DIF in most
conditions. Additionally, the IRT discrimination parameter either did not change or
degraded slightly when up to 50% of items were faked, whereas it improved once more
than 50% of items were faked, across conditions of faking severity and style. This
improvement of the discrimination parameter could be explained by a shift in the
construct the items were assessing: with up to 50% of items faked, the items still
assessed the construct they were designed to measure, but beyond 50% they measured
a different construct, one not entirely distinct from the original but transformed to
capture some faking variance. Furthermore, the
simulations showed that some faking styles cancelled out the effect of others on test
properties, and as a result, DIF occurrences and mean shifts were underestimated.
Implications, limitations, and suggestions for future research are discussed.
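
As a companion to the sketch above, the following shows one way the item-level DIF occurrences described in the abstract could be flagged, using a likelihood-ratio logistic regression DIF test on the simulated response matrices resp_A and resp_B from the earlier sketch. The test procedure and significance threshold are illustrative assumptions, not necessarily the method used in the study.

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif(resp_A, resp_B, item):
    """Likelihood-ratio DIF test for one item (uniform and non-uniform DIF).

    Compares a baseline model (response ~ rest score) with an augmented model
    that adds group membership and its interaction with the rest score.
    """
    y = np.concatenate([resp_A[:, item], resp_B[:, item]])
    rest = np.concatenate([resp_A.sum(axis=1) - resp_A[:, item],
                           resp_B.sum(axis=1) - resp_B[:, item]]).astype(float)
    group = np.concatenate([np.zeros(len(resp_A)), np.ones(len(resp_B))])

    X0 = sm.add_constant(rest)
    X1 = sm.add_constant(np.column_stack([rest, group, rest * group]))
    lr = 2 * (sm.Logit(y, X1).fit(disp=False).llf -
              sm.Logit(y, X0).fit(disp=False).llf)
    return lr  # compare against a chi-square distribution with 2 df

# Flag items whose statistic exceeds the chi-square(2) .05 critical value.
flagged = [j for j in range(resp_A.shape[1])
           if logistic_dif(resp_A, resp_B, j) > 5.99]
```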