The Impact of Culturally Different Types of Faking on Test Properties
Personality measures have become a key component of the human capital strategies of organizations (Hough & Oswald, 2008). Multinationals often conduct employee selection across countries, and opportunities have expanded for using personality tests in selection across cultures. However, few studies have investigated applicant faking behavior in cross-cultural contexts, despite research suggesting that psychometric assessment tools are vulnerable to faking (Salgado, 2016). One of the main causes of the limited number of cross-cultural faking studies is the difficulty of data collection. Simulation is one of the more promising methods, and recent research on similar problems often uses it (e.g., Jin & Wang, 2014; Johnson & Bolt, 2010; Plieninger, 2017). However, existing simulation methods for research on faking are designed mainly for scale-level investigation and are unsuitable for exploring the effect of faking on cross-cultural assessments; item-level investigation is necessary for scrutinizing test equivalence across countries. The present study explored the effect of culturally different types of faking, which were hypothesized based on findings of past studies regarding faking and/or culture. Item-level Monte Carlo simulations assumed that two countries had different types of fakers. The simulations manipulated five parameters (sample size, faking severity, faking style, percentage of items faked, and faking prevalence) to investigate occurrences of differential item functioning (DIF), changes in IRT parameters, and mean shifts. The simulations showed a considerable number of DIF occurrences in most conditions. Additionally, the IRT discrimination parameter either did not change or degraded slightly under the faking manipulation when up to 50% of items were faked, whereas it improved when more than 50% of items were faked, across conditions of faking severity and style.
This improvement in the discrimination parameter could be explained by a change in the construct the items were assessing: with up to 50% of items faked, the items assessed the construct they were designed to assess, but with more than 50% of items faked, the items measured a different construct, one that was not entirely distinct from the original but had shifted to capture some faking variance. Furthermore, the simulations showed that some faking styles cancelled out the effect of others on test properties, and as a result, DIF occurrences and mean shifts were underestimated. Implications, limitations, and suggestions for future research are discussed.
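As a rough illustration of the kind of item-level Monte Carlo design described above, the following sketch simulates two countries and compares item endorsement rates. It is not the study's procedure. It assumes dichotomous items under a two-parameter logistic (2PL) model and represents faking as a constant upward shift in the latent trait on faked items. Faking style is not modeled, and the function names, parameter values, and seed are hypothetical.

```python
import math
import random


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (assumed model, not the study's)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_group(n, a_params, b_params, prevalence, pct_items_faked, severity, rng):
    """Simulate 0/1 responses for one country.

    prevalence      : share of respondents who fake
    pct_items_faked : share of items that fakers distort
    severity        : upward shift in theta on faked items (hypothetical faking model)
    """
    n_items = len(a_params)
    n_faked = round(pct_items_faked * n_items)
    faked_items = set(range(n_faked))  # assumption: the first n_faked items are the faked ones
    data = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)          # true trait level
        is_faker = rng.random() < prevalence
        row = []
        for j in range(n_items):
            shift = severity if (is_faker and j in faked_items) else 0.0
            p = p_endorse(theta + shift, a_params[j], b_params[j])
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


def item_means(data):
    """Endorsement rate of each item."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


rng = random.Random(2024)                       # fixed seed for reproducibility
a_params = [1.2] * 10                           # illustrative discriminations
b_params = [-1.0 + 0.2 * j for j in range(10)]  # illustrative difficulties

# Reference country: no faking. Focal country: 30% fakers distorting 50% of items.
reference = simulate_group(2000, a_params, b_params, 0.0, 0.0, 0.0, rng)
focal = simulate_group(2000, a_params, b_params, 0.3, 0.5, 1.0, rng)

shifts = [f - r for f, r in zip(item_means(focal), item_means(reference))]
for j, d in enumerate(shifts):
    print(f"item {j}: endorsement-rate shift = {d:+.3f}")
```

Under these assumptions, the faked items should on average show a larger endorsement-rate shift between countries than the unfaked items, which is the kind of item-level non-equivalence a DIF analysis is meant to detect. A fuller design would cross all five manipulated parameters and fit an IRT model to each simulated data set.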