In many multilingual countries, language has been identified as a major influence on psychological and educational testing. In South Africa (SA), factors such as policy changes and social inequalities also influence testing. The literature supports translating and adapting tests used in such contexts in order to avoid bias caused by language. The different language versions of a test then need to be evaluated for equivalence, to ensure that scores have the same meaning across versions. Differences in dialect may also affect the results of such tests. Results from an isiXhosa version of the Woodcock Muñoz Language Survey (WMLS), a test used to measure the language proficiency of isiXhosa learners, show significant differences in mean scores between rural and urban first-language speakers of isiXhosa. These results point to a possible problem with rural and urban dialects during testing. This thesis evaluates item bias in the subtests of this version of the WMLS across rural and urban isiXhosa learners. This was done by examining reliability and item characteristics for differences between the two groups, and by evaluating differential item functioning (DIF) across the two groups on the WMLS subtests. The sample comprised 260 male and female isiXhosa learners in grades 6 and 7 from the Eastern Cape Province. It was collected in two phases: (1) secondary data from 49 rural and 133 urban isiXhosa learners were included; (2) primary data were then collected from a further 78 rural isiXhosa learners to balance the two groups (127 rural and 133 urban learners in total). Ethical considerations were addressed throughout the study. The results were unexpected. Two of the WMLS subtests showed evidence of scalar equivalence, as only a few of their items were identified as problematic. The other two subtests, however, contained more problematic items. These results indicate that the two subtests with evidence of scalar equivalence can be used to measure the construct of language proficiency, while the other two subtests need further investigation, because learners' responses to the problematic items appear to be determined by their group membership rather than by their ability.