Pertanika Journal of Social Science and Humanities, Volume J, Issue J, January J

Keywords: J

Published on: J

J

  • Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge University Press.

  • Attali, Y. (2015). A comparison of newly-trained and experienced raters on a standardized writing assessment. Language Testing, 33(1), 99-115. https://doi.org/10.1177/0265532215582283

  • Azizah, N., Suseno, M., & Hayat, B. (2020). Severity-leniency in writing assessment and its causal factors. In International Conference on Humanities, Education and Social Sciences (IC-HEDS) 2019 (pp. 173-185). Knowledge E. https://doi.org/10.18502/kss.v4i14.7870

  • Barkaoui, K. (2011). Effects of marking method and rater experience on ESL essay scores and rater performance. Assessment in Education: Principles, Policy and Practice, 18(3), 279-293. https://doi.org/10.1080/0969594X.2010.526585

  • Best, J. W. (1977). Research in education (3rd ed.). Prentice-Hall Inc.

  • Brown, A. (2000). An investigation of the rating process in the IELTS oral interview. In International English Language Testing System (IELTS) Research Reports 2000 (Vol. 3, pp. 49-84). IELTS Australia.

  • Brown, G. (2009). The reliability of essay scores: The necessity of rubrics and moderation. In L. H. Meyer, S. Davidson, H. Anderson, R. Fletcher, P. M. Johnston, & M. Rees (Eds.), Tertiary assessment and higher education student outcomes: Policy, practice and research (pp. 40-48). Ako Aotearoa.

  • Chan, S. H., & Wong, B. E. (2004). Assessing oral skills of pre-tertiary students: The nature of the communicative act. In Proceedings of the International Conference on English Instruction and Assessment (pp. 33-48). National Chung Cheng University.

  • Cumming, A. (1990). Expertise in evaluating second language compositions. Language Testing, 7(1), 31-51. https://doi.org/10.1177/026553229000700104

  • Eckes, T. (2008). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2(3), 197-221. https://doi.org/10.1207/s15434311laq0203_2

  • Elder, C., Barkhuizen, G., Knoch, U., & Von Randow, J. (2007). Evaluating rater responses to an online training program for L2 writing assessment. Language Testing, 24(1), 37-64. https://doi.org/10.1177/0265532207071511

  • Ellington, J. K., & Wilson, M. A. (2017). The performance appraisal milieu: A multilevel analysis of context effects in performance ratings. Journal of Business and Psychology, 32(1), 87-100. https://doi.org/10.1007/s10869-016-9437-x

  • Engelhard, G., Jr. (1992). The measurement of writing ability with a many-faceted Rasch model. Applied Measurement in Education, 5(3), 171-191. https://doi.org/10.1207/s15324818ame0503_1

  • Engelhard, G., Jr. (1996). Evaluating rater accuracy in performance assessments. Journal of Educational Measurement, 33(1), 56-70. https://doi.org/10.1111/j.1745-3984.1996.tb00479.x

  • Erlam, R., Ellis, R., & Batstone, R. (2013). Oral corrective feedback on L2 writing: Two approaches compared. System, 41(2), 257-268. https://doi.org/10.1016/j.system.2013.03.004

  • Fahim, M., & Bijani, H. (2011). The effects of rater training on raters’ severity and bias in second language writing assessment. International Journal of Language Testing, 1(1), 1-16.

  • Freedman, S. W. (1981). Influences on evaluators of expository essays: Beyond the text. Research in the Teaching of English, 15(3), 245-255.

  • Furneaux, C., & Rignall, M. (2002). The effect of standardisation-training on rater judgments for the IELTS Writing Module. Cambridge University Press.

  • Green, A. (2003). Test impact and English for academic purposes: A comparative study in backwash between IELTS preparation and university professional courses [Unpublished doctoral dissertation]. University of Surrey.

  • Gyagenda, I. S., & Engelhard, G. (2009). Using classical and modern measurement theories to explore rater, domain, and gender influences on student writing ability. Journal of Applied Measurement, 10(3), 225-246.

  • Hamilton, J., Reddel, S., & Spratt, M. (2001). Teachers’ perceptions of on-line rater training and monitoring. System, 29(4), 505-520. https://doi.org/10.1016/S0346-251X(01)00036-7

  • Hamp-Lyons, L. (1990). Second language writing: Assessment issues. In B. Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 69-87). Cambridge University Press. https://doi.org/10.1017/CBO9781139524551.009

  • Hodges, T. S., Wright, K. L., Wind, S. A., Matthews, S. D., Zimmer, W. K., & McTigue, E. (2019). Developing and examining validity evidence for the Writing Rubric to Inform Teacher Educators (WRITE). Assessing Writing, 40, 1-13. https://doi.org/10.1016/j.asw.2019.03.001

  • Hoskens, M., & Wilson, M. (2001). Real-time feedback on rater drift in constructed-response items: An example from the Golden State Examination. Journal of Educational Measurement, 38(2), 121-145. https://doi.org/10.1111/j.1745-3984.2001.tb01119.x

  • IELTS. (2003). IELTS annual review 2001/2002. IELTS Australia.

  • Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130-144. https://doi.org/10.1016/j.edurev.2007.05.002

  • Kim, Y. S. G., Schatschneider, C., Wanzek, J., Gatlin, B., & Al Otaiba, S. (2017). Writing evaluation: Rater and task effects on the reliability of writing scores for children in Grades 3 and 4. Reading and Writing, 30(6), 1287-1310. https://doi.org/10.1007/s11145-017-9724-6

  • Knoch, U., Read, J., & von Randow, J. (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing, 12(1), 26-43. https://doi.org/10.1016/j.asw.2007.04.001

  • Kondo, Y. (2010). Examination of rater training effect and rater eligibility in L2 performance assessment. Journal of Pan-Pacific Association of Applied Linguistics, 14(2), 1-23.

  • Leckie, G., & Baird, J. A. (2011). Rater effects on essay scoring: A multilevel analysis of severity drift, central tendency, and rater experience. Journal of Educational Measurement, 48(4), 399-418. https://doi.org/10.1111/j.1745-3984.2011.00152.x

  • Lim, G. S. (2011). The development and maintenance of rating quality in performance writing assessment: A longitudinal study of new and experienced raters. Language Testing, 28(4), 543-560. https://doi.org/10.1177/0265532211406422

  • Lodico, M. G., Spaulding, D. T., & Voegtle, K. H. (2010). Methods in educational research: From theory to practice (Vol. 28). John Wiley & Sons.

  • Lumley, T. (2000). The process of assessment of writing performance: The rater’s perspective [Unpublished doctoral dissertation]. University of Melbourne.

  • Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19(3), 246-276. https://doi.org/10.1191/0265532202lt230oa

  • McIntyre, P. N. (1993). The importance and effectiveness of moderation training on the reliability of teacher assessments of ESL writing samples [Master’s Research thesis, University of Melbourne]. Minerva Access. http://cat.lib.unimelb.edu.au/record=b1849170

  • Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23. https://doi.org/10.3102/0013189X023002013

  • Myford, C. M., & Wolfe, E. W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46(4), 371-389. https://doi.org/10.1111/j.1745-3984.2009.00088.x

  • O’Sullivan, B., & Rignall, M. (2001). Assessing the value of bias analysis feedback to raters for the IELTS writing module (IELTS Research Report). Cambridge ESOL/The British Council/IDP Australia.

  • O’Sullivan, B., & Rignall, M. (2002). A longitudinal analysis of the effect of feedback on rater performance on the IELTS General Training writing module (IELTS Research Report). Cambridge ESOL/The British Council/IDP Australia.

  • Raczynski, K. R., Cohen, A. S., Engelhard, G., Jr., & Lu, Z. (2015). Comparing the effectiveness of self-paced and collaborative frame-of-reference training on rater accuracy in a large-scale writing assessment. Journal of Educational Measurement, 52(3), 301-318. https://doi.org/10.1111/jedm.12079

  • Ragupathi, K., & Lee, A. (2020). Beyond fairness and consistency in grading: The role of rubrics in higher education. In Diversity and inclusion in global higher education (pp. 73-95). Palgrave Macmillan. https://doi.org/10.1007/978-981-15-1628-3_3

  • Reddy, Y. M., & Andrade, H. (2010). A review of rubric use in higher education. Assessment & Evaluation in Higher Education, 35(4), 435-448. https://doi.org/10.1080/02602930902862859

  • Reed, D. J., & Cohen, A. D. (2001). Revisiting raters and ratings in oral language assessment. In C. Elder, A. Brown, E. Grove, K. Hill, N. Iwashita, T. Lumley, T. McNamara, & K. O’Loughlin (Eds.), Experimenting with uncertainty: Language testing essays in honour of Alan Davies (pp. 82-96). Cambridge University Press.

  • Rezaei, A. R., & Lovorn, M. (2010). Reliability and validity of rubrics for assessment through writing. Assessing Writing, 15(1), 18-39. https://doi.org/10.1016/j.asw.2010.01.003

  • Rosales-Sánchez, C., Díaz-Cabrera, D., & Hernández-Fernaud, E. (2019). Does effectiveness in performance appraisal improve with rater training? PLoS ONE, 14(9), Article e0222694. https://doi.org/10.1371/journal.pone.0222694

  • Schoepp, K., Danaher, M., & Kranov, A. A. (2018). An effective rubric norming process. Practical Assessment, Research, and Evaluation, 23(1), 1-12.

  • Shabani, E. A., & Panahi, J. (2020). Examining consistency among different rubrics for assessing writing. Language Testing in Asia, 10(1), 1-25. https://doi.org/10.1186/s40468-020-00111-4

  • Shaw, S. (2002). The effect of training and standardisation on rater judgement and inter-rater reliability. Research Notes, 8, 13-17.

  • Shohamy, E., Gordon, C. M., & Kraemer, R. (1992). The effect of raters’ background and training on the reliability of direct writing tests. The Modern Language Journal, 76(1), 27-33. https://doi.org/10.1111/j.1540-4781.1992.tb02574.x

  • Tajeddin, Z., & Alemi, M. (2014). Pragmatic rater training: Does it affect non-native L2 teachers’ rating accuracy and bias? Iranian Journal of Language Testing, 4(1), 66-83.

  • Tziner, A., Joanis, C., & Murphy, K. R. (2000). A comparison of three methods of performance appraisal with regard to goal properties, goal perception, and ratee satisfaction. Group & Organization Management, 25(2), 175-190. https://doi.org/10.1177/1059601100252005

  • Wang, J., Engelhard, G., Jr., Raczynski, K., Song, T., & Wolfe, E. W. (2017). Evaluating rater accuracy and perception for integrated writing assessments using a mixed-methods approach. Assessing Writing, 33, 36-47. https://doi.org/10.1016/j.asw.2017.03.003

  • Wei, J., & Llosa, L. (2015). Investigating differences between American and Indian raters in assessing TOEFL iBT speaking tasks. Language Assessment Quarterly, 12(3), 283-304. https://doi.org/10.1080/15434303.2015.1037446

  • Weigle, S. C. (1994). Effects of training on raters of ESL compositions. Language Testing, 11(2), 197-223. https://doi.org/10.1177/026553229401100206

  • Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15(2), 263-287. https://doi.org/10.1177/026553229801500205

  • Weigle, S. C. (1999). Investigating rater/prompt interactions in writing assessment: Quantitative and qualitative approaches. Assessing Writing, 6(2), 145-178. https://doi.org/10.1016/S1075-2935(00)00010-6

  • Weigle, S. C. (2002). Assessing writing. Cambridge University Press. https://doi.org/10.1017/CBO9780511732997

  • Wigglesworth, G. (1993). Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction. Language Testing, 10(3), 305-335. https://doi.org/10.1177/026553229301000306

  • Wolfe, E. W., & McVay, A. (2010). Rater effects as a function of rater training context (White paper). Pearson Assessments.

  • Wolfe, E. W., Kao, C. W., & Ranney, M. (1998). Cognitive differences in proficient and nonproficient essay scorers. Written Communication, 15(4), 465-492. https://doi.org/10.1177/0741088398015004002

  • Xie, Q. (2015). “I must impress the raters!” An investigation of Chinese test-takers’ strategies to manage rater impressions. Assessing Writing, 25, 22-37. https://doi.org/10.1016/j.asw.2015.05.001

ISSN 0128-7702

e-ISSN 2231-8534

Article ID

J
