Validity and Reliability of a Performance Assessment Rubric for Geometry in Junior High School Using the Many-Facet Rasch Model Approach

Rivo Panji Yudha(1*),

(1) Universitas 17 Agustus 1945
(*) Corresponding Author


The purpose of this study was to analyze the validity and reliability of a performance-assessment rubric for geometry subject matter using the Many-Facet Rasch Model approach through the Facets software. Data were collected from 100 junior high school students in a small-scale trial and 250 students in a large-scale trial, using 3 raters. The performance-assessment instrument takes the form of a rubric used to assess each student's process of working on the problems; each problem has its own rubric. The Many-Facet Rasch Measurement model (MFRM) was used to analyze the data across three aspects, namely the person facet, rater agreement, and the difficulty domain, using the Facets program. For the rater facet, the rater separation ratio was 4.96, while the separation index was 2.15, indicating that the raters are reliably separated. The stratum index of 3.21 indicates that there are three statistically distinct strata of rater severity in this sample of 4 raters. For rater agreement, the rater separation reliability was 0.87, and the correlations between each rater and the others ranged from 0.40 to 0.63, indicating adequate agreement among the raters in assessing examinees at their level of competence. The difficulty domain on the variable map shows a hard-to-easy range of about +1 to −1 logits.
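The separation statistics reported above follow standard Rasch definitions: the separation ratio G is the true spread of the measures divided by their measurement error, the separation reliability is R = G² / (1 + G²), and the number of statistically distinct strata is H = (4G + 1) / 3. A minimal Python sketch of these formulas, using hypothetical illustrative values rather than the study's actual data:

```python
def separation_ratio(true_sd: float, rmse: float) -> float:
    """Separation ratio G: true spread of the facet's measures
    relative to their root-mean-square measurement error."""
    return true_sd / rmse

def separation_reliability(g: float) -> float:
    """Rasch separation reliability: R = G^2 / (1 + G^2)."""
    return g**2 / (1 + g**2)

def strata(g: float) -> float:
    """Statistically distinct strata: H = (4G + 1) / 3."""
    return (4 * g + 1) / 3

# Hypothetical example values (not taken from the study):
g = separation_ratio(true_sd=1.0, rmse=0.5)  # G = 2.0
print(round(separation_reliability(g), 2))   # -> 0.8
print(round(strata(g), 2))                   # -> 3.0
```

Note that Facets reports these statistics directly for each facet (raters, persons, items); the sketch only makes explicit how a separation ratio translates into a reliability coefficient and a strata count.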


Performance assessment, rubric, Many-Facet Rasch Measurement model (MFRM)



DOI: 10.24235/eduma.v9i2.7100


Copyright (c) 2020 Eduma : Mathematics Education Learning and Teaching



EduMa: Mathematics Education Learning and Teaching is licensed under a Creative Commons Attribution 4.0 International License.