How to ensure valid assessments and detect cheating in synchronous remote tests

Keywords: academic fraud, student assessment, testing, methodology, COVID-19

Abstract

The shift from face-to-face teaching to remote instruction, adopted in response to COVID-19, created the need to validate the results of tests administered electronically. Students are generally considered to have more opportunities to cheat on tests taken remotely. The aim of this article is to present the study of cheating as a contribution to the analysis of the psychometric validity of tests. Through a literature review, the concept of cheating and its types are analyzed. The main methods for detecting it are presented, which can be used to ensure the validity of results on synchronous, multiple-choice tests. The uses, strengths, and limitations of these methods are described. Finally, the main challenges that remain to be overcome in validating the results of synchronous remote tests are outlined.
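By way of illustration only (this sketch is not taken from the article), the simplest family of detection methods for multiple-choice tests compares the answer patterns of pairs of examinees, on the intuition that copying tends to produce an unusually large number of identical wrong answers; published answer-copying indices such as the K-index or ω formalize this idea with proper null distributions and significance tests. A minimal Python sketch under that assumption, with hypothetical examinee labels and data and no statistical test, might look like this:

from itertools import combinations

def identical_wrong_answers(responses, key):
    """Count, for every pair of examinees, the items on which both chose the
    SAME incorrect option.

    responses: dict mapping an examinee label to a list of chosen options.
    key: list of correct options, one per item, in the same order.
    """
    counts = {}
    for a, b in combinations(responses, 2):
        shared = sum(
            1
            for ra, rb, k in zip(responses[a], responses[b], key)
            if ra == rb and ra != k  # same chosen option, and it is wrong
        )
        counts[(a, b)] = shared
    return counts

if __name__ == "__main__":
    # Hypothetical 5-item test and three examinees.
    key = ["A", "C", "B", "D", "A"]
    responses = {
        "e1": ["A", "B", "B", "C", "A"],
        "e2": ["A", "B", "B", "C", "D"],
        "e3": ["B", "C", "B", "D", "A"],
    }
    for pair, n in identical_wrong_answers(responses, key).items():
        print(pair, n)  # e.g. ('e1', 'e2') shares 2 identical wrong answers

In practice such raw counts are only a screening signal: a high count must be compared against what would be expected by chance given the examinees' ability levels, which is precisely what the indices reviewed in the article are designed to do.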

Published
2020-11-14
How to cite
Rodríguez Morales, P., & Luzardo Verde, M. (2020). Cómo asegurar evaluaciones válidas y detectar falseamiento en pruebas a distancia síncronas. Revista Digital De Investigación En Docencia Universitaria, 14(2), e1240. https://doi.org/10.19083/ridu.2020.1240