Cheating Detection in Capture the Flag Competitions Using Two-Stage Similarity Analysis and Tiered Weighted Risk Scoring

Authors

  • Dimas Maulana Politeknik Negeri Bali
  • I Wayan Candra Winetra Politeknik Negeri Bali
  • I Nyoman Rai Widartha Kesuma Politeknik Negeri Bali

DOI:

https://doi.org/10.46880/jmika.Vol10No1.pp366-372

Keywords:

Cheating Detection, Information System Integrity Audit, Capture the Flag, Weighted Risk Scoring, Collusion

Abstract

Flag sharing, inter-team cooperation (teaming), and prohibited tools such as AI undermine how validly a Capture the Flag (CTF) event measures skills, corrupting the integrity of its scoring information system. GZCTF's only cheating signal is the dynamic flag, which catches flag theft; nothing else feeds a per-team risk profile. This study adds a detection module to the GZCTF backend built on two components. The first is Two-Stage Similarity Analysis: pairwise Longest Common Subsequence and Jaccard scores are blended into a Relative Sequence Index (RSI), after which a Confidence Screening Detector (CSD) pass confirms suspicious groups. The second is a tiered Weighted Risk Scoring model that assigns 38 indicators to four evidence tiers (Hard, Strong, Behavioral, Context), caps every non-Hard tier, and gives network or identity correlations no direct score. Evaluation used a controlled simulation of ten team participations on a live instance, with ground-truth labels fixed at design time. Precision reached 1.000 (zero false positives), recall 0.833, accuracy 0.900, and F1 0.909. The single miss came from collusion evidence attributed to only one member of a pair. An RSI threshold of 0.85 separated every colluding pair from the benign ones, and teams with purely network or identity correlation scored zero.
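
The first similarity stage and the reported metrics can be illustrated with a minimal Python sketch. It is not the module's implementation. The abstract does not state what sequences are compared, how the two scores are normalized, or how they are weighted, so the ordered solve lists, the normalization by the longer sequence, the equal weighting, and the function names below are all assumptions; the CSD stage is omitted. Only the 0.85 threshold comes from the abstract. The confusion counts (5 true positives, 0 false positives, 1 false negative, 4 true negatives) are inferred from the reported figures for ten participations with zero false positives and a single miss.

```python
from typing import Dict, Sequence

RSI_THRESHOLD = 0.85  # threshold reported in the abstract


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index of the two item sets, ignoring order."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def blended_similarity(a: Sequence[str], b: Sequence[str], lcs_weight: float = 0.5) -> float:
    """ASSUMED blend: weighted mean of length-normalized LCS and Jaccard."""
    longest = max(len(a), len(b))
    lcs_norm = lcs_length(a, b) / longest if longest else 0.0
    return lcs_weight * lcs_norm + (1.0 - lcs_weight) * jaccard(a, b)


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard precision, recall, accuracy, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical solve orders (challenge IDs are made up for illustration).
team_a = ["web1", "pwn1", "crypto1", "rev1"]
team_b = ["web1", "pwn1", "crypto1", "rev1"]   # identical order
team_c = ["rev1", "web1", "misc1", "crypto1"]  # overlapping set, different order

flag_ab = blended_similarity(team_a, team_b) >= RSI_THRESHOLD
flag_ac = blended_similarity(team_a, team_c) >= RSI_THRESHOLD

# Counts inferred from the abstract's reported results.
reported = {k: round(v, 3) for k, v in confusion_metrics(tp=5, fp=0, fn=1, tn=4).items()}
```

Under these assumptions the identical pair is flagged and the reordered pair is not, and the inferred counts reproduce the reported precision, recall, accuracy, and F1. Combining an order-sensitive score (LCS) with an order-insensitive one (Jaccard) means a pair must agree on both which challenges were solved and in what order before it crosses the threshold.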

Published

2026-06-24

Issue

METHOMIKA: Jurnal Manajemen Informatika & Komputerisasi Akuntansi