Cheating Detection in Capture the Flag Competitions Using Two-Stage Similarity Analysis and Tiered Weighted Risk Scoring
DOI: https://doi.org/10.46880/jmika.Vol10No1.pp366-372
Keywords: Cheating Detection, Information System Integrity Audit, Capture the Flag, Weighted Risk Scoring, Collusion
Abstract
Flag sharing, inter-team cooperation (teaming), and the use of prohibited tools such as AI undermine the validity of a Capture the Flag (CTF) event as a measure of skill and corrupt the integrity of its scoring information system. In GZCTF, the only cheating signal is the dynamic flag, which catches flag theft; nothing else feeds a per-team risk profile. This study adds a detection module to the GZCTF backend built on two components. The first, Two-Stage Similarity Analysis, blends pairwise Longest Common Subsequence (LCS) and Jaccard scores into a Relative Sequence Index (RSI) and then confirms suspicious groups with the Confidence Screening Detector (CSD). The second, a tiered Weighted Risk Scoring model, assigns 38 indicators to four evidence tiers (Hard, Strong, Behavioral, Context), caps the score of every non-Hard tier, and gives network or identity correlations no direct score. Evaluation used a controlled simulation of ten team participations on a live instance, with ground-truth labels fixed at design time. Precision reached 1.000 with zero false positives; recall was 0.833, F1 0.909, and accuracy 0.900. The single miss occurred because collusion evidence was attributed to only one member of a colluding pair. An RSI threshold of 0.85 separated every colluding pair from the benign pairs, and teams with only network or identity correlations scored zero.
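The first stage of the similarity analysis can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the module's actual code: it assumes each team is represented by the ordered list of challenge IDs it solved, normalizes the LCS length by the longer sequence, and blends the two scores with a hypothetical equal weight (`alpha = 0.5`). Only the 0.85 threshold comes from the abstract; all function names are hypothetical, and the CSD confirmation stage is not shown.

```python
from itertools import combinations


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (classic O(len(a) * len(b)) DP)."""
    prev = [0] * (len(b) + 1)  # prev[j]: LCS of the processed prefix of a and b[:j]
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard index of the two solve sets (order ignored)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rsi(a: list[str], b: list[str], alpha: float = 0.5) -> float:
    """Hypothetical RSI blend: alpha * normalized LCS + (1 - alpha) * Jaccard."""
    longest = max(len(a), len(b))
    lcs_norm = lcs_length(a, b) / longest if longest else 0.0
    return alpha * lcs_norm + (1 - alpha) * jaccard(a, b)


def flag_pairs(solves: dict[str, list[str]], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Stage one only: return every team pair whose RSI meets the threshold."""
    flagged = []
    for (t1, s1), (t2, s2) in combinations(sorted(solves.items()), 2):
        score = rsi(s1, s2)
        if score >= threshold:
            flagged.append((t1, t2, round(score, 3)))
    return flagged
```

For instance, two teams that solved `web1, pwn1, crypto1, rev1` in the same order score 1.0 and are flagged, while a team that solved `rev1, web1, misc1` scores 0.325 against either of them (normalized LCS 0.25, Jaccard 0.4). In this sketch the LCS term rewards a shared solve order and the Jaccard term a shared solve set, so a pair reaches the threshold only when it overlaps strongly on both.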
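The tiered scoring rule can be illustrated with the hypothetical sketch below. The four tier names, the uncapped Hard tier, the caps on the other tiers, and the zero direct score for network or identity correlations follow the abstract; the cap values, indicator names, tier assignments, and weights are invented for illustration and do not reproduce the study's 38 indicators.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical caps per tier; None means uncapped (the Hard tier).
TIER_CAPS: dict[str, Optional[float]] = {
    "Hard": None,
    "Strong": 60.0,
    "Behavioral": 30.0,
    "Context": 10.0,
}


@dataclass(frozen=True)
class Indicator:
    name: str
    tier: str            # must be a key of TIER_CAPS
    weight: float        # points added each time the indicator fires
    direct: bool = True  # False for network/identity correlations


# Hypothetical example indicators (not the study's actual set).
FLAG_THEFT = Indicator("dynamic_flag_mismatch", "Hard", 100.0)
RSI_MATCH = Indicator("rsi_above_threshold", "Strong", 40.0)
SOLVE_BURST = Indicator("solve_burst", "Behavioral", 20.0)
SHARED_IP = Indicator("shared_ip", "Context", 15.0, direct=False)


def risk_score(fired: list[Indicator]) -> float:
    """Sum fired indicators per tier, cap each non-Hard tier, then add the tiers."""
    totals = {tier: 0.0 for tier in TIER_CAPS}
    for ind in fired:
        if not ind.direct:
            continue  # correlation-only evidence contributes no points by itself
        totals[ind.tier] += ind.weight
    score = 0.0
    for tier, total in totals.items():
        cap = TIER_CAPS[tier]
        score += total if cap is None else min(total, cap)
    return score
```

With these example values, every non-Hard tier saturates at its cap, so piling up weak Strong, Behavioral, or Context signals cannot exceed the 100 points of a single Hard indicator, and a team whose only evidence is a shared IP address scores zero.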
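The reported evaluation figures are mutually consistent, which a short check confirms. Assuming each of the ten team participations is one classification unit, precision 1.000 implies no false positives, accuracy 0.900 implies exactly one error (hence one false negative, the single miss), and recall 0.833 (5/6) then implies five true positives and four true negatives. These counts are inferred from the abstract's numbers rather than stated in it, and the helper name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts inferred from the reported figures (not stated in the abstract).
metrics = classification_metrics(tp=5, fp=0, fn=1, tn=4)
print({k: round(v, 3) for k, v in metrics.items()})
# prints {'precision': 1.0, 'recall': 0.833, 'f1': 0.909, 'accuracy': 0.9}
```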
License
Copyright (c) 2026 Dimas Maulana

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.