<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="article">
<front>
    <journal-meta>
        <journal-id journal-id-type="publisher-id">INFEDU</journal-id>
        <journal-title-group>
            <journal-title>Informatics in Education</journal-title>
        </journal-title-group>
        <issn pub-type="epub">1648-5831</issn>
        <issn pub-type="ppub">1648-5831</issn>
        <publisher>
            <publisher-name>VU</publisher-name>
        </publisher>
    </journal-meta>
    <article-meta>
                <article-id pub-id-type="publisher-id">INFEDU.2019.15</article-id>
                        <article-id pub-id-type="doi">10.15388/infedu.2019.15</article-id>
                        <article-categories>
            <subj-group subj-group-type="heading">
                <subject>Article</subject>
            </subj-group>
        </article-categories>
                        <title-group>
            <article-title>Source Code Plagiarism Detection in Academia with Information Retrieval: Dataset and the Observation</article-title>
        </title-group>
                        <contrib-group>
                                        <contrib contrib-type="author">
                                                <name>
                    <surname>KARNALIM</surname>
                    <given-names>Oscar</given-names>
                </name>
                                <email xlink:href="mailto:oscar.karnalim@it.maranatha.edu">oscar.karnalim@it.maranatha.edu</email>
                                                <xref ref-type="aff" rid="j_INFEDU_aff_000"/>
                                            </contrib>
                        <aff id="j_INFEDU_aff_000">Faculty of Information Technology, Maranatha Christian University, Bandung, Indonesia</aff>
                                                    <contrib contrib-type="author">
                                                <name>
                    <surname>BUDI</surname>
                    <given-names>Setia</given-names>
                </name>
                                <email xlink:href="mailto:setia.budi@it.maranatha.edu">setia.budi@it.maranatha.edu</email>
                                                <xref ref-type="aff" rid="j_INFEDU_aff_001"/>
                                            </contrib>
                        <aff id="j_INFEDU_aff_001">Faculty of Information Technology, Maranatha Christian University, Bandung, Indonesia</aff>
                                                    <contrib contrib-type="author">
                                                <name>
                    <surname>TOBA</surname>
                    <given-names>Hapnes</given-names>
                </name>
                                <email xlink:href="mailto:hapnestoba@it.maranatha.edu">hapnestoba@it.maranatha.edu</email>
                                                <xref ref-type="aff" rid="j_INFEDU_aff_002"/>
                                            </contrib>
                        <aff id="j_INFEDU_aff_002">Faculty of Information Technology, Maranatha Christian University, Bandung, Indonesia</aff>
                                                    <contrib contrib-type="author">
                                                <name>
                    <surname>JOY</surname>
                    <given-names>Mike</given-names>
                </name>
                                <email xlink:href="mailto:m.s.joy@warwick.ac.uk">m.s.joy@warwick.ac.uk</email>
                                                <xref ref-type="aff" rid="j_INFEDU_aff_003"/>
                                            </contrib>
                        <aff id="j_INFEDU_aff_003">Department of Computer Science, University of Warwick, Coventry, United Kingdom</aff>
                                </contrib-group>
                                                                                                                                                                <volume>18</volume>
                                <issue>2</issue>
                                    <fpage>321</fpage>
                        <lpage>344</lpage>
                                <pub-date pub-type="epub">
                        <day>16</day>
                                    <month>10</month>
                        <year>2019</year>
        </pub-date>
                                        <abstract>
                        <p>Source code plagiarism is an emerging issue in computer science education, and a number of techniques have been proposed to address it. Comparing these techniques is challenging, however, since each is evaluated on its own private dataset(s). This paper contributes a public dataset for comparing such techniques, designed specifically for evaluation from an Information Retrieval (IR) perspective. The dataset consists of 467 source code files covering seven introductory programming assessment tasks. Unique to this dataset, both the intention to plagiarise and advanced plagiarism attacks are considered in its construction. The dataset's characteristics were observed by comparing three IR-based detection techniques. The results show that most IR-based techniques are less effective than a baseline technique which relies on Running-Karp-Rabin Greedy-String-Tiling, even though some of them are far more time-efficient.</p>
                    </abstract>
                <kwd-group>
            <label>Keywords</label>
                        <kwd>source code plagiarism</kwd>
                        <kwd>dataset</kwd>
                        <kwd>programming</kwd>
                        <kwd>computer science education</kwd>
                    </kwd-group>
    </article-meta>
</front>
</article>
