As science and technology continue to advance, the academic community places increasing importance on academic integrity. Plagiarism checking has become an important safeguard of that integrity and is widely used in the review and publication of graduate theses. CNKI is one of the best-known academic databases and plagiarism-checking systems in China, and it is regarded as highly accurate and reliable for initial plagiarism checks. However, we found that the CNKI plagiarism-checking system has certain difficulties and shortcomings in identifying appendix content.
Appendices are an important part of a graduate thesis and often contain detailed supplementary material such as research data, charts, experimental methods, and algorithms. However, the CNKI plagiarism-checking system cannot accurately identify appendix content. This is mainly because the system concentrates on the main body of the text and processes non-body content, such as appendices, less precisely. As a result, appendices are often ignored or only partially matched during the check, which undermines the accuracy and credibility of the results.
The problem has three main causes. First, the CNKI system was not designed with dedicated processing algorithms or rules for appendix content. Second, appendix content is often specialized research material presented in diverse and complex forms, which makes it harder for the system to identify and match. Third, the system relies mainly on text-similarity algorithms to detect duplication, and its handling of non-text content such as charts and images is limited.
To address these problems, we propose several strategies for improving how well the CNKI plagiarism-checking system identifies appendix content. First, the system could include a dedicated appendix-processing module that extracts the key information in an appendix and uses it to identify and match appendix content accurately. Second, the system could adopt techniques such as image recognition and semantic analysis to process the non-text data in appendices, improving the accuracy of the results. In addition, we suggest that researchers clearly label and annotate appendix content when using the CNKI system, so that the system can identify and process it correctly.
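The following Python sketch illustrates how a separate appendix check might work. It is a minimal illustration under stated assumptions and does not reproduce CNKI's actual implementation. It assumes that the appendix begins at a line starting with "Appendix" or "附录", which is the kind of label the third suggestion asks researchers to provide. It also measures text overlap with a generic character 5-gram containment score, a common textbook technique chosen here for simplicity. All function names are hypothetical. The sketch handles text only; charts and images would still need the recognition step described in the second suggestion.

```python
import re

# Hypothetical heading rule: the appendix starts at the first line that begins
# with "Appendix" (any case) or the Chinese "附录". A body line that happens to
# start with "Appendix" would also match, so this is only a rough heuristic.
APPENDIX_HEADING = re.compile(r"^[ \t]*(?:Appendix\b|附录)", re.IGNORECASE | re.MULTILINE)


def split_body_and_appendix(text: str) -> tuple[str, str]:
    """Return (body, appendix); the appendix is empty if no heading is found."""
    match = APPENDIX_HEADING.search(text)
    if match is None:
        return text, ""
    return text[: match.start()], text[match.start():]


def char_ngrams(text: str, n: int = 5) -> set[str]:
    """Set of character n-grams, ignoring whitespace so line breaks do not matter."""
    compact = re.sub(r"\s+", "", text)
    return {compact[i : i + n] for i in range(len(compact) - n + 1)}


def containment(section: set[str], reference: set[str]) -> float:
    """Fraction of the section's n-grams that also occur in the reference."""
    if not section:
        return 0.0
    return len(section & reference) / len(section)


def check_sections(thesis: str, reference: str, n: int = 5) -> dict[str, float]:
    """Score the body and the appendix separately against one reference text."""
    body, appendix = split_body_and_appendix(thesis)
    ref_grams = char_ngrams(reference, n)
    return {
        "body": containment(char_ngrams(body, n), ref_grams),
        "appendix": containment(char_ngrams(appendix, n), ref_grams),
    }


if __name__ == "__main__":
    thesis = (
        "Chapter 1\nOur method is new.\n"
        "Appendix A\nThe raw data table lists x and y values."
    )
    source = "The raw data table lists x and y values."
    print(check_sections(thesis, source))
```

Because the body and the appendix receive separate scores, an appendix copied from a source stands out even when the main text is original. In the example run, the body scores 0.0 while the appendix scores well above half. A production module would need stricter section detection than a single heading pattern.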
In summary, the CNKI plagiarism-checking system cannot accurately identify appendix content. We have proposed several strategies to address this problem, with the aim of improving both the system's handling of appendices and the accuracy of its results. Future research can further improve the overall performance and reliability of plagiarism-checking systems and thereby better support academic integrity in the academic community.