AUTOMATION OF THE SEARCH OF UNCLEAR DUPLICATE TEXT ELECTRONIC DOCUMENTS

  • L. Gumeniuk
  • V. Lotysh
  • Y. Vashkurak
  • P. Humeniuk

Keywords: automation, software, algorithm, shingles, duplicates, comparison

Abstract

This work automates the search for fuzzy (near-)duplicate text documents in the Ukrainian language. Existing
approaches to detecting duplicate text documents are analyzed, and the main text-similarity measures are
implemented in software. The shingle algorithm and its software implementation are presented. The algorithm
for automating the search for fuzzy duplicates is implemented in software, and the resulting program is
verified on test cases.
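
As a rough illustration of the shingle approach named above, the following Python sketch splits each document into overlapping word sequences (shingles) and compares the resulting sets. The function names, the word-level shingle length k = 3, and the use of the Jaccard coefficient as the similarity measure are assumptions made for this example; the abstract does not state the authors' parameters or implementation details.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of `text` (hypothetical helper).

    Tokenization is deliberately simple: lowercase the text and keep runs of
    word characters. In Python 3, `\\w` matches Unicode letters, so Cyrillic
    text is handled, but words containing an apostrophe are split in two.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 1.0  # two empty documents are considered identical
    return len(a & b) / len(a | b)


def similarity(doc1, doc2, k=3):
    """Similarity of two documents in [0, 1] based on shared shingles."""
    return jaccard(shingles(doc1, k), shingles(doc2, k))
```

Two documents could then be flagged as fuzzy duplicates when `similarity(doc1, doc2)` exceeds a chosen threshold; the threshold value is itself a tuning decision and is not given in the abstract.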

Published
2023-02-27