AUTOMATION OF THE SEARCH FOR FUZZY DUPLICATES OF ELECTRONIC TEXT DOCUMENTS
Keywords:
automation, software, algorithm, shingles, duplicates, comparison
Abstract
The search for duplicate Ukrainian-language text documents is automated. Existing approaches to detecting
duplicate text documents are analyzed, and the main text similarity measures are implemented in software. The
shingle algorithm and its software implementation are presented, and an algorithm that automates the search for
fuzzy duplicates is implemented. The resulting software is validated on test cases.
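
To illustrate the shingle approach named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes word-level shingles of length k, Jaccard similarity as the comparison measure, and a similarity threshold for reporting fuzzy duplicates. The function names, the default k = 3, and the default threshold of 0.5 are all hypothetical choices made for this example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (as tuples) of a text.

    Assumption: tokens are runs of word characters, lowercased. In Python 3
    the pattern \\w+ matches Unicode letters, so Cyrillic text is handled,
    although words containing an apostrophe are split in two.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # A text shorter than k words yields one short shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, k=3):
    """Shingle-based similarity of two texts, in the range [0, 1]."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


def find_fuzzy_duplicates(docs, k=3, threshold=0.5):
    """Return (name_a, name_b, score) for each pair scoring >= threshold.

    docs maps a document name to its text. Every pair is compared, so the
    cost grows quadratically with the number of documents.
    """
    names = list(docs)
    sets = {name: shingles(docs[name], k) for name in names}
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = jaccard(sets[a], sets[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs
```

Calling `find_fuzzy_duplicates({"a.txt": text_a, "b.txt": text_b})` returns the pairs whose score reaches the threshold. For large collections, practical systems usually hash the shingles and compare only a sample of the hashes; that optimization is omitted here for brevity.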