Searching from a text.

There is another way of searching for a piece of information in a body of documents: you can analyze a text and use the results to find the documents that have comparable contents.

With version 6.2, Zoom offers a search function for similar texts that automates this process, using a vocabulary controlled by the Scenario. The function calculates the overlap between a reference text and every document in a Zoom search index. When the search finishes, the software displays each document whose overlap rate exceeds a given threshold, above which documents are considered similar. In other words, Zoom analyzes the specified document with the Scenario you selected when indexing the folder, extracts the relevant semantic groups, and uses them to carry out the search.
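The threshold filtering described above can be illustrated with a short sketch. It is not Zoom's implementation: it assumes, for illustration only, that each text has already been reduced to a set of semantic group names, and that the overlap rate is the share of the reference text's groups found in a document. The function names, file names, and the threshold value are all hypothetical.

```python
def overlap_rate(reference_groups, document_groups):
    """Share of the reference text's semantic groups found in a document.

    Assumption: semantic groups are modelled as plain Python sets of names.
    """
    if not reference_groups:
        return 0.0
    return len(reference_groups & document_groups) / len(reference_groups)


def find_similar(reference_groups, index, threshold):
    """Return (document, rate) pairs whose rate is strictly above the threshold."""
    hits = []
    for name, groups in index.items():
        rate = overlap_rate(reference_groups, groups)
        if rate > threshold:
            hits.append((name, rate))
    # Most similar documents first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical data: one reference text and a three-document index.
reference = {"patent", "engine", "fuel", "valve"}
index = {
    "a.txt": {"patent", "engine", "fuel"},
    "b.txt": {"poetry", "valve"},
    "c.txt": set(),
}
similar = find_similar(reference, index, threshold=0.5)
```

With this made-up data, only a.txt shares enough groups with the reference text (3 out of 4) to pass a threshold of 0.5.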

To make this comparison, the semantic search engine uses only the vocabulary included in the Scenario. This offers several advantages: the analysis draws only on the elements of interest to you (secondary content can be dismissed), and you control every term used in the comparison. If the Scenario is very specific, the software focuses on that precise content when comparing documents and dismisses every element deemed irrelevant (that is, not specific enough). Conversely, if the Scenario is very general (for instance, the Concept Scenario supplied with Tropes Zoom), you obtain overall comparisons based on the vocabulary contained in the documents.

To ensure that Zoom uses all the vocabulary contained in your reference text(s), you can generate a Scenario from this material with the Tropes Scenario creation wizard ([Tools][Scenario] menu, [File][New/Wizard] menu, [Automatically] option), and use this Scenario to re-index your folder.

The similarity between documents can be calculated in two ways:

1 - By shared references: this method finds documents that, taken as a whole, contain one or more fragments of text with the same references.

2 - Exactly: this method finds documents that contain exactly the same references, and dismisses all documents that diverge too much from the specified text.

These calculations[1] meet different requirements. For example, you would use the shared references method to find a book from one of its chapters, and the "exact" comparison method to find the various versions of the same document.

Calculation parameters can be modified in the [Similarity] tab of the [Tools][Options] menu of Zoom, which also gives access to the Display threshold for documents (the rate above which a text is deemed similar).

The search for similar texts has numerous concrete applications: hunting for analogies in patents, detecting plagiarism, evaluating proficiency, studying literary texts, comparative press analysis, etc.

[1] In the first case (shared references), the similarity (S) is the number of included semantic groups (Gi) divided by the number of groups in the specified text (Gt): S = Gi / Gt. In the second case ("exact" comparison), the calculation is the same, except that the number of dismissed groups (Gd) is first subtracted from the number of included groups (Gi): S = (Gi - Gd) / Gt. The included semantic groups (Gi) always belong to the set of groups of the reference text.
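
As a worked illustration of the two formulas in this note, the sketch below computes S from the three counts directly. The function names and the sample counts are hypothetical, and the sketch deliberately takes Gi, Gd and Gt as ready-made numbers rather than guessing how Zoom derives them from the documents.

```python
def similarity_shared(gi, gt):
    """Shared references method: S = Gi / Gt."""
    if gt <= 0:
        raise ValueError("the specified text must contain at least one group")
    return gi / gt


def similarity_exact(gi, gd, gt):
    """'Exact' method: S = (Gi - Gd) / Gt."""
    if gt <= 0:
        raise ValueError("the specified text must contain at least one group")
    return (gi - gd) / gt


# Hypothetical counts: 10 groups in the specified text, 8 of them included,
# 3 groups dismissed.
s_shared = similarity_shared(gi=8, gt=10)       # 8 / 10
s_exact = similarity_exact(gi=8, gd=3, gt=10)   # (8 - 3) / 10
```

With these made-up counts the shared references score is 0.8 and the "exact" score is 0.5, which shows how the same document can pass the display threshold under one method and fail it under the other.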

