Tropes, Semantic Text Analysis - Online Reference Manual

CHAPTER 1 - Text Analysis (part V)

Analysis options

Use the [Tools][Analysis option] command:

Text Analysis parameters

This dialog lets you adjust both the analysis engine of the software and some display options.

The Class detection threshold enables you to define the significance level of the Equivalent classes:

- When this threshold is based on a minimum number of words, all the Equivalent classes whose occurrence frequency is below this threshold will be ignored (i.e. they will not be displayed).

- When this threshold is based on a pertinence factor, all the classes whose pertinence factor is below this threshold will be ignored.

The pertinence factor is expressed as a number of occurrences per ten thousand words. For instance, a pertinence factor of 10 corresponds to a minimum occurrence frequency of 3 words for a 3,000-word text (10 x 3,000 / 10,000 = 3).
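The conversion between a pertinence factor and a minimum word count can be sketched as follows. This is only an illustration of the arithmetic described above; the function names are hypothetical and are not part of the software.

```python
def min_occurrences(pertinence_factor: float, total_words: int) -> float:
    """Minimum occurrence count implied by a pertinence factor.

    The factor is read as "occurrences per 10,000 words" (from the manual).
    """
    return pertinence_factor * total_words / 10_000


def pertinence_of(occurrences: int, total_words: int) -> float:
    """Inverse conversion: occurrences in a text -> pertinence factor."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return occurrences * 10_000 / total_words


# The manual's own example: factor 10 on a 3,000-word text.
print(min_occurrences(10, 3_000))  # 3.0
print(pertinence_of(3, 3_000))     # 10.0
```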

You can change the Class detection threshold if you want the software to process only the most frequent classes or, conversely, to take into account the less frequent classes.

Important note: the higher you raise the thresholds, the more information you lose. And vice versa: when you lower the thresholds, you increase the amount of information taken into account by the Equivalent classes.
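The effect of the two threshold modes can be illustrated with a small sketch. The class names, counts, and the `filter_classes` helper are hypothetical; the sketch only assumes, as stated above, that classes scoring below the threshold are ignored.

```python
def filter_classes(class_counts, total_words, threshold, mode="words"):
    """Keep only the classes that reach the detection threshold.

    mode="words":      threshold is a minimum number of occurrences.
    mode="pertinence": threshold is a pertinence factor
                       (occurrences per 10,000 words).
    """
    kept = {}
    for name, count in class_counts.items():
        if mode == "words":
            score = count
        elif mode == "pertinence":
            score = count * 10_000 / total_words
        else:
            raise ValueError(f"unknown mode: {mode}")
        if score >= threshold:  # classes below the threshold are ignored
            kept[name] = count
    return kept


# Hypothetical counts for a 3,000-word text.
counts = {"politics": 12, "weather": 3, "music": 1}
print(filter_classes(counts, 3_000, threshold=3, mode="words"))
print(filter_classes(counts, 3_000, threshold=20, mode="pertinence"))
```

Raising the threshold from 3 words to a pertinence factor of 20 drops "weather" from the result, which shows how a higher threshold discards information.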

The analysis options dialog also enables you to change the Construction base for the Relations (using the [Build Relations on] box), i.e. the Equivalent classes level needed to build, display and print Relations, Episodes and Bundles.

For example, if you use Scenarios, it is possible to build the Relations from the content of the current Scenario (see Chapter 2 Semantic Scenarios).

It is also possible to modify the contraction rate by using the cursor, which enables you to adjust the [Quantity of characteristic parts of text] to be displayed.

Because the software cannot be guaranteed to detect the essential propositions of a corpus with full accuracy, we recommend that you start with a rather low threshold (i.e. that you display many Characteristic Parts of Text), and then raise it gradually until you strike a balance between the number of propositions displayed and the pertinence of the result.

The [Use the Scenario on all word categories] option makes the software convert all the words entered in the Scenario into substantives, or References (see related chapter).

When this box is checked, you can enter (as items) and display in the Scenario words that do not belong to the substantive category. For example, you can group together adjectives according to various themes (colors, tastes, etc.). You can also use this option to force the software to treat as substantives words that are not filed under this category. For instance, if you analyze a text presenting two characters, “Mr. White” and “Mrs. Red”, you can enter “White” and “Red” in your Scenario so that they will be counted as References.

To validate your choice, press the [Accept] button. Otherwise, if you do not wish to modify your analysis options, press the [Cancel] button.

Caution: when you check the [Use the Scenario on all word categories] box, all words subsequently entered in the Scenario will be converted into substantives (References) at the end of the automatic analysis of the text. This has several consequences on the operation of the software:

1 - The Scenario has priority, then, over the other classifications: all the non-substantives entered (as items) in the Scenario will be removed from their original categories. For example, if you enter the item “here” in the Scenario, the corresponding adverb of place will no longer be displayed in the Modality category (in which it will nevertheless be counted).

2 - The Scenario takes precedence, then, over the syntactic analysis of the text; if an ambiguous word (one used in several grammatical categories) appears in a text, and if this word is placed in the Scenario, then all its various forms will be counted in the Scenario. For example, suppose you have entered the word “book” in a Scenario and then use it to analyze a text containing the sentences “Book these goods on my account” and “We have received the books”: both occurrences of “book” (the verb and the common noun) will be counted in the Scenario.

3 - The Scenario has been designed to operate with substantives; if you wish to use verbs in a Scenario, enter all the conjugated forms appearing throughout your texts; similarly, if you wish to use adjectives, enter the masculine and feminine forms (the software takes care of the plural).

4 - The above observations only apply when you enter words in the Scenario; if your Scenario is built from Equivalent classes, these will pose no classification problem (the Equivalent classes already resolve lexical and semantic ambiguities).
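The behavior described in point 2 can be illustrated with a deliberately naive sketch: items are matched on their surface form, with no regard to grammatical category. This is not the software's actual algorithm; the `count_scenario_items` helper and the crude plural handling (stripping a final "s") are assumptions made for the example.

```python
import re


def count_scenario_items(text, scenario_items):
    """Count every occurrence of each Scenario item, whatever its
    grammatical category (verb, noun, ...) in the text."""
    wanted = {item.lower() for item in scenario_items}
    counts = {item: 0 for item in wanted}
    for token in re.findall(r"[A-Za-z]+", text.lower()):
        # Crude stand-in for plural handling: "books" -> "book".
        if token not in wanted and token.endswith("s") and token[:-1] in wanted:
            token = token[:-1]
        if token in wanted:
            counts[token] += 1
    return counts


text = "Book these goods on my account. We have received the books."
print(count_scenario_items(text, ["book"]))  # {'book': 2}
```

Both the verb ("Book these goods") and the noun ("the books") are counted under the same item, which is the ambiguity that point 2 warns about.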


Note on Equivalent classes

In this manual, the term Equivalent classes refers equally to Reference fields and to References.

For further details about Equivalent classes, consult Chapter 4: “Introduction to text analysis”.

Note about the dictionaries:

- Since it is neither possible nor relevant to classify all of the English substantives (names, forenames and proper nouns), the software automatically generates Equivalent classes for all the words that are not referenced in the dictionary. Such generated classes are visible only in the References.

- The generated classes (“Other”, for instance) are preceded by a blue square, whereas the Equivalent classes detected by the software are preceded by a red square.

- To group generated classes together with Equivalent classes, and so create your own personal classification, you have to use a Scenario.

Printing the results

Use the [File][Print] command:

To print results, check the related boxes, then press [Print]; otherwise press [Exit].

The printed report of the Equivalent classes and of the Scenarios includes a utilization rate, expressed as a percentage: the number of words contained in each class divided by the total number of words contained in the text.
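As a minimal sketch of this calculation (the function name and the sample figures are hypothetical):

```python
def utilization_rate(class_word_count: int, total_words: int) -> float:
    """Percentage of the text's words that fall into a given class."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * class_word_count / total_words


# A class containing 45 words in a 3,000-word text.
print(utilization_rate(45, 3_000))  # 1.5
```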

The printed report of the Relations includes an additional item of information, not displayed in the [Relations] of the results dialog: the connection rate. This rate is obtained by dividing the number of observed Relations by the highest number of possible Relations. A connection rate of 100% shows that one of the two terms of the Relation is always presented with the other. A connection rate that is close to zero shows that the two terms are almost never presented together.
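The connection rate can be sketched as below. The manual does not define "the highest number of possible Relations"; this sketch assumes it is the occurrence count of the less frequent of the two terms, which is consistent with a 100% rate meaning that one term always appears with the other. The function name and figures are hypothetical.

```python
def connection_rate(observed: int, count_a: int, count_b: int) -> float:
    """Observed Relations as a percentage of the possible maximum.

    Assumption: the maximum is min(count_a, count_b), i.e. the rarer
    term cannot be related more often than it occurs.
    """
    highest_possible = min(count_a, count_b)
    if highest_possible <= 0:
        raise ValueError("both terms must occur at least once")
    return 100.0 * observed / highest_possible


# Term A occurs 8 times, always alongside term B (20 occurrences).
print(connection_rate(8, 8, 20))   # 100.0
# Terms that are almost never presented together.
print(connection_rate(1, 10, 20))  # 10.0
```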

In the printing options, various buttons enable you to change the configuration of the printer and to select the font you wish to use for the printing.

The [Color] box enables you to print in color, if you have a color printer.




Copyright Acetic and Semantic Knowledge, all rights reserved
www.semantic-knowledge.com