Tropes, Semantic Text Analysis - Online Reference Manual



Corpus segmentation

Tropes is equipped with a tool called “Border”, designed to automatically segment a corpus. Within the same text, Borders can be used to automatically separate multiple actors, an interviewer and an interviewee, populations, chapters of a book, etc.

The use of Borders may require a preliminary coding of the documents.

The part of the text governed by a Border starts where that Border occurs and ends where the next Border occurs. For example, if “ALGERNON_” and “LANE_” are Borders, the sequence below delimits six parts of a text:



ALGERNON_.  And, speaking of the science of Life, have you got the cucumber sandwiches cut for Lady Bracknell?


LANE_.  Yes, sir.  [Hands them on a salver.]


ALGERNON_.  [Inspects them, takes two, and sits down on the sofa.] Oh! . . . by the way, Lane, I see from your book that on Thursday night, when Lord Shoreman and Mr. Worthing were dining with me, eight bottles of champagne are entered as having been consumed.


LANE_.  Yes, sir; eight bottles and a pint.


ALGERNON_.  Why is it that at a bachelor's establishment the servants invariably drink the champagne?  I ask merely for information.


LANE_.  I attribute it to the superior quality of the wine, sir.  I have often observed that in married households the champagne is rarely of a first-rate brand.



These six parts correspond to a conversation between two characters, Lane and Algernon (see the Oscar Wilde.txt file in the example texts supplied with the software). If you wish to automatically separate the discourse of Algernon (parts 1, 3 and 5) from that of Lane (parts 2, 4 and 6), you need to use Borders.
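The segmentation rule described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the way Tropes is implemented: the function name segment_by_borders and the shortened sample text are assumptions made for the example.

```python
import re

def segment_by_borders(text, borders):
    """Return (border, segment) pairs; each segment runs from one Border to the next."""
    if not borders:
        return []
    # Try longer codes first so that a code which is a prefix of another cannot shadow it.
    pattern = "|".join(re.escape(b) for b in sorted(borders, key=len, reverse=True))
    matches = list(re.finditer(pattern, text))
    parts = []
    for i, m in enumerate(matches):
        # A part ends where the next Border starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parts.append((m.group(0), text[m.end():end].strip()))
    return parts

# Shortened, hypothetical sample in the style of the coded play.
sample = (
    "ALGERNON_.  Have you got the cucumber sandwiches cut for Lady Bracknell? "
    "LANE_.  Yes, sir. "
    "ALGERNON_.  Why is it that the servants invariably drink the champagne? "
    "LANE_.  I attribute it to the superior quality of the wine, sir."
)
parts = segment_by_borders(sample, ["ALGERNON_", "LANE_"])
algernon = [segment for border, segment in parts if border == "ALGERNON_"]
lane = [segment for border, segment in parts if border == "LANE_"]
```

Grouping the pairs by their Border code is what separates the discourse of one speaker from that of the other.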

Creating Borders

To create a Borders file, use the [Tools][Borders] command.

To create a new entry in the Borders, type a word in the upper field, then press the [Add] button. This word must mark the start of a sequence of the analyzed text. In the Oscar Wilde example, the codes “Start_of_gut_header” and “End_of_gut_header” identify the descriptive parts and the introduction of the play, while the speech turns of the characters are coded by appending “_” to the names of the characters.

To delete an existing Border, select it and press [Delete].

Once you have created your Borders, you can choose which parts of the text to ignore by checking the related Borders: when a Border is checked, all the text that follows it, up to the next Border, is ignored; when it is not checked, that text remains visible.
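The effect of checking a Border can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tropes implementation: the function name visible_text is hypothetical, and the sketch assumes that any text located before the first Border stays visible.

```python
import re

def visible_text(text, borders, checked):
    """Return the text that remains once the parts opened by checked Borders are dropped."""
    if not borders:
        return text
    pattern = "|".join(re.escape(b) for b in sorted(borders, key=len, reverse=True))
    kept = []
    keep = True   # assumption: text before the first Border is kept
    pos = 0
    for m in re.finditer(pattern, text):
        if keep:
            kept.append(text[pos:m.start()])
        # The Border just found decides the fate of everything up to the next Border.
        keep = m.group(0) not in checked
        pos = m.start()
    if keep:
        kept.append(text[pos:])
    return "".join(kept)

text = "Intro. A_ one. B_ two. A_ three."
only_a = visible_text(text, ["A_", "B_"], checked={"B_"})
everything = visible_text(text, ["A_", "B_"], checked=set())
nothing_coded = visible_text(text, ["A_", "B_"], checked={"A_", "B_"})
```

Passing an empty set of checked Borders corresponds to [Include all], and passing every Border corresponds to [Exclude all]; in the second case only text that no Border delimits is left, which makes gaps in the coding easy to spot.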

To uncheck all Borders, so that the whole text is included, use the [Include all] button. To check all Borders, so that all the text they delimit is excluded, use the [Exclude all] button. The latter enables you, for instance, to check whether or not the whole text is correctly delimited.

Once you have finished, use the [Apply] button to restart the analysis of the text. This time, the analysis will ignore all the parts of the text you have chosen to exclude. When you use Borders for the first time, the software will ask you for a file name in order to save them.

When the [Show Borders in the text] option is checked, Tropes displays all Borders and counts them as words of the text. When this option is not checked, Tropes neither displays the related codes nor counts them as words.

Note: if your Borders are compounds that the software does not recognize, you must link the words forming these compounds with the underscore character (for example “Miss_Prism”).
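When a text needs preliminary coding, the speaker names can be turned into Border codes with a small script before the file is loaded into Tropes. The sketch below is only one possible approach: the function name code_speakers is hypothetical, and it assumes that each speech turn starts a line with the speaker's name followed by a full stop, as in a play script.

```python
import re

def code_speakers(text, speakers):
    """Rewrite 'NAME.' at the start of a line as 'NAME_.' with inner spaces linked by '_'."""
    for name in sorted(speakers, key=len, reverse=True):
        # Link the words of a compound name, then append the "_" marker.
        code = name.replace(" ", "_") + "_"
        pattern = r"(?m)^" + re.escape(name) + r"\."
        # A function is used as replacement so the code is inserted literally.
        text = re.sub(pattern, lambda m, c=code: c + ".", text)
    return text

script = "MISS PRISM.  Cecily, Cecily!\nCECILY.  Yes, Miss Prism."
coded = code_speakers(script, ["MISS PRISM", "CECILY"])
```

Only names at the start of a line are rewritten, so a name mentioned inside a speech turn is left untouched and does not create a spurious Border.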

Borders files

Use the [File] menu to:

Borders are automatically saved when using the [File][Save] menu or pressing the [Apply] button. To quit this tool without saving your modifications, click on [Cancel].


Copyright Acetic and Semantic Knowledge, all rights reserved