Tropes, Semantic Text Analysis - Online Reference Manual

CHAPTER 4

Introduction to text analysis (Part III)


Analysis of heterogeneous utterances: open questions, dispatches, enumerations, etc.

The following notes concern all heterogeneous corpora, obtained by collecting within the same file utterances coming from numerous individuals, and without linear coherence (i.e. it is not the discourse of a single narrator, or an interaction between several interlocutors respecting a logical sequence or a strict chronological order).

Because the propositional hashing performed by Tropes relies entirely on grammatical rules, you must use an unambiguous punctuation mark (a question mark or an exclamation mark) to force the software to separate the different utterances. For example, you can add an exclamation mark at the end of every answer to an Open question (Market Research) in order to separate it from the next one.
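The punctuation step described above can be automated before the corpus is loaded. The following Python sketch is only an illustration and is not part of Tropes: the function name terminate_answers and the choice to drop trailing periods or commas are assumptions. It ends every answer with an exclamation mark unless the answer already ends with a question or exclamation mark.

```python
def terminate_answers(answers, mark="!"):
    """Join answers into one text, each ending with an unambiguous mark.

    Hypothetical helper: answers that already end in "?" or "!" are kept
    as they are; any other trailing punctuation is replaced by `mark`.
    """
    lines = []
    for answer in answers:
        text = answer.strip()
        if not text:
            continue  # skip empty answers
        if text[-1] not in "?!":
            core = text.rstrip(".,;: ")
            if not core:
                continue  # the answer contained only punctuation
            text = core + mark
        lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(terminate_answers(["Too expensive.", "Why not?", "I like the colour"]))
```

The resulting text can then be saved as a single file and opened in Tropes in the usual way.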

If the corpus includes answers to several questions, you will have to use Borders to group the answers together and/or to separate the answers from the questions. You will then analyze each answer separately.

If you have an indicator enabling you to form your corpus according to an external variable (geographic area, type of population, period, etc.), you may use it to split up the utterances into several files (each file containing, for example, the utterances corresponding to only one variable), that you can analyze separately in order to compare them. You can also code these variables inside the texts and then use them as Borders.
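Splitting the utterances into one file per value of an external variable can be scripted. The Python sketch below is a minimal illustration under stated assumptions: the input is assumed to be a list of (variable, utterance) pairs, the names group_by_variable and write_groups are hypothetical, and UTF-8 is an assumed encoding that may need to be changed to whatever your version of Tropes reads.

```python
from collections import defaultdict
from pathlib import Path


def group_by_variable(records):
    """Group utterances by the value of the external variable."""
    groups = defaultdict(list)
    for variable, utterance in records:
        groups[variable].append(utterance)
    return dict(groups)


def write_groups(groups, out_dir, encoding="utf-8"):
    """Write one text file per variable value; return the created paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for variable, utterances in groups.items():
        path = out / f"{variable}.txt"
        path.write_text("\n".join(utterances) + "\n", encoding=encoding)
        paths.append(path)
    return paths


if __name__ == "__main__":
    records = [
        ("north", "Delivery was fast!"),
        ("south", "The price is too high!"),
        ("north", "I would buy it again!"),
    ]
    for path in write_groups(group_by_variable(records), "corpus_by_area"):
        print(path)
```

Each resulting file can then be analyzed separately and the results compared.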

If the corpus has no linear coherence (for example, when the utterances contained in each file have been compiled at random, without following a particular logic), the results that depend on the chronological analyses of Tropes (Most characteristic parts of text, Bundles, Episodes, Distribution graphs) will not be meaningful; do not try to interpret them.


The processing of substantial textual corpora

Designed to extract information from substantial press corpora (Text Mining), this method can be used for Business Intelligence purposes, Press Reviews, Historical and/or Sociological Studies, and/or to generate the keywords of a thesaurus. To apply it, you have to use Tropes Zoom.

Suggested method:

  1. Store in a folder as many documents as possible, all relating to the same theme (retrieve information on the Internet, capture websites with the Robot, subscribe to data communication services, export the contents of a CD-ROM, etc.)
    If possible, split up these documents into sub-folders indicating the date and the origin of the files (for example, "C:\My documents\Newspapers\Herald Tribune\2005").
  2. Index the folder structure with the Zoom search engine.
  3. Run a search in Zoom using a sufficiently broad criterion (in order to guarantee exhaustive results), then use the [Select all texts] command and start the Tropes text analysis (from the Zoom toolbar or menus).
  4. Analyze the results with Tropes. Identify the problems in the data obtained.
  5. Customize a Scenario with Tropes, removing errors (if necessary) and including the most representative themes (adding trademarks and proper names) of what you intend to analyze.
  6. If necessary, re-index your folder with Zoom using your customized Scenario, go back to step 3 and restart the analysis process until you find the texts offering the best explanatory value.
  7. Extract or print the most interesting documents, build a report with the Report Writer and/or publish results with Tropes Web Module.
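The folder layout suggested in step 1 (one sub-folder per origin, then one per year) can be prepared with a short script before indexing. The Python sketch below is only an illustration: the function names target_folder and file_document are hypothetical, and it assumes you already know the origin and year of each document. It has nothing to do with the internals of Zoom or Tropes.

```python
import shutil
from pathlib import Path


def target_folder(root, origin, year):
    """Build the sub-folder path root/origin/year for a document."""
    return Path(root) / origin / str(year)


def file_document(src, root, origin, year):
    """Copy a document into its origin/year sub-folder; return the new path."""
    folder = target_folder(root, origin, year)
    folder.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, folder / Path(src).name))


if __name__ == "__main__":
    # Example values only; replace them with your own folders and files.
    print(target_folder("Newspapers", "Herald Tribune", 2005))
```

Once the documents are filed this way, the root folder is the one to index in step 2.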

Analysis of discourses and conversations

When analyzing a text file containing the transcript of the discourse of several individuals, start with an overall analysis of the corpus, then use Borders to process the utterances of the various characters separately, and compare the results obtained. When comparing the results, ask yourself the following questions: have all the participants been talking about the same thing? Did they use the same Actants? If not, then why not? Has anyone refused to reply to certain questions? Has anyone been trying to convince another participant? Why? Have they succeeded? Etc.

If you have time, you can resolve the anaphora manually, i.e. replace each personal pronoun with the name of the person it refers to. Suppose, for example, that you wish to analyze the discourse of two characters, Peter and Paul, who have been talking about three other people, Alan, Mary and Jane, and that the text contains many personal pronouns ("I", "you", "he", "she", etc.). Use the [find/replace] command of your word processor to replace some of these pronouns as follows: "I" and "You" with "Peter" or "Paul", "She" with "Mary" or "Jane", "He" with "Alan", etc. You will then be able to count precisely how many times each person has been mentioned, to know whether they are Actants or Acted, etc.
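Part of this replacement can be scripted when the transcript marks who is speaking. The Python sketch below is a rough illustration under several assumptions: each line is assumed to have the form "Speaker: utterance", there are exactly two speakers, and the function names are hypothetical. It only handles "I" and "you"; third-person pronouns such as "he" and "she" still need a human decision, and contractions such as "I'm" are deliberately left untouched.

```python
import re
from collections import Counter


def resolve_first_and_second_person(lines, speakers=("Peter", "Paul")):
    """Replace "I" with the speaker and "you" with the other speaker."""
    resolved = []
    for line in lines:
        speaker, sep, utterance = line.partition(":")
        speaker = speaker.strip()
        if not sep or speaker not in speakers:
            resolved.append(line)  # not a recognized speaker line
            continue
        other = speakers[1] if speaker == speakers[0] else speakers[0]
        # The lookahead (?!') skips contractions such as "I'm" or "you're".
        text = re.sub(r"\bI\b(?!')", speaker, utterance)
        text = re.sub(r"\byou\b(?!')", other, text, flags=re.IGNORECASE)
        resolved.append(f"{speaker}:{text}")
    return resolved


def count_mentions(lines, names):
    """Count how often each name occurs in the utterances."""
    counts = Counter()
    for line in lines:
        _, sep, utterance = line.partition(":")
        text = utterance if sep else line
        for name in names:
            counts[name] += len(re.findall(rf"\b{re.escape(name)}\b", text))
    return counts


if __name__ == "__main__":
    transcript = [
        "Peter: I think you saw Mary.",
        "Paul: I did, and you know Alan.",
    ]
    resolved = resolve_first_and_second_person(transcript)
    print("\n".join(resolved))
    print(count_mentions(resolved, ["Peter", "Paul", "Mary", "Alan"]))
```

The replaced text is no longer grammatical in places (verb agreement is not adjusted), so it is worth checking a sample of the output by hand before analyzing it.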

When you transcribe spoken material, you must include punctuation in the text; otherwise the software will not be able to carry out the propositional hashing properly and the analysis will be distorted.

TropesOntology GEXF exports are specifically designed for discourse analysis of several documents.

Literary studies

To analyze a play, use the above method (analyzing the utterances of multiple actors is almost equivalent to analyzing a conversation between different interlocutors).

When studying long texts, such as an entire book, first analyze each chapter separately, then you can make a synthesis by processing the whole text (see Reflections on the size of the texts above).


Comparing two texts

Comparing two texts comes down to making an analysis both of the contents (i.e. of the Equivalent classes) and of the Setting (i.e. of the Word categories).

For example, you can compare:




Copyright Acetic and Semantic Knowledge, all rights reserved
www.semantic-knowledge.com