Package org.jscience.ml.tigerxml

TIGER API 1.8 - A Java interface to the TIGER corpus - front end classes.

Class Summary
Corpus Represents the corpus including all syntax trees in the TIGER annotation.
GraphNode Represents a node in the syntax tree, either a terminal node or a non-terminal node.
NT Represents a non-terminal node in a syntax tree.
Path Represents a path leading through the syntax tree that connects two nodes.
Sentence Represents a sentence in a corpus.
T Represents a terminal node in a syntax tree.
 

Package org.jscience.ml.tigerxml Description

TIGER API is a library that gives Java programmers access to the structure of any corpus stored as a TIGER-XML file. The API defines a Java object model for corpora encoded in TIGER-XML and provides methods for traversing syntax trees and accessing elements such as sentences, syntax graph nodes, and their attributes.
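For comparison, the sketch below reads terminal words and non-terminal categories directly with the JDK's DOM parser rather than through the TIGER API. It only illustrates the kind of TIGER-XML layout that the object model hides. The class name RawTigerXmlSketch, the helper attributeValues, and the three-word sample document are invented for this example and are not part of the library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration; NOT part of the TIGER API.
class RawTigerXmlSketch {

  // A tiny hand-written TIGER-XML fragment (made-up data, not from a real corpus).
  static final String SAMPLE =
      "<corpus id=\"demo\"><body><s id=\"s1\"><graph root=\"s1_500\">"
      + "<terminals>"
      + "<t id=\"s1_1\" word=\"Der\" pos=\"ART\"/>"
      + "<t id=\"s1_2\" word=\"Hund\" pos=\"NN\"/>"
      + "<t id=\"s1_3\" word=\"bellt\" pos=\"VVFIN\"/>"
      + "</terminals>"
      + "<nonterminals>"
      + "<nt id=\"s1_501\" cat=\"NP\">"
      + "<edge label=\"NK\" idref=\"s1_1\"/><edge label=\"NK\" idref=\"s1_2\"/></nt>"
      + "<nt id=\"s1_500\" cat=\"S\">"
      + "<edge label=\"SB\" idref=\"s1_501\"/><edge label=\"HD\" idref=\"s1_3\"/></nt>"
      + "</nonterminals></graph></s></body></corpus>";

  // Parses an XML string into a DOM document, wrapping checked exceptions.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse TIGER-XML", e);
    }
  }

  // Collects one attribute from every element with the given tag, in document order.
  static List<String> attributeValues(String xml, String tag, String attribute) {
    NodeList nodes = parse(xml).getElementsByTagName(tag);
    List<String> values = new ArrayList<>();
    for (int i = 0; i < nodes.getLength(); i++) {
      values.add(((Element) nodes.item(i)).getAttribute(attribute));
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println("Words: " + attributeValues(SAMPLE, "t", "word"));       // Words: [Der, Hund, bellt]
    System.out.println("Categories: " + attributeValues(SAMPLE, "nt", "cat")); // Categories: [NP, S]
  }
}
```

Resolving the idref attributes of the edge elements into actual mother/daughter links is the part that takes real work by hand, and it is the part the object model handles.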

The provided parser reads a given TIGER-XML file, builds data structures representing the corpus, and produces a Java Corpus object. This Corpus object represents the corpus and its structures and serves as the entry point for accessing its syntax trees, their nodes, and the nodes' attributes.

The library also provides a set of tools for common corpus-processing tasks: extracting text strings, retrieving index features of phrases and terminals, generating ASCII renderings of the trees, determining structural relations between nodes of a tree, and other kinds of syntactic processing.
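To make three of these tasks concrete (text extraction, a dominance test, and a path between two nodes), here is a minimal sketch on a hypothetical node type. The names TreeToolsSketch, Node, text, dominates and path are assumptions made for this illustration and do not reproduce the signatures of the TIGER API classes. The sketch also assumes a plain tree in which daughters are stored in surface order, so it does not handle crossing branches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a syntax tree; NOT the TIGER API's GraphNode/NT/T classes.
class TreeToolsSketch {

  static final class Node {
    final String label;                       // category (non-terminal) or word (terminal)
    final Node mother;                        // null for the root
    final List<Node> daughters = new ArrayList<>();

    Node(String label, Node mother) {
      this.label = label;
      this.mother = mother;
      if (mother != null) {
        mother.daughters.add(this);           // daughters are kept in insertion order
      }
    }

    boolean isTerminal() {
      return daughters.isEmpty();
    }
  }

  // Text extraction: the words of all terminals below the node, left to right.
  static String text(Node node) {
    if (node.isTerminal()) {
      return node.label;
    }
    List<String> parts = new ArrayList<>();
    for (Node daughter : node.daughters) {
      parts.add(text(daughter));
    }
    return String.join(" ", parts);
  }

  // Structural relation: true if 'ancestor' lies strictly above 'node'.
  static boolean dominates(Node ancestor, Node node) {
    for (Node n = node.mother; n != null; n = n.mother) {
      if (n == ancestor) {
        return true;
      }
    }
    return false;
  }

  // Path: labels from 'from' up to the lowest common ancestor, then down to 'to'.
  static List<String> path(Node from, Node to) {
    List<Node> up = new ArrayList<>();
    for (Node n = from; n != null; n = n.mother) {
      up.add(n);
    }
    List<Node> down = new ArrayList<>();
    for (Node n = to; n != null; n = n.mother) {
      int idx = up.indexOf(n);
      if (idx >= 0) {                         // n is the lowest common ancestor
        List<String> labels = new ArrayList<>();
        for (Node u : up.subList(0, idx + 1)) {
          labels.add(u.label);
        }
        for (int i = down.size() - 1; i >= 0; i--) {
          labels.add(down.get(i).label);
        }
        return labels;
      }
      down.add(n);
    }
    throw new IllegalArgumentException("Nodes are not in the same tree");
  }

  public static void main(String[] args) {
    Node s = new Node("S", null);
    Node np = new Node("NP", s);
    Node der = new Node("Der", np);
    Node hund = new Node("Hund", np);
    Node bellt = new Node("bellt", s);

    System.out.println(text(s));              // Der Hund bellt
    System.out.println(dominates(np, hund));  // true
    System.out.println(dominates(np, bellt)); // false
    System.out.println(path(der, bellt));     // [Der, NP, S, bellt]
  }
}
```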

The API can also be used to convert TIGER-XML encoded corpora to other formats; a sample converter is included. To access corpus formats other than TIGER-XML, use the TIGERRegistry (included in TIGERSearch) to convert them to TIGER-XML first.
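As a rough idea of what such a conversion involves, the following sketch serializes a small tree as a bracketed string. It is not the sample converter shipped with the library: BracketConverterSketch, its Node type and toBrackets are hypothetical names, and the target notation was chosen only because it is compact.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical converter sketch; NOT the sample converter included with the TIGER API.
class BracketConverterSketch {

  static final class Node {
    final String label;                 // category (non-terminal) or word (terminal)
    final List<Node> daughters;         // empty for terminals

    Node(String label, Node... daughters) {
      this.label = label;
      this.daughters = Arrays.asList(daughters);
    }
  }

  // Terminals are written as their word; non-terminals as "(CAT daughter ...)".
  static String toBrackets(Node node) {
    if (node.daughters.isEmpty()) {
      return node.label;
    }
    StringBuilder sb = new StringBuilder("(").append(node.label);
    for (Node daughter : node.daughters) {
      sb.append(' ').append(toBrackets(daughter));
    }
    return sb.append(')').toString();
  }

  public static void main(String[] args) {
    Node tree = new Node("S",
        new Node("NP", new Node("Der"), new Node("Hund")),
        new Node("bellt"));
    System.out.println(toBrackets(tree)); // (S (NP Der Hund) bellt)
  }
}
```

A real converter would walk the sentences of a Corpus object in the same recursive fashion and would also need a policy for edge labels, which this notation drops.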

Using the API requires no manual processing of XML: the TIGER API completely abstracts the corpus object model from its XML representation.

Sample Usage

import org.jscience.ml.tigerxml.*;

public class TestTigerAPI {

  public static void main(String[] args) {

    // Create a Corpus object by parsing the given XML file
    Corpus corpus = new Corpus("sample_TIGER.xml");

    // Use the corpus object to print parts of the structure
    System.out.println("Corpus.getId: " + corpus.getId());

    // All-sentences-loop
    for (int i = 0; i < corpus.getSentenceCount(); i++) {
      Sentence sent = corpus.getSentence(i);
      System.out.println("Sentence ID: " + sent.getId());
      System.out.println("NonTerminals: ");

      // All-NTs-loop
      for (int j = 0; j < sent.getNTCount(); j++) {
        NT nt = sent.getNT(j);
        System.out.println("NT ID: " + nt.getId());
        System.out.println("   CAT: " + nt.getCat());
        System.out.println("   MOTHER: " + nt.getMother());
        System.out.println("   Edge2Mother: " + nt.getEdge2Mother());
      } // for j
    } // for i
  } // main
} // class