org.jscience.ml.tigerxml.tools
Class SyntaxGeneralizer

java.lang.Object
  extended by org.jscience.ml.tigerxml.tools.SyntaxGeneralizer

public class SyntaxGeneralizer
extends java.lang.Object

This class generalizes over some of the distinctions made in Tiger syntax. The distinctions concern phrase type, part of speech, and grammatical function.


Constructor Summary
SyntaxGeneralizer()
          Creates a SyntaxGeneralizer object with predefined generalization settings.
SyntaxGeneralizer(java.util.HashMap type2gen_type, java.util.HashMap label2gen_label, java.util.HashMap tag2gen_tag)
          Creates a SyntaxGeneralizer object with user-defined generalization settings.
 
Method Summary
 java.util.ArrayList getDaughtersByGeneralLabel(NT node, java.lang.String gen_label)
          Returns an ArrayList of the daughter nodes with a given general edge label.
 java.util.ArrayList getDescendantsByGeneralLabel(NT node, java.lang.String gen_label)
          Returns an ArrayList of the descendant nodes with a given general edge label.
 NT getDominatingNode(GraphNode node, java.lang.String gen_cat)
          Returns the nearest dominating node that has the general category gen_cat, and is not identical with the input node itself.
protected  java.lang.String getGeneralLabel(java.lang.String label)
          Returns the general edge label onto which the given Tiger edge label is mapped.
protected  java.lang.String getGeneralTag(java.lang.String tag)
          Returns the general POS tag onto which the given Tiger POS tag is mapped.
protected  java.lang.String getGeneralType(java.lang.String type)
          Returns the general phrase type onto which the given Tiger phrase type is mapped.
 java.lang.String getGrammaticalFunction(GraphNode node)
          Returns the (general) grammatical function of this node.
 java.lang.String getPhraseType(NT node)
          Returns the (general) phrase type of this node.
 java.lang.String getPos(T node)
          Returns the (general) POS tag of this terminal.
protected  boolean isCaseOf(java.lang.String item, java.lang.String general_item)
          DOCUMENT ME!
 boolean isDominatedBy(GraphNode node, java.lang.String gen_cat)
          Returns true if there is a dominating node that has the general category gen_cat.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SyntaxGeneralizer

public SyntaxGeneralizer(java.util.HashMap type2gen_type,
                         java.util.HashMap label2gen_label,
                         java.util.HashMap tag2gen_tag)
Creates a SyntaxGeneralizer object with user-defined generalization settings. Each SyntaxGeneralizer is initialized with three HashMaps. The HashMaps map strings that are used to create regular expressions onto strings that represent general, linguistically relevant categories. The hash maps are used to group together concepts of Tiger syntax and thereby abstract away from their distinctions.

The first hash maps Tiger phrase type designators (like "S", "CS", "CVP" and so forth) onto general phrase type labels. For example, you might find it useful to group together the two Tiger phrase type tags "S" and "CS" into a single general phrase type tag called "S". In that case, your first hash should contain the mapping from "^(S|CS)$" to "S".

The second hash maps Tiger edge labels (like "SB", "HD", and "MO") onto general edge labels.

The third hash maps Tiger part of speech tags (like "NN", "NE", and "VVFIN") onto general POS tags.

Types, labels, or tags that are not covered by the hash maps are automatically mapped onto the tag "OTHER".

See Also:
Pattern
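
The regular-expression mapping described for this constructor can be illustrated with a small stand-alone sketch. It does not use SyntaxGeneralizer itself; the class GeneralizationSketch and the helpers generalize and examplePhraseTypes are hypothetical names introduced only for illustration, and the first-match lookup order is an assumption, since the page does not say how overlapping patterns are resolved. The "OTHER" fallback follows the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class GeneralizationSketch {

    // Hypothetical helper (not part of SyntaxGeneralizer): returns the general
    // category of the first regex key that matches the whole item, or "OTHER"
    // if no key matches.
    static String generalize(Map<String, String> regex2general, String item) {
        for (Map.Entry<String, String> entry : regex2general.entrySet()) {
            if (Pattern.matches(entry.getKey(), item)) {
                return entry.getValue();
            }
        }
        return "OTHER";
    }

    // Example phrase type map in the regex-to-category form described above.
    // LinkedHashMap (a HashMap subclass) keeps insertion order, so the lookup
    // order in this sketch is deterministic.
    static Map<String, String> examplePhraseTypes() {
        Map<String, String> type2genType = new LinkedHashMap<>();
        type2genType.put("^(S|CS)$", "S");
        type2genType.put("^(VP|CVP)$", "VP");
        return type2genType;
    }

    public static void main(String[] args) {
        Map<String, String> types = examplePhraseTypes();
        System.out.println(generalize(types, "CS"));  // prints S
        System.out.println(generalize(types, "CVP")); // prints VP
        System.out.println(generalize(types, "NP"));  // prints OTHER
    }
}
```

A map built this way, together with analogous maps for edge labels and POS tags, is the kind of argument the three-map constructor expects.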

SyntaxGeneralizer

public SyntaxGeneralizer()
Creates a SyntaxGeneralizer object with predefined generalization settings. As for phrase type, "NP", "CNP", "NM", and "PN" are mapped onto "NP"; "PP" and "CPP" are mapped onto "PP"; "AVP", "AA", and "CAVP" are mapped onto "AVP"; "AP", "MTA", and "CAP" are mapped onto "AP"; "CVP" and "VP" are mapped onto "VP"; "S", "CS", and "DL" are mapped onto "S".

As for POS tag, "NN", "NE", "NNE", "PNC", "PRF", "PDS", "PIS", "PPER", "PPOS", "PRELS", and "PWS" are mapped onto "NP"; "PROAV" and "PWAV" are mapped onto "PP"; "ADJA", "PDAT", "PIAT", "PPOSAT", "PRELAT", "PWAT", and "PRELS" are mapped onto "AP"; "ADJD" and "ADV" are mapped onto "AVP".

As for edge label, "EP", "SB", and "SP" are mapped onto "SB"; "DA" is mapped onto "DA"; "OA" and "OA2" are mapped onto "OA"; "OG" is mapped onto "OG"; "OP" is mapped onto "OP"; "NK" is mapped onto "NK"; "HD" is mapped onto "HD"; "PD", "CVC", "MO", "SBP", "AMS", and "CC" are mapped onto "MO"; "GR", "GL", "AG", "PG", and "MNR" are mapped onto "MNR"; "RC" is mapped onto "RC"; "OC", "RE", "RS", and "DH" are mapped onto "OC".

Method Detail

getGeneralType

protected java.lang.String getGeneralType(java.lang.String type)
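Returns the general phrase type onto which the given Tiger phrase type is mapped.

Parameters:
type - a Tiger phrase type designator, such as "S" or "CS".
Returns:
The general phrase type, or "OTHER" if the type is not covered by the phrase type mappings.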

getGeneralTag

protected java.lang.String getGeneralTag(java.lang.String tag)
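Returns the general POS tag onto which the given Tiger POS tag is mapped.

Parameters:
tag - a Tiger part of speech tag, such as "NN" or "VVFIN".
Returns:
The general POS tag, or "OTHER" if the tag is not covered by the POS tag mappings.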

getGeneralLabel

protected java.lang.String getGeneralLabel(java.lang.String label)
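Returns the general edge label onto which the given Tiger edge label is mapped.

Parameters:
label - a Tiger edge label, such as "SB" or "MO".
Returns:
The general edge label, or "OTHER" if the label is not covered by the edge label mappings.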

isCaseOf

protected boolean isCaseOf(java.lang.String item,
                           java.lang.String general_item)
DOCUMENT ME!

Parameters:
item - DOCUMENT ME!
general_item - DOCUMENT ME!
Returns:
DOCUMENT ME!

getDaughtersByGeneralLabel

public java.util.ArrayList getDaughtersByGeneralLabel(NT node,
                                                      java.lang.String gen_label)
Returns an ArrayList of the daughter nodes with a given general edge label. The list is ordered left to right according to word order.

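Parameters:
node - the nonterminal node whose daughters are searched.
gen_label - the general edge label to look for.
Returns:
An ArrayList of the daughters of node that have the general edge label gen_label, ordered left to right.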

getDescendantsByGeneralLabel

public java.util.ArrayList getDescendantsByGeneralLabel(NT node,
                                                        java.lang.String gen_label)
Returns an ArrayList of the descendant nodes with a given general edge label. The descendants are ordered breadth first then left to right.

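Parameters:
node - the nonterminal node whose descendants are searched.
gen_label - the general edge label to look for.
Returns:
An ArrayList of the descendants of node that have the general edge label gen_label, ordered breadth first, then left to right.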
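
The "breadth first, then left to right" ordering can be made concrete with a stand-alone sketch. It does not use the tigerxml classes: Node, descendantsByGeneralLabel, ids, and exampleTree are hypothetical names, and the sketch assumes that each node stores its daughters in word order and already carries its general edge label.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class DescendantOrderSketch {

    // Hypothetical stand-in for a syntax node; not the tigerxml NT class.
    static final class Node {
        final String id;
        final String generalLabel;
        final List<Node> daughters = new ArrayList<>();

        Node(String id, String generalLabel, Node... daughters) {
            this.id = id;
            this.generalLabel = generalLabel;
            this.daughters.addAll(Arrays.asList(daughters));
        }
    }

    // Collects matching descendants level by level; within a level, nodes are
    // visited in the order in which their mothers list them.
    static List<Node> descendantsByGeneralLabel(Node root, String genLabel) {
        List<Node> result = new ArrayList<>();
        Deque<Node> queue = new ArrayDeque<>(root.daughters);
        while (!queue.isEmpty()) {
            Node current = queue.removeFirst();
            if (current.generalLabel.equals(genLabel)) {
                result.add(current);
            }
            queue.addAll(current.daughters);
        }
        return result;
    }

    // Convenience for printing and comparing results.
    static List<String> ids(List<Node> nodes) {
        List<String> out = new ArrayList<>();
        for (Node n : nodes) {
            out.add(n.id);
        }
        return out;
    }

    // root -> [n1 (SB) -> [n3 (MO)], n2 (MO)]
    static Node exampleTree() {
        Node deep = new Node("n3", "MO");
        Node left = new Node("n1", "SB", deep);
        Node right = new Node("n2", "MO");
        return new Node("root", "--", left, right);
    }

    public static void main(String[] args) {
        // n2 is shallower than n3, so it comes first although n3 is further left.
        System.out.println(ids(descendantsByGeneralLabel(exampleTree(), "MO"))); // prints [n2, n3]
    }
}
```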

isDominatedBy

public boolean isDominatedBy(GraphNode node,
                             java.lang.String gen_cat)
Returns true if there is a dominating node that has the general category gen_cat.

Parameters:
node - the node whose dominating nodes are checked.
gen_cat - the general category to look for.
Returns:
A truth value indicating whether there is a dominating node with the general category gen_cat.

getDominatingNode

public NT getDominatingNode(GraphNode node,
                            java.lang.String gen_cat)
Returns the nearest dominating node that has the general category gen_cat, and is not identical with the input node itself. The method returns null if there is no such node.

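Parameters:
node - the node whose dominating nodes are searched.
gen_cat - the general category to look for.
Returns:
The nearest node with the general category gen_cat that dominates this node, or null if there is none.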
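
The relationship between getDominatingNode and isDominatedBy can be sketched as a walk up the tree. This is an illustration under stated assumptions, not the library's implementation: Node, dominatingNode, isDominatedBy, and exampleChain are hypothetical names, and each node is assumed to have at most one mother and to carry its general category directly.

```java
class DominanceSketch {

    // Hypothetical stand-in for a syntax node; not the tigerxml GraphNode class.
    static final class Node {
        final String id;
        final String generalCategory;
        final Node mother; // null for the root

        Node(String id, String generalCategory, Node mother) {
            this.id = id;
            this.generalCategory = generalCategory;
            this.mother = mother;
        }
    }

    // Returns the nearest proper ancestor with the given general category,
    // or null if there is none. The search starts at the mother, so the
    // input node itself is never returned.
    static Node dominatingNode(Node node, String genCat) {
        for (Node current = node.mother; current != null; current = current.mother) {
            if (current.generalCategory.equals(genCat)) {
                return current;
            }
        }
        return null;
    }

    // True exactly when dominatingNode finds a node.
    static boolean isDominatedBy(Node node, String genCat) {
        return dominatingNode(node, genCat) != null;
    }

    // Builds the chain s (S) -> vp (VP) -> np (NP) and returns {s, vp, np}.
    static Node[] exampleChain() {
        Node s = new Node("s", "S", null);
        Node vp = new Node("vp", "VP", s);
        Node np = new Node("np", "NP", vp);
        return new Node[] {s, vp, np};
    }

    public static void main(String[] args) {
        Node np = exampleChain()[2];
        System.out.println(dominatingNode(np, "S").id);  // prints s
        System.out.println(isDominatedBy(np, "VP"));     // prints true
        System.out.println(dominatingNode(np, "NP"));    // prints null (the node itself does not count)
    }
}
```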

getGrammaticalFunction

public java.lang.String getGrammaticalFunction(GraphNode node)
Returns the (general) grammatical function of this node.

Parameters:
node - the node whose grammatical function is requested.
Returns:
The general grammatical function of the node.

getPhraseType

public java.lang.String getPhraseType(NT node)
Returns the (general) phrase type of this node.

Parameters:
node - the nonterminal node whose phrase type is requested.
Returns:
The general phrase type of the node.

getPos

public java.lang.String getPos(T node)
Returns the (general) POS tag of this terminal.

Parameters:
node - the terminal node whose POS tag is requested.
Returns:
The general POS tag of the terminal.