org.jscience.computing.automaton
Class Automaton

java.lang.Object
  extended by org.jscience.computing.automaton.Automaton
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable

public class Automaton
extends java.lang.Object
implements java.io.Serializable, java.lang.Cloneable

Finite-state automaton with regular expression operations.

Automata are represented using State and Transition objects. Implicitly, all states and transitions of an automaton are reachable from its initial state. If the states or transitions are manipulated manually, the restoreInvariant() and setDeterministic(boolean) methods should be used afterwards to restore certain representation invariants that are assumed by the built-in automata operations.

See Also:
Serialized Form

Field Summary
static int MINIMIZE_BRZOZOWSKI
          Minimize using Brzozowski's O(2n) algorithm.
static int MINIMIZE_HOPCROFT
          Minimize using Hopcroft's O(n log n) algorithm.
static int MINIMIZE_HUFFMAN
          Minimize using Huffman's O(n2) algorithm.
 
Constructor Summary
Automaton()
          Constructs new automaton that accepts the empty language.
 
Method Summary
 void addEpsilons(java.util.Collection<StatePair> pairs)
          Adds epsilon transitions to this automaton.
 java.lang.Object clone()
          Returns a clone of this automaton.
 Automaton complement()
          Returns new (deterministic) automaton that accepts the complement of the language of this automaton.
 Automaton compress(java.lang.String set, char c)
          Returns a new automaton that accepts the compressed language of this automaton.
 Automaton concatenate(Automaton a)
          Returns new automaton that accepts the concatenation of the languages of this and the given automaton.
static Automaton concatenate(java.util.List<Automaton> l)
          Returns new automaton that accepts the concatenation of the languages of the given automata.
 void determinize()
          Determinizes this automaton.
 boolean equals(java.lang.Object obj)
          Returns true if the language of this automaton is equal to the language of the given automaton.
 java.util.Set<State> getAcceptStates()
          Returns the set of reachable accept states.
 java.lang.String getCommonPrefix()
          Returns the longest string that is a prefix of all accepted strings and visits each state at most once.
 java.util.Set<java.lang.String> getFiniteStrings()
          Returns set of accepted strings, assuming this automaton has a finite language.
 java.util.Set<java.lang.String> getFiniteStrings(int limit)
          Returns set of accepted strings, assuming that at most limit strings are accepted.
 java.lang.Object getInfo()
          Returns extra information associated with this automaton.
 State getInitialState()
          Gets initial state.
 java.util.Set<State> getLiveStates()
          Returns set of live states.
 int getNumberOfStates()
          Returns number of states in this automaton.
 int getNumberOfTransitions()
          Returns number of transitions in this automaton.
 java.lang.String getShortestExample(boolean accepted)
          Returns a shortest accepted/rejected string.
 java.lang.String getSingleton()
          Returns the singleton string for this automaton.
 java.util.Set<State> getStates()
          Returns the set states that are reachable from the initial state.
 int hashCode()
          Returns hash code for this automaton.
 Automaton homomorph(char[] source, char[] dest)
          Returns new automaton accepting the homomorphic image of this automaton using the given function.
 Automaton intersection(Automaton a)
          Returns new (deterministic) automaton that accepts the intersection of the languages of this and the given automaton.
 boolean isDeterministic()
          Returns deterministic flag for this automaton.
 boolean isEmpty()
          Returns true if this automaton accepts no strings.
 boolean isEmptyString()
          Returns true if this automaton accepts the empty string and nothing else.
 boolean isFinite()
          Returns true if the language of this automaton is finite.
 boolean isTotal()
          Returns true if this automaton accepts all strings.
static Automaton load(java.io.InputStream stream)
          Retrieves a serialized Automaton from a stream.
static Automaton load(java.net.URL url)
          Retrieves a serialized Automaton located by a URL.
static Automaton makeAnyChar()
          Returns new (deterministic) automaton that accepts any single character.
static Automaton makeAnyString()
          Returns new (deterministic) automaton that accepts all strings.
static Automaton makeChar(char c)
          Returns new (deterministic) automaton that accepts a single character of the given value.
static Automaton makeCharRange(char min, char max)
          Returns new (deterministic) automaton that accepts a single char whose value is in the given interval (including both end points).
static Automaton makeCharSet(java.lang.String set)
          Returns new (deterministic) automaton that accepts a single character in the given set.
static Automaton makeEmpty()
          Returns new (deterministic) automaton with the empty language.
static Automaton makeEmptyString()
          Returns new (deterministic) automaton that accepts only the empty string.
static Automaton makeInterval(int min, int max, int digits)
          Returns new automaton that accepts strings representing decimal non-negative integers in the given interval.
static Automaton makeString(java.lang.String s)
          Returns new (deterministic) automaton that accepts the single given string.
 void minimize()
          Minimizes (and determinizes if not already deterministic) this automaton.
 Automaton minus(Automaton a)
          Returns new (deterministic) automaton that accepts the intersection of the language of this automaton and the complement of the language of the given automaton.
 Automaton optional()
          Returns new automaton that accepts the union of the empty string and the language of this automaton.
 Automaton overlap(Automaton a)
          Returns automaton that accepts the strings in more than one way can be split into a left part being accepted by this automaton and a right part being accepted by the given automaton.
 Automaton projectChars(java.util.Set<java.lang.Character> chars)
          Returns new automaton with projected alphabet.
 void reduce()
          Reduces this automaton.
 void removeDeadTransitions()
          Removes transitions to dead states and calls reduce() (a state is "dead" if no accept state is reachable from it).
 Automaton repeat()
          Returns new automaton that accepts the Kleene star (zero or more concatenated repetitions) of the language of this automaton.
 Automaton repeat(int min)
          Returns new automaton that accepts min or more concatenated repetitions of the language of this automaton.
 Automaton repeat(int min, int max)
          Returns new automaton that accepts between min and max (including both) concatenated repetitions of the language of this automaton.
 void restoreInvariant()
          Restores representation invariant.
 boolean run(java.lang.String s)
          Returns true if the given string is accepted by this automaton.
 void setDeterministic(boolean deterministic)
          Sets deterministic flag for this automaton.
 void setInfo(java.lang.Object info)
          Associates extra information with this automaton.
 void setInitialState(State s)
          Sets initial state.
static void setMinimization(int algorithm)
          Selects minimization algorithm (default: MINIMIZE_HOPCROFT).
static void setMinimizeAlways(boolean flag)
          Sets or resets minimize always flag.
 Automaton shuffle(Automaton a)
          Returns new automaton that accepts the shuffle (interleaving) of the languages of this and the given automaton.
static java.lang.String shuffleSubsetOf(java.util.Collection<Automaton> ca, Automaton a, java.lang.Character suspend_shuffle, java.lang.Character resume_shuffle)
          Returns a string that is an interleaving of strings that are accepted by ca but not by a.
 Automaton singleChars()
          Returns new automaton that accepts the single chars that occur in strings that are accepted by this automaton.
 void store(java.io.OutputStream stream)
          Writes this Automaton to the given stream.
 boolean subsetOf(Automaton a)
          Returns true if the language of this automaton is a subset of the language of the given automaton.
 Automaton subst(char c, java.lang.String s)
          Returns new automaton where all transitions of the given char are replaced by a string.
 Automaton subst(java.util.Map<java.lang.Character,java.util.Set<java.lang.Character>> map)
          Returns new automaton where all transition labels have been substituted.
 java.lang.String toDot()
          Returns Graphviz Dot representation of this automaton.
 java.lang.String toString()
          Returns a string representation of this automaton.
 Automaton trim(java.lang.String set, char c)
          Returns a new automaton that accepts the trimmed language of this automaton.
 Automaton union(Automaton a)
          Returns new automaton that accepts the union of the languages of this and the given automaton.
static Automaton union(java.util.Collection<Automaton> l)
          Returns new automaton that accepts the union of the languages of the given automata.
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

MINIMIZE_HUFFMAN

public static final int MINIMIZE_HUFFMAN
Minimize using Huffman's O(n2) algorithm. This is the standard text-book algorithm.

See Also:
setMinimization(int), Constant Field Values

MINIMIZE_BRZOZOWSKI

public static final int MINIMIZE_BRZOZOWSKI
Minimize using Brzozowski's O(2n) algorithm. This algorithm uses the reverse-determinize-reverse-determinize trick, which has a bad worst-case behavior but often works very well in practice (even better than Hopcroft's!).

See Also:
setMinimization(int), Constant Field Values

MINIMIZE_HOPCROFT

public static final int MINIMIZE_HOPCROFT
Minimize using Hopcroft's O(n log n) algorithm. This is regarded as one of the most generally efficient algorithms that exist.

See Also:
setMinimization(int), Constant Field Values
Constructor Detail

Automaton

public Automaton()
Constructs new automaton that accepts the empty language. Using this constructor, automata can be constructed manually from State and Transition objects.

See Also:
setInitialState(State), State, Transition
Method Detail

setMinimization

public static void setMinimization(int algorithm)
Selects minimization algorithm (default: MINIMIZE_HOPCROFT).

Parameters:
algorithm - minimization algorithm

setMinimizeAlways

public static void setMinimizeAlways(boolean flag)
Sets or resets minimize always flag. If this flag is set, then minimize() will automatically be invoked after all operations that otherwise may produce non-minimal automata. By default, the flag is not set.

Parameters:
flag - if true, the flag is set

getSingleton

public java.lang.String getSingleton()
Returns the singleton string for this automaton. An automaton that accepts exactly one string may be represented in singleton mode. In that case, this method may be used to obtain the string.

Returns:
string, null if this automaton is not in singleton mode.

setInitialState

public void setInitialState(State s)
Sets initial state.

Parameters:
s - state

getInitialState

public State getInitialState()
Gets initial state.

Returns:
state

isDeterministic

public boolean isDeterministic()
Returns deterministic flag for this automaton.

Returns:
true if the automaton is definitely deterministic, false if the automaton may be nondeterministic

setDeterministic

public void setDeterministic(boolean deterministic)
Sets deterministic flag for this automaton. This method should (only) be used if automata are constructed manually.

Parameters:
deterministic - true if the automaton is definitely deterministic, false if the automaton may be nondeterministic

setInfo

public void setInfo(java.lang.Object info)
Associates extra information with this automaton.

Parameters:
info - extra information

getInfo

public java.lang.Object getInfo()
Returns extra information associated with this automaton.

Returns:
extra information
See Also:
setInfo(Object)

getStates

public java.util.Set<State> getStates()
Returns the set states that are reachable from the initial state.

Returns:
set of State objects

getAcceptStates

public java.util.Set<State> getAcceptStates()
Returns the set of reachable accept states.

Returns:
set of State objects

restoreInvariant

public void restoreInvariant()
Restores representation invariant. This method must be invoked before any built-in automata operation is performed if automaton states or transitions are manipulated manually.

See Also:
setDeterministic(boolean)

reduce

public void reduce()
Reduces this automaton. An automaton is "reduced" by combining overlapping and adjacent edge intervals with same destination.


getLiveStates

public java.util.Set<State> getLiveStates()
Returns set of live states. A state is "live" if an accept state is reachable from it.

Returns:
set of State objects

removeDeadTransitions

public void removeDeadTransitions()
Removes transitions to dead states and calls reduce() (a state is "dead" if no accept state is reachable from it).


makeEmpty

public static Automaton makeEmpty()
Returns new (deterministic) automaton with the empty language.


makeEmptyString

public static Automaton makeEmptyString()
Returns new (deterministic) automaton that accepts only the empty string.


makeAnyString

public static Automaton makeAnyString()
Returns new (deterministic) automaton that accepts all strings.


makeAnyChar

public static Automaton makeAnyChar()
Returns new (deterministic) automaton that accepts any single character.


makeChar

public static Automaton makeChar(char c)
Returns new (deterministic) automaton that accepts a single character of the given value.


makeCharRange

public static Automaton makeCharRange(char min,
                                      char max)
Returns new (deterministic) automaton that accepts a single char whose value is in the given interval (including both end points).


makeCharSet

public static Automaton makeCharSet(java.lang.String set)
Returns new (deterministic) automaton that accepts a single character in the given set.


makeInterval

public static Automaton makeInterval(int min,
                                     int max,
                                     int digits)
                              throws java.lang.IllegalArgumentException
Returns new automaton that accepts strings representing decimal non-negative integers in the given interval.

Parameters:
min - minimal value of interval
max - maximal value of inverval (both end points are included in the interval)
digits - if >0, use fixed number of digits (strings must be prefixed by 0's to obtain the right length) - otherwise, the number of digits is not fixed
Throws:
java.lang.IllegalArgumentException - if min>max or if numbers in the interval cannot be expressed with the given fixed number of digits

makeString

public static Automaton makeString(java.lang.String s)
Returns new (deterministic) automaton that accepts the single given string.

Complexity: constant.


concatenate

public Automaton concatenate(Automaton a)
Returns new automaton that accepts the concatenation of the languages of this and the given automaton.

Complexity: linear in number of states.


concatenate

public static Automaton concatenate(java.util.List<Automaton> l)
Returns new automaton that accepts the concatenation of the languages of the given automata.

Complexity: linear in total number of states.


optional

public Automaton optional()
Returns new automaton that accepts the union of the empty string and the language of this automaton.

Complexity: linear in number of states.


repeat

public Automaton repeat()
Returns new automaton that accepts the Kleene star (zero or more concatenated repetitions) of the language of this automaton.

Complexity: linear in number of states.


repeat

public Automaton repeat(int min)
Returns new automaton that accepts min or more concatenated repetitions of the language of this automaton.

Complexity: linear in number of states and in min.


repeat

public Automaton repeat(int min,
                        int max)
Returns new automaton that accepts between min and max (including both) concatenated repetitions of the language of this automaton.

Complexity: linear in number of states and in min and max.


complement

public Automaton complement()
Returns new (deterministic) automaton that accepts the complement of the language of this automaton.

Complexity: linear in number of states (if already deterministic).


minus

public Automaton minus(Automaton a)
Returns new (deterministic) automaton that accepts the intersection of the language of this automaton and the complement of the language of the given automaton. As a side-effect, this automaton is determinized, if not already deterministic.

Complexity: quadratic in number of states (if already deterministic).


intersection

public Automaton intersection(Automaton a)
Returns new (deterministic) automaton that accepts the intersection of the languages of this and the given automaton. As a side-effect, both this and the given automaton are determinized, if not already deterministic.

Complexity: quadratic in number of states (if already deterministic).


shuffleSubsetOf

public static java.lang.String shuffleSubsetOf(java.util.Collection<Automaton> ca,
                                               Automaton a,
                                               java.lang.Character suspend_shuffle,
                                               java.lang.Character resume_shuffle)
Returns a string that is an interleaving of strings that are accepted by ca but not by a. If no such string exists, null is returned. As a side-effect, a is determinized, if not already deterministic. Only interleavings that respect the suspend/resume markers (two BMP private code points) are considered if the markers are non-null. Also, interleavings never split surrogate pairs.

Complexity: proportional to the product of the numbers of states (if a is already deterministic).


union

public Automaton union(Automaton a)
Returns new automaton that accepts the union of the languages of this and the given automaton.

Complexity: linear in number of states.


union

public static Automaton union(java.util.Collection<Automaton> l)
Returns new automaton that accepts the union of the languages of the given automata.

Complexity: linear in number of states.


determinize

public void determinize()
Determinizes this automaton.

Complexity: exponential in number of states.


minimize

public void minimize()
Minimizes (and determinizes if not already deterministic) this automaton.

See Also:
setMinimization(int)

overlap

public Automaton overlap(Automaton a)
Returns automaton that accepts the strings in more than one way can be split into a left part being accepted by this automaton and a right part being accepted by the given automaton.


singleChars

public Automaton singleChars()
Returns new automaton that accepts the single chars that occur in strings that are accepted by this automaton.


trim

public Automaton trim(java.lang.String set,
                      char c)
Returns a new automaton that accepts the trimmed language of this automaton. The resulting automaton is constructed as follows: 1) Whenever a c character is allowed in the original automaton, one or more set characters are allowed in the new automaton. 2) The automaton is prefixed and postfixed with any number of set characters.

Parameters:
set - set of characters to be trimmed
c - canonical trim character (assumed to be in set)

compress

public Automaton compress(java.lang.String set,
                          char c)
Returns a new automaton that accepts the compressed language of this automaton. Whenever a c character is allowed in the original automaton, one or more set characters are allowed in the new automaton.

Parameters:
set - set of characters to be compressed
c - canonical compress character (assumed to be in set)

subst

public Automaton subst(java.util.Map<java.lang.Character,java.util.Set<java.lang.Character>> map)
Returns new automaton where all transition labels have been substituted.

Each transition labeled c is changed to a set of transitions, one for each character in map(c). If map(c) is null, then the transition is unchanged.

Parameters:
map - map from characters to sets of characters (where characters are Character objects)

subst

public Automaton subst(char c,
                       java.lang.String s)
Returns new automaton where all transitions of the given char are replaced by a string.

Parameters:
c - char
s - string
Returns:
new automaton

homomorph

public Automaton homomorph(char[] source,
                           char[] dest)
Returns new automaton accepting the homomorphic image of this automaton using the given function.

This method maps each transition label to a new value. source and dest are assumed to be arrays of same length, and source must be sorted in increasing order and contain no duplicates. source defines the starting points of char intervals, and the corresponding entries in dest define the starting points of corresponding new intervals.


projectChars

public Automaton projectChars(java.util.Set<java.lang.Character> chars)
Returns new automaton with projected alphabet. The new automaton accepts all strings that are projections of strings accepted by this automaton onto the given characters (represented by Character). If null is in the set, it abbreviates the intervals u0000-uDFFF and uF900-uFFFF (i.e., the non-private code points). It is assumed that all other characters from chars are in the interval uE000-uF8FF.


addEpsilons

public void addEpsilons(java.util.Collection<StatePair> pairs)
Adds epsilon transitions to this automaton. This method adds extra character interval transitions that are equivalent to the given set of epsilon transitions.

Parameters:
pairs - collection of StatePair objects representing pairs of source/destination states where epsilon transitions should be added

run

public boolean run(java.lang.String s)
Returns true if the given string is accepted by this automaton. As a side-effect, this automaton is determinized if not already deterministic.

Complexity: linear in length of string (if automaton is already deterministic) and in number of transitions.

Note: to obtain maximum speed, use the RunAutomaton class.


getNumberOfStates

public int getNumberOfStates()
Returns number of states in this automaton.


getNumberOfTransitions

public int getNumberOfTransitions()
Returns number of transitions in this automaton. This number is counted as the total number of edges, where one edge may be a character interval.


isEmptyString

public boolean isEmptyString()
Returns true if this automaton accepts the empty string and nothing else.


isEmpty

public boolean isEmpty()
Returns true if this automaton accepts no strings.


isTotal

public boolean isTotal()
Returns true if this automaton accepts all strings.


isFinite

public boolean isFinite()
Returns true if the language of this automaton is finite.


getFiniteStrings

public java.util.Set<java.lang.String> getFiniteStrings()
Returns set of accepted strings, assuming this automaton has a finite language. If the language is not finite, null is returned.


getFiniteStrings

public java.util.Set<java.lang.String> getFiniteStrings(int limit)
Returns set of accepted strings, assuming that at most limit strings are accepted. If more than limit strings are accepted, null is returned. If limit<0, then this methods works like getFiniteStrings().


getShortestExample

public java.lang.String getShortestExample(boolean accepted)
Returns a shortest accepted/rejected string. If more than one string is found, the lexicographically first is returned.

Parameters:
accepted - if true, look for accepted strings; otherwise, look for rejected strings
Returns:
the string, null if none found

getCommonPrefix

public java.lang.String getCommonPrefix()
Returns the longest string that is a prefix of all accepted strings and visits each state at most once.

Returns:
common prefix

subsetOf

public boolean subsetOf(Automaton a)
Returns true if the language of this automaton is a subset of the language of the given automaton. Implemented using this.intersection(a.complement()).isEmpty().


equals

public boolean equals(java.lang.Object obj)
Returns true if the language of this automaton is equal to the language of the given automaton. Implemented using hashCode and subsetOf.

Overrides:
equals in class java.lang.Object

shuffle

public Automaton shuffle(Automaton a)
Returns new automaton that accepts the shuffle (interleaving) of the languages of this and the given automaton. As a side-effect, both this and the given automaton are determinized, if not already deterministic.

Complexity: quadratic in number of states (if already deterministic).

Author:
Torben Ruby <ruby@daimi.au.dk>


hashCode

public int hashCode()
Returns hash code for this automaton. The hash code is based on the number of states and transitions in the minimized automaton.

Overrides:
hashCode in class java.lang.Object

toString

public java.lang.String toString()
Returns a string representation of this automaton.

Overrides:
toString in class java.lang.Object

toDot

public java.lang.String toDot()
Returns Graphviz Dot representation of this automaton.


clone

public java.lang.Object clone()
Returns a clone of this automaton.

Overrides:
clone in class java.lang.Object

load

public static Automaton load(java.net.URL url)
                      throws java.io.IOException,
                             java.io.OptionalDataException,
                             java.lang.ClassCastException,
                             java.lang.ClassNotFoundException,
                             java.io.InvalidClassException
Retrieves a serialized Automaton located by a URL.

Parameters:
url - URL of serialized automaton
Throws:
java.io.IOException - if input/output related exception occurs
java.io.OptionalDataException - if the data is not a serialized object
java.io.InvalidClassException - if the class serial number does not match
java.lang.ClassCastException - if the data is not a serialized Automaton
java.lang.ClassNotFoundException - if the class of the serialized object cannot be found

load

public static Automaton load(java.io.InputStream stream)
                      throws java.io.IOException,
                             java.io.OptionalDataException,
                             java.lang.ClassCastException,
                             java.lang.ClassNotFoundException,
                             java.io.InvalidClassException
Retrieves a serialized Automaton from a stream.

Parameters:
stream - input stream with serialized automaton
Throws:
java.io.IOException - if input/output related exception occurs
java.io.OptionalDataException - if the data is not a serialized object
java.io.InvalidClassException - if the class serial number does not match
java.lang.ClassCastException - if the data is not a serialized Automaton
java.lang.ClassNotFoundException - if the class of the serialized object cannot be found

store

public void store(java.io.OutputStream stream)
           throws java.io.IOException
Writes this Automaton to the given stream.

Parameters:
stream - output stream for serialized automaton
Throws:
java.io.IOException - if input/output related exception occurs