org.jscience.linguistics.search
Class StringSearch

java.lang.Object
  extended by org.jscience.linguistics.search.StringSearch
Direct Known Subclasses:
BNDM, BoyerMooreHorspool, BoyerMooreSunday, MismatchSearch, ShiftOr

public abstract class StringSearch
extends java.lang.Object

The base class for String searching implementations. String searching implementations do not maintain state and are thread safe: one instance can be used by as many threads as required.

Most pattern-matching algorithms pre-process the pattern in some way. Subclasses of StringSearch let you retrieve the pre-processed pattern, which saves the time required to rebuild character tables when the same pattern is searched for repeatedly.

Some of the Objects returned from processBytes(byte[]), processChars(char[]) and processString(String) may implement the Serializable interface, which lets you serialize pre-processed Objects to disk. See the concrete implementations for details.
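As a minimal sketch of the serialization idea, the helper below round-trips a pre-processed Object through standard Java serialization. PreprocessedCache is a hypothetical name, and the example assumes the concrete implementation returns something Serializable (an int[] table is used as a stand-in, since arrays are Serializable); check each subclass before relying on this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper that round-trips a pre-processed Object through Java
// serialization. Swap the byte array streams for file streams to persist to disk.
class PreprocessedCache {

    // Serializes the pre-processed Object into a byte array.
    static byte[] save(Serializable processed) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(processed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restores an Object previously written by save(...).
    static Object load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In real use you would first check `processed instanceof Serializable`, because only some implementations return serializable Objects. The restored Object can be passed straight back as the `processed` argument, since the search methods accept it as a plain Object.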


Nested Class Summary
protected static class StringSearch.Dispatch
          The Dispatch class implements the strategy to convert Strings to char arrays and calls the appropriate searchChars method in the given StringSearch instance.
protected static class StringSearch.ReflectionDispatch
          The ReflectionDispatch class is used if Reflection can be used to access the underlying char array in Strings to avoid the cloning overhead.
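The copying strategy attributed to Dispatch can be sketched as follows: each String is converted with String.toCharArray(), which allocates a fresh copy (the cloning overhead that ReflectionDispatch is meant to avoid), and the call is forwarded to a char-array search. CharSearcher, NaiveSearcher and CopyingDispatch are hypothetical names standing in for the real classes.

```java
// Stand-in for the abstract searchChars method of StringSearch.
interface CharSearcher {
    int searchChars(char[] text, int textStart, int textEnd, char[] pattern);
}

// Simple brute-force implementation, used only to make the sketch runnable.
class NaiveSearcher implements CharSearcher {
    public int searchChars(char[] text, int textStart, int textEnd, char[] pattern) {
        outer:
        for (int i = textStart; i <= textEnd - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (text[i + j] != pattern[j]) {
                    continue outer; // mismatch: try the next start position
                }
            }
            return i;
        }
        return -1;
    }
}

class CopyingDispatch {
    // Copies both Strings into new char arrays, then delegates to searchChars.
    static int searchString(CharSearcher searcher, String text, int textStart,
                            int textEnd, String pattern) {
        return searcher.searchChars(text.toCharArray(), textStart, textEnd,
                                    pattern.toCharArray());
    }
}
```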
 
Field Summary
protected static StringSearch.Dispatch activeDispatch
          The Dispatch instance.
protected static boolean useNative
          Stores whether the native library should be loaded.
 
Constructor Summary
protected StringSearch()
          Constructor for StringSearch.
 
Method Summary
protected  CharIntMap createCharIntMap(char[] pattern)
          Returns a CharIntMap of the extent of the given pattern, using no default value.
protected  CharIntMap createCharIntMap(char[] pattern, int defaultValue)
          Returns a CharIntMap of the extent of the given pattern, using the specified default value.
 boolean equals(java.lang.Object obj)
          Returns whether the given Object's class name matches this Object's class name.
 int hashCode()
          Returns the hashCode of the Object's Class because all instances of this Class are equal.
protected  int index(byte idx)
          Converts the given byte to an int.
abstract  java.lang.Object processBytes(byte[] pattern)
          Pre-processes a byte array.
abstract  java.lang.Object processChars(char[] pattern)
          Pre-processes a char array.
 java.lang.Object processString(java.lang.String pattern)
          Pre-processes a String.
 int searchBytes(byte[] text, byte[] pattern)
          Returns the position in the text at which the pattern was found.
 int searchBytes(byte[] text, byte[] pattern, java.lang.Object processed)
          Returns the position in the text at which the pattern was found.
 int searchBytes(byte[] text, int textStart, byte[] pattern)
          Returns the position in the text at which the pattern was found.
 int searchBytes(byte[] text, int textStart, byte[] pattern, java.lang.Object processed)
          Returns the position in the text at which the pattern was found.
 int searchBytes(byte[] text, int textStart, int textEnd, byte[] pattern)
          Returns the position in the text at which the pattern was found.
abstract  int searchBytes(byte[] text, int textStart, int textEnd, byte[] pattern, java.lang.Object processed)
          Returns the position in the text at which the pattern was found.
 int searchChars(char[] text, char[] pattern)
          Returns the position in the text at which the pattern was found.
 int searchChars(char[] text, char[] pattern, java.lang.Object processed)
          Returns the index of the pattern in the text using the pre-processed Object.
 int searchChars(char[] text, int textStart, char[] pattern)
          Returns the position in the text at which the pattern was found.
 int searchChars(char[] text, int textStart, char[] pattern, java.lang.Object processed)
          Returns the index of the pattern in the text using the pre-processed Object.
 int searchChars(char[] text, int textStart, int textEnd, char[] pattern)
          Returns the position in the text at which the pattern was found.
abstract  int searchChars(char[] text, int textStart, int textEnd, char[] pattern, java.lang.Object processed)
          Returns the index of the pattern in the text using the pre-processed Object.
 int searchString(java.lang.String text, int textStart, int textEnd, java.lang.String pattern)
          Convenience method to search for patterns in Strings.
 int searchString(java.lang.String text, int textStart, int textEnd, java.lang.String pattern, java.lang.Object processed)
          Convenience method to search for patterns in Strings.
 int searchString(java.lang.String text, int textStart, java.lang.String pattern)
          Convenience method to search for patterns in Strings.
 int searchString(java.lang.String text, int textStart, java.lang.String pattern, java.lang.Object processed)
          Convenience method to search for patterns in Strings.
 int searchString(java.lang.String text, java.lang.String pattern)
          Convenience method to search for patterns in Strings.
 int searchString(java.lang.String text, java.lang.String pattern, java.lang.Object processed)
          Convenience method to search for patterns in Strings.
 java.lang.String toString()
          Returns a String representation of this.
 java.lang.StringBuffer toStringBuffer(java.lang.StringBuffer in)
          Appends a String representation of this to the given StringBuffer or creates a new one if none is given.
 boolean usesNative()
          Returns whether this algorithm currently uses the native library (only possible if the library could be loaded).
static boolean usesReflection()
          Returns whether Reflection is used to access the underlying char array in Strings.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

useNative

protected static boolean useNative
Stores whether the native library should be loaded. This package comes with a native library called "NativeSearch". If the system property "org.jscience.linguistics.search.native" is null (not defined) or "true" (ignoring case), an attempt is always made to load the native library. Any other value prevents the native library from being loaded.
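The loading rule above can be expressed as a small predicate. In this sketch the property name and the library name "NativeSearch" come from the description; NativeFlag, tryLoadNative and the use of System.loadLibrary are assumptions about how such a check might be wired up, not the package's actual code.

```java
// Sketch of the documented rule for useNative; NativeFlag is a hypothetical name.
class NativeFlag {
    static final String PROPERTY = "org.jscience.linguistics.search.native";

    // Undefined (null) or "true" in any letter case means: try to load the library.
    static boolean shouldLoadNative(String value) {
        return value == null || value.equalsIgnoreCase("true");
    }

    // Assumed loading step: on failure, a pure-Java code path would be used instead.
    static boolean tryLoadNative() {
        if (!shouldLoadNative(System.getProperty(PROPERTY))) {
            return false;
        }
        try {
            System.loadLibrary("NativeSearch");
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }
}
```

Under this rule, starting the JVM with `-Dorg.jscience.linguistics.search.native=false` (or any value other than "true") would disable the native library.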


activeDispatch

protected static StringSearch.Dispatch activeDispatch
The Dispatch instance.

Constructor Detail

StringSearch

protected StringSearch()
Constructor for StringSearch. Because implementations are stateless and thread safe, there is no need to create multiple instances.
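To illustrate why one instance is enough, here is a minimal sketch in which a single shared object serves all worker threads of a parallel stream. StatelessSearch and SharedInstanceDemo are hypothetical names, and String.indexOf stands in for a real algorithm; the point is only that an object without mutable fields can be shared without synchronization.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stateless searcher: no fields, so concurrent calls cannot interfere.
class StatelessSearch {
    int searchString(String text, String pattern) {
        return text.indexOf(pattern); // stand-in for a real algorithm
    }
}

class SharedInstanceDemo {
    // One instance for the whole application.
    static final StatelessSearch SHARED = new StatelessSearch();

    // Searches every text in parallel; all worker threads use SHARED.
    // Results keep the order of the input list.
    static List<Integer> searchAll(List<String> texts, String pattern) {
        return texts.parallelStream()
                    .map(text -> SHARED.searchString(text, pattern))
                    .collect(Collectors.toList());
    }
}
```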

Method Detail

usesReflection

public static boolean usesReflection()
Returns whether Reflection is used to access the underlying char array in Strings.

Returns:
true if Reflection is used, false otherwise

usesNative

public boolean usesNative()
Returns whether this algorithm currently uses the native library, which is only possible if the library could be loaded. If the algorithm follows a different strategy for native libraries, or does not use the native library at all, this method returns false.

Returns:
true if the native library is in use, false otherwise

processBytes

public abstract java.lang.Object processBytes(byte[] pattern)
Pre-processes a byte array.

Parameters:
pattern - the byte array containing the pattern, may not be null
Returns:
the pre-processed pattern, for use with the searchBytes methods that take a processed argument

processChars

public abstract java.lang.Object processChars(char[] pattern)
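Pre-processes a char array.

Parameters:
pattern - a char array containing the pattern, may not be null
Returns:
the pre-processed pattern, for use with the searchChars and searchString methods that take a processed argument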

processString

public final java.lang.Object processString(java.lang.String pattern)
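Pre-processes a String. This method should normally not be called directly because the searchString methods that take no processed argument call it implicitly.

Parameters:
pattern - the String containing the pattern, may not be null
Returns:
the pre-processed pattern, for use with the searchChars and searchString methods that take a processed argument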
See Also:
processChars(char[])

searchBytes

public final int searchBytes(byte[] text,
                             byte[] pattern)
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the byte array containing the text, may not be null
pattern - the byte array containing the pattern, may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchBytes(byte[],int,int,byte[],Object)

searchBytes

public final int searchBytes(byte[] text,
                             byte[] pattern,
                             java.lang.Object processed)
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the byte array containing the text, may not be null
pattern - the pattern to search for, may not be null
processed - an Object as returned from processBytes(byte[]), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchBytes(byte[],int,int,byte[],Object)

searchBytes

public final int searchBytes(byte[] text,
                             int textStart,
                             byte[] pattern)
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the byte array containing the text, may not be null
textStart - the position in the text at which comparison should start
pattern - the byte array containing the pattern, may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchBytes(byte[],int,int,byte[],Object)

searchBytes

public final int searchBytes(byte[] text,
                             int textStart,
                             byte[] pattern,
                             java.lang.Object processed)
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the byte array containing the text, may not be null
textStart - the position in the text at which comparison should start
pattern - the pattern to search for, may not be null
processed - an Object as returned from processBytes(byte[]), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchBytes(byte[],int,int,byte[],Object)

searchBytes

public final int searchBytes(byte[] text,
                             int textStart,
                             int textEnd,
                             byte[] pattern)
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the byte array containing the text, may not be null
textStart - the position in the text at which comparison should start
textEnd - the position in the text at which comparison should stop
pattern - the byte array containing the pattern, may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchBytes(byte[],int,int,byte[],Object)

searchBytes

public abstract int searchBytes(byte[] text,
                                int textStart,
                                int textEnd,
                                byte[] pattern,
                                java.lang.Object processed)
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the byte array containing the text, may not be null
textStart - the position in the text at which comparison should start
textEnd - the position in the text at which comparison should stop
pattern - the pattern to search for, may not be null
processed - an Object as returned from processBytes(byte[]), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
processBytes(byte[])
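A brute-force reference for the searchBytes contract is handy for testing a faster subclass. This sketch assumes textEnd is exclusive (one past the last byte examined), so that the shorter overloads can pass text.length; ByteSearchReference is a hypothetical name, and it omits the processed argument because brute force needs no pre-processing.

```java
// Hypothetical brute-force reference for the searchBytes contract.
// Assumption: textStart is inclusive and textEnd is exclusive.
class ByteSearchReference {

    static int searchBytes(byte[] text, int textStart, int textEnd, byte[] pattern) {
        int last = textEnd - pattern.length; // last start index where the pattern fits
        for (int i = textStart; i <= last; i++) {
            int j = 0;
            while (j < pattern.length && text[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i; // full match starting at i
            }
        }
        return -1; // not found within [textStart, textEnd)
    }

    // Shorter overloads delegate, mirroring the "See Also" links above.
    static int searchBytes(byte[] text, int textStart, byte[] pattern) {
        return searchBytes(text, textStart, text.length, pattern);
    }

    static int searchBytes(byte[] text, byte[] pattern) {
        return searchBytes(text, 0, text.length, pattern);
    }
}
```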

searchChars

public final int searchChars(char[] text,
                             char[] pattern)
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the character array containing the text, may not be null
pattern - the char array containing the pattern, may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

searchChars

public final int searchChars(char[] text,
                             char[] pattern,
                             java.lang.Object processed)
Returns the index of the pattern in the text using the pre-processed Object. Returns -1 if the pattern was not found.

Parameters:
text - the character array containing the text, may not be null
pattern - the char array containing the pattern, may not be null
processed - an Object as returned from processChars(char[]) or processString(String), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

searchChars

public final int searchChars(char[] text,
                             int textStart,
                             char[] pattern)
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the character array containing the text, may not be null
textStart - the position in the text at which comparison should start
pattern - the char array containing the pattern, may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

searchChars

public final int searchChars(char[] text,
                             int textStart,
                             char[] pattern,
                             java.lang.Object processed)
Returns the index of the pattern in the text using the pre-processed Object. Returns -1 if the pattern was not found.

Parameters:
text - the character array containing the text, may not be null
textStart - the position in the text at which comparison should start
pattern - the char array containing the pattern, may not be null
processed - an Object as returned from processChars(char[]) or processString(String), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

searchChars

public final int searchChars(char[] text,
                             int textStart,
                             int textEnd,
                             char[] pattern)
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the character array containing the text, may not be null
textStart - the position in the text at which comparison should start
textEnd - the position in the text at which comparison should stop
pattern - the char array containing the pattern, may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

searchChars

public abstract int searchChars(char[] text,
                                int textStart,
                                int textEnd,
                                char[] pattern,
                                java.lang.Object processed)
Returns the index of the pattern in the text using the pre-processed Object. Returns -1 if the pattern was not found.

Parameters:
text - the character array containing the text, may not be null
textStart - the position in the text at which comparison should start
textEnd - the position in the text at which comparison should stop
pattern - the pattern to search for, may not be null
processed - an Object as returned from processChars(char[]) or processString(String), may not be null
Returns:
the position in the text or -1 if the pattern was not found

searchString

public final int searchString(java.lang.String text,
                              java.lang.String pattern)
Convenience method to search for patterns in Strings. Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the String containing the text, may not be null
pattern - the String containing the pattern, may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

searchString

public final int searchString(java.lang.String text,
                              java.lang.String pattern,
                              java.lang.Object processed)
Convenience method to search for patterns in Strings. Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the String containing the text, may not be null
pattern - the String containing the pattern, may not be null
processed - an Object as returned from processChars(char[]) or processString(String), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

searchString

public final int searchString(java.lang.String text,
                              int textStart,
                              java.lang.String pattern)
Convenience method to search for patterns in Strings. Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the String containing the text, may not be null
textStart - the position in the text at which comparison should start
pattern - the String containing the pattern, may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

searchString

public final int searchString(java.lang.String text,
                              int textStart,
                              java.lang.String pattern,
                              java.lang.Object processed)
Convenience method to search for patterns in Strings. Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the String containing the text, may not be null
textStart - the position in the text at which comparison should start
pattern - the String containing the pattern, may not be null
processed - an Object as returned from processChars(char[]) or processString(String), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

searchString

public final int searchString(java.lang.String text,
                              int textStart,
                              int textEnd,
                              java.lang.String pattern)
Convenience method to search for patterns in Strings. Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the String containing the text, may not be null
textStart - the position in the text at which comparison should start
textEnd - the position in the text at which comparison should stop
pattern - the String containing the pattern, may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[])

searchString

public final int searchString(java.lang.String text,
                              int textStart,
                              int textEnd,
                              java.lang.String pattern,
                              java.lang.Object processed)
Convenience method to search for patterns in Strings. Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Parameters:
text - the String containing the text, may not be null
textStart - the position in the text at which comparison should start
textEnd - the position in the text at which comparison should stop
pattern - the String containing the pattern, may not be null
processed - an Object as returned from processChars(char[]) or processString(String), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
searchChars(char[],int,int,char[],Object)

equals

public final boolean equals(java.lang.Object obj)
Returns whether the given Object's class name matches this Object's class name.

Overrides:
equals in class java.lang.Object
Parameters:
obj - the other Object
Returns:
true if obj has the same class name as this Object, false otherwise
See Also:
Object.equals(Object)

hashCode

public final int hashCode()
Returns the hashCode of the Object's Class because all instances of this Class are equal.

Overrides:
hashCode in class java.lang.Object
Returns:
the hash code of this Object's Class
See Also:
Object.hashCode()
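The equals and hashCode behaviour described above amounts to class-based equality: every instance of one concrete class is interchangeable. A minimal sketch, with ClassEqualSearch, SearchA and SearchB as hypothetical names:

```java
// Hypothetical base class mirroring the documented equals/hashCode/toString rules.
abstract class ClassEqualSearch {

    // Equal if the other Object's class name matches this Object's class name.
    @Override
    public final boolean equals(Object obj) {
        return obj != null && obj.getClass().getName().equals(getClass().getName());
    }

    // All instances of one class are equal, so the Class's hash code is used.
    @Override
    public final int hashCode() {
        return getClass().hashCode();
    }

    // Simply the name of the Class.
    @Override
    public final String toString() {
        return getClass().getName();
    }
}

class SearchA extends ClassEqualSearch {}

class SearchB extends ClassEqualSearch {}
```

Because equality depends only on the class, a HashSet of such searchers holds at most one entry per implementation.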

toString

public final java.lang.String toString()
Returns a String representation of this. Simply returns the name of the Class.

Overrides:
toString in class java.lang.Object
Returns:
the name of this Object's Class
See Also:
Object.toString()

toStringBuffer

public java.lang.StringBuffer toStringBuffer(java.lang.StringBuffer in)
Appends a String representation of this to the given StringBuffer or creates a new one if none is given. This method is not final because subclasses might want a different String format.

Parameters:
in - the StringBuffer to append to, may be null
Returns:
the StringBuffer that was appended to: in, or a new StringBuffer if in was null
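A sketch of the append-or-create pattern described for toStringBuffer. BufferedName is a hypothetical class; having toString delegate to toStringBuffer(null) is one possible design, not necessarily how StringSearch is written.

```java
// Hypothetical class illustrating the append-or-create contract.
class BufferedName {

    // Appends to the caller's buffer, or to a fresh one when in is null.
    public StringBuffer toStringBuffer(StringBuffer in) {
        StringBuffer out = (in == null) ? new StringBuffer() : in;
        out.append(getClass().getName());
        return out;
    }

    // Delegating keeps both representations consistent.
    @Override
    public String toString() {
        return toStringBuffer(null).toString();
    }
}
```

Passing an existing buffer avoids an extra allocation when several objects are formatted into one log line; a subclass can override toStringBuffer to append more detail.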

createCharIntMap

protected CharIntMap createCharIntMap(char[] pattern)
Returns a CharIntMap of the extent of the given pattern, using no default value.

Parameters:
pattern - the pattern
Returns:
a CharIntMap
See Also:
CharIntMap.CharIntMap(int,char)

createCharIntMap

protected CharIntMap createCharIntMap(char[] pattern,
                                      int defaultValue)
Returns a CharIntMap of the extent of the given pattern, using the specified default value.

Parameters:
pattern - the pattern
defaultValue - the default value
Returns:
a CharIntMap
See Also:
CharIntMap.CharIntMap(int,char,int)
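One way to read "the extent of the given pattern" is the range from the lowest to the highest character that occurs in it, so the table only needs that many slots instead of 65,536. The sketch below follows that reading; RangeCharIntMap is a hypothetical stand-in for CharIntMap, and treating "no default value" as 0 is an assumption.

```java
import java.util.Arrays;

// Hypothetical stand-in for CharIntMap: an int table covering only the
// characters from the lowest to the highest char of a pattern.
class RangeCharIntMap {
    private final char lowest;
    private final int[] values;
    private final int defaultValue;

    // Assumption: "no default value" behaves like a default of 0.
    RangeCharIntMap(char[] pattern) {
        this(pattern, 0);
    }

    RangeCharIntMap(char[] pattern, int defaultValue) {
        char lo = 0;
        char hi = 0;
        if (pattern.length > 0) {
            lo = hi = pattern[0];
            for (char c : pattern) {
                if (c < lo) lo = c;
                if (c > hi) hi = c;
            }
        }
        this.lowest = lo;
        this.defaultValue = defaultValue;
        // One slot per char in [lo, hi]; an empty pattern gets an empty table.
        this.values = new int[pattern.length > 0 ? hi - lo + 1 : 0];
        Arrays.fill(this.values, defaultValue);
    }

    // c must lie within the pattern's extent, otherwise an exception is thrown.
    void set(char c, int value) {
        values[c - lowest] = value;
    }

    // Characters outside the extent yield the default value.
    int get(char c) {
        int i = c - lowest;
        return (i >= 0 && i < values.length) ? values[i] : defaultValue;
    }
}
```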

index

protected final int index(byte idx)
Converts the given byte to an int.

Parameters:
idx - the byte
Returns:
an int
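Java bytes are signed (-128 to 127), so a byte cannot be used directly as an index into a 256-entry table. A likely purpose of index(byte) is an unsigned conversion; the sketch below assumes that and uses the usual `& 0xFF` mask. ByteIndex and histogram are hypothetical names.

```java
// Sketch, assuming index(byte) should yield an unsigned value in 0..255.
class ByteIndex {

    static int index(byte idx) {
        return idx & 0xFF; // e.g. (byte) -1 becomes 255 rather than -1
    }

    // Example use: a 256-entry table indexed directly by byte value.
    static int[] histogram(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            counts[index(b)]++;
        }
        return counts;
    }
}
```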