org.jscience.linguistics.search
Class ShiftOr

java.lang.Object
  extended by org.jscience.linguistics.search.StringSearch
      extended by org.jscience.linguistics.search.ShiftOr
Direct Known Subclasses:
ShiftOrClasses, ShiftOrWildcards

public class ShiftOr
extends StringSearch

An implementation of the Shift-Or algorithm by Ricardo Baeza-Yates and Gaston Gonnet as outlined in "A New Approach to Text Searching" (appeared in Proceedings of the 12th International Conference on Research and Development in Datum Retrieval). The Shift-Or algorithm is a bit-parallel algorithm.

The Shift-Or algorithm is not the fastest and by itself slower than String's indexOf method. It's usefulness comes from it's ability to support character classes and searching with errors at the same speed.

It's searchChars(char[],int,int,char[],Object) method is extremely slow in the Sun Java Virtual Machines. If possible, the searchBytes(byte[],int,int,byte[],Object) methods should be preferred. Because the main loop is also used by the ShiftOrWildcards class, the implementation cannot skip forward until the first character matches (as in the original algorithm).

This implementation currently limited to at most 31 characters because Java has no unsigned int type. An implementation that used long has proved to take twice the amount of time.

 Preprocessing: O(n + ∑) time

Searching : O(mn / log n) (worst case and average)

See Also:
ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/papers/CACM92.ps.gz , http://citeseer.nj.nec.com/50265.html , ShiftOrWildcards, ShiftOrClasses, ShiftOrMismatches

Nested Class Summary
 
Nested classes/interfaces inherited from class org.jscience.linguistics.search.StringSearch
StringSearch.Dispatch, StringSearch.ReflectionDispatch
 
Field Summary
 
Fields inherited from class org.jscience.linguistics.search.StringSearch
activeDispatch, useNative
 
Constructor Summary
ShiftOr()
          Constructor for ShiftOr.
 
Method Summary
 java.lang.Object processBytes(byte[] pattern)
          Pre-processing of the pattern.
 java.lang.Object processChars(char[] pattern)
          Pre-processing of the pattern.
 int searchBytes(byte[] text, int textStart, int textEnd, byte[] pattern, java.lang.Object processed)
          Returns the position in the text at which the pattern was found.
 int searchChars(char[] text, int textStart, int textEnd, char[] pattern, java.lang.Object processed)
          Returns the index of the pattern in the text using the pre-processed Object.
 
Methods inherited from class org.jscience.linguistics.search.StringSearch
createCharIntMap, createCharIntMap, equals, hashCode, index, processString, searchBytes, searchBytes, searchBytes, searchBytes, searchBytes, searchChars, searchChars, searchChars, searchChars, searchChars, searchString, searchString, searchString, searchString, searchString, searchString, toString, toStringBuffer, usesNative, usesReflection
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ShiftOr

public ShiftOr()
Constructor for ShiftOr. Note that it is not required to create multiple instances.

Method Detail

processBytes

public java.lang.Object processBytes(byte[] pattern)
Pre-processing of the pattern. The pattern may not exceed 31 bytes in length. If it does, only it's first 31 bytes are processed which might lead to unexpected results. Returns an int array.

Specified by:
processBytes in class StringSearch
Parameters:
pattern - the byte array containing the pattern, may not be null
Returns:
an Object
See Also:
StringSearch.processBytes(byte[])

processChars

public java.lang.Object processChars(char[] pattern)
Pre-processing of the pattern. The pattern may not exceed 31 characters in length. If it does, only it's first 31 characters are processed which might lead to unexpected results. Returns a CharIntMap.

Specified by:
processChars in class StringSearch
Parameters:
pattern - the pattern
Returns:
an Object
See Also:
StringSearch.processChars(char[])

searchBytes

public int searchBytes(byte[] text,
                       int textStart,
                       int textEnd,
                       byte[] pattern,
                       java.lang.Object processed)
Description copied from class: StringSearch
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Specified by:
searchBytes in class StringSearch
Parameters:
text - text the byte array containing the text, may not be null
textStart - at which position in the text the comparing should start
textEnd - at which position in the text comparing should stop
pattern - the pattern to search for, may not be null
processed - an Object as returned from StringSearch.processBytes(byte[]), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
StringSearch.searchBytes(byte[], int,int,byte[],java.lang.Object)

searchChars

public int searchChars(char[] text,
                       int textStart,
                       int textEnd,
                       char[] pattern,
                       java.lang.Object processed)
Description copied from class: StringSearch
Returns the index of the pattern in the text using the pre-processed Object. Returns -1 if the pattern was not found.

Specified by:
searchChars in class StringSearch
Parameters:
text - the String containing the text, may not be null
textStart - at which position in the text the comparing should start
textEnd - at which position in the text comparing should stop
pattern - the pattern to search for, may not be null
processed - an Object as returned from StringSearch.processChars(char[]) or StringSearch.processString(String), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
StringSearch.searchChars(char[], int,int,char[],Object)