org.jscience.linguistics.search
Class BoyerMooreHorspool

java.lang.Object
  extended by org.jscience.linguistics.search.StringSearch
      extended by org.jscience.linguistics.search.BoyerMooreHorspool
Direct Known Subclasses:
BoyerMooreHorspoolRaita

public class BoyerMooreHorspool
extends StringSearch

An implementation of Horspool's improved version of the Boyer-Moore String searching algorithm. See "Practical fast searching in strings" (appeared in Software - Practice & Experience, 10(6):501-506). Unfortunately, there seems to be no on-line version of his paper.

This is the second fastest algorithm in this library for the searchChars and searchString. Except for very short patterns (< 5 characters), it is always faster than any other algorithm except BoyerMooreHorspoolRaita and faster than String.indexOf(java.lang.String) by more than 5 times for patterns longer than 24 characters. It's searchBytes methods are slightly faster than in the BoyerMooreHorspoolRaita algorithm.

This implementation is based on Ricardo Baeza-Yates' implementation.

Preprocessing: O(m + ∑) time

Processing : O(mn) worst case


Nested Class Summary
 
Nested classes/interfaces inherited from class org.jscience.linguistics.search.StringSearch
StringSearch.Dispatch, StringSearch.ReflectionDispatch
 
Field Summary
 
Fields inherited from class org.jscience.linguistics.search.StringSearch
activeDispatch, useNative
 
Constructor Summary
BoyerMooreHorspool()
          Constructor for BoyerMooreHorspool.
 
Method Summary
 java.lang.Object processBytes(byte[] pattern)
          Returns a int array.
 java.lang.Object processChars(char[] pattern)
          Returns a CharIntMap.
 int searchBytes(byte[] text, int textStart, int textEnd, byte[] pattern, java.lang.Object processed)
          Returns the position in the text at which the pattern was found.
 int searchChars(char[] text, int textStart, int textEnd, char[] pattern, java.lang.Object processed)
          Returns the index of the pattern in the text using the pre-processed Object.
 
Methods inherited from class org.jscience.linguistics.search.StringSearch
createCharIntMap, createCharIntMap, equals, hashCode, index, processString, searchBytes, searchBytes, searchBytes, searchBytes, searchBytes, searchChars, searchChars, searchChars, searchChars, searchChars, searchString, searchString, searchString, searchString, searchString, searchString, toString, toStringBuffer, usesNative, usesReflection
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

BoyerMooreHorspool

public BoyerMooreHorspool()
Constructor for BoyerMooreHorspool. Note that it is not required to create multiple instances.

Method Detail

processBytes

public java.lang.Object processBytes(byte[] pattern)
Returns a int array.

Specified by:
processBytes in class StringSearch
Parameters:
pattern - the byte array containing the pattern, may not be null
Returns:
an Object
See Also:
StringSearch.processBytes(byte[])

processChars

public java.lang.Object processChars(char[] pattern)
Returns a CharIntMap.

Specified by:
processChars in class StringSearch
Parameters:
pattern - a char array containing the pattern, may not be null
Returns:
an Object
See Also:
StringSearch.processChars(char[])

searchBytes

public int searchBytes(byte[] text,
                       int textStart,
                       int textEnd,
                       byte[] pattern,
                       java.lang.Object processed)
Description copied from class: StringSearch
Returns the position in the text at which the pattern was found. Returns -1 if the pattern was not found.

Specified by:
searchBytes in class StringSearch
Parameters:
text - text the byte array containing the text, may not be null
textStart - at which position in the text the comparing should start
textEnd - at which position in the text comparing should stop
pattern - the pattern to search for, may not be null
processed - an Object as returned from StringSearch.processBytes(byte[]), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
StringSearch.searchBytes(byte[], int,int,byte[],java.lang.Object)

searchChars

public int searchChars(char[] text,
                       int textStart,
                       int textEnd,
                       char[] pattern,
                       java.lang.Object processed)
Description copied from class: StringSearch
Returns the index of the pattern in the text using the pre-processed Object. Returns -1 if the pattern was not found.

Specified by:
searchChars in class StringSearch
Parameters:
text - the String containing the text, may not be null
textStart - at which position in the text the comparing should start
textEnd - at which position in the text comparing should stop
pattern - the pattern to search for, may not be null
processed - an Object as returned from StringSearch.processChars(char[]) or StringSearch.processString(String), may not be null
Returns:
the position in the text or -1 if the pattern was not found
See Also:
StringSearch.searchChars(char[], int,int,char[],Object)