org.jscience.biology.alignment
Class SmithWaterman

java.lang.Object
  extended by org.jscience.biology.alignment.PairwiseAlignmentAlgorithm
      extended by org.jscience.biology.alignment.SmithWaterman

public class SmithWaterman
extends PairwiseAlignmentAlgorithm

This class implement the classic local alignment algorithm (with linear gap penalty function) due to T.F.Smith and M.S.Waterman (1981).

This algorithm is very similar to the NeedlemanWunsch algorithm for global alignment. The idea here also consists of building an (n+1 x m+1) matrix M given two sequences A and B of sizes n and m, respectively. However, unlike in the global alignment case, every position M[i,j] in the matrix contains the similarity score of suffixes of A[1..i] and B[1..j].

Starting from row 0, column 0, the computeMatrix method computes each position M[i,j] with the following recurrence:

 M[0,0] = M[0,j] = M[i,0] = 0
 M[i,j] = max { M[i,j-1]   + scoreInsertion (B[j]),
 M[i-1,j-1] + scoreSubstitution (A[i], B[j]),
 M[i-1,j]   + scoreDeletion(A[i])             }
 

Note that, here, all cells in the first row and column are set to zero. The best local alignment score is the highest value found anywhere in the matrix.

Just like in global alignment case, this algorithm has quadratic space complexity because it needs to keep an (n+1 x m+1) matrix in memory. And since the work of computing each cell is constant, it also has quadratic time complexity.

After the matrix has been computed, the alignment can be retrieved by tracing a path back in the matrix from the position of the highest score until a cell of value zero is reached. This step is performed by the buildOptimalAlignment method, and its time complexity is linear on the size of the alignment.

If the similarity value only is needed (and not the alignment itself), it is easy to reduce the space requirement to O(n) by keeping just the last row or column in memory. This is precisely what is done by the computeScore method. Note that it still requires O(n2) time.

For a more efficient approach to the local alignment problem, see the CrochemoreLandauZivUkelson algorithm. For global alignment, see the NeedlemanWunsch algorithm.

See Also:
NeedlemanWunsch, CrochemoreLandauZivUkelson, CrochemoreLandauZivUkelsonLocalAlignment, CrochemoreLandauZivUkelsonGlobalAlignment

Field Summary
protected  int[][] matrix
          The dynamic programming matrix.
protected  int max_col
          Indicate the column of where an optimal local alignment can be found in the matrix.
protected  int max_row
          Indicate the row of where an optimal local alignment can be found in the matrix..
protected  CharSequence seq1
          The first sequence of an alignment.
protected  CharSequence seq2
          The second sequence of an alignment.
 
Fields inherited from class org.jscience.biology.alignment.PairwiseAlignmentAlgorithm
alignment, APPROXIMATE_MATCH_TAG, GAP_CHARACTER, GAP_TAG, MATCH_TAG, MISMATCH_TAG, score, score_computed, scoring, sequences_loaded, use_match_tag
 
Constructor Summary
SmithWaterman()
           
 
Method Summary
protected  PairwiseAlignment buildOptimalAlignment()
          Builds an optimal local alignment between the loaded sequences.
protected  void computeMatrix()
          Computes the dynamic programming matrix.
protected  PairwiseAlignment computePairwiseAlignment()
          Builds an optimal local alignment between the loaded sequences after computing the dynamic programming matrix.
protected  int computeScore()
          Computes the score of the best local alignment between the two sequences using the scoring scheme previously set.
protected  void loadSequencesInternal(java.io.Reader input1, java.io.Reader input2)
          Loads sequences into CharSequence instances.
protected  void unloadSequencesInternal()
          Frees pointers to loaded sequences and the dynamic programming matrix so that their data can be garbage collected.
 
Methods inherited from class org.jscience.biology.alignment.PairwiseAlignmentAlgorithm
getPairwiseAlignment, getScore, loadSequences, max, max, max, scoreDeletion, scoreInsertion, scoreSubstitution, setScoringScheme, unloadSequences, useMatchTag
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

seq1

protected CharSequence seq1
The first sequence of an alignment.


seq2

protected CharSequence seq2
The second sequence of an alignment.


matrix

protected int[][] matrix
The dynamic programming matrix. Each position (i, j) represents the best score between a suffic of the firsts i characters of seq1 and a suffix of the first j characters of seq2.


max_row

protected int max_row
Indicate the row of where an optimal local alignment can be found in the matrix..


max_col

protected int max_col
Indicate the column of where an optimal local alignment can be found in the matrix.

Constructor Detail

SmithWaterman

public SmithWaterman()
Method Detail

loadSequencesInternal

protected void loadSequencesInternal(java.io.Reader input1,
                                     java.io.Reader input2)
                              throws java.io.IOException,
                                     InvalidSequenceException
Loads sequences into CharSequence instances. In case of any error, an exception is raised by the constructor of CharSequence (please check the specification of that class for specific requirements).

Specified by:
loadSequencesInternal in class PairwiseAlignmentAlgorithm
Parameters:
input1 - Input for first sequence
input2 - Input for second sequence
Throws:
java.io.IOException - If an I/O error occurs when reading the sequences
InvalidSequenceException - If the sequences are not valid
See Also:
CharSequence

unloadSequencesInternal

protected void unloadSequencesInternal()
Frees pointers to loaded sequences and the dynamic programming matrix so that their data can be garbage collected.

Specified by:
unloadSequencesInternal in class PairwiseAlignmentAlgorithm
See Also:
PairwiseAlignmentAlgorithm.unloadSequences()

computePairwiseAlignment

protected PairwiseAlignment computePairwiseAlignment()
                                              throws IncompatibleScoringSchemeException
Builds an optimal local alignment between the loaded sequences after computing the dynamic programming matrix. It calls the buildOptimalAlignment method after the computeMatrix method computes the dynamic programming matrix.

Specified by:
computePairwiseAlignment in class PairwiseAlignmentAlgorithm
Returns:
an optimal pairwise alignment between the loaded sequences
Throws:
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.
See Also:
computeMatrix(), buildOptimalAlignment()

computeMatrix

protected void computeMatrix()
                      throws IncompatibleScoringSchemeException
Computes the dynamic programming matrix.

Throws:
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.

buildOptimalAlignment

protected PairwiseAlignment buildOptimalAlignment()
                                           throws IncompatibleScoringSchemeException
Builds an optimal local alignment between the loaded sequences. Before it is executed, the dynamic programming matrix must already have been computed by the computeMatrix method.

Returns:
an optimal local alignment between the loaded sequences
Throws:
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.
See Also:
computeMatrix()

computeScore

protected int computeScore()
                    throws IncompatibleScoringSchemeException
Computes the score of the best local alignment between the two sequences using the scoring scheme previously set. This method calculates the similarity value only (doesn't build the whole matrix so the alignment cannot be recovered, however it has the advantage of requiring O(n) space only).

Specified by:
computeScore in class PairwiseAlignmentAlgorithm
Returns:
the score of the best local alignment between the loaded sequences
Throws:
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.
See Also:
PairwiseAlignmentAlgorithm.getScore()