org.jscience.biology.alignment
Class PairwiseAlignmentAlgorithm

java.lang.Object
  extended by org.jscience.biology.alignment.PairwiseAlignmentAlgorithm
Direct Known Subclasses:
CrochemoreLandauZivUkelson, NeedlemanWunsch, SmithWaterman

public abstract class PairwiseAlignmentAlgorithm
extends java.lang.Object

This abstract class is the superclass of all classes implementing pairwise sequence alignment algorithms. Subclasses are required to provide methods to build a high scoring alignment between two sequences and compute its score with a given scoring scheme.

Clients are required to set a scoring scheme and load two sequences before requesting an alignment or the computation of its score. They typically make the following sequence of method calls:

 // prepare
 PairwiseAlignmentAlgorithm algorithm = new SomePairwiseAlignmentAlgorith ();
 algorithm.setScoringScheme (some_scoring_scheme);
 algorithm.loadSequences (sequence1, sequence2);
 

// now compute the alignment PairwiseAlignment alignment = algorithm.getPairwiseAlignment(); int score = algorithm.getScore();

See Also:
PairwiseAlignment

Field Summary
protected  PairwiseAlignment alignment
          Stores the product of the last pairwise alignment performed.
protected static char APPROXIMATE_MATCH_TAG
          Tag character that signals an approximate match in the score tag line of an alignment.
protected static char GAP_CHARACTER
          Character that signals a gap in sequence.
protected static char GAP_TAG
          Character that signals a gap in the score tag line of an alignment.
protected static char MATCH_TAG
          Tag character that signals a match in the score tag line of an alignment.
protected static char MISMATCH_TAG
          Character that signals a mismatch in the score tag line of an alignment.
protected  int score
          This field stores just the score of the last pairwise alignment performed (if the score_computed flag is set to true).
protected  boolean score_computed
          Flags whether the score of the alignment between the last two loaded sequences has already been computed.
protected  ScoringScheme scoring
          The scoring scheme used to compute a pairwise sequence alignment.
protected  boolean sequences_loaded
          Flags whether sequences have been loaded.
protected  boolean use_match_tag
          Indicates if the MATCH_TAG tag should be used or not.
 
Constructor Summary
PairwiseAlignmentAlgorithm()
           
 
Method Summary
protected abstract  PairwiseAlignment computePairwiseAlignment()
          Subclasses must implement this method to compute an alignment between the loaded sequences using the scoring scheme previously set.
protected abstract  int computeScore()
          Subclasses must implement this method to compute the score of the alignment between the loaded sequences using the scoring scheme previously set.
 PairwiseAlignment getPairwiseAlignment()
          Return the last pairwise alignment computed (if any) or request subclasses to compute one and return the result by calling the computePairwiseAlignment method.
 int getScore()
          Returns the score of the last alignment computed (if any) or request subclasses to compute one and return the result by calling the computeScore method.
 void loadSequences(java.io.Reader input1, java.io.Reader input2)
          Request subclasses to load the sequences according to their own needs.
protected abstract  void loadSequencesInternal(java.io.Reader input1, java.io.Reader input2)
          Subclasses must implement this method to load sequences according to their own needs and throw an exception in case of any failure.
protected  int max(int v1, int v2)
          Helper method to compute the the greater of two values.
protected  int max(int v1, int v2, int v3)
          Helper method to compute the the greater of three values.
protected  int max(int v1, int v2, int v3, int v4)
          Helper method to compute the the greater of four values.
protected  int scoreDeletion(char a)
          Helper method to invoke the scoreDeletion method of the scoring scheme set to this algorithm.
protected  int scoreInsertion(char a)
          Helper method to invoke the scoreInsertion method of the scoring scheme set to this algorithm.
protected  int scoreSubstitution(char a, char b)
          Helper method to invoke the scoreSubstitution method of the scoring scheme set to this algorithm.
 void setScoringScheme(ScoringScheme scoring)
          Sets the scoring scheme to be used for the next alignments.
 void unloadSequences()
          Frees pointer to loaded sequences and computed alignments (if any) so that their data can be garbage collected.
protected abstract  void unloadSequencesInternal()
          Subclasses must implement this method to unload sequences according to their own storage, freeing pointers to sequences and any intermediate data so that they can be garbage collected.
protected  boolean useMatchTag()
          Tells wether the MATCH_TAG tag should be used or not.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MATCH_TAG

protected static final char MATCH_TAG
Tag character that signals a match in the score tag line of an alignment. Its use is conditioned by the use_match_tag flag.

See Also:
use_match_tag, useMatchTag(), Constant Field Values

APPROXIMATE_MATCH_TAG

protected static final char APPROXIMATE_MATCH_TAG
Tag character that signals an approximate match in the score tag line of an alignment.

See Also:
Constant Field Values

MISMATCH_TAG

protected static final char MISMATCH_TAG
Character that signals a mismatch in the score tag line of an alignment.

See Also:
Constant Field Values

GAP_TAG

protected static final char GAP_TAG
Character that signals a gap in the score tag line of an alignment.

See Also:
Constant Field Values

GAP_CHARACTER

protected static final char GAP_CHARACTER
Character that signals a gap in sequence.

See Also:
Constant Field Values

use_match_tag

protected boolean use_match_tag
Indicates if the MATCH_TAG tag should be used or not. If it is true, the alignment algorithm should write the MATCH_TAG tag in the score tag line of the alignment whenever a match occurs between characters of the two sequences. If it is false the matching character should be written instead. This flag is updated whenever a scoring scheme is set to this PairwiseAlignmentAlgorithm by the setScoringScheme method.

See Also:
MATCH_TAG, useMatchTag(), setScoringScheme(org.jscience.biology.alignment.ScoringScheme)

scoring

protected ScoringScheme scoring
The scoring scheme used to compute a pairwise sequence alignment. It must be set before performing the alignment, and if a new scoring scheme is set, any alignment or score already computed is lost.


alignment

protected PairwiseAlignment alignment
Stores the product of the last pairwise alignment performed. It contains a string representation of a highest scoring alignment between the two sequences and its score. It is set after a successful execution of the computePairwiseAlignment method that subclasses must implement. It is set to null if new sequences are loaded or a new scoring scheme is set.


score

protected int score
This field stores just the score of the last pairwise alignment performed (if the score_computed flag is set to true). It is useful when just the score is needed (and not the alignment itselft). Its value is set after a successful execution of both computePairwiseAlignment or computeScore methods that subclasses must implement. If new sequences are loaded or a new scoring scheme is set, the score_computed flag is set to false, and this field's value becomes undefined.


score_computed

protected boolean score_computed
Flags whether the score of the alignment between the last two loaded sequences has already been computed. It is set to true after a successful execution of both computePairwiseAlignment or computeScore methods that subclasses must implement. It is set to falsef if new sequences are loaded or a new scoring scheme is set.


sequences_loaded

protected boolean sequences_loaded
Flags whether sequences have been loaded. It is set to true after subclasses successfully load two sequences.

Constructor Detail

PairwiseAlignmentAlgorithm

public PairwiseAlignmentAlgorithm()
Method Detail

setScoringScheme

public void setScoringScheme(ScoringScheme scoring)
Sets the scoring scheme to be used for the next alignments. Any alignment or score already computed is lost. If the scoring scheme supports partial matches, this PairwiseAlignmentAlgorithm is set not to use the MATCH_TAG tag because in this case the score tag line be confusing. If the scoring scheme does not support partial matches, then the use of the MATCH_TAG tag is enabled.

Parameters:
scoring - Scoring scheme to be used
See Also:
MATCH_TAG, ScoringScheme.isPartialMatchSupported()

useMatchTag

protected boolean useMatchTag()
Tells wether the MATCH_TAG tag should be used or not. If it returns true, the alignment algorithm should write the MATCH_TAG tag in the score tag line of the alignment produced whenever a match occurs between characters of the two sequences. If it returns false the matching character should be written instead. The value returned is conditioned by the use_match_tag flag, which is updated whenever a scoring scheme is set to this PairwiseAlignmentAlgorithm by the setScoringScheme method.

Returns:
trueMATCH_TAG tag should be used, false otherwise
See Also:
MATCH_TAG, use_match_tag, setScoringScheme(org.jscience.biology.alignment.ScoringScheme)

loadSequences

public void loadSequences(java.io.Reader input1,
                          java.io.Reader input2)
                   throws java.io.IOException,
                          InvalidSequenceException
Request subclasses to load the sequences according to their own needs. Any alignment and score already computed is lost. If no exception is raised, the loaded flag is set to true. Subclasses typically store the sequences in instances of an appropiate class and each can have its own contract, so check each algorithm to see what kind of sequences it produces. Input can come from any source provided they are encapsulated with a proper Reader. They must be ready to be read, i.e. the next read operation must return the sequence's first character.

Parameters:
input1 - First sequence
input2 - Second sequence
Throws:
java.io.IOException - If an I/O error occurs when reading the sequences
InvalidSequenceException - If the sequences are not valid

unloadSequences

public void unloadSequences()
Frees pointer to loaded sequences and computed alignments (if any) so that their data can be garbage collected.


getPairwiseAlignment

public PairwiseAlignment getPairwiseAlignment()
                                       throws IncompatibleScoringSchemeException
Return the last pairwise alignment computed (if any) or request subclasses to compute one and return the result by calling the computePairwiseAlignment method. The sequences must already be loaded and a scoring scheme must already be set.

Returns:
a pairwise alignment between the loaded sequences
Throws:
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences
See Also:
computePairwiseAlignment()

getScore

public int getScore()
             throws IncompatibleScoringSchemeException
Returns the score of the last alignment computed (if any) or request subclasses to compute one and return the result by calling the computeScore method. The sequences must already be loaded and a scoring scheme must already be set.

Returns:
the score of the alignment between the loaded sequences
Throws:
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences
See Also:
computeScore()

loadSequencesInternal

protected abstract void loadSequencesInternal(java.io.Reader input1,
                                              java.io.Reader input2)
                                       throws java.io.IOException,
                                              InvalidSequenceException
Subclasses must implement this method to load sequences according to their own needs and throw an exception in case of any failure. If no exception is raised, the loaded flag is set to true by the public method and the sequences are believed to be loaded (so an alignment or score can be requested).

Parameters:
input1 - First sequence
input2 - Second sequence
Throws:
java.io.IOException - If an I/O error occurs when reading the sequences
InvalidSequenceException - If the sequences are not valid
See Also:
loadSequences(java.io.Reader, java.io.Reader), CharSequence, FactorSequence

unloadSequencesInternal

protected abstract void unloadSequencesInternal()
Subclasses must implement this method to unload sequences according to their own storage, freeing pointers to sequences and any intermediate data so that they can be garbage collected. This methid is called by the public unloadSequences method.

See Also:
unloadSequences()

computePairwiseAlignment

protected abstract PairwiseAlignment computePairwiseAlignment()
                                                       throws IncompatibleScoringSchemeException
Subclasses must implement this method to compute an alignment between the loaded sequences using the scoring scheme previously set. This methid is called by the getPairwiseAlignment method when needed.

Returns:
a pairwise alignment between the loaded sequences
Throws:
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences
See Also:
getPairwiseAlignment()

computeScore

protected abstract int computeScore()
                             throws IncompatibleScoringSchemeException
Subclasses must implement this method to compute the score of the alignment between the loaded sequences using the scoring scheme previously set. This methid is called by the getScore method when needed.

Returns:
the score of the alignment between the loaded sequences
Throws:
IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences
See Also:
getScore()

scoreSubstitution

protected final int scoreSubstitution(char a,
                                      char b)
                               throws IncompatibleScoringSchemeException
Helper method to invoke the scoreSubstitution method of the scoring scheme set to this algorithm.

Parameters:
a - first character
b - second character
Returns:
score of substitution of a for b
Throws:
IncompatibleScoringSchemeException - if the scoring scheme is not compatible with the sequences being aligned
See Also:
ScoringScheme.scoreSubstitution(char, char)

scoreInsertion

protected final int scoreInsertion(char a)
                            throws IncompatibleScoringSchemeException
Helper method to invoke the scoreInsertion method of the scoring scheme set to this algorithm.

Parameters:
a - the character to be inserted
Returns:
score of insertion of a
Throws:
IncompatibleScoringSchemeException - if the scoring scheme is not compatible with the sequences being aligned
See Also:
ScoringScheme.scoreInsertion(char)

scoreDeletion

protected final int scoreDeletion(char a)
                           throws IncompatibleScoringSchemeException
Helper method to invoke the scoreDeletion method of the scoring scheme set to this algorithm.

Parameters:
a - the character to be deleted
Returns:
score of deletion of a
Throws:
IncompatibleScoringSchemeException - if the scoring scheme is not compatible with the sequences being aligned
See Also:
ScoringScheme.scoreDeletion(char)

max

protected final int max(int v1,
                        int v2)
Helper method to compute the the greater of two values.

Parameters:
v1 - first value
v2 - second value
Returns:
the larger of v1 and v2

max

protected final int max(int v1,
                        int v2,
                        int v3)
Helper method to compute the the greater of three values.

Parameters:
v1 - first value
v2 - second value
v3 - third value
Returns:
the larger of v1, v2 and v3

max

protected final int max(int v1,
                        int v2,
                        int v3,
                        int v4)
Helper method to compute the the greater of four values.

Parameters:
v1 - first value
v2 - second value
v3 - third value
v4 - fourth value
Returns:
the larger of v1, v2 v3 and v4