org.jscience.biology.alignment
Class CrochemoreLandauZivUkelsonLocalAlignment

java.lang.Object
  extended by org.jscience.biology.alignment.PairwiseAlignmentAlgorithm
      extended by org.jscience.biology.alignment.CrochemoreLandauZivUkelson
          extended by org.jscience.biology.alignment.CrochemoreLandauZivUkelsonLocalAlignment

public class CrochemoreLandauZivUkelsonLocalAlignment
extends CrochemoreLandauZivUkelson

This class implements the local pairwise sequence alignment algorithm (with linear gap penalty function) due to Maxime Crochemore, Gad Landau and Michal Ziv-Ukelson (2002).

This implementation derives from the paper of M.Crochemore, G.Landau and M.Ziv-Ukelson, A Sub-quadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices (available here as PDF or Postscript).

For a general description of the algorithm, please refer to the specification of the abstract CrochemoreLandauZivUkelson superclass.

This class consist mainly of methods that:

  • create and compute all information of a block (see createBlock and its variants);
  • compute the output border of a block (see computeOutputBorder;
  • locate the score of a high scoring global alignment in the block table (see locateScore;
  • build an optimal global alignment from the information stored in the block table (see buildOptimalAlignment.

    This algorithm works essentially in the same way as the global alignment version. The main differences is that an aptimal path can either be contained entirely in a block (called C-path) or be a block-crossing path. A block-crossing path consists of a (possibly empty) S-path (a path that starts inside a block and ends in its output border), followed by any number of paths that cross a block from its input border to its output border, and ending in an E-path (a path that starts in the input border of a block and ends inside the block).

    Therefore, it is necessary to compute extra information to keep track of these possibilities. This is accomplished by using an instance of a LocalAlignmentBlock (which extends the AlignmentBlock class) for every block in the block table.

    See Also:
    CrochemoreLandauZivUkelson, CrochemoreLandauZivUkelsonLocalAlignment

    Field Summary
    protected  int max_col
              The column index of a block (in the block table) where the high scoring local alignment ends.
    protected  byte max_path_type
              The type of the high scoring local alignment found.
    protected  int max_row
              The row index of a block (in the block table) where the high scoring local alignment ends.
    protected  int max_score
              The score of the high scoring local alignment found.
    protected  int max_source_index
              If the high scoring local alignment ends in an E-path at a block B, this field contains the index of the entry in the input border of B that where the E-path starts.
    protected static byte TYPE_C_PATH
              A constant that indicates that the high scoring path ending in a given block is a C-path, i.e. one that starts inside the block.
    protected static byte TYPE_CROSSING_PATH
              A constant that indicates that the best path ending at a given entry of the output border is a block-crossing path (one that starts outside the block).
    protected static byte TYPE_E_PATH
              A constant that indicates that the high scoring path ending in a given block is an E-path, i.e. one that starts at its input border.
    protected static byte TYPE_S_PATH
              A constant that indicates that the best path ending at a given entry of the output border is a S-path (one that starts inside the block).
     
    Fields inherited from class org.jscience.biology.alignment.CrochemoreLandauZivUkelson
    block_table, DIAGONAL_DIRECTION, LEFT_DIRECTION, num_cols, num_rows, out_matrix, seq1, seq2, smawk, STOP_DIRECTION, TOP_DIRECTION
     
    Fields inherited from class org.jscience.biology.alignment.PairwiseAlignmentAlgorithm
    alignment, APPROXIMATE_MATCH_TAG, GAP_CHARACTER, GAP_TAG, MATCH_TAG, MISMATCH_TAG, score, score_computed, scoring, sequences_loaded, use_match_tag
     
    Constructor Summary
    CrochemoreLandauZivUkelsonLocalAlignment()
               
     
    Method Summary
    protected  PairwiseAlignment buildOptimalAlignment()
              Builds an optimal local alignment between the loaded sequences after the block table has been computed by tracing a path back in the block table.
    protected  void computeOutputBorder(LocalAlignmentBlock block, int row, int col, int dim, int lc, int lr)
              Computes the output border of a block.
    protected  AlignmentBlock createBlock(Factor factor1, Factor factor2, int row, int col)
              Creates and computes all information of an alignment block.
    protected  AlignmentBlock createFirstColumnBlock(Factor factor1, Factor factor2, int row)
              Creates and computes all information of an alignment block of the first column of the block table.
    protected  AlignmentBlock createFirstRowBlock(Factor factor1, Factor factor2, int col)
              Creates and computes all information of an alignment block of the first column of the block table.
    protected  AlignmentBlock createRootBlock(Factor factor1, Factor factor2)
              Creates the root block.
    protected  int locateScore()
              Returns the score of the high scoring local alignment in the block table.
    protected  void traverseBlockCrossingPath(LocalAlignmentBlock block, java.lang.StringBuffer gapped_seq1, java.lang.StringBuffer tag_line, java.lang.StringBuffer gapped_seq2)
              Traverses a series of block crossing paths to retrieve an optimal alignment.
    protected  void traverseS_Path(LocalAlignmentBlock block, java.lang.StringBuffer gapped_seq1, java.lang.StringBuffer tag_line, java.lang.StringBuffer gapped_seq2)
              Traverses an S-path of a block to retrieve a part of an optimal alignment from the new vertex of a block to entry in its input border.
     
    Methods inherited from class org.jscience.biology.alignment.CrochemoreLandauZivUkelson
    assembleDistMatrix, assembleInputBorder, computeBlockTable, computePairwiseAlignment, computeScore, getDiagonalPrefix, getLeftPrefix, getTopPrefix, loadSequencesInternal, traverseBlock, unloadSequencesInternal
     
    Methods inherited from class org.jscience.biology.alignment.PairwiseAlignmentAlgorithm
    getPairwiseAlignment, getScore, loadSequences, max, max, max, scoreDeletion, scoreInsertion, scoreSubstitution, setScoringScheme, unloadSequences, useMatchTag
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Field Detail

    TYPE_CROSSING_PATH

    protected static final byte TYPE_CROSSING_PATH
    A constant that indicates that the best path ending at a given entry of the output border is a block-crossing path (one that starts outside the block).

    See Also:
    Constant Field Values

    TYPE_S_PATH

    protected static final byte TYPE_S_PATH
    A constant that indicates that the best path ending at a given entry of the output border is a S-path (one that starts inside the block).

    See Also:
    Constant Field Values

    TYPE_C_PATH

    protected static final byte TYPE_C_PATH
    A constant that indicates that the high scoring path ending in a given block is a C-path, i.e. one that starts inside the block.

    See Also:
    Constant Field Values

    TYPE_E_PATH

    protected static final byte TYPE_E_PATH
    A constant that indicates that the high scoring path ending in a given block is an E-path, i.e. one that starts at its input border.

    See Also:
    Constant Field Values

    max_score

    protected int max_score
    The score of the high scoring local alignment found.


    max_row

    protected int max_row
    The row index of a block (in the block table) where the high scoring local alignment ends.


    max_col

    protected int max_col
    The column index of a block (in the block table) where the high scoring local alignment ends.


    max_path_type

    protected byte max_path_type
    The type of the high scoring local alignment found.


    max_source_index

    protected int max_source_index
    If the high scoring local alignment ends in an E-path at a block B, this field contains the index of the entry in the input border of B that where the E-path starts.

    Constructor Detail

    CrochemoreLandauZivUkelsonLocalAlignment

    public CrochemoreLandauZivUkelsonLocalAlignment()
    Method Detail

    createBlock

    protected AlignmentBlock createBlock(Factor factor1,
                                         Factor factor2,
                                         int row,
                                         int col)
                                  throws IncompatibleScoringSchemeException
    Creates and computes all information of an alignment block. This method works essentially in the same way as its global alignment counterpart. Its main job is to compute the DIST column for the block. It then request the computeOutputBorder method to compute the block's output border. It also computes all S, C and E-paths of this block. Finally, it checks if the C-path of this block is higher than the highest score found so far.

    Specified by:
    createBlock in class CrochemoreLandauZivUkelson
    Parameters:
    factor1 - factor of the first sequence
    factor2 - factor of the second sequence
    row - row index of the block in the block table
    col - column index of the block in the block table
    Returns:
    the computed block
    Throws:
    IncompatibleScoringSchemeException - if the scoring scheme is not compatible with the sequences being aligned

    createRootBlock

    protected AlignmentBlock createRootBlock(Factor factor1,
                                             Factor factor2)
    Creates the root block. This is a special case of the createBlock method. No information is actually computed.

    Specified by:
    createRootBlock in class CrochemoreLandauZivUkelson
    Parameters:
    factor1 - factor of the first sequence
    factor2 - factor of the second sequence
    Returns:
    the root block

    createFirstRowBlock

    protected AlignmentBlock createFirstRowBlock(Factor factor1,
                                                 Factor factor2,
                                                 int col)
                                          throws IncompatibleScoringSchemeException
    Creates and computes all information of an alignment block of the first column of the block table. This is a special case of the createBlock method.

    Specified by:
    createFirstRowBlock in class CrochemoreLandauZivUkelson
    Parameters:
    factor1 - factor of the first sequence
    factor2 - factor of the second sequence
    col - column index of the block in the block table
    Returns:
    the computed block
    Throws:
    IncompatibleScoringSchemeException - if the scoring scheme is not compatible with the sequences being aligned
    See Also:
    createBlock

    createFirstColumnBlock

    protected AlignmentBlock createFirstColumnBlock(Factor factor1,
                                                    Factor factor2,
                                                    int row)
                                             throws IncompatibleScoringSchemeException
    Creates and computes all information of an alignment block of the first column of the block table. This is a special case of the createBlock method.

    Specified by:
    createFirstColumnBlock in class CrochemoreLandauZivUkelson
    Parameters:
    factor1 - factor of the first sequence
    factor2 - factor of the second sequence
    row - row index of the block in the block table
    Returns:
    the computed block
    Throws:
    IncompatibleScoringSchemeException - if the scoring scheme is not compatible with the sequences being aligned
    See Also:
    createBlock

    computeOutputBorder

    protected void computeOutputBorder(LocalAlignmentBlock block,
                                       int row,
                                       int col,
                                       int dim,
                                       int lc,
                                       int lr)
    Computes the output border of a block. This method works essentially in the same way as its global alignment counterpart:

  • Retrieve the block's input border;
  • Retrieve the block's complete DIST matrix;
  • Create an interface to the OUT matrix from the input border and DIST matrix;
  • Use SMAWK to compute all column maxima of the OUT matrix (SMAWK finds the index of the row that contains the maximum value of a column);
  • Assemble the output border by extracting the maximum values of each column of the OUT matrix using the information obtained in the previous step.

    However, it also check if there is a better path starting inside the block (an S path) and oupdate the output border accordingly. It also checks if this block has any path of score higher than the maximum score found so far.

    Parameters:
    block - the block for which the output border is to be computed
    row - row index of the block in the block table
    col - column index of the block in the block table
    dim - dimension of the output border
    lc - number of columns of the block
    lr - number of row of the block

  • buildOptimalAlignment

    protected PairwiseAlignment buildOptimalAlignment()
                                               throws IncompatibleScoringSchemeException
    Builds an optimal local alignment between the loaded sequences after the block table has been computed by tracing a path back in the block table.

    Specified by:
    buildOptimalAlignment in class CrochemoreLandauZivUkelson
    Returns:
    an optimal global alignment
    Throws:
    IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.
    See Also:
    CrochemoreLandauZivUkelson.traverseBlock(org.jscience.biology.alignment.AlignmentBlock, int, java.lang.StringBuffer, java.lang.StringBuffer, java.lang.StringBuffer)

    traverseBlockCrossingPath

    protected void traverseBlockCrossingPath(LocalAlignmentBlock block,
                                             java.lang.StringBuffer gapped_seq1,
                                             java.lang.StringBuffer tag_line,
                                             java.lang.StringBuffer gapped_seq2)
                                      throws IncompatibleScoringSchemeException
    Traverses a series of block crossing paths to retrieve an optimal alignment. A block-crossing path consists of a (possibly empty) S-path (a path that starts inside a block and ends in its output border), followed by any number of paths that cross a block from its input border to its output border, and ending in an E-path (a path that starts in the input border of a block and ends inside the block).

    Parameters:
    block - the block to be traversed
    gapped_seq1 - the StringBuffer to where the gapped sequence 1 is written to
    tag_line - the StringBuffer to where the tag_line is written to
    gapped_seq2 - the StringBuffer to where the gapped sequence 2 is written to
    Throws:
    IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.

    traverseS_Path

    protected void traverseS_Path(LocalAlignmentBlock block,
                                  java.lang.StringBuffer gapped_seq1,
                                  java.lang.StringBuffer tag_line,
                                  java.lang.StringBuffer gapped_seq2)
                           throws IncompatibleScoringSchemeException
    Traverses an S-path of a block to retrieve a part of an optimal alignment from the new vertex of a block to entry in its input border. This method is essentially similar to the traverseBlock. The only difference is that it uses the information of the S_direction field of the LocalAlignmentBlock class.

    Parameters:
    block - the block to be traversed
    gapped_seq1 - the StringBuffer to where the gapped sequence 1 is written to
    tag_line - the StringBuffer to where the tag_line is written to
    gapped_seq2 - the StringBuffer to where the gapped sequence 2 is written to
    Throws:
    IncompatibleScoringSchemeException - If the scoring scheme is not compatible with the loaded sequences.

    locateScore

    protected int locateScore()
    Returns the score of the high scoring local alignment in the block table.

    Specified by:
    locateScore in class CrochemoreLandauZivUkelson
    Returns:
    the score of the highest scoring local alignment