org.jscience.biology.alignment
Class ScoringMatrix

java.lang.Object
  extended by org.jscience.biology.alignment.ScoringScheme
      extended by org.jscience.biology.alignment.ScoringMatrix

public class ScoringMatrix
extends ScoringScheme

This class implements a scoring scheme based on a substitution matrix. It is useful for representing the PAM and BLOSUM families of amino acid scoring matrices. Its constructor loads such matrices from a file (or any other character stream). The following is an extract of a BLOSUM62 scoring matrix file:

    A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
 A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
 R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
 ...
 B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
 Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
 X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
 * -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1
 

Matrices are expected to follow this format. They must have one row and one column for each defined character (not necessarily in the same order). Each row and column must start with a distinct character (no repetition), and every row character must have a corresponding column, and vice versa.

The value at position (i,j) represents the score of substituting the character of row i for the character of column j. Insertion penalties are specified by the last row, while deletion penalties must be located in the last column (both represented by the special character defined by the INDEL_CHAR constant). Note that only an additive gap cost function is supported. If any of these rules is not followed, the constructor raises an InvalidScoringMatrixException.

If a scoring operation (substitution, insertion or deletion) involves a character not found in the matrix, an IncompatibleScoringSchemeException is raised.
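
The format rules above can be illustrated with a small stand-alone sketch. It is not the source code of ScoringMatrix: the class name MatrixFormatSketch is hypothetical, '*' is assumed for INDEL_CHAR because the extract uses it, '#' for COMMENT_CHAR is a guess, and IllegalArgumentException stands in for the library's own exception types.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the file format; NOT the ScoringMatrix implementation.
class MatrixFormatSketch {
    static final char INDEL_CHAR = '*';   // assumed from the BLOSUM62 extract
    static final char COMMENT_CHAR = '#'; // assumed; the real value is not stated

    final String colCodes; // column headers in file order
    final String rowCodes; // row headers in file order
    final int[][] matrix;  // matrix[i][j]: row character i for column character j

    MatrixFormatSketch(Reader input) {
        // Collect the non-empty, non-comment lines as whitespace-separated tokens.
        List<String[]> lines = new ArrayList<>();
        new BufferedReader(input).lines()
            .map(String::trim)
            .filter(l -> !l.isEmpty() && l.charAt(0) != COMMENT_CHAR)
            .forEach(l -> lines.add(l.split("\\s+")));
        if (lines.isEmpty()) throw new IllegalArgumentException("empty matrix");

        // The first line holds the column headers: single, distinct characters.
        StringBuilder cols = new StringBuilder();
        for (String h : lines.get(0)) {
            if (h.length() != 1 || cols.indexOf(h) >= 0)
                throw new IllegalArgumentException("bad or repeated column header: " + h);
            cols.append(h);
        }
        colCodes = cols.toString();
        int n = colCodes.length();
        if (lines.size() - 1 != n)
            throw new IllegalArgumentException("expected " + n + " rows");

        // Each row starts with a distinct character that also appears as a column.
        StringBuilder rows = new StringBuilder();
        matrix = new int[n][n];
        for (int i = 0; i < n; i++) {
            String[] t = lines.get(i + 1);
            if (t.length != n + 1 || t[0].length() != 1
                    || rows.indexOf(t[0]) >= 0 || colCodes.indexOf(t[0]) < 0)
                throw new IllegalArgumentException("bad row " + (i + 1));
            rows.append(t[0]);
            for (int j = 0; j < n; j++) matrix[i][j] = Integer.parseInt(t[j + 1]);
        }
        rowCodes = rows.toString();
    }

    int scoreSubstitution(char a, char b) {
        int i = rowCodes.indexOf(a), j = colCodes.indexOf(b);
        if (i < 0 || j < 0)
            throw new IllegalArgumentException("undefined pair: " + a + "/" + b);
        return matrix[i][j];
    }

    // Insertion penalties come from the INDEL_CHAR row, deletions from its column.
    int scoreInsertion(char a) { return scoreSubstitution(INDEL_CHAR, a); }
    int scoreDeletion(char a)  { return scoreSubstitution(a, INDEL_CHAR); }

    public static void main(String[] args) {
        String text = "# values taken from the BLOSUM62 extract\n"
                    + "   A  R  *\n"
                    + "A  4 -1 -4\n"
                    + "R -1  5 -4\n"
                    + "* -4 -4  1\n";
        MatrixFormatSketch m = new MatrixFormatSketch(new StringReader(text));
        System.out.println(m.scoreSubstitution('A', 'R')); // -1
        System.out.println(m.scoreInsertion('A'));         // -4
        System.out.println(m.scoreDeletion('R'));          // -4
    }
}
```

The sketch looks up the indel row and column by character rather than by position, which is one possible reading of the rules; it does not check that they are the last row and column.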

See Also:
InvalidScoringMatrixException

Field Summary
protected  java.lang.String col_codes
          Stores matrix column headers in the order they were found.
protected static char COMMENT_CHAR
          The character used to start a comment line in the scoring matrix file.
protected  int dimension
          Dimension of the (square) matrix.
protected static char INDEL_CHAR
          The character that indicates the row and column for insertion and deletion penalties in the matrix.
protected  int[][] matrix
          Stores values for each operation (substitution, insertion or deletion) defined by this matrix.
protected  int max_absolute_score
          The maximum absolute score that this matrix can return for any substitution, deletion or insertion.
protected  java.lang.String row_codes
          Stores matrix row headers in the order they were found.
 
Fields inherited from class org.jscience.biology.alignment.ScoringScheme
case_sensitive
 
Constructor Summary
ScoringMatrix(java.io.Reader input)
          Creates a new instance of a substitution matrix loaded from the character stream.
ScoringMatrix(java.io.Reader input, boolean case_sensitive)
          Creates a new instance of a substitution matrix loaded from the character stream.
 
Method Summary
 boolean isPartialMatchSupported()
          Tells whether this scoring scheme supports partial matches, which it does, although a particular scoring matrix loaded by this instance might not.
 int maxAbsoluteScore()
          Returns the maximum absolute score that this scoring scheme can return for any substitution, deletion or insertion.
 int scoreDeletion(char a)
          Returns the score of a deletion of character a according to this scoring matrix.
 int scoreInsertion(char a)
          Returns the score of an insertion of character a according to this scoring matrix.
 int scoreSubstitution(char a, char b)
          Returns the score of a substitution of character a for character b according to this scoring matrix.
 java.lang.String toString()
          Returns a String representation of this scoring matrix.
 
Methods inherited from class org.jscience.biology.alignment.ScoringScheme
isCaseSensitive
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

INDEL_CHAR

protected static final char INDEL_CHAR
The character that indicates the row and column for insertion and deletion penalties in the matrix.

See Also:
Constant Field Values

COMMENT_CHAR

protected static final char COMMENT_CHAR
The character used to start a comment line in the scoring matrix file.

See Also:
Constant Field Values

col_codes

protected java.lang.String col_codes
Stores matrix column headers in the order they were found.


row_codes

protected java.lang.String row_codes
Stores matrix row headers in the order they were found.


matrix

protected int[][] matrix
Stores values for each operation (substitution, insertion or deletion) defined by this matrix.


dimension

protected int dimension
Dimension of the (square) matrix.


max_absolute_score

protected int max_absolute_score
The maximum absolute score that this matrix can return for any substitution, deletion or insertion.

Constructor Detail

ScoringMatrix

public ScoringMatrix(java.io.Reader input)
              throws java.io.IOException,
                     InvalidScoringMatrixException
Creates a new instance of a substitution matrix loaded from the character stream. The case of characters is significant when subsequently computing their score.

Parameters:
input - character stream from where the matrix is read
Throws:
java.io.IOException - if an I/O operation fails when reading from input
InvalidScoringMatrixException - if the matrix does not comply with the specification

ScoringMatrix

public ScoringMatrix(java.io.Reader input,
                     boolean case_sensitive)
              throws java.io.IOException,
                     InvalidScoringMatrixException
Creates a new instance of a substitution matrix loaded from the character stream. If case_sensitive is true, the case of characters is significant when subsequently computing their score; otherwise the case is ignored.

Parameters:
input - character stream from where the matrix is read
case_sensitive - true if the case of characters must be taken into account when computing their score, false otherwise
Throws:
java.io.IOException - if an I/O operation fails when reading from input
InvalidScoringMatrixException - if the matrix does not comply with the specification
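
As a rough illustration of what ignoring case could mean for character lookup, here is a minimal sketch. The class CaseHandlingSketch and its approach (folding both the stored headers and the queried character to upper case) are assumptions made for illustration; the documentation does not say how ScoringMatrix implements the case_sensitive flag.

```java
import java.util.Locale;

// Hypothetical sketch of case handling; NOT taken from ScoringMatrix.
class CaseHandlingSketch {
    private final String codes;          // header characters, possibly upper-cased
    private final boolean caseSensitive;

    CaseHandlingSketch(String codes, boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
        // When case is ignored, store the headers in a single case up front.
        this.codes = caseSensitive ? codes : codes.toUpperCase(Locale.ROOT);
    }

    // Returns the position of c among the headers, or -1 if it is not defined.
    int indexOf(char c) {
        return codes.indexOf(caseSensitive ? c : Character.toUpperCase(c));
    }

    public static void main(String[] args) {
        System.out.println(new CaseHandlingSketch("ARND", false).indexOf('r')); // 1
        System.out.println(new CaseHandlingSketch("ARND", true).indexOf('r'));  // -1
    }
}
```

With case ignored, 'r' resolves to the same entry as 'R'; with case significant, 'r' is simply an undefined character.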

Method Detail

scoreSubstitution

public int scoreSubstitution(char a,
                             char b)
                      throws IncompatibleScoringSchemeException
Returns the score of a substitution of character a for character b according to this scoring matrix.

Specified by:
scoreSubstitution in class ScoringScheme
Parameters:
a - first character
b - second character
Returns:
score of a substitution of character a for b
Throws:
IncompatibleScoringSchemeException - if this substitution is not defined

scoreInsertion

public int scoreInsertion(char a)
                   throws IncompatibleScoringSchemeException
Returns the score of an insertion of character a according to this scoring matrix.

Specified by:
scoreInsertion in class ScoringScheme
Parameters:
a - character to be inserted
Returns:
score of insertion of a
Throws:
IncompatibleScoringSchemeException - if this character is not recognised

scoreDeletion

public int scoreDeletion(char a)
                  throws IncompatibleScoringSchemeException
Returns the score of a deletion of character a according to this scoring matrix.

Specified by:
scoreDeletion in class ScoringScheme
Parameters:
a - character to be deleted
Returns:
score of deletion of a
Throws:
IncompatibleScoringSchemeException - if this character is not recognised

isPartialMatchSupported

public boolean isPartialMatchSupported()
Tells whether this scoring scheme supports partial matches, which it does, although a particular scoring matrix loaded by this instance might not. A partial match occurs when two characters are not equal but are nevertheless regarded as similar by this scoring scheme, which then returns a positive score. This is common for amino acid scoring matrices.

Specified by:
isPartialMatchSupported in class ScoringScheme
Returns:
always returns true
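
To make the notion of a partial match concrete, the following sketch scans a matrix for a positive score between two different characters. The helper hasPartialMatch is hypothetical and is not part of ScoringMatrix, which always returns true here; excluding the indel row and column from the scan is also an assumption.

```java
// Hypothetical helper illustrating the definition of a partial match.
class PartialMatchSketch {
    static boolean hasPartialMatch(String rowCodes, String colCodes,
                                   int[][] matrix, char indelChar) {
        for (int i = 0; i < rowCodes.length(); i++) {
            char a = rowCodes.charAt(i);
            for (int j = 0; j < colCodes.length(); j++) {
                char b = colCodes.charAt(j);
                // Different characters (indel entries skipped) with a positive score.
                if (a != b && a != indelChar && b != indelChar && matrix[i][j] > 0)
                    return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // B, Z and X entries copied from the BLOSUM62 extract: B/Z scores 1.
        int[][] bzx = { { 4, 1, -1 }, { 1, 4, -1 }, { -1, -1, -1 } };
        System.out.println(hasPartialMatch("BZX", "BZX", bzx, '*')); // true

        // A and R entries from the same extract: no positive off-diagonal score.
        int[][] ar = { { 4, -1 }, { -1, 5 } };
        System.out.println(hasPartialMatch("AR", "AR", ar, '*'));    // false
    }
}
```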

maxAbsoluteScore

public int maxAbsoluteScore()
Returns the maximum absolute score that this scoring scheme can return for any substitution, deletion or insertion.

Specified by:
maxAbsoluteScore in class ScoringScheme
Returns:
maximum absolute score that can be returned
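
One straightforward way to obtain such a value is a single scan over all entries, sketched below. The helper is hypothetical; the documentation does not state when or how ScoringMatrix computes max_absolute_score.

```java
// Hypothetical helper: largest absolute value over all matrix entries.
class MaxAbsoluteScoreSketch {
    static int maxAbsoluteScore(int[][] matrix) {
        int max = 0;
        for (int[] row : matrix)
            for (int v : row)
                max = Math.max(max, Math.abs(v));
        return max;
    }

    public static void main(String[] args) {
        // A, R and indel entries copied from the BLOSUM62 extract.
        int[][] m = { { 4, -1, -4 }, { -1, 5, -4 }, { -4, -4, 1 } };
        System.out.println(maxAbsoluteScore(m)); // 5
    }
}
```

Because substitutions, insertions and deletions all live in the same matrix, one pass covers every operation the scheme can score.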

toString

public java.lang.String toString()
Returns a String representation of this scoring matrix.

Overrides:
toString in class java.lang.Object
Returns:
a String representation of this scoring matrix