Trimmer
Base Trimmer
- class pytrimal.BaseTrimmer
A sequence alignment trimmer.
All subclasses provide the same trim method, and are configured through their constructor.
- __init__(*, backend='detect')
Create a new base trimmer.
- Keyword Arguments:
backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.
New in version 0.2.0: The backend keyword argument.
- trim(alignment, matrix=None)
Trim the provided alignment.
- Parameters:
alignment (Alignment) – A multiple sequence alignment to trim.
matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.
- Returns:
TrimmedAlignment – The trimmed alignment.
Hint
This method is re-entrant and can safely be called across different threads. Most of the computations are done after releasing the GIL.
Changed in version 0.1.2: Added the matrix optional argument.
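The shared interface above (all configuration in the constructor, a single trim method that is safe to call from several threads) can be illustrated with a small pure-Python stand-in. Everything in this sketch is hypothetical: SketchAlignment, SketchTrimmer and MaxGapsSketch are invented names that are not part of pytrimal, and the column rule is only loosely modelled on ManualTrimmer's absolute gap threshold.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SketchAlignment:
    """Hypothetical stand-in for an alignment: parallel lists of names and rows."""
    names: List[bytes]
    sequences: List[str]


class SketchTrimmer(ABC):
    """Hypothetical base class: settings go in __init__, the work happens in trim()."""

    @abstractmethod
    def trim(self, alignment: SketchAlignment) -> SketchAlignment:
        """Return a new, trimmed alignment without modifying the input."""


class MaxGapsSketch(SketchTrimmer):
    """Keeps only the columns containing at most `max_gaps` gap characters."""

    def __init__(self, max_gaps: int, gap: str = "-") -> None:
        # All configuration is fixed at construction time.
        self.max_gaps = max_gaps
        self.gap = gap

    def trim(self, alignment: SketchAlignment) -> SketchAlignment:
        # Only reads `self` and builds new objects, so concurrent calls are safe.
        kept = [
            i
            for i, column in enumerate(zip(*alignment.sequences))
            if column.count(self.gap) <= self.max_gaps
        ]
        rows = ["".join(row[i] for i in kept) for row in alignment.sequences]
        return SketchAlignment(list(alignment.names), rows)


alignments = [
    SketchAlignment([b"a", b"b", b"c"], ["AC-GT", "A--GT", "ACTGT"]),
    SketchAlignment([b"a", b"b", b"c"], ["-CCA", "GCC-", "-CC-"]),
]
trimmer = MaxGapsSketch(max_gaps=1)
# The same trimmer object is shared by every worker thread.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(trimmer.trim, alignments))
```

With pytrimal itself, the same pool.map(trimmer.trim, alignments) pattern would apply to a real trimmer and Alignment objects. Because pytrimal releases the GIL for most of the computation, those threads can actually overlap, whereas this pure-Python stand-in is serialized by the GIL.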
Automatic Trimmer
- class pytrimal.AutomaticTrimmer(BaseTrimmer)
A sequence alignment trimmer with automatic parameter detection.
trimAl provides several heuristic methods for automated trimming of multiple sequence alignments:
- strict: A statistical method that combines gaps and similarity statistics to clean the alignment.
- strictplus: A statistical method that combines gaps and similarity statistics, optimized for Neighbour-Joining tree reconstruction.
- gappyout: A statistical method that only uses the gaps statistic to clean the alignment.
- automated1: A meta-method that chooses between strict and gappyout, optimized for Maximum Likelihood phylogenetic tree reconstruction.
- nogaps: A naive method that removes every column containing at least one gap.
- noallgaps: A naive method that removes every column containing only gaps.
- noduplicateseqs: A naive method that removes sequences that are identical in the alignment, keeping the latest occurrence.
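The three naive methods listed above are simple enough to express directly. The following is a pure-Python sketch of their described behaviour on plain strings, not pytrimal's implementation: the function names are hypothetical, only "-" is treated as a gap, and "identical" is assumed to mean identical aligned rows, gaps included. With pytrimal, the corresponding name would simply be passed as the method, e.g. AutomaticTrimmer("nogaps").

```python
from typing import Callable, List, Tuple


def _keep_columns(rows: List[str], keep: Callable[[tuple], bool]) -> List[str]:
    """Rebuild each row from the columns for which `keep(column)` is true."""
    kept = [i for i, col in enumerate(zip(*rows)) if keep(col)]
    return ["".join(row[i] for i in kept) for row in rows]


def nogaps(rows: List[str], gap: str = "-") -> List[str]:
    """Remove every column containing at least one gap."""
    return _keep_columns(rows, lambda col: gap not in col)


def noallgaps(rows: List[str], gap: str = "-") -> List[str]:
    """Remove every column containing only gaps."""
    return _keep_columns(rows, lambda col: any(c != gap for c in col))


def noduplicateseqs(names: List[str], rows: List[str]) -> Tuple[List[str], List[str]]:
    """Remove identical rows, keeping the latest occurrence of each."""
    last = {row: i for i, row in enumerate(rows)}  # later indices overwrite earlier ones
    keep = sorted(last.values())  # preserve the original ordering
    return [names[i] for i in keep], [rows[i] for i in keep]


rows = ["AC-G-", "ACTG-", "AC-G-"]
names = ["s1", "s2", "s3"]
print(nogaps(rows), noallgaps(rows), noduplicateseqs(names, rows))
```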
Hint
A Python frozenset containing all valid automatic trimming methods can be obtained with the AutomaticTrimmer.METHODS attribute. This can be useful for listing or validating methods beforehand, e.g. to build a CLI with argparse.
New in version 0.4.0: The AutomaticTrimmer.METHODS class attribute.
New in version 0.5.0: Support for the pickle protocol.
- __init__(method='strict', *, backend='detect')
Create a new automatic alignment trimmer using the given method.
- Parameters:
method (str) – The automatic alignment trimming method. See the documentation for AutomaticTrimmer for a list of supported values.
- Keyword Arguments:
backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.
- Raises:
ValueError – When method is not one of the automatic alignment trimming methods supported by trimAl.
New in version 0.2.0: The backend keyword argument.
New in version 0.4.0: The noduplicateseqs method.
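Validating the method name up front, as the METHODS hint suggests, lets a command-line tool reject a typo before the constructor raises ValueError. A minimal argparse sketch follows. To keep it runnable without pytrimal, METHODS here is a hypothetical hard-coded copy of the methods documented above rather than the real AutomaticTrimmer.METHODS attribute, and the parser layout is only an illustration.

```python
import argparse

# Hypothetical stand-in for pytrimal's AutomaticTrimmer.METHODS, filled in with
# the methods documented above; with pytrimal installed, use the real attribute.
METHODS = frozenset(
    {"strict", "strictplus", "gappyout", "automated1", "nogaps", "noallgaps", "noduplicateseqs"}
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Trim an alignment automatically.")
    parser.add_argument(
        "--method",
        choices=sorted(METHODS),  # sorted so that --help lists them in a stable order
        default="strict",
        help="automatic trimming method (default: %(default)s)",
    )
    return parser


args = build_parser().parse_args(["--method", "gappyout"])
print(args.method)
```

An unknown name such as --method bogus makes argparse exit with a usage error. With pytrimal available, args.method could then be passed on to AutomaticTrimmer(args.method).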
- trim(alignment, matrix=None)
Trim the provided alignment.
- Parameters:
alignment (Alignment) – A multiple sequence alignment to trim.
matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.
- Returns:
TrimmedAlignment – The trimmed alignment.
Hint
This method is re-entrant and can safely be called across different threads. Most of the computations are done after releasing the GIL.
Changed in version 0.1.2: Added the matrix optional argument.
Manual Trimmer
- class pytrimal.ManualTrimmer(BaseTrimmer)
A sequence alignment trimmer with manually defined thresholds.
Manual trimming allows the user to specify independent thresholds for two different statistics:
- Gap threshold: Remove columns where the gap ratio (or the absolute gap count) is higher than the provided threshold.
- Similarity threshold: Remove columns with a similarity ratio lower than the provided threshold.
In addition, the trimming can be restricted so that at least a configurable fraction of the original alignment is retained, in order to avoid stripping an alignment of distant sequences by aggressive trimming.
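The gap threshold and the conservation constraint described above can be sketched in pure Python. This is an illustration under stated assumptions, not pytrimal's or trimAl's code: gap_filter and select_columns are hypothetical names, gap_threshold follows the constructor's definition (minimum fraction of non-gap characters per column), and the rule used to restore columns when too few survive is a simplification chosen for this sketch. The similarity statistic is left out because it depends on a similarity matrix.

```python
import math
from typing import List


def gap_filter(
    rows: List[str],
    gap_threshold: float,
    conservation_percentage: float = 0.0,
    gap: str = "-",
) -> List[int]:
    """Return the sorted indices of the columns to keep.

    A column is kept when its fraction of non-gap characters is at least
    `gap_threshold`. If fewer than `conservation_percentage` percent of the
    columns survive, rejected columns are restored, most complete first
    (a simplified rule assumed for this sketch, not trimAl's exact one).
    """
    columns = list(zip(*rows))
    non_gap = [sum(c != gap for c in col) / len(rows) for col in columns]
    keep = {i for i, frac in enumerate(non_gap) if frac >= gap_threshold}
    minimum = math.ceil(len(columns) * conservation_percentage / 100)
    rejected = sorted(set(range(len(columns))) - keep, key=lambda i: (-non_gap[i], i))
    for i in rejected:
        if len(keep) >= minimum:
            break
        keep.add(i)
    return sorted(keep)


def select_columns(rows: List[str], keep: List[int]) -> List[str]:
    """Rebuild each row from the kept column indices."""
    return ["".join(row[i] for i in keep) for row in rows]


rows = ["AC-G-T", "A--G-T", "ACTG--", "ACTGAT"]
print(select_columns(rows, gap_filter(rows, 0.8, conservation_percentage=50)))
```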
- __init__(*, gap_threshold=None, gap_absolute_threshold=None, similarity_threshold=None, conservation_percentage=None, window=None, gap_window=None, similarity_window=None, backend='detect')
Create a new manual alignment trimmer with the given parameters.
- Keyword Arguments:
gap_threshold (float, optional) – The minimum fraction of non-gap characters that must be present in a column to keep the column.
gap_absolute_threshold (int, optional) – The maximum number of gaps allowed in a column to keep it in the alignment. Incompatible with gap_threshold.
similarity_threshold (float, optional) – The minimum average similarity required.
conservation_percentage (float, optional) – The minimum percentage of positions in the original alignment to conserve.
window (int, optional) – The size of the half-window to use when computing statistics for an alignment.
gap_window (int, optional) – The size of the half-window to use when computing the gap statistic for an alignment. Incompatible with window.
similarity_window (int, optional) – The size of the half-window to use when computing the similarity statistic for an alignment. Incompatible with window.
backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.
New in version 0.2.0: The backend keyword argument.
New in version 0.2.2: The keyword arguments for controlling the half-window sizes.
Changed in version 0.4.0: Removed consistency_threshold and consistency_window.
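The window, gap_window and similarity_window arguments are half-window sizes. Assuming the usual trimAl meaning of a half-window n, the score of column i is averaged over columns i - n to i + n. A minimal sketch of that smoothing follows. half_window_average is a hypothetical helper, and clipping the window at the alignment edges is an assumption of this sketch rather than documented behaviour.

```python
from typing import List, Sequence


def half_window_average(values: Sequence[float], half_window: int) -> List[float]:
    """Average each per-column statistic over the columns within `half_window` of it.

    The window of column i spans i - half_window .. i + half_window (full width
    2 * half_window + 1), clipped to the alignment edges.
    """
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)  # exclusive upper bound
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Per-column gap fractions for a hypothetical 6-column alignment.
gaps = [0.0, 0.0, 1.0, 0.0, 0.5, 0.5]
print(half_window_average(gaps, 1))
```

A larger half-window spreads an isolated gappy column over its neighbours, so thresholding becomes less sensitive to single columns.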
- trim(alignment, matrix=None)
Trim the provided alignment.
- Parameters:
alignment (Alignment) – A multiple sequence alignment to trim.
matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.
- Returns:
TrimmedAlignment – The trimmed alignment.
Hint
This method is re-entrant and can safely be called across different threads. Most of the computations are done after releasing the GIL.
Changed in version 0.1.2: Added the matrix optional argument.
Representative Trimmer
- class pytrimal.RepresentativeTrimmer(BaseTrimmer)
A sequence alignment trimmer for selecting representative sequences.
Representative sequences in an alignment can be selected using a specific identity threshold, or a fixed number of representative sequences to keep. Representative trimming can be useful to reduce the weight of certain very similar sequences in an alignment, for instance to build a less conservative HMM.
New in version 0.5.0.
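Selecting representatives by identity threshold can be pictured as grouping near-identical sequences and keeping one per group. The sketch below swaps in a generic greedy rule for illustration: a sequence is kept only if its pairwise identity to every representative already kept is below the threshold. It is not necessarily how trimAl chooses representatives. identity and greedy_representatives are hypothetical helpers, and the identity definition (ignoring columns where both sequences have a gap) is also an assumption.

```python
from typing import List, Sequence


def identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over the columns where a or b has a residue."""
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # shared gaps carry no information
        compared += 1
        if x == y:  # a residue facing a gap never counts as a match
            matches += 1
    return matches / compared if compared else 0.0


def greedy_representatives(
    names: Sequence[str], rows: Sequence[str], identity_threshold: float
) -> List[str]:
    """Keep a sequence only if it is below the threshold to every kept one."""
    kept: List[int] = []
    for i, row in enumerate(rows):
        if all(identity(row, rows[j]) < identity_threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]


names = ["s1", "s2", "s3", "s4"]
rows = ["ACDEFG", "ACDEFG", "ACDEFH", "WWWWWW"]
print(greedy_representatives(names, rows, 0.9))
```

Because the rule is greedy, its result depends on the input order: earlier sequences are preferred as representatives. Keeping a fixed number of representatives (the clusters mode) would need a real clustering step, which this sketch does not attempt.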
- __init__(clusters=None, identity_threshold=None, *, backend='detect')
Create a new representative alignment trimmer.
- Parameters:
clusters (int, optional) – The number of cluster representatives to keep in the trimmed alignment. Must be strictly positive. If the trimmer receives an alignment with fewer sequences than this, it will not perform any trimming.
identity_threshold (float, optional) – The identity threshold for which to get representative sequences.
- Keyword Arguments:
backend (str, optional) – The SIMD extension backend to use to accelerate computation of pairwise similarity statistics. If None is given, the original code from trimAl is used.
- Raises:
ValueError – When both clusters and identity_threshold are provided at the same time, or when they don’t fall in a valid range.
- trim(alignment, matrix=None)
Trim the provided alignment.
- Parameters:
alignment (Alignment) – A multiple sequence alignment to trim.
matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.
- Returns:
TrimmedAlignment – The trimmed alignment.
Hint
This method is re-entrant and can safely be called across different threads. Most of the computations are done after releasing the GIL.
Changed in version 0.1.2: Added the matrix optional argument.
Overlap Trimmer
- class pytrimal.OverlapTrimmer(BaseTrimmer)
A sequence alignment trimmer for overlap blocks.
Overlap trimming works by defining “good” positions, i.e. positions where enough sequences (as set by a threshold) agree on whether the alignment contains a gap or a residue, independently of whether they agree on the residue itself. Sequences that do not contain enough “good” positions are then removed.
Example
Consider the following alignment, where the last three sequences align decently, while the first sequence doesn’t. In particular, it creates a large gap in the rest of the alignment:
>>> from pytrimal import Alignment, OverlapTrimmer
>>> ali = Alignment(
...     names=[b"Sp8", b"Sp17", b"Sp10", b"Sp26"],
...     sequences=[
...         "LG-----------TKSD---NNNNNNNNNNNNNNNNWV----------",
...         "APDLLL-IGFLLKTV-ATFG-----------------DTWFQLWQGLD",
...         "DPAVL--FVIMLGTI-TKFS-----------------SEWFFAWLGLE",
...         "AAALLTYLGLFLGTDYENFA-----------------AAAANAWLGLE",
...     ]
... )
Let’s create an overlap trimmer so that “good” positions correspond to an agreement between at least half of the sequences, and make it remove sequences with less than 40% of good positions:
>>> trimmer = OverlapTrimmer(40.0, 0.5)
Trimming will remove the first sequence because it doesn’t contain enough good positions; then, the block containing only gaps will be removed (this is the default behaviour of all trimmer objects):
>>> trimmed = trimmer.trim(ali)
>>> for name, seq in zip(trimmed.names, trimmed.sequences):
...     print(name.decode().ljust(5), seq)
Sp17  APDLLL-IGFLLKTV-ATFGDTWFQLWQGLD
Sp10  DPAVL--FVIMLGTI-TKFSSEWFFAWLGLE
Sp26  AAALLTYLGLFLGTDYENFAAAAANAWLGLE
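The overlap rule can be reproduced in pure Python to make the “good” position definition concrete. This sketch follows the description above rather than pytrimal's code, so treat its details as assumptions: overlap_trim is a hypothetical helper, a position of a sequence counts as good when at least residue_overlap of the other sequences share its gap/residue status, sequence_overlap is read as a percentage of the alignment length, and only "-" is treated as a gap. Applied to the example alignment, it removes Sp8 and then the all-gap block.

```python
from typing import List, Sequence, Tuple


def overlap_trim(
    names: Sequence[bytes],
    rows: Sequence[str],
    sequence_overlap: float,
    residue_overlap: float,
    gap: str = "-",
) -> Tuple[List[bytes], List[str]]:
    """Drop sequences with too few "good" positions, then drop all-gap columns."""
    if not rows:
        return [], []
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same length")
    n, length = len(rows), len(rows[0])
    kept = []
    for i, row in enumerate(rows):
        good = 0
        for j, char in enumerate(row):
            is_gap = char == gap
            # Count the *other* sequences that agree on gap vs. residue here.
            agree = sum((rows[k][j] == gap) == is_gap for k in range(n) if k != i)
            if agree >= residue_overlap * (n - 1):
                good += 1
        if 100 * good / length >= sequence_overlap:
            kept.append(i)
    # Remove the columns that now contain only gaps.
    columns = [j for j in range(length) if any(rows[i][j] != gap for i in kept)]
    return [names[i] for i in kept], ["".join(rows[i][j] for j in columns) for i in kept]


names = [b"Sp8", b"Sp17", b"Sp10", b"Sp26"]
rows = [
    "LG-----------TKSD---NNNNNNNNNNNNNNNNWV----------",
    "APDLLL-IGFLLKTV-ATFG-----------------DTWFQLWQGLD",
    "DPAVL--FVIMLGTI-TKFS-----------------SEWFFAWLGLE",
    "AAALLTYLGLFLGTDYENFA-----------------AAAANAWLGLE",
]
kept_names, kept_rows = overlap_trim(names, rows, sequence_overlap=40.0, residue_overlap=0.5)
for name, seq in zip(kept_names, kept_rows):
    print(name.decode().ljust(5), seq)
```

Here the run of N characters in Sp8 counts as residues; trimAl may treat indeterminate characters differently.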
New in version 0.4.0.
New in version 0.5.0: Support for the pickle protocol.
- __init__(sequence_overlap, residue_overlap, *, backend='detect')
Create a new overlap trimmer with the given thresholds.
- trim(alignment, matrix=None)
Trim the provided alignment.
- Parameters:
alignment (Alignment) – A multiple sequence alignment to trim.
matrix (SimilarityMatrix, optional) – An alternative similarity matrix to use for computing the similarity statistic. If None, a default matrix will be used based on the type of the alignment.
- Returns:
TrimmedAlignment – The trimmed alignment.
Hint
This method is re-entrant and can safely be called across different threads. Most of the computations are done after releasing the GIL.
Changed in version 0.1.2: Added the matrix optional argument.