Basic examples from the trimAl documentation
This example shows how to run the basic methods described in the trimAl manual using the pytrimal API.
[1]:
import pytrimal
pytrimal.__version__
[1]:
'0.8.0-alpha1'
For this example, we will use one of the alignments used to generate HMMs for Pfam (namely PF12574, a family of Rickettsia surface antigens). We use Alignment.load to load the alignment from a filename; note that os.PathLike objects are supported as well.
[2]:
import pathlib
ali = pytrimal.Alignment.load(pathlib.Path("data").joinpath("PF12574.full.afa"))
To easily compare the alignments, let’s use rich-msa, a Python package for displaying multiple sequence alignments with rich.
[3]:
import rich.console
import rich.panel
from rich_msa import RichAlignment
def show_alignment(alignment):
    console = rich.console.Console(width=len(alignment.sequences[0])+40)
    widget = RichAlignment(names=[n.decode() for n in alignment.names], sequences=alignment.sequences, max_name_width=30)
    panel = rich.panel.Panel(widget, title_align="left", title="({} residues, {} sequences)".format(len(alignment.sequences[0]), len(alignment.sequences)))
    console.print(panel)
Let’s see how the original alignment looks before trimming:
[4]:
show_alignment(ali)
╭─ (316 residues, 8 sequences) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ A0A261DC17_9RICK/184-418 1 -------------------------aalvnksia------KP--EELDDLNKFRAYF----ENEQ---NKETISGLLkEDQNLKHALEQV---EIAGYKNVH-TQFA-------------GRFSTMEWKDGgV--eNA--NgiTIKKQIVRDANGHEIATLSEANHQINpPHTVQKSDGTSVAISNYRTIDFPIKLDN-NGPMHLSLAVKDQYGKNIAASNAVYFTAHYDDA----G--KLIEVSSPHPVKFTGNSPDAVGYIEHGGKIYTLPVTQEKYRSMMQEVAKNLGQGVNISP------siesi------- │ │ A0A0F3QKY5_RICAM/113-350 1 ----------------------------------LAEQKRKEIEEEKEKDKTLSTFF----GNPA---NREFIDKAL-ENPELKKKLESI---EIAGYKNVH-NTFSA---AS----GYPGGFKPVQWENQ-V---SA--N--DLRATVVKNDAGDELCTLNETTVKTK-PFTVAKQDGTQVQISSYREIDFPIKLDKADGSMHLSMVALKADGTKPSKDKAVYFTAHYEEG--PNGKPQLKEISSPKPLKFAGTGDDAIAYIEHGGEIYTLAVTRGKYKEMMKEVELNQGQSVDLSQ--AEDI------------ │ │ SCA4_RICPR/103-337 1 ----------------------------------LAEQIAKE-----EDDRKFRAFL----SNQD---NYALINKAF-EDTKTKKNLEKA---EIVGYKNVL-STYSV---AN----GYQGGFQPVQWENQ-V---SA--S--DLRSTVVKNDEGEELCTLNETTVKTK-DLIVAKQDGTQVQINSYREINFPIKLDKANGSMHLSMVALKADGTKPAKDKAVYFTAHYEEG--PNGKPQLKEISSPQPLKFVGTGDDAVAYIEHGGEIYTLAVTRGKYKEMMKEVALNHGQSVALSQtiAEDL------------ │ │ Q1RIG4_RICBR/102-183 1 -------------------------lkenaenpd--------LKSNFESDEKFRDFLrtlnEDPN---KKELYDKAL-ENPELKKGLENI---EIAGYKNVH-ASHSA---EV----YH-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------enkikeeqeel- │ │ A0A261DCJ7_9RICK/513-746 1 
-------------------eanitaisvtnlain---------------SQQFKEFV----EN-----NNELIQKTW-ASSITKSTVTQAmndEVQSYAKI----------------HHANNFKPLTWSNQ-TpidNT--T--DTRSRVIKVK-DEELFKLTETKVKTS-TKVILEDGVTETEISNYRNINLPLAIKPSGTAVHLSFPVQNEKGENIEVSKALYFTTHYNEQ------GKLVEITNPLSLKFTSDDKNAMGYIQRGKHIYTIPVTKGQYEAMLQEVAKNQGCNINISQ--T---ipseasdl---- │ │ A0A0F3MK43_9RICK/402-635 1 sliledesfkkwqeqnqnktlddfkhsaaqmdnl---------------------------------------------TPETKELISKL---GAAGYANIL-GSGANieqAQe--mSFAASFCTLDWATQ-S---NAvgN--TTRKT-ITNEAGEKVVDLVSHSHSV---QLSASVNGATKTITKCRTIDIPSTVEK--GPLDLALVAQDSTGKNMPESKAVYFTVHYDQ----DG--KIVEMTHPEPLKFFSETPSSPAYTVINGEIFTLPITKEKYEQLHKEISQNME-------------eqyakdhqlaad │ │ Q1RIG4_RICBR/189-417 1 ---------------------------nndpays----------EEAKDQEKFRQFL----ANLNageRQGLYDKAL-SDEQFKGQ-----------YENIR-QEY-----AN----KYVGGFRSMQWENQ-V---SA--G--DLRSTVIKNDAGEEICTLAEKTHKTA-PMTVYKQDGTAVTVNSYRTIDFPIDLEGKSGTMHLSLVAQNKEGK---SNNALRFTAHYEADphPDGTPKLKEVSSPQPIKFMGKDENAVGYIEHGGEIYTLPVTRGKYEAMMKEVAVNKGQGVDVSQ--T---ieqdi------- │ │ A5CDE1_ORITB/319-550 1 --aeddkfkvwqqqnpskslddfkrdttqidsls----------------------------------------------QETKELLSGI---GSAGYANIMgSTANI---EQaqqmSFAASFSTLDWATH-A--nSV--G--NTTQKTITNDAGEKVTDLISHSHKTQ---LSASVNGVTKIVTKHRTIDIPRAVEENKGPLDLALVAQDTTGKNMPESKAVYLTAHYNQ------EGKLVEMTHPEPLRFFSDEPGSPAYTVINNEVYTLPITREKYDQLTKEISQNI--------------qeqdkdkereqe │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Example 1
Remove all positions in the alignment with gaps in 10% or more of the sequences, unless this leaves less than 60% of the original alignment. In that case, keep the 60% best positions (those with the fewest gaps). Equivalent to:
$ trimal -in data/example.001.AA.clw -gt 0.9 -cons 60
[5]:
trimmer = pytrimal.ManualTrimmer(gap_threshold=0.9, conservation_percentage=60)
trimmed = trimmer.trim(ali)
show_alignment(trimmed)
╭─ (190 residues, 8 sequences) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ A0A261DC17_9RICK/184-418 1 KETISGLLEDQNLKHALEQVEIAGYKNVHTQFA-----GRFSTMEWKDGVNANTIKKQIVRDANGHEIATLSEANHQINHTVQKSDGTSVAISNYRTIDFPIKLDNNGPMHLSLAVKDQYGKNIAASNAVYFTAHYDDKLIEVSSPHPVKFTGNSPDAVGYIEHGGKIYTLPVTQEKYRSMMQEVAKNLG │ │ A0A0F3QKY5_RICAM/113-350 1 REFIDKALENPELKKKLESIEIAGYKNVHNTFSASGYPGGFKPVQWENQVSANDLRATVVKNDAGDELCTLNETTVKTKFTVAKQDGTQVQISSYREIDFPIKLDKDGSMHLSMVALKADGTKPSKDKAVYFTAHYEEQLKEISSPKPLKFAGTGDDAIAYIEHGGEIYTLAVTRGKYKEMMKEVELNQG │ │ SCA4_RICPR/103-337 1 YALINKAFEDTKTKKNLEKAEIVGYKNVLSTYSANGYQGGFQPVQWENQVSASDLRSTVVKNDEGEELCTLNETTVKTKLIVAKQDGTQVQINSYREINFPIKLDKNGSMHLSMVALKADGTKPAKDKAVYFTAHYEEQLKEISSPQPLKFVGTGDDAVAYIEHGGEIYTLAVTRGKYKEMMKEVALNHG │ │ Q1RIG4_RICBR/102-183 1 KELYDKALENPELKKGLENIEIAGYKNVHASHSEVYH--------------------------------------------------------------------------------------------------------------------------------------------------------- │ │ A0A261DCJ7_9RICK/513-746 1 NELIQKTWASSITKSTVTQAEVQSYAKI-------HHANNFKPLTWSNQTNTTDTRSRVIKVK-DEELFKLTETKVKTSKVILEDGVTETEISNYRNINLPLAIKPGTAVHLSFPVQNEKGENIEVSKALYFTTHYNEKLVEITNPLSLKFTSDDKNAMGYIQRGKHIYTIPVTKGQYEAMLQEVAKNQG │ │ A0A0F3MK43_9RICK/402-635 1 ---------TPETKELISKLGAAGYANILGSGAAQSFAASFCTLDWATQSNANTTRKT-ITNEAGEKVVDLVSHSHSV-QLSASVNGATKTITKCRTIDIPSTVEK-GPLDLALVAQDSTGKNMPESKAVYFTVHYDQKIVEMTHPEPLKFFSETPSSPAYTVINGEIFTLPITKEKYEQLHKEISQNME │ │ Q1RIG4_RICBR/189-417 1 QGLYDKALSDEQFKGQ--------YENIRQEY-ANKYVGGFRSMQWENQVSAGDLRSTVIKNDAGEEICTLAEKTHKTAMTVYKQDGTAVTVNSYRTIDFPIDLEGSGTMHLSLVAQNKEGK---SNNALRFTAHYEAKLKEVSSPQPIKFMGKDENAVGYIEHGGEIYTLPVTRGKYEAMMKEVAVNKG │ │ A5CDE1_ORITB/319-550 1 ----------QETKELLSGIGSAGYANIMSTANEQSFAASFSTLDWATHASVGNTTQKTITNDAGEKVTDLISHSHKTQ-LSASVNGVTKIVTKHRTIDIPRAVEEKGPLDLALVAQDTTGKNMPESKAVYLTAHYNQKLVEMTHPEPLRFFSDEPGSPAYTVINNEVYTLPITREKYDQLTKEISQNI- │ 
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
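The column-selection rule described in this example can be illustrated with a short pure-Python sketch. It is not how trimAl or pytrimal implement it: the function name select_columns, the toy alignment, the inclusive threshold comparison, and the way fallback columns are ranked (simply by gap count) are all assumptions made for illustration.

```python
import math

def select_columns(sequences, gap_threshold=0.9, conservation_percentage=60):
    """Return the indices of the columns kept by a simplified gap filter."""
    n_seqs = len(sequences)
    n_cols = len(sequences[0])
    # Fraction of sequences holding a residue (not a gap) in each column.
    occupancy = [
        sum(seq[i] != "-" for seq in sequences) / n_seqs
        for i in range(n_cols)
    ]
    # Assumption: a column passes when its occupancy reaches the threshold.
    kept = [i for i in range(n_cols) if occupancy[i] >= gap_threshold]
    # Fallback: if too few columns survive, keep the least gappy ones instead.
    minimum = math.ceil(n_cols * conservation_percentage / 100)
    if len(kept) < minimum:
        ranked = sorted(range(n_cols), key=lambda i: (-occupancy[i], i))
        kept = sorted(ranked[:minimum])
    return kept

# Hypothetical toy alignment: 4 sequences, 5 columns.
toy = ["MK-LV", "MKALV", "M--LV", "MK-L-"]
kept = select_columns(toy)
trimmed = ["".join(seq[i] for i in kept) for seq in toy]
print(kept, trimmed)
```

In the toy input only two columns are gap-free, which is below 60% of the five columns, so the fallback brings back the next least gappy column.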
Example 2
Same as Example 1, but the gap score is averaged over a window starting 3 positions before and ending 3 positions after each column.
[6]:
trimmer = pytrimal.ManualTrimmer(gap_threshold=0.9, conservation_percentage=60, window=3)
trimmed = trimmer.trim(ali)
show_alignment(trimmed)
╭─ (190 residues, 8 sequences) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ A0A261DC17_9RICK/184-418 1 vnkNKFRTISLkEDQNLKHALEIAGYKNVH-TQ--GRFSTMEWKDTIKKQIVRDANGHEIATLSEANHQINpPHTVQKSDGTSVAISNYRTIDFPIKLDN-NGPMHLSLAVKDQYGKNIAASNAVYFTAHYDD-KLIEVSSPHPVKFTGNSPDAVGYIEHGGKIYTLPVTQEKYRSMMQEVAKNLGQGVN │ │ A0A0F3QKY5_RICAM/113-350 1 ---KTLSFIDL-ENPELKKKLEIAGYKNVH-NTYPGGFKPVQWENDLRATVVKNDAGDELCTLNETTVKTK-PFTVAKQDGTQVQISSYREIDFPIKLDKADGSMHLSMVALKADGTKPSKDKAVYFTAHYEEPQLKEISSPKPLKFAGTGDDAIAYIEHGGEIYTLAVTRGKYKEMMKEVELNQGQSVD │ │ SCA4_RICPR/103-337 1 ---RKFRLINF-EDTKTKKNLEIVGYKNVL-STYQGGFQPVQWENDLRSTVVKNDEGEELCTLNETTVKTK-DLIVAKQDGTQVQINSYREINFPIKLDKANGSMHLSMVALKADGTKPAKDKAVYFTAHYEEPQLKEISSPQPLKFVGTGDDAVAYIEHGGEIYTLAVTRGKYKEMMKEVALNHGQSVA │ │ Q1RIG4_RICBR/102-183 1 naeEKFRLYDL-ENPELKKGLEIAGYKNVH-ASH------------------------------------------------------------------------------------------------------------------------------------------------------------ │ │ A0A261DCJ7_9RICK/513-746 1 tnlQQFKLIQW-ASSITKSTVTVQSYAKI----HANNFKPLTWSNDTRSRVIKVK-DEELFKLTETKVKTS-TKVILEDGVTETEISNYRNINLPLAIKPSGTAVHLSFPVQNEKGENIEVSKALYFTTHYNEGKLVEITNPLSLKFTSDDKNAMGYIQRGKHIYTIPVTKGQYEAMLQEVAKNQGCNIN │ │ A0A0F3MK43_9RICK/402-635 1 aqm----------TPETKELISAAGYANIL-GSFAASFCTLDWATTTRKT-ITNEAGEKVVDLVSHSHSV---QLSASVNGATKTITKCRTIDIPSTVEK--GPLDLALVAQDSTGKNMPESKAVYFTVHYDQ-KIVEMTHPEPLKFFSETPSSPAYTVINGEIFTLPITKEKYEQLHKEISQNME---- │ │ Q1RIG4_RICBR/189-417 1 ndpEKFRLYDL-SDEQFKGQ-----YENIR-QEYVGGFRSMQWENDLRSTVIKNDAGEEICTLAEKTHKTA-PMTVYKQDGTAVTVNSYRTIDFPIDLEGKSGTMHLSLVAQNKEGK---SNNALRFTAHYEAPKLKEVSSPQPIKFMGKDENAVGYIEHGGEIYTLPVTRGKYEAMMKEVAVNKGQGVD │ │ A5CDE1_ORITB/319-550 1 qid-----------QETKELLSSAGYANIMgSTFAASFSTLDWATNTTQKTITNDAGEKVTDLISHSHKTQ---LSASVNGVTKIVTKHRTIDIPRAVEENKGPLDLALVAQDTTGKNMPESKAVYLTAHYNQGKLVEMTHPEPLRFFSDEPGSPAYTVINNEVYTLPITREKYDQLTKEISQNI----- │ 
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
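To make the windowing concrete, the following sketch averages per-column scores over a window extending a given number of positions on each side of a column. It is only an illustration: windowed_scores is a hypothetical helper, the input scores are made up, and truncating the window at the alignment edges is an assumption rather than a statement about what trimAl does.

```python
def windowed_scores(scores, window=3):
    """Average each score with its neighbours up to `window` columns away."""
    smoothed = []
    for i in range(len(scores)):
        # Assumption: the window is truncated at the alignment boundaries.
        start = max(0, i - window)
        end = min(len(scores), i + window + 1)
        smoothed.append(sum(scores[start:end]) / (end - start))
    return smoothed

# Hypothetical per-column gap scores (fraction of non-gap residues).
scores = [1.0, 1.0, 0.0, 1.0, 1.0]
print(windowed_scores(scores, window=1))
```

Smoothing spreads the effect of a gappy column to its neighbours while raising the score of the gappy column itself, which is why the windowed result in this example keeps a different set of columns than Example 1.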
Example 3
Use an automatic method to decide optimal thresholds, based on the gap scores of the input alignment. Equivalent to:
$ trimal -in data/example.001.AA.clw -gappyout
[7]:
trimmer = pytrimal.AutomaticTrimmer(method="gappyout")
trimmed = trimmer.trim(ali)
show_alignment(trimmed)
╭─ (213 residues, 8 sequences) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ A0A261DC17_9RICK/184-418 1 lvnksiaLNKFRAYFENNKETISGLLEDQNLKHALEQVEIAGYKNVHTQFA-----GRFSTMEWKDGVNANTIKKQIVRDANGHEIATLSEANHQINHTVQKSDGTSVAISNYRTIDFPIKLDNNGPMHLSLAVKDQYGKNIAASNAVYFTAHYDDKLIEVSSPHPVKFTGNSPDAVGYIEHGGKIYTLPVTQEKYRSMMQEVAKNLGsiesi │ │ A0A0F3QKY5_RICAM/113-350 1 -------DKTLSTFFGNNREFIDKALENPELKKKLESIEIAGYKNVHNTFSASGYPGGFKPVQWENQVSANDLRATVVKNDAGDELCTLNETTVKTKFTVAKQDGTQVQISSYREIDFPIKLDKDGSMHLSMVALKADGTKPSKDKAVYFTAHYEEQLKEISSPKPLKFAGTGDDAIAYIEHGGEIYTLAVTRGKYKEMMKEVELNQG----- │ │ SCA4_RICPR/103-337 1 -------DRKFRAFLSNNYALINKAFEDTKTKKNLEKAEIVGYKNVLSTYSANGYQGGFQPVQWENQVSASDLRSTVVKNDEGEELCTLNETTVKTKLIVAKQDGTQVQINSYREINFPIKLDKNGSMHLSMVALKADGTKPAKDKAVYFTAHYEEQLKEISSPQPLKFVGTGDDAVAYIEHGGEIYTLAVTRGKYKEMMKEVALNHG----- │ │ Q1RIG4_RICBR/102-183 1 enaenpdDEKFRDFLEDKKELYDKALENPELKKGLENIEIAGYKNVHASHSEVYH---------------------------------------------------------------------------------------------------------------------------------------------------------enkik │ │ A0A261DCJ7_9RICK/513-746 1 vtnlainSQQFKEFVENNNELIQKTWASSITKSTVTQAEVQSYAKI-------HHANNFKPLTWSNQTNTTDTRSRVIKVK-DEELFKLTETKVKTSKVILEDGVTETEISNYRNINLPLAIKPGTAVHLSFPVQNEKGENIEVSKALYFTTHYNEKLVEITNPLSLKFTSDDKNAMGYIQRGKHIYTIPVTKGQYEAMLQEVAKNQGipsea │ │ A0A0F3MK43_9RICK/402-635 1 aaqmdnl--------------------TPETKELISKLGAAGYANILGSGAAQSFAASFCTLDWATQSNANTTRKT-ITNEAGEKVVDLVSHSHSV-QLSASVNGATKTITKCRTIDIPSTVEK-GPLDLALVAQDSTGKNMPESKAVYFTVHYDQKIVEMTHPEPLKFFSETPSSPAYTVINGEIFTLPITKEKYEQLHKEISQNMEeqyak │ │ Q1RIG4_RICBR/189-417 1 nndpaysQEKFRQFLANRQGLYDKALSDEQFKGQ--------YENIRQEY-ANKYVGGFRSMQWENQVSAGDLRSTVIKNDAGEEICTLAEKTHKTAMTVYKQDGTAVTVNSYRTIDFPIDLEGSGTMHLSLVAQNKEGK---SNNALRFTAHYEAKLKEVSSPQPIKFMGKDENAVGYIEHGGEIYTLPVTRGKYEAMMKEVAVNKGieqdi │ │ A5CDE1_ORITB/319-550 1 
tqidsls---------------------QETKELLSGIGSAGYANIMSTANEQSFAASFSTLDWATHASVGNTTQKTITNDAGEKVTDLISHSHKTQ-LSASVNGVTKIVTKHRTIDIPRAVEEKGPLDLALVAQDTTGKNMPESKAVYLTAHYNQKLVEMTHPEPLRFFSDEPGSPAYTVINNEVYTLPITREKYDQLTKEISQNI-qeqdk │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Example 4
Use automatic methods to decide optimal thresholds, based on the combination of gap and similarity scores. Equivalent to:
$ trimal -in data/example.001.AA.clw -strictplus
[8]:
trimmer = pytrimal.AutomaticTrimmer(method="strictplus")
trimmed = trimmer.trim(ali)
show_alignment(trimmed)
╭─ (177 residues, 8 sequences) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ A0A261DC17_9RICK/184-418 1 NKFRAYFNKETISGLLkEDQNLKHALEQVEIAGYKNVFSTMEWKDGgVNA-iTIKKQIVRDANGHEIATLSENHQINTVQKSDGTSVAISNYRTIDFPIKLDGPMHLSLAVKDQYGKNSNAVYFTAHYDDKLIEVSSPHPVKFTGDAVGYIEHGGKIYTLPVTQEKYRSMMQEVAKN │ │ A0A0F3QKY5_RICAM/113-350 1 KTLSTFFNREFIDKAL-ENPELKKKLESIEIAGYKNVFKPVQWENQ-VSA--DLRATVVKNDAGDELCTLNETVKTKTVAKQDGTQVQISSYREIDFPIKLDGSMHLSMVALKADGTKDKAVYFTAHYEEQLKEISSPKPLKFAGDAIAYIEHGGEIYTLAVTRGKYKEMMKEVELN │ │ SCA4_RICPR/103-337 1 RKFRAFLNYALINKAF-EDTKTKKNLEKAEIVGYKNVFQPVQWENQ-VSA--DLRSTVVKNDEGEELCTLNETVKTKIVAKQDGTQVQINSYREINFPIKLDGSMHLSMVALKADGTKDKAVYFTAHYEEQLKEISSPQPLKFVGDAVAYIEHGGEIYTLAVTRGKYKEMMKEVALN │ │ Q1RIG4_RICBR/102-183 1 EKFRDFLKKELYDKAL-ENPELKKGLENIEIAGYKNV-------------------------------------------------------------------------------------------------------------------------------------------- │ │ A0A261DCJ7_9RICK/513-746 1 QQFKEFVNNELIQKTW-ASSITKSTVTQAEVQSYAKIFKPLTWSNQ-TNT--DTRSRVIKVK-DEELFKLTEKVKTSVILEDGVTETEISNYRNINLPLAIKTAVHLSFPVQNEKGENSKALYFTTHYNEKLVEITNPLSLKFTSNAMGYIQRGKHIYTIPVTKGQYEAMLQEVAKN │ │ A0A0F3MK43_9RICK/402-635 1 ------------------TPETKELISKLGAAGYANIFCTLDWATQ-SNAv-TTRKT-ITNEAGEKVVDLVSSHSV-LSASVNGATKTITKCRTIDIPSTVEGPLDLALVAQDSTGKNSKAVYFTVHYDQKIVEMTHPEPLKFFSSSPAYTVINGEIFTLPITKEKYEQLHKEISQN │ │ Q1RIG4_RICBR/189-417 1 EKFRQFLRQGLYDKAL-SDEQFKGQ--------YENIFRSMQWENQ-VSA--DLRSTVIKNDAGEEICTLAETHKTATVYKQDGTAVTVNSYRTIDFPIDLEGTMHLSLVAQNKEGK-NNALRFTAHYEAKLKEVSSPQPIKFMGNAVGYIEHGGEIYTLPVTRGKYEAMMKEVAVN │ │ A5CDE1_ORITB/319-550 1 -------------------QETKELLSGIGSAGYANIFSTLDWATH-ASV--NTTQKTITNDAGEKVTDLISSHKTQLSASVNGVTKIVTKHRTIDIPRAVEGPLDLALVAQDTTGKNSKAVYLTAHYNQKLVEMTHPEPLRFFSGSPAYTVINNEVYTLPITREKYDQLTKEISQN │ 
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Example 5
Use a heuristic to decide the optimal method for trimming the alignment. Equivalent to:
$ trimal -in data/example.001.AA.clw -automated1
[9]:
trimmer = pytrimal.AutomaticTrimmer(method="automated1")
trimmed = trimmer.trim(ali)
show_alignment(trimmed)
╭─ (174 residues, 8 sequences) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ A0A261DC17_9RICK/184-418 1 NKFRAYFNKETISGLLkEDQNLKHALEQVEIAGYKNVFSTMEWKDGgViTIKKQIVRDANGHEIATLSENHQINTVQKSDGTSVAISNYRTIDFPIKLDGPMHLSLAVKDQYGKNSNAVYFTAHYDDKLIEVSSPHPVKFTGDAVGYIEHGGKIYTLPVTQEKYRSMMQEVAKN │ │ A0A0F3QKY5_RICAM/113-350 1 KTLSTFFNREFIDKAL-ENPELKKKLESIEIAGYKNVFKPVQWENQ-V-DLRATVVKNDAGDELCTLNETVKTKTVAKQDGTQVQISSYREIDFPIKLDGSMHLSMVALKADGTKDKAVYFTAHYEEQLKEISSPKPLKFAGDAIAYIEHGGEIYTLAVTRGKYKEMMKEVELN │ │ SCA4_RICPR/103-337 1 RKFRAFLNYALINKAF-EDTKTKKNLEKAEIVGYKNVFQPVQWENQ-V-DLRSTVVKNDEGEELCTLNETVKTKIVAKQDGTQVQINSYREINFPIKLDGSMHLSMVALKADGTKDKAVYFTAHYEEQLKEISSPQPLKFVGDAVAYIEHGGEIYTLAVTRGKYKEMMKEVALN │ │ Q1RIG4_RICBR/102-183 1 EKFRDFLKKELYDKAL-ENPELKKGLENIEIAGYKNV----------------------------------------------------------------------------------------------------------------------------------------- │ │ A0A261DCJ7_9RICK/513-746 1 QQFKEFVNNELIQKTW-ASSITKSTVTQAEVQSYAKIFKPLTWSNQ-T-DTRSRVIKVK-DEELFKLTEKVKTSVILEDGVTETEISNYRNINLPLAIKTAVHLSFPVQNEKGENSKALYFTTHYNEKLVEITNPLSLKFTSNAMGYIQRGKHIYTIPVTKGQYEAMLQEVAKN │ │ A0A0F3MK43_9RICK/402-635 1 ------------------TPETKELISKLGAAGYANIFCTLDWATQ-S-TTRKT-ITNEAGEKVVDLVSSHSV-LSASVNGATKTITKCRTIDIPSTVEGPLDLALVAQDSTGKNSKAVYFTVHYDQKIVEMTHPEPLKFFSSSPAYTVINGEIFTLPITKEKYEQLHKEISQN │ │ Q1RIG4_RICBR/189-417 1 EKFRQFLRQGLYDKAL-SDEQFKGQ--------YENIFRSMQWENQ-V-DLRSTVIKNDAGEEICTLAETHKTATVYKQDGTAVTVNSYRTIDFPIDLEGTMHLSLVAQNKEGK-NNALRFTAHYEAKLKEVSSPQPIKFMGNAVGYIEHGGEIYTLPVTRGKYEAMMKEVAVN │ │ A5CDE1_ORITB/319-550 1 -------------------QETKELLSGIGSAGYANIFSTLDWATH-A-NTTQKTITNDAGEKVTDLISSHKTQLSASVNGVTKIVTKHRTIDIPRAVEGPLDLALVAQDTTGKNSKAVYLTAHYNQKLVEMTHPEPLRFFSGSPAYTVINNEVYTLPITREKYDQLTKEISQN │ 
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Example 6
Use residue and sequence overlap thresholds to delete some sequences from the alignment. See the trimAl User Guide for details. Equivalent to:
$ trimal -in data/example.001.AA.clw -resoverlap 0.6 -seqoverlap 75
[10]:
trimmer = pytrimal.OverlapTrimmer(residue_overlap=0.6, sequence_overlap=75)
trimmed = trimmer.trim(ali)
show_alignment(trimmed)
╭─ (279 residues, 5 sequences) ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ A0A261DC17_9RICK/184-418 1 ------aalvnksia------KP--EELDDLNKFRAYFENEQ---NKETISGLLkEDQNLKHALEQV---EIAGYKNVHTQFA------GRFSTMEWKDGgV--eNANgiTIKKQIVRDANGHEIATLSEANHQINpPHTVQKSDGTSVAISNYRTIDFPIKLDN-NGPMHLSLAVKDQYGKNIAASNAVYFTAHYDDA----G--KLIEVSSPHPVKFTGNSPDAVGYIEHGGKIYTLPVTQEKYRSMMQEVAKNLGQGVNISP------siesi--- │ │ A0A0F3QKY5_RICAM/113-350 1 ---------------LAEQKRKEIEEEKEKDKTLSTFFGNPA---NREFIDKAL-ENPELKKKLESI---EIAGYKNVHNTFSAASGYPGGFKPVQWENQ-V---SAN--DLRATVVKNDAGDELCTLNETTVKTK-PFTVAKQDGTQVQISSYREIDFPIKLDKADGSMHLSMVALKADGTKPSKDKAVYFTAHYEEG--PNGKPQLKEISSPKPLKFAGTGDDAIAYIEHGGEIYTLAVTRGKYKEMMKEVELNQGQSVDLSQ--AEDI-------- │ │ SCA4_RICPR/103-337 1 ---------------LAEQIAKE-----EDDRKFRAFLSNQD---NYALINKAF-EDTKTKKNLEKA---EIVGYKNVLSTYSVANGYQGGFQPVQWENQ-V---SAS--DLRSTVVKNDEGEELCTLNETTVKTK-DLIVAKQDGTQVQINSYREINFPIKLDKANGSMHLSMVALKADGTKPAKDKAVYFTAHYEEG--PNGKPQLKEISSPQPLKFVGTGDDAVAYIEHGGEIYTLAVTRGKYKEMMKEVALNHGQSVALSQtiAEDL-------- │ │ A0A261DCJ7_9RICK/513-746 1 eanitaisvtnlain---------------SQQFKEFVEN-----NNELIQKTW-ASSITKSTVTQAmndEVQSYAKI--------HHANNFKPLTWSNQ-TpidNTT--DTRSRVIKVK-DEELFKLTETKVKTS-TKVILEDGVTETEISNYRNINLPLAIKPSGTAVHLSFPVQNEKGENIEVSKALYFTTHYNEQ------GKLVEITNPLSLKFTSDDKNAMGYIQRGKHIYTIPVTKGQYEAMLQEVAKNQGCNINISQ--T---ipseasdl │ │ Q1RIG4_RICBR/189-417 1 --------nndpays----------EEAKDQEKFRQFLANLNageRQGLYDKAL-SDEQFKGQ-----------YENIRQEY--ANKYVGGFRSMQWENQ-V---SAG--DLRSTVIKNDAGEEICTLAEKTHKTA-PMTVYKQDGTAVTVNSYRTIDFPIDLEGKSGTMHLSLVAQNKEGK---SNNALRFTAHYEADphPDGTPKLKEVSSPQPIKFMGKDENAVGYIEHGGEIYTLPVTRGKYEAMMKEVAVNKGQGVDVSQ--T---ieqdi--- │ 
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
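trimAl describes these two thresholds in terms of "good positions". As a rough sketch of that idea (an interpretation, not the trimAl implementation): a position counts as good when a sufficient fraction of the other sequences has the same state (residue or gap) in that column, and a sequence is kept when a sufficient percentage of its positions are good. The function name overlap_filter and the toy data are hypothetical.

```python
def overlap_filter(sequences, residue_overlap=0.6, sequence_overlap=75):
    """Return the indices of the sequences kept by a simplified overlap filter."""
    n_seqs = len(sequences)
    n_cols = len(sequences[0])
    kept = []
    for i, seq in enumerate(sequences):
        good = 0
        for j in range(n_cols):
            # Count the other sequences sharing this state (residue vs. gap).
            matches = sum(
                (other[j] == "-") == (seq[j] == "-")
                for k, other in enumerate(sequences)
                if k != i
            )
            if matches / (n_seqs - 1) >= residue_overlap:
                good += 1
        # Keep the sequence if enough of its positions are "good".
        if 100 * good / n_cols >= sequence_overlap:
            kept.append(i)
    return kept

# Hypothetical toy alignment: the last sequence is a short fragment.
toy = ["MKLVA", "MKLVA", "MKIVA", "---VA"]
print(overlap_filter(toy))
```

With these toy inputs the fragment is dropped because only two of its five positions agree with the other sequences, which mirrors how the short fragment disappears from the output of this example.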
Example 10
Select the 5 most representative sequences from the alignment. Equivalent to:
$ trimal -in data/example.001.AA.clw -clusters 5
[11]:
trimmer = pytrimal.RepresentativeTrimmer(clusters=5)
trimmed = trimmer.trim(ali)
show_alignment(trimmed)
╭─ (316 residues, 8 sequences) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ A0A261DC17_9RICK/184-418 1 -------------------------aalvnksia------KP--EELDDLNKFRAYF----ENEQ---NKETISGLLkEDQNLKHALEQV---EIAGYKNVH-TQFA-------------GRFSTMEWKDGgV--eNA--NgiTIKKQIVRDANGHEIATLSEANHQINpPHTVQKSDGTSVAISNYRTIDFPIKLDN-NGPMHLSLAVKDQYGKNIAASNAVYFTAHYDDA----G--KLIEVSSPHPVKFTGNSPDAVGYIEHGGKIYTLPVTQEKYRSMMQEVAKNLGQGVNISP------siesi------- │ │ A0A0F3QKY5_RICAM/113-350 1 ----------------------------------LAEQKRKEIEEEKEKDKTLSTFF----GNPA---NREFIDKAL-ENPELKKKLESI---EIAGYKNVH-NTFSA---AS----GYPGGFKPVQWENQ-V---SA--N--DLRATVVKNDAGDELCTLNETTVKTK-PFTVAKQDGTQVQISSYREIDFPIKLDKADGSMHLSMVALKADGTKPSKDKAVYFTAHYEEG--PNGKPQLKEISSPKPLKFAGTGDDAIAYIEHGGEIYTLAVTRGKYKEMMKEVELNQGQSVDLSQ--AEDI------------ │ │ SCA4_RICPR/103-337 1 ----------------------------------LAEQIAKE-----EDDRKFRAFL----SNQD---NYALINKAF-EDTKTKKNLEKA---EIVGYKNVL-STYSV---AN----GYQGGFQPVQWENQ-V---SA--S--DLRSTVVKNDEGEELCTLNETTVKTK-DLIVAKQDGTQVQINSYREINFPIKLDKANGSMHLSMVALKADGTKPAKDKAVYFTAHYEEG--PNGKPQLKEISSPQPLKFVGTGDDAVAYIEHGGEIYTLAVTRGKYKEMMKEVALNHGQSVALSQtiAEDL------------ │ │ Q1RIG4_RICBR/102-183 1 -------------------------lkenaenpd--------LKSNFESDEKFRDFLrtlnEDPN---KKELYDKAL-ENPELKKGLENI---EIAGYKNVH-ASHSA---EV----YH-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------enkikeeqeel- │ │ A0A261DCJ7_9RICK/513-746 1 
-------------------eanitaisvtnlain---------------SQQFKEFV----EN-----NNELIQKTW-ASSITKSTVTQAmndEVQSYAKI----------------HHANNFKPLTWSNQ-TpidNT--T--DTRSRVIKVK-DEELFKLTETKVKTS-TKVILEDGVTETEISNYRNINLPLAIKPSGTAVHLSFPVQNEKGENIEVSKALYFTTHYNEQ------GKLVEITNPLSLKFTSDDKNAMGYIQRGKHIYTIPVTKGQYEAMLQEVAKNQGCNINISQ--T---ipseasdl---- │ │ A0A0F3MK43_9RICK/402-635 1 sliledesfkkwqeqnqnktlddfkhsaaqmdnl---------------------------------------------TPETKELISKL---GAAGYANIL-GSGANieqAQe--mSFAASFCTLDWATQ-S---NAvgN--TTRKT-ITNEAGEKVVDLVSHSHSV---QLSASVNGATKTITKCRTIDIPSTVEK--GPLDLALVAQDSTGKNMPESKAVYFTVHYDQ----DG--KIVEMTHPEPLKFFSETPSSPAYTVINGEIFTLPITKEKYEQLHKEISQNME-------------eqyakdhqlaad │ │ Q1RIG4_RICBR/189-417 1 ---------------------------nndpays----------EEAKDQEKFRQFL----ANLNageRQGLYDKAL-SDEQFKGQ-----------YENIR-QEY-----AN----KYVGGFRSMQWENQ-V---SA--G--DLRSTVIKNDAGEEICTLAEKTHKTA-PMTVYKQDGTAVTVNSYRTIDFPIDLEGKSGTMHLSLVAQNKEGK---SNNALRFTAHYEADphPDGTPKLKEVSSPQPIKFMGKDENAVGYIEHGGEIYTLPVTRGKYEAMMKEVAVNKGQGVDVSQ--T---ieqdi------- │ │ A5CDE1_ORITB/319-550 1 --aeddkfkvwqqqnpskslddfkrdttqidsls----------------------------------------------QETKELLSGI---GSAGYANIMgSTANI---EQaqqmSFAASFSTLDWATH-A--nSV--G--NTTQKTITNDAGEKVTDLISHSHKTQ---LSASVNGVTKIVTKHRTIDIPRAVEENKGPLDLALVAQDTTGKNMPESKAVYLTAHYNQ------EGKLVEMTHPEPLRFFSDEPGSPAYTVINNEVYTLPITREKYDQLTKEISQNI--------------qeqdkdkereqe │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯