MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

Abstract: Until now, the most efficient solution for aligning nucleotide sequences containing open reading frames was to use indirect procedures that align the amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated in the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and aberrant alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source Java executable file with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.
Citation: Ranwez V, Harispe S, Delsuc F, Douzery EJP (2011) MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons. PLoS ONE 6(9): e22594.
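The abstract states that the pairwise method keeps Needleman-Wunsch complexity while tolerating frameshifts and stop codons. The sketch below is not MACSE's algorithm: it is a simplified, hypothetical Python illustration that aligns a nucleotide sequence against a reference protein (MACSE itself aligns nucleotide sequences with one another), using made-up scores (`MATCH`, `MISMATCH`, `GAP`, `FRAMESHIFT`, `STOP`) and hypothetical function names (`translate`, `frameshift_align`). It shows the core idea: besides the usual codon-versus-amino-acid and gap moves, the dynamic program allows moves that consume one or two nucleotides at a penalty, so a single inserted or deleted base costs one frameshift penalty instead of corrupting the rest of the reading frame. The table has (n+1)(m+1) cells and a constant number of moves per cell, so time and space stay O(nm).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical scores chosen for illustration only (not MACSE's parameters).
MATCH, MISMATCH = 2, -1
GAP = -4          # one amino acid without a codon, or one codon without an amino acid
FRAMESHIFT = -10  # any move that consumes only 1 or 2 nucleotides
STOP = -15        # in-frame stop codon aligned to an amino acid


def translate(codon):
    """Translate one codon; codons with unknown characters give 'X'."""
    return CODON_TABLE.get(codon.upper(), "X")


def frameshift_align(nt, prot):
    """Global DP of nucleotide sequence `nt` against protein `prot` (hypothetical sketch).

    O(len(nt) * len(prot)) time and space, like Needleman-Wunsch.
    Returns (score, ops); each op is (kind, nucleotides, amino_acid).
    """
    n, m = len(nt), len(prot)
    neg = float("-inf")
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i >= 3 and j >= 1:  # full codon vs amino acid
                aa = translate(nt[i - 3:i])
                s = STOP if aa == "*" else (MATCH if aa == prot[j - 1] else MISMATCH)
                cands.append((score[i - 3][j - 1] + s, "codon"))
            if j >= 1:  # amino acid with no codon
                cands.append((score[i][j - 1] + GAP, "del"))
            if i >= 3:  # whole codon with no amino acid
                cands.append((score[i - 3][j] + GAP, "ins"))
            for k in (1, 2):
                if i >= k:  # k extra nucleotides (frameshift insertion)
                    cands.append((score[i - k][j] + FRAMESHIFT, f"fs{k}"))
                if i >= k and j >= 1:  # incomplete codon (frameshift deletion)
                    cands.append((score[i - k][j - 1] + FRAMESHIFT, f"pc{k}"))
            # max() keeps the first candidate among ties, so tie-breaking is deterministic.
            score[i][j], back[i][j] = max(cands, key=lambda c: c[0])

    # Traceback from the bottom-right cell to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "codon":
            ops.append(("codon", nt[i - 3:i], prot[j - 1]))
            i, j = i - 3, j - 1
        elif move == "del":
            ops.append(("gap", "---", prot[j - 1]))
            j -= 1
        elif move == "ins":
            ops.append(("gap", nt[i - 3:i], "-"))
            i -= 3
        else:  # "fsK" or "pcK": frameshift consuming K nucleotides
            k = int(move[2])
            uses_aa = move.startswith("pc")
            ops.append(("frameshift", nt[i - k:i], prot[j - 1] if uses_aa else "-"))
            i, j = i - k, (j - 1 if uses_aa else j)
    ops.reverse()
    return score[n][m], ops


if __name__ == "__main__":
    # An extra 'T' after the start codon would shift the frame of everything downstream.
    total, ops = frameshift_align("ATGTAAACCCGGG", "MKPG")
    print(total)
    for op in ops:
        print(op)
```

With these toy scores, the extra `T` is reported as a single one-nucleotide frameshift and the remaining codons `AAA`, `CCC`, `GGG` still align to `K`, `P`, `G`, whereas a fixed-frame translation would read `TAA` as a stop codon. Real tools use substitution matrices and carefully tuned penalties instead of the flat values assumed here.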
Document type: Journal article

Cited literature: 53 references

https://hal-sde.archives-ouvertes.fr/hal-01773250
Contributor: Frederic Delsuc
Submitted on: Saturday, April 21, 2018 - 11:25:57 AM
Last modification on: Thursday, July 11, 2019 - 2:10:08 PM
Long-term archiving on: Tuesday, September 18, 2018 - 6:49:22 PM

File: Ranwez-PLoSONE11.pdf (publisher files allowed on an open archive)


Citation

Vincent Ranwez, Sébastien Harispe, Frédéric Delsuc, Emmanuel Douzery. MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons. PLoS ONE, Public Library of Science, 2011, 6 (9), ⟨10.1371/journal.pone.0022594⟩. ⟨hal-01773250⟩


Metrics: 204 record views, 216 file downloads