

Repeat proteins are made with tandem copies of similar amino acid stretches that fold into elongated architectures. These proteins constitute excellent model systems to investigate how evolution relates to structure, folding, and function. Here, we propose a scheme to map evolutionary information at the sequence level to a coarse-grained model for repeat-protein folding and use it to investigate the folding of thousands of repeat proteins. We model the energetics by a combination of an inverse Potts-model scheme with an explicit mechanistic model of duplications and deletions of repeats to calculate the evolutionary parameters of the system at the single-residue level. These parameters are used to inform an Ising-like model that allows for the generation of folding curves, apparent domain emergence, and occupation of intermediate states that are highly compatible with experimental data in specific case studies. We analyzed the folding of thousands of natural Ankyrin repeat proteins and found that a multiplicity of folding mechanisms is possible. Fully cooperative all-or-none transitions are obtained for arrays with enough sequence-similar elements and strong interactions between them, while noncooperative element-by-element intermittent folding arises if the elements are dissimilar and the interactions between them are energetically weak. Additionally, we characterized nucleation-propagation and multidomain folding mechanisms. We show that the global stability and cooperativity of the repeating arrays can be predicted from simple sequence scores.

Repeat proteins are composed of tandem arrays of similar amino acid stretches. The repeats usually fold in recursive structural elements that pack against each other in a roughly periodic way, making the overall architecture of the arrays appear as elongated objects (4). In these arrays, folding domains are not easy to define and identify, as several, but not necessarily all, of the repetitions cooperate in the stabilization of structures (5). Being quasi-one-dimensional, the folding of the complete array is dominated by the local energetics within each repeat and its local neighbors, making the folding sensitive to small perturbations that may lead to the breakdown of cooperativity and the appearance of stable intermediates and subdomains (5). Notably, simple coarse-grained one-dimensional Ising-like models of repeat proteins have been found to be extremely useful for interpreting in vitro experiments (6). In general, the folding mechanisms are defined by an initial nucleation in some region of the array and the propagation of structure to its neighbors. When the local energetics are similar along the assemblage, parallel folding routes can be identified (7), and the routes can be switched by (de)stabilizing regions along the array (8). Thus, the energy landscape of repeat proteins appears "plastic" and very amenable to design (9). To what extent nature has exploited this opportunity is yet unknown.

Besides single-point mutations, the evolution of repeat proteins is thought to occur via duplications and deletions of large portions of primary structure, usually encompassing one or more repeats (10). These proteins are present in all taxa and are particularly abundant in eukaryotes, where they account for about 20% of the coded proteins. Their activity is usually associated with specific protein–protein interactions, with a versatility that can be equated to that of antibodies. In various cases, the detailed folding mechanism of the repeat arrays has been identified to play a major role in their biological function (11), but for most of the repeat arrays, it remains unknown. Here, we aim to use evolutionary information from repeat-protein systems to investigate the folding mechanisms of thousands of natural repeat arrays. We will make use of Ankyrin repeat proteins, as this is one of the most abundant families, and their folding mechanism can be well approximated with simple folding models (12). We hypothesize that the local energetics can be estimated with a maximum entropy model for the natural sequence statistics that results in a pairwise Potts model for amino acid interactions (13). We map the energetics of the sequences to an Ising model with one free parameter that we fit with experimental folding data. The resulting model is then applied to thousands of different sequences that fold to the same overall topology, revealing distinct folding routes, the emergence of subdomains, downhill scenarios, etc., in a variegated zoo of folding mechanisms. We found that the overall folding behavior of a complete repeat array can be well described with few global descriptors that can be directly calculated solely from sequence information.

Here, $\delta_{j,F}$ is the Kronecker symbol, taking the value one if element $j$ is folded and zero otherwise.
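
To illustrate how a one-dimensional Ising-like description of a repeat array gives rise to folding curves, the following minimal sketch enumerates all folded/unfolded configurations of a short array and computes the thermal occupancy of each element. It is not the parameterization used in this work: the form of the free energy, the function name `folded_occupancy`, the parameters `eps`, `J`, and `dS`, and all numerical values are assumptions chosen for illustration only.

```python
import itertools
import math


def folded_occupancy(eps, J, T, dS=5.0, kB=1.0):
    """Per-element folded probability of a toy 1D Ising-like repeat array.

    Assumed (hypothetical) free energy of a configuration s, with s[j] = 1
    if element j is folded and 0 otherwise:
        F(s) = sum_j s[j] * (eps[j] + T * dS) + sum_j s[j] * s[j+1] * J[j]
    eps[j]: internal energy of folded element j (negative = stabilizing)
    J[j]:   interface energy between elements j and j+1, counted only
            when both neighbours are folded (negative = stabilizing)
    dS:     entropy lost when one element folds (same for all elements)
    """
    n = len(eps)
    if len(J) != n - 1:
        raise ValueError("J must have one entry per neighbouring pair")
    Z = 0.0
    occ = [0.0] * n
    # Exact enumeration of the 2**n configurations; fine for short arrays.
    for s in itertools.product((0, 1), repeat=n):
        F = sum(s[j] * (eps[j] + T * dS) for j in range(n))
        F += sum(s[j] * s[j + 1] * J[j] for j in range(n - 1))
        w = math.exp(-F / (kB * T))
        Z += w
        for j in range(n):
            occ[j] += s[j] * w  # s[j] acts as the folded indicator of element j
    return [o / Z for o in occ]


if __name__ == "__main__":
    eps = [-5.0, -5.0, -5.0, -5.0]  # illustrative values, arbitrary units
    J = [-5.0, -5.0, -5.0]
    for T in (0.5, 1.0, 1.5, 2.0, 3.0):
        occ = folded_occupancy(eps, J, T)
        print(T, round(sum(occ) / len(occ), 3))
```

In this toy setting, making the `eps` values dissimilar or the `J` values weaker is the kind of change that, in the text, separates all-or-none transitions from element-by-element folding. Exact enumeration scales as 2^n, so for longer arrays a transfer-matrix calculation would be the natural replacement.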
