Functional regions of proteins have evolved specific patterns of amino acids tailored to the activity of the biomolecule. The functional regions of such protein families have been identified through large-scale mutagenesis experiments, in which the effect of each alteration on protein function was tested.
In this work, we propose an approach to identifying the functional regions of proteins that distinguishes residues with a strictly functional role from those that are important for the structural stability of the protein.
The proposed methodology is based on the hypothesis that an artificial evolution process driven by protein design, in the absence of any functional constraints, would give rise only to structural co-evolution events.
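Co-evolution between two alignment positions is often quantified with mutual information (MI). The abstract does not state which co-evolution measure is used, so the following is only a minimal sketch that assumes plain MI over an alignment given as a list of equal-length strings. The function names are hypothetical, and no sequence weighting, pseudocounts or background correction are applied.

```python
import math
from collections import Counter


def column(msa, i):
    """Return the residues at alignment position i across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Plain mutual information (in bits) between alignment columns i and j.

    Gaps are treated as an ordinary symbol; no weighting, pseudocounts or
    average-product correction are applied (illustrative assumption).
    """
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), c in counts_ij.items():
        p_ab = c / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Two perfectly coupled columns (A pairs with K, L pairs with E) give 1 bit.
print(mutual_information(["AK", "AK", "LE", "LE"], 0, 1))
```

Under the hypothesis above, one possible reading is to compute this score on both the natural and the designed alignments: pairs that score high in both would be candidate structural couplings, while pairs that score high only in the natural alignment would be candidate functional ones.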
Based on this hypothesis, we perform three analysis steps: (I) we identify conserved residues in natural sequences; (II) we identify conserved residues in artificial families obtained by computational design; and (III) we compare the results of (I) and (II), which allows us to annotate each residue as structural (orange), functional (green) or intermediate (yellow).
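The three steps can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that conservation is scored as normalized Shannon entropy over 20 amino acids plus the gap symbol, that the natural and designed alignments share the same column numbering, and that the thresholds `high` and `low` (hypothetical values) separate the three classes. Columns that are not conserved in the natural family are left unannotated (`None`).

```python
import math
from collections import Counter

# 20 standard amino acids plus the gap symbol '-' (assumed alphabet)
ALPHABET_SIZE = 21


def conservation(msa, i):
    """Normalized conservation of column i: 1 - H / log2(21), in [0, 1].

    1.0 means a single residue fills the column; lower values mean more
    variability. No sequence weighting or pseudocounts are applied.
    """
    n = len(msa)
    counts = Counter(seq[i] for seq in msa)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(ALPHABET_SIZE)


def annotate(natural_msa, designed_msa, high=0.8, low=0.5):
    """Steps (I)-(III): label each column by comparing the two alignments."""
    labels = []
    for i in range(len(natural_msa[0])):
        nat = conservation(natural_msa, i)   # step (I)
        des = conservation(designed_msa, i)  # step (II)
        # step (III): compare the two conservation scores
        if nat < high:
            labels.append(None)              # not conserved in nature
        elif des >= high:
            labels.append("structural")      # conserved even without function
        elif des < low:
            labels.append("functional")      # conserved only in natural sequences
        else:
            labels.append("intermediate")    # partially kept by design
    return labels
```

With this rule, a column conserved in both alignments is labeled structural, one conserved only in the natural alignment is labeled functional, and partial conservation in the designed alignment falls in between. The thresholds, sequence weighting and gap handling are all placeholders that would need tuning on real data.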
The analysis yields a detailed list of residues that potentially play a functional or a structural role in the proteins and therefore show a correlation signature. Many of the residues annotated with our method were verified by comparison with protein annotation databases.
Our results demonstrate the validity of our automated approach for identifying functional residues in protein families.
A large-scale analysis of the whole proteome using an automated algorithm based on our methodology could make an important contribution to the identification of functional protein regions. By designing protein complexes, our method could also be used to classify functional residues according to their involvement in protein-protein interactions.