Theory of Sequence Effects and Concentration on Protein Aggregation in Amyloid Fibrils

by Caleb Huang

Principal Investigator: Dr. Jeremy Schmit, Associate Professor of Physics

Kansas State University Physics Department  REU Program


            My research this summer primarily involves developing analytical theory for the binding properties of the amyloid fibril, a sheet of aggregated proteins associated with common diseases such as Alzheimer’s disease, Parkinson’s disease, and type 2 diabetes. In short, I translate protein structures and dynamics, which are notoriously complex, onto a piece of paper. The running joke among theoretical physicists is to introduce the premise of an argument by “assuming a spherical cow.” This is an apt description of what I do: I first strip the properties of proteins down to their fundamentals, then add new variables piece by piece. Fundamentally, I want to answer why some proteins are more prone to aggregate than others and what factors influence how they aggregate.

            Research Description is an extended summary of my project. It is meant to be understandable to the general science-educated public. REU Social Life discusses the REU experience outside of research. About Me gives a brief bio about my background in physics and school.


Figure 1: The theoretical physicist’s perspective of the amyloid fibril.

Research Description

Amyloid Fibril and Diseases

            Ubiquitous in living organisms, the protein is an organic polymer consisting of a sequence of amino acids. It typically folds into a three-dimensional structure after being synthesized. Depending on its sequence, it may serve functions such as communication, structural support, and transport. However, under certain circumstances, the protein does not fold properly but instead binds to another protein via hydrogen bonds along the protein backbone. The result is a crystalline sheet of proteins called the amyloid fibril, which is associated with diseases such as Alzheimer’s disease, Parkinson’s disease, and type 2 diabetes. The register, R, measures how far two neighboring polymers are offset from perfect alignment (R=0 is in-register). It is a key property that affects the biochemistry of aggregates in vivo and therefore their physiological effects on the body. Amyloid fibrils tend to be close to in-register, whereas disordered protein aggregates form non-fibrillar amorphous “glops.”

            Current understanding of amyloid fibril formation is mostly experimentally determined and case-specific. The goal of this project is thus to construct a theoretical framework that determines general principles of amyloid fibril formation.

Figure 2: Cartoon illustrations of protein aggregation.


Biased Diffusion Model

            Fundamentally, the amyloid fibril is nothing more than polymers hydrogen bonding to each other. The differing properties of the amino acids are captured by differences in binding energy. For example, polar amino acids tend to be weakly binding, whereas nonpolar amino acids tend to be strongly binding due to entropic effects. Binding energies work in opposition to thermal energy and dictate how fast the bonds break. Qualitatively, positive binding energy favors forward binding (difficult to unbind), neutral binding energy has no effect (unbinding is purely thermal), and negative binding energy favors unbinding (easy to unbind).


Figure 3: The dissociation rate of the n-th amino acid, k-,n, is dictated by the binding energy.

Toy Models Describe Binding Dynamics

Microscale Binding (Amino Acids)

            The binding and unbinding dynamics can be modeled using the discrete random walk model. Intuitively, for moderate binding energies, the biased “walk” can be visualized as a dance where the polymer takes a few steps forward while occasionally taking a step back. The 1-D random walk is solvable analytically and provides insight into the relation between energy and the residence time (how long the polymer sticks). The 2-D model is only solvable numerically but is a more accurate representation of polymer behavior.

$$t_n(E_1,\dots,E_L) = p_{+,n}\,t_{n+1} + p_{-,n}\,t_{n-1} + \tau_n$$

$$p_{+,n}(E_n) = \frac{k_+}{k_+ + k_{-,n}(E_n)}$$
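
As a minimal sketch of the 1-D recursion above, the Python below computes the residence time t_1 for a chain of L bonds. Several ingredients are assumptions that the text does not state: energies are in units of k_B T, the dissociation rate is taken as k_(-,n) = k_+ exp(-E_n) (so E_n = 0 gives a forward probability of 1/2), the mean waiting time in state n is τ_n = 1/(k_+ + k_(-,n)), the unbound state n = 0 is absorbing, and the fully bound state n = L can only unbind. Under these assumptions the recursion reduces to a backward sweep over the differences d_n = t_n - t_(n-1). The function names are hypothetical.

```python
import math


def dissociation_rate(energy, k_plus=1.0):
    """Assumed form k_- = k_+ * exp(-E), with E in units of k_B T."""
    return k_plus * math.exp(-energy)


def forward_probability(energy, k_plus=1.0):
    """p_+ = k_+ / (k_+ + k_-(E)), the forward-step probability."""
    return k_plus / (k_plus + dissociation_rate(energy, k_plus))


def residence_time(energies, k_plus=1.0):
    """Mean time t_1 to reach the unbound state n = 0 starting from n = 1.

    With d_n = t_n - t_(n-1) and t_0 = 0, the recursion becomes
    d_n = (1 + k_+ * d_(n+1)) / k_(-,n), and d_L = 1 / k_(-,L)
    because the fully bound state can only unbind (an assumption).
    """
    d = 0.0  # stands in for the absent d_(L+1) term at the last bond
    for energy in reversed(energies):
        d = (1.0 + k_plus * d) / dissociation_rate(energy, k_plus)
    return d  # t_1 equals d_1 because t_0 = 0


if __name__ == "__main__":
    # Uniform chains of eight bonds at a few illustrative energies.
    for e in (-0.5, 0.0, 0.5, 1.0):
        print(f"E = {e:+.1f}  p_+ = {forward_probability(e):.3f}  "
              f"t_1 = {residence_time([e] * 8):.2f}")
```

In this sketch a neutral chain (all E_n = 0) has a residence time that grows only linearly with the number of bonds, while positive energies make it grow much faster, which is the qualitative link between energy and how long the polymer sticks.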

Macroscale Binding (Proteins)

            The information from the microscopic view of amino acid binding is incorporated in the macroscopic scale of protein polymer binding, namely the polymer off rate, denoted k_off(E_1,…,E_L,R). The off rate depends inversely on the residence time at binding, t_1(E_1,…,E_L). The polymer on rate, k_on(c,L), follows general diffusion rules and is proportional to concentration. While energy dictates how long a polymer stays, concentration dictates how rapidly polymers meet.


$$P_{\mathrm{inc}}(c,E_1,\dots,E_L,R) = \frac{k_{\mathrm{on}}(c,L)}{k_{\mathrm{on}}(c,L) + k_{\mathrm{off}}(E_1,\dots,E_L,R)}$$


            Polymer aggregation is concentration- and energy-dependent. It is roughly analogous to a polyamorous dating game, where each polymer represents a “person” with a certain “attraction” to others. Finding mates requires both a considerable number of opportunities to meet available candidates and a high mutual attraction factor. However, a high concentration of polymers can compensate for low attraction and vice versa.

Figure 4: a) High concentration and high attraction lead to high rates of aggregation. b) High concentration and low attraction lead to medium rates of aggregation. c) Low concentration and high attraction lead to medium rates of aggregation. d) Low concentration and low attraction lead to low rates of aggregation.
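
The four regimes of Figure 4 can be reproduced with a few lines of Python. This is only a sketch: the on rate is taken as k_on = (constant) x c, with any dependence on the chain length L folded into that assumed constant, and the concentrations and off rates below are arbitrary illustrative numbers, not values from the project.

```python
def incorporation_probability(concentration, k_off, k_on_per_conc=1.0):
    """P_inc = k_on / (k_on + k_off), with k_on = k_on_per_conc * concentration.

    k_on_per_conc is an assumed constant prefactor; any dependence of k_on
    on the chain length L is folded into it here.
    """
    k_on = k_on_per_conc * concentration
    return k_on / (k_on + k_off)


# Illustrative values only (arbitrary units), chosen to mimic Figure 4.
HIGH_C, LOW_C = 10.0, 0.1
STICKY_K_OFF, SLIPPERY_K_OFF = 0.1, 10.0  # high vs. low attraction

if __name__ == "__main__":
    cases = [
        ("a) high concentration, high attraction", HIGH_C, STICKY_K_OFF),
        ("b) high concentration, low attraction", HIGH_C, SLIPPERY_K_OFF),
        ("c) low concentration, high attraction", LOW_C, STICKY_K_OFF),
        ("d) low concentration, low attraction", LOW_C, SLIPPERY_K_OFF),
    ]
    for label, c, k_off in cases:
        print(f"{label}: P_inc = {incorporation_probability(c, k_off):.3f}")
```

Because only the ratio k_off/k_on enters, cases b) and c) give the same probability with these numbers, which is the sense in which concentration can compensate for low attraction and vice versa.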

            The probability of polymer binding converges to one at infinite concentration. Our model predicts that the growth rate converges to the diffusion rate and that the register converges to approximately L/3 for polymers of length L. At high concentrations, high registers (disordered states) are less strongly selected against (Figure 5 inset). This shows that our model fits well-known theory: generating well-ordered crystals requires time. However, different proteins aggregate differently even under similar conditions in the body, say equal concentrations and temperatures. Up to this point, we have assumed that all amino acids have equal binding energies. But, as previously discussed, what distinguishes one protein from another is its unique sequence of amino acids. Importantly, this affects the probability distribution of registers.
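
The saturation limit can be written out explicitly. The short derivation below is a sketch under two assumptions that the text does not spell out: the on rate is linear in concentration, k_on = κ c, and in the saturated limit each register R is populated with some weight w_R. Two candidate weightings are compared; the text does not say which one the model uses, and the second is shown only because it gives a value close to L/3.

```latex
% Assumption: k_on(c, L) = \kappa(L)\, c, i.e. linear in concentration.
\[
P_{\mathrm{inc}}
  = \frac{k_{\mathrm{on}}}{k_{\mathrm{on}} + k_{\mathrm{off}}}
  = \frac{1}{1 + k_{\mathrm{off}} / (\kappa c)}
  \;\longrightarrow\; 1
  \quad \text{as } c \to \infty .
\]

% Assumption: register R = -(L-1), ..., L-1 is populated with weight w_R.
\[
\langle |R| \rangle = \frac{\sum_{R} |R|\, w_R}{\sum_{R} w_R}
\]

% Candidate 1: every register equally likely, w_R = 1.
\[
\langle |R| \rangle = \frac{L(L-1)}{2L-1} \approx \frac{L}{2}
\]

% Candidate 2: weight proportional to the number of bonds, w_R = L - |R|.
\[
\langle |R| \rangle
  = \frac{L(L^2-1)/3}{L^2}
  = \frac{L^2-1}{3L}
  \approx \frac{L}{3}
\]
```

For L = 8, the two candidates give 56/15 ≈ 3.7 and 63/24 ≈ 2.6 respectively, compared with L/3 ≈ 2.7.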


Figure 5

Varied Binding Energies: Nature’s Favored Search Algorithm

            For uniform binding energies, off-register binding is energetically less favorable simply because it forms fewer hydrogen bonds. At R=0, there are eight hydrogen bonds, whereas at R=7, there is only one. The polymer off rate, k_off, increases exponentially as the register increases, thus lowering the binding probability of high registers (by the probability of polymer incorporation model, the binding probability falls off roughly in proportion to 1/(1+e^|R|)). However, there are no additional energy penalties or gains for off-registers.

Figure 6: At a low concentration (blue), the probability distribution is a larger peak centered around zero. At a high concentration (yellow), the probability distribution is wider, and higher registers are more likely.
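
The concentration dependence described in Figure 6 can be sketched numerically for uniform binding energies. The code below assumes that k_off grows by a factor of exp(energy_per_bond) for every unit of |R|, that k_on is linear in concentration, and that the probability of finding a given register is proportional to its incorporation probability. The parameter names and default values are hypothetical and chosen only for illustration.

```python
import math


def register_distribution(concentration, length=8, energy_per_bond=1.0,
                          k_off_in_register=1.0, k_on_per_conc=1.0):
    """Normalized probability of each register R = -(L-1), ..., L-1.

    Assumes P(R) is proportional to P_inc(c, R) = k_on / (k_on + k_off(R)).
    """
    k_on = k_on_per_conc * concentration
    weights = {}
    for r in range(-(length - 1), length):
        # Each lost bond multiplies k_off by exp(energy_per_bond) (assumed).
        k_off = k_off_in_register * math.exp(energy_per_bond * abs(r))
        weights[r] = k_on / (k_on + k_off)
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def mean_abs_register(distribution):
    """Expected value of |R| for a normalized register distribution."""
    return sum(abs(r) * p for r, p in distribution.items())


if __name__ == "__main__":
    for c in (0.01, 1.0, 100.0):
        dist = register_distribution(c)
        print(f"c = {c:>6}: P(R=0) = {dist[0]:.3f}, "
              f"<|R|> = {mean_abs_register(dist):.2f}")
```

In this sketch the low-concentration distribution is sharply peaked at R = 0, while at high concentration the peak flattens and the mean |R| rises, matching the qualitative trend in the figure.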

            We vary the energy distribution while keeping the total binding energy constant. One method to find the distributions that give the maximum and minimum disorder is to perform a constrained optimization search. To understand the behavior of distributions in between, we directly check certain energy distribution motifs. We assume that the dark green circles are nonpolar amino acids and the light green circles are polar amino acids. Established theory tells us that nonpolar amino acid pairs bind strongly, polar amino acid pairs bind weakly, and mixed polar and nonpolar amino acid pairs bind weakly. For the dimer energy motif, the R=1 binding state thus becomes highly unfavorable with all weak bonds, while R=2, like R=0, has alternating weak and strong bonds. The triblock with high energy exteriors follows a similar trend. For the diblock and triblock with interior high energies, the low registers are favorable while high registers are highly unfavorable.

Figure 7: The dimer and triblock with exterior high energies favor certain off registers, whereas the diblock and triblock with interior high energies favor low registers.
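
The bond-counting argument behind these motifs can be checked directly. In the sketch below, each motif is written as a hypothetical eight-residue pattern of nonpolar ("N") and polar ("P") residues with four of each, so the in-register bond energies are the same for every motif. The exact patterns in Figure 7 may differ. The code also assumes two identical strands, with residue i on one strand pairing with residue i + |R| on the other, and uses arbitrary illustrative values for the strong and weak bond energies.

```python
STRONG, WEAK = 2.0, 0.5  # illustrative bond energies, arbitrary units

# Hypothetical 8-residue patterns for the motifs discussed in the text.
MOTIFS = {
    "dimer": "NPNPNPNP",
    "diblock": "NNNNPPPP",
    "triblock, interior high": "PPNNNNPP",
    "triblock, exterior high": "NNPPPPNN",
}


def pair_energies(pattern, register):
    """Bond energies for two identical strands offset by `register`.

    Only nonpolar-nonpolar pairs bind strongly; all other pairs bind weakly.
    """
    shift = abs(register)
    overlap = len(pattern) - shift  # number of bonds that can still form
    return [
        STRONG if pattern[i] == "N" and pattern[i + shift] == "N" else WEAK
        for i in range(overlap)
    ]


def strong_bond_count(pattern, register):
    """Number of strong (nonpolar-nonpolar) bonds at a given register."""
    return sum(1 for e in pair_energies(pattern, register) if e == STRONG)


if __name__ == "__main__":
    for name, pattern in MOTIFS.items():
        counts = [strong_bond_count(pattern, r) for r in range(len(pattern))]
        print(f"{name:<24} strong bonds at R = 0..7: {counts}")
```

With these patterns the dimer has no strong bonds at R = 1 but three at R = 2, and the exterior-high triblock keeps both of its remaining bonds strong at R = 6, whereas the diblock and interior-high triblock lose their strong bonds steadily as the register grows.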

            Taking the expected value of the absolute value of the register over a continuous range of concentrations, we find that how this function varies with concentration depends on the energy distribution. The dimer and uniform chain share similar values and behaviors, as do the diblock and the triblock with high energy interiors. The triblock with high energy exteriors has a unique behavior of its own. While the shape of each function changes depending on the exact binding energies chosen, this grouping is consistent independent of energy. For normal energies and concentrations, the high disorder and low disorder grouping listed under “Intermediate concentration regime” roughly holds.

Figure 8

Applications to Bioinformatics and Medicine

            Many bioinformatics studies have identified aggregation-prone sequences using various algorithms. In a study by Fang et al., a machine learning algorithm classified several sequences as either amyloidogenic or non-amyloidogenic. One sequence identified as amyloidogenic was GKVQIVYK. Using our model, we arbitrarily shuffled the sequence (dashed) and calculated the average register. Among the arrangements of this set of amino acids, the identified sequence proved to be the lowest in disorder and is thus consistent with amyloidogenic properties.

Figure 9: Our model predicts that the amyloidogenic sequence is the least disordered.


            While our model alone cannot necessarily predict whether a polymer will form ordered or disordered aggregates, it can determine whether a sequence is more or less ordered than another. Then, experimental evidence can be used as upper or lower bound benchmarks for amyloid fibril formation. In the Aβ10-35 region, the normal sequence consists of two main nonpolar cores flanked with polar residues (block motif). In the Aβ10-35,scrambled region, the sequence is rearranged such that the sequence is alternating polar and nonpolar (dimer motif). Antzutkin et al. showed experimentally that while Aβ10-35 formed amyloid fibrils, Aβ10-35,scrambled formed nonfibrillar polymer aggregates. Our model correctly predicted that Aβ10-35,scrambled is more disordered than Aβ10-35. From this, we can deduce that the Aβ10-35 sequence rearranged to have either a hydrophobic core (yellow) or hydrophobic residues concentrated on one side (purple) will form even more ordered aggregates, whereas hydrophobic residues concentrated at both flanks (blue) will form highly disordered aggregates.

Figure 10


            Our findings on sequence-dependent disordered aggregation allow us to better characterize sequence motifs that are prone to form amyloid fibrils and their binding behaviors under specified conditions. This may have important implications for medical research, both in understanding disease mechanisms and in developing drugs.

Figure 11: Given information on concentration and binding energy, our model explains how polymers behave in solution.

REU Social Life

            As physicists, we tend to be more of an unusual breed, and this REU has been a great opportunity for us to find a shared identity. One thing that we have in common is our eccentric personalities, and that has tied us together. On the flip side, many of us come from very different walks of life, from extremely rural areas to big metropolitan cities, and we all learn from each other, whether it’s through spending hours discussing our journeys through college, constructing mathematical models of morality, or debating controversial topics in politics. Additionally, we naturally fraternize with the Math REU. In the evenings, we work together on Putnam math problems, go bowling, climb the rock walls, swim at the city pool, play racquet sports, and explore the city. I’ve lived in seven cities across four countries for various periods of time, all of them with populations ranging from one to twenty-four million, so no doubt Manhattan, Kansas (population 50,000), has been a nice change of scenery from city life. Overall, I can easily say that the REU has been one of the best experiences I’ve had in college.

Figure 12: (Left) Trip to see the dinosaurs at the Discovery Center. (Center) REU canoe day. (Right) Attack on an armed fleet of canoes.

Photo Credits: Ottillia Ni, our unofficial REU photographer.



About Me

I am originally from San Jose, California, but I lived in various cities in Taiwan during high school. My frequent moving from city to city and school to school has in part instilled an adventurous streak in me, not only in travel but also in academics. I started college interested in a wide range of subjects, considering majors anywhere from philosophy to engineering to education. I eventually settled on physics as I enjoy the problem solving aspect of it. However, physics, being the king of all sciences, opened doors to mathematics, chemistry, and biology (and speaking as a former avid stamp collector, I don’t necessarily appreciate the derision some physicists give to either stamp collecting or the other sciences, but that’s another story).

Now that I am finishing up my college degree, I still haven’t made up my mind about a particular field of specialization. I currently study biophysics and applied mathematics while dabbling in Latin literature at the University of California, Los Angeles. In addition to research, I work as a test prep instructor teaching the SAT, ACT, and GRE and as a tutor and mentor for middle school students from underserved areas. Previously, I volunteered with the UCLA Mobile Clinic, providing social services to the homeless in LA. I am interested in sustainable and community living, and I am actively involved in the student cooperative near my university, an independent housing organization owned by students and run by students. Ultimately, as a physicist, I hope that my work has a direct impact on the community, one cow at a time (Figure 13). In the future, I hope to pursue an MD/PhD, integrating theory with direct clinical applications.

Figure 13: Artistic rendering of the role of the physicist in medicine.


More about me: LinkedIn Profile

Group webpage: Schmit Research Group


This program is funded by the National Science Foundation through grant number PHYS-1461251. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.