FeatureREDUCE
I designed and continue to enhance the FeatureREDUCE algorithm, which parameterizes the relative affinity of all possible DNA sequences in terms of a small set of free energy parameters associated with base-pair substitutions or insertions/deletions, and their possible dependencies. In addition, our algorithm accounts for the considerable biases specific to protein-binding microarray (PBM) data. Using robust regression techniques, we quantify relative binding affinities with unprecedented accuracy.
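To make the idea of a free-energy parameterization concrete, here is a minimal Python sketch. It is not the FeatureREDUCE implementation. It assumes independent positions (no dependencies or insertions/deletions), a hypothetical reference site whose bases all have ΔΔG = 0, synthetic data in place of real PBM intensities, and a Huber-loss fit by iteratively reweighted least squares (IRLS) as a generic stand-in for the robust regression step. Under these assumptions the log relative affinity of a sequence s is ln(K(s)/K_ref) = -Σ_i ΔΔG_i(s_i)/RT, which is linear in substitution indicators.

```python
import numpy as np

BASES = "ACGT"

def substitution_features(seqs, ref):
    """Design matrix: intercept + one indicator per (position, non-reference base).

    The reference base at each position is the baseline, so its ddG is fixed at 0.
    """
    cols = [(i, b) for i, r in enumerate(ref) for b in BASES if b != r]
    X = np.zeros((len(seqs), 1 + len(cols)))
    X[:, 0] = 1.0
    for n, s in enumerate(seqs):
        for j, (i, b) in enumerate(cols):
            if s[i] == b:
                X[n, 1 + j] = 1.0
    return X, cols

def huber_irls(X, y, delta=1.345, n_iter=50):
    """Robust linear regression with a Huber loss via iteratively reweighted least squares."""
    w = np.ones(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        r = y - X @ beta
        # Robust residual scale from the median absolute deviation.
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, delta/u for large ones.
        w = delta / np.maximum(u, delta)
    return beta

# Synthetic example (hypothetical reference site and penalties, not real data).
rng = np.random.default_rng(0)
ref = "GATAAG"
n = 2000
seqs = ["".join(rng.choice(list(BASES), size=len(ref))) for _ in range(n)]
X, cols = substitution_features(seqs, ref)
true_ddg = rng.uniform(0.5, 3.0, size=len(cols))  # penalties in units of RT
y = X @ np.concatenate([[2.0], -true_ddg]) + rng.normal(0.0, 0.1, n)
y[:50] += 5.0  # a few gross outliers, e.g. artifactual bright spots

beta = huber_irls(X, y)
est_ddg = -beta[1:]  # estimated ddG for each (position, substituted base)
print(np.max(np.abs(est_ddg - true_ddg)))
```

The robust loss matters here because a handful of outlying measurements would otherwise pull every parameter; with Huber weights those points are strongly downweighted and the substitution penalties are recovered closely.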
The biophysical model inferred by FeatureREDUCE is interpretable and can provide clues about structural mechanisms and about how amino-acid sequence features determine binding specificity. Comparison with direct measurements of binding constants by SPR (surface plasmon resonance) and MITOMI (mechanically induced trapping of molecular interactions) allowed us to rigorously assess the accuracy of our sequence-to-affinity models. By combining models of spatial bias and nucleotide dependencies in a robust regression framework, we match MITOMI measurements of relative binding affinity with high accuracy (R² = 0.93, RMSE = 0.05).
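The two agreement statistics can be computed as in the short sketch below. It assumes that predicted and measured relative affinities are both rescaled so the strongest site equals 1, and that R² means the squared Pearson correlation; the coefficient of determination is another common definition, so this is an illustration rather than the exact evaluation procedure.

```python
import numpy as np

def compare_affinities(predicted, measured):
    """Return (R^2, RMSE) between predicted and measured relative affinities.

    Both inputs are rescaled so the strongest site has relative affinity 1
    (an assumed normalization). R^2 is the squared Pearson correlation.
    """
    p = np.asarray(predicted, dtype=float)
    m = np.asarray(measured, dtype=float)
    p = p / p.max()
    m = m / m.max()
    r = np.corrcoef(p, m)[0, 1]
    rmse = np.sqrt(np.mean((p - m) ** 2))
    return r ** 2, rmse

r2, rmse = compare_affinities([1.0, 0.5, 0.2], [1.0, 0.4, 0.2])
print(r2, rmse)
```

Because RMSE is computed on this normalized 0 to 1 scale, a value such as 0.05 corresponds to an average deviation of about 5% of the strongest site's affinity under this assumption.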
In a recent benchmark study comparing 26 algorithms for inferring transcription factor sequence specificity (Weirauch et al., Nature Biotechnology, 2013), FeatureREDUCE emerged as the top-performing algorithm.
The manuscript showcasing the first iteration of FeatureREDUCE will be submitted soon.
FeatureREDUCE can be downloaded here.