Thank you for your work. I think integrating protein language models into Directed Evolution is a good idea.
However, I have a question. In your paper, Directed Evolution significantly increases the fitness of the mutated sequences, but these fitness values appear to be predictions from models trained on the training dataset, and the final predicted fitness values end up exceeding the range of the training labels. Is this reasonable? For example, I analyzed the fitness range of the avGFP training data you provided, which is 1.28 to 4.12. I suspect the model cannot predict fitness values above 4.12 accurately (due to the lack of training data in that region and the model's disregard for out-of-distribution inputs), yet your Directed Evolution method reports improving avGFP's fitness to an out-of-distribution value of 11.796, which seems unreasonable. What are your thoughts on this?
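To make the concern concrete, here is a minimal sanity check one could run on the predictions. This is my own sketch, not code from your repository; the function name and the illustrative label values are assumptions, and the only numbers taken from my comment above are the avGFP training range (1.28 to 4.12) and the reported final fitness (11.796).

```python
def out_of_range(predictions, train_labels):
    """Return the predictions lying outside the [min, max] range of the
    training labels, where a regression surrogate is extrapolating and
    its outputs should be treated with caution."""
    lo, hi = min(train_labels), max(train_labels)
    return [p for p in predictions if p < lo or p > hi]

# Illustrative training labels spanning the avGFP range I measured (1.28-4.12)
train_labels = [1.28, 2.50, 3.70, 4.12]
# Hypothetical predictions; 11.796 is the final fitness reported in the paper
predictions = [3.90, 4.05, 11.796]
print(out_of_range(predictions, train_labels))  # → [11.796]
```

A check like this would show how many of the evolved sequences fall in the extrapolation regime, which is where my doubt about the 11.796 figure comes from.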
Lastly, I want to say again that guiding directed evolution with the residue probability distributions of protein language models is a good idea. Thank you for your work and for the open-source code.