We investigate the optimal condition of an enzyme by directly analyzing the sequence. We propose an embedding method to represent the amino acids and the construct information as vectors in the latent space.
File descriptions:
1.change_to_csv.java: Convert .fas data to .csv data
2.change_to_number.java: Convert the string data in the .csv file to real-valued or one-hot encoded form
3.Split_train_test.java: Split the data into a training set and a test set by a given ratio
4.create_probability.java: Generate the sampling ratio of positive and negative samples
5.create_samples.java: Generate positive and negative samples according to the sampling ratio
6.protein_learning.java: Learn the embeddings
7.check_matrix.java: Evaluate the accuracy of the model
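
To make the first two preprocessing steps concrete, here is a minimal sketch of reading FASTA-style records, writing them as CSV rows, and one-hot encoding a residue. It is not the code in change_to_csv.java or change_to_number.java: the class name `PreprocessSketch`, the method names, the CSV layout (header followed by one residue per column), and the 21-symbol alphabet (20 standard amino acids plus `-` for alignment gaps) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names and layout are illustrative, not the repository's.
final class PreprocessSketch {
    // Assumed alphabet: the 20 standard amino acids plus '-' for alignment gaps.
    static final String ALPHABET = "ACDEFGHIKLMNPQRSTVWY-";

    /** Parses FASTA lines into {header, sequence} pairs, joining wrapped sequence lines. */
    static List<String[]> parseFasta(List<String> lines) {
        List<String[]> records = new ArrayList<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there is one.
                if (header != null) records.add(new String[] {header, seq.toString()});
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line);
            }
        }
        if (header != null) records.add(new String[] {header, seq.toString()});
        return records;
    }

    /** Formats one record as a CSV row: header first, then one residue per column. */
    static String toCsvRow(String[] record) {
        // Commas in the header are replaced so they cannot break the column layout.
        StringBuilder row = new StringBuilder(record[0].replace(',', ' '));
        for (char c : record[1].toCharArray()) row.append(',').append(c);
        return row.toString();
    }

    /** One-hot encodes a residue; symbols outside ALPHABET map to an all-zero vector. */
    static int[] oneHot(char residue) {
        int[] vector = new int[ALPHABET.length()];
        int index = ALPHABET.indexOf(Character.toUpperCase(residue));
        if (index >= 0) vector[index] = 1;
        return vector;
    }

    public static void main(String[] args) {
        List<String> fasta = List.of(">enzyme1", "AC-", "DE", ">enzyme2", "GHIKL");
        for (String[] record : parseFasta(fasta)) {
            System.out.println(toCsvRow(record));
        }
        System.out.println(Arrays.toString(oneHot('C')));
    }
}
```

The all-zero vector for unknown symbols is one possible convention; the actual scripts may handle ambiguous residues differently.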
Li, X., Dou, Z., Sun, Y. et al. A sequence embedding method for enzyme optimal condition analysis. BMC Bioinformatics 21, 512 (2020). https://doi.org/10.1186/s12859-020-03851-5