---
license: gpl-3.0
language:
- en
base_model:
- facebook/esm2_t33_650M_UR50D
tags:
- Antifungal
- Protein
- Bioinformatics
- Machine Learning
---

# AntiFP2: Fine-tuned ESM2 Antifungal Protein Classifier

This repository contains a fine-tuned ESM2 model that classifies antifungal proteins from their amino acid sequences. The model predicts a binary label indicating whether a protein is antifungal.

## Model Description

- **Base Model:** ESM2-t36-3B-UR50D (Fine-tuned)
- **Fine-tuning Task:** Binary antifungal protein classification.
- **Architecture:** ESM2 backbone with a linear classification head.
- **Input:** Protein amino acid sequences.
- **Output:** Binary labels (0 = non-antifungal, 1 = antifungal).

## Repository Contents

- `pytorch_model.bin`: Trained model weights.
- `alphabet.bin`: ESM2 alphabet (tokenizer).
- `config.json`: Model configuration.
- `README.md`: This file.

## Usage

### Installation

Install the required Python packages. The `esm` module imported below is published on PyPI as `fair-esm`:

```bash
pip install torch fair-esm biopython huggingface_hub
```

### Loading the Model from Hugging Face

```python
import json

import esm
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


# Define the classifier architecture (must match training)
class ProteinClassifier(nn.Module):
    def __init__(self, esm_model, embedding_dim, num_classes):
        super().__init__()
        self.esm_model = esm_model
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, tokens):
        # The ESM2 backbone is used without gradient tracking
        with torch.no_grad():
            results = self.esm_model(tokens, repr_layers=[36])
        # Mean-pool the final-layer (36) token representations per sequence
        embeddings = results["representations"][36].mean(1)
        output = self.fc(embeddings)
        return output


# Download model files from Hugging Face Hub
repo_id = "your-username/antifp2"
model_weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
alphabet_path = hf_hub_download(repo_id=repo_id, filename="alphabet.bin")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

# Load ESM2 backbone model
esm_model, alphabet = esm.pretrained.esm2_t36_3B_UR50D()

# Load configuration
with open(config_path, "r") as f:
    config = json.load(f)

# Initialize classifier
classifier = ProteinClassifier(
    esm_model,
    embedding_dim=config["embedding_dim"],
    num_classes=config["num_classes"],
)

# Load weights onto the CPU first so this also works on machines without a GPU
classifier.load_state_dict(torch.load(model_weights_path, map_location="cpu"))
classifier.eval()

# Load alphabet tokenizer (a pickled Python object, hence weights_only=False;
# only load files from a source you trust)
alphabet = torch.load(alphabet_path, weights_only=False)
batch_converter = alphabet.get_batch_converter()

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
classifier = classifier.to(device)
```

## Input Format

Input sequences must be provided as amino acid strings using standard single-letter codes.

## Output

The model outputs logits for two classes, which can be converted to probabilities using softmax. A sequence is labeled antifungal (1) if its class-1 probability exceeds a threshold (e.g., 0.5); otherwise it is labeled non-antifungal (0).

---
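The post-processing described in the Output section can be sketched in plain Python, so it runs without downloading the model. This is a minimal illustration, not the authors' inference code: the helper names `softmax` and `logits_to_predictions` and the example logits are hypothetical, and it assumes that index 1 of each logit pair corresponds to the antifungal class, following the label convention above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_predictions(batch_logits, threshold=0.5):
    """Map [non-antifungal, antifungal] logit pairs to (probability, label)."""
    predictions = []
    for logits in batch_logits:
        # Assumption: index 1 holds the antifungal class
        prob_antifungal = softmax(logits)[1]
        label = 1 if prob_antifungal > threshold else 0
        predictions.append((prob_antifungal, label))
    return predictions


# Hypothetical logits for three sequences, e.g. from `classifier(tokens).tolist()`
example_logits = [[2.0, -1.0], [-0.5, 1.5], [0.0, 0.0]]
for prob, label in logits_to_predictions(example_logits):
    print(f"P(antifungal) = {prob:.3f} -> label {label}")
```

When working directly with tensors, `torch.softmax(logits, dim=-1)[:, 1]` yields the same class-1 probabilities. Raising the threshold trades recall for precision, so it may be worth tuning on a validation set rather than fixing it at 0.5.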