PratikBShinde committed
Commit b9c7555 · verified · 1 Parent(s): 293ada3

Update README.md

Files changed (1):
1. README.md (+102 -3)
README.md CHANGED

The previous version contained only the YAML front matter (`license: gpl-3.0`); the full updated file follows.
---
license: gpl-3.0
language:
- en
base_model:
- facebook/esm2_t33_650M_UR50D
tags:
- Antifungal
- Protein
- Bioinformatics
- Machine Learning
---

# AntiFP2: Fine-tuned ESM2 Antifungal Protein Classifier

This repository contains a fine-tuned ESM2 model for classifying antifungal proteins from their amino acid sequences. The model predicts a binary label indicating whether or not a protein is antifungal.

## Model Description

- **Base Model:** ESM2-t36-3B-UR50D (fine-tuned)
- **Fine-tuning Task:** Binary antifungal protein classification
- **Architecture:** ESM2 backbone with a linear classification head over mean-pooled final-layer embeddings
- **Input:** Protein amino acid sequences
- **Output:** Binary label (0 = non-antifungal, 1 = antifungal)

## Repository Contents

- `pytorch_model.bin`: Trained classifier weights (state dict)
- `alphabet.bin`: ESM2 alphabet (tokenizer)
- `config.json`: Model configuration with the fields read by the loading code (an illustrative example follows this list)
- `README.md`: This file

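As an illustration only, `config.json` is expected to carry at least the two fields the loading snippet below reads. The values here are assumptions (2560 is the embedding dimension of the 3B backbone, 2 the number of classes for a binary task), not a copy of the shipped file:

```json
{
  "embedding_dim": 2560,
  "num_classes": 2
}
```
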
## Usage

### Installation

Install the required Python packages (the `fair-esm` package provides the `esm` module imported below):

```bash
pip install torch fair-esm biopython huggingface_hub
```

### Loading the Model from Hugging Face

```python
import torch
import torch.nn as nn
import esm
from huggingface_hub import hf_hub_download
import json

# Define the classifier architecture (must match training)
class ProteinClassifier(nn.Module):
    def __init__(self, esm_model, embedding_dim, num_classes):
        super(ProteinClassifier, self).__init__()
        self.esm_model = esm_model
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, tokens):
        # Mean-pool the final (36th) layer representations, then classify
        with torch.no_grad():
            results = self.esm_model(tokens, repr_layers=[36])
        embeddings = results["representations"][36].mean(1)
        output = self.fc(embeddings)
        return output

# Download model files from the Hugging Face Hub
repo_id = "your-username/antifp2"
model_weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
alphabet_path = hf_hub_download(repo_id=repo_id, filename="alphabet.bin")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

# Load the ESM2 backbone model
esm_model, alphabet = esm.pretrained.esm2_t36_3B_UR50D()

# Load configuration
with open(config_path, 'r') as f:
    config = json.load(f)

# Initialize the classifier
classifier = ProteinClassifier(esm_model, embedding_dim=config['embedding_dim'], num_classes=config['num_classes'])

# Load the fine-tuned weights
classifier.load_state_dict(torch.load(model_weights_path, map_location="cpu"))
classifier.eval()

# Load the saved alphabet (tokenizer) and build a batch converter
# (the alphabet is pickled; newer PyTorch versions may need weights_only=False)
alphabet = torch.load(alphabet_path)
batch_converter = alphabet.get_batch_converter()

# Move the model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
classifier = classifier.to(device)
```

## Input Format

Input sequences must be provided as amino acid strings using the standard single-letter codes.

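A minimal sketch of preparing an input, continuing from the loading snippet above (the sequence string and its name are arbitrary placeholders for illustration, not curated antifungal data):

```python
# Tokenize one example sequence with the ESM batch converter
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
batch_labels, batch_strs, batch_tokens = batch_converter(data)
batch_tokens = batch_tokens.to(device)  # move tokens to the same device as the model
```
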
## Output

The model outputs logits for the two classes, which can be converted to probabilities with softmax. The predicted label is antifungal (1) if the antifungal-class probability exceeds a chosen threshold (e.g., 0.5).

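Continuing from the tokenization sketch above, a hedged example of converting the logits into a probability and a 0/1 prediction (0.5 threshold assumed, as noted):

```python
import torch.nn.functional as F

with torch.no_grad():
    logits = classifier(batch_tokens)        # shape: [batch_size, 2]
probs = F.softmax(logits, dim=-1)            # class probabilities
antifungal_prob = probs[:, 1]                # probability of class 1 (antifungal)
prediction = (antifungal_prob > 0.5).long()  # 1 = antifungal, 0 = non-antifungal
print(f"P(antifungal) = {antifungal_prob.item():.3f}, label = {prediction.item()}")
```
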
---