CQSB /

GianLMB committed 0894164 (verified) · 1 parent: b89042f

Update model card

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -19,8 +19,12 @@ Models for intrinsic disorder were evaluated on Critical Assessment of Intrinsic
  - For the CAID1 and CAID2 evaluation, models were trained exclusively on data from DisProt 7.0 database. These models are denoted with the suffix “DisProt7” (see table below).
  - For the CAID3 evaluation, the training set was expanded to also include data from both CAID1 and CAID2. These models are labelled with the suffix “ID”, and are the recommended models for intrinsic disorder prediction.

- Models for soft disorder classification are trained instead on the SoftDis dataset, derived from an extensive analysis of clusters of alternative structures for the same protein
- sequence in the Protein Data Bank (PDB). For each position in the represantitive sequence of each cluster, the dataset provides the frequency of closely-related homologs for which the corresponding residue is higly flexible or missing. Any position with a frequency higher than 0 is labeled as soft disordered. Models trained with this dataset are denoted with the suffix “SD”.
+ Models for soft disorder classification are trained instead on the [SoftDis](https://huggingface.co/datasets/CQSB/SoftDis) dataset, derived from an extensive analysis of clusters of alternative structures for the same protein
+ sequence in the Protein Data Bank (PDB). For each position in the represantitive sequence of each cluster, the dataset provides the frequency of closely-related homologs for which the corresponding residue is higly flexible or missing. Any position with a frequency higher than 0 is labeled as soft disordered.
+
+ The data split for model training corresponds in particular to the [id05](https://huggingface.co/datasets/CQSB/SoftDis/tree/main/splits/id05) configuration, that further clusters representative sequences for each structure at 0.5 sequence identity. See the [SoftDis Dataset card](https://huggingface.co/datasets/CQSB/SoftDis) for more details.
+
+ Models trained with this dataset are denoted with the suffix “SD”.


  ## Model checkpoints
@@ -42,7 +46,7 @@ We provide different model checkpoints, based on training data and pre-trained P
  | [Ankh-LoRA-SD](https://huggingface.co/CQSB/Ankh-LoRA-SD) | SoftDis | [ankh-large](https://huggingface.co/ElnaggarLab/ankh-large) |
  | [PortT5-LoRA-SD](https://huggingface.co/CQSB/ProtT5-LoRA-SD) | SoftDis | [prot_t5_xl_uniref5](Rostlab/prot_t5_xl_uniref50) |

- \* DisProt7, CAID1 and CAID2 data
+ \* Union of DisProt7, CAID1 and CAID2 datasets

  ## Intended uses & limitations
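The SoftDis labeling rule stated in the README (a per-position frequency of closely-related homologs in which the residue is highly flexible or missing, with any frequency above 0 labeled as soft disordered) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the CQSB repositories: the input format (one boolean mask per homolog structure, already aligned to the representative sequence) and the function names `soft_disorder_frequency` and `soft_disorder_labels` are hypothetical, and how a residue gets flagged as highly flexible in the first place is not covered.

```python
from typing import Sequence


def soft_disorder_frequency(masks: Sequence[Sequence[bool]]) -> list[float]:
    """Per-position fraction of homolog structures in which the residue is
    highly flexible or missing.

    Hypothetical input format (assumption): masks[k][i] is True when residue i
    of the representative sequence is highly flexible or missing in homolog k.
    """
    if not masks:
        raise ValueError("need at least one homolog structure")
    length = len(masks[0])
    if any(len(mask) != length for mask in masks):
        raise ValueError("every mask must be aligned to the representative sequence")
    n = len(masks)
    # sum() over booleans counts the homologs that flag position i
    return [sum(mask[i] for mask in masks) / n for i in range(length)]


def soft_disorder_labels(frequencies: Sequence[float]) -> list[int]:
    """Binary labels following the README rule: frequency > 0 -> soft disordered (1)."""
    return [1 if f > 0.0 else 0 for f in frequencies]


if __name__ == "__main__":
    # Four hypothetical homolog structures of a 4-residue representative sequence.
    masks = [
        [False, True, False, False],
        [False, True, True, False],
        [False, False, False, False],
        [False, True, False, False],
    ]
    freqs = soft_disorder_frequency(masks)
    print(freqs)                        # [0.0, 0.75, 0.25, 0.0]
    print(soft_disorder_labels(freqs))  # [0, 1, 1, 0]
```

Under this rule a single homolog flagging a position is enough to mark it soft disordered (position 2 in the demo, frequency 0.25). The sketch returns the frequencies separately from the labels so the graded signal is still available.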
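The id05 split is described as further clustering representative sequences at 0.5 sequence identity. To illustrate what such a threshold means, the sketch below uses a simple greedy scheme: sequences are visited longest first, and each one joins the first cluster whose representative it matches at identity of at least 0.5, otherwise it starts a new cluster. This is an assumption for illustration only. The commit does not say which clustering tool or identity definition SoftDis uses. Identity here is an assumed definition: identical residues along one optimal Needleman-Wunsch global alignment (match +1, mismatch -1, gap -1) divided by the length of the shorter sequence. The names `aligned_identity` and `greedy_cluster` are hypothetical.

```python
def aligned_identity(a: str, b: str) -> float:
    """Assumed identity: identical aligned residues / length of shorter sequence."""
    if not a or not b:
        return 0.0
    match, mismatch, gap = 1, -1, -1
    n, m = len(a), len(b)
    # score[i][j] = best global alignment score of a[:i] against b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical aligned pairs.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return identical / min(n, m)


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.5) -> dict[str, list[str]]:
    """Map each cluster representative to its members (hypothetical greedy scheme)."""
    clusters: dict[str, list[str]] = {}
    # Longest first, so each cluster is represented by its longest member.
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep in clusters:
            if aligned_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    toy = {"p1": "MKTAYIAKQR", "p2": "MKTAYIAKQL", "p3": "GGGGSSSSPP"}
    print(greedy_cluster(toy))  # {'p1': ['p1', 'p2'], 'p3': ['p3']}
```

In the toy example `p2` shares 9 of 10 aligned residues with `p1` (identity 0.9) and joins its cluster, while `p3` shares none and forms its own. The quadratic-time alignment keeps the sketch readable but makes it suitable only for small inputs; dedicated sequence-clustering tools would normally be used at PDB scale.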