Updated cell and gene numbers for diverse task Pythia C2S model
README.md
CHANGED
@@ -24,8 +24,10 @@ This model was trained on over 57 million human and mouse cells gathered from ov
 datasets from CellxGene and the Human Cell Atlas. This dataset covers a broad range of cell types and conditions
 from multiple tissues in both human and mouse.
 
-This model was trained with a variable number of genes per cell sentence
-
+This model was trained with a variable number of genes per cell sentence, with a maximum context length of 8192 tokens.
+The context length of the default Pythia model was extended using rotary positional embeddings prior to C2S training.
+- Cells: For multi cell samples, each training sample contained between 5 and 20 cells, with the same number of genes for each of the cells in the same sample.
+- Genes: For single cell samples, each cell sentence contained between 100 and 2048 genes. For multi cell samples, each cell sentence per cell contained between 100 and 400 genes.
 
 # Tasks
 This model is designed for the following tasks:
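
The cell and gene ranges added in this change can be read as a simple range check on a sample before it is turned into a prompt. The following is a minimal sketch, not part of the C2S codebase: the function name `check_sample` and the constant names are hypothetical, and it only checks gene and cell counts, since how many tokens a gene name takes depends on the tokenizer.

```python
# Ranges as stated in the README change; the names here are illustrative only.
SINGLE_CELL_GENES = (100, 2048)  # genes per cell sentence, single cell samples
MULTI_CELL_GENES = (100, 400)    # genes per cell sentence, multi cell samples
MULTI_CELL_CELLS = (5, 20)       # cells per multi cell sample


def check_sample(genes_per_cell):
    """Return a list of problems; an empty list means the sample is in range.

    genes_per_cell holds one gene count per cell in the sample.
    """
    n_cells = len(genes_per_cell)
    if n_cells == 0:
        return ["sample contains no cells"]

    problems = []
    if n_cells == 1:
        low, high = SINGLE_CELL_GENES
    else:
        low, high = MULTI_CELL_GENES
        min_cells, max_cells = MULTI_CELL_CELLS
        if not min_cells <= n_cells <= max_cells:
            problems.append(
                f"{n_cells} cells is outside the range {min_cells}-{max_cells}"
            )
        # Multi cell training samples used the same gene count for every cell.
        if len(set(genes_per_cell)) != 1:
            problems.append("cells do not all have the same number of genes")

    for index, count in enumerate(genes_per_cell):
        if not low <= count <= high:
            problems.append(
                f"cell {index}: {count} genes is outside the range {low}-{high}"
            )
    return problems


if __name__ == "__main__":
    print(check_sample([1000]))       # single cell sample
    print(check_sample([200] * 10))   # multi cell sample with 10 cells
    print(check_sample([200] * 3))    # fewer cells than seen in training
```

Inputs outside these ranges are not necessarily invalid; they simply fall outside what the README says the model saw during training.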