---
license: cc-by-nc-nd-4.0
language:
- en
base_model: EleutherAI/pythia-410m
library_name: transformers
tags:
- biology
- scRNAseq
---

# Overview
This is the C2S-Pythia-410m-cell-type-conditioned-cell-generation model. It is built on the Pythia-410m architecture 
developed by EleutherAI and fine-tuned using Cell2Sentence (C2S) on a large collection of single-cell RNA sequencing 
(scRNA-seq) datasets from CellxGene and the Human Cell Atlas. Cell2Sentence is a technique that adapts large language 
models (LLMs) to single-cell biology by converting scRNA-seq data into "cell sentences": sequences of gene names 
ordered by descending expression level. This model is trained specifically for cell type-conditioned single-cell 
generation: given a cell type, it generates a cell sentence representing a realistic single-cell profile of that 
type.
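
To make the "cell sentence" idea concrete, here is a minimal sketch of the conversion for one cell. It is an illustration only, not the Cell2Sentence library's implementation: the function name `to_cell_sentence`, the choice to drop genes with zero expression, and the alphabetical tie-break are all assumptions made for this example.

```python
def to_cell_sentence(expression: dict[str, float]) -> str:
    """Turn one cell's gene -> expression mapping into a cell sentence.

    Assumptions for this sketch: genes with zero expression are omitted,
    and ties are broken alphabetically so the output is deterministic.
    """
    # Keep only genes that are actually expressed in this cell.
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by descending expression, then by gene name for ties.
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    # The cell sentence is the space-separated list of ranked gene names.
    return " ".join(gene for gene, _ in ranked)


# Toy expression values, made up for illustration.
example_cell = {"CD3E": 5.0, "MS4A1": 0.0, "CD8A": 2.0, "GZMB": 7.5}
sentence = to_cell_sentence(example_cell)
print(sentence)  # GZMB CD3E CD8A
```

The resulting string is ordinary text, which is what lets a language model such as Pythia be fine-tuned on it without architectural changes.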

# Training Data
This model was trained on over 57 million human and mouse cells gathered from over 800 single-cell RNA sequencing 
datasets from CellxGene and the Human Cell Atlas. The combined data covers a broad range of cell types and conditions
across multiple tissues in both species.

Each training cell sentence contained only the top 200 genes of the cell.
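
Because training used only the top 200 genes per cell sentence, longer sentences presumably need to be cut to the same length before being compared with model output. A minimal sketch, assuming the sentence is a space-separated string of gene names already ordered by descending expression (the helper name `truncate_cell_sentence` is hypothetical):

```python
def truncate_cell_sentence(sentence: str, max_genes: int = 200) -> str:
    """Keep only the first `max_genes` gene names of a cell sentence."""
    genes = sentence.split()
    # The sentence is assumed to be ordered already, so the first
    # `max_genes` entries are the most highly expressed genes.
    return " ".join(genes[:max_genes])


# Toy sentence with 250 placeholder gene names.
long_sentence = " ".join(f"GENE{i}" for i in range(250))
short_sentence = truncate_cell_sentence(long_sentence)
print(len(short_sentence.split()))  # 200
```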

# Tasks
This model is designed for:

- Cell type-conditioned single-cell generation: Generating single-cell profiles conditioned on specific cell types, allowing for the creation of synthetic cells that reflect the gene expression patterns of targeted cell types.
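
The sketch below shows how such a generation call might look with the `transformers` library. Several details are assumptions rather than facts from this card: the Hub repository ID in `MODEL_ID`, the prompt wording in `build_prompt`, and the sampling settings. The official Cell2Sentence code linked below defines the prompt format actually used in training, and output quality will likely depend on matching it.

```python
# Assumed Hub repository ID; replace with the actual ID of this model.
MODEL_ID = "vandijklab/C2S-Pythia-410m-cell-type-conditioned-cell-generation"


def build_prompt(cell_type: str, n_genes: int = 200) -> str:
    """Build a text prompt naming the target cell type.

    The wording is a hypothetical placeholder, not the training prompt.
    """
    return (
        f"Generate a list of {n_genes} genes in order of descending "
        f"expression for a {cell_type} cell. Cell sentence:"
    )


def generate_cell_sentence(cell_type: str, max_new_tokens: int = 800) -> str:
    """Sample one synthetic cell sentence for the given cell type."""
    # Imported here so the prompt helper can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(cell_type), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # assumed budget for ~200 gene names
        do_sample=True,                 # sample so repeated calls differ
        top_p=0.95,                     # assumed nucleus-sampling value
        pad_token_id=tokenizer.eos_token_id,
    )
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


if __name__ == "__main__":
    print(generate_cell_sentence("B"))
```

Running the script downloads the model weights on first use. The returned string is a space-separated list of gene names, which can be split and ranked to recover an approximate expression ordering.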


# Cell2Sentence Links
- GitHub: https://github.com/vandijklab/cell2sentence
- Paper: https://www.biorxiv.org/content/10.1101/2023.09.11.557287v3

# Pythia Links
- Paper: https://arxiv.org/pdf/2304.01373
- Hugging Face: https://huggingface.co/EleutherAI/pythia-410m