Commit History
feat: use default from_pretrained function
4ac66e4
feat(train): use new HF _do_init api
6b84155
fix: model compatible with do_init
f3a8cbb
feat: layernorm > rmsnorm in long runs
0f2cf98
fix: use cache correctly during inference + allow unscan (#170)
42968cf
feat: vmap optimizer (#166)
b993d27
feat: scan layers + gradient checkpointing (#161)
07a6f9a
Merge branch 'main' of https://github.com/borisdayma/dalle-mini into main
bcd360f
feat: better multi-node support (#158)
728a3c3
feat(text): support emojis (#154)
7ef7bd9
fix: smelu
7f2f8ed
fix: sinkformer
2c583b3
fix: support smelu
a2dcee4
feat: allow relative position (#156)
769d20a
feat: sinkhorn in lse mode (#155)
00d4661
fix: sinkformer gradient
eed4896
feat(model): allow bias (#152)
361a994
feat: add sinkformer + custom final ln + pre-ln (#151)
f139b0b
feat: placeholders for more config
69bcbeb
feat: force final ln in encoder
32f4ba5
feat: allow more configurations
5bd4c20
fix: DeepNet doesn't scale weights of embedding/output layers (#150)
503d6b4
Shuming Ma
feat: remove unnecessary LN
02824a7
feat: add cogview
472c4cc
fix(textnormalizer): consider utf8 on windows (#148)
3b8d8cb
illtellyoulater