---
pipeline_tag: text-generation
inference: true
widget:
  - text: "public class HelloWorld {\n public static void main(String[] args) {"
    example_title: Hello world
    group: Java
license: bigcode-openrail-m
datasets:
  - bigcode/starcoderdata
metrics:
  - code_eval
library_name: transformers
language:
  - code
tags:
  - NarrowTransformer
model-index:
  - name: NT-Java-1.1B
    results:
      - task:
          type: text-generation
        dataset:
          type: nuprl/MultiPL-E
          name: MultiPL-HumanEval (Java)
        metrics:
          - name: pass@1
            type: pass@1
            value: 18.3
            verified: false
extra_gated_prompt: >-
  ## Model License Agreement

  Please read the BigCode [OpenRAIL-M
  license](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement)
  agreement before accepting it.
extra_gated_fields:
  I accept the above license agreement, and will use the Model complying with the set of use restrictions and sharing requirements: checkbox
duplicated_from: bigcode-data/starcoderbase-1b
---
# NT-Java-1.1B
## Table of Contents
1. [Model Summary](#model-summary)
2. [Intended Uses](#intended-uses)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)
## Model Summary
The Narrow Transformer (NT) model NT-Java-1.1B is an open-source, specialized code model built by extending the pre-training of StarCoderBase-1B, and is designed for Java coding tasks. The model is a decoder-only transformer with Multi-Query Attention and a context length of 8192 tokens. It was trained on the Java subset of the StarCoderData dataset, which is ~22B tokens.
- **Repository:** [bigcode/Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Paper:**
- **Language(s):** Java
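The architecture details above can be checked directly against the released checkpoint's configuration. The snippet below is a minimal sketch; it assumes the checkpoint exposes the usual GPT-BigCode-style config fields (`n_positions`, `multi_query`) via the standard `transformers` API.
```python
# pip install -q transformers
from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the architecture.
config = AutoConfig.from_pretrained("infosys/NT-Java-1.1B")
print(config.model_type)   # architecture family of the checkpoint
print(config.n_positions)  # maximum context length (8192 per this card)
print(config.multi_query)  # Multi-Query Attention flag
```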
## Intended Uses
Large code models require specialized hardware such as GPUs for inference, which highlights the need for small code models that can be deployed on developer desktops. Being a small language model (SLM), NT-Java-1.1B can run on consumer-grade PCs and outperforms comparably sized open-source code models on Java programming tasks. Feel free to explore this model for your Java projects!
The quantized versions of NT-Java-1.1B, [NT-Java-1.1B-GGUF](https://huggingface.co/infosys/NT-Java-1.1B-GGUF), perform comparably to open 1B models on the MultiPL-E Java code benchmark and can be used with multiple frameworks, including CTranslate2 and GPT4All, making them versatile for various deployment scenarios.
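As one illustration of a CPU-only setup, the sketch below loads a GGUF file with `llama-cpp-python`; the quantization filename is hypothetical and should be replaced with an actual `.gguf` file from the NT-Java-1.1B-GGUF repository.
```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Hypothetical filename; pick a real .gguf file from infosys/NT-Java-1.1B-GGUF.
llm = Llama(model_path="NT-Java-1.1B_Q4_K_M.gguf", n_ctx=8192)

prompt = "public class HelloWorld {\n    public static void main(String[] args) {"
output = llm(prompt, max_tokens=64)
print(output["choices"][0]["text"])
```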
**Feel free to share your generations in the Community tab!**
## Primary Use Cases
The model is intended for commercial use in Java programming tasks. It is suitable for applications that require:
1. Memory/compute constrained environments
2. Latency-bound scenarios
3. Code generation and completion tasks in Java
4. Fill-in-the-middle (FIM) tasks in Java (see the fill-in-the-middle sketch below)
### Generation
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "infosys/NT-Java-1.1B"
device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
inputs = tokenizer.encode("public class HelloWorld {\n public static void main(String[] args) {", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
#### Quantized Versions through `bitsandbytes`
* _Using 8-bit precision (int8)_
```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
# to use 4bit use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
checkpoint = "infosys/NT-Java-1.1B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, quantization_config=quantization_config)
inputs = tokenizer.encode("public class HelloWorld {\n public static void main(String[] args) {", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
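### Fill-in-the-middle
Since NT-Java-1.1B extends StarCoderBase-1B, fill-in-the-middle prompting should follow the same special-token format. The sketch below assumes the StarCoder-style `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` tokens are present in the tokenizer.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "infosys/NT-Java-1.1B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# StarCoder-style FIM prompt: the model fills in the method body between prefix and suffix.
input_text = "<fim_prefix>public int add(int a, int b) {\n    <fim_suffix>\n}<fim_middle>"
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```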
### Attribution & Other Requirements
The pretraining dataset of the model was filtered for permissively licensed code only. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected. We provide a [search index](https://huggingface.co/spaces/bigcode/starcoder-search) that lets you search through the pretraining data to identify where generated code came from and apply the proper attribution to your code.
## Limitations
The model, NT-Java-1.1B, has been trained on publicly available datasets and comes without any safety guarantees. Like all language models, its outputs cannot be reliably predicted, and generated code is not guaranteed to work as intended. The code can also be inefficient and may contain bugs or exploits. It is therefore crucial for users and developers to conduct thorough safety testing and to implement filtering mechanisms tailored to their needs.
## Training
### Model
- **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective
- **Pretraining steps:** 50k
- **Pretraining tokens:** 22 Billion
- **Precision:** bfloat16
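Because pretraining was done in bfloat16, loading the weights in the same precision roughly halves memory compared with float32 (about 2.2 GB of weights for 1.1B parameters). A minimal sketch using the standard `torch_dtype` argument:
```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint in its native bfloat16 precision.
model = AutoModelForCausalLM.from_pretrained(
    "infosys/NT-Java-1.1B", torch_dtype=torch.bfloat16
)
```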
### Hardware
- **GPUs:** 6 NVIDIA A100 80GB
- **Training time:** 4 days
### Software
- **Orchestration:** [Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
## License
The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).
## Citation
```
@article{rathinasamy2024narrow,
title={Narrow Transformer: StarCoder-Based Java-LM For Desktop},
author={Kamalkumar Rathinasamy and Balaji A J and Rajab Ali Mondal and Ankush Kumar and Harshini K and Gagan Gayari and Sreenivasa Raghavan Karumboor Seshadri},
year={2024},
eprint={2407.03941},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```