|
--- |
|
license: apache-2.0 |
|
tags: |
|
- mistral |
|
- 7b |
|
- lora |
|
- fine-tuning |
|
- indic-align |
|
- Malayalam |
|
- conversational-ai |
|
--- |
|
|
|
# Model Card for dhee-chat-mistral-ml |
|
|
|
A fine-tuned Malayalam conversational model based on `mistralai/Mistral-7B-v0.3`, optimized for Malayalam language understanding and generation.
|
|
|
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kyyUEQ3LVwmTge8zN496Kx-SyPn7e8rV?usp=sharing)
|
|
|
## Model Details |
|
|
|
* **Base Model:** Mistral 7B v0.3 |
|
* **Fine-tuning Method:** LoRA (Low-Rank Adaptation) |
|
* **Dataset:** `ai4bharat/indic-align` |
|
* **Language:** Malayalam |
|
* **Model ID:** `dheeyantra/dhee-chat-mistral-ml` |
|
|
|
## Intended Uses & Limitations |
|
|
|
This model is intended for use in Malayalam conversational applications, such as chatbots and virtual assistants. As it is fine-tuned on the `ai4bharat/indic-align` dataset, its knowledge and conversational style are primarily shaped by this data. |
|
|
|
Limitations: |
|
* The model's responses are based on the patterns and information present in the training data. It may generate incorrect or biased information. |
|
* Performance may vary depending on the complexity and nuance of the input. |
|
* The model is primarily focused on Malayalam and may not perform well in other languages or code-mixed scenarios unless explicitly trained for them. |
|
|
|
## How to Get Started with Hugging Face Transformers |
|
|
|
You can use the following Python code to load and run inference with the `dheeyantra/dhee-chat-mistral-ml` model: |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "dheeyantra/dhee-chat-mistral-ml"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Prepare chat messages
messages = [
    {"role": "User", "content": "എത്ര വേദങ്ങളുണ്ട്?"},  # "How many Vedas are there?"
    {"role": "Dhee", "content": "നാല് വേദങ്ങളുണ്ട്: ഋഗ്വേദം, യജുർവേദം, സാമവേദം, അഥർവവേദം."},  # "There are four Vedas: Rigveda, Yajurveda, Samaveda, Atharvaveda."
    {"role": "User", "content": "ഋഗ്വേദത്തെക്കുറിച്ച് കൂടുതൽ പറയൂ?"}  # "Tell me more about the Rigveda?"
]

# Apply the model's chat template to build the prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize the prompt
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a response
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.9,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (everything after the prompt)
generated_text = tokenizer.decode(output_ids[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True)

print("Generated text:")
print(generated_text)
```
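
For quick experiments, the same conversation can also be run through the `text-generation` pipeline, which applies the model's chat template internally. This is a minimal alternative sketch, assuming a recent `transformers` release that accepts chat-formatted messages in the pipeline and an `accelerate` installation for `device_map="auto"`:

```python
from transformers import pipeline

# Build a text-generation pipeline; device_map="auto" places the model on a GPU when one is available.
pipe = pipeline("text-generation", model="dheeyantra/dhee-chat-mistral-ml", device_map="auto")

messages = [
    {"role": "User", "content": "എത്ര വേദങ്ങളുണ്ട്?"},
    {"role": "Dhee", "content": "നാല് വേദങ്ങളുണ്ട്: ഋഗ്വേദം, യജുർവേദം, സാമവേദം, അഥർവവേദം."},
    {"role": "User", "content": "ഋഗ്വേദത്തെക്കുറിച്ച് കൂടുതൽ പറയൂ?"}
]

# Passing chat messages makes the pipeline apply the chat template before generating.
result = pipe(messages, max_new_tokens=64, do_sample=True, temperature=0.9, top_p=0.95)

# The returned conversation includes the newly generated turn as its last message.
print(result[0]["generated_text"][-1]["content"])
```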
|
|
|
## Disclaimer |
|
This model is provided as-is. Users should be aware of its potential limitations and biases before deploying it in any application. Responsible AI practices should be followed. |
|
|
|
|
|
## Training Configuration |
|
|
|
The model was fine-tuned using the following LoRA and training parameters: |
|
|
|
### LoRA Parameters: |
|
* `r`: 16 |
|
* `target_modules`: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"] |
|
* `lora_alpha`: 16 |
|
* `lora_dropout`: 0 |
|
* `bias`: "none" |
|
* `use_gradient_checkpointing`: "unsloth" |
|
* `use_rslora`: False |
|
* `loftq_config`: None |
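
These parameter names (`use_gradient_checkpointing="unsloth"`, `use_rslora`, `loftq_config`) match Unsloth's `FastLanguageModel.get_peft_model` interface, so the adapter setup likely resembled the sketch below. This is illustrative rather than the exact training script; in particular, the 4-bit loading flag is an assumption not stated in this card:

```python
from unsloth import FastLanguageModel

# Load the base model through Unsloth (model name and max_seq_length taken from this card).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Mistral-7B-v0.3",
    max_seq_length=32768,
    load_in_4bit=True,  # assumption: QLoRA-style loading is not stated in this card
)

# Attach a LoRA adapter with the parameters listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    use_rslora=False,
    loftq_config=None,
)
```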
|
|
|
### Training Arguments: |
|
* `gradient_accumulation_steps`: 4 |
|
* `warmup_ratio`: 0.03 |
|
* `fp16`: True |
|
* `optim`: "adamw_8bit" |
|
* `max_seq_length`: 32768 |
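
Continuing from the previous sketch, these arguments map onto a TRL `SFTTrainer` run in the usual Unsloth recipe. The sketch below assumes an older TRL interface where `SFTTrainer` takes `tokenizer`, `max_seq_length`, and `dataset_text_field` directly (newer releases move these into `SFTConfig`); batch size, learning rate, epoch count, and the `indic-align` preprocessing are not given in this card, so those values are placeholders:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

# `train_dataset` is assumed to be ai4bharat/indic-align already rendered into a "text"
# column with the model's chat template; that preprocessing is not described in this card.
trainer = SFTTrainer(
    model=model,                       # LoRA-wrapped model from the previous sketch
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",         # placeholder column name
    max_seq_length=32768,
    args=TrainingArguments(
        output_dir="outputs",               # placeholder
        per_device_train_batch_size=2,      # placeholder: not stated in this card
        gradient_accumulation_steps=4,
        warmup_ratio=0.03,
        num_train_epochs=1,                 # placeholder: not stated in this card
        learning_rate=2e-4,                 # placeholder: not stated in this card
        fp16=True,
        optim="adamw_8bit",
        logging_steps=10,                   # placeholder
    ),
)

trainer.train()
```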
|
|
|
## Acknowledgements |
|
We extend our sincere gratitude to the following organizations for their invaluable contributions to this project: |
|
|
|
* NxtGen: For generously providing the infrastructure that powered model training.

* AI4Bharat: For developing and releasing the `indic-align` dataset, which was crucial for fine-tuning this model.
|
|
|
|
|
|
|
## Citation |
|
If you use this model in your research or applications, please cite: |
|
|
|
```bibtex |
|
@misc{dheenxtgen2025,
  title={dhee-chat-mistral-ml: A Compact Language Model for Malayalam},
  author={Dheeyantra Research Labs},
  year={2025}
}
|
``` |