---
license: apache-2.0
tags:
- mistral
- 7b
- lora
- fine-tuning
- indic-align
- Malayalam
- conversational-ai
---

# Model Card for dhee-chat-mistral-ml

A fine-tuned Malayalam conversational model based on `mistralai/Mistral-7B-v0.3`, optimized for Malayalam language understanding and generation.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kyyUEQ3LVwmTge8zN496Kx-SyPn7e8rV?usp=sharing)

## Model Details

* **Base Model:** Mistral 7B v0.3
* **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
* **Dataset:** `ai4bharat/indic-align`
* **Language:** Malayalam
* **Model ID:** `dheeyantra/dhee-chat-mistral-ml`

## Intended Uses & Limitations

This model is intended for Malayalam conversational applications such as chatbots and virtual assistants. Because it is fine-tuned on the `ai4bharat/indic-align` dataset, its knowledge and conversational style are primarily shaped by that data.

Limitations:

* The model's responses reflect the patterns and information present in the training data; it may generate incorrect or biased content.
* Performance may vary with the complexity and nuance of the input.
* The model is focused on Malayalam and may not perform well in other languages or in code-mixed scenarios unless explicitly trained for them.

## How to Get Started with Hugging Face Transformers

You can use the following Python code to load the `dheeyantra/dhee-chat-mistral-ml` model and run inference:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "dheeyantra/dhee-chat-mistral-ml"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Move the model to GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Prepare the chat history
messages = [
    {"role": "User", "content": "എത്ര വേദങ്ങളുണ്ട്?"},  # "How many Vedas are there?"
    {"role": "Dhee", "content": "നാല് വേദങ്ങളുണ്ട്: ഋഗ്വേദം, യജുർവേദം, സാമവേദം, അഥർവവേദം."},  # "There are four Vedas: Rigveda, Yajurveda, Samaveda, Atharvaveda."
    {"role": "User", "content": "ഋഗ്വേദത്തെക്കുറിച്ച് കൂടുതൽ പറയൂ?"}  # "Tell me more about the Rigveda?"
]

# Build the prompt with the model's chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize the prompt
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a response
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.9,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (everything after the prompt)
generated_text = tokenizer.decode(output_ids[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True)

print("Generated text:")
print(generated_text)
```
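### Optional: 4-bit Quantized Loading

On GPUs with limited memory, a 7B model can usually be loaded with 4-bit quantization instead of full precision. The sketch below uses the standard `BitsAndBytesConfig` path from Transformers; the quantization settings shown are common defaults, not values tested by the model authors, and loading this way requires the `bitsandbytes` and `accelerate` packages plus a CUDA GPU.

```python
# A minimal sketch of 4-bit quantized loading via bitsandbytes.
# Assumption: these settings are generic defaults, not validated
# for this model. Requires `bitsandbytes` and `accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "dheeyantra/dhee-chat-mistral-ml"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normalized-float-4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on available devices automatically
)
# Prompting and generation then proceed exactly as in the example above.
```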
## Training Configuration

The model was fine-tuned with the following LoRA and training parameters. A sketch of how they map onto a training script is given in the appendix at the end of this card.

### LoRA Parameters:

* `r`: 16
* `target_modules`: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
* `lora_alpha`: 16
* `lora_dropout`: 0
* `bias`: "none"
* `use_gradient_checkpointing`: "unsloth"
* `use_rslora`: False
* `loftq_config`: None

### Training Arguments:

* `gradient_accumulation_steps`: 4
* `warmup_ratio`: 0.03
* `fp16`: True
* `optim`: "adamw_8bit"
* `max_seq_length`: 32768

## Acknowledgements

We extend our sincere gratitude to the following organizations for their invaluable contributions to this project:

* NxtGen: for generously providing the infrastructure that powered model training.
* AI4Bharat: for developing and releasing the indic-align dataset, which was crucial for fine-tuning this model.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{dheenxtgen2025,
  title={dhee-chat-mistral-ml: A Compact Language Model for Malayalam},
  author={Dheeyantra Research Labs},
  year={2025}
}
```

## Disclaimer

This model is provided as-is. Users should be aware of its potential limitations and biases before deploying it in any application. Responsible AI practices should be followed.
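## Appendix: Example Fine-Tuning Setup

For reference, here is a minimal sketch of how the values listed under Training Configuration map onto an Unsloth fine-tuning script; the parameter names above match Unsloth's `FastLanguageModel.get_peft_model` signature. This is a reconstruction, not the authors' released training code: values not stated on the card (output directory, batch size, learning rate) are placeholders.

```python
# Reconstruction of the LoRA setup from the values listed on this card.
# Hypothetical: the actual training script was not released. Placeholder
# values are marked as such; everything else comes from the card.
from unsloth import FastLanguageModel
from transformers import TrainingArguments

max_seq_length = 32768  # from "Training Arguments"

# Load the base model (Unsloth wraps the Hugging Face loading APIs).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mistralai/Mistral-7B-v0.3",
    max_seq_length=max_seq_length,
)

# Attach LoRA adapters using the values listed under "LoRA Parameters".
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    use_rslora=False,
    loftq_config=None,
)

# Training arguments from the card; batch size and learning rate are
# placeholders, since the card does not state them.
training_args = TrainingArguments(
    output_dir="outputs",           # placeholder
    per_device_train_batch_size=2,  # placeholder
    learning_rate=2e-4,             # placeholder
    gradient_accumulation_steps=4,
    warmup_ratio=0.03,
    fp16=True,
    optim="adamw_8bit",
)
# The trainer wiring (e.g. trl.SFTTrainer over ai4bharat/indic-align) is
# omitted, as the card does not describe the data preprocessing.
```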