---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: transformers
model_name: Doctor_AI_LoRA-Mistral-7B-Instructritvik77
tags:
- generated_from_trainer
- trl
- peft
- medical
- doctor
license: apache-2.0
datasets:
- FreedomIntelligence/medical-o1-reasoning-SFT
pipeline_tag: text-generation
---

# Model Card for Doctor_AI_LoRA-Mistral-7B-Instructritvik77

This model is a LoRA fine-tune of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) for medical question answering. It was trained with [TRL](https://github.com/huggingface/trl) on the [FreedomIntelligence/medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) dataset.

## Quick start

The snippet below loads the base model in 4-bit precision, attaches the LoRA adapter from this repository, and runs a sample medical prompt:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# ✅ Quantization config for 4-bit loading (memory optimization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # better precision for 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,         # reduces VRAM overhead
)

# ✅ Load the tokenizer from the fine-tuned checkpoint (keeps tokens consistent)
peft_model_id = "ritvik77/Doctor_AI_LoRA-Mistral-7B-Instructritvik77"
tokenizer = AutoTokenizer.from_pretrained(peft_model_id, trust_remote_code=True)

# ✅ Make sure a pad token is set
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# ✅ Load the base model with 4-bit quantization
model_name = "mistralai/Mistral-7B-Instruct-v0.3"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",                      # maps layers to available GPUs
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# ✅ Resize token embeddings BEFORE loading the LoRA adapter (prevents a size mismatch)
model.resize_token_embeddings(len(tokenizer))

# ✅ Load the PEFT adapter (LoRA weights)
model = PeftModel.from_pretrained(model, peft_model_id)
print("🚀 PEFT model loaded successfully with resized embeddings!")

# ✅ Optional sanity check: print the trainable-parameter summary to confirm the
#    adapter is attached. (Unfreezing LoRA layers is only needed if you intend
#    to continue training; it is not required for inference.)
if hasattr(model, "print_trainable_parameters"):
    model.print_trainable_parameters()
else:
    print("❗ Warning: LoRA adapter may not have loaded correctly.")

# ✅ Evaluation mode for inference
model.eval()

# ✅ Simple generation helper
def generate_response(prompt, max_new_tokens=300, temperature=0.7):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# ✅ Sample prompt for medical diagnosis
prompt = "Patient reports chest pain and shortness of breath. What might be the diagnosis?"
response = generate_response(prompt)
print("\n🩺 Diagnosis:", response)
```

## Training procedure

This model was trained with supervised fine-tuning (SFT) using TRL, with LoRA adapters applied to the base model, on the [FreedomIntelligence/medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) dataset.
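The exact training script is not included in this repository. The sketch below shows how a comparable QLoRA-style SFT run could be set up with TRL's `SFTTrainer` and PEFT. The hyperparameters (LoRA rank, learning rate, epochs, sequence length), the `"en"` dataset config, and the `Question`/`Complex_CoT`/`Response` column names are illustrative assumptions, not the exact values used for this checkpoint.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

base_model = "mistralai/Mistral-7B-Instruct-v0.3"

# 4-bit (QLoRA-style) loading of the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Medical Q&A with chain-of-thought reasoning. The "en" config and the
# Question / Complex_CoT / Response column names are assumptions about the
# dataset layout -- adjust if they differ.
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")

def to_text(example):
    messages = [
        {"role": "user", "content": example["Question"]},
        {"role": "assistant", "content": example["Complex_CoT"] + "\n\n" + example["Response"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_text, remove_columns=dataset.column_names)

# LoRA adapter configuration (rank, alpha, and target modules are illustrative)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = SFTConfig(
    output_dir="Doctor_AI_LoRA-Mistral-7B-Instruct",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,
    max_seq_length=1024,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
trainer.save_model()  # saves the LoRA adapter weights
```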
### Framework versions

- TRL: 0.15.2
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```