---
title: LangGraph Data Analyst Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.28.0"
app_file: app.py
pinned: false
license: mit
---

# 🤖 LangGraph Data Analyst Agent

An intelligent data analyst agent built with LangGraph that analyzes customer support conversations with advanced memory, conversation persistence, and query recommendations.

## 🌟 Features

### Core Functionality

- **Multi-Agent Architecture**: Separate specialized agents for structured and unstructured queries
- **Query Classification**: Automatic routing to the appropriate agent based on query type
- **Rich Tool Set**: Comprehensive tools for data analysis and insights

### Advanced Memory & Persistence

- **Session Management**: Persistent conversations across page reloads and browser sessions
- **User Profile Tracking**: The agent learns and remembers user interests and preferences
- **Conversation History**: Full context retention using LangGraph checkpointers
- **Cross-Session Continuity**: Resume conversations using session IDs

### Intelligent Recommendations

- **Query Suggestions**: AI-powered recommendations based on conversation history
- **Interactive Refinement**: Collaborative query building with the agent
- **Context-Aware**: Suggestions based on the user profile and previous interactions

## 🏗️ Architecture

The agent uses LangGraph's multi-agent architecture with the following components:

```
User Query → Classifier → [Structured Agent | Unstructured Agent | Recommender] → Summarizer → Response
                                           ↓
                           Tool Nodes (Dataset Analysis Tools)
```

### Agent Types

1. **Structured Agent**: Handles quantitative queries (statistics, examples, distributions)
2. **Unstructured Agent**: Handles qualitative queries (summaries, insights, patterns)
3. **Query Recommender**: Suggests follow-up questions based on context
4.
**Summarizer**: Updates the user profile and conversation memory

## 🚀 Setup Instructions

### Prerequisites

- **Python Version**: 3.9 or higher
- **API Key**: OpenAI API key or Nebius API key
- **For Hugging Face Spaces**: Ensure your API key is set as a Space secret

### Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd Agents
   ```

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Configure API Key**: Create a `.env` file in the project root:
   ```bash
   # For OpenAI (recommended)
   OPENAI_API_KEY=your_openai_api_key_here

   # OR for Nebius
   NEBIUS_API_KEY=your_nebius_api_key_here
   ```

4. **Run the application**:
   ```bash
   streamlit run app.py
   ```

5. **Access the app**: Open your browser to `http://localhost:8501`

### Alternative Deployment

#### For Hugging Face Spaces:

1. **Fork or upload this repository to Hugging Face Spaces**
2. **Set your API key as a Space secret:**
   - Go to your Space settings
   - Navigate to "Variables and secrets"
   - Add a secret named `NEBIUS_API_KEY` or `OPENAI_API_KEY`
   - Enter your API key as the value
3. **The app will start automatically**

#### For other cloud deployment:

```bash
export OPENAI_API_KEY=your_api_key_here
# OR
export NEBIUS_API_KEY=your_api_key_here
```

## 🎯 Usage Guide

### Query Types

#### Structured Queries (Quantitative Analysis)

- "How many records are in each category?"
- "What are the most common customer issues?"
- "Show me 5 examples of billing problems"
- "Get distribution of intents"

#### Unstructured Queries (Qualitative Analysis)

- "Summarize the refund category"
- "What patterns do you see in payment issues?"
- "Analyze customer sentiment in billing conversations"
- "What insights can you provide about technical support?"

#### Memory & Recommendations

- "What do you remember about me?"
- "What should I query next?"
- "Advise me what to explore"
- "Recommend follow-up questions"

### Session Management

#### Creating Sessions

- **New Session**: Click "🆕 New Session" to start fresh
- **Auto-Generated**: Each new browser session gets a unique ID

#### Resuming Sessions

1. Copy your session ID from the sidebar (e.g., `a1b2c3d4...`)
2. Enter the full session ID in "Join Existing Session"
3. Click "🔗 Join Session" to resume

#### Cross-Tab Persistence

- Open multiple tabs with the same session ID
- Conversations sync across all tabs
- Memory and user profile persist

## 🧠 Memory System

### User Profile Tracking

The agent automatically tracks:

- **Interests**: Topics and categories you frequently ask about
- **Expertise Level**: Inferred from question complexity (beginner/intermediate/advanced)
- **Preferences**: Analysis style preferences (quantitative vs. qualitative)
- **Query History**: Recent questions for context

### Conversation Persistence

- **Thread-based**: Each session has a unique thread ID
- **Checkpoint System**: LangGraph automatically saves state after each interaction
- **Cross-Session**: Resume conversations days or weeks later

### Memory Queries

Ask the agent what it remembers:

```
"What do you remember about me?"
"What are my interests?"
"What have I asked about before?"
```

## 🔧 Testing the Agent

### Basic Functionality Tests

1. **Classification Test**:
   ```
   Query: "How many categories are there?"
   Expected: Routes to Structured Agent → Uses get_dataset_stats tool
   ```

2. **Follow-up Memory Test**:
   ```
   Query 1: "Show me billing examples"
   Query 2: "Show me more examples"
   Expected: Agent remembers previous context about billing
   ```

3. **User Profile Test**:
   ```
   Query 1: "I'm interested in refund patterns"
   Query 2: "What do you remember about me?"
   Expected: Agent mentions interest in refunds
   ```

4. **Recommendation Test**:
   ```
   Query: "What should I query next?"
   Expected: Personalized suggestions based on history
   ```

### Advanced Feature Tests

1.
**Session Persistence**:
   - Ask a question, then reload the page
   - Verify the conversation history remains
   - Verify the user profile persists

2. **Cross-Session Memory**:
   - Note your session ID
   - Close the browser completely
   - Reopen and join the same session
   - Verify full conversation and profile restoration

3. **Interactive Recommendations**:
   ```
   User: "Advise me what to query next"
   Agent: "Based on your interest in billing, you might want to analyze refund patterns."
   User: "I'd rather see examples instead"
   Agent: "Then I suggest showing 5 examples of refund requests."
   User: "Please do so"
   Expected: Agent executes the refined query
   ```

## 📁 File Structure

```
Agents/
├── README.md            # This file
├── requirements.txt     # Python dependencies
├── .env                 # API keys (create this)
├── app.py               # LangGraph Streamlit app
├── langgraph_agent.py   # LangGraph agent implementation
├── agent-memory.ipynb   # Memory example notebook
├── test_agent.py        # Test suite
└── DEPLOYMENT_GUIDE.md  # Original deployment guide
```

## 🛠️ Technical Implementation

### LangGraph Components

**State Management**:
```python
from typing import Any, Dict, List, Optional, TypedDict

class AgentState(TypedDict):
    messages: List[Any]
    query_type: Optional[str]
    user_profile: Optional[Dict[str, Any]]
    session_context: Optional[Dict[str, Any]]
```

**Tool Categories**:
- **Structured Tools**: Statistics, distributions, examples, search
- **Unstructured Tools**: Summaries, insights, pattern analysis
- **Memory Tools**: Profile updates, preference tracking

**Graph Flow**:
1. **Classifier**: Determines the query type
2. **Agent Selection**: Routes to the appropriate specialist
3. **Tool Execution**: Dynamic tool usage based on needs
4. **Memory Update**: Profile and context updates
5.
**Response Generation**: Final answer with memory integration

### Memory Architecture

- **Checkpointer**: LangGraph's `MemorySaver` for conversation persistence
- **Thread Management**: Unique thread IDs for session isolation
- **Profile Synthesis**: LLM-powered extraction of user characteristics
- **Context Retention**: Full conversation history with temporal awareness

## 🔍 Troubleshooting

### Common Issues

1. **API Key Errors**:
   - Verify the `.env` file exists and contains the correct key
   - Check that the environment variable is set in your deployment
   - Ensure the API key has sufficient credits

2. **Memory Not Persisting**:
   - Verify the session ID remains consistent
   - Check that browser localStorage is not being cleared
   - Ensure the `thread_id` parameter is passed correctly

3. **Dataset Loading Issues**:
   - Check your internet connection for Hugging Face datasets
   - Verify the `datasets` library is installed
   - Try clearing the Streamlit cache: `streamlit cache clear`

4. **Tool Execution Errors**:
   - Verify all dependencies in `requirements.txt` are installed
   - Check that the dataset is properly loaded
   - Review error messages in the Streamlit interface

### Debug Mode

Enable debug logging by setting:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 🎓 Learning Objectives

This implementation demonstrates:

1. **LangGraph Multi-Agent Systems**: Specialized agents for different query types
2. **Memory & Persistence**: Conversation continuity across sessions
3. **Tool Integration**: Dynamic tool selection and execution
4. **State Management**: Complex state updates and routing
5.
**User Experience**: Session management and interactive features

## 🚀 Future Enhancements

Potential improvements:

- **Database Persistence**: Replace `MemorySaver` with a PostgreSQL checkpointer
- **Advanced Analytics**: More sophisticated data analysis tools
- **Export Features**: PDF/CSV report generation
- **User Authentication**: Multi-user support with profiles
- **Real-time Collaboration**: Shared sessions between users

## 📄 License

This project is for educational purposes as part of a data science curriculum.

## 🤝 Contributing

This is an assignment project. For questions or issues, please contact the course instructors.

---

**Built with**: LangGraph, Streamlit, OpenAI/Nebius, Hugging Face Datasets
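## 🧪 Appendix: Classifier Routing Sketch

The query classification step described in the Architecture section can be illustrated with a dependency-free sketch. This is a minimal, hypothetical keyword-based classifier, not the project's actual implementation (which uses an LLM inside a LangGraph node); the keyword lists and route names here are illustrative assumptions.

```python
# Minimal sketch of routing a query to one of the three specialist agents.
# Keyword lists and route names are illustrative assumptions; the real app
# classifies queries with an LLM inside a LangGraph node.

RECOMMEND_KEYWORDS = ("what should i", "recommend", "advise")
STRUCTURED_KEYWORDS = ("how many", "count", "distribution", "examples", "show me")

def classify(query: str) -> str:
    """Return the route name for a user query."""
    q = query.lower()
    # Check recommendation intent first, since it can overlap with others.
    if any(k in q for k in RECOMMEND_KEYWORDS):
        return "recommender"
    # Quantitative phrasing goes to the structured agent.
    if any(k in q for k in STRUCTURED_KEYWORDS):
        return "structured"
    # Everything else is treated as qualitative analysis.
    return "unstructured"
```

For example, "How many records are in each category?" routes to `structured`, "What should I query next?" to `recommender`, and "Summarize the refund category" to `unstructured`, matching the sample queries in the Usage Guide.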