SaritMeshesha committed on
Commit b2706cf · verified · 1 Parent(s): 36becb3

upload app

Files changed (4):
  1. README.md +303 -17
  2. app.py +620 -0
  3. langgraph_agent.py +651 -0
  4. requirements.txt +12 -3
README.md CHANGED
@@ -1,20 +1,306 @@
- ---
- title: Langraph Llm Data Analyst Agent
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
- pinned: false
- short_description: Streamlit template space
- license: mit
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
 
+ # 🤖 LangGraph Data Analyst Agent
+
+ An intelligent data analyst agent built with LangGraph that analyzes customer support conversations with advanced memory, conversation persistence, and query recommendations.
+
+ ## 🌟 Features
+
+ ### Core Functionality
+ - **Multi-Agent Architecture**: Separate specialized agents for structured and unstructured queries
+ - **Query Classification**: Automatic routing to the appropriate agent based on query type
+ - **Rich Tool Set**: Comprehensive tools for data analysis and insights
+
+ ### Advanced Memory & Persistence
+ - **Session Management**: Persistent conversations across page reloads and browser sessions
+ - **User Profile Tracking**: The agent learns and remembers user interests and preferences
+ - **Conversation History**: Full context retention using LangGraph checkpointers
+ - **Cross-Session Continuity**: Resume conversations using session IDs
+
+ ### Intelligent Recommendations
+ - **Query Suggestions**: AI-powered recommendations based on conversation history
+ - **Interactive Refinement**: Collaborative query building with the agent
+ - **Context-Aware**: Suggestions based on the user profile and previous interactions
+
+ ## 🏗️ Architecture
+
+ The agent uses LangGraph's multi-agent architecture with the following components:
+
+ ```
+ User Query → Classifier → [Structured Agent | Unstructured Agent | Recommender] → Summarizer → Response
+
+ Tool Nodes (Dataset Analysis Tools)
+ ```
+
+ ### Agent Types
+ 1. **Structured Agent**: Handles quantitative queries (statistics, examples, distributions)
+ 2. **Unstructured Agent**: Handles qualitative queries (summaries, insights, patterns)
+ 3. **Query Recommender**: Suggests follow-up questions based on context
+ 4. **Summarizer**: Updates the user profile and conversation memory
+
+ ## 🚀 Setup Instructions
+
+ ### Prerequisites
+ - **Python Version**: 3.9 or higher
+ - **API Key**: OpenAI API key or Nebius API key
+
+ ### Installation
+
+ 1. **Clone the repository**:
+ ```bash
+ git clone <repository-url>
+ cd Agents
+ ```
+
+ 2. **Install dependencies**:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. **Configure the API key**:
+
+ Create a `.env` file in the project root:
+ ```bash
+ # For OpenAI (recommended)
+ OPENAI_API_KEY=your_openai_api_key_here
+
+ # OR for Nebius
+ NEBIUS_API_KEY=your_nebius_api_key_here
+ ```
+
+ 4. **Run the application**:
+ ```bash
+ streamlit run app_langgraph.py
+ ```
+
+ 5. **Access the app**:
+ Open your browser to `http://localhost:8501`
+
+ ### Alternative Deployment
+
+ For cloud deployment, set the environment variable:
+ ```bash
+ export OPENAI_API_KEY=your_api_key_here
+ # OR
+ export NEBIUS_API_KEY=your_api_key_here
+ ```
+
+ ## 🎯 Usage Guide
+
+ ### Query Types
+
+ #### Structured Queries (Quantitative Analysis)
+ - "How many records are in each category?"
+ - "What are the most common customer issues?"
+ - "Show me 5 examples of billing problems"
+ - "Get the distribution of intents"
+
+ #### Unstructured Queries (Qualitative Analysis)
+ - "Summarize the refund category"
+ - "What patterns do you see in payment issues?"
+ - "Analyze customer sentiment in billing conversations"
+ - "What insights can you provide about technical support?"
+
+ #### Memory & Recommendations
+ - "What do you remember about me?"
+ - "What should I query next?"
+ - "Advise me what to explore"
+ - "Recommend follow-up questions"
+
+ ### Session Management
+
+ #### Creating Sessions
+ - **New Session**: Click "🆕 New Session" to start fresh
+ - **Auto-Generated**: Each new browser session gets a unique ID
+
+ #### Resuming Sessions
+ 1. Copy your session ID from the sidebar (e.g., `a1b2c3d4...`)
+ 2. Enter the full session ID in "Join Existing Session"
+ 3. Click "🔗 Join Session" to resume
+
+ #### Cross-Tab Persistence
+ - Open multiple tabs with the same session ID
+ - Conversations sync across all tabs
+ - Memory and the user profile persist
+
+ ## 🧠 Memory System
+
+ ### User Profile Tracking
+ The agent automatically tracks:
+ - **Interests**: Topics and categories you frequently ask about
+ - **Expertise Level**: Inferred from question complexity (beginner/intermediate/advanced)
+ - **Preferences**: Analysis style preferences (quantitative vs. qualitative)
+ - **Query History**: Recent questions for context
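The profile fields above can be maintained with plain dictionary updates between turns. A minimal sketch of that bookkeeping — the `update_profile` helper and the topic keyword list are illustrative stand-ins, not the committed implementation (which synthesizes the profile with an LLM):

```python
def update_profile(profile: dict, query: str) -> dict:
    """Return a new profile dict with the query recorded and simple interests tagged.

    Illustrative only: the real agent infers interests with an LLM call."""
    # Keep a bounded history of recent queries for context
    history = list(profile.get("query_history", [])) + [query]
    # Tag coarse interests by keyword match (stand-in for LLM extraction)
    interests = set(profile.get("interests", []))
    for topic in ("billing", "refund", "payment", "delivery"):
        if topic in query.lower():
            interests.add(topic)
    return {
        **profile,
        "query_history": history[-10:],  # retain only the last 10 queries
        "interests": sorted(interests),
    }
```

The returned dict is a fresh copy, so stale references to the old profile (e.g. in a previous checkpoint) are never mutated in place.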
+
+ ### Conversation Persistence
+ - **Thread-based**: Each session has a unique thread ID
+ - **Checkpoint System**: LangGraph automatically saves state after each interaction
+ - **Cross-Session**: Resume conversations days or weeks later
+
+ ### Memory Queries
+ Ask the agent what it remembers:
+ ```
+ "What do you remember about me?"
+ "What are my interests?"
+ "What have I asked about before?"
+ ```
+
+ ## 🔧 Testing the Agent
+
+ ### Basic Functionality Tests
+
+ 1. **Classification Test**:
+ ```
+ Query: "How many categories are there?"
+ Expected: Routes to the Structured Agent → uses the get_dataset_stats tool
+ ```
+
+ 2. **Follow-up Memory Test**:
+ ```
+ Query 1: "Show me billing examples"
+ Query 2: "Show me more examples"
+ Expected: The agent remembers the previous context about billing
+ ```
+
+ 3. **User Profile Test**:
+ ```
+ Query 1: "I'm interested in refund patterns"
+ Query 2: "What do you remember about me?"
+ Expected: The agent mentions an interest in refunds
+ ```
+
+ 4. **Recommendation Test**:
+ ```
+ Query: "What should I query next?"
+ Expected: Personalized suggestions based on history
+ ```
+
+ ### Advanced Feature Tests
+
+ 1. **Session Persistence**:
+    - Ask a question, then reload the page
+    - Verify the conversation history remains
+    - Verify the user profile persists
+
+ 2. **Cross-Session Memory**:
+    - Note your session ID
+    - Close the browser completely
+    - Reopen and join the same session
+    - Verify full conversation and profile restoration
+
+ 3. **Interactive Recommendations**:
+ ```
+ User: "Advise me what to query next"
+ Agent: "Based on your interest in billing, you might want to analyze refund patterns."
+ User: "I'd rather see examples instead"
+ Agent: "Then I suggest showing 5 examples of refund requests."
+ User: "Please do so"
+ Expected: The agent executes the refined query
+ ```
+
+ ## 📁 File Structure
+
+ ```
+ Agents/
+ ├── README.md            # This file
+ ├── requirements.txt     # Python dependencies
+ ├── .env                 # API keys (create this)
+ ├── app_langgraph.py     # New LangGraph Streamlit app
+ ├── langgraph_agent.py   # LangGraph agent implementation
+ ├── app.py               # Original app (for reference)
+ ├── agent-memory.ipynb   # Memory example notebook
+ └── DEPLOYMENT_GUIDE.md  # Original deployment guide
+ ```
+
+ ## 🛠️ Technical Implementation
+
+ ### LangGraph Components
+
+ **State Management**:
+ ```python
+ class AgentState(TypedDict):
+     messages: List[Any]
+     query_type: Optional[str]
+     user_profile: Optional[Dict[str, Any]]
+     session_context: Optional[Dict[str, Any]]
+ ```
+
+ **Tool Categories**:
+ - **Structured Tools**: Statistics, distributions, examples, search
+ - **Unstructured Tools**: Summaries, insights, pattern analysis
+ - **Memory Tools**: Profile updates, preference tracking
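A structured tool is essentially a deterministic function over dataset rows that the agent can call. A hedged sketch of a category-distribution tool — the sample records and the function name are hypothetical; the committed tools operate on the full Bitext dataframe:

```python
from collections import Counter

# Hypothetical rows standing in for the Bitext customer-support dataset
RECORDS = [
    {"category": "BILLING", "intent": "check_invoice"},
    {"category": "REFUND", "intent": "get_refund"},
    {"category": "BILLING", "intent": "check_invoice"},
]

def get_category_distribution(records) -> dict:
    """Structured tool: count how many records fall in each category."""
    return dict(Counter(r["category"] for r in records))
```

Because the tool's output is a plain dict, the agent can pass it straight back to the LLM as tool-call output for summarization.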
+
+ **Graph Flow**:
+ 1. **Classifier**: Determines the query type
+ 2. **Agent Selection**: Routes to the appropriate specialist
+ 3. **Tool Execution**: Dynamic tool usage based on needs
+ 4. **Memory Update**: Profile and context updates
+ 5. **Response Generation**: Final answer with memory integration
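In the committed agent the classifier is an LLM call that returns one of the `QueryType` values from `langgraph_agent.py`; the routing decision it feeds can be sketched with a keyword stand-in (the keyword lists below are illustrative only, not the real classification logic):

```python
from enum import Enum

class QueryType(str, Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"
    RECOMMEND_QUERY = "recommend_query"
    OUT_OF_SCOPE = "out_of_scope"

# Illustrative keyword cues; the real classifier is an LLM call
KEYWORDS = {
    QueryType.STRUCTURED: ("how many", "distribution", "examples", "most common"),
    QueryType.UNSTRUCTURED: ("summarize", "patterns", "insights", "sentiment"),
    QueryType.RECOMMEND_QUERY: ("recommend", "what should i query", "advise"),
}

def route_query(query: str) -> QueryType:
    """Return the branch the graph should take for this query."""
    q = query.lower()
    for qtype, words in KEYWORDS.items():
        if any(w in q for w in words):
            return qtype
    return QueryType.OUT_OF_SCOPE
```

A function with this shape is what LangGraph's conditional edges expect: it inspects the state (here just the query text) and returns a label naming the next node.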
237
+
238
+ ### Memory Architecture
239
+
240
+ **Checkpointer**: LangGraph's `MemorySaver` for conversation persistence
241
+ **Thread Management**: Unique thread IDs for session isolation
242
+ **Profile Synthesis**: LLM-powered extraction of user characteristics
243
+ **Context Retention**: Full conversation history with temporal awareness
244
+
245
+ ## 🔍 Troubleshooting
246
+
247
+ ### Common Issues
248
+
249
+ 1. **API Key Errors**:
250
+ - Verify `.env` file exists and has correct key
251
+ - Check environment variable is set in deployment
252
+ - Ensure API key has sufficient credits
253
+
254
+ 2. **Memory Not Persisting**:
255
+ - Verify session ID remains consistent
256
+ - Check browser localStorage not being cleared
257
+ - Ensure thread_id parameter is passed correctly
258
+
259
+ 3. **Dataset Loading Issues**:
260
+ - Check internet connection for Hugging Face datasets
261
+ - Verify datasets library is installed
262
+ - Try clearing Streamlit cache: `streamlit cache clear`
263
+
264
+ 4. **Tool Execution Errors**:
265
+ - Verify all dependencies in requirements.txt are installed
266
+ - Check dataset is properly loaded
267
+ - Review error messages in Streamlit interface
268
+
269
+ ### Debug Mode
270
+
271
+ Enable debug logging by setting:
272
+ ```python
273
+ import logging
274
+ logging.basicConfig(level=logging.DEBUG)
275
+ ```
276
+
277
+ ## 🎓 Learning Objectives
278
+
279
+ This implementation demonstrates:
280
+
281
+ 1. **LangGraph Multi-Agent Systems**: Specialized agents for different query types
282
+ 2. **Memory & Persistence**: Conversation continuity across sessions
283
+ 3. **Tool Integration**: Dynamic tool selection and execution
284
+ 4. **State Management**: Complex state updates and routing
285
+ 5. **User Experience**: Session management and interactive features
286
+
287
+ ## 🚀 Future Enhancements
288
+
289
+ Potential improvements:
290
+ - **Database Persistence**: Replace MemorySaver with PostgreSQL checkpointer
291
+ - **Advanced Analytics**: More sophisticated data analysis tools
292
+ - **Export Features**: PDF/CSV report generation
293
+ - **User Authentication**: Multi-user support with profiles
294
+ - **Real-time Collaboration**: Shared sessions between users
295
+
296
+ ## 📄 License
297
+
298
+ This project is for educational purposes as part of a data science curriculum.
299
+
300
+ ## 🤝 Contributing
301
+
302
+ This is an assignment project. For questions or issues, please contact the course instructors.
303
+
304
+ ---
305
 
306
+ **Built with**: LangGraph, Streamlit, OpenAI/Nebius, Hugging Face Datasets
 
app.py ADDED
@@ -0,0 +1,620 @@
+ import json
+ import os
+ import uuid
+ from datetime import datetime
+ from typing import Dict, List, Optional
+
+ import pandas as pd
+ import streamlit as st
+ from datasets import load_dataset
+ from dotenv import load_dotenv
+
+ from langgraph_agent import DataAnalystAgent, DatasetManager
+
+ # Load environment variables
+ load_dotenv()
+
+ # Set up page config
+ st.set_page_config(
+     page_title="🤖 LangGraph Data Analyst Agent",
+     layout="wide",
+     page_icon="🤖",
+     initial_sidebar_state="expanded",
+ )
+
+ # Custom CSS for styling
+ st.markdown(
+     """
+ <style>
+ /* Main theme colors */
+ :root {
+     --primary-color: #1f77b4;
+     --secondary-color: #ff7f0e;
+     --success-color: #2ca02c;
+     --error-color: #d62728;
+     --warning-color: #ff9800;
+     --background-color: #0e1117;
+     --card-background: #262730;
+ }
+
+ /* Custom styling for the main container */
+ .main-header {
+     background: linear-gradient(90deg, #1f77b4 0%, #ff7f0e 100%);
+     padding: 2rem 1rem;
+     border-radius: 10px;
+     margin-bottom: 2rem;
+     text-align: center;
+     color: white;
+     box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
+ }
+
+ .main-header h1 {
+     margin: 0;
+     font-size: 2.5rem;
+     font-weight: 700;
+     text-shadow: 2px 2px 4px rgba(0,0,0,0.3);
+ }
+
+ .main-header p {
+     margin: 0.5rem 0 0 0;
+     font-size: 1.2rem;
+     opacity: 0.9;
+ }
+
+ /* Card styling */
+ .info-card {
+     background: var(--card-background);
+     padding: 1.5rem;
+     border-radius: 10px;
+     border-left: 4px solid var(--primary-color);
+     margin: 1rem 0;
+     box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
+ }
+
+ .success-card {
+     background: linear-gradient(90deg,
+         rgba(44, 160, 44, 0.1) 0%,
+         rgba(44, 160, 44, 0.05) 100%);
+     border-left: 4px solid var(--success-color);
+     padding: 1rem;
+     border-radius: 8px;
+     margin: 1rem 0;
+ }
+
+ .error-card {
+     background: linear-gradient(90deg,
+         rgba(214, 39, 40, 0.1) 0%,
+         rgba(214, 39, 40, 0.05) 100%);
+     border-left: 4px solid var(--error-color);
+     padding: 1rem;
+     border-radius: 8px;
+     margin: 1rem 0;
+ }
+
+ .memory-card {
+     background: linear-gradient(90deg,
+         rgba(255, 127, 14, 0.1) 0%,
+         rgba(255, 127, 14, 0.05) 100%);
+     border-left: 4px solid var(--secondary-color);
+     padding: 1rem;
+     border-radius: 8px;
+     margin: 1rem 0;
+ }
+
+ /* Chat message styling */
+ .user-message {
+     background: linear-gradient(90deg,
+         rgba(31, 119, 180, 0.1) 0%,
+         rgba(31, 119, 180, 0.05) 100%);
+     padding: 1rem;
+     border-radius: 10px;
+     margin: 0.5rem 0;
+     border-left: 4px solid var(--primary-color);
+ }
+
+ .assistant-message {
+     background: linear-gradient(90deg,
+         rgba(255, 127, 14, 0.1) 0%,
+         rgba(255, 127, 14, 0.05) 100%);
+     padding: 1rem;
+     border-radius: 10px;
+     margin: 0.5rem 0;
+     border-left: 4px solid var(--secondary-color);
+ }
+
+ .session-info {
+     background: var(--card-background);
+     padding: 1rem;
+     border-radius: 8px;
+     margin: 0.5rem 0;
+     border: 1px solid rgba(255, 255, 255, 0.1);
+     font-size: 0.9rem;
+ }
+
+ /* Animation for thinking indicator */
+ @keyframes pulse {
+     0% { opacity: 1; }
+     50% { opacity: 0.5; }
+     100% { opacity: 1; }
+ }
+
+ .thinking-indicator {
+     animation: pulse 2s infinite;
+ }
+ </style>
+     """,
+     unsafe_allow_html=True,
+ )
+
+
+ # API configuration
+ def get_api_configuration():
+     """Get API configuration from environment variables."""
+     api_key = os.environ.get("NEBIUS_API_KEY") or os.environ.get("OPENAI_API_KEY")
+
+     if not api_key:
+         st.markdown(
+             """
+             <div class="error-card">
+                 <h3>🔑 API Key Configuration Required</h3>
+
+                 <h4>For Local Development:</h4>
+                 <ol>
+                     <li>Create a <code>.env</code> file in your project directory</li>
+                     <li>Add your API key: <code>NEBIUS_API_KEY=your_api_key_here</code></li>
+                     <li>Or use OpenAI: <code>OPENAI_API_KEY=your_api_key_here</code></li>
+                     <li>Restart the application</li>
+                 </ol>
+
+                 <h4>For Deployment:</h4>
+                 <ol>
+                     <li>Set environment variable <code>NEBIUS_API_KEY</code> or
+                         <code>OPENAI_API_KEY</code></li>
+                     <li>Restart your application</li>
+                 </ol>
+             </div>
+             """,
+             unsafe_allow_html=True,
+         )
+         st.stop()
+
+     return api_key
+
+
+ # Initialize the agent
+ @st.cache_resource
+ def get_agent(api_key: str) -> DataAnalystAgent:
+     """Initialize and cache the LangGraph agent."""
+     return DataAnalystAgent(api_key=api_key)
+
+
+ # Load dataset
+ @st.cache_data
+ def load_bitext_dataset():
+     """Load and cache the Bitext dataset."""
+     try:
+         dataset = load_dataset(
+             "bitext/Bitext-customer-support-llm-chatbot-training-dataset"
+         )
+         df = pd.DataFrame(dataset["train"])
+         return df
+     except Exception as e:
+         st.error(f"Error loading dataset: {e}")
+         return None
+
+
+ # Session management functions
+ def initialize_session():
+     """Initialize session state variables."""
+     if "session_id" not in st.session_state:
+         st.session_state.session_id = str(uuid.uuid4())
+
+     if "conversation_history" not in st.session_state:
+         st.session_state.conversation_history = []
+
+     if "user_profile" not in st.session_state:
+         st.session_state.user_profile = {}
+
+     if "current_thread_id" not in st.session_state:
+         st.session_state.current_thread_id = st.session_state.session_id
+
+
+ def create_new_session():
+     """Create a new session with a new thread ID."""
+     st.session_state.session_id = str(uuid.uuid4())
+     st.session_state.current_thread_id = st.session_state.session_id
+     st.session_state.conversation_history = []
+     st.session_state.user_profile = {}
+
+
+ def format_conversation_message(role: str, content: str, timestamp: str = None):
+     """Format a conversation message for display."""
+     if timestamp is None:
+         timestamp = datetime.now().strftime("%H:%M:%S")
+
+     if role == "human":
+         return f"""
+         <div class="user-message">
+             <strong>👤 You ({timestamp}):</strong><br>
+             {content}
+         </div>
+         """
+     else:
+         return f"""
+         <div class="assistant-message">
+             <strong>🤖 Agent ({timestamp}):</strong><br>
+             {content}
+         </div>
+         """
+
+
+ def display_user_profile(profile: Dict):
+     """Display user profile information."""
+     if not profile:
+         return
+
+     with st.expander("🧠 What I Remember About You", expanded=False):
+         col1, col2 = st.columns(2)
+
+         with col1:
+             st.markdown("**Your Interests:**")
+             interests = profile.get("interests", [])
+             if interests:
+                 for interest in interests:
+                     st.write(f"• {interest}")
+             else:
+                 st.write("_No interests recorded yet_")
+
+             st.markdown("**Expertise Level:**")
+             expertise = profile.get("expertise_level", "beginner")
+             st.write(f"• {expertise.title()}")
+
+         with col2:
+             st.markdown("**Your Preferences:**")
+             preferences = profile.get("preferences", {})
+             if preferences:
+                 for key, value in preferences.items():
+                     st.write(f"• {key}: {value}")
+             else:
+                 st.write("_No preferences recorded yet_")
+
+             st.markdown("**Recent Query Topics:**")
+             query_history = profile.get("query_history", [])
+             if query_history:
+                 for query in query_history[-3:]:  # Show last 3
+                     st.write(f"• {query[:50]}...")
+             else:
+                 st.write("_No query history yet_")
+
+
+ def main():
+     # Custom header
+     st.markdown(
+         """
+         <div class="main-header">
+             <h1>🤖 LangGraph Data Analyst Agent</h1>
+             <p>Intelligent Analysis with Memory & Recommendations</p>
+         </div>
+         """,
+         unsafe_allow_html=True,
+     )
+
+     # Initialize session
+     initialize_session()
+
+     # Get API configuration
+     api_key = get_api_configuration()
+
+     # Initialize agent
+     agent = get_agent(api_key)
+
+     # Load dataset
+     with st.spinner("🔄 Loading dataset..."):
+         df = load_bitext_dataset()
+
+     if df is None:
+         st.markdown(
+             """
+             <div class="error-card">
+                 <h3>❌ Dataset Loading Failed</h3>
+                 <p>Failed to load dataset. Please check your connection and try again.</p>
+             </div>
+             """,
+             unsafe_allow_html=True,
+         )
+         return
+
+     # Success message
+     st.markdown(
+         f"""
+         <div class="success-card">
+             <h3>✅ System Ready</h3>
+             <p>Dataset loaded with <strong>{len(df):,}</strong> records.
+             LangGraph agent initialized with memory.</p>
+         </div>
+         """,
+         unsafe_allow_html=True,
+     )
+
+     # Sidebar configuration
+     with st.sidebar:
+         st.markdown("## ⚙️ Session Management")
+
+         # Session ID management
+         st.markdown("### 🆔 Session Control")
+
+         col1, col2 = st.columns(2)
+         with col1:
+             if st.button("🆕 New Session", use_container_width=True):
+                 create_new_session()
+                 st.rerun()
+
+         with col2:
+             if st.button("🔄 Refresh", use_container_width=True):
+                 st.rerun()
+
+         # Display session info
+         st.markdown(
+             f"""
+             <div class="session-info">
+                 <strong>Current Session:</strong><br>
+                 <code>{st.session_state.current_thread_id[:8]}...</code><br>
+                 <strong>Messages:</strong> {len(st.session_state.conversation_history)}
+             </div>
+             """,
+             unsafe_allow_html=True,
+         )
+
+         # Custom session ID input
+         st.markdown("### 🔗 Join Existing Session")
+         custom_thread_id = st.text_input(
+             "Enter Session ID:",
+             placeholder="Enter full session ID to join...",
+             help="Use this to resume a previous conversation",
+         )
+
+         if st.button("🔗 Join Session") and custom_thread_id:
+             st.session_state.current_thread_id = custom_thread_id
+             # Load conversation history for this thread
+             history = agent.get_conversation_history(custom_thread_id)
+             st.session_state.conversation_history = history
+             # Load user profile for this thread
+             profile = agent.get_user_profile(custom_thread_id)
+             st.session_state.user_profile = profile
+             st.success(f"Joined session: {custom_thread_id[:8]}...")
+             st.rerun()
+
+         st.markdown("---")
+
+         # Dataset info
+         st.markdown("### 📊 Dataset Info")
+         col1, col2 = st.columns(2)
+         with col1:
+             st.metric("📝 Records", f"{len(df):,}")
+         with col2:
+             st.metric("📂 Categories", len(df["category"].unique()))
+
+         st.metric("🎯 Intents", len(df["intent"].unique()))
+
+         # Quick examples
+         st.markdown("### 💡 Try These Queries")
+         example_queries = [
+             "What are the most common categories?",
+             "Show me examples of billing issues",
+             "Summarize the refund category",
+             "What should I query next?",
+             "What do you remember about me?",
+         ]
+
+         for query in example_queries:
+             if st.button(f"💬 {query}", key=f"example_{hash(query)}"):
+                 st.session_state.pending_query = query
+                 st.rerun()
+
+     # Main content area
+     # Display user profile
+     if st.session_state.user_profile:
+         display_user_profile(st.session_state.user_profile)
+
+     # Dataset information in expandable section
+     with st.expander("📊 Dataset Information", expanded=False):
+         st.markdown("### Dataset Details")
+
+         metrics_col1, metrics_col2, metrics_col3, metrics_col4 = st.columns(4)
+         with metrics_col1:
+             st.metric("Total Records", f"{len(df):,}")
+         with metrics_col2:
+             st.metric("Columns", len(df.columns))
+         with metrics_col3:
+             st.metric("Categories", len(df["category"].unique()))
+         with metrics_col4:
+             st.metric("Intents", len(df["intent"].unique()))
+
+         st.markdown("### Sample Data")
+         st.dataframe(df.head(), use_container_width=True)
+
+         st.markdown("### Category Distribution")
+         st.bar_chart(df["category"].value_counts())
+
+     # User input section
+     st.markdown("## 💬 Chat with the Agent")
+
+     # Handle pending query from sidebar
+     has_pending_query = hasattr(st.session_state, "pending_query")
+     if has_pending_query:
+         user_question = st.session_state.pending_query
+         delattr(st.session_state, "pending_query")
+     else:
+         user_question = st.text_input(
+             "Ask your question:",
+             placeholder="e.g., What are the most common customer issues?",
+             key="user_input",
+             help="Ask about statistics, examples, insights, or request recommendations",
+         )
+
+     # Submit button
+     col1, col2, col3 = st.columns([1, 2, 1])
+     with col2:
+         submit_clicked = st.button("🚀 Send Message", use_container_width=True)
+
+     # Process query
+     if (submit_clicked or has_pending_query) and user_question:
+         # Add user message to local history
+         timestamp = datetime.now().strftime("%H:%M:%S")
+         st.session_state.conversation_history.append(
+             {"role": "human", "content": user_question, "timestamp": timestamp}
+         )
+
+         # Show thinking indicator
+         thinking_placeholder = st.empty()
+         thinking_placeholder.markdown(
+             """
+             <div class="thinking-indicator">
+                 <div class="info-card">
+                     ⚙️ <strong>Agent is thinking...</strong>
+                     Processing your query through the LangGraph workflow.
+                 </div>
+             </div>
+             """,
+             unsafe_allow_html=True,
+         )
+
+         try:
+             # Invoke the agent
+             result = agent.invoke(user_question, st.session_state.current_thread_id)
+
+             # Get the last assistant message
+             assistant_response = None
+             for msg in reversed(result["messages"]):
+                 if (
+                     hasattr(msg, "content")
+                     and msg.content
+                     and not isinstance(msg, type(user_question))
+                 ):
+                     # Check if this is an AI message (not human or tool message)
+                     if not hasattr(msg, "tool_calls") or not msg.tool_calls:
+                         if "human" not in str(type(msg)).lower():
+                             content = msg.content
+
+                             # Clean up Qwen model thinking tags
+                             if "<think>" in content and "</think>" in content:
+                                 # Extract only the part after </think>
+                                 parts = content.split("</think>")
+                                 if len(parts) > 1:
+                                     content = parts[1].strip()
+
+                             assistant_response = content
+                             break
+
+             if not assistant_response:
+                 assistant_response = "I processed your query but couldn't generate a response. Please try again."
+
+             # Add assistant response to local history
+             st.session_state.conversation_history.append(
+                 {
+                     "role": "assistant",
+                     "content": assistant_response,
+                     "timestamp": datetime.now().strftime("%H:%M:%S"),
+                 }
+             )
+
+             # Update user profile from agent state
+             if result.get("user_profile"):
+                 st.session_state.user_profile = result["user_profile"]
+
+         except Exception as e:
+             error_msg = f"Sorry, I encountered an error: {str(e)}"
+             st.session_state.conversation_history.append(
+                 {
+                     "role": "assistant",
+                     "content": error_msg,
+                     "timestamp": datetime.now().strftime("%H:%M:%S"),
+                 }
+             )
+
+         finally:
+             thinking_placeholder.empty()
+
+         # Clear the input and rerun to show new messages
+         st.rerun()
+
+     # Display conversation
+     if st.session_state.conversation_history:
+         st.markdown("## 💭 Conversation")
+
+         # Display messages
+         for i, message in enumerate(st.session_state.conversation_history):
+             message_html = format_conversation_message(
+                 message["role"], message["content"], message.get("timestamp", "")
+             )
+             st.markdown(message_html, unsafe_allow_html=True)
+
+             # Add separator except for last message
+             if i < len(st.session_state.conversation_history) - 1:
+                 st.markdown("---")
+
+         # Action buttons
+         col1, col2, col3 = st.columns(3)
+
+         with col1:
+             if st.button("🗑️ Clear Chat"):
+                 st.session_state.conversation_history = []
+                 st.rerun()
+
+         with col2:
+             if st.button("💾 Export Chat"):
+                 chat_data = {
+                     "session_id": st.session_state.current_thread_id,
+                     "timestamp": datetime.now().isoformat(),
+                     "conversation": st.session_state.conversation_history,
+                     "user_profile": st.session_state.user_profile,
+                 }
+                 st.download_button(
+                     label="📥 Download JSON",
+                     data=json.dumps(chat_data, indent=2),
+                     file_name=f"chat_export_{st.session_state.current_thread_id[:8]}.json",
+                     mime="application/json",
+                 )
+
+         with col3:
+             if st.button("🤖 Get Recommendations"):
+                 st.session_state.pending_query = "What should I query next?"
+                 st.rerun()
+
+     # Instructions
+     with st.expander("📋 How to Use This Agent", expanded=False):
+         st.markdown(
+             """
+             ### 🎯 Query Types Supported:
+
+             **Structured Queries (Quantitative):**
+             - "How many records are in each category?"
+             - "Show me 5 examples of billing issues"
+             - "What are the most common intents?"
+
+             **Unstructured Queries (Qualitative):**
+             - "Summarize the refund category"
+             - "What patterns do you see in payment issues?"
+             - "Analyze customer sentiment in billing conversations"
+
+             **Memory & Recommendations:**
+             - "What do you remember about me?"
+             - "What should I query next?"
+             - "Advise me what to explore"
+
+             ### 🧠 Memory Features:
+             - **Session Persistence:** Your conversations are saved across page reloads
+             - **User Profile:** The agent learns about your interests and preferences
+             - **Query History:** Past queries influence future recommendations
+             - **Cross-Session:** Use session IDs to resume conversations later
+
+             ### 🔧 Advanced Features:
+             - **Multi-Agent Architecture:** Separate agents for different query types
+             - **Tool Usage:** Dynamic tool selection based on your needs
+             - **Interactive Recommendations:** Collaborative query refinement
+             """
+         )
+
+
+ if __name__ == "__main__":
+     main()
langgraph_agent.py ADDED
@@ -0,0 +1,651 @@
+ import json
+ import os
+ from enum import Enum
+ from typing import Any, Dict, List, Optional, TypedDict
+
+ import pandas as pd
+ from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
+ from langchain_core.tools import tool
+ from langchain_openai import ChatOpenAI
+ from langgraph.checkpoint.memory import MemorySaver
+ from langgraph.graph import END, START, StateGraph
+ from langgraph.prebuilt import ToolNode
+ from pydantic import BaseModel, Field
+
+
+ # Enums for query types
+ class QueryType(str, Enum):
+     STRUCTURED = "structured"
+     UNSTRUCTURED = "unstructured"
+     OUT_OF_SCOPE = "out_of_scope"
+     RECOMMEND_QUERY = "recommend_query"
+
+
+ class AnalysisType(str, Enum):
+     QUANTITATIVE = "quantitative"
+     QUALITATIVE = "qualitative"
+     OUT_OF_SCOPE = "out_of_scope"
+
+
+ # State definition
+ class AgentState(TypedDict):
+     messages: List[Any]
+     query_type: Optional[str]
+     analysis_result: Optional[Dict[str, Any]]
+     user_profile: Optional[Dict[str, Any]]
+     session_context: Optional[Dict[str, Any]]
+     recommendations: Optional[List[str]]
+
+
+ # User profile model
+ class UserProfile(BaseModel):
+     interests: List[str] = Field(default_factory=list)
+     query_history: List[str] = Field(default_factory=list)
+     preferences: Dict[str, Any] = Field(default_factory=dict)
+     expertise_level: str = "beginner"
+
+
+ # Dataset management
+ class DatasetManager:
+     _instance = None
+     _df = None
+
+     def __new__(cls):
+         if cls._instance is None:
+             cls._instance = super(DatasetManager, cls).__new__(cls)
+         return cls._instance
+
+     def get_dataset(self) -> pd.DataFrame:
+         if self._df is None:
+             from datasets import load_dataset
+
+             dataset = load_dataset(
+                 "bitext/Bitext-customer-support-llm-chatbot-training-dataset"
+             )
+             self._df = pd.DataFrame(dataset["train"])
+         return self._df
+
+
+ # Tools for structured queries (quantitative analysis)
+ @tool
+ def get_category_distribution() -> Dict[str, int]:
+     """Get the distribution of categories in the dataset."""
+     df = DatasetManager().get_dataset()
+     return df["category"].value_counts().to_dict()
+
+
+ @tool
+ def get_intent_distribution() -> Dict[str, int]:
+     """Get the distribution of intents in the dataset."""
+     df = DatasetManager().get_dataset()
+     return df["intent"].value_counts().to_dict()
+
+
+ @tool
+ def get_dataset_stats() -> Dict[str, Any]:
+     """Get basic statistics about the dataset."""
+     df = DatasetManager().get_dataset()
+     return {
+         "total_records": len(df),
+         "unique_categories": len(df["category"].unique()),
+         "unique_intents": len(df["intent"].unique()),
+         "columns": df.columns.tolist(),
+     }
+
+
+ @tool
+ def get_examples_by_category(category: str, n: int = 5) -> List[Dict[str, Any]]:
+     """Get examples from a specific category."""
+     df = DatasetManager().get_dataset()
+     filtered_df = df[df["category"].str.lower() == category.lower()]
+     if filtered_df.empty:
+         return []
+     return filtered_df.head(n).to_dict("records")
+
+
+ @tool
+ def get_examples_by_intent(intent: str, n: int = 5) -> List[Dict[str, Any]]:
+     """Get examples from a specific intent."""
+     df = DatasetManager().get_dataset()
+     filtered_df = df[df["intent"].str.lower() == intent.lower()]
+     if filtered_df.empty:
+         return []
+     return filtered_df.head(n).to_dict("records")
+
+
+ @tool
+ def search_conversations(query: str, n: int = 5) -> List[Dict[str, Any]]:
+     """Search for conversations containing specific keywords."""
+     df = DatasetManager().get_dataset()
+     # The Bitext dataset stores the customer turn in "instruction" and the
+     # agent reply in "response"; search both sides of the conversation.
+     mask = df["instruction"].str.contains(query, case=False, na=False) | df[
+         "response"
+     ].str.contains(query, case=False, na=False)
+     filtered_df = df[mask]
+     return filtered_df.head(n).to_dict("records")
+
+
+ # Tools for unstructured queries (qualitative analysis)
+ @tool
+ def get_category_summary(category: str) -> Dict[str, Any]:
+     """Get a summary of conversations in a specific category."""
+     df = DatasetManager().get_dataset()
+     filtered_df = df[df["category"].str.lower() == category.lower()]
+     if filtered_df.empty:
+         return {"error": f"No data found for category: {category}"}
+
+     return {
+         "category": category,
+         "count": len(filtered_df),
+         "unique_intents": filtered_df["intent"].nunique(),
+         "intents": filtered_df["intent"].value_counts().to_dict(),
+         "sample_conversations": filtered_df.head(3).to_dict("records"),
+     }
+
+
+ @tool
+ def get_intent_summary(intent: str) -> Dict[str, Any]:
+     """Get a summary of conversations for a specific intent."""
+     df = DatasetManager().get_dataset()
+     filtered_df = df[df["intent"].str.lower() == intent.lower()]
+     if filtered_df.empty:
+         return {"error": f"No data found for intent: {intent}"}
+
+     return {
+         "intent": intent,
+         "count": len(filtered_df),
+         "categories": filtered_df["category"].value_counts().to_dict(),
+         "sample_conversations": filtered_df.head(3).to_dict("records"),
+     }
+
+
+ # Memory tools
+ @tool
+ def update_user_profile(
+     interests: List[str], preferences: Dict[str, Any], expertise_level: str = "beginner"
+ ) -> Dict[str, Any]:
+     """Update the user's profile with new information."""
+     return {
+         "interests": interests,
+         "preferences": preferences,
+         "expertise_level": expertise_level,
+         "updated": True,
+     }
+
+
+ # Define tool lists for different agents
+ structured_tools = [
+     get_category_distribution,
+     get_intent_distribution,
+     get_dataset_stats,
+     get_examples_by_category,
+     get_examples_by_intent,
+     search_conversations,
+ ]
+
+ unstructured_tools = [
+     get_category_summary,
+     get_intent_summary,
+     search_conversations,
+     get_examples_by_category,
+     get_examples_by_intent,
+ ]
+
+ memory_tools = [update_user_profile]
+
+
+ class DataAnalystAgent:
+     def __init__(self, api_key: str, model_name: str = None):
+         # Determine if using Nebius or OpenAI based on API key source
+         is_nebius = os.environ.get("NEBIUS_API_KEY") == api_key
+
+         if is_nebius:
+             # Configure for Nebius API
+             self.llm = ChatOpenAI(
+                 api_key=api_key,
+                 model=model_name or "Qwen/Qwen3-30B-A3B",
+                 base_url="https://api.studio.nebius.com/v1",
+                 temperature=0,
+             )
+         else:
+             # Configure for OpenAI API
+             self.llm = ChatOpenAI(
+                 api_key=api_key, model=model_name or "gpt-4o", temperature=0
+             )
+
+         self.memory = MemorySaver()
+         self.graph = self._build_graph()
+
+     def _build_graph(self):
+         """Build and compile the LangGraph workflow."""
+         builder = StateGraph(AgentState)
+
+         # Add nodes
+         builder.add_node("classifier", self._classify_query)
+         builder.add_node("structured_agent", self._structured_agent)
+         builder.add_node("unstructured_agent", self._unstructured_agent)
+         builder.add_node("structured_tools", ToolNode(structured_tools))
+         builder.add_node("unstructured_tools", ToolNode(unstructured_tools))
+         builder.add_node("summarizer", self._update_summary)
+         builder.add_node("recommender", self._recommend_queries)
+         builder.add_node("out_of_scope", self._handle_out_of_scope)
+
+         # Add edges
+         builder.add_edge(START, "classifier")
+
+         # Conditional edges from classifier
+         builder.add_conditional_edges(
+             "classifier",
+             self._route_query,
+             {
+                 "structured": "structured_agent",
+                 "unstructured": "unstructured_agent",
+                 "out_of_scope": "out_of_scope",
+                 "recommend_query": "recommender",
+             },
+         )
+
+         # Tool routing for structured agent
+         builder.add_conditional_edges(
+             "structured_agent",
+             self._should_use_tools,
+             {"tools": "structured_tools", "end": "summarizer"},
+         )
+
+         # Tool routing for unstructured agent
+         builder.add_conditional_edges(
+             "unstructured_agent",
+             self._should_use_tools,
+             {"tools": "unstructured_tools", "end": "summarizer"},
+         )
+
+         # From tools back to respective agents
+         builder.add_edge("structured_tools", "structured_agent")
+         builder.add_edge("unstructured_tools", "unstructured_agent")
+
+         # End edges
+         builder.add_edge("summarizer", END)
+         builder.add_edge("out_of_scope", END)
+         builder.add_edge("recommender", END)
+
+         return builder.compile(checkpointer=self.memory)
+
+     def _classify_query(self, state: AgentState) -> AgentState:
+         """Classify the user query into different types."""
+         last_message = state["messages"][-1]
+         user_query = last_message.content.lower()
+
+         # Simple keyword-based classification for better reliability
+         # Check for recommendation requests first
+         if any(
+             word in user_query
+             for word in [
+                 "what should i",
+                 "what to query",
+                 "recommend",
+                 "suggest",
+                 "advise",
+                 "what next",
+                 "what can i ask",
+             ]
+         ):
+             query_type = "recommend_query"
+
+         # Check for out-of-scope queries
+         elif any(
+             word in user_query
+             for word in [
+                 "weather",
+                 "news",
+                 "sports",
+                 "politics",
+                 "cooking",
+                 "travel",
+                 "music",
+                 "movies",
+                 "games",
+                 "programming",
+                 "code",
+             ]
+         ) and not any(
+             word in user_query
+             for word in ["category", "intent", "customer", "support", "data", "records"]
+         ):
+             query_type = "out_of_scope"
+
+         # Check for unstructured/qualitative queries
+         elif any(
+             word in user_query
+             for word in [
+                 "summarize",
+                 "summary",
+                 "patterns",
+                 "insights",
+                 "analysis",
+                 "analyze",
+                 "themes",
+                 "trends",
+                 "what patterns",
+                 "understand",
+             ]
+         ):
+             query_type = "unstructured"
+
+         # Default to structured for data-related queries
+         else:
+             query_type = "structured"
+
+         # Double-check with the LLM for edge cases, using a simpler prompt
+         if query_type == "out_of_scope":
+             simple_prompt = f"""
+             Is this question about customer support data analysis?
+             Question: "{last_message.content}"
+
+             Answer only "yes" or "no".
+             """
+
+             try:
+                 response = self.llm.invoke([HumanMessage(content=simple_prompt)])
+                 if "yes" in response.content.lower():
+                     query_type = "structured"  # Override if actually about data
+             except Exception:
+                 pass  # Keep original classification if the LLM call fails
+
+         state["query_type"] = query_type
+         return state
+
+     def _route_query(self, state: AgentState) -> str:
+         """Route to the appropriate agent based on classification."""
+         return state["query_type"]
+
+     def _structured_agent(self, state: AgentState) -> AgentState:
+         """Handle structured/quantitative queries."""
+
+         system_prompt = """
+         You are a data analyst that MUST use tools to answer questions about
+         customer support data. You have access to these tools:
+
+         - get_category_distribution: Get category counts
+         - get_intent_distribution: Get intent counts
+         - get_dataset_stats: Get basic dataset statistics
+         - get_examples_by_category: Get examples from a category
+         - get_examples_by_intent: Get examples from an intent
+         - search_conversations: Search for conversations with keywords
+
+         IMPORTANT: Always use the appropriate tool to get real data.
+         Do NOT make up or guess answers. Use tools to get actual numbers.
+
+         For questions about:
+         - "How many categories" or "category distribution" → use get_category_distribution
+         - "How many intents" or "intent distribution" → use get_intent_distribution
+         - "Total records" or "dataset size" → use get_dataset_stats
+         - "Examples of X" → use get_examples_by_category or get_examples_by_intent
+         - "Search for X" → use search_conversations
+         """
+
+         llm_with_tools = self.llm.bind_tools(structured_tools)
+         messages = [SystemMessage(content=system_prompt)] + state["messages"]
+         response = llm_with_tools.invoke(messages)
+
+         state["messages"].append(response)
+         return state
+
+     def _unstructured_agent(self, state: AgentState) -> AgentState:
+         """Handle unstructured/qualitative queries."""
+
+         system_prompt = """
+         You are a data analyst that MUST use tools to provide insights about
+         customer support data. You have access to these tools:
+
+         - get_category_summary: Get a detailed summary of a category
+         - get_intent_summary: Get a detailed summary of an intent
+         - search_conversations: Search conversations for patterns
+         - get_examples_by_category: Get examples to analyze patterns
+         - get_examples_by_intent: Get examples to analyze patterns
+
+         IMPORTANT: Always use the appropriate tool to get real data.
+         Do NOT make up or guess insights. Use tools to get actual data first.
+
+         For questions about:
+         - "Summarize X category" → use get_category_summary
+         - "Analyze X intent" → use get_intent_summary
+         - "Patterns in X" → use get_examples_by_category or search_conversations
+         """
+
+         llm_with_tools = self.llm.bind_tools(unstructured_tools)
+         messages = [SystemMessage(content=system_prompt)] + state["messages"]
+         response = llm_with_tools.invoke(messages)
+
+         state["messages"].append(response)
+         return state
+
+     def _should_use_tools(self, state: AgentState) -> str:
+         """Determine whether the agent should use tools or end."""
+         last_message = state["messages"][-1]
+
+         # Check if the LLM made tool calls
+         if hasattr(last_message, "tool_calls") and last_message.tool_calls:
+             return "tools"
+
+         # If no tool calls were made but this is the first response from the
+         # agent, force tool usage for data questions
+         messages = state["messages"]
+         human_messages = [msg for msg in messages if isinstance(msg, HumanMessage)]
+
+         if len(human_messages) >= 1:
+             last_human_msg = human_messages[-1].content.lower()
+
+             # Check if this looks like a data question that needs tools
+             needs_tools = any(
+                 word in last_human_msg
+                 for word in [
+                     "how many",
+                     "show me",
+                     "examples",
+                     "distribution",
+                     "categories",
+                     "intents",
+                     "records",
+                     "statistics",
+                     "stats",
+                     "count",
+                     "total",
+                     "billing",
+                     "refund",
+                     "payment",
+                     "technical",
+                     "support",
+                 ]
+             )
+
+             # Count AI messages: if this is the first AI response and tools
+             # are needed, force them
+             ai_messages = [msg for msg in messages if not isinstance(msg, HumanMessage)]
+             if needs_tools and len(ai_messages) <= 1:
+                 return "tools"
+
+         return "end"
+
+     def _update_summary(self, state: AgentState) -> AgentState:
+         """Update the user profile/summary based on the interaction."""
+         user_profile = state.get("user_profile", {})
+         last_human_message = None
+
+         # Find the last human message
+         for msg in reversed(state["messages"]):
+             if isinstance(msg, HumanMessage):
+                 last_human_message = msg
+                 break
+
+         if last_human_message:
+             # Extract information about user interests
+             system_prompt = """
+             Based on the user's question, extract information about their
+             interests and update their profile. Consider:
+             - What categories/intents they're interested in
+             - Their level of technical detail preference
+             - Types of analysis they prefer
+
+             Return a JSON with:
+             {
+                 "interests": ["list of topics they seem interested in"],
+                 "preferences": {"any preferences about analysis style"},
+                 "expertise_level": "beginner/intermediate/advanced"
+             }
+
+             If no clear information can be extracted, return empty lists/dicts.
+             """
+
+             messages = [
+                 SystemMessage(content=system_prompt),
+                 HumanMessage(content=f"User question: {last_human_message.content}"),
+             ]
+
+             try:
+                 response = self.llm.invoke(messages)
+                 profile_update = json.loads(response.content)
+
+                 # Merge with the existing profile
+                 if not user_profile:
+                     user_profile = {
+                         "interests": [],
+                         "preferences": {},
+                         "expertise_level": "beginner",
+                         "query_history": [],
+                     }
+
+                 # Update interests (avoid duplicates)
+                 new_interests = profile_update.get("interests", [])
+                 existing_interests = user_profile.get("interests", [])
+                 user_profile["interests"] = list(
+                     set(existing_interests + new_interests)
+                 )
+
+                 # Update preferences
+                 user_profile["preferences"].update(
+                     profile_update.get("preferences", {})
+                 )
+
+                 # Update expertise level if provided
+                 if profile_update.get("expertise_level"):
+                     user_profile["expertise_level"] = profile_update["expertise_level"]
+
+                 # Add to query history
+                 if "query_history" not in user_profile:
+                     user_profile["query_history"] = []
+                 user_profile["query_history"].append(last_human_message.content)
+
+                 # Keep only the last 10 queries
+                 user_profile["query_history"] = user_profile["query_history"][-10:]
+
+             except Exception:
+                 # If the LLM call or JSON parsing fails, just record the query
+                 if not user_profile:
+                     user_profile = {"query_history": []}
+                 if "query_history" not in user_profile:
+                     user_profile["query_history"] = []
+                 user_profile["query_history"].append(last_human_message.content)
+                 user_profile["query_history"] = user_profile["query_history"][-10:]
+
+         state["user_profile"] = user_profile
+         return state
+
+     def _recommend_queries(self, state: AgentState) -> AgentState:
+         """Recommend next queries based on conversation history and user profile."""
+         user_profile = state.get("user_profile", {})
+         query_history = user_profile.get("query_history", [])
+         interests = user_profile.get("interests", [])
+
+         # Get dataset info for context
+         df = DatasetManager().get_dataset()
+         categories = df["category"].unique().tolist()
+         intents = df["intent"].unique()[:20].tolist()
+
+         system_prompt = f"""
+         You are a query recommendation assistant. Based on the user's conversation
+         history and interests, suggest relevant follow-up questions they could ask
+         about the customer support dataset.
+
+         User's query history: {query_history}
+         User's interests: {interests}
+
+         Available categories: {categories}
+         Sample intents: {intents}
+
+         Suggest 3-5 relevant questions the user might want to ask next. Consider:
+         - Natural follow-ups to their previous questions
+         - Related categories or intents they haven't explored
+         - Different types of analysis (if they've only done quantitative,
+           suggest qualitative and vice versa)
+
+         Be conversational and explain why each suggestion might be interesting.
+         Start with "Based on your previous queries, you might want to..."
+         """
+
+         messages = [SystemMessage(content=system_prompt)]
+
+         # Add conversation context
+         if state["messages"]:
+             messages.extend(state["messages"])
+         else:
+             messages.append(HumanMessage(content="What should I query next?"))
+
+         response = self.llm.invoke(messages)
+         state["messages"].append(response)
+
+         return state
+
+     def _handle_out_of_scope(self, state: AgentState) -> AgentState:
+         """Handle queries that are out of scope."""
+         response = AIMessage(
+             content="I'm sorry, but I can only answer questions about the customer "
+             "support dataset. Please ask questions about categories, intents, "
+             "conversation examples, or data statistics."
+         )
+         state["messages"].append(response)
+         return state
+
+     def invoke(self, message: str, thread_id: str) -> Dict[str, Any]:
+         """Invoke the agent with a message and thread ID."""
+         config = {"configurable": {"thread_id": thread_id}}
+
+         # Create the input state
+         input_state = {"messages": [HumanMessage(content=message)]}
+
+         # Invoke the graph
+         result = self.graph.invoke(input_state, config)
+
+         return result
+
+     def get_conversation_history(self, thread_id: str) -> List[Dict[str, Any]]:
+         """Get conversation history for a thread."""
+         config = {"configurable": {"thread_id": thread_id}}
+
+         try:
+             # Get the current state
+             state = self.graph.get_state(config)
+             if state and state.values.get("messages"):
+                 return [
+                     {
+                         "role": (
+                             "human" if isinstance(msg, HumanMessage) else "assistant"
+                         ),
+                         "content": msg.content,
+                     }
+                     for msg in state.values["messages"]
+                 ]
+         except Exception:
+             pass
+
+         return []
+
+     def get_user_profile(self, thread_id: str) -> Dict[str, Any]:
+         """Get user profile for a thread."""
+         config = {"configurable": {"thread_id": thread_id}}
+
+         try:
+             state = self.graph.get_state(config)
+             if state and state.values.get("user_profile"):
+                 return state.values["user_profile"]
+         except Exception:
+             pass
+
+         return {}
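The `_classify_query` node above tries keyword routing before falling back to the LLM. The same priority order (recommendations, then out-of-scope, then qualitative, else structured) can be sketched as a standalone function; the keyword lists here are abridged from the ones in the diff:

```python
def classify_query(text: str) -> str:
    """Keyword-first routing, mirroring _classify_query's priority order."""
    q = text.lower()
    recommend = ["what should i", "recommend", "suggest", "advise", "what next"]
    off_topic = ["weather", "news", "sports", "cooking", "movies"]
    data_terms = ["category", "intent", "customer", "support", "data", "records"]
    qualitative = ["summarize", "summary", "patterns", "insights", "analyze"]

    # Recommendation requests win over everything else
    if any(w in q for w in recommend):
        return "recommend_query"
    # Off-topic keywords only count when no data-related term is present
    if any(w in q for w in off_topic) and not any(w in q for w in data_terms):
        return "out_of_scope"
    # Qualitative cues route to the unstructured agent
    if any(w in q for w in qualitative):
        return "unstructured"
    # Everything else defaults to the structured (quantitative) agent
    return "structured"


print(classify_query("Summarize the refund category"))   # unstructured
print(classify_query("What should I query next?"))       # recommend_query
print(classify_query("How many records per intent?"))    # structured
```

Checking keywords before calling the LLM keeps routing cheap and deterministic for common queries; the LLM double-check in `_classify_query` then only fires for the ambiguous out-of-scope cases.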
requirements.txt CHANGED
@@ -1,3 +1,12 @@
- altair
- pandas
- streamlit
+ streamlit>=1.28.0
+ pandas>=2.0.0
+ datasets>=2.14.0
+ openai>=1.3.0
+ pydantic>=2.4.0
+ python-dotenv>=1.0.0
+ requests>=2.31.0
+ langgraph>=0.2.0
+ langchain>=0.2.0
+ langchain-core>=0.2.0
+ langchain-openai>=0.1.0
+ langsmith>=0.1.0