Spaces:

marimo-team
/

marimo-learn

Running

App Files Files Community

Srihari Thyagarajan commited on May 19

Commit

68f7784

unverified ·

2 Parent(s): 5e07263 34f04d3

Merge pull request #83 from Azmi-84/duckdb

Browse files

Add DuckDB getting started guide with interactive examples.

Files changed (1) hide show

duckdb/01_getting_started.py +1792 -0

duckdb/01_getting_started.py ADDED Viewed

	@@ -0,0 +1,1792 @@

+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#     "marimo",
+#     "duckdb==1.2.2",
+#     "polars==1.27.0",
+#     "numpy==2.2.4",
+#     "pyarrow==19.0.1",
+#     "pandas==2.2.3",
+#     "sqlglot==26.12.1",
+#     "plotly==5.23.1",
+#     "statsmodels==0.14.4",
+# ]
+# ///
+import marimo
+__generated_with = "0.13.4"
+app = marimo.App(width="medium")
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        rf"""
+    <p align="center">
+      <img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSxHAqB0W_61zuIGVMiU6sEeQyTaw-9xwiprw&s" alt="DuckDB Image"/>
+    </p>
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        rf"""
+    # 🦆 **DuckDB**: An Embeddable Analytical Database System
+    ## What is DuckDB?
+    [DuckDB](https://duckdb.org/) is a _high-performance_, in-process, embeddable SQL OLAP (Online Analytical Processing) Database Management System (DBMS) designed for simplicity and speed. It's essentially a fully-featured database that runs directly within your application's process, without needing a separate server. This makes it excellent for complex analytical workloads, offering a robust SQL interface and efficient processing – perfect for learning about databases and data analysis concepts.  It's a great alternative to heavier database systems like PostgreSQL or MySQL when you don't need a full-blown server.
+    ---
+    ## ⚡ Key Features
+    | Feature | Description |
+    |:---------|:-------------|
+    | **In-Process Architecture** | Runs directly within your application's memory space - no separate server needed, simplifying deployment |
+    | **Columnar Storage** | Data stored in columns instead of rows, dramatically improving performance for analytical queries |
+    | **Vectorized Execution** | Performs operations on entire columns at once, significantly speeding up data processing |
+    | **ACID Transactions** | Ensures data integrity and reliability across operations |
+    | **Multi-Language Support** | Provides APIs for `Python`, `R`, `Java`, `C++`, and more |
+    | **Zero External Dependencies** | Minimal dependencies, making setup and deployment straightforward |
+    | **High Portability** | Works across various operating systems (Windows, macOS, Linux) and hardware architectures |
+    ---
+    ## [Use Cases](https://github.com/davidgasquez/awesome-duckdb?tab=readme-ov-file):
+    - **Data Analysis and Exploration:**  DuckDB is ideal for quickly querying and analyzing datasets, especially for initial exploratory analysis.
+    - **Embedded Analytics in Applications:**  You can integrate DuckDB directly into your applications to provide analytical capabilities without the need for a separate database server.
+    - **ETL (Extract, Transform, Load) Processes:** DuckDB can be used to perform initial data transformation and cleaning steps as part of an ETL pipeline.
+    - **Data Science and Machine Learning Workflows:**  It's a lightweight alternative to larger databases for prototyping data analysis and machine learning models.
+    - **Rapid Prototyping of Data Analysis Pipelines:** Quickly test and iterate on data analysis ideas without the complexity of setting up a full-blown database environment.
+    - **Small to Medium Datasets:** DuckDB shines when working with datasets that don't require the massive scalability of a traditional database server.
+    ---
+    ### [Installation](https://duckdb.org/docs/installation/?version=stable&environment=python):
+    - Python installation:
+    ```
+    pip install duckdb
+    ```
+    ```
+    conda install python-duckdb -c conda-forge.
+    ```
+    <!-- >**_Note_:** DuckDB requires Python 3.7 or newer. You also need to have Python and `pip` or `conda` installed on your system. -->
+    /// attention | Note
+    DuckDB requires Python 3.7 or newer. You also need to have Python and `pip` or `conda` installed on your system.
+    ///
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        r"""
+    # [1. DuckDB Connections: In-Memory vs. File-based](https://duckdb.org/docs/stable/connect/overview.html)
+    DuckDB is a lightweight, _relational database management system (RDBMS)_ designed for analytical workloads. Unlike traditional client-server databases, it operates _in-process_ (embedded within your application) and supports both _in-memory_ (temporary) and _file-based_ (persistent) storage.
+    ---
+    | Feature | In-Memory Connection | File-Based Connection |
+    |:---------|:---------------------|:----------------------|
+    | Persistence | Temporary (lost when session ends) | Stored on disk (persists between sessions) |
+    | Use Cases | Quick analysis, ephemeral data, testing | Long-term storage, data that needs to be accessed later |
+    | Performance | Faster for most operations | Slightly slower but provides persistence |
+    | Creation | duckdb.connect(':memory:') | duckdb.connect('filename.db') |
+    | Multiple Connection Access | Limited to single connection | Multiple connections can access the same database |
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(os):
+    # Remove previous database if it exists
+    if os.path.exists("example.db"):
+        os.remove("example.db")
+    if not os.path.exists("data"):
+        os.makedirs("data")
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    _df = mo.sql(
+        f"""
+        -- Print the DuckDB version
+        SELECT version() AS version_info
+        """
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        """
+    ## Creating DuckDB Connections
+    Let's create both types of DuckDB connections and explore their characteristics.
+    1. **In-memory connection**: Data exists only during the current session
+    2. **File-based connection**: Data persists between sessions
+    We'll then demonstrate the key differences between these connection types.
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(duckdb):
+    # Create an in-memory DuckDB connection
+    memory_db = duckdb.connect(":memory:")
+    # Create a file-based DuckDB connection
+    file_db = duckdb.connect("example.db")
+    return file_db, memory_db
+@app.cell(hide_code=True)
+def _(file_db, memory_db):
+    # Test both connections
+    memory_db.execute(
+        "CREATE TABLE IF NOT EXISTS mem_test (id INTEGER, name VARCHAR)"
+    )
+    memory_db.execute("INSERT INTO mem_test VALUES (1, 'Memory Test')")
+    file_db.execute(
+        "CREATE TABLE IF NOT EXISTS file_test (id INTEGER, name VARCHAR)"
+    )
+    file_db.execute("INSERT INTO file_test VALUES (1, 'File Test')")
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        r"""
+    ## Testing Connection Persistence
+    Let's demonstrate how in-memory databases are ephemeral, while file-based databases persist.
+    1. First, we'll query our tables to confirm the data was properly inserted
+    2. Then, we'll simulate an application restart by creating new connections
+    3. Finally, we'll check which data persists after the "restart"
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""## Current Database Contents""")
+    return
+@app.cell(hide_code=True)
+def _(mem_test, memory_db, mo):
+    _df = mo.sql(
+        f"""
+        SELECT * FROM mem_test
+        """,
+        engine=memory_db
+    )
+    return
+@app.cell(hide_code=True)
+def _(file_db, file_test, mo):
+    _df = mo.sql(
+        f"""
+        SELECT * FROM file_test
+        """,
+        engine=file_db
+    )
+    return
+@app.cell
+def _():
+    # We don't actually close the connections here since we need them for later cells
+    # Just a placeholder for the concept
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(rf"""## 🔄 Simulating Application Restart...""")
+    return
+@app.cell(hide_code=True)
+def _(duckdb):
+    # Create new connections (simulating restart)
+    new_memory_db = duckdb.connect(":memory:")
+    new_file_db = duckdb.connect("example.db")
+    return new_file_db, new_memory_db
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    # Try to query tables in the new memory connection
+    try:
+        new_memory_db.execute("SELECT * FROM mem_test").df()
+        memory_persistence = "✅ Data persisted in memory (unexpected)"
+        memory_data_available = True
+    except Exception as e:
+        memory_persistence = "❌ Data lost from memory (expected behavior)"
+        memory_data_available = False
+    return memory_data_available, memory_persistence
+@app.cell(hide_code=True)
+def _(new_file_db):
+    # Try to query tables in the new file connection
+    try:
+        file_data = new_file_db.execute("SELECT * FROM file_test").df()
+        file_persistence = "✅ Data persisted in file (expected behavior)"
+        file_data_available = True
+    except Exception as e:
+        file_persistence = "❌ Data lost from file (unexpected)"
+        file_data_available = False
+        file_data = None
+    return file_data, file_data_available, file_persistence
+@app.cell(hide_code=True)
+def _(
+    file_data_available,
+    file_persistence,
+    memory_data_available,
+    memory_persistence,
+    mo,
+):
+    # Create an interactive display to show persistence results
+    persistence_results = mo.ui.table(
+        {
+            "Connection Type": ["In-Memory Database", "File-Based Database"],
+            "Persistence Status": [memory_persistence, file_persistence],
+            "Data Available After Restart": [
+                memory_data_available,
+                file_data_available,
+            ],
+        }
+    )
+    return (persistence_results,)
+@app.cell(hide_code=True)
+def _(mo, persistence_results):
+    mo.vstack(
+        [
+            mo.vstack([mo.md(f"""## Persistence Test Results""")], align="center"),
+            persistence_results,
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(file_data, file_data_available, mo):
+    if file_data_available:
+        mo.md("### Persisted File-Based Data:")
+        mo.ui.table(file_data)
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        r"""
+    # [2. Creating Tables in DuckDB](https://duckdb.org/docs/stable/sql/statements/create_table.html)
+    DuckDB supports standard SQL syntax for creating tables. Let's create more complex tables to demonstrate different data types and constraints.
+    ## Table Creation Options
+    DuckDB supports various table creation options, including:
+    - **Basic tables** with column definitions
+    - **Temporary tables** that exist only during the session
+    - **CREATE OR REPLACE** to recreate tables
+    - **Primary keys** and other constraints
+    - **Various data types** including INTEGER, VARCHAR, TIMESTAMP, DECIMAL, etc.
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(file_db, new_memory_db):
+    # For the memory database
+    try:
+        new_memory_db.execute("DROP TABLE IF EXISTS users_memory")
+    except:
+        pass
+    # For the file database
+    try:
+        file_db.execute("DROP TABLE IF EXISTS users_file")
+    except:
+        pass
+    return
+@app.cell(hide_code=True)
+def _(file_db, new_memory_db):
+    # Create advanced users table in memory database with primary key
+    new_memory_db.execute("""
+    CREATE TABLE users_memory (
+        id INTEGER PRIMARY KEY,
+        name VARCHAR NOT NULL,
+        age INTEGER CHECK (age > 0),
+        email VARCHAR UNIQUE,
+        registration_date DATE DEFAULT CURRENT_DATE,
+        last_login TIMESTAMP,
+        account_balance DECIMAL(10,2) DEFAULT 0.00
+    )
+    """)
+    # Create users table in file database
+    file_db.execute("""
+    CREATE TABLE users_file (
+        id INTEGER PRIMARY KEY,
+        name VARCHAR NOT NULL,
+        age INTEGER CHECK (age > 0),
+        email VARCHAR UNIQUE,
+        registration_date DATE DEFAULT CURRENT_DATE,
+        last_login TIMESTAMP,
+        account_balance DECIMAL(10,2) DEFAULT 0.00
+    )
+    """)
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    # Get table schema information using DuckDB's internal system tables
+    memory_schema = new_memory_db.execute("""
+        SELECT column_name, data_type, is_nullable
+        FROM information_schema.columns
+        WHERE table_name = 'users_memory'
+        ORDER BY ordinal_position
+    """).df()
+    return (memory_schema,)
+@app.cell(hide_code=True)
+def _(memory_schema, mo):
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## 🔍 Table Schema Information """)], align="center"
+            ),
+            mo.ui.table(memory_schema),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        r"""
+    # [3. Inserting Data Into Tables](https://duckdb.org/docs/stable/sql/statements/insert)
+    DuckDB supports multiple ways to insert data:
+    1. **INSERT INTO VALUES**: Insert specific values
+    2. **INSERT INTO SELECT**: Insert data from query results
+    3. **Parameterized inserts**: Using prepared statements
+    4. **Bulk inserts**: For efficient loading of multiple rows
+    Let's demonstrate these different insertion methods:
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(date):
+    today = date.today()
+    # First check if records already exist to avoid duplicate key errors
+    def safe_insert(connection, table_name, data):
+        """
+        Safely insert data into a table by checking for existing IDs first
+        """
+        # Check which IDs already exist in the table
+        existing_ids = (
+            connection.execute(f"SELECT id FROM {table_name}")
+            .fetchdf()["id"]
+            .tolist()
+        )
+        # Filter out data with IDs that already exist
+        new_data = [record for record in data if record[0] not in existing_ids]
+        if not new_data:
+            print(
+                f"No new records to insert into {table_name}. All IDs already exist."
+            )
+            return 0
+        # Prepare the placeholders for the SQL statement
+        placeholders = ", ".join(
+            ["(" + ", ".join(["?"] * len(new_data[0])) + ")"] * len(new_data)
+        )
+        # Flatten the list of tuples for parameter binding
+        flat_data = [item for sublist in new_data for item in sublist]
+        # Perform the insertion
+        if flat_data:
+            columns = "(id, name, age, email, registration_date, last_login, account_balance)"
+            connection.execute(
+                f"INSERT INTO {table_name} {columns} VALUES {placeholders}",
+                flat_data,
+            )
+            return len(new_data)
+        return 0
+    return (safe_insert,)
+@app.cell(hide_code=True)
+def _():
+    # Prepare the data
+    user_data = [
+        (
+            1,
+            "Alice",
+            25,
+            "alice@example.com",
+            "2021-01-01",
+            "2023-01-15 14:30:00",
+            1250.75,
+        ),
+        (
+            2,
+            "Bob",
+            30,
+            "bob@example.com",
+            "2021-02-01",
+            "2023-02-10 09:15:22",
+            750.50,
+        ),
+        (
+            3,
+            "Charlie",
+            35,
+            "charlie@example.com",
+            "2021-03-01",
+            "2023-03-05 17:45:10",
+            3200.25,
+        ),
+        (
+            4,
+            "David",
+            40,
+            "david@example.com",
+            "2021-04-01",
+            "2023-04-20 10:30:45",
+            1800.00,
+        ),
+        (
+            5,
+            "Emma",
+            45,
+            "emma@example.com",
+            "2021-05-01",
+            "2023-05-12 11:20:30",
+            2500.00,
+        ),
+        (
+            6,
+            "Frank",
+            50,
+            "frank@example.com",
+            "2021-06-01",
+            "2023-06-18 16:10:15",
+            900.25,
+        ),
+    ]
+    return (user_data,)
+@app.cell(hide_code=True)
+def _(file_db, new_memory_db, safe_insert, user_data):
+    # Safely insert data into memory database
+    safe_insert(new_memory_db, "users_memory", user_data)
+    # Safely insert data into file database
+    safe_insert(file_db, "users_file", user_data)
+    return
+@app.cell(hide_code=True)
+def _():
+    # If you need to add just one record, you can use a similar approach:
+    new_user = (
+        7,
+        "Grace",
+        28,
+        "grace@example.com",
+        "2021-07-01",
+        "2023-07-22 13:45:10",
+        1675.50,
+    )
+    return (new_user,)
+@app.cell(hide_code=True)
+def _(new_memory_db, new_user):
+    # Check if the ID exists before inserting
+    if not new_memory_db.execute(
+        "SELECT id FROM users_memory WHERE id = ?", [new_user[0]]
+    ).fetchone():
+        new_memory_db.execute(
+            """
+            INSERT INTO users_memory (id, name, age, email, registration_date, last_login, account_balance)
+            VALUES (?, ?, ?, ?, ?, ?, ?)
+            """,
+            new_user,
+        )
+        print(f"Added user {new_user[1]} to users_memory")
+    else:
+        print(f"User with ID {new_user[0]} already exists in users_memory")
+    return
+@app.cell(hide_code=True)
+def _(file_db, new_user):
+    # Do the same for the file database
+    if not file_db.execute(
+        "SELECT id FROM users_file WHERE id = ?", [new_user[0]]
+    ).fetchone():
+        file_db.execute(
+            """
+            INSERT INTO users_file (id, name, age, email, registration_date, last_login, account_balance)
+            VALUES (?, ?, ?, ?, ?, ?, ?)
+            """,
+            new_user,
+        )
+        print(f"Added user {new_user[1]} to users_file")
+    else:
+        print(f"User with ID {new_user[0]} already exists in users_file")
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    # First try to update
+    cursor = new_memory_db.execute(
+        """
+        UPDATE users_memory
+        SET name = ?, age = ?, email = ?,
+            registration_date = ?, last_login = ?, account_balance = ?
+        WHERE id = ?
+        """,
+        (
+            "Henry",
+            33,
+            "henry@example.com",
+            "2021-08-01",
+            "2023-08-05 09:10:15",
+            3100.75,
+            8,  # ID should be the last parameter
+        ),
+    )
+    return (cursor,)
+@app.cell(hide_code=True)
+def _(cursor, mo, new_memory_db):
+    # If no rows were updated, perform an insert
+    if cursor.rowcount == 0:
+        new_memory_db.execute(
+            """
+            INSERT INTO users_memory
+            (id, name, age, email, registration_date, last_login, account_balance)
+            VALUES (?, ?, ?, ?, ?, ?, ?)
+            """,
+            (
+                8,
+                "Henry",
+                33,
+                "henry@example.com",
+                "2021-08-01",
+                "2023-08-05 09:10:15",
+                3100.75,
+            ),
+        )
+    mo.md(
+        f"""
+        Upserted Henry into users_memory.
+        """
+    )
+    return
+@app.cell(hide_code=True)
+def _(file_db, mo):
+    # For DuckDB using ON CONFLICT, we need to specify the conflict target column
+    file_db.execute(
+        """
+        INSERT INTO users_file (id, name, age, email, registration_date, last_login, account_balance)
+        VALUES (?, ?, ?, ?, ?, ?, ?)
+        ON CONFLICT (id) DO UPDATE SET
+            name = EXCLUDED.name,
+            age = EXCLUDED.age,
+            email = EXCLUDED.email,
+            registration_date = EXCLUDED.registration_date,
+            last_login = EXCLUDED.last_login,
+            account_balance = EXCLUDED.account_balance
+        """,
+        (
+            8,
+            "Henry",
+            33,
+            "henry@example.com",
+            "2021-08-01",
+            "2023-08-05 09:10:15",
+            3100.75,
+        ),
+    )
+    mo.md(
+        f"""
+        Upserted Henry into users_file.
+        """
+    )
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    # Display memory data using DuckDB's query capabilities
+    memory_results = new_memory_db.execute("""
+        SELECT
+            id,
+            name,
+            age,
+            email,
+            registration_date,
+            last_login,
+            account_balance
+        FROM users_memory
+        ORDER BY id
+    """).df()
+    return (memory_results,)
+@app.cell(hide_code=True)
+def _(file_db):
+    # Display file data with formatting
+    file_results = file_db.execute("""
+        SELECT
+            id,
+            name,
+            age,
+            email,
+            registration_date,
+            last_login,
+            CAST(account_balance AS DECIMAL(10,2)) AS account_balance
+        FROM users_file
+        ORDER BY id
+    """).df()
+    return (file_results,)
+@app.cell(hide_code=True)
+def _(file_results, memory_results, mo):
+    tabs = mo.ui.tabs(
+        {
+            "In-Memory Database": mo.ui.table(memory_results),
+            "File-Based Database": mo.ui.table(file_results),
+        }
+    )
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## 📊 Database Contents After Insertion""")],
+                align="center",
+            ),
+            tabs,
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        r"""
+    # [4. Using SQL Directly in marimo](https://duckdb.org/docs/stable/sql/query_syntax/select)
+    There are multiple ways to leverage DuckDB's SQL capabilities in marimo:
+    1. **Direct execution**: Using DuckDB connections to execute SQL
+    2. **marimo SQL**: Using marimo's built-in SQL engine
+    3. **Interactive queries**: Combining UI elements with SQL execution
+    Let's explore these approaches:
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.vstack(
+        [
+            mo.vstack([mo.md(f"""## 🔍 Query with marimo SQL""")], align="center"),
+            mo.md(
+                "### marimo has its own [built-in SQL engine](https://docs.marimo.io/guides/working_with_data/sql/) that can work with DataFrames."
+            ),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(memory_results, mo):
+    # Create a SQL selector for users with age threshold
+    age_threshold = mo.ui.slider(
+        25, 50, value=30, label="Minimum Age", full_width=True, show_value=True
+    )
+    # Create a function to filter users based on the slider value
+    def filtered_users():
+        # Use DuckDB directly instead of mo.sql with users param
+        filtered_df = memory_results[memory_results["age"] >= age_threshold.value]
+        filtered_df = filtered_df.sort_values("age")
+        return mo.ui.table(filtered_df)
+    return age_threshold, filtered_users
+@app.cell(hide_code=True)
+def _(age_threshold, filtered_users, mo):
+    layout = mo.vstack(
+        [
+            mo.md("### Select minimum age:"),
+            age_threshold,
+            mo.md("### Users meeting age criteria:"),
+            filtered_users(),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    layout
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""# [5. Working with Polars and DuckDB](https://duckdb.org/docs/stable/guides/python/polars.html)""")
+    return
+@app.cell(hide_code=True)
+def _(pl):
+    # Create a Polars DataFrame
+    polars_df = pl.DataFrame(
+        {
+            "id": [101, 102, 103],
+            "name": ["Product A", "Product B", "Product C"],
+            "price": [29.99, 49.99, 19.99],
+            "category": ["Electronics", "Furniture", "Books"],
+        }
+    )
+    return (polars_df,)
+@app.cell(hide_code=True)
+def _(mo, polars_df):
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## Original Polars DataFrame:""")], align="center"
+            ),
+            mo.ui.table(polars_df),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db, polars_df):
+    # Register the Polars DataFrame as a DuckDB table in memory connection
+    new_memory_db.register("products_polars", polars_df)
+    # Query the registered table
+    polars_query_result = new_memory_db.execute(
+        "SELECT * FROM products_polars WHERE price > 25"
+    ).df()
+    return (polars_query_result,)
+@app.cell(hide_code=True)
+def _(mo, polars_query_result):
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## DuckDB Query Result (From Polars Data):""")],
+                align="center",
+            ),
+            mo.ui.table(polars_query_result),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    # Demonstrate a more complex query
+    complex_query_result = new_memory_db.execute("""
+        SELECT
+            category,
+            COUNT(*) as product_count,
+            AVG(price) as avg_price,
+            MIN(price) as min_price,
+            MAX(price) as max_price
+        FROM products_polars
+        GROUP BY category
+        ORDER BY avg_price DESC
+    """).df()
+    return (complex_query_result,)
+@app.cell(hide_code=True)
+def _(complex_query_result, mo):
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## Aggregated Product Data by Category:""")],
+                align="center",
+            ),
+            mo.ui.table(complex_query_result),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""# [6. Advanced Queries: Joins Between Tables](https://duckdb.org/docs/stable/guides/performance/join_operations.html)""")
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    # Create another table to join with
+    new_memory_db.execute("""
+    CREATE TABLE IF NOT EXISTS departments (
+        id INTEGER,
+        department_name VARCHAR,
+        manager_id INTEGER
+    )
+    """)
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    new_memory_db.execute("""
+    INSERT INTO departments VALUES
+    (101, 'Engineering', 1),
+    (102, 'Marketing', 2),
+    (103, 'Finance', NULL)
+    """)
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    # Execute a join query
+    join_result = new_memory_db.execute("""
+    SELECT
+        u.id,
+        u.name,
+        u.age,
+        d.department_name
+    FROM users_memory u
+    LEFT JOIN departments d ON u.id = d.manager_id
+    ORDER BY u.id
+    """).df()
+    return (join_result,)
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        rf"""
+    <!-- Display the join result -->
+    ## Join Result (Users and Departments):
+    """
+    )
+    return
+@app.cell
+def _(join_result, mo):
+    mo.ui.table(join_result)
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        rf"""
+    <!-- Demonstrate different types of joins -->
+    ## Different Types of Joins
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    # Inner join
+    inner_join = new_memory_db.execute("""
+    SELECT u.id, u.name, d.department_name
+    FROM users_memory u
+    INNER JOIN departments d ON u.id = d.manager_id
+    """).df()
+    # Right join
+    right_join = new_memory_db.execute("""
+    SELECT u.id, u.name, d.department_name
+    FROM users_memory u
+    RIGHT JOIN departments d ON u.id = d.manager_id
+    """).df()
+    # Full outer join
+    full_join = new_memory_db.execute("""
+    SELECT u.id, u.name, d.department_name
+    FROM users_memory u
+    FULL OUTER JOIN departments d ON u.id = d.manager_id
+    """).df()
+    # Cross join
+    cross_join = new_memory_db.execute("""
+    SELECT u.id, u.name, d.department_name
+    FROM users_memory u
+    CROSS JOIN departments d
+    """).df()
+    # Self join (Joining user table with itself to find users with the same age)
+    self_join = new_memory_db.execute("""
+    SELECT u1.id, u1.name, u2.name AS same_age_user
+    FROM users_memory u1
+    JOIN users_memory u2 ON u1.age = u2.age AND u1.id <> u2.id
+    """).df()
+    # Semi join (Finding users who are also managers)
+    semi_join = new_memory_db.execute("""
+    SELECT u.id, u.name, u.age
+        FROM users_memory u
+        WHERE EXISTS (
+            SELECT 1 FROM departments d
+            WHERE u.id = d.manager_id
+    )
+    """).df()
+    # Anti join (Finding users who are not managers)
+    anti_join = new_memory_db.execute("""
+    SELECT u.id, u.name, u.age
+        FROM users_memory u
+        WHERE NOT EXISTS (
+            SELECT 1 FROM departments d
+            WHERE u.id = d.manager_id
+    )
+    """).df()
+    return (
+        anti_join,
+        cross_join,
+        full_join,
+        inner_join,
+        right_join,
+        self_join,
+        semi_join,
+    )
+@app.cell(hide_code=True)
+def _(mo, new_memory_db):
+    # Display base table side by side
+    users = new_memory_db.execute("SELECT * FROM users_memory").df()
+    departments = new_memory_db.execute("SELECT * FROM departments").df()
+    base_tables = mo.vstack(
+        [
+            mo.vstack([mo.md(f"""# Base Tables""")], align="center"),
+            mo.ui.tabs(
+                {
+                    "User Table": mo.ui.table(users),
+                    "Departments Table": mo.ui.table(departments),
+                }
+            ),
+        ]
+    )
+    base_tables
+    return
+@app.cell(hide_code=True)
+def _(
+    anti_join,
+    cross_join,
+    full_join,
+    inner_join,
+    join_result,
+    mo,
+    right_join,
+    self_join,
+    semi_join,
+):
+    join_description = {
+        "Left Join": "Shows all records from the left table and matching records from the right table. Non-matches filled with NULL.",
+        "Inner Join": "Shows only the records where there's a match in both tables.",
+        "Right Join": "Shows all records from the right table and matching records from the left table. Non-matches filled with NULL.",
+        "Full Outer Join": "Shows all records from both tables, with NULL values where there's no match.",
+        "Cross Join": "Returns the Cartesian product - all possible combinations of rows from both tables.",
+        "Self Join": "Joins a table with itself, used to compare rows within the same table.",
+        "Semi Join": "Returns rows from the first table where one or more matches exist in the second table.",
+        "Anti Join": "Returns rows from the first table where no matches exist in the second table.",
+    }
+    join_tabs = mo.ui.tabs(
+        {
+            "Left Join": mo.ui.table(join_result),
+            "Inner Join": mo.ui.table(inner_join),
+            "Right Join": mo.ui.table(right_join),
+            "Full Outer Join": mo.ui.table(full_join),
+            "Cross Join": mo.ui.table(cross_join),
+            "Self Join": mo.ui.table(self_join),
+            "Semi Join": mo.ui.table(semi_join),
+            "Anti Join": mo.ui.table(anti_join),
+        }
+    )
+    return join_description, join_tabs
+@app.cell(hide_code=True)
+def _(join_description, join_tabs, mo):
+    join_display = mo.vstack(
+        [
+            mo.vstack([mo.md(f"""# SQL Join Operations""")], align="center"),
+            mo.md(f"**{join_tabs.value}**: {join_description[join_tabs.value]}"),
+            mo.md("## Join Results"),
+            join_tabs,
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    join_display
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""# [7. Aggregate Functions in DuckDB](https://duckdb.org/docs/stable/sql/functions/aggregates.html)""")
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    # Execute an aggregate query
+    agg_result = new_memory_db.execute("""
+    SELECT
+        AVG(age) as avg_age,
+        MAX(age) as max_age,
+        MIN(age) as min_age,
+        COUNT(*) as total_users,
+        SUM(account_balance) as total_balance
+    FROM users_memory
+    """).df()
+    return (agg_result,)
+@app.cell(hide_code=True)
+def _(agg_result, mo):
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## Aggregate Results (All Users):""")], align="center"
+            ),
+            mo.ui.table(agg_result),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    age_groups = new_memory_db.execute("""
+    SELECT
+        CASE
+            WHEN age < 30 THEN 'Under 30'
+            WHEN age BETWEEN 30 AND 40 THEN '30 to 40'
+            ELSE 'Over 40'
+        END as age_group,
+        COUNT(*) as count,
+        AVG(age) as avg_age,
+        AVG(account_balance) as avg_balance
+    FROM users_memory
+    GROUP BY 1
+    ORDER BY 1
+    """).df()
+    return (age_groups,)
+@app.cell(hide_code=True)
+def _(age_groups, mo):
+    mo.ui.table(age_groups)
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## Aggregate Results (Grouped by Age Range):""")],
+                align="center",
+            ),
+            mo.ui.table(age_groups),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    window_result = new_memory_db.execute("""
+    SELECT
+        id,
+        name,
+        age,
+        account_balance,
+        RANK() OVER (ORDER BY account_balance DESC) as balance_rank,
+        account_balance - AVG(account_balance) OVER () as diff_from_avg,
+        account_balance / SUM(account_balance) OVER () * 100 as pct_of_total
+    FROM users_memory
+    ORDER BY balance_rank
+    """).df()
+    return (window_result,)
+@app.cell(hide_code=True)
+def _(mo, window_result):
+    mo.vstack(
+        [
+            mo.vstack([mo.md(f"""## Window Functions Example""")], align="center"),
+            mo.ui.table(window_result),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""# [8. Converting DuckDB Results to Polars/Pandas](https://duckdb.org/docs/stable/guides/python/polars.html)""")
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    polars_result = new_memory_db.execute(
+        """SELECT * FROM users_memory WHERE age > 25 ORDER BY age"""
+    ).pl()
+    return (polars_result,)
+@app.cell(hide_code=True)
+def _(mo, polars_result):
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## Query Result as Polars DataFrame:""")],
+                align="center",
+            ),
+            mo.ui.table(polars_result),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(new_memory_db):
+    pandas_result = new_memory_db.execute(
+        """SELECT * FROM users_memory WHERE age > 25 ORDER BY age"""
+    ).fetch_df()
+    return (pandas_result,)
+@app.cell(hide_code=True)
+def _(mo, pandas_result):
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## Same Query Result as Pandas DataFrame:""")],
+                align="center",
+            ),
+            mo.ui.table(pandas_result),
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## Differences in DataFrame Handling""")],
+                align="center",
+            ),
+            mo.vstack(
+                [
+                    mo.md(
+                        f"""## Polars: Filter users over 35 and calculate average balance"""
+                    )
+                ],
+                align="start",
+            ),
+        ],
+        gap=2, justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo, pl, polars_result):
+    def _():
+        polars_filtered = polars_result.filter(pl.col("age") > 35)
+        polars_avg = polars_filtered.select(
+            pl.col("account_balance").mean().alias("avg_balance")
+        )
+        layout = mo.vstack(
+            [
+                mo.md("### Filtered Polars DataFrame (Age > 35):"),
+                mo.ui.table(polars_filtered),
+                mo.md("### Average Account Balance:"),
+                mo.ui.table(polars_avg),
+            ],
+            gap=2,
+        )
+        return layout
+    _()
+    return
+@app.cell(hide_code=True)
+def _(mo, pandas_result):
+    pandas_avg = pandas_result[pandas_result["age"] > 35]["account_balance"].mean()
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## Pandas: Same operation in pandas style""")],
+                align="center",
+            ),
+            mo.vstack(
+                [mo.md(f"""### Average balance: {pandas_avg:.2f}""")],
+                align="start",
+            ),
+        ]
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md("""# 9. Data Visualization with DuckDB and Plotly""")
+    return
+@app.cell(hide_code=True)
+def _(age_groups, mo, new_memory_db, plotly_express):
+    # User distribution by age group
+    fig1 = plotly_express.bar(
+        age_groups,
+        x="age_group",
+        y="count",
+        title="User Distribution by Age Group",
+        labels={"count": "Number of Users", "age_group": "Age Group"},
+        color="age_group",
+        color_discrete_sequence=plotly_express.colors.qualitative.Plotly,
+    )
+    fig1.update_traces(
+        text=age_groups["count"],
+        textposition="outside",
+    )
+    fig1.update_layout(
+        height=450,
+        margin=dict(t=50, b=50, l=50, r=25),
+        hoverlabel=dict(bgcolor="white", font_size=12),
+        template="plotly_white",
+    )
+    # Average balance by age group
+    fig2 = plotly_express.bar(
+        age_groups,
+        x="age_group",
+        y="avg_balance",
+        title="Average Account Balance by Age Group",
+        labels={"avg_balance": "Average Balance ($)", "age_group": "Age Group"},
+        color="age_group",
+        color_discrete_sequence=plotly_express.colors.qualitative.Plotly,
+    )
+    fig2.update_traces(
+        text=[f"${val:.2f}" for val in age_groups["avg_balance"]],
+        textposition="outside",
+    )
+    fig2.update_layout(
+        height=450,
+        margin=dict(t=50, b=50, l=50, r=25),
+        hoverlabel=dict(bgcolor="white", font_size=12),
+        template="plotly_white",
+    )
+    # Age vs Account Balance scatter plot
+    scatter_data = new_memory_db.execute(
+        """
+        SELECT
+            name,
+            age,
+            account_balance
+        FROM users_memory
+        ORDER BY age
+        """
+    ).df()
+    fig3 = plotly_express.scatter(
+        scatter_data,
+        x="age",
+        y="account_balance",
+        title="Age vs. Account Balance",
+        labels={"account_balance": "Account Balance ($)", "age": "Age"},
+        color_discrete_sequence=["#FF7F0E"],
+        trendline="ols",
+        hover_data=["age", "account_balance"],
+        size_max=15,
+    )
+    fig3.update_traces(marker=dict(size=12))
+    fig3.update_layout(
+        height=450,
+        margin=dict(t=50, b=50, l=50, r=25),
+        hoverlabel=dict(bgcolor="white", font_size=12),
+        template="plotly_white",
+    )
+    # Distribution of account balances
+    balance_data = new_memory_db.execute(
+        """
+        SELECT
+            name,
+            account_balance
+        FROM users_memory
+        ORDER BY account_balance DESC
+        """
+    ).df()
+    fig4 = plotly_express.pie(
+        balance_data,
+        names="name",
+        values="account_balance",
+        title="Distribution of Account Balances",
+        labels={"account_balance": "Account Balance ($)", "name": "User"},
+        color_discrete_sequence=plotly_express.colors.qualitative.Pastel,
+    )
+    fig4.update_traces(textinfo="percent+label", textposition="inside")
+    fig4.update_layout(
+        height=450,
+        margin=dict(t=50, b=50, l=50, r=25),
+        hoverlabel=dict(bgcolor="white", font_size=12),
+        template="plotly_white",
+    )
+    category_tabs = mo.ui.tabs(
+        {
+            "Age Group Analysis": mo.vstack(
+                [
+                    mo.ui.tabs(
+                        {
+                            "User Distribution": mo.ui.plotly(fig1),
+                            "Average Balance": mo.ui.plotly(fig2),
+                        }
+                    )
+                ],
+                gap=2,
+                justify="space-between",
+            ),
+            "Financial Analysis": mo.vstack(
+                [
+                    mo.ui.tabs(
+                        {
+                            "Age vs Balance": mo.ui.plotly(fig3),
+                            "Balance Distribution": mo.ui.plotly(fig4),
+                        }
+                    )
+                ],
+                gap=2,
+                justify="space-between",
+            ),
+        },
+        lazy=True,
+    )
+    mo.vstack(
+        [
+            mo.vstack(
+                [mo.md(f"""## Select a visualization category:""")],
+                align="start",
+            ),
+            category_tabs,
+        ],
+        gap=2,
+        justify="space-between",
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        r"""
+    /// admonition |
+    ## Database Management Best Practices
+    ///
+    ### Closing Connections
+    It's important to close database connections when you're done with them, especially for file-based connections:
+    ```python
+    memory_db.close()
+    file_db.close()
+    ```
+    ### Transaction Management
+    DuckDB supports transactions, which can be useful for more complex operations:
+    ```python
+    conn = duckdb.connect('mydb.db')
+    conn.begin()  # Start transaction
+    try:
+        conn.execute("INSERT INTO users VALUES (1, 'Test User')")
+        conn.execute("UPDATE balances SET amount = amount - 100 WHERE user_id = 1")
+        conn.commit()  # Commit changes
+    except:
+        conn.rollback()  # Undo changes if error
+        raise
+    ```
+    ### Query Performance
+    DuckDB is optimized for analytical queries. For best performance:
+    - Use appropriate data types
+    - Create indexes for frequently queried columns
+    - For large datasets, consider partitioning
+    - Use prepared statements for repeated queries
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(rf"""## 10. Interactive DuckDB Dashboard with marimo and Plotly""")
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    # Create an interactive filter for age range
+    min_age = mo.ui.slider(20, 50, value=25, label="Minimum Age")
+    max_age = mo.ui.slider(20, 50, value=50, label="Maximum Age")
+    return max_age, min_age
+@app.cell(hide_code=True)
+def _(max_age, min_age, new_memory_db):
+    # Create a function to filter data and update visualizations
+    def get_filtered_data(min_val=min_age.value, max_val=max_age.value):
+        # Get filtered data based on slider values using parameterized query for safety
+        return new_memory_db.execute(
+            """
+            SELECT
+                id,
+                name,
+                age,
+                email,
+                account_balance,
+                registration_date
+            FROM users_memory
+            WHERE age >= ? AND age <= ?
+            ORDER BY age
+            """,
+            [min_val, max_val],
+        ).df()
+    return (get_filtered_data,)
+@app.cell(hide_code=True)
+def _(get_filtered_data):
+    def get_metrics(data=get_filtered_data()):
+        return {
+            "user count": len(data),
+            "avg_balance": data["account_balance"].mean() if len(data) > 0 else 0,
+            "total_balance": data["account_balance"].sum() if len(data) > 0 else 0,
+        }
+    return (get_metrics,)
+@app.cell(hide_code=True)
+def _(get_metrics, mo):
+    def metrics_display(metrics=get_metrics()):
+        return mo.hstack(
+            [
+                mo.vstack(
+                    [
+                        mo.md("### Selected Users"),
+                        mo.md(f"## {metrics['user count']}"),
+                    ],
+                    align="center",
+                ),
+                mo.vstack(
+                    [
+                        mo.md("### Average Balance"),
+                        mo.md(f"## ${metrics['avg_balance']:.2f}"),
+                    ],
+                    align="center",
+                ),
+                mo.vstack(
+                    [
+                        mo.md("### Total Balance"),
+                        mo.md(f"## ${metrics['total_balance']:.2f}"),
+                    ],
+                    align="center",
+                ),
+            ],
+            justify="space-between",
+            gap=2,
+        )
+    return (metrics_display,)
+@app.cell(hide_code=True)
+def _(get_filtered_data, max_age, min_age, mo, plotly_express):
+    def create_visualization(
+        data=get_filtered_data(), min_val=min_age.value, max_val=max_age.value
+    ):
+        if len(data) == 0:
+            return mo.ui.text("No data available for the selected age range.")
+        # Create visualizations for filtered data
+        fig1 = plotly_express.bar(
+            data,
+            x="name",
+            y="account_balance",
+            title=f"Account Balance by User (Age {min_val} - {max_val})",
+            labels={"account_balance": "Account Balance ($)", "name": "User"},
+            color="account_balance",
+            color_continuous_scale=plotly_express.colors.sequential.Plasma,
+            text_auto=".2s",
+        )
+        fig1.update_layout(
+            height=400,
+            xaxis_tickangle=-45,
+            margin=dict(t=50, b=70, l=50, r=30),
+            hoverlabel=dict(bgcolor="white", font_size=12),
+            template="plotly_white",
+        )
+        fig1.update_traces(
+            textposition="outside",
+        )
+        fig2 = plotly_express.histogram(
+            data,
+            x="age",
+            nbins=min(10, len(set(data["age"]))),
+            title=f"Age Distribution (Age {min_val} - {max_val})",
+            color_discrete_sequence=["#4C78A8"],
+            opacity=0.8,
+            histnorm="probability density",
+        )
+        fig2.update_layout(
+            height=400,
+            margin=dict(t=50, b=70, l=50, r=30),
+            bargap=0.1,
+            hoverlabel=dict(bgcolor="white", font_size=12),
+            template="plotly_white",
+        )
+        fig3 = plotly_express.scatter(
+            data,
+            x="age",
+            y="account_balance",
+            title=f"Age vs. Account Balance (Age {min_val} - {max_val})",
+            labels={"account_balance": "Account Balance ($)", "age": "Age"},
+            color="age",
+            color_continuous_scale="Viridis",
+            size_max=25,
+            size="account_balance",
+            hover_name="name",
+        )
+        fig3.update_layout(
+            height=400,
+            margin=dict(t=50, b=70, l=50, r=30),
+            hoverlabel=dict(bgcolor="white", font_size=12),
+            template="plotly_white",
+        )
+        return mo.ui.tabs(
+            {
+                "Account Balance by User": mo.ui.plotly(fig1),
+                "Age Distribution": mo.ui.plotly(fig2),
+                "Age vs. Account Balance": mo.ui.plotly(fig3),
+            }
+        )
+    return (create_visualization,)
+@app.cell(hide_code=True)
+def _(
+    create_visualization,
+    get_filtered_data,
+    max_age,
+    metrics_display,
+    min_age,
+    mo,
+):
+    def dashboard(
+        min_val=min_age.value,
+        max_val=max_age.value,
+        metrics=metrics_display(),
+        data=get_filtered_data(),
+        visualization=create_visualization(),
+    ):
+        return mo.vstack(
+            [
+                mo.md(f"### Interactive Dashboard (Age {min_val} - {max_val})"),
+                metrics,
+                mo.md("### Data Table"),
+                mo.ui.table(data, page_size=5),
+                mo.md("### Visualizations"),
+                visualization,
+            ],
+            gap=2,
+            justify="space-between",
+        )
+    dashboard()
+    return
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        rf"""
+    # Summary and Key Takeaways
+    In this notebook, we've explored DuckDB, a powerful embedded analytical database system. Here's what we covered:
+    1. **Connection types**: We learned the difference between in-memory databases (temporary) and file-based databases (persistent).
+    2. **Table creation**: We created tables with various data types, constraints, and primary keys.
+    3. **Data insertion**: We demonstrated different ways to insert data, including single inserts and bulk loading.
+    4. **SQL queries**: We executed various SQL queries directly and through marimo's UI components.
+    5. **Integration with Polars**: We showed how DuckDB can work seamlessly with Polars DataFrames.
+    6. **Joins and relationships**: We performed JOIN operations between tables to combine related data.
+    7. **Aggregation**: We used aggregate functions to summarize and analyze data.
+    8. **Data conversion**: We converted DuckDB results to both Polars and Pandas DataFrames.
+    9. **Best practices**: We reviewed best practices for managing DuckDB connections and transactions.
+    10. **Visualization**: We created interactive visualizations and dashboards with Plotly and marimo.
+    DuckDB is an excellent tool for data analysis, especially for analytical workloads. Its in-process nature makes it fast and easy to use, while its SQL compatibility makes it accessible for anyone familiar with SQL databases.
+    ### Next Steps
+    - Try loading larger datasets into DuckDB
+    - Experiment with more complex queries and window functions
+    - Use DuckDB's COPY functionality to import/export data from/to files
+    - Create more advanced interactive dashboards with marimo and Plotly
+    """
+    )
+    return
+@app.cell(hide_code=True)
+def _():
+    import marimo as mo
+    import duckdb
+    import polars as pl
+    import os
+    from datetime import date
+    import plotly.express as plotly_express
+    import plotly.graph_objects as plotly_graph_objects
+    import numpy as np
+    return date, duckdb, mo, os, pl, plotly_express
+if __name__ == "__main__":
+    app.run()