# DataStore debugging

> Debug DataStore operations with explain(), profiling, and logging

DataStore provides three debugging tools, `explain()`, a profiler, and debug logging, to help you understand and optimize your data pipelines.

<h2 id="overview">
  Debugging Tools Overview
</h2>

| Tool        | Purpose                | When to Use                  |
| ----------- | ---------------------- | ---------------------------- |
| `explain()` | View execution plan    | Understand what SQL will run |
| Profiler    | Measure performance    | Find slow operations         |
| Logging     | View execution details | Debug unexpected behavior    |

<h2 id="decision-matrix">
  Quick Decision Matrix
</h2>

| Need                | Tool        | Command                     |
| ------------------- | ----------- | --------------------------- |
| See execution plan  | `explain()` | `ds.explain()`              |
| Measure performance | Profiler    | `config.enable_profiling()` |
| Debug SQL queries   | Logging     | `config.enable_debug()`     |
| All of the above    | Combined    | See below                   |

<h2 id="quick-setup">
  Quick Setup
</h2>

<h3 id="enable-all">
  Enable All Debugging
</h3>

```python theme={null}
from chdb import datastore as pd
from chdb.datastore.config import config, get_profiler

# Enable all debugging
config.enable_debug()        # Verbose logging
config.enable_profiling()    # Performance tracking

ds = pd.read_csv("data.csv")
result = ds.filter(ds['age'] > 25).groupby('city').agg({'salary': 'mean'})

# View execution plan
result.explain()

# Get profiler report
profiler = get_profiler()
profiler.report()
```

***

<h2 id="explain">
  explain() Method
</h2>

View the execution plan before running a query.

```python title="Query" theme={null}
from chdb import datastore as pd

ds = pd.read_csv("data.csv")

query = (ds
    .filter(ds['amount'] > 1000)
    .groupby('region')
    .agg({'amount': ['sum', 'mean']})
)

# View plan
query.explain()
```

```text title="Response" theme={null}
Pipeline:
  Source: file('data.csv', 'CSVWithNames')
  Filter: amount > 1000
  GroupBy: region
  Aggregate: sum(amount), avg(amount)

Generated SQL:
SELECT region, SUM(amount) AS sum, AVG(amount) AS mean
FROM file('data.csv', 'CSVWithNames')
WHERE amount > 1000
GROUP BY region
```

See [explain() Documentation](/products/chdb/debugging/explain) for details.

***

<h2 id="profiling">
  Profiling
</h2>

Measure execution time for each operation.

```python title="Query" theme={null}
from chdb import datastore as pd
from chdb.datastore.config import config, get_profiler

# Enable profiling
config.enable_profiling()

# Run operations
ds = pd.read_csv("large_data.csv")
result = (ds
    .filter(ds['amount'] > 100)
    .groupby('category')
    .agg({'amount': 'sum'})
    .sort('sum', ascending=False)
    .head(10)
    .to_df()
)

# View report
profiler = get_profiler()
profiler.report(min_duration_ms=0.1)
```

```text title="Response" theme={null}
Performance Report
==================
Step                          Duration    Calls
----                          --------    -----
read_csv                      1.234s      1
filter                        0.002s      1
groupby                       0.001s      1
agg                           0.089s      1
sort                          0.045s      1
head                          0.001s      1
to_df (SQL execution)         0.567s      1
----                          --------    -----
Total                         1.939s      7
```

See [Profiling Guide](/products/chdb/debugging/profiling) for details.
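
To make per-step timing concrete, here is a minimal sketch of a step profiler built only on the Python standard library. It is not DataStore's implementation: `StepProfiler`, `step()`, and the row format returned by `report()` are hypothetical names chosen for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepProfiler:
    """Hypothetical illustration; not the chdb DataStore profiler."""

    def __init__(self):
        self.durations = defaultdict(float)  # step name -> total seconds
        self.calls = defaultdict(int)        # step name -> number of calls

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the step raises
            self.durations[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self, min_duration_ms=0.0):
        """Return (name, seconds, calls) rows, slowest first."""
        rows = [
            (name, secs, self.calls[name])
            for name, secs in self.durations.items()
            if secs * 1000 >= min_duration_ms
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)


profiler = StepProfiler()
with profiler.step("load"):
    data = list(range(100_000))
with profiler.step("filter"):
    kept = [x for x in data if x % 2 == 0]

for name, secs, calls in profiler.report():
    print(f"{name:<10} {secs:.6f}s  {calls}")
```

The `min_duration_ms` threshold plays the same role as in the report call above: it hides steps too short to matter.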

***

<h2 id="logging">
  Logging
</h2>

View detailed execution logs.

```python theme={null}
from chdb.datastore.config import config

# Enable debug logging
config.enable_debug()

# Run operations - logs will show:
# - SQL queries generated
# - Execution engine used
# - Cache hits/misses
# - Timing information
```

Log output example:

```text theme={null}
DEBUG - DataStore: Creating from file 'data.csv'
DEBUG - Query: SELECT region, SUM(amount) FROM ... WHERE amount > 1000 GROUP BY region
DEBUG - Engine: Using chdb for aggregation
DEBUG - Execution time: 0.089s
DEBUG - Cache: Storing result (key: abc123)
```

See [Logging Configuration](/products/chdb/debugging/logging) for details.
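
When debug output is long, it can help to collect messages in memory and filter them. The sketch below uses only Python's standard `logging` module and assumes DataStore routes its messages through it. The logger name `demo.datastore` and the logged messages are stand-ins for illustration, not documented chDB names or real output.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log messages in memory for later inspection."""

    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# "demo.datastore" is a placeholder logger name, not a documented chdb logger
logger = logging.getLogger("demo.datastore")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Stand-ins for the kind of messages shown in the sample output above
logger.debug(
    "Query: SELECT region, SUM(amount) FROM file('data.csv', 'CSVWithNames') "
    "WHERE amount > 1000 GROUP BY region"
)
logger.debug("Execution time: 0.089s")

# Keep only the lines that carry generated SQL
queries = [m for m in handler.messages if m.startswith("Query:")]
print(queries)
```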

***

<h2 id="scenarios">
  Common Debugging Scenarios
</h2>

<h3 id="scenario-wrong-results">
  1. Query Not Returning Expected Results
</h3>

```python theme={null}
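from chdb import datastore as pd
from chdb.datastore.config import config

ds = pd.read_csv("data.csv")

# Step 1: View the execution plan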
query = ds.filter(ds['age'] > 25).groupby('city').sum()
query.explain(verbose=True)

# Step 2: Enable logging to see SQL
config.enable_debug()

# Step 3: Run and check logs
result = query.to_df()
```

<h3 id="scenario-slow">
  2. Query Running Slowly
</h3>

```python theme={null}
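from chdb.datastore.config import config, get_profiler

# Step 1: Enable profiling
config.enable_profiling()

# Step 2: Run your query (process_data() stands in for your own pipeline)
result = process_data()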

# Step 3: Check profiler report
profiler = get_profiler()
profiler.report()

# Step 4: Identify slow operations and optimize
```
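
One way to approach Step 4 is to rank steps by their share of total runtime. The sketch below does that arithmetic on the durations from the sample report in the Profiling section; the dictionary is typed in by hand and is not produced by any profiler API.

```python
# Durations in seconds, copied from the sample profiler report in the Profiling section
durations = {
    "read_csv": 1.234,
    "filter": 0.002,
    "groupby": 0.001,
    "agg": 0.089,
    "sort": 0.045,
    "head": 0.001,
    "to_df (SQL execution)": 0.567,
}

total = sum(durations.values())

# Rank steps by their share of total runtime, largest first
ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
for step, secs in ranked:
    print(f"{step:<24} {secs:>6.3f}s  {secs / total:6.1%}")
```

In this sample, `read_csv` and SQL execution together account for about 93% of the runtime, so they are the only steps worth optimizing first.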

<h3 id="scenario-engine">
  3. Understanding Engine Selection
</h3>

```python theme={null}
from chdb.datastore.config import config

# Enable verbose logging
config.enable_debug()

# Run operations (custom_func is your own Python function)
result = ds.filter(ds['x'] > 10).apply(custom_func)

# Logs will show which engine was used for each operation:
# DEBUG - filter: Using chdb engine
# DEBUG - apply: Using pandas engine (custom function)
```
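
For longer pipelines, engine usage can be tallied from captured log lines. This sketch assumes the log format matches the illustrative comments above, which may not hold for real output; the regular expression and variable names are hypothetical.

```python
import re
from collections import defaultdict

# Sample lines copied from the comments above; real log output may be formatted differently
log_lines = [
    "DEBUG - filter: Using chdb engine",
    "DEBUG - apply: Using pandas engine (custom function)",
]

pattern = re.compile(r"^DEBUG - (?P<op>\w+): Using (?P<engine>\w+) engine")

# Group operation names by the engine that handled them
ops_by_engine = defaultdict(list)
for line in log_lines:
    match = pattern.match(line)
    if match:
        ops_by_engine[match["engine"]].append(match["op"])

print(dict(ops_by_engine))
```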

<h3 id="scenario-cache">
  4. Debugging Cache Issues
</h3>

```python theme={null}
from chdb.datastore.config import config

# Enable debug to see cache operations
config.enable_debug()

# First run
result1 = ds.filter(ds['x'] > 10).to_df()
# LOG: Cache miss, executing query

# Second run (should use cache)
result2 = ds.filter(ds['x'] > 10).to_df()
# LOG: Cache hit, returning cached result

# If not caching when expected, check:
# - Are operations identical?
# - Is cache enabled? config.cache_enabled
```
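
The "are operations identical?" check matters because a result cache can only hit when two pipelines resolve to the same key. The sketch below assumes, purely for illustration, a cache keyed on a hash of the generated SQL text. How DataStore actually derives its cache keys is not documented here, and `cache_key` is a hypothetical helper.

```python
import hashlib


def cache_key(sql: str) -> str:
    """Hypothetical cache key: hash of the SQL text with whitespace normalized."""
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


first = cache_key("SELECT * FROM t WHERE x > 10")
same = cache_key("SELECT *  FROM t\nWHERE x > 10")   # differs only in whitespace
other = cache_key("SELECT * FROM t WHERE x > 10.0")  # different literal

print(first == same)   # True
print(first == other)  # False
```

Under this assumption, even a small change such as `10` versus `10.0` produces a different key and therefore a cache miss.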

***

<h2 id="best-practices">
  Best Practices
</h2>

<h3 id="best-practice-1">
  1. Debug in Development, Not Production
</h3>

```python theme={null}
import logging

from chdb.datastore.config import config

# Development
config.enable_debug()
config.enable_profiling()

# Production
config.set_log_level(logging.WARNING)
config.set_profiling_enabled(False)
```
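
One way to avoid enabling debugging in production by accident is to drive the choice from an environment variable. The sketch below wraps the calls from the snippet above in a helper. `configure_debugging` and the `APP_ENV` variable are hypothetical names, and the helper takes `config` as a parameter so it only calls methods that appear in that snippet.

```python
import logging
import os


def configure_debugging(config, environment: str) -> None:
    """Turn debugging on for development and off everywhere else.

    `config` is expected to be the object imported from chdb.datastore.config.
    """
    if environment == "development":
        config.enable_debug()
        config.enable_profiling()
    else:
        config.set_log_level(logging.WARNING)
        config.set_profiling_enabled(False)


# APP_ENV is a hypothetical variable name; default to the quieter production settings
environment = os.environ.get("APP_ENV", "production")

# Usage (requires chdb):
# from chdb.datastore.config import config
# configure_debugging(config, environment)
```

Defaulting to the production branch means a missing variable yields the quieter configuration.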

<h3 id="best-practice-2">
  2. Use explain() Before Running Large Queries
</h3>

```python theme={null}
# Build query
query = ds.filter(...).groupby(...).agg(...)

# Check plan first
query.explain()

# If plan looks good, execute
result = query.to_df()
```

<h3 id="best-practice-3">
  3. Profile Before Optimizing
</h3>

```python theme={null}
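from chdb.datastore.config import config, get_profiler

# Don't guess what's slow - measure it
config.enable_profiling()
result = your_pipeline()  # your_pipeline() stands in for your own code
get_profiler().report()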
```

<h3 id="best-practice-4">
  4. Check SQL When Results Are Wrong
</h3>

```python theme={null}
# View generated SQL
print(query.to_sql())

# Compare with expected SQL
# Run SQL directly in ClickHouse to verify
```
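
Comparing generated SQL with the SQL you expected is easier with a line diff. The sketch below uses the standard library's `difflib`. Both SQL strings are hand-written examples (in practice `generated` would hold the output of `query.to_sql()`), and `sql_lines` is a hypothetical helper.

```python
import difflib


def sql_lines(sql: str) -> list[str]:
    """Strip indentation and blank lines so formatting differences stay out of the diff."""
    return [line.strip() for line in sql.strip().splitlines() if line.strip()]


# Hand-written stand-in for the string returned by query.to_sql()
generated = """
SELECT region, SUM(amount) AS sum
FROM file('data.csv', 'CSVWithNames')
WHERE amount > 1000
GROUP BY region
"""

expected = """
SELECT region, SUM(amount) AS sum
FROM file('data.csv', 'CSVWithNames')
WHERE amount >= 1000
GROUP BY region
"""

diff = list(difflib.unified_diff(
    sql_lines(expected), sql_lines(generated),
    fromfile="expected", tofile="generated", lineterm="",
))
print("\n".join(diff))
```

Here the diff isolates the `WHERE` clause, pointing at a `>` versus `>=` mismatch in the filter.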

***

<h2 id="summary">
  Debugging Tools Summary
</h2>

| Tool             | Command                     | Output                |
| ---------------- | --------------------------- | --------------------- |
| Explain plan     | `ds.explain()`              | Execution steps + SQL |
| Verbose explain  | `ds.explain(verbose=True)`  | + Metadata            |
| View SQL         | `ds.to_sql()`               | SQL query string      |
| Enable debug     | `config.enable_debug()`     | Detailed logs         |
| Enable profiling | `config.enable_profiling()` | Timing data           |
| Profiler report  | `get_profiler().report()`   | Performance summary   |
| Clear profiler   | `get_profiler().reset()`    | Clear timing data     |

***

<h2 id="next-steps">
  Next Steps
</h2>

* [explain() Method](/products/chdb/debugging/explain) - Detailed execution plan documentation
* [Profiling Guide](/products/chdb/debugging/profiling) - Performance measurement
* [Logging Configuration](/products/chdb/debugging/logging) - Log level and format setup
