# Execution engine configuration

> Configure the DataStore execution engine: auto, chdb, or pandas

DataStore can execute operations using different backends. This guide explains how to configure and optimize engine selection.

## Available Engines

| Engine   | Description                                     | Best For                                        |
| -------- | ----------------------------------------------- | ----------------------------------------------- |
| `auto`   | Automatically selects best engine per operation | General use (default)                           |
| `chdb`   | Forces all operations through ClickHouse SQL    | Large datasets, aggregations                    |
| `pandas` | Forces all operations through pandas            | Compatibility testing, pandas-specific features |

## Setting the Engine

### Global Configuration

```python
from chdb.datastore.config import config

# Option 1: Using set method
config.set_execution_engine('auto')    # Default
config.set_execution_engine('chdb')    # Force ClickHouse
config.set_execution_engine('pandas')  # Force pandas

# Option 2: Using shortcuts
config.use_auto()    # Auto-select
config.use_chdb()    # Force ClickHouse
config.use_pandas()  # Force pandas
```
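Because the engine setting is global, code that forces an engine for one step may want to restore the previous value afterwards. The following is a minimal sketch of that pattern, not part of the DataStore API: `temporary_engine` is a hypothetical helper, and `StubConfig` is a stand-in used only so the example runs without chDB installed. It assumes only what this page shows, namely a readable `execution_engine` attribute and a `set_execution_engine()` method.

```python
from contextlib import contextmanager


class StubConfig:
    """Hypothetical stand-in for chdb.datastore.config.config (demo only)."""

    def __init__(self):
        self.execution_engine = "auto"

    def set_execution_engine(self, engine):
        if engine not in ("auto", "chdb", "pandas"):
            raise ValueError(f"unknown engine: {engine!r}")
        self.execution_engine = engine


@contextmanager
def temporary_engine(cfg, engine):
    """Switch cfg to `engine` inside the block, then restore the old value."""
    previous = cfg.execution_engine
    cfg.set_execution_engine(engine)
    try:
        yield cfg
    finally:
        # Restore even if the block raised an exception
        cfg.set_execution_engine(previous)


cfg = StubConfig()
with temporary_engine(cfg, "pandas"):
    inside = cfg.execution_engine  # engine while the block runs
after = cfg.execution_engine       # engine after the block exits
print(inside, after)
```

With the real `config` object, the same helper could presumably be used as `with temporary_engine(config, 'pandas'): ...`, provided the object behaves as described above.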

### Checking Current Engine

```python
print(config.execution_engine)  # 'auto', 'chdb', or 'pandas'
```

***

## Auto Mode

In `auto` mode (default), DataStore selects the optimal engine for each operation:

### Operations Executed in chDB

* SQL-compatible filtering (`filter()`, `where()`)
* Column selection (`select()`)
* Sorting (`sort()`, `orderby()`)
* Grouping and aggregation (`groupby().agg()`)
* Joins (`join()`, `merge()`)
* Distinct (`distinct()`, `drop_duplicates()`)
* Limiting (`limit()`, `head()`, `tail()`)

### Operations Executed in pandas

* Custom apply functions (`apply(custom_func)`)
* Complex pivot tables with custom aggregations
* Operations not expressible in SQL
* When input is already a pandas DataFrame

### Example

```python
from chdb import datastore as pd
from chdb.datastore.config import config

config.use_auto()  # Default

ds = pd.read_csv("data.csv")

# This uses chDB (SQL)
result = (ds
    .filter(ds['amount'] > 100)  # SQL: WHERE
    .groupby('region')           # SQL: GROUP BY
    .agg({'amount': 'sum'})      # SQL: SUM()
)

# This uses pandas (custom function)
# complex_calculation is a placeholder for your own Python function
result = ds.apply(lambda row: complex_calculation(row), axis=1)
```

***

## chDB Mode

Force all operations through ClickHouse SQL:

```python
config.use_chdb()
```

### When to Use

* Processing large datasets (millions of rows)
* Heavy aggregation workloads
* When you want maximum SQL optimization
* Consistent behavior across all operations

### Performance Characteristics

| Operation Type        | Performance                  |
| --------------------- | ---------------------------- |
| GroupBy/Aggregation   | Excellent (up to 20x faster) |
| Complex Filtering     | Excellent                    |
| Sorting               | Very Good                    |
| Simple Single Filters | Good (slight overhead)       |

### Limitations

* Custom Python functions may not be supported
* Some pandas-specific features require conversion

***

## pandas Mode

Force all operations through pandas:

```python
config.use_pandas()
```

### When to Use

* Compatibility testing with pandas
* Using pandas-specific features
* Debugging pandas-related issues
* When data is already in pandas format

### Performance Characteristics

| Operation Type           | Performance      |
| ------------------------ | ---------------- |
| Simple Single Operations | Good             |
| Custom Functions         | Excellent        |
| Complex Aggregations     | Slower than chDB |
| Large Datasets           | Memory intensive |

***

## Cross-DataStore Engine

Configure the engine for operations that combine columns from different DataStores:

```python
# Set cross-DataStore engine
config.set_cross_datastore_engine('auto')
config.set_cross_datastore_engine('chdb')
config.set_cross_datastore_engine('pandas')
```

### Example

```python
ds1 = pd.read_csv("sales.csv")
ds2 = pd.read_csv("inventory.csv")

# This operation involves two DataStores
result = ds1.join(ds2, on='product_id')  # Uses cross_datastore_engine setting
```

***

## Engine Selection Logic

### Auto Mode Decision Tree

```text
Operation requested
    │
    ├─ Can be expressed in SQL?
    │       │
    │       ├─ Yes → Use chDB
    │       │
    │       └─ No  → Use pandas
    │
    └─ Cross-DataStore operation?
            │
            └─ Use cross_datastore_engine setting
```
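The decision tree above can be sketched as a small pure-Python function. This is an illustration only: `choose_engine` and its parameters are hypothetical names, not part of the DataStore API, and the real selection logic may weigh more factors. The sketch assumes that a forced engine always wins, and that a cross-DataStore setting of `'auto'` falls back to the SQL-expressibility check.

```python
def choose_engine(sql_expressible, cross_datastore=False,
                  execution_engine="auto", cross_datastore_engine="auto"):
    """Return 'chdb' or 'pandas' for one operation (illustrative only)."""
    # Cross-DataStore operations consult their own setting
    setting = cross_datastore_engine if cross_datastore else execution_engine

    # A forced engine bypasses the per-operation decision
    if setting in ("chdb", "pandas"):
        return setting

    # 'auto': SQL-expressible operations go to chDB, the rest to pandas
    return "chdb" if sql_expressible else "pandas"


print(choose_engine(sql_expressible=True))    # e.g. filter(), groupby().agg()
print(choose_engine(sql_expressible=False))   # e.g. apply(custom_func)
print(choose_engine(True, cross_datastore=True,
                    cross_datastore_engine="pandas"))
```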

### Function-Level Override

Some functions can have their engine explicitly configured:

```python
from chdb.datastore.config import function_config

# Force specific functions to use specific engine
function_config.use_chdb('length', 'substring')
function_config.use_pandas('upper', 'lower')
```

See [Function Config](/products/chdb/configuration/function-config) for details.

***

## Performance Comparison

Benchmark results on 10M rows:

| Operation        | pandas (ms) | chdb (ms) | Speedup |
| ---------------- | ----------- | --------- | ------- |
| GroupBy count    | 347         | 17        | 19.93x  |
| Combined ops     | 1,535       | 234       | 6.56x   |
| Complex pipeline | 2,047       | 380       | 5.39x   |
| Filter+Sort+Head | 1,537       | 350       | 4.40x   |
| GroupBy agg      | 406         | 141       | 2.88x   |
| Single filter    | 276         | 526       | 0.52x   |

**Key insights:**

* chDB excels at aggregations and complex pipelines
* pandas is faster for simple single operations (about 1.9x on the single filter above)
* Use `auto` mode to get the best of both

***
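The Speedup column in the Performance Comparison table is the pandas time divided by the chDB time, so values below 1 mean pandas was faster. The sketch below recomputes it from the millisecond figures in that table. Two rows come out slightly different when computed from the rounded timings (20.41x instead of 19.93x for GroupBy count, 4.39x instead of 4.40x for Filter+Sort+Head), presumably because the published speedups were derived from unrounded measurements.

```python
# (pandas_ms, chdb_ms) pairs copied from the benchmark table
benchmarks = {
    "GroupBy count": (347, 17),
    "Combined ops": (1535, 234),
    "Complex pipeline": (2047, 380),
    "Filter+Sort+Head": (1537, 350),
    "GroupBy agg": (406, 141),
    "Single filter": (276, 526),
}


def speedup(pandas_ms, chdb_ms):
    """Ratio of pandas time to chDB time; > 1 means chDB was faster."""
    return pandas_ms / chdb_ms


for name, (pandas_ms, chdb_ms) in benchmarks.items():
    ratio = speedup(pandas_ms, chdb_ms)
    faster = "chdb" if ratio > 1 else "pandas"
    print(f"{name:<18}{ratio:6.2f}x  ({faster} faster)")
```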

## Best Practices

### 1. Start with Auto Mode

```python
config.use_auto()  # Let DataStore decide
```

### 2. Profile Before Forcing

```python
config.enable_profiling()

# Run your workload

# Check profiler report to see where time is spent
```
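Alongside the built-in profiler, a plain stopwatch can help compare the same workload under two engine settings. The following is a generic sketch using only the Python standard library; `time_workload` and `example_workload` are hypothetical names, and the placeholder workload would be replaced by your own DataStore pipeline, run once after `config.use_chdb()` and once after `config.use_pandas()`.

```python
import statistics
import time


def time_workload(workload, repeats=5):
    """Return the median wall-clock time of workload() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to a slow first (cold) run than the mean
    return statistics.median(samples)


def example_workload():
    # Placeholder: replace with your DataStore pipeline
    return sum(i * i for i in range(100_000))


elapsed_ms = time_workload(example_workload)
print(f"median: {elapsed_ms:.2f} ms")
```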

### 3. Force Engine for Specific Workloads

```python
# For heavy aggregation workloads
config.use_chdb()

# For pandas compatibility testing
config.use_pandas()
```

### 4. Use explain() to Understand Execution

```python
ds = pd.read_csv("data.csv")

query = ds.filter(ds['age'] > 25).groupby('city').agg({'salary': 'sum'})

# See what SQL will be generated
query.explain()
```

***

## Troubleshooting

### Issue: Operation slower than expected

```python
# Check current engine
print(config.execution_engine)

# Enable debug to see what's happening
config.enable_debug()

# Try forcing specific engine
config.use_chdb()  # or config.use_pandas()
```

### Issue: Unsupported operation in chdb mode

```python
# Some pandas operations aren't supported in SQL
# Solution: use auto mode
config.use_auto()

# Or explicitly convert to pandas first
df = ds.to_df()
result = df.some_pandas_specific_operation()
```

### Issue: Memory issues with large data

```python
# Use chdb engine to avoid loading all data into memory
config.use_chdb()

# Filter early to reduce data size
result = ds.filter(ds['date'] >= '2024-01-01').to_df()

# For maximum throughput on large datasets, use performance mode
# which enables parallel Parquet reading and single-SQL aggregation
config.use_performance_mode()
```

**Performance Mode**

If you are running heavy aggregation workloads and don't need exact pandas output compatibility (row order, MultiIndex, dtype corrections), consider using [Performance Mode](/products/chdb/configuration/performance-mode). It automatically sets the engine to `chdb` and removes all pandas compatibility overhead.