AWS Limitations
Parquet Mode (AWS)
Timestamp with Timezone Issue
Issue: Nekt dates and datetimes are usually stored with timezone-aware timestamp precision, which can cause errors when the result set is serialized to Parquet. Workaround: Cast timestamp columns explicitly, as in the sketch below.
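A minimal sketch of the explicit-cast workaround (also recommended in Best Practices below); the table and column names are placeholders, and casting to a string works as well if the target type is unclear:

```sql
-- Hypothetical table/columns: cast the timezone-aware timestamp
-- to a plain TIMESTAMP before exporting in Parquet mode.
SELECT
  order_id,
  CAST(created_at AS TIMESTAMP) AS created_at
FROM orders;
```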
Duplicate Column Names
Issue: Parquet format does not support duplicate column names in the result set; this typically happens when a join returns the same column name from both tables.
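A minimal illustrative sketch (table and column names are hypothetical): aliasing each column so every name in the result set is unique avoids the duplicate-name failure:

```sql
-- Both tables expose an "id" column; without aliases the result set
-- would contain two columns named "id", which Parquet rejects.
SELECT
  o.id AS order_id,
  c.id AS customer_id
FROM orders o
JOIN customers c
  ON o.customer_id = c.id;
```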
All NULL Values (Unknown Type)
Issue: When a column contains only NULL values, Parquet cannot infer the data type, which may cause errors.
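In line with the Best Practices section below (handle NULL values explicitly by casting them), a minimal sketch with placeholder names that gives Parquet a concrete type for an all-NULL column:

```sql
-- "legacy_code" currently holds only NULLs; the explicit cast lets
-- Parquet record a concrete column type instead of an unknown one.
-- Use VARCHAR instead of STRING on engines without a STRING type.
SELECT
  id,
  CAST(legacy_code AS STRING) AS legacy_code
FROM products;
```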
Complex Nested Types
Issue: Deeply nested structures (arrays of structs, maps with complex types) may not serialize correctly to Parquet. Workaround:
- Flatten nested structures using lateral views and explode functions
- Convert complex types to JSON strings, as in the sketch after this list
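A minimal sketch of both workarounds, assuming a Spark-style SQL dialect (LATERAL VIEW, explode, and to_json) and placeholder table/column names:

```sql
-- Flatten an array-of-structs column with LATERAL VIEW explode,
-- and serialize a remaining complex column to a JSON string.
SELECT
  e.event_id,
  item.sku            AS item_sku,
  item.quantity       AS item_quantity,
  to_json(e.metadata)  AS metadata_json
FROM events e
LATERAL VIEW explode(e.items) exploded AS item;
```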
Large String Values
Issue: Extremely large string values (> 2GB) may cause memory issues during Parquet serialization. Workaround:
- Filter or truncate large text fields (see the sketch after this list)
- Use CSV mode instead for tables with very large text columns
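A minimal truncation sketch with placeholder names and an arbitrary cutoff, keeping only the first part of a very large text column:

```sql
-- Keep at most the first 10,000 characters of a potentially huge field.
SELECT
  id,
  SUBSTRING(raw_payload, 1, 10000) AS raw_payload_truncated
FROM api_logs;
```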
CSV Mode (AWS)
Special Characters in Data
Issue: Data containing special characters like commas, quotes, or newlines may not be properly escaped. Workaround: The CSV mode uses standard escaping, but verify your data after download if you have complex text fields.
Data Type Loss
Issue: All values are represented as strings in CSV format, losing type information. Impact: You’ll need to manually convert data types when importing the CSV into your target system.
Decimal Precision
Issue: Very large decimal values may lose precision when represented as strings. Workaround: Review decimal columns after import to ensure precision is maintained.
GCP Limitations
Parquet Mode (GCP)
Timestamp Formatting
Issue: GCP may handle timestamp formats slightly differently than AWS. Workaround: Similar to AWS, use CAST for timestamp columns (see the sketch in the AWS Parquet section above).
BigDecimal Precision
Issue: Very high precision decimal numbers may be rounded in Parquet format. Workaround: If exact precision is critical, consider using CSV mode or casting the decimal column to a string, as in the sketch below.
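A minimal sketch with placeholder names; the string representation preserves every digit at the cost of type information:

```sql
-- Export the full-precision decimal as text; convert it back to a numeric
-- type in the target system. Use VARCHAR on engines without a STRING type.
SELECT
  invoice_id,
  CAST(total_amount AS STRING) AS total_amount_str
FROM invoices;
```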
CSV Mode (GCP)
The CSV mode limitations for GCP are similar to AWS. Refer to the AWS CSV Mode section above.
General Limitations
Query Timeout
Issue: Queries that take longer than 30 minutes will time out. Workaround:
- Add filters to reduce the data volume
- Break large queries into smaller chunks
- Use a LIMIT clause for testing (see the sketch after this list)
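A minimal sketch combining these suggestions, with placeholder names and an arbitrary date: a filter to cut the scanned volume plus a LIMIT while validating the query shape:

```sql
-- Restrict the scan to a recent window and cap the output during testing;
-- remove the LIMIT for the full run.
SELECT *
FROM page_views
WHERE view_date >= DATE '2024-01-01'
LIMIT 1000;
```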
Result Set Size
Issue: Very large result sets (> 100GB) may fail or take a long time to generate. Workaround:
- Use pagination with LIMIT and OFFSET (see the sketch after this list)
- Filter data to only include necessary columns and rows
- Consider using data snapshots for very large datasets
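A minimal pagination sketch with placeholder names; a stable ORDER BY keeps LIMIT/OFFSET pages consistent between runs, and exact OFFSET support and placement varies by engine:

```sql
-- Fetch the third page of 100,000 rows, selecting only needed columns.
SELECT order_id, customer_id, total_amount
FROM orders
ORDER BY order_id
LIMIT 100000 OFFSET 200000;
```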
Memory-Intensive Operations
Issue: Operations like large JOINs, window functions on huge datasets, or complex aggregations may fail due to memory constraints. Workaround:
- Optimize queries by filtering early in the query (see the sketch after this list)
- Use smaller time windows for analysis
- Pre-aggregate data in transformations before querying
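A minimal sketch of the 'filter early' idea with placeholder names: both inputs are reduced before the join instead of filtering the joined result:

```sql
-- Filter each side down to the analysis window first, so the join
-- operates on far fewer rows and needs far less memory.
WITH recent_orders AS (
  SELECT order_id, customer_id, total_amount
  FROM orders
  WHERE order_date >= DATE '2024-01-01'
),
active_customers AS (
  SELECT customer_id, region
  FROM customers
  WHERE status = 'active'
)
SELECT r.order_id, a.region, r.total_amount
FROM recent_orders r
JOIN active_customers a
  ON r.customer_id = a.customer_id;
```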
Best Practices
To avoid common limitations:
- Always test queries in the Explore module first to validate they work correctly
- Start with a LIMIT clause to test the query structure before processing large datasets
- Use explicit type casting for timestamp and decimal columns when using Parquet mode
- Monitor query execution time and optimize queries that approach the timeout limit
- Choose the right format: Use Parquet for analytical queries and CSV for maximum compatibility
- Handle NULL values explicitly by casting them to appropriate types
Need Help?
If you encounter an issue not listed here, please contact support with:
- Your SQL query
- The error message you received
- Your cloud provider (AWS/GCP)
- The output format you’re using (Parquet/CSV)