AWS Limitations
Parquet Mode (AWS)
Timestamp with Timezone Issue
Issue: Nekt dates and datetimes are usually stored with timezone-aware timestamp precision, which can cause errors when the result set is serialized to Parquet. Workaround: Cast timestamp columns explicitly, as in the sketch below.
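A minimal sketch of the explicit-cast workaround (also recommended in Best Practices below); the table and column names are placeholders, and casting to a string works as well if the target type is unclear:

```sql
-- Hypothetical table/columns: cast the timezone-aware timestamp
-- to a plain TIMESTAMP before exporting in Parquet mode.
SELECT
  order_id,
  CAST(created_at AS TIMESTAMP) AS created_at
FROM orders;
```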
Duplicate Column Names
Issue: Parquet format does not support duplicate column names in the result set; this typically happens when a join returns the same column name from both tables.
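A minimal illustrative sketch (table and column names are hypothetical): aliasing each column so every name in the result set is unique avoids the duplicate-name failure:

```sql
-- Both tables expose an "id" column; without aliases the result set
-- would contain two columns named "id", which Parquet rejects.
SELECT
  o.id AS order_id,
  c.id AS customer_id
FROM orders o
JOIN customers c
  ON o.customer_id = c.id;
```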
All NULL Values (Unknown Type)
Issue: When a column contains only NULL values, Parquet cannot infer the data type, which may cause errors.
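In line with the Best Practices section below (handle NULL values explicitly by casting them), a minimal sketch with placeholder names that gives Parquet a concrete type for an all-NULL column:

```sql
-- "legacy_code" currently holds only NULLs; the explicit cast lets
-- Parquet record a concrete column type instead of an unknown one.
-- Use VARCHAR instead of STRING on engines without a STRING type.
SELECT
  id,
  CAST(legacy_code AS STRING) AS legacy_code
FROM products;
```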
Complex Nested Types
Issue: Deeply nested structures (arrays of structs, maps with complex types) may not serialize correctly to Parquet. Workaround:
- Flatten nested structures using lateral views and explode functions
- Convert complex types to JSON strings, as in the sketch after this list
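A minimal sketch of both workarounds, assuming a Spark-style SQL dialect (LATERAL VIEW, explode, and to_json) and placeholder table/column names:

```sql
-- Flatten an array-of-structs column with LATERAL VIEW explode,
-- and serialize a remaining complex column to a JSON string.
SELECT
  e.event_id,
  item.sku            AS item_sku,
  item.quantity       AS item_quantity,
  to_json(e.metadata)  AS metadata_json
FROM events e
LATERAL VIEW explode(e.items) exploded AS item;
```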
Large String Values
Issue: Extremely large string values (> 2GB) may cause memory issues during Parquet serialization. Workaround:
- Filter or truncate large text fields (see the sketch after this list)
- Use CSV mode instead for tables with very large text columns
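A minimal truncation sketch with placeholder names and an arbitrary cutoff, keeping only the first part of a very large text column:

```sql
-- Keep at most the first 10,000 characters of a potentially huge field.
SELECT
  id,
  SUBSTRING(raw_payload, 1, 10000) AS raw_payload_truncated
FROM api_logs;
```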
CSV Mode (AWS)
Special Characters in Data
Issue: Data containing special characters like commas, quotes, or newlines may not be properly escaped. Workaround: The CSV mode uses standard escaping, but verify your data after download if you have complex text fields.
Data Type Loss
Issue: All values are represented as strings in CSV format, losing type information. Impact: You’ll need to manually convert data types when importing the CSV into your target system.
Decimal Precision
Issue: Very large decimal values may lose precision when represented as strings. Workaround: Review decimal columns after import to ensure precision is maintained.
GCP Limitations
Parquet Mode (GCP)
Timestamp Formatting
Issue: GCP may handle timestamp formats slightly differently than AWS. Workaround: Similar to AWS, use CAST for timestamp columns (see the sketch in the AWS Parquet section above).
BigDecimal Precision
Issue: Very high precision decimal numbers may be rounded in Parquet format. Workaround: If exact precision is critical, consider using CSV mode or casting the decimal column to a string, as in the sketch below.
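A minimal sketch with placeholder names; the string representation preserves every digit at the cost of type information:

```sql
-- Export the full-precision decimal as text; convert it back to a numeric
-- type in the target system. Use VARCHAR on engines without a STRING type.
SELECT
  invoice_id,
  CAST(total_amount AS STRING) AS total_amount_str
FROM invoices;
```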
CSV Mode (GCP)
The CSV mode limitations for GCP are similar to AWS. Refer to the AWS CSV Mode section above.
General Limitations
Query Timeout
Issue: Queries that take longer than 30 minutes will time out. Workaround:
- Add filters to reduce the data volume
- Break large queries into smaller chunks
- Use a LIMIT clause for testing (see the sketch after this list)
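A minimal sketch combining these suggestions, with placeholder names and an arbitrary date: a filter to cut the scanned volume plus a LIMIT while validating the query shape:

```sql
-- Restrict the scan to a recent window and cap the output during testing;
-- remove the LIMIT for the full run.
SELECT *
FROM page_views
WHERE view_date >= DATE '2024-01-01'
LIMIT 1000;
```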
Result Set Size
Issue: Very large result sets (> 100GB) may fail or take a long time to generate. Workaround:
- Use pagination with LIMIT and OFFSET (see the sketch after this list)
- Filter data to only include necessary columns and rows
- Consider using data snapshots for very large datasets
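A minimal pagination sketch with placeholder names; a stable ORDER BY keeps LIMIT/OFFSET pages consistent between runs, and exact OFFSET support and placement varies by engine:

```sql
-- Fetch the third page of 100,000 rows, selecting only needed columns.
SELECT order_id, customer_id, total_amount
FROM orders
ORDER BY order_id
LIMIT 100000 OFFSET 200000;
```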
Memory-Intensive Operations
Issue: Operations like large JOINs, window functions on huge datasets, or complex aggregations may fail due to memory constraints. Workaround:
- Optimize queries by filtering early in the query (see the sketch after this list)
- Use smaller time windows for analysis
- Pre-aggregate data in transformations before querying
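A minimal sketch of the 'filter early' idea with placeholder names: both inputs are reduced before the join instead of filtering the joined result:

```sql
-- Filter each side down to the analysis window first, so the join
-- operates on far fewer rows and needs far less memory.
WITH recent_orders AS (
  SELECT order_id, customer_id, total_amount
  FROM orders
  WHERE order_date >= DATE '2024-01-01'
),
active_customers AS (
  SELECT customer_id, region
  FROM customers
  WHERE status = 'active'
)
SELECT r.order_id, a.region, r.total_amount
FROM recent_orders r
JOIN active_customers a
  ON r.customer_id = a.customer_id;
```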
Best Practices
To avoid common limitations:
- Always test queries in the Explore module first to validate they work correctly
- Start with a LIMIT clause to test the query structure before processing large datasets
- Use explicit type casting for timestamp and decimal columns when using Parquet mode
- Monitor query execution time and optimize queries that approach the timeout limit
- Choose the right format: Use Parquet for analytical queries and CSV for maximum compatibility
- Handle NULL values explicitly by casting them to appropriate types
Need Help?
If you encounter an issue not listed here, please contact support with:
- Your SQL query
- The error message you received
- Your cloud provider (AWS/GCP)
- The output format you’re using (Parquet/CSV)