BigQuery is Google Cloud’s fully managed, serverless data warehouse that enables super-fast SQL queries using the processing power of Google’s infrastructure. It’s designed for large-scale data analytics and provides tools for data analysis, machine learning, and business intelligence.

Configuring BigQuery as a Source

In the Sources tab, click on the “Add source” button located on the top right of your screen. Then, select the BigQuery option from the list of connectors. Click Next and you’ll be prompted to add your access.

1. Add account access

You’ll need a Google Cloud service account with the appropriate permissions to connect to BigQuery.
Required IAM Roles: Your service account must have the following roles assigned at the project level:
  • BigQuery Data Viewer (roles/bigquery.dataViewer) - Allows reading data from tables and views
  • BigQuery Job User (roles/bigquery.jobUser) - Required to run queries and extract data
Without the BigQuery Job User role, the connector will fail to execute queries even if data viewer permissions are granted.

Create a Service Account

  1. Go to the Google Cloud Console
  2. Navigate to IAM & Admin > Service Accounts
  3. Click Create Service Account
  4. Enter a name and description for the service account
  5. Click Create and Continue

Grant Required Roles

  1. In the “Grant this service account access to project” section, add the role BigQuery Data Viewer (roles/bigquery.dataViewer)
  2. Click Add Another Role
  3. Add the role BigQuery Job User (roles/bigquery.jobUser)
  4. Click Continue and then Done

Create and Download the JSON Key

  1. Click on the newly created service account
  2. Go to the Keys tab
  3. Click Add Key > Create new key
  4. Select JSON format
  5. Click Create - the JSON file will be downloaded automatically
  6. Keep this file secure - you’ll upload it to Nekt
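Before uploading the key, you can sanity-check that it is a valid service account key. The sketch below uses only the standard library; the field names (`type`, `project_id`, `private_key`, `client_email`) are the standard fields Google Cloud writes into every service account JSON key.

```python
import json

# Fields present in every Google Cloud service account key file.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_key_file(path):
    """Sanity-check a downloaded service account key before uploading it."""
    with open(path) as f:
        key = json.load(f)
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"Key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError("Not a service account key (type != 'service_account')")
    return key["project_id"], key["client_email"]
```

If the check passes, it returns the project ID and service account email, which should match the values you expect to use in the next step.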

2. Configuration Options

  • GCP Project ID: The Google Cloud Platform project ID where your BigQuery instance is located. Find your Project ID
  • Service Account Credentials (JSON): The JSON credentials file for the service account. Upload the file downloaded in the previous step.
  • Filter Datasets (Advanced): Filter to extract only specific datasets. If left blank, all available datasets will be discovered.
  • Filter Tables (Advanced): Filter to extract only specific tables. Shell patterns are supported (e.g., orders_*, *_backup). If left blank, all available tables will be discovered.
  • Google Storage Bucket (Advanced): Needed when extracting large datasets. When supplied, data is exported to temporary files in the bucket and streamed from there.
Performance Tip: It’s recommended to use the dataset and table filters to speed up the discovery and extraction process, especially in projects with many datasets.
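The shell patterns accepted by the Filter Tables option behave like standard filename globs. A minimal sketch of how such matching works, using Python's stdlib `fnmatch` (the pattern list and helper name are illustrative, not part of the connector):

```python
from fnmatch import fnmatch

def matches_filter(table_name, patterns):
    """Return True if the table name matches any configured shell pattern."""
    return any(fnmatch(table_name, p) for p in patterns)

# Example patterns from the Filter Tables description.
patterns = ["orders_*", "*_backup"]
```

With these patterns, `orders_2024` and `sales_backup` would be included, while `customers` would be filtered out.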

3. Google Storage Bucket (for Large Datasets)

For extracting large datasets efficiently, you can optionally configure a Google Cloud Storage bucket. When configured, the connector will:
  1. Export query results to temporary files in the bucket
  2. Stream the data from those files
  3. Automatically delete the temporary files after extraction
Additional Permissions Required for GCS Bucket: If you use this option, your service account needs these additional permissions on the bucket:
  • storage.objects.create
  • storage.objects.delete
  • storage.objects.get
  • storage.objects.list
You can grant these by assigning the Storage Object Admin (roles/storage.objectAdmin) role on the specific bucket.
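If you want to confirm the grant took effect, Cloud Storage exposes a `testIamPermissions` endpoint that returns which of a requested set of permissions the caller actually holds. A minimal local sketch of the comparison step, using the permission list above (the helper name is hypothetical):

```python
# Bucket permissions the connector needs (from the list above).
REQUIRED_BUCKET_PERMISSIONS = {
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
}

def missing_permissions(granted):
    """Return the required bucket permissions absent from the granted set."""
    return sorted(REQUIRED_BUCKET_PERMISSIONS - set(granted))
```

An empty result means the service account holds everything the connector needs on the bucket.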

4. Select streams

The connector automatically discovers all tables and views in your BigQuery project (filtered by your configuration). Choose which streams you want to sync.
Tip: You can find a stream faster by typing its name into the search field.
If you don’t see a stream you were expecting to find, please check if your service account has access to the dataset containing it.
Some complex nested types (deeply nested STRUCTs or ARRAYs) may not be fully supported. If you encounter issues during discovery, please contact Nekt support.
Select the streams and click Next.

5. Configure data streams

Customize how you want your data to appear in your catalog. Select a name for each table and the type of sync.
  • Table name: We suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once.
  • Sync Type: You can choose between INCREMENTAL and FULL_TABLE.
    • Incremental: Every time the extraction happens, we’ll get only the new data based on the replication key you configure. Good for append-only tables or when you want to keep historical records.
    • Full Table: Every time the extraction happens, we’ll get the current state of the data. Good when you want the latest snapshot without deleted records.
For incremental sync to work, you must specify a replication key column (e.g., updated_at, created_at, or an auto-incrementing ID).
Once you are done configuring, click Next.
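Conceptually, an incremental extraction only fetches rows whose replication key is greater than the highest value seen in the previous run. The sketch below illustrates the idea; the function name, the `@last_seen` query parameter, and the bookmark handling are assumptions for illustration, not the connector's actual implementation.

```python
def incremental_query(dataset, table, replication_key, last_seen):
    """Build a SELECT that fetches only rows newer than the stored bookmark.

    `last_seen` is the maximum replication-key value from the previous run;
    on the first run it is None and the whole table is read.
    """
    base = f"SELECT * FROM `{dataset}.{table}`"
    if last_seen is None:
        return f"{base} ORDER BY {replication_key}"
    return (
        f"{base} WHERE {replication_key} > @last_seen "
        f"ORDER BY {replication_key}"
    )
```

This is why the replication key must be a column that only grows over time (such as `updated_at` or an auto-incrementing ID): rows whose key value never advances past the bookmark would be skipped.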

6. Configure data source

Describe your data source for easy identification within your organization, not exceeding 140 characters. To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times). Optionally, you can define additional settings:
  • Configure Delta Log Retention to determine how long old states of this table should be stored. Read more about this resource here.
  • Determine when to execute an Additional Full Sync. This will complement incremental data extractions, ensuring your data is completely synchronized periodically.
Once you are ready, click Next to finalize the setup.

7. Check your new source

You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.
Note that your data only becomes visible in the Catalog after at least one successful source run.

Streams and Fields

BigQuery is a generic database connector - streams are dynamically discovered based on your project’s datasets and tables. Each table or view in your BigQuery project becomes a stream with fields matching the table’s schema.

Supported Data Types

BigQuery Type              Mapped Type
STRING                     String
INT64, INTEGER             Integer
FLOAT, FLOAT64, NUMERIC    Number
BOOL                       Boolean
DATE                       Date
DATETIME, TIMESTAMP        DateTime
ARRAY                      Array
STRUCT                     Object
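The same mapping, transcribed as a lookup table for reference (the dictionary name is illustrative):

```python
# BigQuery -> catalog type mapping, transcribed from the table above.
BIGQUERY_TYPE_MAP = {
    "STRING": "String",
    "INT64": "Integer",
    "INTEGER": "Integer",
    "FLOAT": "Number",
    "FLOAT64": "Number",
    "NUMERIC": "Number",
    "BOOL": "Boolean",
    "DATE": "Date",
    "DATETIME": "DateTime",
    "TIMESTAMP": "DateTime",
    "ARRAY": "Array",
    "STRUCT": "Object",
}
```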

Stream Naming

Streams are named using the pattern {dataset}-{table}, for example:
  • analytics-events (from dataset analytics, table events)
  • sales-orders (from dataset sales, table orders)
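The naming rule above can be expressed as a one-liner (the helper name is illustrative):

```python
def stream_name(dataset, table):
    """Streams are named using the pattern {dataset}-{table}."""
    return f"{dataset}-{table}"
```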

Implementation Notes

Required Permissions Summary

Permission/Role                     Purpose                                      Required
BigQuery Data Viewer                Read access to tables and views              Yes
BigQuery Job User                   Execute queries for data extraction          Yes
Storage Object Admin (on bucket)    Temporary file storage for large extracts    Only if using GCS bucket

Performance Considerations

  • Filter your datasets: Use the Filter Datasets option to limit discovery to only the datasets you need
  • Filter your tables: Use shell patterns in Filter Tables to include only relevant tables
  • Large datasets: Configure a Google Storage bucket for efficient extraction of large tables
  • Incremental sync: When possible, use incremental sync with a proper replication key to minimize data transfer

Data Quality

  • Primary Keys: The connector attempts to detect primary keys from table constraints and unique indexes
  • Nested Data: STRUCT and ARRAY types are supported and mapped to nested JSON objects/arrays
  • Views: Both tables and views are discovered and can be synced

Troubleshooting

Issue                           Possible Cause                 Solution
No streams discovered           Missing permissions            Ensure service account has BigQuery Data Viewer role
Query execution failed          Missing job permissions        Ensure service account has BigQuery Job User role
Discovery is slow               Too many datasets/tables       Use filter options to limit scope
Large table extraction fails    Memory/timeout issues          Configure a Google Storage bucket
Permission denied on bucket     Missing storage permissions    Grant Storage Object Admin on the specific bucket