
Configuring BigQuery as a Source
In the Sources tab, click the “Add source” button at the top right of your screen. Then select the BigQuery option from the list of connectors. Click Next and you’ll be prompted to add your access.
1. Add account access
You’ll need a Google Cloud service account with the appropriate permissions to connect to BigQuery.
How to create and configure a Service Account
Create a Service Account
- Go to the Google Cloud Console
- Navigate to IAM & Admin → Service Accounts
- Click Create Service Account
- Enter a name and description for the service account
- Click Create and Continue
Grant Required Roles
- In the “Grant this service account access to project” section
- Add the role BigQuery Data Viewer (roles/bigquery.dataViewer)
- Click Add Another Role
- Add the role BigQuery Job User (roles/bigquery.jobUser)
- Click Continue and then Done
Create and Download a JSON Key
- Open the new service account and go to the Keys tab
- Click Add Key → Create new key, select JSON, and download the credentials file (you’ll upload it in the next step)
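The console steps above can also be scripted with the gcloud CLI. A sketch, assuming placeholder names my-project for the project and nekt-bigquery for the service account:

```shell
# Create the service account (names are placeholders)
gcloud iam service-accounts create nekt-bigquery \
  --project=my-project \
  --display-name="Nekt BigQuery source"

# Grant the two required roles at the project level
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:nekt-bigquery@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:nekt-bigquery@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Create and download a JSON key for the account
gcloud iam service-accounts keys create key.json \
  --iam-account=nekt-bigquery@my-project.iam.gserviceaccount.com
```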
2. Configuration Options
- GCP Project ID: The Google Cloud Platform project ID where your BigQuery instance is located. Find your Project ID
- Service Account Credentials (JSON): The JSON credentials file for the service account. Upload the file downloaded in the previous step.
- Filter Datasets (Advanced): Filter to extract only specific datasets. If left blank, all available datasets will be discovered.
- Filter Tables (Advanced): Filter to extract only specific tables. Shell patterns are supported (e.g., orders_*, *_backup). If left blank, all available tables will be discovered.
- Google Storage Bucket (Advanced): Required for large datasets. When supplied, data is extracted as files and temporarily saved in the bucket.
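The shell patterns accepted by Filter Tables behave like standard glob matching. A minimal illustration using Python’s stdlib fnmatch (the connector’s actual matching code is not shown here; table names are made up):

```python
from fnmatch import fnmatch

tables = ["orders_2023", "orders_2024", "customers", "staging_backup"]
patterns = ["orders_*", "*_backup"]

# A table is kept if it matches any configured pattern
selected = [t for t in tables if any(fnmatch(t, p) for p in patterns)]
print(selected)  # → ['orders_2023', 'orders_2024', 'staging_backup']
```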
Performance Tip: It’s recommended to use the dataset and table filters to speed up the discovery and extraction process, especially in projects with many datasets.
3. Google Storage Bucket (for Large Datasets)
For extracting large datasets efficiently, you can optionally configure a Google Cloud Storage bucket. When configured, the connector will:
- Export query results to temporary files in the bucket
- Stream the data from those files
- Automatically delete the temporary files after extraction
4. Select streams
The connector automatically discovers all tables and views in your BigQuery project (filtered by your configuration). Choose which streams you want to sync.
Tip: You can find a stream more easily by typing its name.
If you don’t see a stream you were expecting to find, please check if your service account has access to the dataset containing it.
Some complex nested types (deeply nested STRUCTs or ARRAYs) may not be fully supported. If you encounter issues during discovery, please contact Nekt support.
5. Configure data streams
Customize how you want your data to appear in your catalog. Select a name for each table and the type of sync.
- Table name: We suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once.
- Sync Type: You can choose between INCREMENTAL and FULL_TABLE.
- Incremental: Every time the extraction happens, we’ll get only the new data based on the replication key you configure. Good for append-only tables or when you want to keep historical records.
- Full Table: Every time the extraction happens, we’ll get the current state of the data. Good when you want the latest snapshot without deleted records.
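Conceptually, an incremental run only fetches rows whose replication key is past the last saved bookmark, while a full-table run reads everything. A simplified sketch; the query shape, function name, and bookmark handling are illustrative, not the connector’s exact implementation:

```python
from typing import Optional

def build_extraction_query(table: str,
                           replication_key: Optional[str],
                           bookmark: Optional[str]) -> str:
    """Return the SQL for one extraction run.

    FULL_TABLE: read the whole table. INCREMENTAL: filter on the
    replication key, resuming from the last synced bookmark value.
    """
    query = f"SELECT * FROM `{table}`"
    if replication_key and bookmark is not None:
        # Only rows newer than the last synced value
        query += f" WHERE {replication_key} > '{bookmark}'"
        query += f" ORDER BY {replication_key}"
    return query

# Full table: no replication key configured
print(build_extraction_query("sales.orders", None, None))
# Incremental: resume from the stored bookmark
print(build_extraction_query("sales.orders", "updated_at", "2024-01-01T00:00:00"))
```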
For incremental sync to work, you must specify a replication key column (e.g., updated_at, created_at, or an auto-incrementing ID).
6. Configure data source
Describe your data source for easy identification within your organization, not exceeding 140 characters.
To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).
Optionally, you can define additional settings:
- Configure Delta Log Retention to determine how long old states of this table should be stored. Read more about this resource here.
- Determine when to execute an Additional Full Sync. This will complement incremental data extractions, ensuring your data is completely synchronized periodically.
7. Check your new source
You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking the arrow button. Once executed, your data will appear in your Catalog.
Streams and Fields
BigQuery is a generic database connector: streams are dynamically discovered based on your project’s datasets and tables. Each table or view in your BigQuery project becomes a stream with fields matching the table’s schema.
Supported Data Types
| BigQuery Type | Mapped Type |
|---|---|
| STRING | String |
| INT64, INTEGER | Integer |
| FLOAT, FLOAT64, NUMERIC | Number |
| BOOL | Boolean |
| DATE | Date |
| DATETIME, TIMESTAMP | DateTime |
| ARRAY | Array |
| STRUCT | Object |
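The mapping above amounts to a simple lookup from BigQuery column types to catalog types. A sketch (the dictionary name and structure are illustrative, not the connector’s internals):

```python
# BigQuery type → catalog type, per the table above
BQ_TYPE_MAP = {
    "STRING": "String",
    "INT64": "Integer", "INTEGER": "Integer",
    "FLOAT": "Number", "FLOAT64": "Number", "NUMERIC": "Number",
    "BOOL": "Boolean",
    "DATE": "Date",
    "DATETIME": "DateTime", "TIMESTAMP": "DateTime",
    "ARRAY": "Array",
    "STRUCT": "Object",
}

print(BQ_TYPE_MAP["TIMESTAMP"])  # → DateTime
```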
Stream Naming
Streams are named using the pattern {dataset}-{table}, for example:
- analytics-events (from dataset analytics, table events)
- sales-orders (from dataset sales, table orders)
Implementation Notes
Required Permissions Summary
| Permission/Role | Purpose | Required |
|---|---|---|
| BigQuery Data Viewer | Read access to tables and views | Yes |
| BigQuery Job User | Execute queries for data extraction | Yes |
| Storage Object Admin (on bucket) | Temporary file storage for large extracts | Only if using GCS bucket |
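If you use the optional GCS bucket, the bucket-level role can also be granted with gcloud. A sketch with placeholder bucket, project, and service account names:

```shell
# Grant Storage Object Admin on the specific bucket only (names are placeholders)
gcloud storage buckets add-iam-policy-binding gs://my-nekt-staging-bucket \
  --member="serviceAccount:nekt-bigquery@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
```

Scoping the grant to the one bucket, rather than project-wide, keeps the service account’s storage access minimal.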
Performance Considerations
- Filter your datasets: Use the Filter Datasets option to limit discovery to only the datasets you need
- Filter your tables: Use shell patterns in Filter Tables to include only relevant tables
- Large datasets: Configure a Google Storage bucket for efficient extraction of large tables
- Incremental sync: When possible, use incremental sync with a proper replication key to minimize data transfer
Data Quality
- Primary Keys: The connector attempts to detect primary keys from table constraints and unique indexes
- Nested Data: STRUCT and ARRAY types are supported and mapped to nested JSON objects/arrays
- Views: Both tables and views are discovered and can be synced
Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| No streams discovered | Missing permissions | Ensure service account has BigQuery Data Viewer role |
| Query execution failed | Missing job permissions | Ensure service account has BigQuery Job User role |
| Discovery is slow | Too many datasets/tables | Use filter options to limit scope |
| Large table extraction fails | Memory/timeout issues | Configure a Google Storage bucket |
| Permission denied on bucket | Missing storage permissions | Grant Storage Object Admin on the specific bucket |