> ## Documentation Index
> Fetch the complete documentation index at: https://docs.nekt.com/llms.txt
> Use this file to discover all available pages before exploring further.

# BigQuery as a data source

> Bring data from BigQuery to Nekt.

BigQuery is Google Cloud's fully managed, serverless data warehouse that enables super-fast SQL queries using the processing power of Google's infrastructure. It's designed for large-scale data analytics and provides tools for data analysis, machine learning, and business intelligence.

<img height="50" src="https://mintcdn.com/nekt/3MFKt2g7jzqFpztO/assets/logo/logo-bigquery.png?fit=max&auto=format&n=3MFKt2g7jzqFpztO&q=85&s=629d8f3939ae9504f09c13c27b54e6b1" data-path="assets/logo/logo-bigquery.png" />

## Configuring BigQuery as a Source

In the [Sources](https://app.nekt.ai/sources) tab, click on the "Add source" button located on the top right of your screen. Then, select the BigQuery option from the list of connectors.

Click **Next** and you'll be prompted to add your access.

### 1. Add account access

You'll need a Google Cloud service account with the appropriate permissions to connect to BigQuery.

<Warning>
  **Required IAM Roles**: Your service account **must** have the following roles assigned at the project level:

  * **BigQuery Data Viewer** (`roles/bigquery.dataViewer`) - Allows reading data from tables and views
  * **BigQuery Job User** (`roles/bigquery.jobUser`) - Required to run queries and extract data

  Without the **BigQuery Job User** role, the connector will fail to execute queries even if data viewer permissions are granted.
</Warning>

<Accordion title="How to create and configure a Service Account">
  <Steps>
    <Step title="Create a Service Account">
      1. Go to the [Google Cloud Console](https://console.cloud.google.com/)
      2. Navigate to **IAM & Admin** → **Service Accounts**
      3. Click **Create Service Account**
      4. Enter a name and description for the service account
      5. Click **Create and Continue**
    </Step>

    <Step title="Grant Required Roles">
      1. In the "Grant this service account access to project" section, add the role **BigQuery Data Viewer** (`roles/bigquery.dataViewer`)
      2. Click **Add Another Role**
      3. Add the role **BigQuery Job User** (`roles/bigquery.jobUser`)
      4. Click **Continue** and then **Done**
    </Step>

    <Step title="Create and Download the JSON Key">
      1. Click on the newly created service account
      2. Go to the **Keys** tab
      3. Click **Add Key** → **Create new key**
      4. Select **JSON** format
      5. Click **Create** - the JSON file will be downloaded automatically
      6. Keep this file secure - you'll upload it to Nekt
    </Step>
  </Steps>
</Accordion>
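Before uploading the JSON key, you can sanity-check that it has the shape Google issues for service accounts. A minimal sketch using only the Python standard library; `check_service_account_key` is a hypothetical helper, and the field names reflect the standard GCP service-account key format:

```python
import json

# Fields present in every GCP service-account JSON key.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_service_account_key(path: str) -> dict:
    """Raise ValueError if the JSON key file is missing standard fields."""
    with open(path) as f:
        key = json.load(f)
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"key file missing fields: {sorted(missing)}")
    if key.get("type") != "service_account":
        raise ValueError("not a service-account key")
    return key
```

A key that fails this check was likely exported for the wrong credential type (for example, an OAuth client ID instead of a service account).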

### 2. Configuration Options

* **GCP Project ID**: The Google Cloud Platform project ID where your BigQuery instance is located. [Find your Project ID](https://support.google.com/googleapi/answer/7014113?hl=en)
* **Service Account Credentials (JSON)**: The JSON credentials file for the service account. Upload the file downloaded in the previous step.
* **Filter Datasets** (Advanced): Filter to extract only specific datasets. If left blank, all available datasets will be discovered.
* **Filter Tables** (Advanced): Filter to extract only specific tables. Shell patterns are supported (e.g., `orders_*`, `*_backup`). If left blank, all available tables will be discovered.
* **Google Storage Bucket** (Advanced): Required when extracting large datasets. When supplied, data is exported to files that are stored temporarily in the bucket during extraction.

<Note>
  **Performance Tip**: It's recommended to use the dataset and table filters to speed up the discovery and extraction process, especially in projects with many datasets.
</Note>
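To see how the shell patterns in **Filter Tables** behave, here is a minimal sketch using Python's standard `fnmatch` module (the `filter_tables` helper is illustrative, not the connector's code):

```python
from fnmatch import fnmatch

def filter_tables(tables: list[str], patterns: list[str]) -> list[str]:
    """Keep only tables whose names match at least one shell pattern.

    An empty pattern list means no filter: everything is discovered.
    """
    if not patterns:
        return list(tables)
    return [t for t in tables if any(fnmatch(t, p) for p in patterns)]
```

For example, `filter_tables(["orders_2024", "users", "users_backup"], ["orders_*", "*_backup"])` keeps `orders_2024` and `users_backup` but drops `users`.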

### 3. Google Storage Bucket (for Large Datasets)

For extracting large datasets efficiently, you can optionally configure a Google Cloud Storage bucket. When configured, the connector will:

1. Export query results to temporary files in the bucket
2. Stream the data from those files
3. Automatically delete the temporary files after extraction

<Warning>
  **Additional Permissions Required for GCS Bucket**: If you use this option, your service account needs these additional permissions on the bucket:

  * `storage.objects.create`
  * `storage.objects.delete`
  * `storage.objects.get`
  * `storage.objects.list`

  You can grant these by assigning the **Storage Object Admin** (`roles/storage.objectAdmin`) role on the specific bucket.
</Warning>
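The required bucket permissions can be expressed as a simple checklist. A hypothetical helper for comparing them against whatever permission set your service account actually has:

```python
# The four bucket permissions listed above, as a checklist.
REQUIRED_BUCKET_PERMISSIONS = {
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
}

def missing_bucket_permissions(granted: list[str]) -> list[str]:
    """Return the required bucket permissions absent from `granted`, sorted."""
    return sorted(REQUIRED_BUCKET_PERMISSIONS - set(granted))
```

An empty result means the bucket is ready; anything returned points at exactly which permission to grant.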

### 4. Select streams

The connector automatically discovers all tables and views in your BigQuery project (filtered by your configuration). Choose which streams you want to sync.

> Tip: You can find a stream more quickly by typing its name.

<Info>If you don't see a stream you were expecting to find, please check if your service account has access to the dataset containing it.</Info>

<Note>Some complex nested types (deeply nested STRUCTs or ARRAYs) may not be fully supported. If you encounter issues during discovery, please contact Nekt support.</Note>

Select the streams and click **Next**.

### 5. Configure data streams

Customize how you want your data to appear in your catalog. Select a name for each table and the type of sync.

* **Table name**: We suggest a name, but feel free to customize it. You have the option to add a **prefix** to all tables at once.

* **Sync Type**: You can choose between INCREMENTAL and FULL\_TABLE.
  * **Incremental**: Every time the extraction happens, we'll get only the new data based on the replication key you configure. Good for append-only tables or when you want to keep historical records.
  * **Full Table**: Every time the extraction happens, we'll get the current state of the data. Good when you want the latest snapshot without deleted records.

<Note>
  For incremental sync to work, you must specify a **replication key** column (e.g., `updated_at`, `created_at`, or an auto-incrementing ID).
</Note>

Once you are done configuring, click **Next**.

### 6. Configure data source

Describe your data source in up to 140 characters for easy identification within your organization.

To define your [Trigger](https://docs.nekt.com/get-started/core-concepts/triggers), consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).

Optionally, you can define additional settings:

* Configure **Delta Log Retention** to determine how long old states of this table should be stored. Read more about this resource [here](https://docs.nekt.com/get-started/core-concepts/resource-control).
* Determine when to execute an **Additional [Full Sync](https://docs.nekt.com/get-started/core-concepts/types-of-sync#additional-full-sync)**. This will complement incremental data extractions, ensuring your data is completely synchronized periodically.

Once you are ready, click **Next** to finalize the setup.

### 7. Check your new source

You can view your new source on the [Sources](https://app.nekt.ai/sources) page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.

<Warning>To see your data in your [Catalog](https://app.nekt.ai/catalog), you need at least one successful source run.</Warning>

## Streams and Fields

BigQuery is a generic database connector - streams are dynamically discovered based on your project's datasets and tables. Each table or view in your BigQuery project becomes a stream with fields matching the table's schema.
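One common way to enumerate tables and views in BigQuery is a query against the `INFORMATION_SCHEMA.TABLES` view. The sketch below builds such a query as an illustration of the mechanism; the connector's internals may differ:

```python
def discovery_query(project: str, dataset: str) -> str:
    """Query listing every table and view (with its type) in one dataset."""
    return (
        f"SELECT table_name, table_type "
        f"FROM `{project}.{dataset}`.INFORMATION_SCHEMA.TABLES"
    )
```

Running this per dataset (subject to your filters) yields the candidate streams, with `table_type` distinguishing `BASE TABLE` from `VIEW`.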

### Supported Data Types

| BigQuery Type           | Mapped Type |
| ----------------------- | ----------- |
| STRING                  | String      |
| INT64, INTEGER          | Integer     |
| FLOAT, FLOAT64, NUMERIC | Number      |
| BOOL                    | Boolean     |
| DATE                    | Date        |
| DATETIME, TIMESTAMP     | DateTime    |
| ARRAY                   | Array       |
| STRUCT                  | Object      |
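The mapping above can be written as a plain lookup table. Here `mapped_type` and its `String` fallback for unlisted types are assumptions for illustration:

```python
# BigQuery column types -> generic types, transcribed from the table above.
BQ_TYPE_MAP = {
    "STRING": "String",
    "INT64": "Integer",
    "INTEGER": "Integer",
    "FLOAT": "Number",
    "FLOAT64": "Number",
    "NUMERIC": "Number",
    "BOOL": "Boolean",
    "DATE": "Date",
    "DATETIME": "DateTime",
    "TIMESTAMP": "DateTime",
    "ARRAY": "Array",
    "STRUCT": "Object",
}

def mapped_type(bigquery_type: str) -> str:
    """Look up the generic type, case-insensitively; fall back to String."""
    return BQ_TYPE_MAP.get(bigquery_type.upper(), "String")
```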

### Stream Naming

Streams are named using the pattern `{dataset}-{table}`, for example:

* `analytics-events` (from dataset `analytics`, table `events`)
* `sales-orders` (from dataset `sales`, table `orders`)
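The naming rule is simple enough to express directly; a one-line sketch of the `{dataset}-{table}` pattern:

```python
def stream_name(dataset: str, table: str) -> str:
    """Build a stream name from a dataset and table, per the {dataset}-{table} pattern."""
    return f"{dataset}-{table}"
```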

## Implementation Notes

### Required Permissions Summary

| Permission/Role                  | Purpose                                   | Required                 |
| -------------------------------- | ----------------------------------------- | ------------------------ |
| BigQuery Data Viewer             | Read access to tables and views           | Yes                      |
| BigQuery Job User                | Execute queries for data extraction       | Yes                      |
| Storage Object Admin (on bucket) | Temporary file storage for large extracts | Only if using GCS bucket |

### Performance Considerations

* **Filter your datasets**: Use the `Filter Datasets` option to limit discovery to only the datasets you need
* **Filter your tables**: Use shell patterns in `Filter Tables` to include only relevant tables
* **Large datasets**: Configure a Google Storage bucket for efficient extraction of large tables
* **Incremental sync**: When possible, use incremental sync with a proper replication key to minimize data transfer

### Data Quality

* **Primary Keys**: The connector attempts to detect primary keys from table constraints
* **Nested Data**: STRUCT and ARRAY types are supported and mapped to nested JSON objects/arrays
* **Views**: Both tables and views are discovered and can be synced

### Troubleshooting

| Issue                        | Possible Cause              | Solution                                               |
| ---------------------------- | --------------------------- | ------------------------------------------------------ |
| No streams discovered        | Missing permissions         | Ensure service account has `BigQuery Data Viewer` role |
| Query execution failed       | Missing job permissions     | Ensure service account has `BigQuery Job User` role    |
| Discovery is slow            | Too many datasets/tables    | Use filter options to limit scope                      |
| Large table extraction fails | Memory/timeout issues       | Configure a Google Storage bucket                      |
| Permission denied on bucket  | Missing storage permissions | Grant `Storage Object Admin` on the specific bucket    |

## Skills for agents

<Snippet file="agent-skills-intro.mdx" />

<Card title="Download BigQuery skills file" icon="wand-magic-sparkles" href="/sources/bigquery.md">
  BigQuery connector documentation as plain markdown, for use in AI agent contexts.
</Card>
