1. Add your BigQuery access

  1. In the Sources tab, click the “Add source” button at the top right of your screen. Then, select the BigQuery option from the list of connectors.

  2. Click Next and you’ll be prompted to add the connector configurations.

    Default configuration

    • Project ID: the Google Cloud Platform project ID where your BigQuery datasets live. For more information on where to find the Project ID, check this link.
    • Service account credentials (JSON): the credentials JSON file for the service account linked to your BigQuery project. Make sure the account can perform read operations on your datasets and tables; it must have the BigQuery Data Viewer and BigQuery Job User roles. For more info on how to create the service account, please check this link. If you’d like to confirm the key works before uploading it, see the optional sketch below.
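
    Optional: before uploading the key, you can check locally that it authenticates and that both roles are in place. This is a minimal sketch using the official google-cloud-bigquery Python client; the file name service-account.json is a placeholder for your downloaded key.

    ```python
    # pip install google-cloud-bigquery
    from google.cloud import bigquery

    # Placeholder path to the downloaded service account key.
    KEY_PATH = "service-account.json"

    # The client picks up the Project ID from the key file itself.
    client = bigquery.Client.from_service_account_json(KEY_PATH)
    print("Authenticated against project:", client.project)

    # BigQuery Data Viewer: listing datasets should succeed.
    for dataset in client.list_datasets():
        print("dataset:", dataset.dataset_id)

    # BigQuery Job User: running a trivial query job should succeed.
    rows = client.query("SELECT 1 AS ok").result()
    print("query ok:", [dict(row) for row in rows])
    ```

    If either call fails with a permission error, double-check that the two roles above are granted to the service account.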

    Advanced configuration

    • Filter datasets: use this filter to extract only a subset of datasets from BigQuery. If left blank, all available datasets will be discovered.
    • Filter tables: use this filter to extract only specific tables from BigQuery. Shell patterns are supported (see the pattern sketch after this list). If left blank, all available tables will be discovered.

    It’s a good practice to filter datasets and tables to speed up the discovery and extraction process.
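
    The exact filter matching is handled by the connector, but if you are unsure whether a shell pattern covers the tables you want, Python’s standard fnmatch module mirrors typical shell-pattern behavior. This is only an illustration with made-up table names, not the connector’s own matcher:

    ```python
    from fnmatch import fnmatch

    # Hypothetical table names in your dataset.
    tables = ["events_2024_01", "events_2024_02", "users", "orders_staging"]

    # '*' matches any run of characters, '?' matches a single character.
    pattern = "events_*"

    print([t for t in tables if fnmatch(t, pattern)])
    # ['events_2024_01', 'events_2024_02']
    ```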

  3. Click Next.

2. Select your BigQuery streams

  1. The next step is selecting which streams you want to bring in. You can select entire groups of streams or only a subset of them. If you’d like to preview what will be discovered, see the sketch at the end of this step.

    Tip: you can find a stream more quickly by typing its name.

Some complex nested types are currently not supported by the connector. If an error pops up during the discovery process, please get in touch with Nekt support.

  2. Click Next.
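
  If you want to preview which datasets and tables are available before selecting streams, you can list them directly with the BigQuery Python client. A minimal sketch, assuming the same service account key as in step 1; each table generally corresponds to one selectable stream:

  ```python
  from google.cloud import bigquery

  client = bigquery.Client.from_service_account_json("service-account.json")

  for dataset in client.list_datasets():
      for table in client.list_tables(dataset.dataset_id):
          print(f"{dataset.dataset_id}.{table.table_id}")
  ```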

3. Configure your BigQuery data streams

  1. Customize how you want your data to appear in your catalog: select the layer where the data will be placed, a name for each table (which will contain the fetched data), and the sync type.
  • Layer: choose among the existing layers in your catalog. This is where you will find your new extracted tables once the extraction runs successfully.
  • Table name: we suggest a name, but feel free to customize it. You can also add a prefix to all tables at once to make this process faster!
  • Sync Type: depending on the data you are bringing to the lake, you can choose between INCREMENTAL and FULL_TABLE (see the sketch after this list). Read more about Sync Types here.
  2. Click Next.
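
  For context on the two sync types: a FULL_TABLE sync re-reads the entire table on every run, while an INCREMENTAL sync only reads rows whose replication key (for example an updated_at column) is newer than the last saved state. The sketch below illustrates the difference with plain BigQuery SQL; the table, column name, and bookmark value are hypothetical, and the actual state handling is done by the connector for you:

  ```python
  from google.cloud import bigquery

  client = bigquery.Client.from_service_account_json("service-account.json")

  # FULL_TABLE: every run reads the whole table.
  full_sql = "SELECT * FROM `my_project.my_dataset.my_table`"

  # INCREMENTAL: every run reads only rows newer than the last saved bookmark
  # (a hypothetical `updated_at` column is used as the replication key).
  last_bookmark = "2024-01-01T00:00:00"
  incremental_sql = f"""
      SELECT * FROM `my_project.my_dataset.my_table`
      WHERE updated_at > TIMESTAMP('{last_bookmark}')
      ORDER BY updated_at
  """

  rows = client.query(incremental_sql).result()
  ```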

4. Configure your BigQuery data source

  1. Describe your data source for easy identification within your organization. You can include details such as what data it brings, which team it belongs to, etc.

  2. To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).

  3. Optionally, you can define some additional settings (if available).

  • Configure Delta Log Retention to determine how long we should store old states of this table as it gets updated (see the conceptual sketch after this list). Read more about this resource here.
  • Determine when to execute an Additional Full Sync. This complements the incremental data extractions, ensuring that your data is periodically brought fully back in sync with your source.
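
  The Delta Log mentioned here refers to the table’s version history: the “old states” are previous table versions that remain readable until the retention window expires. Purely as an illustration (you don’t need to do this as part of the setup), here is a sketch of reading an older version with the deltalake Python package, assuming hypothetical direct access to the underlying table path:

  ```python
  # pip install deltalake
  from deltalake import DeltaTable

  # Hypothetical path to the underlying Delta table in your lake.
  table_path = "s3://my-lake/bronze/my_table"

  current = DeltaTable(table_path)
  print("current version:", current.version())

  # Earlier versions stay readable (time travel) until the delta log
  # retention window prunes them.
  previous = DeltaTable(table_path, version=current.version() - 1)
  ```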

Check your new source!

  1. Click Next to finalize the setup. Once completed, you’ll receive confirmation that your new source is set up!

  2. You can view your new source on the Sources page. To see it in your Catalog, wait for the pipeline to run; you can monitor its execution and completion on the Sources page. If needed, trigger the pipeline manually by clicking the refresh icon. Once it has run, your new table will appear in the Catalog section.

If you encounter any issues, reach out to us via Slack, and we’ll gladly assist you!