Google Drive is Google’s cloud storage and collaboration platform. This connector supports two operating modes:
- Records mode (default): Downloads Microsoft Excel files (.xlsx and .xls) from Google Drive and extracts each worksheet as a separate data stream, so you can load spreadsheet data from Drive, including files in Shared Drives, into Nekt.
- Unstructured mode: Downloads any file type (PDFs, images, XML, etc.) from a Google Drive folder and uploads them to a Nekt volume, emitting one metadata record per file. Useful for materializing a catalog of unstructured assets into your warehouse.
Records mode (Excel extraction)
Configuring Google Drive Excel as a Source
In the Sources tab, click on the Add source button on the top right, then select Google Drive (Excel) from the list of connectors. Click Next and you’ll be prompted to add your access.
1. Add account access
You need to authorize Nekt to read files from Google Drive and, optionally, pick the file or folder to extract from.
- Authentication: Use the Google Authorization flow. Sign in with a Google account that has access to the Drive (and, if applicable, the Shared Drive) where your Excel files are stored. The connector uses OAuth and stores a refresh token so it can keep accessing Drive without re-authorizing.
- File or folder: Use the in-app picker to select the file or folder; the connector will resolve the correct ID.
2. Select streams
The connector discovers streams dynamically from the file or folder you selected:
- One stream per sheet: Each worksheet in each Excel file becomes one stream.
- Stream names: {file_name}_{sheet_name} (lowercase, special characters replaced with underscores), e.g. monthly_sales_january, budget_2024_summary.
- Tab config: You can optionally configure per-stream behavior for specific sheets:
  - Range: A sheet range (e.g. A:D or A1:E100) so only part of the sheet is read. If not set, the full sheet is used.
  - Skip rows: Number of rows to skip from the top before treating the next row as the header. Use this when the sheet has title or empty rows above the data. If headers are on the first row, leave this at 0.
Stream names are generated as {sanitized_file_name}_{sanitized_sheet_name} (e.g. sales_report_2024_sheet1). When tab config is provided, only the streams you configure use their custom range/skip_rows; others use the full sheet and no skip.
Tip: You can search for a stream by typing its name.
Select the streams and click Next.
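The naming rule can be sketched in Python as follows. This is an illustration only, not the connector's code: the helper names sanitize and stream_name are hypothetical, and collapsing consecutive special characters into a single underscore (and trimming underscores at the ends) is an assumption the documentation does not spell out.

```python
import re
from pathlib import PurePath


def sanitize(text: str) -> str:
    """Lowercase the text and replace special characters with underscores."""
    # Assumption: runs of special characters collapse into one underscore,
    # and leading/trailing underscores are trimmed.
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def stream_name(file_name: str, sheet_name: str) -> str:
    """Build a {file_name}_{sheet_name} stream name without the file extension."""
    stem = PurePath(file_name).stem  # "Monthly Sales.xlsx" -> "Monthly Sales"
    return f"{sanitize(stem)}_{sanitize(sheet_name)}"


print(stream_name("Monthly Sales.xlsx", "January"))  # monthly_sales_january
print(stream_name("Budget 2024.xls", "Summary"))     # budget_2024_summary
```

Knowing the expected name in advance helps when you search for a stream or write tab config keys.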
3. Configure data streams
Customize how you want the data to appear in your catalog: layer, folder, table names, and sync type.
- Layer: Choose the layer where the new tables will live.
- Folder: Optionally create or select a folder inside the layer to group tables from this source.
- Table name: A default name is suggested per stream; you can change it or add a prefix for all tables at once.
- Sync type: Only FULL_TABLE is supported. Each run re-downloads the Excel file(s) and re-reads the selected sheets, so your tables always reflect the current content of the files.
4. Configure data source
Add a short description of the source (e.g. what data it brings or which team owns it), and define your Trigger (how often the extraction runs). Optionally:
- Configure Delta Log Retention for how long old table states are kept. See Resource control.
- Schedule an Additional Full Sync if you want periodic full refreshes in addition to your normal schedule.
5. Check your new source
Your new source appears on the Sources page. Trigger a run manually if needed; after a successful run, the tables will appear in your Catalog.
Unstructured mode (raw file extraction)
Unstructured mode downloads files from a Google Drive folder, uploads them to a Nekt volume, and emits one metadata record per file through a google_drive_files stream. This is useful for extracting PDFs, images, XML files, or any other file type that doesn’t fit a tabular format.
Configuring unstructured mode
In the Sources tab, click Add source and select Google Drive from the list of connectors.
1. Add account access
Same as Records mode: authorize Nekt via Google Authorization and select the folder containing your files.
2. Select mode
Under Advanced Settings, set the Mode to Unstructured.
3. Configure volume
When Unstructured mode is selected, the following fields become available under Nekt volume:
| Setting | Description | Required |
|---|---|---|
| API key | Nekt API key with write permission to the destination volume. It can be generated from Settings > API Keys. | Yes |
| Layer slug | Slug of the Nekt layer that owns the destination volume (e.g. files). | Yes |
| Volume slug | Slug of the Nekt volume to upload files to (e.g. volume-wxyz). | Yes |
When you select a volume in your Catalog, you can find the layer and volume slugs in the page URL. For example:
https://app.nekt.ai/catalog?selectedVolume=volumes.volume-MMFX
The layer slug is volumes and the volume slug is volume-MMFX.
4. Optional: filter files
You can optionally set a File name filter (under Advanced Settings) to only process files matching a wildcard pattern (e.g. *.pdf, invoice_*). Files that don’t match the pattern are skipped.
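To preview which files a pattern would select, the filter can be approximated with Python's fnmatch module. This sketch assumes the connector uses shell-style wildcards and matches case-sensitively; neither detail is confirmed by this page, and matches_filter is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from typing import Optional


def matches_filter(file_name: str, pattern: Optional[str]) -> bool:
    """Return True when a file would be processed under the given filter."""
    # No filter configured: every file is processed.
    if not pattern:
        return True
    # Assumption: shell-style wildcards, compared case-sensitively.
    return fnmatchcase(file_name, pattern)


files = ["invoice_001.pdf", "invoice_002.xml", "photo.png", "report.pdf"]
print([f for f in files if matches_filter(f, "*.pdf")])
print([f for f in files if matches_filter(f, "invoice_*")])
```

If a run processes fewer files than expected, testing the pattern against a few real file names this way is a quick first check.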
5. Configure and finish
Complete the remaining steps (data source description, trigger schedule) as with any other source.
How it works
- The connector lists all files in the selected Google Drive folder (with pagination for large folders).
- If a File name filter is set, only matching files are processed.
- Each file is downloaded, uploaded to the configured Nekt volume via multipart presigned URLs, and a metadata record is emitted.
- On subsequent runs, only files modified after the last extraction are processed (incremental replication via modified_at).
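The incremental step can be illustrated with a small Python sketch. It assumes the connector keeps a bookmark holding the largest modified_at value it has seen and compares with a strict "greater than"; the function name select_files and the bookmark handling are hypothetical, not the connector's actual implementation.

```python
from datetime import datetime, timezone
from typing import Optional


def select_files(files: list, bookmark: Optional[datetime]) -> tuple:
    """Return the files modified after the bookmark and the advanced bookmark."""
    if bookmark is None:
        pending = list(files)  # first run: every file is new
    else:
        pending = [f for f in files if f["modified_at"] > bookmark]
    # Keep the old bookmark when nothing changed.
    new_bookmark = max((f["modified_at"] for f in pending), default=bookmark)
    return pending, new_bookmark


files = [
    {"file_id": "a", "modified_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"file_id": "b", "modified_at": datetime(2024, 2, 1, tzinfo=timezone.utc)},
]
first_batch, bookmark = select_files(files, None)

# File "a" is edited in Drive before the next run.
files[0]["modified_at"] = datetime(2024, 3, 1, tzinfo=timezone.utc)
second_batch, bookmark = select_files(files, bookmark)
print([f["file_id"] for f in second_batch])  # ['a']
```

One consequence of this design is that a file whose content is unchanged is not uploaded again, which keeps repeated runs over large folders cheap.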
Stream: google_drive_files
The unstructured mode emits a single stream with the following fields:
| Field | Type | Description |
|---|---|---|
| file_id | String | Google Drive file ID (primary key) |
| file_name | String | Original file name |
| file_size | Integer | File size in bytes |
| mime_type | String | MIME type (e.g. application/pdf, image/png) |
| google_drive_url | String | Web view URL for the file in Google Drive |
| created_at | DateTime | File creation timestamp |
| modified_at | DateTime | Last modification timestamp (replication key) |
| uploaded_at | DateTime | Timestamp of when the file was uploaded to the Nekt volume |
| nekt_file_id | String | ID assigned by Nekt after upload |
| nekt_layer | String | Destination layer slug |
| nekt_volume | String | Destination volume slug |
Upload retry behavior can be fine-tuned with Max upload retries (default 3) and Retry backoff seconds (default 2) under Advanced Settings.
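As a rough picture of how these two settings interact, here is a minimal retry loop in Python. It is a sketch under stated assumptions: the page does not say whether the wait is fixed or grows between attempts, so the doubling delay is a guess, as is counting retries separately from the first attempt. The names upload_with_retries and flaky_upload, and the returned ID, are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def upload_with_retries(
    upload: Callable[[], T],
    max_retries: int = 3,          # mirrors "Max upload retries"
    backoff_seconds: float = 2.0,  # mirrors "Retry backoff seconds"
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call upload(), retrying up to max_retries times after a failure."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return upload()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Assumption: the wait doubles after each failed attempt.
            sleep(backoff_seconds * 2 ** attempt)


attempts = {"count": 0}
waits = []


def flaky_upload() -> str:
    """Fail twice, then succeed, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "nekt-file-id"


# waits.append stands in for time.sleep so the demo runs instantly.
print(upload_with_retries(flaky_upload, sleep=waits.append))  # nekt-file-id
print(waits)  # [2.0, 4.0]
```

Raising either setting makes runs more tolerant of transient upload errors at the cost of longer worst-case run times.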
Streams and Fields (Records mode)
Streams are discovered from your chosen file or folder. There is no fixed list of streams or fields: each stream corresponds to one worksheet, and its columns are inferred from the Excel file.
Sheet streams (one per worksheet)
Each selected Excel file contributes one stream per sheet (tab). The stream name is built from the file name (without extension) and the sheet name, sanitized (e.g. revenue_2024_q1).
Schema: Column names and types are inferred from the first 1,000 rows of the sheet (after applying skip_rows and any range, if configured). Column headers in the file are slugified: spaces and special characters become underscores, and names are lowercased (e.g. Revenue (USD) → revenue_usd).
Field types: The connector maps Excel/pandas types to schema types:
| Inferred type | Schema type |
|---|---|
| datetime | DateTime |
| number | Number |
| integer | Integer |
| boolean | Boolean |
| other | String |
Data behavior:
- Rows where every cell is empty are dropped.
- Excel blanks become null in the output.
- Records are cleansed (e.g. invalid values normalized) before being written.
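The schema and data rules can be approximated in plain Python. This is a simplified sketch, not the connector's logic: the connector infers types from Excel/pandas types, while this example uses native Python values as a stand-in, and the helper names slugify, clean_rows, schema_type, and infer_schema are hypothetical.

```python
import re
from datetime import datetime

SAMPLE_ROWS = 1000  # the schema is inferred from the first 1,000 rows


def slugify(header: str) -> str:
    """Lowercase a header and turn spaces/special characters into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", header.lower()).strip("_")


def clean_rows(rows: list) -> list:
    """Turn blank cells into None and drop rows where every cell is empty."""
    cleaned = []
    for row in rows:
        values = [None if cell == "" else cell for cell in row]
        if any(v is not None for v in values):
            cleaned.append(values)
    return cleaned


def schema_type(values: list) -> str:
    """Map one column's sampled values to a schema type."""
    present = [v for v in values if v is not None]
    if not present:
        return "String"
    # bool is checked first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in present):
        return "Boolean"
    if any(isinstance(v, bool) for v in present):
        return "String"
    if all(isinstance(v, int) for v in present):
        return "Integer"
    if all(isinstance(v, (int, float)) for v in present):
        return "Number"
    if all(isinstance(v, datetime) for v in present):
        return "DateTime"
    return "String"


def infer_schema(headers: list, rows: list) -> dict:
    """Infer {slugified_column: schema_type} from the leading sample of rows."""
    sample = clean_rows(rows)[:SAMPLE_ROWS]
    return {
        slugify(header): schema_type([row[i] for row in sample])
        for i, header in enumerate(headers)
    }


headers = ["Revenue (USD)", "Order Date", "Paid?", "Units", "Notes"]
rows = [
    [1200.5, datetime(2024, 1, 5), True, 3, "first order"],
    ["", "", "", "", ""],  # fully empty row: dropped
    [980, datetime(2024, 1, 6), False, 7, ""],
]
print(infer_schema(headers, rows))
```

Because only the leading rows are sampled, a column that looks numeric at the top of the sheet but contains text further down would be typed from the sample alone.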
Implementation notes
Authentication
- Google OAuth: The connector uses Google OAuth (client ID, client secret, refresh token) to obtain access tokens for the Google Drive API. Credentials are stored securely.
- Scopes: The connected account must have read access to the chosen file or folder (and to the Shared Drive, if applicable).
- Shared Drives: Supported via supportsAllDrives and includeItemsFromAllDrives when resolving the item and listing folder contents.
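For readers unfamiliar with these flags, the sketch below shows how they might appear in a paginated Drive API v3 files.list request. The parameter names (q, fields, pageSize, pageToken, supportsAllDrives, includeItemsFromAllDrives) and the nextPageToken response field belong to the Drive API; everything else is an assumption for illustration, including the helper names, the page size, and fetch_page, which stands in for an authenticated GET to https://www.googleapis.com/drive/v3/files.

```python
from typing import Callable, Iterator, Optional


def list_params(folder_id: str, page_token: Optional[str] = None) -> dict:
    """Query parameters for one files.list call on a folder."""
    params = {
        "q": f"'{folder_id}' in parents and trashed = false",
        "fields": "nextPageToken, files(id, name, mimeType, size, "
                  "createdTime, modifiedTime, webViewLink)",
        "pageSize": 100,  # assumption: any supported page size works
        # Flags that let results include items stored in Shared Drives.
        "supportsAllDrives": True,
        "includeItemsFromAllDrives": True,
    }
    if page_token:
        params["pageToken"] = page_token
    return params


def list_folder(folder_id: str, fetch_page: Callable[[dict], dict]) -> Iterator[dict]:
    """Yield every file in the folder, following nextPageToken to the end."""
    token = None
    while True:
        page = fetch_page(list_params(folder_id, token))
        yield from page.get("files", [])
        token = page.get("nextPageToken")
        if not token:
            break


# Demo with a fake two-page response instead of a real API call.
pages = {
    None: {"files": [{"id": "1", "name": "a.xlsx"}], "nextPageToken": "t2"},
    "t2": {"files": [{"id": "2", "name": "b.pdf"}]},
}
names = [f["name"] for f in list_folder("folder123", lambda p: pages[p.get("pageToken")])]
print(names)
```

Passing the fetch function in as an argument keeps the pagination logic testable without network access or credentials.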
File and folder behavior
- Records mode: Only Excel files are processed: .xlsx and .xls. Other files in a selected folder are skipped. If item_id points to a single file, it must be one of the supported Excel types. If it points to a folder, the connector lists its contents (including from Shared Drives), filters to Excel files only, and for each file discovers one stream per sheet.
- Unstructured mode: Any file type is accepted. The connector lists all files in the folder (with pagination), optionally filters by search_pattern, downloads each file, uploads it to a Nekt volume, and emits one metadata record per file. Google Workspace files (Docs, Sheets, Slides) that cannot be downloaded as binary are skipped.
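The per-mode selection rules could be expressed roughly as below. This is a hedged sketch: the mode strings, the case-insensitive extension check, and detecting Google Workspace files by their application/vnd.google-apps. MIME type prefix are assumptions about how such a check might be written, and should_process is a hypothetical name.

```python
EXCEL_EXTENSIONS = (".xlsx", ".xls")
# Native Google Workspace items (Docs, Sheets, Slides, folders) use this prefix.
WORKSPACE_PREFIX = "application/vnd.google-apps."


def should_process(file: dict, mode: str) -> bool:
    """Decide whether a listed Drive file is handled in the given mode."""
    if mode == "records":
        # Records mode: Excel files only; anything else is skipped.
        return file["name"].lower().endswith(EXCEL_EXTENSIONS)
    # Unstructured mode: any file with downloadable binary content.
    return not file["mimeType"].startswith(WORKSPACE_PREFIX)


listing = [
    {"name": "sales.xlsx",
     "mimeType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},
    {"name": "contract.pdf", "mimeType": "application/pdf"},
    {"name": "Meeting notes", "mimeType": "application/vnd.google-apps.document"},
]
print([f["name"] for f in listing if should_process(f, "records")])
print([f["name"] for f in listing if should_process(f, "unstructured")])
```

Under these assumptions, a native Google Sheet is skipped in both modes, so spreadsheets need to be stored as .xlsx or .xls files to be picked up in Records mode.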
Schema and tab config
- Discovery: Schema is built from the first 1,000 rows (after skip_rows and optional range). If your header row is not in the first row, set Skip rows so the correct row is used as the header.
- Tab config: Optional. Keys are the stream names (e.g. my_file_my_sheet). For each key you can set range and/or skip_rows. Streams not present in tab config use the full sheet and no skip.
- Column names: All column names are slugified (lowercase, underscores) for consistency and compatibility with the catalog.
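A tab config and its fallback behavior might look like the following Python sketch. The keys range and skip_rows come from this page; the exact shape the connector expects, the example stream names, and the helper settings_for are illustrative assumptions.

```python
# Defaults applied to streams that have no entry: full sheet, no skipped rows.
DEFAULTS = {"range": None, "skip_rows": 0}

# Keys are stream names; each entry may set range, skip_rows, or both.
tab_config = {
    "sales_report_2024_sheet1": {"range": "A:D", "skip_rows": 2},
    "budget_2024_summary": {"range": "A1:E100"},
}


def settings_for(stream: str, config: dict) -> dict:
    """Merge a stream's tab config entry over the defaults."""
    return {**DEFAULTS, **config.get(stream, {})}


print(settings_for("sales_report_2024_sheet1", tab_config))
print(settings_for("budget_2024_summary", tab_config))
print(settings_for("my_file_my_sheet", tab_config))  # not configured: defaults
```

Because keys must match stream names exactly, a typo in a key would leave that stream on the defaults.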
Sync type
- Records mode (FULL_TABLE): There is no replication key. Every run re-downloads the file(s) and re-reads the sheets, so sync type is effectively full table.
- Unstructured mode (INCREMENTAL): Uses modified_at as the replication key. Only files modified after the last successful run are downloaded and uploaded.
Best practices
- Use a dedicated service account or folder: Prefer a Google account or folder used only for this integration, so permissions are clear and revocable.
- Set skip rows when needed: If the first rows of the sheet are empty or contain titles, set Skip rows so the header row is detected correctly.
- Use range for large sheets: If only a subset of columns or rows is needed, set Range in tab config to reduce payload and improve performance.
- Pick only needed streams: Selecting only the sheets you need keeps runs faster and the catalog simpler.
- Schedule according to updates: Run the source as often as your Excel files are updated (e.g. daily or weekly).
Troubleshooting
| Issue | Possible cause | Solution |
|---|---|---|
| Auth or token errors | Invalid or expired OAuth credentials | Re-run Google Authorization and ensure the account has access to the file or folder. |
| “Cannot use a file that is not xlsx extension” | item_id points to a non-Excel file | Select an .xlsx or .xls file, or select a folder that contains Excel files. |
| Wrong columns or headers | Header not in first row or wrong range | Set Skip rows and/or Range in tab config for that stream. |
| Missing or wrong types | Schema inferred from first 1,000 rows | Ensure the first 1,000 rows are representative; mixed types or format changes in later rows can cause type mismatches. |
| Slow or timeout | Very large Excel file or many sheets | Reduce the number of streams (sheets) or use Range to limit data; ensure the Drive account has good network access. |
| Empty or partial data | Range too narrow or skip_rows too high | Check Range and Skip rows for the stream; verify the sheet has data in the selected area. |
| Upload fails (404) | Wrong layer or volume slug | Verify the Layer slug and Volume slug match the values shown in your Nekt catalog. |
| Upload fails (403/401) | Invalid or insufficient API key | Ensure the API key has write permission to the target volume. |
| No files processed in unstructured mode | File name filter too restrictive | Check the File name filter pattern; use * to match all files. |
Skills for agents
Download Google Drive Excel skills file
Google Drive Excel connector documentation as plain markdown, for use in AI agent contexts.