# Gmail as a data source

> Bring data from Gmail to Nekt.

Gmail is Google's email service. This connector extracts mailbox metadata and full message content via the Gmail API so you can analyze labels, headers, and message structure in your catalog.

<img width="200" src="https://mintcdn.com/nekt/0tn1_nwKYqAHn7jo/assets/logo/logo-gmail.png?fit=max&auto=format&n=0tn1_nwKYqAHn7jo&q=85&s=6503bfe91b3477e44fd8d82c5de3737b" data-path="assets/logo/logo-gmail.png" />

## Configuring Gmail as a Source

In the [Sources](https://app.nekt.ai/sources) tab, click on the "Add source" button located on the top right of your screen. Then, select the Gmail option from the list of connectors.

Click **Next** and you'll be prompted to add your access.

### 1. Add account access

Authorize Nekt to access your Gmail data. Click the **Google Authorization** button and sign in with the Google account whose mailbox you want to sync. Grant the scopes required for reading messages and labels.

The following configurations are available:

* **Start date**: Only messages received on or after this date are included. The connector uses Gmail search (`after:YYYY/MM/DD`) based on this value.

* **Include SPAM and Trash**: When enabled, messages in Spam and Trash are included in the extraction. When disabled, they are left out, which matches the Gmail API's default listing behavior.
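
As a rough illustration of how these two settings could map onto a Gmail API `users.messages.list` request, the sketch below builds the query parameters. The helper name `build_list_params` is hypothetical and is not taken from the connector; `q` and `includeSpamTrash` are the Gmail API's own parameter names.

```python
from datetime import date


def build_list_params(start_date: date, include_spam_trash: bool = False) -> dict:
    """Build illustrative query parameters for Gmail's users.messages.list."""
    return {
        # Gmail search syntax expects the date as YYYY/MM/DD.
        "q": f"after:{start_date:%Y/%m/%d}",
        # When False, Gmail leaves Spam and Trash out of the listing.
        "includeSpamTrash": include_spam_trash,
    }


params = build_list_params(date(2024, 1, 15), include_spam_trash=True)
print(params)
```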

Once you're done, click **Next**.

### 2. Select streams

Choose which data streams you want to sync. For faster extractions, select only the streams that are relevant to your analysis. You can select entire groups of streams or pick specific ones.

> Tip: You can find a stream faster by typing its name.

Select the streams and click **Next**.

### 3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will effectively contain the fetched data) and the type of sync.

* **Layer**: choose one of the existing layers in your catalog. This is where you will find your newly extracted tables once the extraction runs successfully.
* **Folder**: a folder can be created inside the selected layer to group all tables being created from this new data source.
* **Table name**: we suggest a name, but feel free to customize it. You have the option to add a **prefix** to all tables at once and make this process faster!
* **Sync Type**: you can choose between INCREMENTAL and FULL\_TABLE.
  * **Incremental** (recommended for **Email**): each run brings new or updated messages based on the replication key (`internalDate`). This suits keeping a growing history of mail in the catalog.
  * **Full table** (typical for **Label**): each run replaces the table with the current set of labels and their counts from Gmail.
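
To make the difference between the two sync types concrete, here is a simplified sketch. It assumes rows are plain dictionaries and that a stored bookmark holds the highest `internalDate` seen so far; the function names and the bookmark handling are illustrative assumptions, not Nekt's actual implementation.

```python
def incremental_sync(existing_rows, fetched_rows, bookmark):
    """Append only rows whose replication key is newer than the bookmark."""
    new_rows = [r for r in fetched_rows if r["internalDate"] > bookmark]
    merged = existing_rows + new_rows
    # Advance the bookmark to the newest replication key seen so far.
    new_bookmark = max([bookmark] + [r["internalDate"] for r in new_rows])
    return merged, new_bookmark


def full_table_sync(existing_rows, fetched_rows):
    """Discard the previous contents and keep only the latest snapshot."""
    return list(fetched_rows)


emails = [{"id": "a", "internalDate": 100}]
fetched = [{"id": "a", "internalDate": 100}, {"id": "b", "internalDate": 200}]
emails, bookmark = incremental_sync(emails, fetched, bookmark=100)

labels = full_table_sync([{"id": "OLD"}], [{"id": "INBOX"}, {"id": "SENT"}])
```

In this sketch the Email table grows by one row (`b`) and the bookmark moves forward, while the Label table is rebuilt entirely from the latest response.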

Once you are done configuring, click **Next**.

### 4. Configure data source

Add a description of your data source, up to 140 characters, so it is easy to identify within your organization.

To define your [Trigger](https://docs.nekt.com/get-started/core-concepts/triggers), consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).

Optionally, you can define some additional settings:

* Configure Delta Log Retention to determine how long old states of this table are stored as it gets updated. Read more about this resource [here](https://docs.nekt.com/get-started/core-concepts/resource-control).
* Determine when to execute an **Additional [Full Sync](https://docs.nekt.com/get-started/core-concepts/types-of-sync#additional-full-sync)**. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.

Once you are ready, click **Next** to finalize the setup.

### 5. Check your new source

You can view your new source on the [Sources](https://app.nekt.ai/sources) page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.

<Warning>Your source's tables only appear in your [Catalog](https://app.nekt.ai/catalog) after at least one successful source run.</Warning>

# Streams and Fields

Below you'll find all available data streams from Gmail and their corresponding fields:

<AccordionGroup>
  <Accordion title="Email">
    Full Gmail messages for the authenticated user. The connector lists message IDs, then fetches each message resource so rows include headers, MIME structure, and nested parts. Sync is incremental on **`internalDate`** (message receive time).

    **Key fields:**

    | Field                               | Type     | Description                                                        |
    | :---------------------------------- | :------- | :----------------------------------------------------------------- |
    | `id`                                | String   | Unique identifier of the record.                                   |
    | `threadId`                          | String   | Identifier of the thread this message belongs to.                  |
    | `labelIds`                          | Array    | Identifiers of labels applied to the message.                      |
    | `snippet`                           | String   | Short excerpt of the message text.                                 |
    | `internalDate`                      | DateTime | Timestamp when the message was received by Gmail.                  |
    | `historyId`                         | String   | Identifier of the last history record that modified this message.  |
    | `sizeEstimate`                      | Integer  | Estimated size of the message in bytes.                            |
    | `payload`                           | Object   | Parsed email structure including headers, body, and nested parts.  |
    | `payload.partId`                    | String   | Identifier of the MIME message part.                               |
    | `payload.mimeType`                  | String   | MIME type of the message part.                                     |
    | `payload.filename`                  | String   | Original filename of the attachment if present.                    |
    | `payload.headers`                   | Array    | RFC 2822 headers on this message part.                             |
    | `payload.headers[].name`            | String   | Name of the email header field.                                    |
    | `payload.headers[].value`           | String   | Value of the email header field.                                   |
    | `payload.body`                      | Object   | Body of this message part.                                         |
    | `payload.body.size`                 | Integer  | Size of the body in bytes.                                         |
    | `payload.parts`                     | Array    | Child MIME parts for multipart messages.                           |
    | `payload.parts[].partId`            | String   | Identifier of the MIME message part.                               |
    | `payload.parts[].mimeType`          | String   | MIME type of the message part.                                     |
    | `payload.parts[].filename`          | String   | Original filename of the attachment if present.                    |
    | `payload.parts[].headers`           | Array    | RFC 2822 headers on this message part.                             |
    | `payload.parts[].headers[].name`    | String   | Name of the email header field.                                    |
    | `payload.parts[].headers[].value`   | String   | Value of the email header field.                                   |
    | `payload.parts[].body`              | Object   | Body of this message part.                                         |
    | `payload.parts[].body.size`         | Integer  | Size of the body in bytes.                                         |
    | `payload.parts[].body.data`         | String   | Body data as a base64url-encoded string before decoding.           |
    | `payload.parts[].body.attachmentId` | String   | Identifier of the attachment when fetched separately from the API. |
  </Accordion>

  <Accordion title="Label">
    All labels in the mailbox (system and user-defined), including visibility settings and aggregate counts.

    **Key fields:**

    | Field                   | Type    | Description                                     |
    | :---------------------- | :------ | :---------------------------------------------- |
    | `id`                    | String  | Unique identifier of the record.                |
    | `name`                  | String  | Display name of the label.                      |
    | `messageListVisibility` | String  | Whether the label is shown in the message list. |
    | `labelListVisibility`   | String  | Whether the label is shown in the label list.   |
    | `type`                  | String  | Label type: `system` or `user`.                 |
    | `messagesTotal`         | Integer | Total number of messages with this label.       |
    | `messagesUnread`        | Integer | Number of unread messages with this label.      |
    | `threadsTotal`          | Integer | Total number of threads with this label.        |
    | `threadsUnread`         | Integer | Number of unread threads with this label.       |
    | `color`                 | Object  | Display colors for the label in the Gmail UI.   |
    | `color.textColor`       | String  | Hex color for the label text.                   |
    | `color.backgroundColor` | String  | Hex color for the label background.             |
  </Accordion>
</AccordionGroup>
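
The nested `payload` structure of the Email stream can be awkward to work with directly. The sketch below shows one way to read a header and decode part bodies, assuming a row has been loaded as a Python dictionary shaped like the Email fields above and that `body.data` is still base64url-encoded. The helper names and the sample message are hypothetical.

```python
import base64


def get_header(part: dict, name: str):
    """Return the first header value matching `name` (case-insensitive), or None."""
    for header in part.get("headers", []):
        if header["name"].lower() == name.lower():
            return header["value"]
    return None


def iter_parts(payload: dict):
    """Yield the payload itself and every nested part, depth first."""
    yield payload
    for child in payload.get("parts", []):
        yield from iter_parts(child)


def decode_body(part: dict):
    """Decode base64url body data to text; returns None when no data is present."""
    data = part.get("body", {}).get("data")
    if data is None:
        return None
    padded = data + "=" * (-len(data) % 4)  # restore any stripped padding
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


# Hypothetical sample row, trimmed to the fields used here.
payload = {
    "mimeType": "multipart/alternative",
    "headers": [{"name": "Subject", "value": "Quarterly report"}],
    "body": {"size": 0},
    "parts": [
        {
            "partId": "0",
            "mimeType": "text/plain",
            "filename": "",
            "headers": [],
            "body": {"size": 5, "data": "SGVsbG8"},
        }
    ],
}

subject = get_header(payload, "subject")
plain_texts = [
    decode_body(p) for p in iter_parts(payload) if p.get("mimeType") == "text/plain"
]
```

Header names are compared case-insensitively because senders do not capitalize them consistently.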

# Data Model

```mermaid
graph LR;
    Email("Email");
    Label("Label");
    Email -- "labelIds[] contains label.id" --> Label;
```

# Use Cases for Data Analysis

**Join messages to label names.** Resolve `labelIds` on each email to human-readable names using the **Label** table.

<Accordion title="Example SQL (Trino / AWS Athena)">
  ```sql
  SELECT
    e.id AS message_id,
    e.threadId,
    e.internalDate,
    l.name AS label_name
  FROM nekt_raw.gmail_email e
  CROSS JOIN nekt_raw.gmail_label l
  WHERE contains(e.labelIds, l.id)
  ORDER BY e.internalDate DESC
  LIMIT 100;
  ```
</Accordion>

<Note>Replace schema and table names (`nekt_raw`, `gmail_email`, `gmail_label`) with the layer and table names you configured for the source. BigQuery has no `contains` function for arrays; use `CROSS JOIN UNNEST(e.labelIds) AS label_id` (or equivalent) and join on `label_id = l.id` instead.</Note>

## Implementation Notes

### Extraction Behavior

* **Email** uses the Gmail `users.messages.list` query with `maxResults: 500` per page, followed by one `users.messages.get` call per message, so runtime grows with mailbox volume.
* The **Start date** filter applies to the list query; ensure OAuth and API quotas are sufficient for initial backfills.
* **Label** is a single API call (`users.labels.list`); it does not use incremental state.
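
The list-then-get pattern described above can be sketched as follows. This is a minimal illustration, not the connector's code: `list_page` and `get_message` are hypothetical callables standing in for the two Gmail API requests, while `messages` and `nextPageToken` are the field names the Gmail list response uses.

```python
from typing import Callable, Iterator, Optional


def iter_messages(
    list_page: Callable[[Optional[str]], dict],
    get_message: Callable[[str], dict],
) -> Iterator[dict]:
    """Yield full messages by paging through IDs, then fetching each one."""
    page_token = None
    while True:
        page = list_page(page_token)
        for ref in page.get("messages", []):
            # One extra request per message: this is why runtime scales with volume.
            yield get_message(ref["id"])
        page_token = page.get("nextPageToken")
        if not page_token:
            break


# Fake two-page listing used in place of real API calls.
pages = {
    None: {"messages": [{"id": "m1"}, {"id": "m2"}], "nextPageToken": "t2"},
    "t2": {"messages": [{"id": "m3"}]},
}
fetched = list(iter_messages(lambda tok: pages[tok], lambda mid: {"id": mid}))
ids = [m["id"] for m in fetched]
```

Passing the two requests in as callables keeps the loop testable without network access.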

### Payload and Timestamps

* `internalDate` is normalized to Unix time in **seconds** in post-processing (API returns milliseconds).
* Top-level `payload.body` in the schema exposes size only; inline and attachment content typically appears under `payload.parts` with optional decoded `data`.
* Threading and deduplication are based on `threadId` and `id` as returned by Gmail.
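
The `internalDate` normalization amounts to an integer division by 1000. The sketch below assumes the raw value arrives as a string of epoch milliseconds, which is how the Gmail API returns it; the function name is hypothetical.

```python
from datetime import datetime, timezone


def normalize_internal_date(raw_ms) -> int:
    """Convert Gmail's epoch-milliseconds value to whole epoch seconds."""
    return int(raw_ms) // 1000


seconds = normalize_internal_date("1700000000000")
as_utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(seconds, as_utc.isoformat())
```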

## Skills for agents

<Snippet file="agent-skills-intro.mdx" />

<Card title="Download Gmail skills file" icon="wand-magic-sparkles" href="/sources/gmail.md">
  Gmail connector documentation as plain markdown, for use in AI agent contexts.
</Card>
