GitHub is a software development platform for source control, collaboration, and code review. The Nekt GitHub connector uses the GitHub REST API to extract repository metadata, pull requests, and commits into your Catalog.

Configuring GitHub as a Source

In the Sources tab, click the “Add source” button at the top right of your screen. Then select the GitHub option from the list of connectors. Click Next, and you’ll be prompted to add your account access.

1. Add account access

You’ll need a GitHub Personal Access Token (classic or fine-grained) with permission to read the repositories you want to sync. The following configurations are available:
  • Access Token: Your GitHub Personal Access Token. This field is required and stored securely.
  • Repositories: Optional list of repositories in owner/repo format (for example: nekt-ai/nekt-core). If provided, only these repositories are synced. If left empty, the connector syncs all repositories accessible by the token.
  • Start Date: Optional starting point used by incremental commit syncs. When no prior state exists, commits are fetched from this date forward.
Once you’re done, click Next.
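As a rough illustration of the three settings above, the sketch below checks a configuration before it is used. The key names (access_token, repositories, start_date), the repository pattern, and the validate_config helper are assumptions made for this example; they are not the connector's actual internals.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern for the owner/repo format described above.
REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def validate_config(config: dict) -> dict:
    """Return a normalized copy of a (hypothetical) connector config."""
    token = config.get("access_token")
    if not token:
        raise ValueError("access_token is required")

    # An empty list means "sync every repository the token can read".
    repositories = config.get("repositories") or []
    for name in repositories:
        if not REPO_PATTERN.match(name):
            raise ValueError(f"expected owner/repo format, got: {name!r}")

    start_date = config.get("start_date")
    if start_date is not None:
        # Accept ISO 8601 input and treat values without an offset as UTC.
        parsed = datetime.fromisoformat(start_date.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        start_date = parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "access_token": token,
        "repositories": repositories,
        "start_date": start_date,
    }
```

For example, passing repositories=["nekt-ai/nekt-core"] and start_date="2024-01-01" yields a start date normalized to a UTC timestamp, while a repository entry without a slash raises a ValueError.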

2. Select streams

Choose which data streams you want to sync:
  • repositories
  • pull_requests
  • commits
For faster extractions, select only the streams you need, then click Next.

3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table, and the type of sync.
  • Layer: Choose the layer where extracted GitHub tables will be created.
  • Folder: Optionally group all GitHub tables inside a folder.
  • Table name: A default name is suggested, but you can customize it. You can also add a prefix to all tables.
  • Sync Type: Choose between INCREMENTAL and FULL_TABLE.
    • Incremental: Recommended for commits, using committed_at as the replication key.
    • Full table: Useful for one-off backfills or full refreshes.
Once you are done configuring, click Next.

4. Configure data source

Describe your data source so it is easy to identify within your organization (140 characters maximum). To define your Trigger, consider how often your repositories change:
  • Hourly / every few hours for active engineering analytics.
  • Daily for standard operational reporting.
  • Weekly for low-change repositories.
Optionally, you can define:
  • Delta Log Retention: How long Nekt keeps previous table states. See Resource control.
  • Additional Full Sync: Periodic full syncs in addition to incrementals.
When you are ready, click Next to finalize the setup.

5. Check your new source

You can view your new source on the Sources page. If needed, manually trigger the extraction by clicking on the arrow button. Once a run completes successfully, your data appears in the Catalog.
You need at least one successful source run to see the tables in your Catalog.

Streams and Fields

Below you’ll find the available GitHub streams and their core fields.
repositories
Repository metadata for all repositories accessible by the token (or only those configured in the Repositories setting).
Key fields:
  • id - Repository numeric ID (primary key)
  • full_name - Repository name in owner/repo format
  • private - Indicates whether the repository is private
  • visibility - Repository visibility (public, private, etc.)
  • default_branch - Default branch
  • language - Primary detected language
  • stargazers_count - Number of stars
  • forks_count - Number of forks
  • open_issues_count - Number of open issues
  • created_at, updated_at, pushed_at - Repository lifecycle timestamps
Notes:
  • Primary key: id
  • Replication: full-table style (no replication key)
  • Child context: each repository emits owner and repo context used by pull_requests and commits
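The child context mentioned in the notes above can be pictured with a small sketch. The helper names are hypothetical, and it is an assumption that the child streams call the GitHub REST endpoints /repos/{owner}/{repo}/pulls and /repos/{owner}/{repo}/commits; the source only states that the connector uses the GitHub REST API.

```python
def repository_context(repository: dict) -> dict:
    """Build the context a repository record passes to its child streams."""
    owner, repo = repository["full_name"].split("/", 1)
    return {
        "owner": owner,
        "repo": repo,
        "_sdc_repository": repository["full_name"],
    }


def child_urls(context: dict) -> dict:
    """Assumed REST endpoints for the two child streams of one repository."""
    base = f"https://api.github.com/repos/{context['owner']}/{context['repo']}"
    return {"pull_requests": f"{base}/pulls", "commits": f"{base}/commits"}
```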

pull_requests
Pull requests for each repository. The connector fetches pull requests in every state (open, closed, and merged).
Key fields:
  • id - Pull request ID (primary key)
  • number - Pull request number inside the repository
  • title, body, state, draft, locked
  • user - Pull request author
  • head - Source branch metadata
  • base - Target branch metadata
  • merged_at, closed_at, created_at, updated_at
  • additions, deletions, changed_files
  • comments, review_comments, commits
  • _sdc_repository - Repository context in owner/repo format
Notes:
  • Primary key: id
  • Replication: full-table style (no replication key)
  • Includes repository context fields (owner, repo, _sdc_repository) for easier joins
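In the GitHub REST API, the state field of a pull request is either open or closed; a merged pull request is reported as closed with merged_at set. A minimal sketch of deriving a three-way status from the fields listed above (pr_status and its return labels are hypothetical, not fields of the stream):

```python
def pr_status(pr: dict) -> str:
    """Classify a pull request record as open, merged, or closed without merge."""
    if pr.get("merged_at"):
        return "merged"
    if pr.get("state") == "closed":
        return "closed_unmerged"
    return "open"
```

This distinction matters when counting: a filter on closed_at IS NOT NULL includes merged pull requests, so subtract the merged count if you want only pull requests closed without merging.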

commits
Commits for each repository. This stream supports incremental sync based on the commit timestamp.
Key fields:
  • sha - Commit SHA (primary key)
  • commit.message - Commit message
  • commit.author.* - Embedded author info from commit payload
  • commit.committer.* - Embedded committer info from commit payload
  • author / committer - GitHub user objects when available
  • parents - Parent commit references
  • stats.additions, stats.deletions, stats.total
  • committed_at - Replication key (derived from commit.committer.date)
  • _sdc_repository - Repository context in owner/repo format
Notes:
  • Primary key: sha
  • Replication key: committed_at
  • Incremental sync sends the since parameter to the GitHub API, taken from the state bookmark (or from start_date when no state is available)
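The incremental behavior described in these notes can be sketched as follows. This is an illustration under assumptions, not the connector's code: the function names are hypothetical, and the per_page value is simply the maximum page size the GitHub REST API accepts.

```python
from datetime import datetime, timezone
from typing import Optional


def to_utc_iso(value: str) -> str:
    """Normalize an ISO 8601 string to the YYYY-MM-DDTHH:MM:SSZ form."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def resolve_since(bookmark: Optional[str], start_date: Optional[str]) -> Optional[str]:
    """Prefer the state bookmark; fall back to start_date; else no filter."""
    chosen = bookmark or start_date
    return to_utc_iso(chosen) if chosen else None


def commit_params(bookmark: Optional[str], start_date: Optional[str]) -> dict:
    """Query parameters for a commits request (since is a GitHub API parameter)."""
    params = {"per_page": 100}
    since = resolve_since(bookmark, start_date)
    if since:
        params["since"] = since
    return params


def with_committed_at(commit: dict, repository: str) -> dict:
    """Add the replication key and repository context to a commit payload."""
    record = dict(commit)
    record["committed_at"] = commit["commit"]["committer"]["date"]
    record["_sdc_repository"] = repository
    return record
```

On a first run with no state, resolve_since(None, "2024-01-01") returns the start date as a UTC timestamp; on later runs the bookmark takes precedence.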

Data Model

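The connector follows a repository-centered model: repositories is the parent stream, and each repository passes its owner and repo context to the pull_requests and commits child streams.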
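One way to picture this model is to group child records under their repository, matching _sdc_repository on the child side to full_name on the repository side. The sketch below works on plain dictionaries; group_by_repository is a hypothetical helper for illustration only.

```python
def group_by_repository(repositories, pull_requests, commits):
    """Attach pull request and commit records to their parent repository."""
    model = {
        repo["full_name"]: {"repository": repo, "pull_requests": [], "commits": []}
        for repo in repositories
    }
    for pr in pull_requests:
        entry = model.get(pr["_sdc_repository"])
        if entry is not None:
            entry["pull_requests"].append(pr)
    for commit in commits:
        entry = model.get(commit["_sdc_repository"])
        if entry is not None:
            entry["commits"].append(commit)
    return model
```

The same key pair (full_name and _sdc_repository) can serve as the join condition when combining the tables in SQL.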

Use Cases for Data Analysis

This section includes practical SQL examples you can run in Explorer.

1. Pull Request throughput by repository

Measure how many pull requests are created, closed, and merged by repository.
SELECT
   _sdc_repository AS repository,
   COUNT(*) AS total_prs,
   SUM(CASE WHEN state = 'open' THEN 1 ELSE 0 END) AS open_prs,
   SUM(CASE WHEN closed_at IS NOT NULL THEN 1 ELSE 0 END) AS closed_prs,
   SUM(CASE WHEN merged_at IS NOT NULL THEN 1 ELSE 0 END) AS merged_prs
FROM
   nekt_raw.github_pull_requests
GROUP BY
   1
ORDER BY
   total_prs DESC;

2. Commit activity in the last 30 days

Track commit volume and active contributors by repository.
SELECT
   _sdc_repository AS repository,
   COUNT(*) AS commits_last_30d,
   COUNT(DISTINCT COALESCE(author.login, commit.author.email)) AS active_authors
FROM
   nekt_raw.github_commits
WHERE
   CAST(committed_at AS timestamp) >= current_timestamp - interval '30' day
GROUP BY
   1
ORDER BY
   commits_last_30d DESC;

Skills for agents

Download GitHub skills file

GitHub connector documentation as plain markdown, for use in AI agent contexts.