
Configuring GitHub as a Source
In the Sources tab, click on the “Add source” button located on the top right of your screen. Then, select the GitHub option from the list of connectors. Click Next and you’ll be prompted to add your access.
1. Add account access
You’ll need a GitHub Personal Access Token (classic or fine-grained) with permission to read the repositories you want to sync. The following configurations are available:
- Access Token: Your GitHub Personal Access Token. This field is required and stored securely.
- Repositories: Optional list of repositories in owner/repo format (for example: nekt-ai/nekt-core). If provided, only these repositories are synced. If left empty, the connector syncs all repositories accessible by the token.
- Start Date: Optional starting point used by incremental commit syncs. When no prior state exists, commits are fetched from this date forward.
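The rules above can be checked before saving a source. The following is a minimal sketch, not the connector's actual validation: the key names access_token, repositories, and start_date and the helper validate_config are assumptions chosen to mirror the fields described here.

```python
import re
from datetime import datetime

# Hypothetical pattern for the owner/repo format; the connector's own
# validation rules may be stricter or looser.
REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def validate_config(config: dict) -> dict:
    """Return a normalized copy of the config, or raise ValueError."""
    token = (config.get("access_token") or "").strip()
    if not token:
        raise ValueError("access_token is required")

    repositories = list(config.get("repositories") or [])
    for name in repositories:
        if not REPO_PATTERN.match(name):
            raise ValueError(f"expected owner/repo, got {name!r}")

    start_date = config.get("start_date")
    if start_date is not None:
        # Only checks that the value parses as an ISO 8601 date or datetime;
        # fromisoformat raises ValueError on malformed input.
        datetime.fromisoformat(start_date)

    return {
        "access_token": token,
        "repositories": repositories,  # empty list: all repos the token can access
        "start_date": start_date,
    }
```

For example, validate_config({"access_token": "ghp_example", "repositories": ["nekt-ai/nekt-core"]}) passes, while a repository entry without a slash raises ValueError.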
2. Select streams
Choose which data streams you want to sync:
- repositories
- pull_requests
- commits
3. Configure data streams
Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table, and the type of sync.
- Layer: Choose the layer where extracted GitHub tables will be created.
- Folder: Optionally group all GitHub tables inside a folder.
- Table name: A default name is suggested, but you can customize it. You can also add a prefix to all tables.
- Sync Type: Choose between INCREMENTAL and FULL_TABLE.
  - Incremental: Recommended for commits, using committed_at as the replication key.
  - Full table: Useful for one-off backfills or full refreshes.
4. Configure data source
Describe your data source in 140 characters or fewer so it is easy to identify within your organization. To define your Trigger, consider how often your repositories change:
- Hourly / every few hours for active engineering analytics.
- Daily for standard operational reporting.
- Weekly for low-change repositories.
You can also configure:
- Delta Log Retention: How long Nekt keeps previous table states. See Resource control.
- Additional Full Sync: Periodic full syncs in addition to incrementals.
5. Check your new source
You can view your new source on the Sources page. If needed, manually trigger the extraction by clicking on the arrow button. Once a run completes successfully, your data appears in the Catalog.
Streams and Fields
Below you’ll find the available GitHub streams and their core fields.
Repositories
Repository metadata for all repositories accessible by the token (or only the configured list in repositories).
Key fields:
- id: Repository numeric ID (primary key)
- full_name: Repository name in owner/repo format
- private: Indicates whether the repository is private
- visibility: Repository visibility (public, private, etc.)
- default_branch: Default branch
- language: Primary detected language
- stargazers_count: Number of stars
- forks_count: Number of forks
- open_issues_count: Number of open issues
- created_at, updated_at, pushed_at: Repository lifecycle timestamps

- Primary key: id
- Replication: full-table style (no replication key)
- Child context: each repository emits owner and repo context used by pull_requests and commits
Pull Requests
Pull requests for each repository. The connector fetches all pull request states (open, closed, and merged).
Key fields:
- id: Pull request ID (primary key)
- number: Pull request number inside the repository
- title, body, state, draft, locked
- user: Pull request author
- head: Source branch metadata
- base: Target branch metadata
- merged_at, closed_at, created_at, updated_at
- additions, deletions, changed_files
- comments, review_comments, commits
- _sdc_repository: Repository context in owner/repo format

- Primary key: id
- Replication: full-table style (no replication key)
- Includes repository context fields (owner, repo, _sdc_repository) for easier joins
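In GitHub's REST API, a pull request's state is either open or closed, and a merged pull request is a closed one with a non-null merged_at. Assuming the connector passes these fields through unchanged, the three states mentioned above could be derived as in this sketch; pull_request_status is a hypothetical helper, not part of the connector.

```python
def pull_request_status(pr: dict) -> str:
    """Classify a pull request record as 'open', 'merged', or 'closed'."""
    # A non-null merged_at wins over state, because merged pull requests
    # are also reported with state == "closed".
    if pr.get("merged_at"):
        return "merged"
    if pr.get("state") == "closed":
        return "closed"
    return "open"
```

Checking merged_at first matters: testing state alone would count every merged pull request as merely closed.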
Commits
Commits for each repository. This stream supports incremental sync using the commit timestamp.
Key fields:
- sha: Commit SHA (primary key)
- commit.message: Commit message
- commit.author.*: Embedded author info from commit payload
- commit.committer.*: Embedded committer info from commit payload
- author / committer: GitHub user objects when available
- parents: Parent commit references
- stats.additions, stats.deletions, stats.total
- committed_at: Replication key (derived from commit.committer.date)
- _sdc_repository: Repository context in owner/repo format

- Primary key: sha
- Replication key: committed_at
- Incremental sync sends the since parameter to the GitHub API, based on the state bookmark (or start_date when no state is available)
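The incremental behavior described above can be sketched as follows. The endpoint path and the since and per_page query parameters belong to GitHub's REST API for listing commits; the helper names and the shape of the bookmark are assumptions for illustration, not the connector's internals.

```python
from typing import Iterable, Optional, Tuple

GITHUB_API = "https://api.github.com"


def resolve_since(bookmark: Optional[str], start_date: Optional[str]) -> Optional[str]:
    """Prefer the saved state bookmark; fall back to start_date on a first run."""
    return bookmark or start_date


def commits_request(owner: str, repo: str, since: Optional[str]) -> Tuple[str, dict]:
    """Build the URL and query parameters for one page of commits."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/commits"
    params = {"per_page": 100}
    if since:
        params["since"] = since  # ISO 8601 timestamp
    return url, params


def advance_bookmark(bookmark: Optional[str], records: Iterable[dict]) -> Optional[str]:
    """Return the latest committed_at seen so far.

    Assumes every timestamp uses the same UTC ISO 8601 format, so that
    comparing the strings also orders them by time.
    """
    candidates = [record["committed_at"] for record in records]
    if bookmark:
        candidates.append(bookmark)
    return max(candidates) if candidates else None
```

With no bookmark and a Start Date of 2024-01-01T00:00:00Z, the first run would request commits since that date; later runs would pass the stored bookmark instead.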
Data Model
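The connector follows a repository-centered model: repositories is the parent stream, and pull_requests and commits are child streams that carry the repository context (owner, repo, _sdc_repository).
Use Cases for Data Analysis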
This section includes practical SQL examples you can run in Explorer.
1. Pull Request throughput by repository
Measure how many pull requests are created, closed, and merged by repository.
SQL query (AWS and GCP variants)
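As an illustration of this metric, the sketch below runs a comparable query against an in-memory SQLite table from Python. The table name github_pull_requests and the sample rows are hypothetical, and SQLite's dialect differs from what Explorer runs on AWS or GCP, so treat it as a shape for the query rather than a drop-in statement.

```python
import sqlite3

# Hypothetical table name; use the table name chosen when configuring the stream.
PR_THROUGHPUT_SQL = """
SELECT
    _sdc_repository AS repository,
    COUNT(*) AS created,
    SUM(closed_at IS NOT NULL) AS closed,  -- includes merged pull requests
    SUM(merged_at IS NOT NULL) AS merged
FROM github_pull_requests
GROUP BY _sdc_repository
ORDER BY repository
"""

# Made-up rows: (id, _sdc_repository, created_at, closed_at, merged_at)
SAMPLE_PULL_REQUESTS = [
    (1, "nekt-ai/nekt-core", "2024-05-01", "2024-05-02", "2024-05-02"),
    (2, "nekt-ai/nekt-core", "2024-05-03", "2024-05-04", None),
    (3, "nekt-ai/nekt-core", "2024-05-05", None, None),
]


def pr_throughput(rows):
    """Load rows into an in-memory table and return per-repository counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE github_pull_requests ("
        "id INTEGER PRIMARY KEY, _sdc_repository TEXT, "
        "created_at TEXT, closed_at TEXT, merged_at TEXT)"
    )
    conn.executemany("INSERT INTO github_pull_requests VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(PR_THROUGHPUT_SQL).fetchall()
    conn.close()
    return result
```

Counting on closed_at and merged_at rather than on state keeps merged pull requests visible as their own column.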
2. Commit activity in the last 30 days
Track commit volume and active contributors by repository.
SQL query (AWS and GCP variants)
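A comparable sketch for this use case, again using an in-memory SQLite table from Python. The table name github_commits and the flattened author_login column are assumptions; the real table may expose author details under different column names, and the date functions available in Explorer on AWS or GCP will differ from this parameterized cutoff.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical table and column names; adjust to the names in your catalog.
COMMIT_ACTIVITY_SQL = """
SELECT
    _sdc_repository AS repository,
    COUNT(*) AS commits,
    COUNT(DISTINCT author_login) AS active_contributors
FROM github_commits
WHERE committed_at >= ?
GROUP BY _sdc_repository
ORDER BY repository
"""


def commit_activity(rows, as_of: datetime):
    """Count commits and distinct authors in the 30 days before as_of."""
    # Timestamps are compared as strings, which is only valid when every
    # committed_at value uses this same UTC ISO 8601 format.
    cutoff = (as_of - timedelta(days=30)).strftime("%Y-%m-%dT%H:%M:%SZ")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE github_commits ("
        "sha TEXT PRIMARY KEY, _sdc_repository TEXT, "
        "author_login TEXT, committed_at TEXT)"
    )
    conn.executemany("INSERT INTO github_commits VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(COMMIT_ACTIVITY_SQL, (cutoff,)).fetchall()
    conn.close()
    return result
```

Passing as_of explicitly instead of reading the clock keeps the 30-day window reproducible when the query is rerun.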
Skills for agents
Download GitHub skills file
GitHub connector documentation as plain markdown, for use in AI agent contexts.