Skip to main content
Oracle is a multinational technology company that provides database management systems, cloud services, and enterprise software solutions. Its database platform is widely used for enterprise applications, data warehousing, and business intelligence, offering robust data management and analytics capabilities.

Required pre-work

In order to connect Nekt to a database, you have to do some pre work to ensure access is granted in a secure way. For FULL_TABLE and INCREMENTAL replication, the Oracle user only needs CONNECT plus SELECT privileges on the target schemas and tables. For LOG_BASED replication, additional permissions and configurations are required.
  • Establish a peering connection between Nekt VPC and database VPC
    • On your AWS Console, access the VPC service and go to Peering Connection. Make sure you are logged in with the account that contains the database you want to connect with.
    • Select Create peering connection
    • Setup as requested
      • Give your connection a name (something as ‘Nekt <> Database’)
      • Select nekt-vpc ID as requester (on VPC ID field)
      • Choose ‘Another account’ on the Select another VPC to peer with section. Inform the Account ID.
        • To get the Account ID, go to the RDS service (you’ll find it searching on the AWS Console).
        • Click on DB Instances
        • Select the desired database
        • Copy the Account ID associated with this database.
      • Inform the VPC ID (Accepter)
        • On the desired database details, on the Connectivity and Security section, click on the VPC.
        • Copy the VPC ID
      • Click on Create peering connection. You’ll notice the status is ‘Pending acceptance’.
    • Go to Peering Connection again and your new peering connection should be listed, yet still pending acceptance.
    • On the Actions menu, click on Accept Request and confirm it.
    • You should edit your Peering Connection name to ‘Nekt <> Database’, to keep the pattern.
    • Nekt VPC
      • Access the created Peering Connection - that now should have the status ‘Active’ and a value on Accepter CIDRs. Copy this value, it will be the Nekt VPC IP.
      • In the VPC dashboard menu, go to Route Tables
      • In every route table with ‘nekt’ in its name, follow these steps:
        • Click on the Routes tab
        • Click on Edit routes
        • Click on Add route
        • On the ‘Destination’ column, paste the Nekt VPC IP (Accepter CIDRs previously copied)
        • On the ‘Target’ column, choose ‘Peering Connection’ and the Nekt <> Database option in the field that will open.
        • Keep ‘Status’ and ‘Propagated’ columns as default
        • Target: Peering connection established between Nekt and database
        • Save changes
    • Database VPC
      • Repeat the process done for Nekt VPC, but now use the Nekt VPC IP as Destination
  • Search for RDS on your AWS Console and access it.
  • Select your database and go to Connectivity & Security.
  • Click on VPC security groups.
  • Selecting your DB security group, go to the Inbound Rules tab
  • Click on Edit inbound rules
    • Add the following inbound rule to the security group:
      • Type: Oracle-RDS
      • Source: Custom with the Nekt VPC IP as value
      • Add a description to better identify it. Something like ‘Nekt’
      • Save rule
Done! With that, you are ready to follow the next steps and connect Nekt with your database hosted on AWS through the interface of our application.
  • Ask Nekt to create a fixed IP in your AWS infra.
  • In your database provider, give access to the IP provided by Nekt.
Done! With that, you are ready to follow the next steps and connect Nekt with your database through the interface of our application.
Log-based replication requires additional database configuration. This setup must be completed by a database administrator before enabling LOG_BASED sync in Nekt.
Log-based replication uses Oracle LogMiner to capture data changes (inserts, updates, deletes) in near real-time. This is the most efficient method for capturing changes without impacting source system performance.

Prerequisites

  1. Archive Log Mode: The database must be in ARCHIVELOG mode
  2. Supplemental Logging: Must be enabled for the tables you want to replicate
  3. User Permissions: The extraction user needs specific Oracle privileges depending on your database version

Standard Oracle Setup

Connect to your Oracle database as SYSDBA and execute:
-- 1. Enable archive log mode (requires database restart)
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;

-- 2. Enable minimal supplemental logging at database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- 3. Grant common LogMiner permissions to your extraction user
GRANT SELECT ON V_$ARCHIVED_LOG     TO your_username;
GRANT SELECT ON V_$LOG              TO your_username;
GRANT SELECT ON V_$LOGFILE          TO your_username;
GRANT SELECT ON V_$DATABASE         TO your_username;
GRANT SELECT ON V_$LOGMNR_CONTENTS  TO your_username;
GRANT EXECUTE ON DBMS_LOGMNR        TO your_username;
GRANT EXECUTE ON DBMS_LOGMNR_D      TO your_username;

-- 4. Version-specific grants
-- For Oracle 12c and later:
GRANT LOGMINING TO your_username;

-- For Oracle 11g (LOGMINING role does not exist):
GRANT EXECUTE_CATALOG_ROLE   TO your_username;
GRANT SELECT ANY TRANSACTION TO your_username;

-- 5. For specific tables you want to sync, enable supplemental logging
ALTER TABLE schema.table_name ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

AWS RDS Oracle Setup

For AWS RDS Oracle instances, use the RDS admin procedures:
-- 1. Enable supplemental logging
exec rdsadmin.rdsadmin_util.alter_supplemental_logging(p_action=>'ADD');

-- 2. Enable force logging
exec rdsadmin.rdsadmin_util.force_logging(p_enable => true);

-- 3. Grant LogMiner access to your extraction user
exec rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR', 'your_username', 'EXECUTE', true);
exec rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D', 'your_username', 'EXECUTE', true);
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS', 'your_username', 'SELECT', true);
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS', 'your_username', 'SELECT', true);
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG', 'your_username', 'SELECT', true);
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG', 'your_username', 'SELECT', true);
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE', 'your_username', 'SELECT', true);
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE', 'your_username', 'SELECT', true);

-- 4. Version-specific grants (AWS RDS is typically 12c+)
GRANT LOGMINING TO your_username;
Additionally, ensure your RDS instance has backup retention period > 0 to enable archive logs:
aws rds modify-db-instance --db-instance-identifier your-instance --backup-retention-period 1

Oracle Multitenant (CDB/PDB) Setup

For Oracle multitenant environments with Pluggable Databases:
-- Connect to CDB root as SYSDBA
-- Enable supplemental logging at CDB level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- Grant permissions to common user (prefixed with C##)
GRANT EXECUTE ON DBMS_LOGMNR TO C##your_username CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##your_username CONTAINER=ALL;
-- ... (same grants as above with CONTAINER=ALL)
GRANT LOGMINING TO C##your_username CONTAINER=ALL;

-- Configure pdb_services in Nekt to discover PDB schemas

Verification

Verify your LogMiner setup:
-- Check if archivelog mode is enabled
SELECT LOG_MODE FROM V$DATABASE;
-- Should return: ARCHIVELOG

-- Check supplemental logging status
SELECT SUPPLEMENTAL_LOG_DATA_MIN, SUPPLEMENTAL_LOG_DATA_ALL FROM V$DATABASE;
-- Should show YES for at least MIN

-- Check archive logs are being generated
SELECT SEQUENCE#, FIRST_CHANGE#, NEXT_CHANGE#, STATUS FROM V$ARCHIVED_LOG WHERE ROWNUM <= 5 ORDER BY SEQUENCE# DESC;

-- Validate privileges
SELECT * FROM session_roles WHERE role IN ('LOGMINING', 'EXECUTE_CATALOG_ROLE');
SELECT * FROM session_privs WHERE privilege = 'SELECT ANY TRANSACTION';

Configuring Oracle as a Source

In the Sources tab, click on the “Add source” button located on the top right of your screen. Then, select the Oracle option from the list of connectors. Click Next and you’ll be prompted to add your access.

1. Add account access

Once you have done the pre-work defined above, you can inform your database accesses. The following configurations are available:
  • Host (required): The hostname or IP address of your Oracle database server
  • Port (required): The port for Oracle connection (default: 1521)
  • User (required): Database user for authentication
  • Password (required): Password for authentication
  • Service Name (required): Oracle service name for the connection (also referred to as schema name)
  • Pluggable Database Services (optional): List of Oracle PDB service names for multitenant (CDB/PDB) environments
  • Thick Mode (optional, default: true): Enable Oracle thick mode for enhanced performance. Required for LogMiner operations
  • Chunk Size (optional, default: 25000): Number of rows to fetch at a time. Reduce if your row data is too large
Advanced Configuration
  • SID: Alternative to Service Name for older Oracle configurations
  • Filter Schemas: Array of schema names to include (if empty, all schemas are discovered)
  • Use batch query: Enable keyset pagination with retry logic. Breaks large table extraction into smaller batches ordered by primary key, each with a fresh connection. Tables without an Oracle primary key need their unique key declared under Primary key overrides, otherwise this falls back to a single long-running cursor.
  • Primary key overrides: For tables that have no Oracle PRIMARY KEY, declare the unique key here. It is used to paginate the table safely when ‘Use batch query’ is on (avoiding ORA-01555 ‘snapshot too old’), as the key_properties sent to downstream targets so they can deduplicate/merge instead of only appending, and to retype number key columns as integer (a primary key can’t be a float downstream).
  • Invalid date handling: How to handle Oracle date/timestamp values outside Python’s representable year range (1–9999). Options are coerce (coerce to nearest valid boundary), null (convert to null), or error (raise error and stop pipeline).
  • SSH Tunnel: Configuration for secure connections through a bastion server
  • SSL/TLS: Enable encrypted connections with certificate configuration
  • Connect timeout: Oracle Net CONNECT_TIMEOUT in seconds (listener-level handshake budget).
  • Transport connect timeout: Oracle Net TRANSPORT_CONNECT_TIMEOUT in seconds (TCP-level connect budget).
  • TNS retry count: Oracle Net RETRY_COUNT — driver-level retries on transient TNS errors before surfacing them.
  • TNS retry delay: Oracle Net RETRY_DELAY in seconds between driver-level connect retries.
  • Keepalive interval (minutes): Oracle Net EXPIRE_TIME in minutes — TCP keepalive interval to detect dead connections in long runs. Set to 0 to disable.
  • Discovery retry max: Maximum attempts for discovery-phase operations on transient Oracle errors.
  • Discovery retry delay: Initial delay in seconds between discovery retries. Uses exponential backoff (capped at 60s).
  • Connection pool size: SQLAlchemy QueuePool pool_size. Raise for high-throughput workloads.
  • Connection pool max overflow: SQLAlchemy QueuePool max_overflow.
  • Connection pool recycle (seconds): Seconds after which a pooled connection is recycled.
  • Pool pre-ping: Whether SQLAlchemy should pre-ping pooled connections before use.
Once you are done configuring, click Next.

2. Select streams

The next step is letting us know which streams you want to bring. You can select entire groups of streams or only a subset of them.
Tip: The stream can be found more easily by typing its name.
Click Next.

3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will effectively contain the fetched data) and the type of sync.
  • Layer: choose between the existing layers on your catalog. This is where you will find your new extracted tables as the extraction runs successfully.
  • Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
  • Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
  • Sync Type: you can choose between INCREMENTAL, FULL_TABLE, and LOG_BASED:
    • Incremental: Every time the extraction happens, we’ll get only the new data based on a replication key column. Good for append-only tables or when you want to keep historical records.
    • Full Table: Every time the extraction happens, we’ll get the current state of the data. Good if you don’t want to have deleted data in your catalog or for small reference tables.
    • Log Based: Uses Oracle LogMiner to capture data changes (inserts, updates, deletes) from the database transaction logs. This is the most efficient method for capturing all changes including deletes, with minimal impact on source database performance.
Log-based replication requires additional database setup. See the “Log-Based Replication Setup” section in the pre-work above.
Log-Based Sync Details: Uses Oracle LogMiner to read transaction logs and capture:
  • INSERT: New records with all column values
  • UPDATE: Modified records with new values
  • DELETE: Removed records (marked with _sdc_deleted_at timestamp)
Key Features:
  • Near real-time change data capture (CDC)
  • Captures deletes (not possible with incremental sync)
  • Minimal impact on source database performance
  • Supports both traditional (non-CDB) and Oracle Multitenant (CDB/PDB) environments
  • AWS RDS Oracle compatibility with automatic fallback to archived logs
System Columns Added:
  • _sdc_lsn: The Oracle System Change Number (SCN) when the change was committed
  • _sdc_deleted_at: Timestamp when the record was deleted (null for inserts/updates)
Click Next.

4. Configure data source

Describe your data source for easy identification within your organization, not exceeding 140 characters. To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times). Optionally, you can define some additional settings:
  • Configure Delta Log Retention and determine for how long we should store old states of this table as it gets updated. Read more about this resource here.
  • Determine when to execute an Additional Full Sync. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.
Once you are ready, click Next to finalize the setup.

5. Check your new source

You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.
For you to be able to see it on your Catalog, you need at least one successful source run.

Streams and Fields

Because Oracle is a relational database, the streams and fields correspond directly to the tables, views, and columns available in the schemas you have granted access to. During the setup process, Nekt will automatically discover all accessible tables and views. You will be able to select specifically which ones you want to sync into your catalog. The data types from Oracle (e.g., VARCHAR2, NUMBER, DATE, TIMESTAMP) will be automatically mapped to standard Nekt data types during the extraction. Binary fields (such as BLOB, RAW, and LONG RAW) are automatically converted to uppercase hex strings, while character large objects (CLOB and NCLOB) are fetched directly as standard strings.

Implementation Notes

Oracle Multitenant Support

The connector supports both traditional (non-CDB) and multitenant (CDB/PDB) Oracle environments. For Oracle 12c+ multitenant architecture (CDB/PDB), configure the pdb_services option with your PDB service names. The tap will:
  • Connect to each PDB to discover schemas and tables
  • Properly switch containers during extraction
  • Handle LogMiner operations across the multitenant environment seamlessly, safely fetching System Change Numbers (SCN) regardless of the architecture
  • Automatically detect CDB/non-CDB environments to ensure compatibility

Data Type Mapping and Primary Key Overrides

  • Scale-0 NUMBER Columns: Oracle NUMBER columns with a scale of 0 (e.g., NUMBER(p,0) or NUMBER(p,-s)) are automatically mapped to JSON integer to ensure they are stored correctly as INT64 (instead of FLOAT) in downstream targets.
  • Primary Key Overrides Coercion: For tables missing a native Oracle primary key, using the primary_key_overrides setting guarantees the correct keys are defined. If any of the declared key columns are discovered as bare NUMBER (which might otherwise default to a float), they will be forcibly retyped to integer. A primary key cannot be a float downstream as engines rely on absolute equality for deduplication and merges.

Column Name Sanitization & Preservation

To ensure compatibility with data warehouses like BigQuery, column names and primary keys are automatically sanitized during extraction. Any characters that are not letters, numbers, or underscores (such as $) are replaced with underscores. Additionally, if a column name starts with a number, an underscore is prepended. Original casing is preserved to match data extraction safely. The connector also preserves original, full-length column names directly from the table schema. This avoids historical Oracle database dialect limitations where long column names were truncated to 30 characters and appended with disambiguation suffixes in the extracted records.

Connection Resilience

The connector includes built-in resilience against transient network and database errors (e.g., ORA-12170, ORA-12547, ORA-03113, DPY-4011). It automatically applies driver-level connection retries, SQLAlchemy pool pre-pinging, and exponential backoff during discovery. You can tune these behaviors using the Advanced Configuration options (such as timeouts, retry counts, and pool settings) for high-latency or unstable network environments.

Performance Considerations

SettingRecommendation
Chunk SizeReduce from 25,000 if memory issues occur with wide tables
Use Batch QueryEnable for very large tables to use keyset pagination and avoid connection timeouts
Primary key overridesDefine for tables missing a primary key when using Use Batch Query, or to enable downstream deduplication/merging
Thick ModeKeep enabled (true) for LogMiner and optimal performance
Filter SchemasSpecify schemas to reduce discovery time on large databases
Stream SelectionDuring extraction, the connector automatically limits schema discovery to only the selected streams, significantly reducing discovery overhead on large databases.

Troubleshooting

IssueSolution
LogMiner fails to startVerify supplemental logging is enabled and user has required grants
ORA-00942: table or view does not existMissing dictionary view grants. Ensure the extraction user has SELECT on V_$ARCHIVED_LOG, V_$LOG, V_$LOGFILE, V_$DATABASE, and V_$LOGMNR_CONTENTS
ORA-01031: insufficient privilegesMissing permissions to read mined rows after LogMiner starts. Grant LOGMINING (12c+) or EXECUTE_CATALOG_ROLE + SELECT ANY TRANSACTION (11g)
No changes capturedCheck archive logs exist and haven’t been purged
AWS RDS permission deniedUse rdsadmin.rdsadmin_util procedures for grants
SCN gaps in dataNormal behavior - LogMiner processes committed transactions only
Slow extractionEnable thick mode, adjust chunk_size, filter to needed schemas, or enable Use Batch Query
ORA-01555: snapshot too oldThis typically happens on huge tables when a single streaming cursor outlives the source database’s UNDO retention. To mitigate, enable Use batch query. If the table lacks a primary key constraint, you must also define its unique columns in the Primary key overrides setting to allow proper keyset pagination.
Dates arriving in unexpected formatThe connector attempts to set Oracle NLS variables (NLS_LANG, NLS_DATE_FORMAT, etc.) automatically. If initialization fails, dates might fall back to strings. Check your Instant Client installation.
Transient connection errors (ORA-12170, DPY-4011)Normal in volatile network environments. The connector will automatically retry. If issues persist, consider increasing TNS retry parameters in the advanced settings.
SSH Tunnel connection crashes immediatelyThis was caused by a known incompatibility with recent SSH libraries (paramiko >= 4.0) and has been automatically resolved in the latest connector version.
If you encounter any issues, reach out to us via Slack, and we’ll gladly assist you!

Skills for agents

Download Oracle skills file

Oracle connector documentation as plain markdown, for use in AI agent contexts.