MongoDB is a popular NoSQL document database that stores data in flexible, JSON-like documents. It’s designed for scalability and performance, making it ideal for applications that need to handle large amounts of unstructured or semi-structured data with high availability and horizontal scaling.

1. Add your MongoDB access

  1. In the Sources tab, click the “Add source” button in the top right corner of your screen, then select MongoDB from the list of connectors.
  2. Click Next and you’ll be prompted to enter your access credentials.
    Default configurations
    • Host: the full connection URL of your MongoDB database. For detailed instructions, check the MongoDB docs. Make sure your MongoDB instance accepts connections from the AWS account associated with Nekt.
    Advanced configurations
    • Include databases: restricts extraction to the databases of interest in your MongoDB instance. This speeds up discovery, which tends to be time-consuming for large databases.
    • Batch size: the number of records fetched per request from your MongoDB instance. A smaller value reduces the load on the database, but extraction takes longer because more requests are needed to fetch the entire collection. We recommend leaving it blank at first and changing it only if needed.
  3. Click Next.
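To make the Host and Batch size settings more concrete, here is a minimal Python sketch using only the standard library. It is not Nekt's implementation: `build_mongo_uri` and `estimated_requests` are hypothetical helpers, and the user, password and host values are placeholders. It shows why special characters in credentials must be percent-encoded inside a MongoDB connection URL, and gives an idealized count of fetch requests for a given batch size, assuming every batch is full (real cursors may return smaller batches).

```python
from math import ceil
from typing import Mapping, Optional
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(
    user: str,
    password: str,
    hosts: str,
    options: Optional[Mapping[str, str]] = None,
    scheme: str = "mongodb",
) -> str:
    """Build a MongoDB connection string with percent-encoded credentials.

    Characters such as '@', ':' or '/' in a username or password would
    otherwise be misread as delimiters of the URL.
    """
    uri = f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{hosts}/"
    if options:
        # Connection options go after '/?' as key=value pairs.
        uri += "?" + urlencode(options)
    return uri


def estimated_requests(total_documents: int, batch_size: int) -> int:
    """Idealized number of fetches to read a collection, assuming full batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return ceil(total_documents / batch_size)
```

For example, `build_mongo_uri("nekt_reader", "p@ss:word", "db.example.com:27017", {"authSource": "admin"})` returns `mongodb://nekt_reader:p%40ss%3Aword@db.example.com:27017/?authSource=admin`. Reading one million documents takes about 10,000 requests with a batch size of 100, versus 1,000 requests with a batch size of 1,000, which is the trade-off described for the Batch size setting.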

2. Select your MongoDB streams

  1. Next, choose which streams you want to bring in. Each stream represents a collection in your database.
    All streams have the following properties in their schema:
    • _id: a string field containing the document ID.
    • document: a stringified JSON version of the document.
    Depending on the sync type, additional properties may be included. For incremental sync, the incremental key is mapped as an extra field in the schema:
    • incremental_key: a date-time, integer or timestamp field corresponding to the incremental key you define when configuring the stream.
    Tip: you can find a stream more quickly by typing its name.
  2. Click Next.
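As an illustration of the stream schema described above, the following Python sketch maps a MongoDB document to a record with `_id` and `document` fields, plus the incremental key when one is configured. The function name `to_stream_record` is hypothetical, and we assume the extra field takes the name of the chosen key; the connector's actual column naming and JSON serialization may differ (a BSON-aware serializer such as PyMongo's `bson.json_util` would be more faithful than `json.dumps` with `default=str`).

```python
import json
from datetime import datetime
from typing import Any, Dict, Optional


def to_stream_record(doc: Dict[str, Any], incremental_key: Optional[str] = None) -> Dict[str, Any]:
    """Map one MongoDB document to the stream schema (hypothetical sketch)."""
    record: Dict[str, Any] = {
        # The document ID is exposed as a string (e.g. an ObjectId's hex form).
        "_id": str(doc["_id"]),
        # The full document, stringified; non-JSON types such as datetime
        # fall back to str(). sort_keys makes the output deterministic.
        "document": json.dumps(doc, default=str, sort_keys=True),
    }
    if incremental_key is not None:
        # Assumption: the extra field is named after the chosen key.
        record[incremental_key] = doc.get(incremental_key)
    return record


if __name__ == "__main__":
    example = {"_id": "a1", "name": "Ada", "updated_at": datetime(2024, 5, 1, 12, 0)}
    print(to_stream_record(example, "updated_at"))
```

Without an incremental key the record contains only `_id` and `document`, matching the properties every stream has.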

3. Configure your MongoDB data streams

  1. Customize how you want your data to appear in your catalog. Select the layer where the data will be placed, a name for each table (which will hold the fetched data), and the sync type.
    • Layer: choose one of the existing layers in your catalog. This is where you will find your newly extracted tables once the extraction runs successfully.
    • Table name: we suggest a name, but feel free to customize it. You can also add a prefix to all tables at once to speed up this process!
    • Sync Type: depending on the data you are bringing to the lake, you can choose between INCREMENTAL and FULL_TABLE. Read more about Sync Types here.
      Important info regarding sync types:
        - Incremental sync is only available for collections that contain a valid incremental key. The incremental key must be of type Int32, Int64, Date or Timestamp.
        - When defining the incremental key, type the exact name of the field as it appears in your documents.
        - If no index exists for the selected incremental key, one is created the first time the connector runs.
  2. Click Next.
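The incremental-sync rules above can be illustrated with a small in-memory Python sketch. It assumes, without confirmation from the connector, that each run keeps a bookmark (the largest key value seen so far) and fetches only documents whose key is strictly greater, in ascending key order. `incremental_batch` and `is_valid_incremental_value` are hypothetical names; Python `int` stands in for Int32/Int64 and `datetime` for Date, while BSON Timestamp is left out because it requires PyMongo's `bson` package.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Stand-ins for the allowed BSON types: int covers Int32/Int64,
# datetime covers Date. BSON Timestamp is omitted in this sketch.
ALLOWED_KEY_TYPES = (int, datetime)


def is_valid_incremental_value(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, ALLOWED_KEY_TYPES) and not isinstance(value, bool)


def incremental_batch(
    docs: Iterable[Dict[str, Any]], key: str, bookmark: Optional[Any]
) -> Tuple[List[Dict[str, Any]], Optional[Any]]:
    """Select documents newer than the bookmark and return the next bookmark.

    Assumes all valid key values in the collection share one type, so they
    can be compared with each other and with the bookmark.
    """
    selected = [
        doc
        for doc in docs
        # Documents missing the key or holding a wrongly typed value are skipped.
        if is_valid_incremental_value(doc.get(key))
        and (bookmark is None or doc[key] > bookmark)
    ]
    selected.sort(key=lambda doc: doc[key])
    # The next run resumes after the largest key value fetched in this one.
    new_bookmark = selected[-1][key] if selected else bookmark
    return selected, new_bookmark
```

Against a real collection this roughly corresponds to the PyMongo query `collection.find({key: {"$gt": bookmark}}).sort(key, 1)`, a range query plus sort that an index on the incremental key (in PyMongo, `collection.create_index(key)`) makes efficient, which is consistent with the connector creating that index on its first run.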

4. Configure your MongoDB data source

  1. Describe your data source for easy identification within your organization. You can include details such as what data it brings and which team it belongs to.
  2. To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).
  3. Optionally, you can define some additional settings (if available).
    • Configure Delta Log Retention to determine how long we should store old states of this table as it gets updated. Read more about this resource here.
    • Determine when to execute an Additional Full Sync. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.

Check your new source!

  1. Click Next to finalize the setup. Once completed, you’ll receive confirmation that your new source is set up!
  2. You can view your new source on the Sources page. To see it in your Catalog, you need to wait for the pipeline to run; you can monitor its execution and completion on the Sources page. If needed, trigger the pipeline manually by clicking the refresh icon. Once it has run, your new table will appear in the Catalog section.
If you encounter any issues, reach out to us via Slack, and we’ll gladly assist you!