Amazon DynamoDB is a fully managed NoSQL database service provided by AWS that offers seamless scalability and high performance. It’s designed for applications that need consistent, single-digit millisecond performance at any scale, making it ideal for mobile, web, gaming, and IoT applications.

1. Introduction

By connecting DynamoDB to your Nekt catalog, you can centralize your NoSQL data alongside other data sources, enabling analysis and data integration across your organization. This guide walks you through setting up DynamoDB as a data source, including the required AWS permissions and connection settings. This source supports incremental sync, which makes extractions more efficient. For incremental sync to work, your documents must contain a valid replication key (either a date or timestamp field).

2. Setting up permissions for accessing your DynamoDB tables

You need to define some permissions to allow Nekt to access your DynamoDB tables. Check the instructions below:
Create role and add custom policy
  • Open the AWS Console using the account that hosts the DynamoDB table you’d like to extract data from.
  • Enter the IAM (Identity and Access Management) service page.
  • In the left panel under Access management, select Roles.
  • Click Create role.
  • In Trusted entity type, select Custom trust policy.
  • Change the Principal value to:
{
    "AWS": "arn:aws:iam::{AWS_ACCOUNT_ID}:role/nekt-ecs-task-role"
}
Replace {AWS_ACCOUNT_ID} with the ID of the AWS account where the Nekt workspace is deployed. To find it, click the dropdown in the top-right corner of the AWS console page after you log in with your account, then click the “copy” icon to the right of your Account ID. The complete trust policy should look like this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{AWS_ACCOUNT_ID}:role/nekt-ecs-task-role"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  • Click Next.
  • Click Next again without adding any permission. We’ll create an inline policy further on.
  • Under Role details, type any Role name you like. Example: nekt-dynamodb-source.
  • Optionally, type a Description. Example: Used by Nekt to extract data from DynamoDB.
  • Click Create role.
You should be redirected back to the IAM roles page, and a success message should appear at the top.
Add permissions to role
  • Open the role you just created by clicking View role at the right end of the green banner that appeared at the top of the page. If you already dismissed it, type the Role name in the search field and select the role when it appears in the list.
  • Within Permission policies (0), click Add permissions.
  • Select Create inline policy.
  • Within Policy editor, select JSON.
  • Paste the following policy in the Policy editor:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGetItem",
                "dynamodb:ConditionCheckItem",
                "dynamodb:DescribeContributorInsights",
                "dynamodb:Scan",
                "dynamodb:ListTagsOfResource",
                "dynamodb:Query",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:PartiQLSelect",
                "dynamodb:DescribeTable",
                "dynamodb:GetItem",
                "dynamodb:DescribeContinuousBackups",
                "dynamodb:DescribeKinesisStreamingDestination",
                "dynamodb:DescribeTableReplicaAutoScaling"
            ],
            "Resource": "{DYNAMO_DB_ARN}"
        }
    ]
}
  • Replace {DYNAMO_DB_ARN} with the ARN of the DynamoDB table you’d like to extract data from.
    • If you don’t know it, open a new tab in your browser and enter the DynamoDB service page.
    • In the left panel, click Tables.
    • Open the desired table by clicking on its name in the list.
    • Under General information, click Additional info.
    • Click the “copy” icon just below Amazon Resource Name (ARN). A tooltip should appear with the message ARN copied.
    • Go back to the IAM policy editor that you left open in the previous tab and paste the ARN you just copied over {DYNAMO_DB_ARN}. The Resource value should follow the format shown below:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGetItem",
                "dynamodb:ConditionCheckItem",
                "dynamodb:DescribeContributorInsights",
                "dynamodb:Scan",
                "dynamodb:ListTagsOfResource",
                "dynamodb:Query",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:PartiQLSelect",
                "dynamodb:DescribeTable",
                "dynamodb:GetItem",
                "dynamodb:DescribeContinuousBackups",
                "dynamodb:DescribeKinesisStreamingDestination",
                "dynamodb:DescribeTableReplicaAutoScaling"
            ],
            "Resource": "arn:aws:dynamodb:{AWS_REGION}:{AWS_ACCOUNT_ID}:table/{TABLE_NAME}"
        }
    ]
}
  • Click Next.
  • Under Policy details, type any Policy name you like. Example: nekt-dynamodb-source-policy.
  • Click Create policy.
You should be redirected back to the role you just created, and a success message should appear at the top.
When setting up the connector in Nekt, you will use the ARN of the role you’ve just created.
Get in touch with us if you run into any problems setting up these permissions.
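If you prefer to keep these IAM documents in version control, the trust policy and the inline policy from this section can be generated from a few parameters. The following is a minimal Python sketch using only the standard library. The function names (build_trust_policy, table_arn, build_table_policy) and the sample account ID, region, and table name are illustrative assumptions, not part of any Nekt or AWS tooling.

```python
import json
import re

# Actions copied from the inline policy shown in this guide.
TABLE_ACTIONS = [
    "dynamodb:BatchGetItem",
    "dynamodb:ConditionCheckItem",
    "dynamodb:DescribeContributorInsights",
    "dynamodb:Scan",
    "dynamodb:ListTagsOfResource",
    "dynamodb:Query",
    "dynamodb:DescribeTimeToLive",
    "dynamodb:PartiQLSelect",
    "dynamodb:DescribeTable",
    "dynamodb:GetItem",
    "dynamodb:DescribeContinuousBackups",
    "dynamodb:DescribeKinesisStreamingDestination",
    "dynamodb:DescribeTableReplicaAutoScaling",
]


def _check_account_id(account_id: str) -> str:
    """AWS account IDs are 12-digit strings; reject anything else early."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    return account_id


def build_trust_policy(nekt_account_id: str) -> dict:
    """Trust policy that lets the Nekt task role assume the new role."""
    account_id = _check_account_id(nekt_account_id)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:role/nekt-ecs-task-role"
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


def table_arn(region: str, account_id: str, table_name: str) -> str:
    """Build a table ARN in the format used by the policy in this guide."""
    return f"arn:aws:dynamodb:{region}:{_check_account_id(account_id)}:table/{table_name}"


def build_table_policy(arn: str) -> dict:
    """Inline policy granting the listed actions on a single table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": TABLE_ACTIONS,
                "Resource": arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; replace with your own.
    print(json.dumps(build_trust_policy("123456789012"), indent=4))
    arn = table_arn("us-east-1", "123456789012", "orders")
    print(json.dumps(build_table_policy(arn), indent=4))
```

The printed documents can be pasted into the Custom trust policy editor and the inline policy JSON editor described above.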

3. Add your DynamoDB access

  1. In the Sources tab, click the “Add source” button at the top right of your screen. Then, select the DynamoDB option from the list of connectors.
  2. Click Next and you’ll be asked to provide:
    • Table names: the names of the tables you want to extract. Write them exactly as they appear in DynamoDB.
    • Assume role ARN (AWS): the ARN of the role you created in Step 2 - Setting up permissions.
    • Infer schema sample size: defines how many records you want to use to infer your table’s schema. The more consistent your schema is, the smaller your sample can be.
  3. Click Next.
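To build intuition for the Infer schema sample size setting, the sketch below infers a schema from the first N items of a small in-memory list and shows how an attribute that only appears in later items is missed by a small sample. It illustrates sample-based inference in general and is not Nekt's actual implementation; infer_schema and the sample items are hypothetical.

```python
from collections import defaultdict


def infer_schema(items, sample_size):
    """Map each attribute name to the sorted type names seen in the sample."""
    seen = defaultdict(set)
    for item in items[:sample_size]:
        for name, value in item.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# Hypothetical items: "total" is sometimes a float, "coupon" is optional.
items = [
    {"id": "a1", "total": 10},
    {"id": "a2", "total": 12.5},
    {"id": "a3", "total": 7, "coupon": "WELCOME"},
]

small = infer_schema(items, sample_size=1)
full = infer_schema(items, sample_size=3)

print(small)  # only the attributes and types present in the first item
print(full)   # also sees the float "total" and the optional "coupon"
```

When every item has the same attributes and types, a sample of one already yields the full schema, which is why a consistent table can use a smaller sample.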

4. Select your DynamoDB streams

  1. The next step is letting us know which streams you want to bring in. You can select entire groups of streams or only a subset of them.
    Tip: You can find a stream more easily by typing its name.
  2. Click Next.

5. Configure your DynamoDB data streams

  1. Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will effectively contain the fetched data) and the type of sync.
  • Layer: choose one of the existing layers in your catalog. This is where you will find your newly extracted tables once the extraction runs successfully.
  • Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
  • Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
  • Sync Type: depending on the data you are bringing to the lake, you can choose between INCREMENTAL and FULL_TABLE. Read more about Sync Types here.
If you choose INCREMENTAL as the sync type for a table, you must also set an incremental key. Incremental keys can be of integer or string type, as long as their values are formatted as ISO 8601 dates. To ensure consistency between extractions, make sure the field you select represents the last modification date of a document.
  2. Click Next.
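The role of the incremental key can be sketched as follows: each run keeps only the records whose key is newer than the bookmark saved by the previous run, then advances the bookmark. This is a simplified illustration under assumed names (incremental_extract, the updated_at field, the sample records), not a description of Nekt's internals. It assumes ISO 8601 strings that Python's datetime.fromisoformat can parse.

```python
from datetime import datetime


def incremental_extract(records, key, bookmark=None):
    """Return (new_records, new_bookmark) for one incremental run.

    `bookmark` is the highest key value seen so far, as an ISO 8601 string,
    or None on the first run (which then behaves like a full extraction).
    """
    last = datetime.fromisoformat(bookmark) if bookmark else None
    fresh = [
        r for r in records
        if last is None or datetime.fromisoformat(r[key]) > last
    ]
    if fresh:
        # Advance the bookmark to the newest key value in this batch.
        bookmark = max(fresh, key=lambda r: datetime.fromisoformat(r[key]))[key]
    return fresh, bookmark


# Hypothetical table contents at the time of the first run.
records = [
    {"id": 1, "updated_at": "2024-01-01T10:00:00+00:00"},
    {"id": 2, "updated_at": "2024-01-02T09:30:00+00:00"},
]
first_batch, bookmark = incremental_extract(records, "updated_at")

# Record 1 is modified before the second run, so its key moves forward.
records[0] = {"id": 1, "updated_at": "2024-01-03T08:00:00+00:00"}
second_batch, bookmark = incremental_extract(records, "updated_at", bookmark)

print(len(first_batch), len(second_batch), bookmark)
```

In this sketch, a record that changes without its key being updated would not be picked up by the second run, which is why the selected field should reflect the last modification date.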

6. Configure your DynamoDB data source

  1. Describe your data source for easy identification within your organization. You can include details such as what data it brings and which team it belongs to.
  2. To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).
  3. Optionally, you can define some additional settings (if available).
  • Configure Delta Log Retention to determine how long we should store old states of this table as it gets updated. Read more about this resource here.
  • Determine when to execute an Additional Full Sync. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.

Check your new source!

  1. Click Next to finalize the setup. Once completed, you’ll receive confirmation that your new source is set up!
  2. You can view your new source on the Sources page. To see it in your Catalog, you have to wait for the pipeline to run. You can monitor its execution and completion on the Sources page. If needed, manually trigger the pipeline by clicking the refresh icon. Once the pipeline has run, your new table will appear in the Catalog section.
If you encounter any issues, reach out to us via Slack, and we’ll gladly assist you!