Bring data from AWS S3 CSV files to your Lakehouse.
AWS S3 CSV refers to CSV (comma-separated values) files stored in Amazon Simple Storage Service (S3), a scalable cloud object storage service. This connector lets you access and process CSV files stored in S3, which offers reliable, secure, and highly available storage.
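To make the data shape concrete, here is a minimal sketch of how the contents of one CSV file turn into rows with named columns. The sample bytes, column names, and the `parse_csv` helper are assumptions made for illustration; this is not the connector's internal code.

```python
import csv
import io

# Hypothetical file contents. In practice these bytes would be read from an
# S3 object, for example with boto3: s3.get_object(Bucket=..., Key=...)["Body"].read()
raw_bytes = b"id,name,amount\n1,Alice,10.50\n2,Bob,7.25\n"


def parse_csv(data: bytes, encoding: str = "utf-8"):
    """Decode CSV bytes and return one dict per row, keyed by the header line."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


rows = parse_csv(raw_bytes)
# Every value arrives as a string, so numeric columns need an explicit cast.
total = sum(float(row["amount"]) for row in rows)
```

Because CSV carries no type information, all values are read as strings; any typing happens after the file is parsed.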
In the Sources tab, click the “Add source” button at the top right of your screen. Then select the AWS S3 (CSV) option from the list of connectors.
Click Next and you’ll be prompted to provide your access details. You will need to enter your AWS S3 bucket name and specify the tables you want to import from that bucket.
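Before choosing which tables to import, it can help to picture how the objects in a bucket might map to tables. The sketch below groups CSV object keys by their top-level folder. This grouping rule, the `group_csv_keys` helper, and the sample keys are all hypothetical; the connector may map files to tables differently.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_csv_keys(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group CSV object keys into candidate table names (assumed rule)."""
    tables = defaultdict(list)
    for key in keys:
        path = PurePosixPath(key)
        if path.suffix.lower() != ".csv":
            continue  # ignore objects that are not CSV files
        # Nested files use their top-level folder as the table name;
        # files at the bucket root use their own name without the extension.
        table = path.parts[0] if len(path.parts) > 1 else path.stem
        tables[table].append(key)
    return dict(tables)


# Hypothetical object keys, as a bucket listing might return them.
keys = ["sales/2024-01.csv", "sales/2024-02.csv", "customers.csv", "logs/app.txt"]
tables = group_csv_keys(keys)
```

Here `tables` holds two candidate tables, `sales` (two files) and `customers` (one file), while the non-CSV object is skipped.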
Customize how you want your data to appear in your catalog. Select a name for each table (which will contain the fetched data) and the type of sync.
Table name: we suggest a name, but feel free to customize it. You can also add a prefix to speed this up.
Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
Incremental: every time the extraction runs, we’ll fetch only the new data. This is a good choice if, for example, you want to keep every record ever fetched.
Full table: every time the extraction runs, we’ll fetch the current state of the data. This is a good choice if, for example, you don’t want deleted data to remain in your catalog.
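The difference between the two sync types can be sketched in a few lines. This is a simplified illustration that assumes each row has a unique `id` column; the function names and sample rows are hypothetical and do not reflect how the platform actually detects new data.

```python
from typing import Dict, List

Row = Dict[str, str]


def full_table_sync(catalog: List[Row], source: List[Row]) -> List[Row]:
    """FULL_TABLE: the catalog table becomes the current state of the source."""
    return list(source)


def incremental_sync(catalog: List[Row], source: List[Row], key: str = "id") -> List[Row]:
    """INCREMENTAL: keep what was already fetched and append only unseen rows."""
    seen = {row[key] for row in catalog}
    new_rows = [row for row in source if row[key] not in seen]
    return catalog + new_rows


# Rows fetched by a previous extraction.
catalog = [{"id": "1", "name": "a"}, {"id": "2", "name": "b"}]
# Current source: row 1 was deleted and row 3 was added.
source = [{"id": "2", "name": "b"}, {"id": "3", "name": "c"}]

full_result = full_table_sync(catalog, source)
incremental_result = incremental_sync(catalog, source)
```

With this input, the full-table result contains rows 2 and 3, so the deleted row is gone, while the incremental result contains rows 1, 2, and 3, so every record ever fetched is kept.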
Describe your data source so it is easy to identify within your organization. You can include details such as what data it brings and which team it belongs to.
To define your Trigger, consider how often you want data to be extracted from this source. This usually depends on how frequently you need the table data refreshed (every day, once a week, or only at specific times).
Click Next to finalize the setup. Once completed, you’ll receive confirmation that your new source is set up!
You can view your new source on the Sources page. It will appear in your Catalog only after the pipeline has run, and you can monitor the pipeline’s execution and completion on the Sources page. If needed, trigger the pipeline manually by clicking the refresh icon. Once the run finishes, your new table will appear in the Catalog section.
If you encounter any issues, reach out to us via Slack, and we’ll gladly assist you!