Firestore is Google’s NoSQL cloud database designed for mobile and web applications. It provides real-time data synchronization, offline support, and automatic scaling, making it ideal for building responsive applications that need to work across multiple devices and platforms.

Configuring Firestore as a Source

In the Sources tab, click the “Add source” button in the top right corner of your screen. Then select the Firestore option from the list of connectors. Click Next, and you’ll be prompted to add your account access.

1. Add account access

Check the instructions next to each configuration option to find out where to get the required parameters for the connection. These are the available configurations for this source:
  • Credentials file: The JSON credentials file for the service account linked to your Firestore project. Make sure the account has permission to read your collections. Upload the JSON credentials file directly.
  • Database name: The Firestore database to extract data from. If not provided, the project’s default database is used.
  • Batch size: The number of documents fetched in each batch when reading a stream. Keep in mind that higher batch sizes may cause timeouts when reading the document stream.
  • Subcollection extraction mode: Determines how subcollections are handled by the connector.
    • Nested documents: Recursively fetches subcollections for each parent collection, embedding subcollections within parent documents.
    • Collection group: Extracts subcollections as separate streams, using Collection Group queries to speed up the extraction process.
    • None: Ignores subcollections.
  • Filter collections by name: Extract only specific collections from the database. This speeds up the discovery process when you only need a few collections from a database that contains many.
  • Filter subcollections by name: Filter nested subcollections to avoid extracting unnecessary data. This is useful when you only need a subset of the subcollections under a given document. The notation depends on the Subcollection extraction mode chosen:
    • For Nested documents mode, use the notation collection.sub_collection. For example, enter conversations.messages to extract the messages subcollection from the top-level collection conversations. A wildcard is also accepted to get all nested subcollections: conversations.messages.* extracts all subcollections nested under conversations -> messages.
    • For Collection group mode, enter the name of each subcollection you want to extract. Note that subcollections with the same name under different top-level collections are mapped to the same stream, so it’s good practice to give subcollections unique names.
  • Start date: Starting point for incremental syncs (ISO-8601 format).
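To make the Nested documents notation more concrete, here is a minimal Python sketch of how a dotted filter such as conversations.messages or conversations.messages.* could be matched against subcollection paths. The function names and the exact matching rules (an exact dotted path, or a trailing * covering only subcollections strictly below the prefix) are assumptions based on the description above, not the connector’s actual code.

```python
def matches_subcollection_filter(collection_path: str, pattern: str) -> bool:
    """Return True if a dotted subcollection path matches a filter pattern.

    collection_path: collection IDs joined by dots (document IDs omitted),
        e.g. "conversations.messages.attachments".
    pattern: an exact path such as "conversations.messages", or a wildcard
        such as "conversations.messages.*" (assumed to cover everything
        nested strictly below the prefix).
    """
    path_parts = collection_path.split(".")
    pattern_parts = pattern.split(".")
    if pattern_parts[-1] == "*":
        prefix = pattern_parts[:-1]
        # Must be deeper than the prefix and start with it.
        return len(path_parts) > len(prefix) and path_parts[: len(prefix)] == prefix
    # Without a wildcard, only the exact path matches.
    return path_parts == pattern_parts


def select_subcollections(paths, patterns):
    """Keep the subcollection paths matched by at least one filter pattern."""
    return [p for p in paths if any(matches_subcollection_filter(p, pat) for pat in patterns)]
```

For example, under this sketch’s assumption the filter conversations.messages.* matches conversations.messages.attachments but not conversations.messages itself.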
Once you’re done, click Next.

2. Select streams

The next step is letting us know which streams you want to bring in. Each stream in the list corresponds to a top-level collection or subcollection in Firestore. You can select entire groups of streams or only a subset of them.
Tip: You can find a stream more easily by typing its name.
Select the streams and click Next.

3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will contain the fetched data) and the type of sync.
  • Layer: choose between the existing layers in your catalog. This is where you will find your newly extracted tables once the extraction runs successfully.
  • Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
  • Table name: we suggest the same name as the collection, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
  • Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
    • Incremental: every time the extraction runs, we fetch only the new data, which is useful if, for example, you want to keep every record ever fetched. For this to work, your documents need a valid datetime or integer field to use as the incremental key.
    • Full table: every time the extraction runs, we fetch the current state of the data, which is useful if, for example, you don’t want deleted data to remain in your catalog. However, keep in mind that this increases resource usage, such as compute time and storage.
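The difference between the two sync types can be sketched in a few lines of Python. This is a simplified model over an in-memory list of documents with a hypothetical updated_at replication key, not how the connector actually queries Firestore: the incremental run keeps only documents whose key is greater than the bookmark saved by the previous run and then advances the bookmark, while the full-table run returns everything currently present.

```python
from datetime import datetime, timezone


def incremental_sync(documents, replication_key, bookmark=None):
    """Return documents whose replication key is greater than the bookmark,
    sorted by that key, together with the bookmark for the next run."""
    new_docs = [
        doc
        for doc in documents
        # Documents without the key cannot be tracked incrementally and are skipped.
        if replication_key in doc
        and (bookmark is None or doc[replication_key] > bookmark)
    ]
    new_docs.sort(key=lambda doc: doc[replication_key])
    # The next bookmark is the highest key seen; keep the old one if nothing is new.
    new_bookmark = new_docs[-1][replication_key] if new_docs else bookmark
    return new_docs, new_bookmark


def full_table_sync(documents):
    """Return a snapshot of every document currently in the collection."""
    return list(documents)


if __name__ == "__main__":
    docs = [
        {"_id": "a", "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"_id": "b", "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
        {"_id": "c", "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    ]
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    batch, bookmark = incremental_sync(docs, "updated_at", start)
    print([doc["_id"] for doc in batch])  # ['c', 'b']
    print(len(full_table_sync(docs)))  # 3
```

The same functions work with an integer replication key, since only the > comparison and sorting are required.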
Once you are done configuring, click Next.

4. Configure data source

Add a description of up to 140 characters so the source is easy to identify within your organization. To define your Trigger, consider how often you want data to be extracted from this source. This usually depends on how frequently you need the table data updated (every day, once a week, or only at specific times). Optionally, you can define some additional settings:
  • Configure Delta Log Retention to determine how long we should store old states of this table as it gets updated. Read more about this resource here.
  • Determine when to execute an Additional Full Sync. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.
Once you are ready, click Next to finalize the setup.

5. Check your new source

You can view your new source on the Sources page. If needed, trigger the extraction manually by clicking the arrow button. Your data will appear in your Catalog only after at least one successful source run.

Streams and Fields

Firestore streams are discovered dynamically from the collections and subcollections present in your database.
All extracted streams have the following standard properties:
| Field | Type | Description |
|---|---|---|
| _id | String | The document ID. |
| document | String | A stringified JSON version of the document payload. |
| replication_key | Datetime / Integer | For incremental syncs, the incremental key selected by the user during stream configuration. |
| path | String | For subcollections extracted using Collection groups, the full path to the document. |
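As an illustration of the fields above, this Python sketch builds one output record from a document ID, its data, and an optional collection group path. The to_record function and its arguments are hypothetical, and how the connector serializes values that JSON cannot represent natively is an assumption here: the sketch converts datetimes to ISO-8601 strings.

```python
import json
from datetime import datetime, timezone


def _json_default(value):
    """Fallback for json.dumps; assumes datetimes are stored as ISO-8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def to_record(doc_id, data, replication_key=None, path=None):
    """Build one output record with the standard stream properties.

    replication_key is the name of the incremental field, if any;
    path is only set for subcollections extracted with Collection groups.
    """
    record = {
        "_id": doc_id,
        "document": json.dumps(data, default=_json_default, sort_keys=True),
    }
    if replication_key is not None:
        record["replication_key"] = data.get(replication_key)
    if path is not None:
        record["path"] = path
    return record


if __name__ == "__main__":
    review = {"rating": 5, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)}
    print(to_record("rev1", review, replication_key="updated_at",
                    path="restaurants/r1/reviews/rev1"))
```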

Implementation Notes

Subcollections

The connector supports extracting subcollections. When configuring it, you need to explicitly specify which ones you want to extract. The most efficient way to extract subcollections is with Collection Groups, which retrieve subcollections from different parent documents in a single query. Imagine a restaurants collection in which each restaurant document has its own reviews subcollection (restaurants/{restaurant_id}/reviews/{review_id}). When using Collection Groups, the connector would retrieve all reviews from all restaurants in a single query, significantly speeding up the extraction process, especially for large collections with nested subcollections.
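The sketch below models the restaurants/reviews example in plain Python: it takes full document paths and groups them by the ID of the collection that directly contains each document, which is what a collection group query matches on. It also illustrates why same-named subcollections under different top-level collections land in the same stream. The sample paths (including the hotels collection) and the helper names are hypothetical; a real extraction would use the Firestore client’s collection_group() method instead of string handling.

```python
from collections import defaultdict


def collection_id(document_path: str) -> str:
    """Return the ID of the collection that directly contains a document.

    Firestore paths alternate collection and document segments, e.g.
    "restaurants/r1/reviews/rev1", so the collection ID is the
    second-to-last segment of a document path.
    """
    segments = document_path.split("/")
    if len(segments) < 2 or len(segments) % 2 != 0:
        raise ValueError(f"Not a document path: {document_path!r}")
    return segments[-2]


def group_by_collection_id(document_paths):
    """Group document paths into streams keyed by collection ID,
    mimicking collection group semantics."""
    streams = defaultdict(list)
    for path in document_paths:
        streams[collection_id(path)].append(path)
    return dict(streams)


if __name__ == "__main__":
    paths = [
        "restaurants/r1/reviews/rev1",
        "restaurants/r2/reviews/rev2",
        # Same subcollection name under another parent: same stream.
        "hotels/h1/reviews/rev3",
    ]
    print(group_by_collection_id(paths))
```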

Incremental vs Full Sync

Incremental sync is supported for top-level collections and for subcollections extracted with Collection group queries.
Warning: Mixed-type replication keys
Firestore queries are strictly type-sensitive. If you are using incremental sync with a date-time field, ensure that the replication key is consistently stored as a native Firestore Timestamp across all documents.
If your collection contains mixed types for the replication key (e.g., some documents store the date as a native Timestamp and others as a string like "2024-01-01T00:00:00Z"), a single incremental query will silently skip the documents that do not match the parsed type. Ensure data type consistency across your collections to prevent missing data during incremental runs.
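This Python sketch models the behavior described in the warning: a type-sensitive "greater than" filter only matches documents whose replication key has the same type as the bound, so a key stored as an ISO string is skipped silently when the bound is a datetime. It also includes a small check that lists documents whose key is missing or stored with another type, which you could run over an export of your data. The helper names and sample documents are hypothetical, and Python datetimes stand in for Firestore Timestamps.

```python
from datetime import datetime, timezone


def type_strict_greater_than(documents, field, bound):
    """Return documents whose field is of the same type as bound and greater than it.

    Values of another type (e.g. an ISO string when bound is a datetime) are
    skipped silently. The isinstance check runs first, so Python never tries
    to compare a str with a datetime.
    """
    return [
        doc for doc in documents
        if isinstance(doc.get(field), type(bound)) and doc[field] > bound
    ]


def find_mismatched_types(documents, field, expected_type=datetime):
    """List IDs of documents whose field is missing or not of expected_type."""
    return [doc["_id"] for doc in documents if not isinstance(doc.get(field), expected_type)]


if __name__ == "__main__":
    docs = [
        {"_id": "a", "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
        {"_id": "b", "updated_at": "2024-01-03T00:00:00Z"},  # stored as a string
    ]
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print([d["_id"] for d in type_strict_greater_than(docs, "updated_at", start)])  # ['a']
    print(find_mismatched_types(docs, "updated_at"))  # ['b']
```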

Best Practices

  • Configure indexes for Collection Groups: Whenever possible, set up collection group indexes so subcollections can be extracted with Collection group queries. This significantly reduces extraction time and resource usage.
    To configure a collection group index in Firestore:
    1. In your Firestore console, click on the Indexes tab.
    2. Select the Single field tab.
    3. On the Exemptions section click on Add exemption.
    4. On the Collection ID field, enter the name of the subcollection.
    5. On the Field path field, enter the name of the attribute you want to use for the incremental index (generally a timestamp or date field).
    6. In Query scope mark the Collection group checkbox.
    7. Click Save.
    Index changes take a while to propagate, but once they’re done, you’re good to go.
  • Consistent incremental keys: If configuring your streams as incremental, make sure to include a date-time field that indicates when the document was last updated. This is necessary to ensure data integrity and consistency.
  • Explicit filtering: Be explicit when defining filters for collections and subcollections to avoid extracting data that won’t be useful for you. This helps reduce costs from both the cloud resources needed to perform the data extraction and Firestore itself. You can always add more streams later if needed.
