> ## Documentation Index
> Fetch the complete documentation index at: https://docs.nekt.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Portal da Transparência (CNPJs) as a data source

> Bring data from Portal da Transparência (CNPJs) to Nekt.

Portal da Transparência CNPJs provides access to Brazil's official company registration data (CNPJ - Cadastro Nacional da Pessoa Jurídica) from the Receita Federal (Brazilian Federal Revenue Service). This connector extracts public business data including company information, establishments, partners, tax regime status, and reference tables for standardized codes.

## Configuring Portal da Transparência (CNPJs) as a Source

In the [Sources](https://app.nekt.ai/sources) tab, click on the "Add source" button located on the top right of your screen. Then, select the Portal da Transparência (CNPJs) option from the list of connectors.

Click **Next** and you'll be prompted to configure the extraction.

### 1. Configure extraction settings

We currently only support extracting files from the current month, if available, otherwise we take the files from the previous month. Extracting files from older months is no longer supported due to the size of the files and the performance of the extraction.

<Info>
  The Receita Federal publishes updated CNPJ data monthly. Each monthly release contains the complete snapshot of all registered companies in Brazil.
</Info>

<Warning>
  **Large Data Volume:** This data source contains millions of records. The Empresas and Estabelecimentos streams alone contain over 50 million records each. Plan for adequate resources on the Settings tab of your source. The recommended configuration is:

  * Python execution:
    * vCPU: 2 cores
    * Memory: 10GB
  * Spark Driver:
    * vCPU: 2 cores
    * Memory: 10GB
  * Spark Executor:
    * vCPU: 2 cores
    * Memory: 10GB
    * Instances: 3
  * Spark Executor Disk:
    * 150GB
</Warning>

Once you're done, click **Next**.

### 2. Select streams

Choose which data streams you want to sync. For faster extractions, select only the streams that are relevant to your analysis. You can select entire groups of streams or pick specific ones.

> Tip: The stream can be found more easily by typing its name.

<Warning>
  **Large Data Volume:** This data source contains millions of records. The Empresas and Estabelecimentos streams alone contain over 50 million records each. Plan for adequate storage and consider selecting only the streams you need for your analysis.
</Warning>

Select the streams and click **Next**.

### 3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will effectively contain the fetched data) and the type of sync.

* **Layer**: choose between the existing layers on your catalog. This is where you will find your new extracted tables as the extraction runs successfully.
* **Folder**: a folder can be created inside the selected layer to group all tables being created from this new data source.
* **Table name**: we suggest a name, but feel free to customize it. You have the option to add a **prefix** to all tables at once and make this process faster!
* **Sync Type**: you can choose between INCREMENTAL and FULL\_TABLE.
  * Incremental: every time the extraction happens, we'll get only the new data (new monthly releases) - which is good for tracking historical changes.
  * Full table: every time the extraction happens, we'll get the current state of the data - which is good if you only need the latest company information.

Once you are done configuring, click **Next**.

### 4. Configure data source

Describe your data source for easy identification within your organization, not exceeding 140 characters.

To define your [Trigger](https://docs.nekt.com/get-started/core-concepts/triggers), consider how often you want data to be extracted from this source. Since Receita Federal updates the data monthly, a monthly extraction is typically sufficient.

Optionally, you can define some additional settings:

* Configure Delta Log Retention and determine for how long we should store old states of this table as it gets updated. Read more about this resource [here](https://docs.nekt.com/get-started/core-concepts/resource-control).
* Determine when to execute an **Additional [Full Sync](https://docs.nekt.com/get-started/core-concepts/types-of-sync#additional-full-sync)**. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.

Once you are ready, click **Next** to finalize the setup.

### 5. Check your new source

You can view your new source on the [Sources](https://app.nekt.ai/sources) page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.

<Warning>For you to be able to see it on your [Catalog](https://app.nekt.ai/catalog), you need at least one successful source run.</Warning>

# Streams and Fields

Below you'll find all available data streams from Portal da Transparência CNPJs and their corresponding fields:

<AccordionGroup>
  <Accordion title="Empresas">
    Core company data containing the basic registration information for each CNPJ root (first 8 digits).

    **Key Fields:**

    * `cnpj_basico` - CNPJ root (first 8 digits) - identifies the company
    * `razao_social` - Legal name of the company
    * `natureza_juridica` - Legal nature code (join with Natureza Jurídica stream)
    * `qualificacao_responsavel` - Qualification code of the responsible person
    * `capital_social` - Registered capital (in BRL, as string)
    * `porte` - Company size: 00 (Not informed), 01 (Micro), 03 (Small), 05 (Other)
    * `ente_federativo` - Federative entity (for public entities)

    **Metadata:**

    * `year_month` - Reference month of the data (YYYY-MM format, replication key)
  </Accordion>

  <Accordion title="Estabelecimentos">
    Establishment/branch data containing detailed information about each business location, including full address and contact details.

    **Identification:**

    * `cnpj_basico` - CNPJ root (first 8 digits)
    * `cnpj_ordem` - Branch number (4 digits)
    * `cnpj_dv` - CNPJ check digits (2 digits)
    * `identificador_matriz_filial` - 1 = Headquarters, 2 = Branch
    * `nome_fantasia` - Trade name

    **Status:**

    * `situacao_cadastral` - Registration status: 01 (Null), 02 (Active), 03 (Suspended), 04 (Unsuitable), 08 (Closed)
    * `data_situacao_cadastral` - Date of status change (YYYYMMDD)
    * `motivo_situacao_cadastral` - Reason code for status
    * `situacao_especial` - Special situation description
    * `data_situacao_especial` - Date of special situation

    **Activity:**

    * `data_inicio_atividade` - Activity start date (YYYYMMDD)
    * `cnae_fiscal_principal` - Primary CNAE code (join with CNAEs stream)
    * `cnae_fiscal_secundaria` - Secondary CNAE codes (comma-separated)

    **Address:**

    * `tipo_logradouro` - Street type (Rua, Avenida, etc.)
    * `logradouro` - Street name
    * `numero` - Street number
    * `complemento` - Address complement
    * `bairro` - Neighborhood
    * `cep` - Postal code (8 digits)
    * `sigla_uf` - State abbreviation (2 letters)
    * `id_municipio` - Municipality code (join with Municípios stream)
    * `nome_cidade_exterior` - City name (for foreign addresses)
    * `id_pais` - Country code (join with Países stream)

    **Contact:**

    * `ddd_1` - Area code (phone 1)
    * `telefone_1` - Phone number 1
    * `ddd_2` - Area code (phone 2)
    * `telefone_2` - Phone number 2
    * `ddd_fax` - Area code (fax)
    * `fax` - Fax number
    * `email` - Email address

    **Metadata:**

    * `year_month` - Reference month of the data (YYYY-MM format, replication key)
  </Accordion>

  <Accordion title="Socios">
    Partners/shareholders data containing information about company ownership and legal representatives.

    **Identification:**

    * `cnpj_basico` - CNPJ root of the company
    * `tipo` - Partner type: 1 (Individual), 2 (Legal Entity), 3 (Foreign)
    * `nome` - Partner name
    * `documento` - Partner document (CPF/CNPJ, partially masked)
    * `qualificacao` - Partner qualification code (join with Qualificações stream)
    * `data_de_entrada` - Entry date as partner (YYYYMMDD)

    **Foreign Partners:**

    * `id_pais` - Country code for foreign partners (join with Países stream)

    **Legal Representative:**

    * `cpf_representante_legal` - CPF of legal representative (partially masked)
    * `nome_representante_legal` - Name of legal representative
    * `qualificacao_representante_legal` - Qualification code of legal representative

    **Demographics:**

    * `faixa_etaria` - Age range code (for individuals)

    **Metadata:**

    * `year_month` - Reference month of the data (YYYY-MM format, replication key)
  </Accordion>

  <Accordion title="Simples">
    Simples Nacional and MEI (Microempreendedor Individual) tax regime enrollment data.

    **Key Fields:**

    * `cnpj_basico` - CNPJ root of the company

    **Simples Nacional:**

    * `opcao_simples` - Simples Nacional status: S (Enrolled), N (Not enrolled), empty (Other)
    * `data_opcao_simples` - Enrollment date (YYYYMMDD)
    * `data_exclusao_simples` - Exclusion date (YYYYMMDD)

    **MEI (Microempreendedor Individual):**

    * `opcao_mei` - MEI status: S (Enrolled), N (Not enrolled), empty (Other)
    * `data_opcao_mei` - Enrollment date (YYYYMMDD)
    * `data_exclusao_mei` - Exclusion date (YYYYMMDD)

    **Metadata:**

    * `year_month` - Reference month of the data (YYYY-MM format, replication key)
  </Accordion>

  <Accordion title="CNAEs">
    Reference table for CNAE (Classificação Nacional de Atividades Econômicas) - Brazilian business activity classification codes.

    **Key Fields:**

    * `codigo` - CNAE code (7 digits)
    * `nome` - Activity description

    **Metadata:**

    * `year_month` - Reference month of the data (YYYY-MM format, replication key)
  </Accordion>

  <Accordion title="Municípios">
    Reference table for Brazilian municipality codes.

    **Key Fields:**

    * `codigo` - Municipality code (IBGE code)
    * `nome` - Municipality name

    **Metadata:**

    * `year_month` - Reference month of the data (YYYY-MM format, replication key)
  </Accordion>

  <Accordion title="Natureza Jurídica">
    Reference table for legal nature codes (company types).

    **Key Fields:**

    * `codigo` - Legal nature code
    * `nome` - Legal nature description (e.g., "Sociedade Limitada", "Empresário Individual")

    **Metadata:**

    * `year_month` - Reference month of the data (YYYY-MM format, replication key)
  </Accordion>

  <Accordion title="Países">
    Reference table for country codes.

    **Key Fields:**

    * `codigo` - Country code
    * `nome` - Country name

    **Metadata:**

    * `year_month` - Reference month of the data (YYYY-MM format, replication key)
  </Accordion>

  <Accordion title="Qualificações">
    Reference table for partner/shareholder qualification codes.

    **Key Fields:**

    * `codigo` - Qualification code
    * `nome` - Qualification description (e.g., "Sócio-Administrador", "Diretor")

    **Metadata:**

    * `year_month` - Reference month of the data (YYYY-MM format, replication key)
  </Accordion>
</AccordionGroup>

# Data Model

The following diagram illustrates the relationships between the data streams. The arrows indicate the join keys that link the different entities.

```mermaid theme={null}
graph TD;
    subgraph "Core Data"
        Empresas("Empresas");
        Estabelecimentos("Estabelecimentos");
        Socios("Sócios");
        Simples("Simples");
    end

    subgraph "Reference Tables"
        CNAEs("CNAEs");
        Municipios("Municípios");
        NaturezaJuridica("Natureza Jurídica");
        Paises("Países");
        Qualificacoes("Qualificações");
    end

    Empresas -- "cnpj_basico" --> Estabelecimentos;
    Empresas -- "cnpj_basico" --> Socios;
    Empresas -- "cnpj_basico" --> Simples;
    Empresas -- "natureza_juridica" --> NaturezaJuridica;
    Estabelecimentos -- "cnae_fiscal_principal" --> CNAEs;
    Estabelecimentos -- "id_municipio" --> Municipios;
    Estabelecimentos -- "id_pais" --> Paises;
    Socios -- "qualificacao" --> Qualificacoes;
    Socios -- "id_pais" --> Paises;
```

# Use Cases for Data Analysis

This guide outlines valuable business intelligence use cases when consolidating Portal da Transparência CNPJ data, along with ready-to-use SQL queries that you can run on [Explorer](https://app.nekt.ai/explorer).

### 1. Company Profile Lookup

Get complete company information by combining core data with establishment details.

**Business Value:**

* Due diligence for business partnerships
* Customer/supplier validation
* Market research and prospecting

<Accordion title="SQL query">
  <Tabs>
    <Tab title="AWS">
      ```sql theme={null}
      SELECT
         e.cnpj_basico,
         CONCAT(e.cnpj_basico, est.cnpj_ordem, est.cnpj_dv) AS cnpj_completo,
         e.razao_social,
         est.nome_fantasia,
         nj.nome AS natureza_juridica,
         e.capital_social,
         CASE e.porte
            WHEN '01' THEN 'Microempresa'
            WHEN '03' THEN 'Pequena Empresa'
            WHEN '05' THEN 'Demais'
            ELSE 'Não Informado'
         END AS porte,
         CASE est.situacao_cadastral
            WHEN '02' THEN 'Ativa'
            WHEN '03' THEN 'Suspensa'
            WHEN '04' THEN 'Inapta'
            WHEN '08' THEN 'Baixada'
            ELSE 'Outro'
         END AS situacao,
         est.data_inicio_atividade,
         cnae.nome AS atividade_principal,
         CONCAT(est.logradouro, ', ', est.numero, ' - ', est.bairro, ', ', est.sigla_uf) AS endereco,
         est.email
      FROM
         nekt_raw.portal_transparencia_empresas e
         JOIN nekt_raw.portal_transparencia_estabelecimentos est 
            ON e.cnpj_basico = est.cnpj_basico
         LEFT JOIN nekt_raw.portal_transparencia_natureza_juridica nj 
            ON e.natureza_juridica = nj.codigo
         LEFT JOIN nekt_raw.portal_transparencia_cnaes cnae 
            ON est.cnae_fiscal_principal = cnae.codigo
      WHERE
         e.cnpj_basico = '12345678'  -- Replace with target CNPJ root
         AND est.identificador_matriz_filial = '1'  -- Headquarters only
      ```
    </Tab>

    <Tab title="GCP">
      ```sql theme={null}
      SELECT
         e.cnpj_basico,
         CONCAT(e.cnpj_basico, est.cnpj_ordem, est.cnpj_dv) AS cnpj_completo,
         e.razao_social,
         est.nome_fantasia,
         nj.nome AS natureza_juridica,
         e.capital_social,
         CASE e.porte
            WHEN '01' THEN 'Microempresa'
            WHEN '03' THEN 'Pequena Empresa'
            WHEN '05' THEN 'Demais'
            ELSE 'Não Informado'
         END AS porte,
         CASE est.situacao_cadastral
            WHEN '02' THEN 'Ativa'
            WHEN '03' THEN 'Suspensa'
            WHEN '04' THEN 'Inapta'
            WHEN '08' THEN 'Baixada'
            ELSE 'Outro'
         END AS situacao,
         est.data_inicio_atividade,
         cnae.nome AS atividade_principal,
         CONCAT(est.logradouro, ', ', est.numero, ' - ', est.bairro, ', ', est.sigla_uf) AS endereco,
         est.email
      FROM
         `nekt_raw.portal_transparencia_empresas` e
         JOIN `nekt_raw.portal_transparencia_estabelecimentos` est 
            ON e.cnpj_basico = est.cnpj_basico
         LEFT JOIN `nekt_raw.portal_transparencia_natureza_juridica` nj 
            ON e.natureza_juridica = nj.codigo
         LEFT JOIN `nekt_raw.portal_transparencia_cnaes` cnae 
            ON est.cnae_fiscal_principal = cnae.codigo
      WHERE
         e.cnpj_basico = '12345678'  -- Replace with target CNPJ root
         AND est.identificador_matriz_filial = '1'  -- Headquarters only
      ```
    </Tab>
  </Tabs>
</Accordion>

### 2. Active Companies by State and Activity

Analyze the distribution of active companies by state and business activity.

**Business Value:**

* Market sizing and opportunity analysis
* Regional business intelligence
* Industry trend analysis

<Accordion title="SQL query">
  <Tabs>
    <Tab title="AWS">
      ```sql theme={null}
      SELECT
         est.sigla_uf AS estado,
         cnae.nome AS atividade,
         COUNT(DISTINCT est.cnpj_basico) AS total_empresas,
         SUM(CASE WHEN e.porte = '01' THEN 1 ELSE 0 END) AS microempresas,
         SUM(CASE WHEN e.porte = '03' THEN 1 ELSE 0 END) AS pequenas_empresas
      FROM
         nekt_raw.portal_transparencia_estabelecimentos est
         JOIN nekt_raw.portal_transparencia_empresas e 
            ON est.cnpj_basico = e.cnpj_basico
         LEFT JOIN nekt_raw.portal_transparencia_cnaes cnae 
            ON est.cnae_fiscal_principal = cnae.codigo
      WHERE
         est.situacao_cadastral = '02'  -- Active only
         AND est.identificador_matriz_filial = '1'  -- Headquarters only
      GROUP BY
         est.sigla_uf,
         cnae.nome
      ORDER BY
         total_empresas DESC
      LIMIT 50
      ```
    </Tab>

    <Tab title="GCP">
      ```sql theme={null}
      SELECT
         est.sigla_uf AS estado,
         cnae.nome AS atividade,
         COUNT(DISTINCT est.cnpj_basico) AS total_empresas,
         SUM(CASE WHEN e.porte = '01' THEN 1 ELSE 0 END) AS microempresas,
         SUM(CASE WHEN e.porte = '03' THEN 1 ELSE 0 END) AS pequenas_empresas
      FROM
         `nekt_raw.portal_transparencia_estabelecimentos` est
         JOIN `nekt_raw.portal_transparencia_empresas` e 
            ON est.cnpj_basico = e.cnpj_basico
         LEFT JOIN `nekt_raw.portal_transparencia_cnaes` cnae 
            ON est.cnae_fiscal_principal = cnae.codigo
      WHERE
         est.situacao_cadastral = '02'  -- Active only
         AND est.identificador_matriz_filial = '1'  -- Headquarters only
      GROUP BY
         est.sigla_uf,
         cnae.nome
      ORDER BY
         total_empresas DESC
      LIMIT 50
      ```
    </Tab>
  </Tabs>
</Accordion>

### 3. Simples Nacional and MEI Analysis

Identify companies enrolled in simplified tax regimes.

**Business Value:**

* Tax regime analysis for business development
* Understanding SME market composition
* Compliance verification

<Accordion title="SQL query">
  <Tabs>
    <Tab title="AWS">
      ```sql theme={null}
      SELECT
         est.sigla_uf AS estado,
         COUNT(DISTINCT e.cnpj_basico) AS total_empresas,
         SUM(CASE WHEN s.opcao_simples = 'S' THEN 1 ELSE 0 END) AS empresas_simples,
         SUM(CASE WHEN s.opcao_mei = 'S' THEN 1 ELSE 0 END) AS empresas_mei,
         ROUND(SUM(CASE WHEN s.opcao_simples = 'S' THEN 1 ELSE 0 END) * 100.0 / 
               NULLIF(COUNT(DISTINCT e.cnpj_basico), 0), 2) AS pct_simples,
         ROUND(SUM(CASE WHEN s.opcao_mei = 'S' THEN 1 ELSE 0 END) * 100.0 / 
               NULLIF(COUNT(DISTINCT e.cnpj_basico), 0), 2) AS pct_mei
      FROM
         nekt_raw.portal_transparencia_empresas e
         JOIN nekt_raw.portal_transparencia_estabelecimentos est 
            ON e.cnpj_basico = est.cnpj_basico
         LEFT JOIN nekt_raw.portal_transparencia_simples s 
            ON e.cnpj_basico = s.cnpj_basico
      WHERE
         est.situacao_cadastral = '02'
         AND est.identificador_matriz_filial = '1'
      GROUP BY
         est.sigla_uf
      ORDER BY
         total_empresas DESC
      ```
    </Tab>

    <Tab title="GCP">
      ```sql theme={null}
      SELECT
         est.sigla_uf AS estado,
         COUNT(DISTINCT e.cnpj_basico) AS total_empresas,
         SUM(CASE WHEN s.opcao_simples = 'S' THEN 1 ELSE 0 END) AS empresas_simples,
         SUM(CASE WHEN s.opcao_mei = 'S' THEN 1 ELSE 0 END) AS empresas_mei,
         ROUND(SAFE_DIVIDE(SUM(CASE WHEN s.opcao_simples = 'S' THEN 1 ELSE 0 END) * 100.0, 
               COUNT(DISTINCT e.cnpj_basico)), 2) AS pct_simples,
         ROUND(SAFE_DIVIDE(SUM(CASE WHEN s.opcao_mei = 'S' THEN 1 ELSE 0 END) * 100.0, 
               COUNT(DISTINCT e.cnpj_basico)), 2) AS pct_mei
      FROM
         `nekt_raw.portal_transparencia_empresas` e
         JOIN `nekt_raw.portal_transparencia_estabelecimentos` est 
            ON e.cnpj_basico = est.cnpj_basico
         LEFT JOIN `nekt_raw.portal_transparencia_simples` s 
            ON e.cnpj_basico = s.cnpj_basico
      WHERE
         est.situacao_cadastral = '02'
         AND est.identificador_matriz_filial = '1'
      GROUP BY
         est.sigla_uf
      ORDER BY
         total_empresas DESC
      ```
    </Tab>
  </Tabs>
</Accordion>

## Implementation Notes

<AccordionGroup>
  <Accordion title="Data Volume and Performance">
    This data source contains extremely large datasets:

    * **Empresas**: \~55+ million records
    * **Estabelecimentos**: \~60+ million records
    * **Sócios**: \~25+ million records

    **Recommendations:**

    * Schedule extractions during off-peak hours
    * Consider extracting only the streams you need
    * Initial extraction can take several hours depending on network speed
  </Accordion>

  <Accordion title="CNPJ Structure">
    The Brazilian CNPJ has 14 digits with the following structure:

    * `cnpj_basico` (8 digits): Identifies the company
    * `cnpj_ordem` (4 digits): Identifies the branch (0001 = headquarters)
    * `cnpj_dv` (2 digits): Check digits

    To reconstruct the full CNPJ:

    ```sql theme={null}
    CONCAT(cnpj_basico, cnpj_ordem, cnpj_dv) AS cnpj_completo
    ```
  </Accordion>

  <Accordion title="Date Formats">
    Dates in this dataset are stored as strings in `YYYYMMDD` format. To convert:

    ```sql theme={null}
    -- AWS Athena
    DATE_PARSE(data_inicio_atividade, '%Y%m%d')

    -- GCP BigQuery
    PARSE_DATE('%Y%m%d', data_inicio_atividade)
    ```
  </Accordion>

  <Accordion title="Reference Table Joins">
    The main streams contain codes that should be joined with reference tables for human-readable values:

    * `natureza_juridica` → **Natureza Jurídica** stream
    * `cnae_fiscal_principal` → **CNAEs** stream
    * `id_municipio` → **Municípios** stream
    * `id_pais` → **Países** stream
    * `qualificacao` → **Qualificações** stream
  </Accordion>

  <Accordion title="Incremental Sync Strategy">
    The connector uses `year_month` as the replication key. Each month, Receita Federal publishes a complete snapshot of all CNPJs. When using incremental sync:

    * New monthly data is appended to existing data
    * You can track historical changes by comparing records across different `year_month` values
    * For the latest company state, filter by the most recent `year_month`
  </Accordion>
</AccordionGroup>

## Skills for agents

<Snippet file="agent-skills-intro.mdx" />

<Card title="Download Portal da Transparência (CNPJs) skills file" icon="wand-magic-sparkles" href="/sources/portal-da-transparencia-cnpjs.md">
  Portal da Transparência (CNPJs) connector documentation as plain markdown, for use in AI agent contexts.
</Card>
