Skip to main content
Portal da Transparência CNPJs provides access to Brazil’s official company registration data (CNPJ - Cadastro Nacional da Pessoa Jurídica) from the Receita Federal (Brazilian Federal Revenue Service). This connector extracts public business data including company information, establishments, partners, tax regime status, and reference tables for standardized codes.

Configuring Portal da Transparência (CNPJs) as a Source

In the Sources tab, click on the “Add source” button located on the top right of your screen. Then, select the Portal da Transparência (CNPJs) option from the list of connectors. Click Next and you’ll be prompted to configure the extraction.

1. Configure extraction settings

We currently only support extracting files from the current month, if available, otherwise we take the files from the previous month. Extracting files from older months is no longer supported due to the size of the files and the performance of the extraction.
The Receita Federal publishes updated CNPJ data monthly. Each monthly release contains the complete snapshot of all registered companies in Brazil.
Large Data Volume: This data source contains millions of records. The Empresas and Estabelecimentos streams alone contain over 50 million records each. Plan for adequate resources on the Settings tab of your source. The recommended configuration is:
  • Python execution:
    • vCPU: 2 cores
    • Memory: 10GB
  • Spark Driver:
    • vCPU: 2 cores
    • Memory: 10GB
  • Spark Executor:
    • vCPU: 2 cores
    • Memory: 10GB
    • Instances: 3
  • Spark Executor Disk:
    • 150GB
Once you’re done, click Next.

2. Select streams

Choose which data streams you want to sync. For faster extractions, select only the streams that are relevant to your analysis. You can select entire groups of streams or pick specific ones.
Tip: The stream can be found more easily by typing its name.
Large Data Volume: This data source contains millions of records. The Empresas and Estabelecimentos streams alone contain over 50 million records each. Plan for adequate storage and consider selecting only the streams you need for your analysis.
Select the streams and click Next.

3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will effectively contain the fetched data) and the type of sync.
  • Layer: choose between the existing layers on your catalog. This is where you will find your new extracted tables as the extraction runs successfully.
  • Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
  • Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
  • Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
    • Incremental: every time the extraction happens, we’ll get only the new data (new monthly releases) - which is good for tracking historical changes.
    • Full table: every time the extraction happens, we’ll get the current state of the data - which is good if you only need the latest company information.
Once you are done configuring, click Next.

4. Configure data source

Describe your data source for easy identification within your organization, not exceeding 140 characters. To define your Trigger, consider how often you want data to be extracted from this source. Since Receita Federal updates the data monthly, a monthly extraction is typically sufficient. Optionally, you can define some additional settings:
  • Configure Delta Log Retention and determine for how long we should store old states of this table as it gets updated. Read more about this resource here.
  • Determine when to execute an Additional Full Sync. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.
Once you are ready, click Next to finalize the setup.

5. Check your new source

You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.
For you to be able to see it on your Catalog, you need at least one successful source run.

Streams and Fields

Below you’ll find all available data streams from Portal da Transparência CNPJs and their corresponding fields:
Core company data containing the basic registration information for each CNPJ root (first 8 digits).Key Fields:
  • cnpj_basico - CNPJ root (first 8 digits) - identifies the company
  • razao_social - Legal name of the company
  • natureza_juridica - Legal nature code (join with Natureza Jurídica stream)
  • qualificacao_responsavel - Qualification code of the responsible person
  • capital_social - Registered capital (in BRL, as string)
  • porte - Company size: 00 (Not informed), 01 (Micro), 03 (Small), 05 (Other)
  • ente_federativo - Federative entity (for public entities)
Metadata:
  • year_month - Reference month of the data (YYYY-MM format, replication key)
Establishment/branch data containing detailed information about each business location, including full address and contact details.Identification:
  • cnpj_basico - CNPJ root (first 8 digits)
  • cnpj_ordem - Branch number (4 digits)
  • cnpj_dv - CNPJ check digits (2 digits)
  • identificador_matriz_filial - 1 = Headquarters, 2 = Branch
  • nome_fantasia - Trade name
Status:
  • situacao_cadastral - Registration status: 01 (Null), 02 (Active), 03 (Suspended), 04 (Unsuitable), 08 (Closed)
  • data_situacao_cadastral - Date of status change (YYYYMMDD)
  • motivo_situacao_cadastral - Reason code for status
  • situacao_especial - Special situation description
  • data_situacao_especial - Date of special situation
Activity:
  • data_inicio_atividade - Activity start date (YYYYMMDD)
  • cnae_fiscal_principal - Primary CNAE code (join with CNAEs stream)
  • cnae_fiscal_secundaria - Secondary CNAE codes (comma-separated)
Address:
  • tipo_logradouro - Street type (Rua, Avenida, etc.)
  • logradouro - Street name
  • numero - Street number
  • complemento - Address complement
  • bairro - Neighborhood
  • cep - Postal code (8 digits)
  • sigla_uf - State abbreviation (2 letters)
  • id_municipio - Municipality code (join with Municípios stream)
  • nome_cidade_exterior - City name (for foreign addresses)
  • id_pais - Country code (join with Países stream)
Contact:
  • ddd_1 - Area code (phone 1)
  • telefone_1 - Phone number 1
  • ddd_2 - Area code (phone 2)
  • telefone_2 - Phone number 2
  • ddd_fax - Area code (fax)
  • fax - Fax number
  • email - Email address
Metadata:
  • year_month - Reference month of the data (YYYY-MM format, replication key)
Partners/shareholders data containing information about company ownership and legal representatives.Identification:
  • cnpj_basico - CNPJ root of the company
  • tipo - Partner type: 1 (Individual), 2 (Legal Entity), 3 (Foreign)
  • nome - Partner name
  • documento - Partner document (CPF/CNPJ, partially masked)
  • qualificacao - Partner qualification code (join with Qualificações stream)
  • data_de_entrada - Entry date as partner (YYYYMMDD)
Foreign Partners:
  • id_pais - Country code for foreign partners (join with Países stream)
Legal Representative:
  • cpf_representante_legal - CPF of legal representative (partially masked)
  • nome_representante_legal - Name of legal representative
  • qualificacao_representante_legal - Qualification code of legal representative
Demographics:
  • faixa_etaria - Age range code (for individuals)
Metadata:
  • year_month - Reference month of the data (YYYY-MM format, replication key)
Simples Nacional and MEI (Microempreendedor Individual) tax regime enrollment data.Key Fields:
  • cnpj_basico - CNPJ root of the company
Simples Nacional:
  • opcao_simples - Simples Nacional status: S (Enrolled), N (Not enrolled), empty (Other)
  • data_opcao_simples - Enrollment date (YYYYMMDD)
  • data_exclusao_simples - Exclusion date (YYYYMMDD)
MEI (Microempreendedor Individual):
  • opcao_mei - MEI status: S (Enrolled), N (Not enrolled), empty (Other)
  • data_opcao_mei - Enrollment date (YYYYMMDD)
  • data_exclusao_mei - Exclusion date (YYYYMMDD)
Metadata:
  • year_month - Reference month of the data (YYYY-MM format, replication key)
Reference table for CNAE (Classificação Nacional de Atividades Econômicas) - Brazilian business activity classification codes.Key Fields:
  • codigo - CNAE code (7 digits)
  • nome - Activity description
Metadata:
  • year_month - Reference month of the data (YYYY-MM format, replication key)
Reference table for Brazilian municipality codes.Key Fields:
  • codigo - Municipality code (IBGE code)
  • nome - Municipality name
Metadata:
  • year_month - Reference month of the data (YYYY-MM format, replication key)
Reference table for legal nature codes (company types).Key Fields:
  • codigo - Legal nature code
  • nome - Legal nature description (e.g., “Sociedade Limitada”, “Empresário Individual”)
Metadata:
  • year_month - Reference month of the data (YYYY-MM format, replication key)
Reference table for country codes.Key Fields:
  • codigo - Country code
  • nome - Country name
Metadata:
  • year_month - Reference month of the data (YYYY-MM format, replication key)
Reference table for partner/shareholder qualification codes.Key Fields:
  • codigo - Qualification code
  • nome - Qualification description (e.g., “Sócio-Administrador”, “Diretor”)
Metadata:
  • year_month - Reference month of the data (YYYY-MM format, replication key)

Data Model

The following diagram illustrates the relationships between the data streams. The arrows indicate the join keys that link the different entities.

Use Cases for Data Analysis

This guide outlines valuable business intelligence use cases when consolidating Portal da Transparência CNPJ data, along with ready-to-use SQL queries that you can run on Explorer.

1. Company Profile Lookup

Get complete company information by combining core data with establishment details. Business Value:
  • Due diligence for business partnerships
  • Customer/supplier validation
  • Market research and prospecting
SELECT
   e.cnpj_basico,
   CONCAT(e.cnpj_basico, est.cnpj_ordem, est.cnpj_dv) AS cnpj_completo,
   e.razao_social,
   est.nome_fantasia,
   nj.nome AS natureza_juridica,
   e.capital_social,
   CASE e.porte
      WHEN '01' THEN 'Microempresa'
      WHEN '03' THEN 'Pequena Empresa'
      WHEN '05' THEN 'Demais'
      ELSE 'Não Informado'
   END AS porte,
   CASE est.situacao_cadastral
      WHEN '02' THEN 'Ativa'
      WHEN '03' THEN 'Suspensa'
      WHEN '04' THEN 'Inapta'
      WHEN '08' THEN 'Baixada'
      ELSE 'Outro'
   END AS situacao,
   est.data_inicio_atividade,
   cnae.nome AS atividade_principal,
   CONCAT(est.logradouro, ', ', est.numero, ' - ', est.bairro, ', ', est.sigla_uf) AS endereco,
   est.email
FROM
   nekt_raw.portal_transparencia_empresas e
   JOIN nekt_raw.portal_transparencia_estabelecimentos est 
      ON e.cnpj_basico = est.cnpj_basico
   LEFT JOIN nekt_raw.portal_transparencia_natureza_juridica nj 
      ON e.natureza_juridica = nj.codigo
   LEFT JOIN nekt_raw.portal_transparencia_cnaes cnae 
      ON est.cnae_fiscal_principal = cnae.codigo
WHERE
   e.cnpj_basico = '12345678'  -- Replace with target CNPJ root
   AND est.identificador_matriz_filial = '1'  -- Headquarters only

2. Active Companies by State and Activity

Analyze the distribution of active companies by state and business activity. Business Value:
  • Market sizing and opportunity analysis
  • Regional business intelligence
  • Industry trend analysis
SELECT
   est.sigla_uf AS estado,
   cnae.nome AS atividade,
   COUNT(DISTINCT est.cnpj_basico) AS total_empresas,
   SUM(CASE WHEN e.porte = '01' THEN 1 ELSE 0 END) AS microempresas,
   SUM(CASE WHEN e.porte = '03' THEN 1 ELSE 0 END) AS pequenas_empresas
FROM
   nekt_raw.portal_transparencia_estabelecimentos est
   JOIN nekt_raw.portal_transparencia_empresas e 
      ON est.cnpj_basico = e.cnpj_basico
   LEFT JOIN nekt_raw.portal_transparencia_cnaes cnae 
      ON est.cnae_fiscal_principal = cnae.codigo
WHERE
   est.situacao_cadastral = '02'  -- Active only
   AND est.identificador_matriz_filial = '1'  -- Headquarters only
GROUP BY
   est.sigla_uf,
   cnae.nome
ORDER BY
   total_empresas DESC
LIMIT 50

3. Simples Nacional and MEI Analysis

Identify companies enrolled in simplified tax regimes. Business Value:
  • Tax regime analysis for business development
  • Understanding SME market composition
  • Compliance verification
SELECT
   est.sigla_uf AS estado,
   COUNT(DISTINCT e.cnpj_basico) AS total_empresas,
   SUM(CASE WHEN s.opcao_simples = 'S' THEN 1 ELSE 0 END) AS empresas_simples,
   SUM(CASE WHEN s.opcao_mei = 'S' THEN 1 ELSE 0 END) AS empresas_mei,
   ROUND(SUM(CASE WHEN s.opcao_simples = 'S' THEN 1 ELSE 0 END) * 100.0 / 
         NULLIF(COUNT(DISTINCT e.cnpj_basico), 0), 2) AS pct_simples,
   ROUND(SUM(CASE WHEN s.opcao_mei = 'S' THEN 1 ELSE 0 END) * 100.0 / 
         NULLIF(COUNT(DISTINCT e.cnpj_basico), 0), 2) AS pct_mei
FROM
   nekt_raw.portal_transparencia_empresas e
   JOIN nekt_raw.portal_transparencia_estabelecimentos est 
      ON e.cnpj_basico = est.cnpj_basico
   LEFT JOIN nekt_raw.portal_transparencia_simples s 
      ON e.cnpj_basico = s.cnpj_basico
WHERE
   est.situacao_cadastral = '02'
   AND est.identificador_matriz_filial = '1'
GROUP BY
   est.sigla_uf
ORDER BY
   total_empresas DESC

Implementation Notes

This data source contains extremely large datasets:
  • Empresas: ~55+ million records
  • Estabelecimentos: ~60+ million records
  • Sócios: ~25+ million records
Recommendations:
  • Schedule extractions during off-peak hours
  • Consider extracting only the streams you need
  • Initial extraction can take several hours depending on network speed
The Brazilian CNPJ has 14 digits with the following structure:
  • cnpj_basico (8 digits): Identifies the company
  • cnpj_ordem (4 digits): Identifies the branch (0001 = headquarters)
  • cnpj_dv (2 digits): Check digits
To reconstruct the full CNPJ:
CONCAT(cnpj_basico, cnpj_ordem, cnpj_dv) AS cnpj_completo
Dates in this dataset are stored as strings in YYYYMMDD format. To convert:
-- AWS Athena
DATE_PARSE(data_inicio_atividade, '%Y%m%d')

-- GCP BigQuery
PARSE_DATE('%Y%m%d', data_inicio_atividade)
The main streams contain codes that should be joined with reference tables for human-readable values:
  • natureza_juridicaNatureza Jurídica stream
  • cnae_fiscal_principalCNAEs stream
  • id_municipioMunicípios stream
  • id_paisPaíses stream
  • qualificacaoQualificações stream
The connector uses year_month as the replication key. Each month, Receita Federal publishes a complete snapshot of all CNPJs. When using incremental sync:
  • New monthly data is appended to existing data
  • You can track historical changes by comparing records across different year_month values
  • For the latest company state, filter by the most recent year_month