Configuring Portal da Transparência (CNPJs) as a Source
In the Sources tab, click on the “Add source” button located on the top right of your screen. Then, select the Portal da Transparência (CNPJs) option from the list of connectors. Click Next and you’ll be prompted to configure the extraction.
1. Configure extraction settings
We currently support extracting only the current month's files. If they have not been published yet, we extract the previous month's files instead. Extracting older months is no longer supported because of the size of the files and the impact on extraction performance.
The Receita Federal publishes updated CNPJ data monthly. Each monthly release contains a complete snapshot of all registered companies in Brazil.
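The month-selection rule described above can be sketched as follows. This is only an illustration of the rule, not the connector's actual code: the function name, the `available_months` argument, and the `None` result when neither month is published are assumptions.

```python
from datetime import date
from typing import Optional, Set


def pick_release_month(today: date, available_months: Set[str]) -> Optional[str]:
    """Return the YYYY-MM release to extract: the current month if it is
    published, otherwise the previous month. Returns None if neither exists
    (hypothetical behavior, not documented by the connector)."""
    current = f"{today.year:04d}-{today.month:02d}"
    if current in available_months:
        return current
    # Step back one calendar month, wrapping January to December of the prior year.
    if today.month == 1:
        prev_year, prev_month = today.year - 1, 12
    else:
        prev_year, prev_month = today.year, today.month - 1
    previous = f"{prev_year:04d}-{prev_month:02d}"
    return previous if previous in available_months else None


# Example: in early January the new release is often not published yet.
print(pick_release_month(date(2025, 1, 10), {"2024-11", "2024-12"}))
```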
2. Select streams
Choose which data streams you want to sync. For faster extractions, select only the streams that are relevant to your analysis. You can select entire groups of streams or pick specific ones.
Tip: You can find a stream more easily by typing its name.
Select the streams and click Next.
3. Configure data streams
Customize how you want your data to appear in your catalog. Select the layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will contain the fetched data), and the type of sync.
- Layer: choose one of the existing layers in your catalog. This is where you will find your newly extracted tables once the extraction runs successfully.
- Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
- Table name: we suggest a name, but feel free to customize it. You can also add a prefix to all tables at once to speed up this process.
- Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
  - Incremental: every time the extraction runs, we get only the new data (new monthly releases), which is good for tracking historical changes.
  - Full table: every time the extraction runs, we get the current state of the data, which is good if you only need the latest company information.
4. Configure data source
Describe your data source for easy identification within your organization, using no more than 140 characters. To define your Trigger, consider how often you want data to be extracted from this source. Since Receita Federal updates the data monthly, a monthly extraction is typically sufficient. Optionally, you can define some additional settings:
- Configure Delta Log Retention to determine how long we should store old states of this table as it gets updated. Read more about this resource here.
- Determine when to execute an Additional Full Sync. This complements the incremental data extractions, ensuring that your data is fully synchronized with your source at regular intervals.
5. Check your new source
You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.
Streams and Fields
Below you’ll find all available data streams from Portal da Transparência CNPJs and their corresponding fields:
Empresas
Core company data containing the basic registration information for each CNPJ root (first 8 digits).
Key Fields:
- cnpj_basico - CNPJ root (first 8 digits), identifies the company
- razao_social - Legal name of the company
- natureza_juridica - Legal nature code (join with Natureza Jurídica stream)
- qualificacao_responsavel - Qualification code of the responsible person
- capital_social - Registered capital (in BRL, as string)
- porte - Company size: 00 (Not informed), 01 (Micro), 03 (Small), 05 (Other)
- ente_federativo - Federative entity (for public entities)
- year_month - Reference month of the data (YYYY-MM format, replication key)
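Because porte is a code and capital_social arrives as a string, records usually need a small decoding step before analysis. The sketch below is an illustration only: the helper names and the sample record are hypothetical, and it assumes capital_social has no thousands separators and may use a comma as the decimal mark.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Company size codes as listed for the porte field above.
PORTE_LABELS = {"00": "Not informed", "01": "Micro", "03": "Small", "05": "Other"}


def parse_capital_social(raw: Optional[str]) -> Optional[Decimal]:
    """Convert the capital_social string to Decimal; None if empty or unparseable."""
    if raw is None or not raw.strip():
        return None
    # Assumption: no thousands separators; a comma, if present, is the decimal mark.
    try:
        return Decimal(raw.strip().replace(",", "."))
    except InvalidOperation:
        return None


def describe_empresa(record: dict) -> dict:
    """Return a copy of the key fields with porte decoded and capital as Decimal."""
    return {
        "cnpj_basico": record["cnpj_basico"],
        "razao_social": record["razao_social"],
        "porte": PORTE_LABELS.get(record["porte"], "Unknown code"),
        "capital_social": parse_capital_social(record["capital_social"]),
    }


# Hypothetical sample record, not real data.
sample = {
    "cnpj_basico": "12345678",
    "razao_social": "EXEMPLO LTDA",
    "porte": "03",
    "capital_social": "150000,00",
}
print(describe_empresa(sample))
```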
Estabelecimentos
Establishment/branch data containing detailed information about each business location, including full address and contact details.
Identification:
- cnpj_basico - CNPJ root (first 8 digits)
- cnpj_ordem - Branch number (4 digits)
- cnpj_dv - CNPJ check digits (2 digits)
- identificador_matriz_filial - 1 = Headquarters, 2 = Branch
- nome_fantasia - Trade name

- situacao_cadastral - Registration status: 01 (Null), 02 (Active), 03 (Suspended), 04 (Unsuitable), 08 (Closed)
- data_situacao_cadastral - Date of status change (YYYYMMDD)
- motivo_situacao_cadastral - Reason code for status
- situacao_especial - Special situation description
- data_situacao_especial - Date of special situation

- data_inicio_atividade - Activity start date (YYYYMMDD)
- cnae_fiscal_principal - Primary CNAE code (join with CNAEs stream)
- cnae_fiscal_secundaria - Secondary CNAE codes (comma-separated)

- tipo_logradouro - Street type (Rua, Avenida, etc.)
- logradouro - Street name
- numero - Street number
- complemento - Address complement
- bairro - Neighborhood
- cep - Postal code (8 digits)
- sigla_uf - State abbreviation (2 letters)
- id_municipio - Municipality code (join with Municípios stream)
- nome_cidade_exterior - City name (for foreign addresses)
- id_pais - Country code (join with Países stream)

- ddd_1 - Area code (phone 1)
- telefone_1 - Phone number 1
- ddd_2 - Area code (phone 2)
- telefone_2 - Phone number 2
- ddd_fax - Area code (fax)
- fax - Fax number
- email - Email address

- year_month - Reference month of the data (YYYY-MM format, replication key)
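Several Estabelecimentos fields are coded or packed into a single string. A minimal sketch of decoding them is shown below; the helper names and the sample row are hypothetical, and the code assumes the values arrive as strings exactly as listed above.

```python
from typing import List, Optional

# Registration status codes as listed for situacao_cadastral above.
SITUACAO_CADASTRAL = {
    "01": "Null",
    "02": "Active",
    "03": "Suspended",
    "04": "Unsuitable",
    "08": "Closed",
}


def secondary_cnaes(raw: Optional[str]) -> List[str]:
    """Split the comma-separated cnae_fiscal_secundaria field into a list of codes."""
    if not raw:
        return []
    return [code.strip() for code in raw.split(",") if code.strip()]


def is_headquarters(record: dict) -> bool:
    """identificador_matriz_filial: 1 = Headquarters, 2 = Branch."""
    return str(record["identificador_matriz_filial"]) == "1"


# Hypothetical sample row, not real data.
row = {
    "identificador_matriz_filial": "1",
    "situacao_cadastral": "02",
    "cnae_fiscal_secundaria": "6202300,6204000",
}
print(SITUACAO_CADASTRAL.get(row["situacao_cadastral"], "Unknown code"))
print(secondary_cnaes(row["cnae_fiscal_secundaria"]))
print(is_headquarters(row))
```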
Socios
Partners/shareholders data containing information about company ownership and legal representatives.
Identification:
- cnpj_basico - CNPJ root of the company
- tipo - Partner type: 1 (Individual), 2 (Legal Entity), 3 (Foreign)
- nome - Partner name
- documento - Partner document (CPF/CNPJ, partially masked)
- qualificacao - Partner qualification code (join with Qualificações stream)
- data_de_entrada - Entry date as partner (YYYYMMDD)

- id_pais - Country code for foreign partners (join with Países stream)

- cpf_representante_legal - CPF of legal representative (partially masked)
- nome_representante_legal - Name of legal representative
- qualificacao_representante_legal - Qualification code of legal representative

- faixa_etaria - Age range code (for individuals)

- year_month - Reference month of the data (YYYY-MM format, replication key)
Simples
Simples Nacional and MEI (Microempreendedor Individual) tax regime enrollment data.
Key Fields:
- cnpj_basico - CNPJ root of the company

- opcao_simples - Simples Nacional status: S (Enrolled), N (Not enrolled), empty (Other)
- data_opcao_simples - Enrollment date (YYYYMMDD)
- data_exclusao_simples - Exclusion date (YYYYMMDD)

- opcao_mei - MEI status: S (Enrolled), N (Not enrolled), empty (Other)
- data_opcao_mei - Enrollment date (YYYYMMDD)
- data_exclusao_mei - Exclusion date (YYYYMMDD)

- year_month - Reference month of the data (YYYY-MM format, replication key)
CNAEs
Reference table for CNAE (Classificação Nacional de Atividades Econômicas), the Brazilian business activity classification codes.
Key Fields:
- codigo - CNAE code (7 digits)
- nome - Activity description
- year_month - Reference month of the data (YYYY-MM format, replication key)
Municípios
Reference table for Brazilian municipality codes.
Key Fields:
- codigo - Municipality code (IBGE code)
- nome - Municipality name
- year_month - Reference month of the data (YYYY-MM format, replication key)
Natureza Jurídica
Reference table for legal nature codes (company types).
Key Fields:
- codigo - Legal nature code
- nome - Legal nature description (e.g., “Sociedade Limitada”, “Empresário Individual”)
- year_month - Reference month of the data (YYYY-MM format, replication key)
Países
Reference table for country codes.
Key Fields:
- codigo - Country code
- nome - Country name
- year_month - Reference month of the data (YYYY-MM format, replication key)
Qualificações
Reference table for partner/shareholder qualification codes.
Key Fields:
- codigo - Qualification code
- nome - Qualification description (e.g., “Sócio-Administrador”, “Diretor”)
- year_month - Reference month of the data (YYYY-MM format, replication key)
Data Model
The following diagram illustrates the relationships between the data streams. The arrows indicate the join keys that link the different entities.
Use Cases for Data Analysis
This guide outlines valuable business intelligence use cases when consolidating Portal da Transparência CNPJ data, along with ready-to-use SQL queries that you can run on Explorer.
1. Company Profile Lookup
Get complete company information by combining core data with establishment details.
Business Value:
- Due diligence for business partnerships
- Customer/supplier validation
- Market research and prospecting
SQL query (AWS and GCP versions)
2. Active Companies by State and Activity
Analyze the distribution of active companies by state and business activity.
Business Value:
- Market sizing and opportunity analysis
- Regional business intelligence
- Industry trend analysis
SQL query (AWS and GCP versions)
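The aggregation behind this use case can be sketched in Python as follows. This is not the guide's SQL query, only an illustration of the same idea: the function name and sample rows are hypothetical, and it assumes the rows come from a single year_month snapshot of the Estabelecimentos stream, so it counts establishments rather than distinct companies.

```python
from collections import Counter
from typing import Iterable


def count_active_by_state_and_cnae(estabelecimentos: Iterable[dict]) -> Counter:
    """Count active establishments (situacao_cadastral == "02") per (state, primary CNAE)."""
    counts: Counter = Counter()
    for row in estabelecimentos:
        if row["situacao_cadastral"] == "02":
            counts[(row["sigla_uf"], row["cnae_fiscal_principal"])] += 1
    return counts


# Hypothetical sample rows, not real data.
rows = [
    {"sigla_uf": "SP", "cnae_fiscal_principal": "6201501", "situacao_cadastral": "02"},
    {"sigla_uf": "SP", "cnae_fiscal_principal": "6201501", "situacao_cadastral": "02"},
    {"sigla_uf": "SP", "cnae_fiscal_principal": "6201501", "situacao_cadastral": "08"},
    {"sigla_uf": "RJ", "cnae_fiscal_principal": "4711302", "situacao_cadastral": "02"},
]

# List the largest (state, activity) groups first.
for (uf, cnae), total in count_active_by_state_and_cnae(rows).most_common():
    print(uf, cnae, total)
```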
3. Simples Nacional and MEI Analysis
Identify companies enrolled in simplified tax regimes.
Business Value:
- Tax regime analysis for business development
- Understanding SME market composition
- Compliance verification
SQL query (AWS and GCP versions)
Implementation Notes
Data Volume and Performance
This data source contains extremely large datasets:
- Empresas: more than 55 million records
- Estabelecimentos: more than 60 million records
- Socios: more than 25 million records
Recommendations:
- Schedule extractions during off-peak hours
- Extract only the streams you need
- Expect the initial extraction to take several hours, depending on network speed
CNPJ Structure
The Brazilian CNPJ has 14 digits with the following structure:
- cnpj_basico (8 digits): Identifies the company
- cnpj_ordem (4 digits): Identifies the branch (0001 = headquarters)
- cnpj_dv (2 digits): Check digits
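The three parts can be combined into the full 14-digit CNPJ as sketched below. The function names are hypothetical. The check-digit routine is an addition that this guide does not describe: it implements the standard modulo-11 rule for numeric CNPJs and is included only as an optional sanity check.

```python
from itertools import cycle


def compose_cnpj(cnpj_basico: str, cnpj_ordem: str, cnpj_dv: str) -> str:
    """Join root, branch number, and check digits into a 14-digit string."""
    cnpj = f"{cnpj_basico:0>8}{cnpj_ordem:0>4}{cnpj_dv:0>2}"
    if len(cnpj) != 14 or not cnpj.isdigit():
        raise ValueError(f"not a 14-digit numeric CNPJ: {cnpj!r}")
    return cnpj


def format_cnpj(cnpj: str) -> str:
    """Render a 14-digit CNPJ in the usual XX.XXX.XXX/XXXX-XX layout."""
    return f"{cnpj[:2]}.{cnpj[2:5]}.{cnpj[5:8]}/{cnpj[8:12]}-{cnpj[12:]}"


def _check_digit(digits: str) -> str:
    # Modulo-11 rule: weights 2..9 are applied from right to left, repeating.
    total = sum(int(d) * w for d, w in zip(reversed(digits), cycle(range(2, 10))))
    remainder = total % 11
    return "0" if remainder < 2 else str(11 - remainder)


def has_valid_check_digits(cnpj: str) -> bool:
    """Recompute both check digits from the first 12 digits and compare with cnpj_dv."""
    first = _check_digit(cnpj[:12])
    second = _check_digit(cnpj[:12] + first)
    return cnpj[12:] == first + second


# A commonly used test number, not tied to any record in this dataset.
full = compose_cnpj("11222333", "0001", "81")
print(format_cnpj(full), has_valid_check_digits(full))
```

Zero-padding in `compose_cnpj` guards against parts that lost their leading zeros, for example if a column was read as an integer somewhere upstream.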
Date Formats
Dates in this dataset are stored as strings in YYYYMMDD format. Convert them to a date type before doing date arithmetic.
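A minimal Python sketch of the conversion is shown below. The function name is hypothetical, and treating empty, all-zero, or otherwise invalid values as missing is an assumption about how absent dates might appear in the data.

```python
from datetime import date, datetime
from typing import Optional


def parse_yyyymmdd(raw: Optional[str]) -> Optional[date]:
    """Parse a YYYYMMDD string into a date; return None for missing or invalid values."""
    if raw is None:
        return None
    value = str(raw).strip()
    # Assumption: blanks and all-zero placeholders mean "no date".
    if len(value) != 8 or not value.isdigit() or value == "00000000":
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date (e.g. month 13).
        return None


print(parse_yyyymmdd("20230115"))
print(parse_yyyymmdd(""))
```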
Reference Table Joins
The main streams contain codes that should be joined with reference tables for human-readable values:
- natureza_juridica → Natureza Jurídica stream
- cnae_fiscal_principal → CNAEs stream
- id_municipio → Municípios stream
- id_pais → Países stream
- qualificacao → Qualificações stream
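The joins listed above all follow the same pattern: look up a code in a reference stream's codigo column and read its nome. A small in-memory sketch is shown below; the helper names and sample rows are illustrative, and in practice you would typically express the same join in SQL on Explorer.

```python
from typing import Dict, Iterable


def build_lookup(reference_rows: Iterable[dict]) -> Dict[str, str]:
    """Map codigo -> nome for one reference stream (CNAEs, Municípios, etc.)."""
    return {row["codigo"]: row["nome"] for row in reference_rows}


def enrich(record: dict, code_field: str, lookup: Dict[str, str], target_field: str) -> dict:
    """Return a copy of record with the description for code_field added.
    Codes missing from the reference table yield None, like a LEFT JOIN."""
    enriched = dict(record)
    enriched[target_field] = lookup.get(record.get(code_field))
    return enriched


# Illustrative sample rows, not extracted data.
cnaes = [{"codigo": "6201501", "nome": "Desenvolvimento de programas de computador sob encomenda"}]
estabelecimento = {"cnpj_basico": "12345678", "cnae_fiscal_principal": "6201501"}

cnae_lookup = build_lookup(cnaes)
print(enrich(estabelecimento, "cnae_fiscal_principal", cnae_lookup, "cnae_descricao"))
```

Because the reference streams also carry year_month, filter them to a single month before building the lookup so that each code maps to exactly one description.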
Incremental Sync Strategy
The connector uses year_month as the replication key. Each month, Receita Federal publishes a complete snapshot of all CNPJs. When using incremental sync:
- New monthly data is appended to existing data
- You can track historical changes by comparing records across different year_month values
- For the latest company state, filter by the most recent year_month
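Both patterns, selecting the latest snapshot and comparing months, can be sketched on in-memory rows as follows. The function names and sample rows are hypothetical. The sketch relies on year_month being in YYYY-MM format, which sorts correctly as plain text.

```python
from typing import List, Tuple


def latest_snapshot(rows: List[dict]) -> List[dict]:
    """Keep only rows from the most recent year_month."""
    if not rows:
        return []
    latest = max(row["year_month"] for row in rows)  # YYYY-MM sorts lexicographically
    return [row for row in rows if row["year_month"] == latest]


def field_changes(rows: List[dict], cnpj_basico: str, field: str) -> List[Tuple[str, str, str]]:
    """List (year_month, old_value, new_value) each time a field changed for one company."""
    history = sorted(
        (row for row in rows if row["cnpj_basico"] == cnpj_basico),
        key=lambda row: row["year_month"],
    )
    changes = []
    for previous, current in zip(history, history[1:]):
        if previous[field] != current[field]:
            changes.append((current["year_month"], previous[field], current[field]))
    return changes


# Hypothetical appended snapshots for one company, not real data.
rows = [
    {"cnpj_basico": "12345678", "year_month": "2025-01", "porte": "01"},
    {"cnpj_basico": "12345678", "year_month": "2025-02", "porte": "01"},
    {"cnpj_basico": "12345678", "year_month": "2025-03", "porte": "03"},
]
print(latest_snapshot(rows))
print(field_changes(rows, "12345678", "porte"))
```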