Prioritize open deals using 6 factors from your CRM history. No machine learning, just SQL against data you already have. Each deal gets a 0-100 score with a full breakdown of why.
In Post 05, we crossed acquisition channel with funnel stage and found where each channel breaks. Useful for diagnosis, but it left something exposed: hundreds of open deals in the pipeline, and no clear criteria for which ones deserve attention first.

Without a scoring system, the default is to prioritize whoever speaks loudest or arrived most recently. Neither correlates with likelihood to close.

The question is simple: of all open deals, which ones have the highest chance of converting?
We went back to deals that already closed (won and lost) and looked for the factors with the strongest separation between the two groups. No machine learning model, no external tooling. Just grouping the data that already existed in the CRM and measuring which attributes correlated most with winning.

We found 6 factors with clear predictive signal:
Acquisition channel
Historical win rate ranges from 2% to 57% depending on the channel. The strongest single predictor.
Cloud provider
Deals that connected a real cloud (AWS, GCP) have 45%+ win rate. Deals still on the free tier: 3%. Selecting a real provider signals concrete intent.
Activation depth
How many funnel stages the deal has passed through. 5 stages = 35% win rate. 2 stages = 4%. Progressive signal that improves as the deal advances.
Communication channel
Deals with an active communication channel (WhatsApp, Slack) convert well above average. Low coverage, but having any channel set is a strong intent signal.
First meeting
86% of won deals had at least one meeting registered. Only 7% of open deals do. Binary but highly predictive.
Stage velocity
Deals that have spent less time in their current stage than the average of deals that converted score higher. Stalled deals lose points.
Each factor contributes a weighted score. The total goes from 0 to 100, with weights calibrated by how much separation each factor shows between won and lost deals.
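As a concrete sketch of how the pieces add up (the 40% win rate here is illustrative, not a figure from the data above): a deal from a channel with a 40% historical win rate, a real cloud provider connected, 3 of 4 stages reached, a communication channel set, no meeting yet, and normal stage velocity would score:

```sql
-- Illustrative breakdown for one hypothetical deal
SELECT
  ROUND(40.0 * 0.25, 1)     AS channel_score,    -- 40% win rate * 0.25 weight = 10.0
  20.0                      AS cloud_score,       -- real cloud provider connected
  ROUND(25.0 * 3 / 4.0, 1)  AS activation_score,  -- 3 of 4 stages reached = 18.8
  10.0                      AS comm_score,        -- communication channel set
  0.0                       AS meeting_score,     -- no meeting registered yet
  10.0                      AS velocity_score     -- faster than the won-deal benchmark
  -- total: 10.0 + 20.0 + 18.8 + 10.0 + 0.0 + 10.0 = 68.8
```

A total of 68.8 would land this deal in tier A despite the missing meeting, which is exactly the kind of deal the outreach query later in this post is designed to surface.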
The scoring model classified 840 open deals into four tiers:
5% of the pipeline deserves focus
44 deals scored 40 or higher (tiers A and B), combining strong channel, real cloud setup, deep activation, and active engagement. The remaining 95% are mostly early-stage self-serve deals with no significant activation.
Zero ML models needed
The entire score is computed with SQL over existing CRM data. Historical win rates serve as weights. No training, no external model. Anyone can understand why a deal scored high by looking at the factor breakdown.
The model doesn’t predict the future. It organizes the present using the past. Deals that look like the ones that already closed deserve more attention. That’s the entire logic, and it’s enough to go from “no criteria” to “clear priority list.”
Lead scoring doesn’t require a data science team. If you have enough closed deals to compute reliable win rates by segment, you have enough to build a scoring model. The factors will be different for every company, but the method is the same: find what separates won from lost, weight it, and apply it to open deals.

This is Stage 03 of the maturity model: predictive. Using historical patterns to inform present decisions.
Download the file below and add it to your Claude Skills. It will query your HubSpot pipeline data at Nekt and generate the lead scoring analysis automatically.

⬇ Download Skill file (.md)
Compute a composite score (0-100) for each open deal based on 6 historical factors. Replace pipeline ID, stage IDs, and cloud provider values with your own:
Nekt Express / GCP (BigQuery)
AWS (Amazon Athena)
WITH channel_scores AS (
  SELECT
    CASE
      WHEN properties.hs_analytics_source = 'OFFLINE'
           AND properties.hs_analytics_source_data_1 = 'INTEGRATION' THEN 'Self-Serve'
      WHEN properties.hs_analytics_source = 'OFFLINE'
           AND properties.hs_analytics_source_data_1 IN ('IMPORT', 'CRM_UI') THEN 'Sales-Assisted'
      WHEN properties.hs_analytics_source = 'DIRECT_TRAFFIC' THEN 'Direct Traffic'
      WHEN properties.hs_analytics_source IN ('SOCIAL_MEDIA', 'PAID_SOCIAL') THEN 'Social'
      WHEN properties.hs_analytics_source = 'PAID_SEARCH' THEN 'Paid Search'
      WHEN properties.hs_analytics_source = 'ORGANIC_SEARCH' THEN 'Organic Search'
      WHEN properties.hs_analytics_source IN ('OTHER_CAMPAIGNS', 'REFERRALS') THEN 'Campaigns/Referrals'
      ELSE 'Other'
    END AS channel,
    ROUND(100.0 * COUNT(CASE WHEN properties.hs_is_closed_won = 'true' THEN 1 END)
          / NULLIF(COUNT(*), 0), 1) AS channel_win_rate
  FROM `nekt_raw.hubspot_deals`
  WHERE properties.pipeline = 'YOUR_PIPELINE_ID'
  GROUP BY 1
),

stage_benchmarks AS (
  SELECT
    properties.dealstage,
    ROUND(AVG(CAST(properties.hs_v2_time_in_current_stage AS FLOAT64) / 86400000), 1) AS avg_days_in_stage
  FROM `nekt_raw.hubspot_deals`
  WHERE properties.pipeline = 'YOUR_PIPELINE_ID'
    AND properties.hs_is_closed_won = 'true'
  GROUP BY 1
),

scored_deals AS (
  SELECT
    d.properties.dealname,
    d.properties.dealstage,
    d.properties.createdate,
    CASE
      WHEN d.properties.hs_analytics_source = 'OFFLINE'
           AND d.properties.hs_analytics_source_data_1 = 'INTEGRATION' THEN 'Self-Serve'
      WHEN d.properties.hs_analytics_source = 'OFFLINE'
           AND d.properties.hs_analytics_source_data_1 IN ('IMPORT', 'CRM_UI') THEN 'Sales-Assisted'
      WHEN d.properties.hs_analytics_source = 'DIRECT_TRAFFIC' THEN 'Direct Traffic'
      WHEN d.properties.hs_analytics_source IN ('SOCIAL_MEDIA', 'PAID_SOCIAL') THEN 'Social'
      WHEN d.properties.hs_analytics_source = 'PAID_SEARCH' THEN 'Paid Search'
      WHEN d.properties.hs_analytics_source = 'ORGANIC_SEARCH' THEN 'Organic Search'
      WHEN d.properties.hs_analytics_source IN ('OTHER_CAMPAIGNS', 'REFERRALS') THEN 'Campaigns/Referrals'
      ELSE 'Other'
    END AS channel,
    ROUND(cs.channel_win_rate * 0.25, 1) AS channel_score,
    d.properties.cloud_provider,
    CASE
      WHEN d.properties.cloud_provider IN ('AWS', 'GCP', 'Azure', 'Oracle') THEN 20.0
      WHEN d.properties.cloud_provider = 'None' THEN 10.0
      ELSE 0.0
    END AS cloud_score,
    (CASE WHEN d.properties.hs_v2_date_entered_STAGE1ID IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN d.properties.hs_v2_date_entered_STAGE2ID IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN d.properties.hs_v2_date_entered_STAGE3ID IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN d.properties.hs_v2_date_entered_STAGE4ID IS NOT NULL THEN 1 ELSE 0 END) AS stages_reached,
    ROUND(25.0 * (
      CASE WHEN d.properties.hs_v2_date_entered_STAGE1ID IS NOT NULL THEN 1 ELSE 0 END
      + CASE WHEN d.properties.hs_v2_date_entered_STAGE2ID IS NOT NULL THEN 1 ELSE 0 END
      + CASE WHEN d.properties.hs_v2_date_entered_STAGE3ID IS NOT NULL THEN 1 ELSE 0 END
      + CASE WHEN d.properties.hs_v2_date_entered_STAGE4ID IS NOT NULL THEN 1 ELSE 0 END
    ) / 4.0, 1) AS activation_score,
    CASE WHEN d.properties.communication_channel IS NOT NULL THEN 10.0 ELSE 0.0 END AS comm_score,
    CASE WHEN d.properties.first_meeting_date IS NOT NULL THEN 10.0 ELSE 0.0 END AS meeting_score,
    CASE
      WHEN sb.avg_days_in_stage IS NULL OR sb.avg_days_in_stage = 0 THEN 5.0
      WHEN CAST(d.properties.hs_v2_time_in_current_stage AS FLOAT64) / 86400000 <= sb.avg_days_in_stage THEN 10.0
      WHEN CAST(d.properties.hs_v2_time_in_current_stage AS FLOAT64) / 86400000 <= sb.avg_days_in_stage * 2 THEN 5.0
      ELSE 0.0
    END AS velocity_score
  FROM `nekt_raw.hubspot_deals` d
  LEFT JOIN channel_scores cs
    ON cs.channel = CASE
      WHEN d.properties.hs_analytics_source = 'OFFLINE'
           AND d.properties.hs_analytics_source_data_1 = 'INTEGRATION' THEN 'Self-Serve'
      WHEN d.properties.hs_analytics_source = 'OFFLINE'
           AND d.properties.hs_analytics_source_data_1 IN ('IMPORT', 'CRM_UI') THEN 'Sales-Assisted'
      WHEN d.properties.hs_analytics_source = 'DIRECT_TRAFFIC' THEN 'Direct Traffic'
      WHEN d.properties.hs_analytics_source IN ('SOCIAL_MEDIA', 'PAID_SOCIAL') THEN 'Social'
      WHEN d.properties.hs_analytics_source = 'PAID_SEARCH' THEN 'Paid Search'
      WHEN d.properties.hs_analytics_source = 'ORGANIC_SEARCH' THEN 'Organic Search'
      WHEN d.properties.hs_analytics_source IN ('OTHER_CAMPAIGNS', 'REFERRALS') THEN 'Campaigns/Referrals'
      ELSE 'Other'
    END
  LEFT JOIN stage_benchmarks sb
    ON sb.dealstage = d.properties.dealstage
  WHERE d.properties.pipeline = 'YOUR_PIPELINE_ID'
    AND d.properties.hs_is_closed_won = 'false'
    AND d.properties.hs_is_closed_lost = 'false'
)

SELECT
  dealname,
  channel,
  cloud_provider,
  stages_reached,
  channel_score,
  cloud_score,
  activation_score,
  comm_score,
  meeting_score,
  velocity_score,
  ROUND(channel_score + cloud_score + activation_score + comm_score
        + meeting_score + velocity_score, 1) AS total_score
FROM scored_deals
ORDER BY total_score DESC
LIMIT 30
WITH channel_scores AS (
  SELECT
    CASE
      WHEN properties.hs_analytics_source = 'OFFLINE'
           AND properties.hs_analytics_source_data_1 = 'INTEGRATION' THEN 'Self-Serve'
      WHEN properties.hs_analytics_source = 'OFFLINE'
           AND properties.hs_analytics_source_data_1 IN ('IMPORT', 'CRM_UI') THEN 'Sales-Assisted'
      WHEN properties.hs_analytics_source = 'DIRECT_TRAFFIC' THEN 'Direct Traffic'
      WHEN properties.hs_analytics_source IN ('SOCIAL_MEDIA', 'PAID_SOCIAL') THEN 'Social'
      WHEN properties.hs_analytics_source = 'PAID_SEARCH' THEN 'Paid Search'
      WHEN properties.hs_analytics_source = 'ORGANIC_SEARCH' THEN 'Organic Search'
      WHEN properties.hs_analytics_source IN ('OTHER_CAMPAIGNS', 'REFERRALS') THEN 'Campaigns/Referrals'
      ELSE 'Other'
    END AS channel,
    ROUND(100.0 * COUNT(CASE WHEN properties.hs_is_closed_won = 'true' THEN 1 END)
          / NULLIF(COUNT(*), 0), 1) AS channel_win_rate
  FROM nekt_raw.hubspot_deals
  WHERE properties.pipeline = 'YOUR_PIPELINE_ID'
  GROUP BY 1
),

stage_benchmarks AS (
  SELECT
    properties.dealstage,
    ROUND(AVG(TRY_CAST(properties.hs_v2_time_in_current_stage AS DOUBLE) / 86400000), 1) AS avg_days_in_stage
  FROM nekt_raw.hubspot_deals
  WHERE properties.pipeline = 'YOUR_PIPELINE_ID'
    AND properties.hs_is_closed_won = 'true'
  GROUP BY 1
),

scored_deals AS (
  SELECT
    d.properties.dealname,
    d.properties.dealstage,
    d.properties.createdate,
    CASE
      WHEN d.properties.hs_analytics_source = 'OFFLINE'
           AND d.properties.hs_analytics_source_data_1 = 'INTEGRATION' THEN 'Self-Serve'
      WHEN d.properties.hs_analytics_source = 'OFFLINE'
           AND d.properties.hs_analytics_source_data_1 IN ('IMPORT', 'CRM_UI') THEN 'Sales-Assisted'
      WHEN d.properties.hs_analytics_source = 'DIRECT_TRAFFIC' THEN 'Direct Traffic'
      WHEN d.properties.hs_analytics_source IN ('SOCIAL_MEDIA', 'PAID_SOCIAL') THEN 'Social'
      WHEN d.properties.hs_analytics_source = 'PAID_SEARCH' THEN 'Paid Search'
      WHEN d.properties.hs_analytics_source = 'ORGANIC_SEARCH' THEN 'Organic Search'
      WHEN d.properties.hs_analytics_source IN ('OTHER_CAMPAIGNS', 'REFERRALS') THEN 'Campaigns/Referrals'
      ELSE 'Other'
    END AS channel,
    ROUND(cs.channel_win_rate * 0.25, 1) AS channel_score,
    d.properties.cloud_provider,
    CASE
      WHEN d.properties.cloud_provider IN ('AWS', 'GCP', 'Azure', 'Oracle') THEN 20.0
      WHEN d.properties.cloud_provider = 'None' THEN 10.0
      ELSE 0.0
    END AS cloud_score,
    (CASE WHEN d.properties.hs_v2_date_entered_STAGE1ID IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN d.properties.hs_v2_date_entered_STAGE2ID IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN d.properties.hs_v2_date_entered_STAGE3ID IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN d.properties.hs_v2_date_entered_STAGE4ID IS NOT NULL THEN 1 ELSE 0 END) AS stages_reached,
    ROUND(25.0 * (
      CASE WHEN d.properties.hs_v2_date_entered_STAGE1ID IS NOT NULL THEN 1 ELSE 0 END
      + CASE WHEN d.properties.hs_v2_date_entered_STAGE2ID IS NOT NULL THEN 1 ELSE 0 END
      + CASE WHEN d.properties.hs_v2_date_entered_STAGE3ID IS NOT NULL THEN 1 ELSE 0 END
      + CASE WHEN d.properties.hs_v2_date_entered_STAGE4ID IS NOT NULL THEN 1 ELSE 0 END
    ) / 4.0, 1) AS activation_score,
    CASE WHEN d.properties.communication_channel IS NOT NULL THEN 10.0 ELSE 0.0 END AS comm_score,
    CASE WHEN d.properties.first_meeting_date IS NOT NULL THEN 10.0 ELSE 0.0 END AS meeting_score,
    CASE
      WHEN sb.avg_days_in_stage IS NULL OR sb.avg_days_in_stage = 0 THEN 5.0
      WHEN TRY_CAST(d.properties.hs_v2_time_in_current_stage AS DOUBLE) / 86400000 <= sb.avg_days_in_stage THEN 10.0
      WHEN TRY_CAST(d.properties.hs_v2_time_in_current_stage AS DOUBLE) / 86400000 <= sb.avg_days_in_stage * 2 THEN 5.0
      ELSE 0.0
    END AS velocity_score
  FROM nekt_raw.hubspot_deals d
  LEFT JOIN channel_scores cs
    ON cs.channel = CASE
      WHEN d.properties.hs_analytics_source = 'OFFLINE'
           AND d.properties.hs_analytics_source_data_1 = 'INTEGRATION' THEN 'Self-Serve'
      WHEN d.properties.hs_analytics_source = 'OFFLINE'
           AND d.properties.hs_analytics_source_data_1 IN ('IMPORT', 'CRM_UI') THEN 'Sales-Assisted'
      WHEN d.properties.hs_analytics_source = 'DIRECT_TRAFFIC' THEN 'Direct Traffic'
      WHEN d.properties.hs_analytics_source IN ('SOCIAL_MEDIA', 'PAID_SOCIAL') THEN 'Social'
      WHEN d.properties.hs_analytics_source = 'PAID_SEARCH' THEN 'Paid Search'
      WHEN d.properties.hs_analytics_source = 'ORGANIC_SEARCH' THEN 'Organic Search'
      WHEN d.properties.hs_analytics_source IN ('OTHER_CAMPAIGNS', 'REFERRALS') THEN 'Campaigns/Referrals'
      ELSE 'Other'
    END
  LEFT JOIN stage_benchmarks sb
    ON sb.dealstage = d.properties.dealstage
  WHERE d.properties.pipeline = 'YOUR_PIPELINE_ID'
    AND d.properties.hs_is_closed_won = 'false'
    AND d.properties.hs_is_closed_lost = 'false'
)

SELECT
  dealname,
  channel,
  cloud_provider,
  stages_reached,
  channel_score,
  cloud_score,
  activation_score,
  comm_score,
  meeting_score,
  velocity_score,
  ROUND(channel_score + cloud_score + activation_score + comm_score
        + meeting_score + velocity_score, 1) AS total_score
FROM scored_deals
ORDER BY total_score DESC
LIMIT 30
See how many deals fall into each scoring tier to understand the shape of your pipeline:
Nekt Express / GCP (BigQuery)
AWS (Amazon Athena)
-- Use the same CTEs from the scoring query above, then:
SELECT
  CASE
    WHEN total_score >= 60 THEN 'A (60+)'
    WHEN total_score >= 40 THEN 'B (40-59)'
    WHEN total_score >= 20 THEN 'C (20-39)'
    ELSE 'D (0-19)'
  END AS tier,
  COUNT(*) AS deals,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_pipeline,
  ROUND(AVG(total_score), 1) AS avg_score
FROM scored_deals_with_total  -- wrap the scored_deals CTE adding a total_score column
GROUP BY 1
ORDER BY 1
-- Use the same CTEs from the scoring query above, then:
SELECT
  CASE
    WHEN total_score >= 60 THEN 'A (60+)'
    WHEN total_score >= 40 THEN 'B (40-59)'
    WHEN total_score >= 20 THEN 'C (20-39)'
    ELSE 'D (0-19)'
  END AS tier,
  COUNT(*) AS deals,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_pipeline,
  ROUND(AVG(total_score), 1) AS avg_score
FROM scored_deals_with_total  -- wrap the scored_deals CTE adding a total_score column
GROUP BY 1
ORDER BY 1
Validate the scoring model by comparing the average score of won vs lost deals. If the model works, won deals should score significantly higher:
-- Use the same channel_scores CTE, then score closed deals
-- (exclude velocity since it's not comparable for closed deals)
SELECT
  CASE
    WHEN properties.hs_is_closed_won = 'true' THEN 'won'
    WHEN properties.hs_is_closed_lost = 'true' THEN 'lost'
  END AS outcome,
  COUNT(*) AS deals,
  ROUND(AVG(score), 1) AS avg_score,
  ROUND(MIN(score), 1) AS min_score,
  ROUND(MAX(score), 1) AS max_score
FROM scored_closed_deals
GROUP BY 1
ORDER BY 1
High-score deals missing a meeting
Find deals that score well on other factors but have never had a meeting. These are the most obvious candidates for sales outreach:
SELECT
  properties.dealname,
  properties.dealstage,
  properties.createdate,
  properties.cloud_provider
FROM nekt_raw.hubspot_deals
WHERE properties.pipeline = 'YOUR_PIPELINE_ID'
  AND properties.hs_is_closed_won = 'false'
  AND properties.hs_is_closed_lost = 'false'
  AND properties.first_meeting_date IS NULL
  AND (
    properties.cloud_provider IN ('AWS', 'GCP', 'Azure', 'Oracle')
    OR properties.communication_channel IS NOT NULL
    OR properties.hs_v2_date_entered_STAGE3ID IS NOT NULL
  )
ORDER BY properties.createdate DESC
Average score by acquisition channel
See which channels produce the highest-scoring open deals on average. Useful to understand if channel alone determines the score ceiling:
-- Use the same scored_deals CTE, then:
-- (scored_deals exposes comm_score and meeting_score rather than the raw
-- columns, so count deals where those scores are non-zero)
SELECT
  channel,
  COUNT(*) AS open_deals,
  ROUND(AVG(total_score), 1) AS avg_score,
  COUNT(CASE WHEN comm_score > 0 THEN 1 END) AS with_comm,
  COUNT(CASE WHEN meeting_score > 0 THEN 1 END) AS with_meeting
FROM scored_deals
GROUP BY 1
ORDER BY avg_score DESC
This is not machine learning. The model uses historical win rates as weights, computed directly in SQL. No training step, no external model. The advantage: anyone can understand why a deal scored high by reading the factor breakdown.
The weights (25/20/25/10/10/10) are design choices, not regression outputs. Channel and activation depth get the highest weights because they showed the largest separation between won and lost in the data. Adjust them as your pipeline evolves and you accumulate more closed deals.
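One way to sanity-check a weight is to measure the separation the factor actually shows on your closed deals. A minimal sketch for the meeting factor, using the same table and pipeline placeholder as above (BigQuery syntax):

```sql
-- Win rate among closed deals, split by whether a first meeting was registered
SELECT
  properties.first_meeting_date IS NOT NULL AS had_meeting,
  COUNT(*) AS closed_deals,
  ROUND(100.0 * COUNT(CASE WHEN properties.hs_is_closed_won = 'true' THEN 1 END)
        / NULLIF(COUNT(*), 0), 1) AS win_rate
FROM `nekt_raw.hubspot_deals`
WHERE properties.pipeline = 'YOUR_PIPELINE_ID'
  AND (properties.hs_is_closed_won = 'true' OR properties.hs_is_closed_lost = 'true')
GROUP BY 1
```

The wider the gap in win_rate between the two rows, the more weight the factor deserves; the same pattern works for any of the other binary factors.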
Cloud provider as an intent signal. In Nekt’s context (data platform), connecting a real cloud (AWS, GCP) demonstrates concrete technical intent. In other products, the equivalent is any setup action that requires real effort: connecting an integration, configuring an environment, importing data.
Communication channel has low coverage (12%) but strong signal. It only exists when filled manually. It works as a proxy for “sales is already in active contact.” In pipelines with more automatically-filled engagement fields, consider substituting with activity metrics (emails, calls, meetings).
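Before leaning on a sparsely filled field, it's worth measuring its coverage in your own pipeline. A sketch (BigQuery syntax; COUNT of a column counts only non-null values):

```sql
-- Share of open deals with a communication channel set
SELECT
  COUNT(*) AS open_deals,
  ROUND(100.0 * COUNT(properties.communication_channel) / COUNT(*), 1) AS pct_with_channel
FROM `nekt_raw.hubspot_deals`
WHERE properties.pipeline = 'YOUR_PIPELINE_ID'
  AND properties.hs_is_closed_won = 'false'
  AND properties.hs_is_closed_lost = 'false'
```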
Stage velocity benchmarks are computed from won deals only. The average time per stage uses only deals that actually converted, not all deals. This prevents deals stalled for months from contaminating the benchmark.
Backtesting is essential. The backtest query compares the average score of won vs lost deals. If the separation is small, the weights need adjusting or there are relevant factors missing from the model.
Pipeline and stage IDs are specific to each HubSpot account. Use the pipeline overview query to discover yours.
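If you don't have that query handy, a minimal version that lists the pipeline and stage IDs present in your account looks like this (BigQuery syntax):

```sql
-- Discover pipeline and stage IDs and how many deals sit in each
SELECT
  properties.pipeline,
  properties.dealstage,
  COUNT(*) AS deals
FROM `nekt_raw.hubspot_deals`
GROUP BY 1, 2
ORDER BY 1, 3 DESC
```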
The scoring model works for any funnel type. Replace “cloud provider” and “activation depth” with equivalent intent signals in your pipeline (demos completed, proposal sent, contract in review). The principle is the same: concrete actions that indicate real progress.
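As a sketch of that substitution (the property names here are hypothetical; use whatever your CRM actually tracks), the cloud_score expression in scored_deals could become:

```sql
-- Swap cloud_score for an equivalent intent signal, e.g. a completed demo
-- (demo_completed_date is a hypothetical custom property)
CASE WHEN d.properties.demo_completed_date IS NOT NULL THEN 20.0 ELSE 0.0 END AS intent_score
```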
With a scoring model to prioritize deals, the next step is using the same historical patterns to build a revenue forecast. How much is the current pipeline actually worth, weighted by the real probability of each deal closing?
07 · Revenue Forecast
Stage 03 · Predictive — A forecast the CFO can defend. Pipeline weighted by actual win rate per stage and source.