HubSpot + Apollo: Why the Native Integration Creates Duplicates and Loses Context

Symptoms: What Happens to Your Data

You connected Apollo to HubSpot through the native integration. Everything seems to be working: contacts from Apollo appear in HubSpot. But a few weeks later the picture becomes alarming:

  • HubSpot has “John Smith” from Acme Corp with email john@acme.com - and another “John Smith” also with john@acme.com, created by Apollo
  • HubSpot deals are not linked to activity from Apollo sequences
  • Apollo has an “Apollo Score” field (AI prospect rating) - it is not in HubSpot
  • The HubSpot timeline does not show that the contact opened emails in Apollo sequences
  • An SDR opens the contact card in HubSpot - and does not know that a colleague already wrote to them three days ago from Apollo

All of this is a direct consequence of architectural limitations of the native Apollo + HubSpot integration.

Root Causes of the Problems

1. Duplicate Contacts - Matching Only by Email

When Apollo creates a contact in HubSpot (when adding to a sequence), the native integration checks for duplicates only by the email field. If HubSpot already has a contact john@acme.com - the integration should find them and not create a new one.

The problem: a HubSpot contact has one primary email and can carry additional (secondary) emails. Apollo checks only the primary one. If the contact in HubSpot has the corporate email as secondary (and a personal address as primary), Apollo will create a duplicate.

Second problem: some companies use email aliases. john@acme.com and j.smith@acme.com are different addresses from the native integration’s perspective, but the same person. Apollo does not match by name + company domain.

2. No Association with Existing HubSpot Deals

The native Apollo <-> HubSpot integration creates/updates Contact and Company. It does not touch Deals in HubSpot.

Result: Apollo adds a contact to the “Enterprise Outreach” sequence. This same contact already has an open deal in HubSpot with another rep. The native integration does not see this connection. There is no notification in HubSpot that “a colleague is working with this lead in Apollo.”
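
A custom integration can run this check before enrolling anyone. The following sketch assumes the contact's deals were already fetched from HubSpot with the `hs_is_closed` and `hubspot_owner_id` properties, and that both come back as strings; `conflicting_deals` is a hypothetical helper and the deal records are made up.

```python
def conflicting_deals(deals: list[dict], sequence_owner_id: str) -> list[dict]:
    """Return open deals owned by someone other than the sequence owner."""
    conflicts = []
    for deal in deals:
        props = deal.get("properties", {})
        # Assumption: the closed flag arrives as the string "true" or "false"
        is_closed = (props.get("hs_is_closed") or "false").lower() == "true"
        owner = props.get("hubspot_owner_id") or ""
        if not is_closed and owner and owner != sequence_owner_id:
            conflicts.append({"deal_id": deal["id"], "owner_id": owner})
    return conflicts


deals = [
    {"id": "9001", "properties": {"hs_is_closed": "false", "hubspot_owner_id": "55"}},
    {"id": "9002", "properties": {"hs_is_closed": "true", "hubspot_owner_id": "55"}},
    {"id": "9003", "properties": {"hs_is_closed": "false", "hubspot_owner_id": "77"}},
]
print(conflicting_deals(deals, sequence_owner_id="77"))
```

A non-empty result is the signal to pause the enrollment or notify the deal owner, depending on how your team wants to handle overlap.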

3. Apollo Custom Properties Are Not Passed to HubSpot

Apollo has its own set of fields that are valuable for prioritization:

  • apollo_score - AI prospect rating (0-100)
  • seniority - level in the company (C-level, VP, Manager)
  • linkedin_url - LinkedIn profile
  • keywords - technologies the company uses
  • technologies - technology stack (from Apollo Intent data)

The native integration syncs basic fields: First Name, Last Name, Email, Title, Company. Apollo-specific fields remain only in Apollo.
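
Before any Apollo-specific value can be written, the target properties have to exist on the HubSpot contact object. Below is a sketch of the request bodies one might send to HubSpot's property creation endpoint (`POST /crm/v3/properties/contacts`). The property names, labels, field types, and the `contactinformation` group are assumptions chosen for this example, not values taken from either product's documentation.

```python
APOLLO_PROPERTIES = [
    # (internal name, label, HubSpot type, HubSpot fieldType) - all illustrative
    ("apollo_score", "Apollo Score", "number", "number"),
    ("apollo_seniority", "Apollo Seniority", "string", "text"),
    ("apollo_technologies", "Apollo Technologies", "string", "textarea"),
]


def property_payloads(group_name: str = "contactinformation") -> list[dict]:
    """Build one request body per custom property."""
    return [
        {
            "name": name,
            "label": label,
            "type": type_,
            "fieldType": field_type,
            "groupName": group_name,
        }
        for name, label, type_, field_type in APOLLO_PROPERTIES
    ]


payloads = property_payloads()
for body in payloads:
    print(body["name"], body["type"], body["fieldType"])
```

Keeping the definitions in one list makes it easy to create the properties once per portal and to reuse the same names in the sync code.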

4. Sequence Activity Does Not Reach the HubSpot Engagement Timeline

HubSpot Timeline is the history of interactions with a contact: calls, emails, meetings. The native Apollo integration does NOT write any of the following to the Timeline:

  • Enrollment in a sequence
  • Emails sent from Apollo
  • Email opens and clicks from Apollo
  • Replies to Apollo emails

An SDR opening a contact in HubSpot sees an empty Timeline - even though a colleague already sent 3 emails via Apollo and received an “out of office” auto-reply.
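
The missing events listed above can be normalized into one flat record per event before they are logged anywhere. The sketch below assumes a hypothetical Apollo event shape (`sequence_id`, `type`, `subject`, `occurred_at`); the real payload may differ. It also derives a stable ID from the event's contents, so a retried or replayed delivery can be recognized as the same event.

```python
import hashlib

# Assumption: these event type names are illustrative, not Apollo's own
EVENT_LABELS = {
    "added": "Added to sequence",
    "sent": "Email sent",
    "opened": "Email opened",
    "clicked": "Link clicked",
    "replied": "Reply received",
}


def timeline_event_id(contact_id: str, sequence_id: str, event_type: str, occurred_at: str) -> str:
    """Derive a stable ID from the fields that identify one event."""
    raw = "|".join([contact_id, sequence_id, event_type, occurred_at])
    return "apollo-" + hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


def to_timeline_entry(contact_id: str, event: dict) -> dict:
    """Map one sequence event to a flat record ready for logging."""
    return {
        "id": timeline_event_id(contact_id, event["sequence_id"], event["type"], event["occurred_at"]),
        "contact_id": contact_id,
        "label": EVENT_LABELS.get(event["type"], event["type"]),
        "subject": event.get("subject", ""),
        "occurred_at": event["occurred_at"],
    }


event = {
    "sequence_id": "seq_42",
    "type": "opened",
    "subject": "Quick question",
    "occurred_at": "2024-05-01T09:30:00Z",
}
first = to_timeline_entry("101", event)
replay = to_timeline_entry("101", event)
print(first["label"], first["id"] == replay["id"])
```

Hashing the identifying fields keeps the ID free of spaces and special characters that a sequence name might contain.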

Analyzing Data Loss

Estimate the scale of the problem in your database:

import os
import requests
from collections import defaultdict

HUBSPOT_TOKEN = os.environ["HUBSPOT_PRIVATE_APP_TOKEN"]
HUBSPOT_BASE = "https://api.hubapi.com"


def find_duplicate_contacts() -> dict:
    """Find duplicate contacts in HubSpot by email."""
    headers = {"Authorization": f"Bearer {HUBSPOT_TOKEN}"}
    email_to_contacts: defaultdict[str, list] = defaultdict(list)
    after = None

    while True:
        params = {
            "limit": 100,
            "properties": "email,firstname,lastname,createdate,hs_analytics_source",
        }
        if after:
            params["after"] = after

        r = requests.get(
            f"{HUBSPOT_BASE}/crm/v3/objects/contacts",
            params=params,
            headers=headers,
            timeout=15,
        )
        # Fail loudly: stopping silently here would report a partial scan as complete
        r.raise_for_status()

        data = r.json()
        contacts = data.get("results", [])

        for contact in contacts:
            props = contact["properties"]
            # Empty properties can come back as null, so normalize None to ""
            email = (props.get("email") or "").strip().lower()
            if email:
                full_name = f"{props.get('firstname') or ''} {props.get('lastname') or ''}".strip()
                email_to_contacts[email].append({
                    "id": contact["id"],
                    "name": full_name,
                    "created": props.get("createdate"),
                    "source": props.get("hs_analytics_source"),
                })

        paging = data.get("paging")
        if paging and paging.get("next"):
            after = paging["next"]["after"]
        else:
            break

    duplicates = {
        email: contacts
        for email, contacts in email_to_contacts.items()
        if len(contacts) > 1
    }
    return duplicates


duplicates = find_duplicate_contacts()
print(f"Duplicates found: {len(duplicates)}")
for email, contacts in list(duplicates.items())[:5]:
    print(f"  {email}: {len(contacts)} records")
    for c in contacts:
        print(f"    ID={c['id']}, source={c['source']}, created={c['created']}")

The Right Approach: Custom Integration

A custom Apollo + HubSpot integration built on both systems’ APIs addresses each of the problems above:

Step 1. Bidirectional Matching During Contact Sync

APOLLO_API_KEY = os.environ["APOLLO_API_KEY"]
APOLLO_BASE = "https://api.apollo.io/v1"


def search_hubspot_contact(email: str, domain: str, name: str) -> str | None:
    """Search for a contact in HubSpot by email, then by domain+name."""
    headers = {"Authorization": f"Bearer {HUBSPOT_TOKEN}"}

    # Search by exact email
    r = requests.post(
        f"{HUBSPOT_BASE}/crm/v3/objects/contacts/search",
        json={
            "filterGroups": [{
                "filters": [{
                    "propertyName": "email",
                    "operator": "EQ",
                    "value": email,
                }]
            }],
            "properties": ["email", "firstname", "lastname", "associatedcompanyid"],
        },
        headers=headers,
        timeout=10,
    )
    if r.ok:
        results = r.json().get("results", [])
        if results:
            return results[0]["id"]

    # Search by domain + name (fallback)
    if domain and name:
        domain_r = requests.post(
            f"{HUBSPOT_BASE}/crm/v3/objects/contacts/search",
            json={
                "filterGroups": [{
                    "filters": [
                        {"propertyName": "email", "operator": "CONTAINS_TOKEN", "value": f"*@{domain}"},
                        {"propertyName": "lastname", "operator": "EQ",
                         "value": name.split()[-1] if name.split() else ""},
                    ]
                }],
                "properties": ["email", "firstname", "lastname"],
            },
            headers=headers,
            timeout=10,
        )
        if domain_r.ok:
            results = domain_r.json().get("results", [])
            if results:
                return results[0]["id"]

    return None


def sync_apollo_contact_to_hubspot(apollo_contact: dict) -> str:
    """Sync an Apollo contact to HubSpot with the full set of fields."""
    email = apollo_contact.get("email", "")
    domain = email.split("@")[1] if "@" in email else ""
    name = f"{apollo_contact.get('first_name', '')} {apollo_contact.get('last_name', '')}".strip()

    existing_id = search_hubspot_contact(email, domain, name)

    headers = {
        "Authorization": f"Bearer {HUBSPOT_TOKEN}",
        "Content-Type": "application/json",
    }

    properties = {
        "email": email,
        "firstname": apollo_contact.get("first_name", ""),
        "lastname": apollo_contact.get("last_name", ""),
        "jobtitle": apollo_contact.get("title", ""),
        "phone": apollo_contact.get("phone", ""),
        # Custom properties (linkedin_url and the apollo_* fields below)
        # must be created in HubSpot in advance
        "linkedin_url": apollo_contact.get("linkedin_url", ""),
        "apollo_score": str(apollo_contact.get("score", 0)),
        "apollo_seniority": apollo_contact.get("seniority", ""),
        # "account" may be missing or null, so fall back to an empty dict/list
        "apollo_technologies": ", ".join(
            ((apollo_contact.get("account") or {}).get("technologies") or [])[:10]
        ),
    }

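    # Drop empty values so a sparse Apollo record cannot blank out existing HubSpot data
    properties = {key: value for key, value in properties.items() if value}

    if existing_id:
        # Update existing; keep the HubSpot email, since the match may have come
        # from the domain + name fallback rather than an exact email hit
        properties.pop("email", None)
        r = requests.patch(
            f"{HUBSPOT_BASE}/crm/v3/objects/contacts/{existing_id}",
            json={"properties": properties},
            headers=headers,
            timeout=10,
        )
        return existing_id if r.ok else ""

    # Create new
    r = requests.post(
        f"{HUBSPOT_BASE}/crm/v3/objects/contacts",
        json={"properties": properties},
        headers=headers,
        timeout=10,
    )
    return r.json()["id"] if r.ok else ""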


def log_apollo_sequence_activity(
    hubspot_contact_id: str,
    sequence_name: str,
    event_type: str,
    email_subject: str,
    occurred_at: str,
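) -> bool:
    """Write Apollo sequence activity to HubSpot Timeline; return True on success."""
    # A Custom Timeline Event Type must be created in HubSpot in advance.
    # Timeline events belong to an app (hence APP_ID), so check that this
    # endpoint accepts the token type you are using.
    TIMELINE_EVENT_TYPE_ID = os.environ.get("HUBSPOT_APOLLO_TIMELINE_TYPE_ID", "")
    APP_ID = os.environ.get("HUBSPOT_APP_ID", "")

    if not TIMELINE_EVENT_TYPE_ID or not APP_ID:
        # Not configured: report failure instead of returning None
        return False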

    r = requests.put(
        f"{HUBSPOT_BASE}/integrations/v1/{APP_ID}/timeline/event",
        headers={
            "Authorization": f"Bearer {HUBSPOT_TOKEN}",
            "Content-Type": "application/json",
        },
        json={
            "eventTypeId": TIMELINE_EVENT_TYPE_ID,
            "id": f"apollo_{sequence_name}_{event_type}_{hubspot_contact_id}_{occurred_at}",
            "objectId": hubspot_contact_id,
            "occurredAt": occurred_at,
            "extraData": {
                "sequence_name": sequence_name,
                "event_type": event_type,
                "email_subject": email_subject,
            },
        },
        timeout=10,
    )
    return r.ok
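
When updating an existing contact, it is usually safer to write only values that are non-empty and actually differ from what HubSpot holds, and to leave the primary email alone when the match may have come from the domain + name fallback. A minimal sketch of that filter follows; `properties_to_write` is a hypothetical helper, and treating `email` as a protected field is a design assumption, not a HubSpot requirement.

```python
def properties_to_write(
    existing: dict,
    incoming: dict,
    protected: frozenset[str] = frozenset({"email"}),
) -> dict:
    """Return only the incoming values that are non-empty and differ from HubSpot."""
    changes = {}
    for name, value in incoming.items():
        if value in ("", None):
            continue  # never blank out a field HubSpot already has
        if name in protected and existing.get(name):
            continue  # keep HubSpot's own value for protected fields
        if str(existing.get(name) or "") != str(value):
            changes[name] = value
    return changes


existing = {"email": "john.smith@gmail.com", "phone": "+1 555 0100", "jobtitle": "CTO"}
incoming = {"email": "john@acme.com", "phone": "", "jobtitle": "CTO", "apollo_score": "87"}
print(properties_to_write(existing, incoming))
```

An empty result means the PATCH request can be skipped entirely, which also saves API calls on large syncs.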

Step 2. Association with an Existing Deal

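def associate_contact_to_deal_if_exists(
    hubspot_contact_id: str, deal_search_email: str
):
    """Return the ID of a deal already associated with the contact, or None.

    This only looks up existing associations: it does not create one, and it
    does not filter by deal stage. `deal_search_email` is currently unused.
    """
    headers = {
        "Authorization": f"Bearer {HUBSPOT_TOKEN}",
        "Content-Type": "application/json",
    }

    # List deals associated with the contact (open or closed)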
    r = requests.get(
        f"{HUBSPOT_BASE}/crm/v3/objects/contacts/{hubspot_contact_id}/associations/deals",
        headers=headers,
        timeout=10,
    )
    if r.ok:
        deals = r.json().get("results", [])
        if deals:
            # Contact already linked to a deal - do not create a duplicate
            print(f"Contact {hubspot_contact_id} already has {len(deals)} deal(s)")
            return deals[0]["id"]

    # No deals - can create one or simply leave it
    return None
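
To restrict the lookup to open deals only, one option is HubSpot's CRM search endpoint for deals (`POST /crm/v3/objects/deals/search`). The sketch below only builds the request body. It assumes the `associations.contact` search filter and the `hs_is_closed` deal property behave as expected in your portal; `open_deals_search_body` is a hypothetical helper name.

```python
def open_deals_search_body(contact_id: str, limit: int = 10) -> dict:
    """Build a CRM search body for open deals associated with one contact."""
    return {
        "filterGroups": [{
            # Filters inside one group are combined with AND
            "filters": [
                {"propertyName": "associations.contact", "operator": "EQ", "value": contact_id},
                {"propertyName": "hs_is_closed", "operator": "EQ", "value": "false"},
            ]
        }],
        "properties": ["dealname", "dealstage", "hubspot_owner_id"],
        "limit": limit,
    }


body = open_deals_search_body("101")
print(body["filterGroups"][0]["filters"])
```

Requesting `hubspot_owner_id` along with the deal makes it possible to tell the SDR which colleague already owns the conversation.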

Who Needs a Custom Integration

A custom Apollo + HubSpot integration makes sense for companies:

  • With a HubSpot base of 5,000+ contacts where the duplicate problem is already noticeable
  • That use Apollo Intent Data and Apollo Score for prioritization - and want to see that data in HubSpot
  • Where SDRs work in Apollo, and AEs and CSMs work in HubSpot - requiring a unified history
  • Where compliance requirements (GDPR, SOC2) demand a full audit of all communications in one system

If you are interested in other HubSpot anti-patterns, read about HubSpot + Zendesk and HubSpot + Slack.

Frequently Asked Questions

Is the native Apollo integration a paid feature? The HubSpot integration in Apollo is available on Professional plans and above. On the Basic plan only manual CSV export and basic sync via Zapier are available.

Can you clean up existing duplicates before switching to a custom integration? Yes. HubSpot has a built-in Merge Contacts tool (Settings -> Data Management -> Duplicates). For bulk cleanup, use the HubSpot API: find duplicates with the script above, then call POST /crm/v3/objects/contacts/merge for each pair.
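
As a sketch of the bulk cleanup step, the function below turns the output of the duplicate finder into merge request bodies with `primaryObjectId` and `objectIdToMerge`. Keeping the oldest record as the survivor is an assumption made for this example, as is the `created` key holding ISO 8601 timestamps; `merge_requests` is a hypothetical helper and the sample IDs are made up.

```python
def merge_requests(duplicates: dict[str, list[dict]]) -> list[dict]:
    """Turn duplicate groups into merge bodies, keeping the oldest record."""
    bodies = []
    for records in duplicates.values():
        # Same-format ISO 8601 strings sort chronologically; undated records go last
        ordered = sorted(
            records,
            key=lambda r: (r.get("created") is None, r.get("created") or ""),
        )
        primary, *others = ordered
        for other in others:
            bodies.append({
                "primaryObjectId": primary["id"],
                "objectIdToMerge": other["id"],
            })
    return bodies


duplicates = {
    "john@acme.com": [
        {"id": "202", "created": "2024-03-01T10:00:00.000Z"},
        {"id": "101", "created": "2023-01-15T08:30:00.000Z"},
    ],
}
print(merge_requests(duplicates))
```

Merges cannot be undone, so it is worth reviewing the generated list before sending anything. For groups of three or more records, it is safer to take the surviving record's ID from each merge response than to assume the primary ID stays unchanged.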

Is the Apollo API paid? The Apollo REST API is available on paid plans (Basic from $49/month). On the free plan there is no API access. A custom integration requires at least the Apollo Basic plan.

How do you ensure GDPR compliance when syncing data? Apollo collects data from public sources. When syncing to HubSpot, ensure the contact has a lawful basis for data processing (legitimate interest for B2B prospecting in the EU is permissible but must be documented). Do not sync contacts without explicit opt-in in countries with strict regulations (Germany, Netherlands).

If you have problems with the native HubSpot + Apollo integration - describe the symptoms to the Exceltic.dev team. We will analyze the architecture and propose a solution without duplicates or data loss.
