
How to Integrate with the Greenhouse API: A Guide for B2B SaaS

A practical guide to integrating with the Greenhouse API — covering Harvest, Ingestion, and Onboarding APIs, auth quirks across v1/v2/v3, RFC-5988 pagination traps, rate limits, and how to ship faster.

Sidharth Verma · 13 min read

If you're asking "how do I integrate with the Greenhouse API?" the short answer is: you're actually signing up to integrate with multiple distinct APIs, write custom parsers for legacy pagination headers, manage several different authentication flows, and build strict rate limiters for aggressively short time windows.

Your product doesn't integrate with Greenhouse. That enterprise deal just stalled because the prospect's talent acquisition team won't adopt a tool that doesn't plug into their ATS. Your engineering lead says it's a two-week project. They're wrong — and if you've been through this before with HRIS integrations or CRM integrations, you already know why.

The initial HTTP request to fetch a candidate takes an afternoon. Managing OAuth token lifecycles, parsing non-standard pagination headers, respecting aggressive rate limits, handling a looming API version deprecation, and maintaining integrations when a vendor pushes a breaking change — that's where the real weeks go.

This guide breaks down what it actually takes to build a reliable Greenhouse integration: the API surfaces you need to understand, the auth headaches across different endpoints, the pagination traps, and how to ship without burning your team's quarter.

The Business Case for a Greenhouse API Integration

The ATS market is expanding rapidly. According to MarketsandMarkets, the global applicant tracking system market is projected to grow from $3.28 billion in 2025 to $4.88 billion by 2030, representing a CAGR of 8.2%. More companies buying ATS platforms means more prospects expecting your B2B SaaS product to work with whichever system they've chosen.

Greenhouse dominates structured hiring at scale. Companies like DoorDash, Wayfair, and thousands of mid-market tech companies rely on it as their source of truth for headcount and talent acquisition. If your product touches employee data, onboarding, interviewing, or compensation, lacking a Greenhouse integration is a direct blocker for revenue.

And here's the uncomfortable part: solving for Greenhouse alone is rarely enough. HR tech stacks are notoriously fragmented. Your sales pipeline likely contains prospects using Lever, Workable, BambooHR, and Ashby alongside Greenhouse. If you need to support multiple ATS vendors, the engineering cost doesn't scale linearly — it compounds. Each vendor has different auth flows, different pagination patterns, different field naming conventions. A Greenhouse-only integration is table stakes. A multi-ATS strategy is the actual unlock.

Understanding the Greenhouse API Landscape

The first technical hurdle is realizing that "the Greenhouse API" doesn't exist as a single surface. Greenhouse exposes at least five different APIs, each designed for a different use case with different authentication requirements:

| API | Purpose | Auth Method | Notes |
| --- | --- | --- | --- |
| Harvest API | Full data access (candidates, jobs, applications, offers) | Basic Auth (v1/v2), OAuth 2.0 (v3) | The primary API for most integrations |
| Candidate Ingestion API | Source candidates from external platforms | OAuth 2.0 or Basic Auth | For sourcing partners and job portals |
| Job Board API | Public job listings and application submission | Public (read), Basic Auth (write) | For building careers pages |
| Onboarding API | Employee data post-hire | Basic Auth | GraphQL-only, not REST |
| Audit Log API | Security and compliance event logs | API Key | Separate rate limit tiers |

That's five separate APIs, some with overlapping concerns (the Harvest and Job Board APIs both touch job data, for example). Figuring out which one you actually need is the first decision you'll make.

For most B2B SaaS integrations, the Harvest API is what you want. It covers HRIS syncs, candidate offer integrations, job approval workflows, and candidate and job updates. It's designed to export candidate and job data from Greenhouse Recruiting via GET endpoints, but it also includes POST, PUT, PATCH, and DELETE endpoints for modifying data.

The Candidate Ingestion API is intended for sourcing partners like external job portals and agencies. It provides limited access to jobs and candidates, along with the ability to post new candidates. If you're building a sourcing tool, this is your API — not Harvest.

The Job Board API is intended for building custom careers pages. It's publicly accessible without authentication, cached, and not rate limited.

And then there's the Onboarding API, used exclusively for managing the transition from "Candidate" to "Employee" — handling new hire data, onboarding plans, and employee syncs to HRIS platforms. Greenhouse Onboarding only supports GraphQL; there's no REST option. They made this decision because GraphQL allows clients to request only the data they need, reducing payload sizes and increasing throughput.

Warning

Deprecation Alert: Harvest API v1 and v2 will be deprecated and unavailable after August 31, 2026. If you're building a new integration today, go straight to Harvest v3. If you have an existing integration on v1/v2, start planning your migration now.

Warning

Architectural Trade-off: Don't attempt to use the Harvest API to build a sourcing tool. While it's technically possible to create candidates via Harvest, the Ingestion API is optimized for resume parsing and deduplication. Mixing the two APIs in a single application means managing two completely different authentication flows.

Authentication: Basic Auth vs. OAuth 2.0 (and Why v3 Changes Everything)

Authentication is where most integration timelines start to slip. Because Greenhouse splits functionality across different APIs — and is in the middle of a major version migration — you may need to support multiple auth strategies simultaneously.

Harvest v1/v2: HTTP Basic Auth

Harvest v1 and v2 use Basic Auth over HTTPS. The username is your Greenhouse API token, and the password should be blank. Simple in theory, but there's a quirk that catches people:

// The trailing colon is required — it represents the empty password
const apiToken = 'your_greenhouse_api_token';
const credentials = Buffer.from(`${apiToken}:`).toString('base64');
 
const response = await fetch('https://harvest.greenhouse.io/v1/candidates', {
  headers: {
    'Authorization': `Basic ${credentials}`,
    'On-Behalf-Of': '12345' // Required for write operations
  }
});

The On-Behalf-Of header is an easy one to miss. For auditing purposes, write operations (creating, updating, and deleting resources) require an On-Behalf-Of HTTP header containing the Greenhouse ID of the user performing the operation. Forget it and your POST requests silently fail or get rejected. This means you must first query the Users endpoint to map your application's users to their corresponding Greenhouse IDs — or create a dedicated integration service user (ISU) for your API requests.

A subtlety about access control worth knowing: users with Harvest API keys may access all the data in the endpoint. Access to data in Harvest is binary — everything or nothing. There's no row-level scoping. If you grant access to the candidates endpoint, the key sees every candidate.

Harvest v3: OAuth 2.0 with JWT

After v1 and v2 endpoints are deprecated, OAuth will be the only supported authentication method for the Harvest API. Transition early to avoid disruptions.

For custom integrations, v3 uses a client credentials flow:

  1. Create Harvest v3 (OAuth) credentials in the Greenhouse Dev Center
  2. Exchange your client_id and client_secret for a JWT access token via POST https://auth.greenhouse.io/token
  3. Use the token as a Bearer token for all /v3 requests
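Step 2 can be sketched as a small helper. The token endpoint comes from the list above; the response field names (`access_token`, `expires_in`) follow the standard OAuth 2.0 client-credentials response and should be verified against Greenhouse's actual payload:

```typescript
// Minimal sketch of the Harvest v3 client-credentials exchange.
// Response field names follow standard OAuth 2.0 and are assumptions
// to verify against the real Greenhouse response.
function buildTokenRequestBody(clientId: string, clientSecret: string): string {
  return new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: clientId,
    client_secret: clientSecret,
  }).toString();
}

async function fetchHarvestToken(clientId: string, clientSecret: string) {
  const response = await fetch('https://auth.greenhouse.io/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: buildTokenRequestBody(clientId, clientSecret),
  });
  if (!response.ok) throw new Error(`Token exchange failed: HTTP ${response.status}`);
  return response.json() as Promise<{ access_token: string; expires_in: number }>;
}
```

Cache the returned token and reuse it until shortly before `expires_in` elapses rather than exchanging credentials on every request.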

For partner integrations, Greenhouse requires the OAuth 2.0 Authorization Code flow. Access tokens expire after 1 hour. Use the refresh_token (which lasts 24 hours) to get a new access token before the current one expires. If the refresh token expires — meaning it wasn't used within 24 hours — the user must repeat the entire authorization flow from scratch.

That 24-hour refresh token expiry is brutal. If your system doesn't proactively refresh tokens, a weekend without API activity means your customers wake up Monday morning to a broken integration. This is exactly the kind of token lifecycle problem that's easy to underestimate and hard to get right at scale. Building a reliable token refresh architecture is often a project in itself.
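The fix is to refresh on a timer rather than on request traffic, so idle weekends can't kill the connection. A minimal sketch, assuming the one-hour access token and 24-hour refresh token described above (`refreshTokens()` is a placeholder for your own call to Greenhouse's token endpoint, not a real SDK function):

```typescript
// Refresh a safety margin before expiry; never schedule a negative delay.
function computeRefreshDelayMs(expiresInSeconds: number, safetyMarginSeconds = 300): number {
  return Math.max((expiresInSeconds - safetyMarginSeconds) * 1000, 0);
}

// Re-arms itself after every successful refresh, so tokens stay fresh
// even when no API traffic is flowing.
function scheduleRefresh(
  expiresInSeconds: number,
  refreshTokens: () => Promise<{ expires_in: number }>,
): ReturnType<typeof setTimeout> {
  return setTimeout(async () => {
    const next = await refreshTokens();
    scheduleRefresh(next.expires_in, refreshTokens);
  }, computeRefreshDelayMs(expiresInSeconds));
}
```

In production you'd also want to persist tokens and handle refresh failures with retries, since a single missed refresh invalidates the whole chain.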

Candidate Ingestion API: Both Methods

The Ingestion API offers two methods of authentication: OAuth 2.0 and Basic Auth. If the mutual customer's users have accounts in both Greenhouse and the partner's system, the preferred authentication method is OAuth 2.0, giving users a cleaner authorization experience without copying and pasting API keys.

Pagination: Parsing RFC-5988 Link Headers

If you've built integrations with APIs that use simple offset/limit query parameters or JSON-based meta objects with cursors, Greenhouse's pagination will feel foreign. Greenhouse uses RFC-5988 Link headers — the pagination metadata lives in the HTTP response header, not the response body.

API methods that return a collection of results are always paginated. Paginated results include a Link (RFC-5988) response header. A typical Link header looks like this:

Link: <https://harvest.greenhouse.io/v1/candidates?page=2&per_page=100>; rel="next",
      <https://harvest.greenhouse.io/v1/candidates?page=10&per_page=100>; rel="last"

To fetch the next page, your application must parse this header, split the string by commas, isolate the segment containing rel="next", and extract the actual URL. This requires writing custom utility functions just to read a list of candidates:

function getNextUrl(linkHeader: string | null): string | null {
  if (!linkHeader) return null;
  const match = linkHeader
    .split(',')
    .find(part => part.includes('rel="next"'));
  if (!match) return null;
  const urlMatch = match.match(/<([^>]+)>/);
  return urlMatch ? urlMatch[1] : null;
}
 
async function* paginateHarvest(initialUrl: string, headers: HeadersInit) {
  let url: string | null = initialUrl;
  while (url) {
    const response = await fetch(url, { headers });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    yield await response.json();
    url = getNextUrl(response.headers.get('Link'));
  }
}

The maximum allowed value for per_page is 500. Provide a value exceeding this limit and you'll receive a 422 Unprocessable Entity error.
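To stay under that cap, it's worth clamping the requested page size client-side — a trivial guard, but cheaper than round-tripping a 422:

```typescript
// Clamp per_page to Greenhouse's documented maximum of 500 (and a sane
// minimum of 1) before building the request URL.
function clampPerPage(requested: number, max = 500): number {
  return Math.min(Math.max(requested, 1), max);
}
```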

v3: Cursor-Based Pagination

Harvest v3 uses cursor-based pagination for list endpoints — a significant change from v1/v2. The cursor value is a URL-safe, Base64-encoded payload that contains the information needed to paginate through the records you initially requested. Treat the cursor as an opaque value: don't parse it, and don't try to construct it yourself. Always take it from the Link header.

There's a sharp edge here that will break your integration if you miss it: when you pass a cursor, it must be the only query parameter. Put filters and per_page on the first request only, then follow the Link header exactly for subsequent pages.

Harvest v3 currently returns only a next link — no prev or last. You can't calculate total result counts or jump to arbitrary pages. Understanding how unified APIs handle these pagination divergences across vendors is a major hurdle for in-house integration teams.
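Following v3 pagination safely, then, means taking the next-page URL verbatim from the Link header. A minimal sketch — the endpoint path is illustrative, and it assumes v3's Link header uses the same `rel="next"` convention shown earlier:

```typescript
// Extract the next-page URL from a Link header. Never rebuild this URL
// yourself: the opaque cursor must stay the only query parameter.
function nextFromLink(link: string | null): string | null {
  const m = link?.split(',').find(p => p.includes('rel="next"'))?.match(/<([^>]+)>/);
  return m ? m[1] : null;
}

async function* paginateV3(firstUrl: string, headers: Record<string, string>) {
  // Filters and per_page belong on firstUrl only.
  let url: string | null = firstUrl;
  while (url) {
    const response = await fetch(url, { headers });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    yield await response.json();
    url = nextFromLink(response.headers.get('Link'));
  }
}
```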

How to Handle Greenhouse API Rate Limits Without Getting 429'd

Greenhouse's rate limits are measured in unusually short windows, which makes them easy to accidentally blow through during bulk operations.

Harvest v1/v2: Requests are limited to the amount specified in the X-RateLimit-Limit header per 10-second rolling window. Unlisted vendors may be subject to additional rate limits. Exceeding the limit returns an HTTP 429 response.

Harvest v3: If you exceed the request limit within a 30-second window, the API responds with an HTTP 429 Too Many Requests status code.

The response headers tell you everything you need:

| Header | Purpose |
| --- | --- |
| X-RateLimit-Limit | Total requests allowed in the current window |
| X-RateLimit-Remaining | Requests left before you're throttled |
| X-RateLimit-Reset | Timestamp when the window resets |
| Retry-After | Seconds to wait (only on 429 responses) |

The retry flow looks like this:

sequenceDiagram
  participant YourApp
  participant Greenhouse
  YourApp->>Greenhouse: GET /v1/candidates?page=5
  Greenhouse-->>YourApp: 429 Too Many Requests <br> X-RateLimit-Remaining: 0 <br> Retry-After: 4
  Note over YourApp: Circuit Breaker: <br> Pause execution for 4 seconds
  YourApp->>Greenhouse: GET /v1/candidates?page=5 (Retry)
  Greenhouse-->>YourApp: 200 OK

Don't wait for a 429 error to react. Use the X-RateLimit-Remaining header to dynamically control your request rate. If the remaining count drops below a threshold (say, 10), slow down your requests or pause them until the window resets.
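A sketch of that proactive pattern, assuming X-RateLimit-Reset is a Unix timestamp in seconds (worth confirming against a live response) and using the threshold of 10 suggested above:

```typescript
// Compute how long to pause until the rate-limit window resets.
function msUntilReset(resetEpochSeconds: number, nowMs = Date.now()): number {
  return Math.max(resetEpochSeconds * 1000 - nowMs, 0);
}

// Wraps fetch: after each response, pause until the window resets
// whenever the remaining budget runs low.
async function throttledFetch(url: string, headers: Record<string, string>) {
  const response = await fetch(url, { headers });
  const remaining = Number(response.headers.get('X-RateLimit-Remaining'));
  const reset = Number(response.headers.get('X-RateLimit-Reset'));
  if (!Number.isNaN(remaining) && remaining < 10 && !Number.isNaN(reset)) {
    await new Promise(resolve => setTimeout(resolve, msUntilReset(reset)));
  }
  return response;
}
```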

Greenhouse's default of 50 requests per 10 seconds sounds generous until you're paginating through 10,000 candidates at 500 per page. That's 20 pages, consuming 20 of your 50 requests in a burst. Mix in scorecard fetches and attachment downloads and you'll be rate-limited before you've finished the initial sync.

Failing to respect the Retry-After header can result in temporary IP bans or integration suspension. Implementing exponential backoff with jitter is non-negotiable. For patterns on handling this across multiple providers, see our rate limits guide.
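A minimal delay calculator for that pattern — Retry-After always wins when present; the base (1s) and cap (32s) are illustrative choices, not Greenhouse-mandated values:

```typescript
// Exponential backoff with full jitter. When the server supplies
// Retry-After, honour it exactly instead of guessing.
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  random: () => number = Math.random,
): number {
  if (retryAfterSeconds !== undefined) return retryAfterSeconds * 1000;
  const capMs = 32_000;
  const expMs = Math.min(1000 * 2 ** attempt, capMs);
  return random() * expMs; // full jitter: uniform in [0, expMs)
}
```

The jitter matters: without it, every worker that hit the limit retries at the same instant and triggers the next 429 in lockstep.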

Common Integration Pitfalls: Custom Fields, Attachments, and the On-Behalf-Of Trap

Custom Fields Are Schema-on-Read

Enterprise Greenhouse instances are heavily customized. A standard candidate payload might contain 20 default fields and 50 custom fields specific to that company's hiring process. These custom fields are returned in a custom_fields array rather than as top-level properties, and they're extremely flexible.

Custom field value types include short_text, long_text, yes_no, single_select, multi_select, currency, currency_range, number, number_range, date, url, and user.

The keyed_custom_fields object is what you actually want to use. It contains the same information as custom_fields but includes the custom field's immutable field key, which won't change even if someone renames the custom field in the Greenhouse UI. If you key on the custom field name instead of the immutable key, a customer renaming "Salary Band" to "Compensation Range" will silently break your field mapping.
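A small accessor makes the right habit easy. The payload shape assumed here — `keyed_custom_fields` as an object keyed by the immutable field key, with `name`, `type`, and `value` per entry — follows the Harvest candidate schema, but verify the exact shape against a live response:

```typescript
interface KeyedCustomField {
  name: string;   // display name — can be renamed by the customer
  type: string;   // e.g. short_text, single_select, currency
  value: unknown;
}

// Read a custom field by its immutable key, never by display name,
// so customer renames in the Greenhouse UI can't break the mapping.
function getCustomFieldValue(
  candidate: { keyed_custom_fields?: Record<string, KeyedCustomField> },
  immutableKey: string,
): unknown {
  return candidate.keyed_custom_fields?.[immutableKey]?.value ?? null;
}
```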

You also can't hardcode your schema expectations. Your integration must dynamically query the Greenhouse Custom Fields endpoint to understand the available data types (strings, booleans, single-select arrays) before reading or writing data.

One more gotcha: custom fields on the application object are only available to customers with Enterprise-level Greenhouse accounts. If your customer is on a lower tier, those fields won't exist in the API response. Your code needs to handle this gracefully.

File Attachments: Base64 In, Signed URLs Out

Uploading attachments (resumes, cover letters) requires base64-encoded content within the JSON payload:

{
  "filename": "resume.pdf",
  "type": "resume",
  "content": "JVBERi0xLjQKJcOkw7zDtsO...",
  "content_type": "application/pdf"
}

Base64-encoded content must be UTF-8 encoded strings. If you're handling large PDF portfolios, loading the entire base64 string into memory can cause performance issues in Node.js environments. You should stream the file, encode it in chunks, and ensure the final payload respects Greenhouse's file size limits.

The gotcha is on the download side. Resumes, cover letters, and other document attachments in Greenhouse are provided via signed, temporary URLs. Due to the ephemeral nature of these resource links, you should download documents immediately after the request is made and not rely on these URLs being available for future requests. If you cache candidate response payloads and try to access attachment URLs hours later, they'll be expired.

The On-Behalf-Of Requirement

Harvest API requests require validation via an On-Behalf-Of header that includes the ID of an active Greenhouse Recruiting user who has permission to view the requested data. This means your integration needs to either look up users via the Users API endpoint to get their Greenhouse IDs, or create a new integration system user (ISU) specifically for your API requests. Skipping this step or hardcoding an invalid user ID will cause write operations to fail in ways that are frustratingly difficult to debug.

How to Ship a Greenhouse Integration in Days, Not Months

Everything above — the auth complexity across API versions, the RFC-5988 pagination parsing, the rate limit dance, the custom field mapping, the attachment encoding — is undifferentiated work. It doesn't make your product better. It just makes your integration exist.

This is where a Unified ATS API earns its keep.

Truto is a unified API platform that abstracts away the proprietary endpoints of different ATS providers — including Greenhouse, Lever, Workable, and Ashby — enabling developers to manage candidates, applications, jobs, scorecards, and offers through a single, standardized schema. Instead of writing Greenhouse-specific auth handling, pagination parsing, and field mapping yourself, you call a single endpoint:

curl https://api.truto.one/unified/ats/candidates \
  -H "Authorization: Bearer YOUR_TRUTO_TOKEN" \
  -H "X-Integrated-Account-Id: customer_greenhouse_account_id"

The same call works whether the customer is on Greenhouse, Lever, or any other supported ATS — without changing a line of code.

What Truto Handles So You Don't Have To

Truto's architecture is fundamentally different from building point-to-point connections. The platform contains zero integration-specific code — integration behavior is defined entirely as declarative configuration.

When you make a request to the Truto Unified ATS API:

  1. Auth lifecycle management: Truto automatically resolves the correct authentication method (Basic Auth or OAuth) for the specific connected account. For Greenhouse Harvest v3, it manages the entire OAuth 2.0 flow and proactively refreshes tokens before they expire. No more Monday-morning broken connections from expired refresh tokens.
  2. Pagination abstraction: RFC-5988 Link header parsing, cursor-based pagination in v3, opaque cursor handling — all transparent. You get a standard, predictable JSON pagination response regardless of the underlying provider.
  3. Rate limit handling: Truto monitors X-RateLimit-Remaining headers and automatically throttles requests, implementing exponential backoff with jitter when hitting 429 responses. The rate limit complexity is abstracted completely from your application logic.
  4. Schema normalization: Greenhouse's keyed_custom_fields, nested application.current_stage, and prospect vs. candidate distinctions get mapped into a consistent data model shared across all supported ATS vendors.

The Proxy API Escape Hatch

Unified APIs are powerful, but enterprise software is messy. Sometimes a customer has a highly specific, undocumented edge case that doesn't fit neatly into a unified model — like accessing Greenhouse's EEOC demographics data or custom field definitions.

Instead of blocking you, Truto provides a Proxy API. It gives you direct, unmapped access to Greenhouse's raw endpoints. You make a raw REST call to Greenhouse, and Truto still handles the authentication, token refreshes, and rate limits. You get the flexibility of a native integration with the infrastructure of a managed platform.

Info

The honest trade-off: A unified API adds a dependency and a layer of abstraction. You lose some fine-grained control over exactly how API calls are made. For teams building deep, single-vendor ATS integrations with heavy customization, going direct may make sense. For teams that need to support 3+ ATS vendors — which is most B2B SaaS companies selling into mid-market and enterprise — the engineering cost of building and maintaining each integration separately is almost never worth it.

Where to Go From Here

If you're starting a Greenhouse integration today, here's the decision tree:

  1. Single-vendor, Greenhouse only? Build directly against Harvest v3. Skip v1/v2 entirely. Budget 4–8 weeks for auth, pagination, rate limiting, custom fields, and error handling.
  2. Multi-vendor ATS support needed? Use a Unified ATS API. The schema normalization and auth management alone will save you months. You can always drop to the Proxy API for vendor-specific features.
  3. Already on Harvest v1/v2? Start your v3 migration now. The August 2026 deadline is closer than it looks, and the auth model is fundamentally different.

Whichever path you choose, don't let the initial simplicity of a GET /v1/candidates fool you. The real work is in the long tail: token refreshes at 3 AM, pagination edge cases with concurrent modifications, rate limit storms during bulk syncs, and custom field schemas that vary per customer. That's where integrations live or die.

FAQ

What authentication does the Greenhouse API use?
It depends on the API and version. Harvest v1/v2 uses HTTP Basic Auth with an API key as the username and a blank password. Harvest v3 uses OAuth 2.0 with JWT Bearer tokens. The Candidate Ingestion API supports both OAuth 2.0 and Basic Auth.
How does Greenhouse API pagination work?
Greenhouse uses RFC-5988 Link headers instead of JSON-based pagination. In v1/v2, you parse the Link header for page-based next/last URLs. In v3, pagination is cursor-based — the cursor must be the only query parameter on subsequent requests, and you should never construct cursor URLs manually.
What are the Greenhouse API rate limits?
Harvest v1/v2 allows a set number of requests per 10-second rolling window. Harvest v3 uses a 30-second fixed window. Both return HTTP 429 when exceeded. Monitor the X-RateLimit-Remaining header proactively and always respect the Retry-After header.
Is Greenhouse Harvest API v1 being deprecated?
Yes. Harvest API v1 and v2 will be deprecated and unavailable after August 31, 2026. All integrations must migrate to Harvest v3, which uses OAuth 2.0 instead of Basic Auth.
Can I use a unified API to integrate with Greenhouse?
Yes. Platforms like Truto provide a Unified ATS API that normalizes Greenhouse data into a standard schema alongside other ATS vendors like Lever, Workable, and Ashby, handling auth, pagination, and rate limits automatically.
