This article walks through building an automated sync between Webflow's CMS and Algolia's search service using Cloudflare Workers. We'll cover why this architecture makes sense, how the pieces fit together, and what I learned building this.
Side note: If you just need to get set up fast, there’s a quickstart guide inside the GitHub repo. Just clone it and follow the instructions!
Before diving into code, let's establish what you'll need to follow along. You should be comfortable with JavaScript basics, such as functions, promises, and array methods. You don't need to be an expert, but you should understand what async/await does and how to work with APIs.
You'll also need accounts with Webflow, Algolia, and Cloudflare. The free tiers work fine for testing.
If you're new to serverless functions, think of them as small pieces of code that run on a server when triggered by an event.
You don't manage servers or worry about scaling. You write a function (endpoint), deploy it, and it runs when needed.
Cloudflare Workers are serverless functions that run at the edge, meaning they execute close to your users on Cloudflare's network. This makes them fast and reliable. Workers can be triggered by HTTP requests, scheduled times (cron jobs), or other events.
In our case, we'll use both HTTP webhooks and scheduled cron triggers.
The key advantage of serverless for this sync task is that we only pay for what we use. The sync might run once a day or after every publish. Either way, we're not paying for a server sitting idle between syncs.
You might wonder why we need this complexity when Webflow has its own search.
Algolia is purpose-built for instant search. It returns results in milliseconds, handles typos intelligently, and offers features such as faceted search and analytics.
The architecture we're building creates a bridge between these two systems. Webflow remains your source of truth for content. Algolia becomes a synchronised copy optimised for search. Cloudflare Workers handle the synchronisation, running either on a schedule or when you publish changes.
This separation of concerns keeps each system doing what it does best. Webflow manages content. Algolia searches it. Cloudflare Workers keep them in sync.
The code is organised into modular helpers that each handle one responsibility. This makes the code easier to understand, test, and modify. Here's the structure:
src/
├── index.js
└── helpers/
    ├── webflow.js
    ├── algolia.js
    └── sync.js
The main index.js file contains two handlers. The scheduled handler runs on a cron schedule, while the fetch handler responds to HTTP requests from Webflow's webhook. Both handlers call the same performSync function, keeping the logic DRY.
Let's examine each piece to understand how they work together.
The Worker needs to respond to two types of triggers. First, a scheduled cron job runs daily to keep everything in sync. Second, a webhook fires when someone publishes in Webflow, triggering an immediate update.
export default {
  async scheduled(event, env, ctx) {
    try {
      const result = await performSync(env);

      return new Response(JSON.stringify(result), {
        headers: { 'Content-Type': 'application/json' }
      });
    } catch (error) {
      console.error('Error syncing Webflow to Algolia:', error);

      return new Response(JSON.stringify({
        success: false,
        error: error.message,
        timestamp: new Date().toISOString()
      }), {
        status: 500,
        headers: { 'Content-Type': 'application/json' }
      });
    }
  },

  async fetch(request, env, ctx) {
    const url = new URL(request.url);

    if (url.pathname === '/webhook' && request.method === 'POST') {
      const result = await performSync(env);
      // Return result...
    }

    return new Response('Webflow to Algolia sync worker...');
  }
};
Both handlers follow the same pattern. They call performSync, handle any errors, and return a JSON response.
The env parameter contains our environment variables, like API keys. Cloudflare injects these securely at runtime, so they're never exposed in your code.
The Webflow helper module handles all interactions with Webflow's API.
The main challenge here is pagination. Webflow returns items in batches of 100, so we need to fetch multiple pages for larger collections.
export async function fetchAllCollectionItems(webflow, collectionId) {
  let allItems = [];
  let offset = 0;
  const limit = 100;
  let hasMore = true;

  while (hasMore) {
    const response = await webflow.collections.items.listItemsLive(collectionId, {
      offset: offset,
      limit: limit
    });

    if (response.items && response.items.length > 0) {
      allItems = allItems.concat(response.items);
      offset += limit;

      if (response.items.length < limit) {
        hasMore = false;
      }
    } else {
      hasMore = false;
    }
  }

  return allItems;
}
Notice we're using listItemsLive instead of the regular list endpoint. This is important, but there's a catch I discovered during testing.
While listItemsLive correctly excludes draft items and items queued for publish, it still returns items that have been published to staging. This means that if you publish something to your staging domain for testing, it appears in Algolia immediately, even though it's not on your production site yet.
This was a surprise and changes how you need to think about your workflow.
The filtering logic deserves special attention:
export function filterPublishedItems(items) {
  return items.filter(item => {
    if (!item.lastPublished) return false;
    if (item.isDraft === true) return false;
    if (item.isArchived === true) return false;
    return true;
  });
}
This triple check excludes unpublished, draft, and archived content.
But here's what it doesn't catch: items published only to staging. Those items have a lastPublished timestamp, aren't drafts, and aren't archived. So they pass through this filter and get indexed.
This is by design in Webflow's API, but it might not be what you expect. The API treats staging-published items as "live" items, which is why the status field becomes important for controlling what actually gets searched.
Because Webflow's API sends staging content to Algolia, you need a way to control what actually gets indexed.
Maybe you're testing new content on staging, or you have seasonal items you want to hide temporarily. Without some form of control, your staging tests will appear in production search results.
The solution is a status field in your Webflow CMS. You add a switch field called something like "Include in Search" and configure the Worker to check it. This gives you explicit control over what gets synced, regardless of its publish status in Webflow.
If you want to live fast and loose, you can skip this field entirely. Just know that anything published to staging will appear in search results. For some teams, that's fine. For others, especially those with careful staging workflows, the status field is essential.
You'd add the field name to your .dev.vars file for local development.
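For example, a minimal .dev.vars sketch might look like this (the value shown is an assumption; it must match the slug of your switch field in Webflow):

```shell
# .dev.vars: local-only environment variables, kept out of version control
STATUS_FIELD="include-in-search"
```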
export function filterByStatusField(items, statusFieldName) {
  if (!statusFieldName || statusFieldName.trim() === '') {
    return items;
  }

  return items.filter(item => {
    const fieldData = item.fieldData;
    if (!fieldData) return true;

    const statusValue = fieldData[statusFieldName];
    if (statusValue === undefined || statusValue === null || statusValue === '') {
      return true;
    }

    if (statusValue === true) return true;
    if (statusValue === false) return false;
    return true;
  });
}

This approach maintains backwards compatibility. If you add the status field later, existing items without the field still get indexed.
Try playing with different field names in your environment variables. You could even have different status fields for different collections if you extend the code.
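To make the backwards-compatibility behaviour concrete, here's a small standalone sketch. The sample items and field name are hypothetical, and the function body is copied from above so the snippet runs on its own:

```javascript
// Inlined copy of filterByStatusField so this example is self-contained.
function filterByStatusField(items, statusFieldName) {
  if (!statusFieldName || statusFieldName.trim() === '') return items;
  return items.filter(item => {
    const fieldData = item.fieldData;
    if (!fieldData) return true;
    const statusValue = fieldData[statusFieldName];
    if (statusValue === undefined || statusValue === null || statusValue === '') return true;
    if (statusValue === true) return true;
    if (statusValue === false) return false;
    return true;
  });
}

const items = [
  { id: 'a', fieldData: { 'include-in-search': true } },  // explicitly included
  { id: 'b', fieldData: { 'include-in-search': false } }, // explicitly excluded
  { id: 'c', fieldData: {} }                              // field missing: still included
];

console.log(filterByStatusField(items, 'include-in-search').map(item => item.id));
// → [ 'a', 'c' ]
```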
Algolia needs clean, searchable text. But Webflow's rich text fields contain HTML, inline styles, and sometimes even script tags.
We need to strip all of this to create good search results.
function stripToPlainText(value) {
  if (typeof value !== 'string') return value;
  let text = value;
  text = text.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '');
  text = text.replace(/<style\b[^<]*(?:(?!<\/style>)<[^<]*)*<\/style>/gi, '');
  text = text.replace(/<[^>]*>/g, ' ');
  text = text.replace(/&nbsp;/g, ' ');
  text = text.replace(/&lt;/g, '<');
  text = text.replace(/&gt;/g, '>');
  text = text.replace(/&quot;/g, '"');
  text = text.replace(/&#39;/g, "'");
  text = text.replace(/&amp;/g, '&'); // decode &amp; last to avoid double-decoding
  text = text.replace(/\s+/g, ' ').trim();
  return text;
}

This cleaning process is essential for search quality. Without it, searches might match HTML tag names or CSS properties instead of actual content.
The transformation function then creates Algolia records:
export function transformItemsToAlgoliaRecords(items) {
  return items.map(item => {
    const record = {
      objectID: item.id,
      ...item.fieldData
    };

    Object.keys(record).forEach(key => {
      if (typeof record[key] === 'string') {
        record[key] = stripToPlainText(record[key]);
      }
    });

    if (item.createdOn) record.createdOn = item.createdOn;
    if (item.updatedOn) record.updatedOn = item.updatedOn;
    if (item.publishedOn) record.publishedOn = item.publishedOn;

    return record;
  });
}
The objectID field is crucial. It's Algolia's unique identifier for each record. By using Webflow's item ID, updates replace the correct records rather than creating duplicates.
The sync orchestrator brings everything together. It handles the complete flow from fetching collections to updating Algolia indices.
async function syncCollectionToAlgolia(webflow, algoliaClient, collection, statusFieldName) {
  const collectionId = collection.id;
  const collectionName = collection.slug || collection.displayName;
  const algoliaIndexName = collectionName;

  console.log(`Syncing collection: ${collection.displayName}`);

  const items = await fetchAllCollectionItems(webflow, collectionId);
  console.log(`  Fetched ${items.length} items from collection`);

  const publishedItems = filterPublishedItems(items);
  console.log(`  Filtered to ${publishedItems.length} published items`);

  const statusFilteredItems = filterByStatusField(publishedItems, statusFieldName);
  if (statusFieldName) {
    console.log(`  Filtered by status field: ${statusFilteredItems.length} items`);
  }

  const algoliaRecords = transformItemsToAlgoliaRecords(statusFilteredItems);
  await syncToAlgoliaIndex(algoliaClient, algoliaIndexName, algoliaRecords);

  return {
    collectionName: collection.displayName,
    indexName: algoliaIndexName,
    itemsSynced: algoliaRecords.length
  };
}

The logging here is intentional and essential. When something goes wrong in production, these logs help you understand precisely where the sync failed.
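The syncToAlgoliaIndex helper isn't shown in full in this article. As a hedged sketch, assuming the algoliasearch v4 JavaScript client, it might look like the following; replaceAllObjects atomically swaps the index contents, so items deleted in Webflow also drop out of search:

```javascript
// Hypothetical sketch of syncToAlgoliaIndex, assuming the algoliasearch v4 client.
// { safe: true } waits until the atomic swap has completed before resolving.
async function syncToAlgoliaIndex(algoliaClient, indexName, records) {
  const index = algoliaClient.initIndex(indexName);
  await index.replaceAllObjects(records, { safe: true });
  console.log(`  Synced ${records.length} records to index "${indexName}"`);
}
```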
The main sync function handles multiple collections:
export async function performSync(env) {
  const webflow = new WebflowClient({ accessToken: env.WEBFLOW_API_TOKEN });
  const algoliaClient = algoliasearch(env.ALGOLIA_APP_ID, env.ALGOLIA_ADMIN_KEY);

  const collectionsToSync = env.COLLECTIONS_TO_SYNC
    ? env.COLLECTIONS_TO_SYNC.split(',').map(name => name.trim())
    : [];

  const allCollections = await fetchAllCollections(webflow, env.WEBFLOW_SITE_ID);

  const collections = collectionsToSync.length > 0
    ? allCollections.filter(collection => {
        const slug = collection.slug || collection.displayName;
        return collectionsToSync.includes(slug);
      })
    : allCollections;

  const syncResults = [];
  for (const collection of collections) {
    try {
      const result = await syncCollectionToAlgolia(
        webflow,
        algoliaClient,
        collection,
        env.STATUS_FIELD
      );
      syncResults.push(result);
    } catch (error) {
      console.error(`Error syncing collection ${collection.displayName}:`, error);
      syncResults.push({
        collectionName: collection.displayName,
        error: error.message
      });
    }
  }

  return {
    success: true,
    collectionsProcessed: collections.length,
    totalItemsSynced: syncResults.reduce((sum, r) => sum + (r.itemsSynced || 0), 0),
    results: syncResults,
    timestamp: new Date().toISOString()
  };
}

Notice how errors for individual collections don't stop the entire sync. If one collection fails, others still process.
This resilience is essential for production systems. You don't want one problematic collection to break the search for your entire site.
The Worker uses environment variables for all sensitive configurations. This keeps credentials out of your code and makes it easy to have different settings for development and production.
The wrangler.toml file configures the Worker:
name = "webflow-algolia-sync"
main = "src/index.js"
compatibility_date = "2024-01-01"

[vars]
COLLECTIONS_TO_SYNC = "blog-posts,products"
STATUS_FIELD = "include-in-search"

[triggers]
crons = ["0 0 * * *"]

The cron schedule uses standard cron syntax. The five positions represent minute, hour, day of month, month, and day of week.
Common patterns you might want:

- "0 0 * * *" runs daily at midnight (UTC)
- "0 */6 * * *" runs every six hours
- "*/30 * * * *" runs every 30 minutes
Sensitive values like API keys are added as secrets:
wrangler secret put WEBFLOW_API_TOKEN
wrangler secret put WEBFLOW_SITE_ID
wrangler secret put ALGOLIA_APP_ID
wrangler secret put ALGOLIA_ADMIN_KEY

These secrets are encrypted and only decrypted when your Worker runs.
While the daily cron keeps everything eventually consistent, you probably want search to update immediately when you publish changes.
That's where Webflow's webhook comes in.
In your Webflow site settings, you add a webhook that fires on "Site Publish" events. The webhook sends a POST request to your Worker's /webhook endpoint. The Worker receives this notification and immediately runs the sync.
Here's the crucial thing I learned: Webflow's webhook fires for all publish events, including publishing to staging. Combined with the fact that listItemsLive returns staging content, this means your staging tests end up in production search immediately.
There's no way to distinguish between staging and production publishes in the API.
This is why the status field becomes so important. Without it, your workflow needs to account for the fact that staging equals searchable. Some teams adapt by only testing in draft mode, never publishing to staging. Others use the status field to maintain control.
Choose what works for your team, but be aware of this behavior.
Cloudflare's Wrangler tool makes local development straightforward.
# Start the local dev server
wrangler dev

# Test the cron trigger
wrangler dev --test-scheduled

# Test the webhook
curl -X POST http://localhost:8787/webhook
For development, I recommend using separate Algolia indices. You might prefix them with dev- or use a completely different Algolia app. This prevents test data from appearing in production search results.
Try modifying the sync logic to see what happens. Change the filtering rules. Add console logs to understand the data flow. The local environment is perfect for experimentation.
Here are the important lessons I've learned.
Both Webflow and Algolia have rate limits. Webflow allows 60 requests per minute for most plans. Algolia's limits are more generous but still exist.
The code doesn't currently implement retry logic, so hitting rate limits causes sync failures. For most sites syncing daily, this isn't an issue. But if you have many large collections, you might need to add delays between requests.
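If you do hit limits, a minimal mitigation is a sleep helper called between page fetches. The 1100 ms figure below is an assumption, sized to stay just under 60 requests per minute:

```javascript
// Hypothetical helper: pause between requests to stay under rate limits.
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Inside the pagination loop in fetchAllCollectionItems, you could then
// add after each page is fetched:
//   await sleep(1100); // roughly 60 requests per minute
```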
Collections with thousands of items take time to sync. Cloudflare Workers have a 30-second CPU time limit for cron triggers. If your sync takes longer, it will be terminated.
The solution is to either sync collections individually using separate Workers or implement incremental syncing that only updates changed items.
You'll want to monitor your syncs to know they're working. Cloudflare provides logs you can check, but I also recommend setting up email alerts for failures.
You can add a simple fetch call to send notifications when syncs fail. This gives you peace of mind that your search stays up to date.
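As a sketch of that idea, the Worker's catch blocks could post to an alerting webhook. ALERT_WEBHOOK_URL here is a hypothetical extra secret, not part of the setup described above:

```javascript
// Hypothetical failure alert: POST the error to a webhook (Slack, an email
// bridge, etc.). ALERT_WEBHOOK_URL is an assumed extra secret, added via
// wrangler secret put.
async function notifyFailure(env, error) {
  if (!env.ALERT_WEBHOOK_URL) return; // alerts are opt-in
  await fetch(env.ALERT_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `Webflow to Algolia sync failed: ${error.message}`,
      timestamp: new Date().toISOString()
    })
  });
}
```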
This foundation is intentionally modular and straightforward, making it easy to extend for specific needs.
You could add field mapping to rename Webflow fields for Algolia, or select only specific fields to sync. You could also implement incremental syncing that uses Webflow's lastPublished timestamps to update only changed items.
Multiple-environment support is another common extension that syncs to different Algolia indices for staging and production.
The modular structure means these enhancements slot in without rewriting the core logic. Each helper function has a single responsibility, so changes stay isolated.
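As one example, a hypothetical field-mapping step could sit between transformation and indexing. The mapping shape below is an assumption for illustration, not part of the repo:

```javascript
// Hypothetical extension: rename fields for Algolia and drop everything else.
// mapping = { algoliaFieldName: webflowFieldName }
function mapFields(records, mapping) {
  return records.map(record => {
    const mapped = { objectID: record.objectID }; // always keep the unique ID
    for (const [to, from] of Object.entries(mapping)) {
      if (record[from] !== undefined) mapped[to] = record[from];
    }
    return mapped;
  });
}

const mapped = mapFields(
  [{ objectID: '1', name: 'Hello', 'post-body': 'Body text', internalNotes: 'hide me' }],
  { title: 'name', content: 'post-body' }
);
console.log(mapped);
// → [ { objectID: '1', title: 'Hello', content: 'Body text' } ]
```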
Building this system taught me several valuable lessons about working with these APIs and platforms.
First, Webflow's API behaviour around staging content was the biggest surprise. The listItemsLive endpoint returns items published to staging, not just production. This means "live" in Webflow's API vocabulary means "not draft," not "on production."
Combined with webhooks firing for staging publishes, this creates a situation where staging tests immediately appear in production search. The status field isn't just a nice-to-have feature. For many workflows, it's essential to prevent staging content from appearing in search results.
Second, data cleaning matters more than you might expect. Algolia has size limits for records, and I managed to hit them in testing with long rich text articles. The HTML stripping logic might seem aggressive, but it's necessary for performance.
Third, the optional status field pattern allows you to add the feature to existing sites without breaking anything. Items without the field continue to sync normally.
Finally, logging is your friend in production. Detailed logs help you understand what happened without having to reproduce issues. The structured logging approach makes it easy to track each step of the sync process.
This Webflow to Algolia sync system solves a real problem for content-heavy sites. By leveraging Cloudflare Workers, we get a serverless solution that's both cost-effective and reliable.
The modular architecture makes the code maintainable and extensible.
The key to this solution is understanding that each service excels at different things. Webflow manages content with a great editor experience. Algolia provides lightning-fast search with powerful features. Cloudflare Workers glue them together with minimal overhead and maximum reliability.
Whether you're building this for your own site or client projects, this architecture provides a solid foundation. Start with the basic sync, then add features like status fields and webhook triggers as needed.
Monitor your syncs, handle edge cases as they arise, and gradually refine the system for your specific needs.
The complete code is available as a GitHub repository you can fork and customise. Feel free to adapt it for your projects and share what you learn along the way.