Background Functions v1 #400

ericallam · 2023-08-25T16:18:02Z

ericallam
Aug 25, 2023
Maintainer

Currently, Trigger.dev does not host and run user code. Instead, it coordinates runs through an API endpoint hosted on the user's machine or deployment. This setup works well for some use cases, especially ones that mix third-party services with local services and databases (think of the proverbial email drip campaign that checks a user's status in the db between emails). But it breaks down when a single task takes longer than the platform's function execution timeout: the task never finishes, so we can never complete the job. Here are some common function timeouts on various serverless platforms:

Vercel (Hobby Plan): 10 seconds ref
Vercel (Team Plan): 60 seconds ref
Vercel (Enterprise Plan): 900 seconds ref
AWS Lambda: 900 seconds ref
Cloudflare Workers: No duration limit, but in-flight requests get only 30 seconds to finish when the runtime is updated, which happens several times per week ref
Deno Cloud: 10s for free and 50s for paid ref

We've gotten feedback from quite a few people who have individual tasks that take longer than some of these function timeouts would allow, so they've wanted the ability to reliably run longer tasks.

Because this is not possible with our current approach, we are proposing introducing the idea of Background Tasks that would run on external infrastructure and be orchestrated by Trigger.dev.

Use cases

Generating a transcript for a 1-2 hour podcast using something like Deepgram
Scraping webpages with Puppeteer
Bulk database updates or backups
Running complex Langchain scripts

Developer Experience

The aspirational DX for this feature should be "as easy as deploying to Vercel", all while writing code in the same repo/codebase as your existing project. It should be fully integrated into Trigger.dev and the Trigger.dev dashboard, allowing task observability and management. It should also ideally work with integrations.

To create a background task, you would create a file named with the pattern <anything here>.background.ts. Inside this file you would export 1 or more tasks:

// tasks.background.ts
import { z } from "zod";
import { client } from "./trigger";

export const task1 = client.defineBackgroundTask({
  id: "task-1",
  name: "Task 1",
  version: "1.0.0",
  schema: z.object({
    userName: z.string(),
  }),
  run: async (payload) => {
    // This code will run in the background, as it will be bundled and shipped to a trigger.dev background worker
    await new Promise((resolve) => setTimeout(resolve, 100000));

    return `Task Response for user ${payload.userName}`;
  },
});

And you are able to use and invoke tasks inside of your existing Trigger.dev jobs:

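// foobar.ts
import { eventTrigger } from "@trigger.dev/sdk";
import { client } from "./trigger";
import { db } from "./db";
import { task1 } from "./tasks.background";

client.defineJob({
  id: "foobar",
  name: "Foobar Job",
  version: "0.0.1",
  trigger: eventTrigger({
    name: "foo.bar",
  }),
  // Register the background tasks this job is allowed to invoke
  background: {
    task1,
  },
  run: async (payload, io, ctx) => {
    const user = await db.user.findFirst({
      where: {
        id: payload.userId,
      },
    });

    // The first argument is the task key, the second is the schema-typed payload
    const output = await io.background.task1("🤩", { userName: user.name });
  },
});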

The data needed inside the background task is defined in the schema and provided when invoking the background task (all with end-to-end typesafety.)
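One way this kind of end-to-end type safety could work is to infer the payload type from the schema and reuse it for both run and the invoking call. The sketch below is only an illustration under stated assumptions: defineTask, Schema, TaskDefinition, and userSchema are hypothetical names rather than part of the Trigger.dev SDK, and a hand-written parser stands in for zod so the example is self-contained.

```typescript
// Hypothetical sketch: none of these names are part of the Trigger.dev SDK.

// A schema is anything that can turn unknown input into a typed value.
interface Schema<T> {
  parse(input: unknown): T;
}

interface TaskDefinition<TPayload, TOutput> {
  id: string;
  schema: Schema<TPayload>;
  run: (payload: TPayload) => TOutput;
}

// The payload type flows from the schema into both `run` and `invoke`.
function defineTask<TPayload, TOutput>(def: TaskDefinition<TPayload, TOutput>) {
  return {
    id: def.id,
    // Validate at the boundary, then run with a fully typed payload.
    invoke: (payload: TPayload): TOutput => def.run(def.schema.parse(payload)),
  };
}

// Stand-in for z.object({ userName: z.string() }).
const userSchema: Schema<{ userName: string }> = {
  parse(input) {
    const value = input as { userName?: unknown };
    if (typeof value?.userName !== "string") {
      throw new Error("userName must be a string");
    }
    return { userName: value.userName };
  },
};

const greet = defineTask({
  id: "task-1",
  schema: userSchema,
  run: (payload) => `Task Response for user ${payload.userName}`,
});

// Passing { userId: "..." } here would be a compile-time error.
const greeting = greet.invoke({ userName: "ericallam" });
```

Because the payload type is derived from the schema, a mismatch between what a job passes and what the task expects would be caught by the compiler, and the runtime parse catches anything that slips past the types.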

Limitations and workarounds

Because background tasks would run separately from the user's local or deployed app, they wouldn't be able to access anything outside of the scope of the run function. For example, this wouldn't be possible:

// tasks.background.ts
import { z } from "zod";
import { client } from "./trigger";
import { db } from "./db";

export const task1 = client.defineBackgroundTask({
  id: "task-1",
  name: "Task 1",
  version: "1.0.0",
  schema: z.object({
    userId: z.string(),
  }),
  run: async (payload) => {
    // This wouldn't work: db is created outside of the run function
    await db.users.findById(payload.userId);
  },
});

I propose we add the ability to configure background tasks with secrets to allow for this sort of use-case:

// tasks.background.ts
import { z } from "zod";
import { client } from "./trigger";
import { PrismaClient } from "./db";

export const task1 = client.defineBackgroundTask({
  id: "task-1",
  name: "Task 1",
  version: "1.0.0",
  schema: z.object({
    userId: z.string(),
  }),
  secrets: {
    databaseUrl: process.env.DATABASE_URL,
  },
  run: async (payload, ctx) => {
    // Create the client inside run, using the secret supplied to the worker
    // (assumes the Prisma datasource is named "db")
    const db = new PrismaClient({
      datasources: { db: { url: ctx.secrets.databaseUrl } },
    });

    // This would work
    await db.user.findUnique({ where: { id: payload.userId } });
  },
});

Development

During development, background tasks would be bundled and run locally when running the @trigger.dev/cli dev command. And invoking background tasks during development would call these locally running tasks.

Deployment && Self-hostability

This is a tricky one: it involves running untrusted code in a way that scales and doesn't cause "noisy neighbor" problems, while remaining self-hostable without requiring a very complicated (or expensive) production setup. Here are some considered options:

Firecracker

Firecracker provides secure and fast microVMs for serverless computing. It was developed by AWS and powers AWS Lambda. This would allow highly scalable, fast, multi-tenant execution of user code. Unfortunately, it's extremely expensive to host as it requires bare metal machines (e.g. $2k a month on AWS). It's also very complicated to configure and run securely.

WebAssembly

I wasn't able to find any WebAssembly runtimes that ran Node.js code without any hacks, but possibly more research into this is needed. There are a ton of WebAssembly runtimes and none of them seemed to do what we need.

AWS Lambda

Why not just ship user code to a lambda function and call it a day? For one, lambdas can run for a maximum of 15 minutes. Vercel Enterprise users can already get 15 minute function execution. And it would introduce a closed source element to the Trigger.dev project, for not that much benefit.

Fly.io <- what we're currently thinking

There are a lot of positives for choosing Fly.io. They have some very nice Machine APIs for this exact use case that would allow us to bundle, deploy, and run background task code all through the Fly.io APIs. We would use Fly.io to independently scale and run machines for specific background tasks, based on their usage. We'd get code isolation for free. It would also allow us to ship this feature (relatively) quickly, to gauge interest and work out bugs and usability issues.

The downsides of course are that Fly.io is not open source, and isn't itself self-hostable. But if you were self-hosting Trigger.dev, you could make use of background tasks powered by Fly.io by setting a few environment variables (FLY_API_TOKEN and FLY_API_ORGANIZATION_ID), so it'd be more likely for self-hosters to be able to support Background Tasks. We even already have some developers using Fly.io to self-host Trigger.dev.
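As a rough illustration of how a self-hosted instance might decide whether Fly.io-backed background tasks are available, the sketch below reads the two variables named above. Only the variable names come from the proposal; readFlyConfig and FlyConfig are hypothetical, and treating a missing variable as "feature disabled" is an assumption.

```typescript
// Hypothetical sketch: only the two variable names come from the proposal.
interface FlyConfig {
  apiToken: string;
  organizationId: string;
}

// Returns the Fly.io settings when both variables are set, otherwise undefined,
// which a self-hosted instance could treat as "background tasks disabled".
function readFlyConfig(
  env: Record<string, string | undefined>
): FlyConfig | undefined {
  const apiToken = env.FLY_API_TOKEN;
  const organizationId = env.FLY_API_ORGANIZATION_ID;
  if (!apiToken || !organizationId) {
    return undefined;
  }
  return { apiToken, organizationId };
}

// In a real server this would be called with process.env.
const enabled = readFlyConfig({
  FLY_API_TOKEN: "token",
  FLY_API_ORGANIZATION_ID: "org",
});
const disabled = readFlyConfig({ FLY_API_TOKEN: "token" });
```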

Architecture

TBD

Feedback

Please let us know if you have any thoughts on the above feature idea, we'd love to hear from you 👋.

ericallam · 2023-08-26T20:43:23Z

ericallam
Aug 26, 2023
Maintainer Author

Other considerations that will need some more in-depth exploration:

Multi-language support (e.g. Python), sort of like how Vercel supports a Python runtime for serverless functions. This could be very useful for ML tasks.
Integration support. It might be nice/required for these background tasks to support integrations, similar to how jobs do. This would allow background tasks to have access to hosted credentials (e.g. OAuth access tokens) and eventually allow them to be run as a user (via Trigger.dev Connect)

0 replies

matt-aitken · 2023-08-27T15:43:23Z

matt-aitken
Aug 27, 2023
Maintainer

The code below, which you suggested wouldn't work, should actually work, assuming we add an extra step to deal with secrets automatically.

// tasks.background.ts
import { client } from "./trigger";
import { db } from "./db";

export const task1 = client.defineBackgroundTask({
  id: "task-1",
  name: "Task 1",
  version: "1.0.0",
  schema: z.object({
    userId: z.string(),
  }),
  run: async (payload) => {
    // This wouldn't work
    await db.users.findById(payload.userId);
  },
});

When this code is bundled (using tsup or similar) it will automatically bundle the db file, which will bundle Prisma, etc. So the code will be available.

This leaves the challenge of secrets. Rather than explicitly having to define them, I think it would be better if we automatically find them all. This can be achieved (for TS/JS) by searching the bundled code for references to process.env.

Locally we can automatically set the env var values by using the values from the .env file. For deployment, we’d have the list of env vars which we’d display in the UI so values can be set – we'd have to allow them to be unset so we can support optionals.
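To make the scanning idea concrete, here is a minimal sketch of searching bundled output for environment variable names. It is an assumption about how this could be done, not an existing Trigger.dev feature: findEnvVars is a hypothetical helper, and a regular expression is used here for brevity where a real implementation might prefer walking the AST.

```typescript
// Hypothetical sketch of scanning bundled output for environment variable names.
function findEnvVars(bundledCode: string): string[] {
  // Matches process.env.NAME as well as process.env["NAME"] and process.env['NAME'].
  const pattern =
    /process\.env(?:\.([A-Za-z_$][\w$]*)|\[\s*["']([^"']+)["']\s*\])/g;
  const names = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(bundledCode)) !== null) {
    // Group 1 is the dot form, group 2 is the bracket form.
    names.add(match[1] || match[2]);
  }
  // De-duplicated and sorted so the list is stable when shown in a UI.
  return Array.from(names).sort();
}

const bundle = `
  const db = connect(process.env.DATABASE_URL);
  const key = process.env["TRIGGER_API_KEY"];
  const again = process.env.DATABASE_URL;
`;
const found = findEnvVars(bundle);
```

A text search like this cannot see dynamic access such as process.env[name] or destructuring from process.env, so the UI would probably still need a way to add variables by hand.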

1 reply

ericallam Aug 27, 2023
Maintainer Author

Very interesting idea, we'll definitely have to explore if this is possible which it does sound like it is.

matt-aitken · 2023-08-27T17:57:24Z

matt-aitken
Aug 27, 2023
Maintainer

Some more thoughts/questions:

Would the run function get replayed, like it does with the current Jobs? It feels conceptually hard for people to understand, so it would be good if we could avoid this I think.
Would retrying work by replaying the function? Or, given there's no timeout, could retries actually be implemented normally?
What about delays? It seems crazy to have a machine be executing for all that time doing nothing. Obviously replaying can solve this like we do currently.

1 reply

ericallam Aug 27, 2023
Maintainer Author

I think the only way we should make a background task run function work similarly to a Job is if we support subtasks in background tasks. At this point I think we should think of these as "leaf tasks", so no child tasks and no resumability. Retrying would work similarly to retrying a single task. There is no "partially" successful state for background tasks. There would be no delays outside of await new Promise((resolve) => setTimeout(resolve, 10000)). Background tasks would have a specific maximum run time (e.g. 60 minutes).

ericallam · 2023-09-14T13:32:31Z

ericallam
Sep 14, 2023
Maintainer Author

Background Task Function Alpha Proposal

To review, our goals with the first release of this feature are:

Get something shipped quickly to evaluate use-cases and iterate on fixes & improvements
Be usable by self-hosters without having to run a large infrastructure project
Provide a really good DX that fits in with the existing Trigger.dev

Our initial idea was to build a multi-tenant offering powered by Fly.io, with Trigger.dev managing the building and deploying of docker images for Background ~~Task~~ Functions*. But after some initial development we've run into a few realities:

Building a multi-tenant offering on top of Fly.io is still a lot of work
Self-hosters don't really need multi-tenant support, now and into the future

Alpha proposal

We're proposing a new Alpha version of this feature:

An npx @trigger.dev/cli build command that would build and publish a hostable docker image, called a Background Function Worker, for each background function
This docker image will then be run by users on whatever platform they wish (more on platforms below)
Trigger.dev would not orchestrate the running of this image, it would be up to the user
Docker image builds would happen in Depot.dev and would be handled by Trigger.dev
- This allows reliable docker images to be produced without platform issues (i.e. building on arm64 macs)
- Self-hosters using this feature would need to sign up for Depot.dev and add API keys to the env vars
On the Trigger.dev cloud, function images would be hosted at a Docker Registry at registry.trigger.dev
- For deploying to platforms that don't support registry.trigger.dev, images can be pulled and re-pushed to Docker Hub, etc.
For self-hosters, they could put in their own Docker Registry url and credentials using environment variables

Platform support

Render.com - supports deploying prebuilt images from Docker Hub and GitHub/Gitlab Container Registries (https://render.com/docs/deploy-an-image)
Fly.io - supports deploying from Docker hub and registry.fly.io (https://community.fly.io/t/deploy-locally-built-image/2951)
Railway.app - Does NOT support deploying pre-built images
AWS ECS - supports deploying docker images from pretty much any repository

DX

This is how it would all work in practice:

First, you would define a background function in your code as a default export in a file with the .background.ts extension:

// src/functions/function-1.background.ts
import { client } from "@/trigger";
import { z } from "zod";

export default client.defineBackgroundFunction({
  id: "function-1",
  name: "Function 1",
  version: "1.0.2",
  schema: z.object({
    userName: z.string(),
  }),
  run: async (payload) => {
    // This code will run in the background, as it will be bundled and shipped to a trigger.dev background worker
    await new Promise((resolve) => setTimeout(resolve, 100000));

    return {
      username: payload.userName,
      foo: "bar",
      message: `Task Response for user ${payload.userName}`,
    };
  },
});

Then in a Job.run function, you would invoke the function like so:

// src/jobs/examples.ts
import { eventTrigger } from "@trigger.dev/sdk";
import { client } from "@/trigger";
import function1 from "@/functions/function-1.background";

client.defineJob({
  id: "example-job",
  name: "Background Function Usage",
  version: "0.0.1",
  trigger: eventTrigger({
    name: "example.event",
  }),
  run: async (payload, io, ctx) => {
    const output = await function1.invoke("task-1", {
      userName: "ericallam",
    });

    return { output };
  },
});

When the example-job Job runs, a task will be created and shown in the dashboard.

When deploying to production, you would use the build command to build and push Background Function Worker docker images for each background function:

$ npx @trigger.dev/cli functions build
Built function-1@1.0.2: registry.trigger.dev/clmgatoar0003dyc44r2kvb0h-function-1:v1.0.2@sha256:91f60ea55704db7162a14ac18387dd31e77963b752a40dd2dfdd058541c25ce4
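The image reference printed above appears to follow a registry/prefix-functionId:vVersion@sha256:digest pattern. The sketch below reconstructs that shape purely from the example output, so treat it as a guess: formatImageRef and ImageRefParts are hypothetical names, and the meaning of the opaque id prefix (project, environment, or something else) is not stated in the proposal.

```typescript
// Hypothetical sketch: the format is inferred from the example output only.
interface ImageRefParts {
  registry: string; // e.g. registry.trigger.dev, or a self-hoster's own registry
  ownerId: string; // opaque id prefix seen in the example; its meaning is assumed
  functionId: string;
  version: string;
  digest?: string; // sha256 digest, when the image has already been pushed
}

function formatImageRef(parts: ImageRefParts): string {
  const tagged = `${parts.registry}/${parts.ownerId}-${parts.functionId}:v${parts.version}`;
  // Pinning by digest makes the reference immutable even if the tag is re-pushed.
  return parts.digest ? `${tagged}@sha256:${parts.digest}` : tagged;
}

const imageRef = formatImageRef({
  registry: "registry.trigger.dev",
  ownerId: "clmgatoar0003dyc44r2kvb0h",
  functionId: "function-1",
  version: "1.0.2",
});
```

Keeping the registry as a parameter matches the idea that self-hosters could supply their own Docker Registry url instead of registry.trigger.dev.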

Environment variables

Because it's not secure to store sensitive data in docker images, users will need to supply any necessary environment variables when they deploy these Background Function Worker images. The only Trigger.dev related environment variable they'll need to include is TRIGGER_API_KEY.

Development

When running locally during development, there is a spectrum of the experience we could provide:

Functions would get immediately invoked and run, instead of queued to be run by a Background Function Worker.
The @trigger.dev/cli dev command could automatically build Background Function Workers and run them on the user's machine via the Docker daemon
The @trigger.dev/cli dev command could automatically build a non-docker version of a Background Function Worker and run it on the user's machine via Node.js child processes and a custom runtime just for development (one that would closely mimic the production runtime embedded in Background Function Worker images)

* We're going to rename this feature from Background Tasks to Background Functions, so we don't overuse the "task" concept, which already exists in Job runs.

0 replies

ericallam · 2023-09-15T10:36:05Z

ericallam
Sep 15, 2023
Maintainer Author

Background Function Library proposal

Before we expose the ability for end-users to develop their own custom Background Functions, we're going to implement a common library of internal background functions that can be used by users without any additional work.

We currently have ad-hoc support for Background Fetch, which performs a fetch on the Trigger.dev platform instead of inside the developer's app. Once the fetch completes (either successfully or after retrying), the "task" that initiated the background fetch is completed and the run is resumed.

The Background Function Library will move this ad-hoc "Background Fetch" into just 1 of many different background functions that can be referenced and called within Job runs and within integrations. In addition to Background Fetch, we could offer the following functions:

Bulk Image processing
Video/Audio encoding or transcribing
Puppeteer tasks (i.e. taking screenshots)
ML tasks

The Background Function Library would be implemented in a way that would prepare for opening up the ability for users to define their own background functions (see my previous comment for more on that).

This would allow us to work out bugs and the experience of background functions before we open it up to end users. It would also give us the ability to satisfy more users who want to perform certain long-running tasks quicker.

Considerations for self-hosters

This will add an additional docker image for self-hosters to run, called trigger.dev-core-functions, that would include the runtime for the entire Background Function Library. We may also produce a separate docker image for each core function, allowing independent scaling and resources depending on the function (e.g. the ffmpeg function will have different resource requirements and dependencies than the background fetch function).

We considered adding the runtime for the Background Function Library to the main trigger.dev docker image instead of creating another image. We didn't think there would be much benefit: it would overload the image, and multi-process docker images are tricky to run and could lead to issues. But this is still something we haven't 100% decided on.

DX

Currently, using background fetch looks like this:

import { eventTrigger } from "@trigger.dev/sdk";
import { client } from "@/trigger";

client.defineJob({
  id: "function-usage-1",
  name: "Background Function Usage",
  version: "0.0.1",
  trigger: eventTrigger({
    name: "example.event",
  }),
  run: async (payload, io, ctx) => {
    const output = await io.backgroundFetch("fetch-1", "https://example.api", {
      method: "POST",
      body: JSON.stringify(payload),
    });

    return { output };
  },
});

In the new version background fetch would be imported and invoked from the @trigger.dev/functions package:

import { eventTrigger } from "@trigger.dev/sdk";
import { client } from "@/trigger";
import { fetchFunction } from "@trigger.dev/functions";

client.defineJob({
  id: "function-usage-1",
  name: "Background Function Usage",
  version: "0.0.1",
  trigger: eventTrigger({
    name: "example.event",
  }),
  run: async (payload, io, ctx) => {
    const output = await fetchFunction.invoke("fetch-1", "https://example.api", {
      method: "POST",
      body: JSON.stringify(payload),
    });

    return { output };
  },
});

Code Structure

The following packages/apps would be added to the monorepo:

@trigger.dev/functions: the library of common background functions
@trigger.dev/functions-worker: orchestrates the running of background function tasks and reporting status/logs/metrics to the Trigger.dev platform
apps/core-functions: a new app/docker image that would use @trigger.dev/functions and @trigger.dev/functions-worker to run the background function tasks

0 replies

visionarylab · 2024-01-01T08:32:18Z

visionarylab
Jan 1, 2024

This feature is what makes Pipedream special. Does anyone know of a similar product in Zapier's space that has this feature?

Their Node, Python, Bash and Go runtimes are extremely practical and amazing.

0 replies

matt-aitken · 2024-01-17T14:19:21Z

matt-aitken
Jan 17, 2024
Maintainer

I'm going to close this discussion as it's superseded by what we're going to be calling v3.

0 replies

Charlotte-br560 · 2024-03-21T11:36:36Z

Charlotte-br560
Mar 21, 2024

Exciting proposal! Have you considered leveraging Crawlbase for web scraping tasks within your Background Functions? It could seamlessly integrate with your existing setup, offering reliable and scalable web scraping capabilities. Plus, it aligns with your aspiration of an easy-to-use solution while enhancing task diversity. Just a thought!

0 replies
