A Node.js-based agent that generates high-quality images and short video clips based on various prompts and modes (text2image, image2image, text2video). Integrated with the Nevermined Payments API for seamless task management, event handling, and billing. Supports both “dummy mode” for testing and real-world generation via APIs like Fal.ai (image generation) and PiAPI (video generation).
This Video Generator Agent is part of a larger ecosystem of AI-driven media creation. For a complete view of how multiple agents work together, see:
- Music Video Orchestrator Agent
  - Coordinates end-to-end workflows: collects user prompts, splits them into tasks, pays agents in multiple tokens, and merges the final output.
- Produces lyrics, titles, and final audio tracks using LangChain + OpenAI and a chosen music generation API.
- Generates cinematic scripts, extracts scene info, and identifies settings and characters, producing prompts for video generation.
Workflow Example:
[ User Prompt ] --> [Music Orchestrator] --> [Song Generation] --> [Script Generation] --> [Image/Video Generation] --> [Final Compilation]
- Features
- Prerequisites
- Installation
- Environment Variables
- Project Structure
- Architecture & Workflow
- Usage
- Detailed Guide: Creating & Managing Tasks
- Configuration Files
- License
- Nevermined Integration: Subscribes to step-updated events for the agent's DID, handles steps automatically, and updates their status (Pending → Completed or Failed).
- Multiple Generation Modes:
  - text2image: Generate images from a pure text prompt.
  - image2image: Transform or enhance an existing image based on a text prompt.
  - text2video: Generate short video clips (5 or 10 seconds) from text prompts, with optional reference images.
- Configurable:
  - Dummy Mode: Mocks generation with random wait times and error probabilities.
  - Production Mode: Uses Fal.ai or PiAPI to create real images and videos.
- Step-Based Billing: Each generation mode deducts a certain "cost" from the plan (e.g., 1 credit for images, 5 for videos).
- Structured Logging: Uses pino for local logs and updates steps in Nevermined with error or success messages.
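The step-based billing described above can be pictured as a small lookup from generation mode to credits. This is a minimal sketch that assumes the example costs quoted above (1 credit for images, 5 for videos); the names InferenceType, COST_BY_MODE, and costFor are hypothetical and not taken from the agent's code.

```typescript
// Hypothetical names for illustration; the real agent may define costs elsewhere.
type InferenceType = "text2image" | "image2image" | "text2video";

// Example credit costs per mode, using the figures quoted in the feature list.
const COST_BY_MODE: Record<InferenceType, number> = {
  text2image: 1,
  image2image: 1,
  text2video: 5,
};

// Returns the number of credits to deduct for a given generation mode.
function costFor(mode: InferenceType): number {
  return COST_BY_MODE[mode];
}
```

Keeping the costs in one typed table means that adding a new mode without a cost is caught at compile time.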
- Node.js (>= 18.0.0 recommended)
- NPM (or Yarn) for package management
- Nevermined credentials (API key, environment, AGENT_DID)
- For real image/video generation:
  - Fal.ai credentials (FAL_KEY)
  - PiAPI credentials (PIAPI_KEY)
- Clone the repository:
  git clone https://github.com/nevermined-io/video-generator-agent.git
  cd video-generator-agent
- Install dependencies:
  npm install
- Optional: Build for production:
  npm run build
Rename .env.example to .env and configure it:
NVM_API_KEY=your_nevermined_api_key
NVM_ENVIRONMENT=testing
AGENT_DID=did:nv:your_agent_did
PIAPI_KEY=your_piapi_key
FAL_KEY=your_fal_key
IS_DUMMY=false
- NVM_API_KEY / NVM_ENVIRONMENT: Connect to Nevermined.
- AGENT_DID: This agent's DID.
- PIAPI_KEY: PiAPI token for text2video (if not in dummy mode).
- FAL_KEY: Fal.ai token for text2image and image2image (if not in dummy mode).
- IS_DUMMY: Set to true for dummy mode (returns static URLs after a random delay).
video-generator-agent/
├── main.ts # Main entry: subscribes to steps & routes generation tasks
├── tools.ts # Handlers for each generation mode (text2image, image2image, text2video)
├── videoGeneration.ts # PiAPI-based text2video logic (Kling.ai wrapper API)
├── imageGeneration.ts # Fal.ai-based text2image & image2image logic (Flux wrapper API)
├── logger/
│ └── logger.ts # Logging via pino
├── config/
│ └── env.ts # Environment variables
├── package.json
├── tsconfig.json
└── README.md
- main.ts:
  - Initializes the Nevermined Payments instance.
  - Subscribes to step-updated events targeting AGENT_DID.
  - Based on inference_type in step.input_params, calls the right handler (text2image, image2image, or text2video).
- tools.ts:
  - Defines handler functions (handleText2image, handleImage2image, handleText2video) that call either real generation functions or dummy logic, depending on IS_DUMMY.
- videoGeneration.ts:
  - Implements real text2video generation using PiAPI: creating a task, polling its status, and retrieving the final video URL.
- imageGeneration.ts:
  - Implements real text2image / image2image calls using Fal.ai.
- logger/logger.ts:
  - pino logger with a "pretty" transport for human-readable logs.
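The create-then-poll flow described for videoGeneration.ts can be sketched generically. This is a minimal sketch under stated assumptions, not PiAPI's actual client: TaskStatus, pollUntilDone, and the fetchStatus callback are hypothetical, and the real status values, fields, and endpoints should be taken from the PiAPI documentation.

```typescript
// Hypothetical shape of a task status; PiAPI's real response fields may differ.
interface TaskStatus {
  state: "pending" | "completed" | "failed";
  videoUrl?: string;
  error?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls fetchStatus until the task completes with a URL, fails, or maxAttempts is reached.
async function pollUntilDone(
  fetchStatus: () => Promise<TaskStatus>,
  intervalMs = 5000,
  maxAttempts = 60
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status.state === "completed" && status.videoUrl) return status.videoUrl;
    if (status.state === "failed") {
      throw new Error(status.error ?? "Video generation failed");
    }
    await sleep(intervalMs); // still pending: wait before asking again
  }
  throw new Error("Timed out waiting for video generation");
}
```

Passing the status lookup in as a callback keeps the loop independent of any HTTP client, so it can be exercised with a fake in tests.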
- Task Reception:
  - Another agent (e.g., an orchestrator) triggers a step for AGENT_DID. The step typically includes an inference_type in input_params:
    [ { "inference_type": "text2image", "some_other_param": "value" } ]
- Main Handler (run(data: any))
  - In main.ts, run is called whenever a step is updated to Pending.
  - We parse the step, read the inference_type, and delegate to a matching function in tools.ts.
- Generation
  - If in dummy mode, we simulate generation with random waits and a 10% chance of failure, for testing and debugging.
  - If in production mode, we call the appropriate method:
    - text2image(...) or image2image(...) from Fal.ai
    - text2video(...) from PiAPI
- Step Update
  - If generation succeeds, we mark the step as Completed, store the generated URL in output_artifacts, and optionally set a cost (e.g., 1 for images, 5 for videos).
  - If it fails, we mark the step as Failed with the error message.
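The dummy-mode behavior in the Generation step (a random wait, then a 10% chance of failure) could look roughly like the following. This is a sketch only: generateDummy and its parameters are hypothetical names, and the agent's own dummy functions may be structured differently.

```typescript
// Hypothetical dummy generator: waits a random time, then fails with probability failureRate.
async function generateDummy(
  staticUrl: string,
  maxDelayMs = 3000,
  failureRate = 0.1,
  random: () => number = Math.random // injectable so tests can be deterministic
): Promise<string> {
  const delay = Math.floor(random() * maxDelayMs);
  await new Promise<void>((resolve) => setTimeout(resolve, delay));
  if (random() < failureRate) {
    throw new Error("Simulated generation failure (dummy mode)");
  }
  return staticUrl; // dummy mode returns a fixed URL instead of a generated asset
}
```

Injecting the random source is a design choice for testability; in normal use the default Math.random gives the random delay and failure rate.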
npm start
- The agent logs into Nevermined using NVM_API_KEY / NVM_ENVIRONMENT.
- Subscribes to step-updated events for AGENT_DID.
- Waits for tasks. Whenever a step with Pending status appears (with inference_type in input_params), it processes it and updates the step accordingly.
In main.ts, we see:
await payments.query.subscribe(run, {
  joinAccountRoom: false,
  joinAgentRooms: [AGENT_DID],
  subscribeEventTypes: ["step-updated"],
  getPendingEventsOnSubscribe: false,
});
This ensures the agent reacts whenever a step for AGENT_DID is updated (usually set to Pending by the orchestrator).
When run(data) is called:
const step = await payments.query.getStep(eventData.step_id);
if (step.step_status !== AgentExecutionStatus.Pending) return;
const [{ inference_type, ...inputs }] = JSON.parse(step.input_params);
- Check the step is Pending.
- Extract the inference_type.
- Route to handleText2image, handleImage2image, or handleText2video in tools.ts.
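The parsing and routing steps above can be sketched as a small validated lookup. This is a sketch under assumptions: parseInputParams, ParsedInput, and HANDLER_NAMES are hypothetical, and only the handler names and the input_params shape come from this document.

```typescript
const INFERENCE_TYPES = ["text2image", "image2image", "text2video"] as const;
type InferenceType = (typeof INFERENCE_TYPES)[number];

interface ParsedInput {
  inferenceType: InferenceType;
  inputs: Record<string, unknown>;
}

// Parses the JSON-encoded input_params (an array holding one object)
// and rejects any inference_type the agent does not support.
function parseInputParams(inputParams: string): ParsedInput {
  const parsed: unknown = JSON.parse(inputParams);
  if (!Array.isArray(parsed) || typeof parsed[0] !== "object" || parsed[0] === null) {
    throw new Error("input_params must be a JSON array containing one object");
  }
  const { inference_type, ...inputs } = parsed[0] as Record<string, unknown>;
  if (!INFERENCE_TYPES.includes(inference_type as InferenceType)) {
    throw new Error(`Unsupported inference_type: ${String(inference_type)}`);
  }
  return { inferenceType: inference_type as InferenceType, inputs };
}

// Maps each mode to the name of its handler in tools.ts.
const HANDLER_NAMES: Record<InferenceType, string> = {
  text2image: "handleText2image",
  image2image: "handleImage2image",
  text2video: "handleText2video",
};
```

Validating inference_type before dispatch lets an unknown mode fail with a clear message that can be written to the step, rather than failing later inside a handler.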
In dummy mode, we call text2imageDummy; otherwise, we call text2image from imageGeneration.ts. For videos, we similarly choose text2videoDummy or text2video from videoGeneration.ts.
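// tools.ts
export async function handleText2image(_inputs: any, step: any, payments: any) {
  try {
    // Dummy mode returns a mocked result; production mode calls Fal.ai.
    if (IS_DUMMY) {
      return await text2imageDummy(step.input_query);
    }
    return await text2image(step.input_query);
  } catch (error) {
    // Mark the step as Failed and record the error message before rethrowing.
    await payments.query.updateStep(step.did, {
      ...step,
      step_status: AgentExecutionStatus.Failed,
      output: error instanceof Error ? error.message : String(error),
    });
    throw error;
  }
}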
If successful, we mark the step as Completed:
await payments.query.updateStep(step.did, {
  ...step,
  step_status: AgentExecutionStatus.Completed,
  output: "Generation completed successfully.",
  output_artifacts: [outputUrl],
});
If it fails, we mark the step as Failed, including an error message in output.
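Both outcomes can be expressed as one pure function that builds the object passed to updateStep. This is a sketch only: StepLike, GenerationResult, and buildStepUpdate are hypothetical, and the string statuses are a local stand-in for AgentExecutionStatus, assumed to match its values.

```typescript
// Local stand-in for the status values; the real agent uses AgentExecutionStatus.
type StepStatus = "Pending" | "Completed" | "Failed";

interface StepLike {
  did: string;
  step_status: StepStatus;
  [key: string]: unknown;
}

type GenerationResult =
  | { ok: true; url: string; cost?: number }
  | { ok: false; error: string };

// Builds the update payload for a finished step without touching the network.
function buildStepUpdate(step: StepLike, result: GenerationResult): StepLike {
  if (result.ok) {
    return {
      ...step,
      step_status: "Completed",
      output: "Generation completed successfully.",
      output_artifacts: [result.url],
      // cost is optional, so only include it when the caller supplies one
      ...(result.cost !== undefined ? { cost: result.cost } : {}),
    };
  }
  return { ...step, step_status: "Failed", output: result.error };
}
```

The returned object would then be passed as the second argument of payments.query.updateStep(step.did, ...), as in the snippet above.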
Loads environment variables like NVM_API_KEY, PIAPI_KEY, FAL_KEY, IS_DUMMY, etc.
import dotenv from "dotenv";
dotenv.config();
export const NVM_API_KEY = process.env.NVM_API_KEY!;
export const NVM_ENVIRONMENT = process.env.NVM_ENVIRONMENT || "testing";
export const AGENT_DID = process.env.AGENT_DID!;
export const FAL_KEY = process.env.FAL_KEY!;
export const PIAPI_KEY = process.env.PIAPI_KEY!;
export const IS_DUMMY = process.env.IS_DUMMY === "true";
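The non-null assertions (!) above satisfy the compiler but do not check anything at runtime, so a missing key only surfaces when an API call fails. One possible way to fail fast is sketched below; requireEnv and loadGenerationKeys are hypothetical helpers, not part of the agent's code.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: throws when a required variable is missing or blank.
function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Provider credentials are only required outside dummy mode.
function loadGenerationKeys(env: Env) {
  const isDummy = env.IS_DUMMY === "true";
  return {
    isDummy,
    falKey: isDummy ? undefined : requireEnv("FAL_KEY", env),
    piapiKey: isDummy ? undefined : requireEnv("PIAPI_KEY", env),
  };
}
```

In the agent this would be called once at startup with process.env, so a misconfigured deployment stops before subscribing to any events.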
A simple pino setup, possibly with pino-pretty.
import pino from "pino";
import pretty from "pino-pretty";
export const logger = pino(pretty({ sync: true }));
Apache License 2.0
(C) 2025 Nevermined AG
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions
and limitations under the License.