# Higgsfield – Assist (Chat) module
The **Assist** page on Higgsfield.ai provides a chat interface designed to help users craft prompts and explore the platform’s capabilities. The interface lists suggestions such as “What You Can Do?”, “Generate Soul Image Prompt”, and “Generate Video Prompt”. At the bottom is a message input bar with a ChatGPT 5 Mini icon, indicating that an LLM powers the assistant【40924663090641†screenshot】.
## Purpose
Assist serves as a personal prompt engineer and tutor. It answers questions about using the platform, suggests creative ideas and writes detailed prompts for image and video generation. A user might type “How can I create a soulful portrait?” and receive guidance on selecting the right model, adjusting aspect ratio and writing an evocative prompt.
## Recreating the assistant
To implement a similar assistant using your OpenRouter API key:
1. **Conversation UI** – Build a chat interface with a list of messages and an input box. Add quick suggestion buttons for common queries (e.g. “Generate Soul prompt”, “How to create a video?”). Implement streaming responses so the assistant appears responsive.
2. **Model selection** – Use the OpenRouter API to access a state‑of‑the‑art LLM such as GPT‑4 or Claude 3. Configure the conversation with system instructions outlining the role: a helpful content creator assistant familiar with FAL models and Higgsfield modules. Include instructions to always clarify the user’s intent and propose both descriptive and negative prompts.
3. **Prompt engineering** – When the user asks for help, the assistant should:
* **Clarify the task** – If the request is vague (“I want a cool video”), ask follow‑up questions about mood, characters and duration.
* **Suggest models** – Recommend which FAL model suits the task (e.g. Seedream for portraits, Veo 3 for cinematic videos).
* **Write a detailed prompt** – Provide a ready‑to‑use prompt including style descriptors, camera directions and negative prompts. For example: “Ultra‑realistic portrait of a woman with warm sunset lighting, shallow depth of field, minimal background, avoid blurry or extra limbs.”
* **Explain settings** – Teach users about aspect ratios, inference steps, seeds and when to use upscaling.
4. **Integration with backend** – After generating a prompt, offer a button to run the generation automatically. This button triggers the appropriate module (image or video) with the suggested prompt and settings. For example, when the assistant generates an image prompt, clicking “Generate” calls the `create_image` function with the prompt and default parameters.
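The steps above can be wired together with a small amount of glue code. Below is a minimal sketch of the conversation call, assuming OpenRouter's OpenAI‑compatible `/chat/completions` endpoint and an `OPENROUTER_API_KEY` environment variable; the model name and system prompt are illustrative, not prescriptive.
```python
import os
import requests

OPENROUTER_URL = 'https://openrouter.ai/api/v1/chat/completions'

SYSTEM_PROMPT = (
    'You are a prompt-engineering assistant for an image/video generation app. '
    'Clarify vague requests, recommend a suitable model, and return a detailed '
    'prompt plus a negative prompt.'
)

def ask_assistant(history, user_message, model='openai/gpt-4o-mini'):
    """Send the conversation to OpenRouter and return the assistant's reply."""
    messages = [{'role': 'system', 'content': SYSTEM_PROMPT}] + history
    messages.append({'role': 'user', 'content': user_message})
    response = requests.post(
        OPENROUTER_URL,
        headers={'Authorization': f'Bearer {os.environ["OPENROUTER_API_KEY"]}'},
        json={'model': model, 'messages': messages},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()['choices'][0]['message']['content']
```
A front‑end keeps `history` in memory and appends each exchange; when a reply contains a finished prompt, surface a “Generate” button that forwards it to the image or video module.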
## Example conversation flow
```
User: I want to create a product ad for a water bottle.
Assistant: Great! Would you like a photo or a video? Do you have a specific style or mood in mind?
User: A short video, dynamic and fresh.
Assistant: Perfect. I recommend using the Hailuo 02 Fast model for a 3‑second clip. Here’s a detailed prompt:
“A close‑up of a transparent water bottle spinning on a white table while drops of water splash around, bright natural lighting, slow motion, cinematic quality.”
Would you like to generate this now?
User: Yes.
Assistant: [Generates video]
```
This flow demonstrates how the assistant clarifies requirements, suggests models and writes a robust prompt before triggering the generation.
## Tips
* **Context retention** – Maintain conversation history so that later prompts can refer to earlier topics (“Make it similar to my previous portrait”).
* **Safety** – Use content moderation to filter inappropriate requests and inform the user politely.
* **Learning resources** – Provide links or tooltips explaining FAL models, parameter definitions and best practices.
By integrating your OpenRouter LLM with a chat interface and coupling it with FAL’s generation functions, you can recreate the Assist module and offer users a friendly guide to generating high‑quality content【40924663090641†screenshot】.
# Higgsfield – Create Image module
The **Create Image** page is the gateway to Higgsfield’s core image‑generation tools. When you open `higgsfield.ai/image/soul` you land on a spacious canvas with options for generating photos from text, uploading reference images and adjusting quality or aspect ratio. A model selector lets you toggle between **Higgsfield Soul**, **WAN 2.2 Image**, **Seedream 4.0**, **Nano Banana**, **Flux Kontext**, **GPT Image** and **Topaz**【386279355400995†screenshot】. A generate button shows the cost in credits, and an **Animate** dropdown links to video‑generation tasks like **Video**, **Speak**, **UGC**, and **Draw to Video**【989823551932474†screenshot】.
This module can be replicated with a combination of an LLM (for prompt engineering) and a text‑to‑image diffusion model served via FAL. The general workflow is:
1. **Accept user input** – Allow the user to write a description or upload an existing picture. Provide controls to adjust the aspect ratio (e.g., square, 4:5 portrait, 16:9 landscape) and quality (HD vs. fast). The UI uses drop‑downs and sliders; in a CLI or API setting these become parameters.
2. **Generate a prompt** – Use your OpenRouter LLM to refine the user’s idea into a concise prompt. For example, a user description like “a futuristic city” can be expanded by the LLM into “a neon‑lit cyberpunk city at dusk, reflections on wet streets, ultra‑realistic, cinematic lighting”. This makes the model output more detailed and consistent.
3. **Choose a model** – Map the Higgsfield models to FAL equivalents:
* **Higgsfield Soul** → `seedream/4.0` (text‑to‑image) – produces high‑quality, soul‑like portraits and scenes.
* **WAN 2.2** → `wan-2` (a fast text‑to‑image engine) – suitable for stylized, anime‑like art.
* **Nano Banana** → `nano-banana` – a small diffusion model optimised for stylised outputs and character consistency.
* **Flux Kontext** → `flux-kontext` – experimental style‑transfer model.
* **GPT Image** → `fal/gpt-image` – uses LLM for prompt understanding before generating.
* **Topaz** (also used for upscaling) → `topaz-upscale` or `remini` – for enhancing resolution.
4. **Call FAL’s API** – Use the [`@fal-ai/client`](https://github.com/fal-ai/falclient) package in Node.js or Python’s `requests`. The FAL API requires your **FAL KEY** in an `Authorization` header. For example, using Node:
```js
import { fal } from '@fal-ai/client';
fal.config({
  endpoint: 'https://fal.ai/v1',
  key: process.env.FAL_KEY,
});

async function generateImage(prompt) {
  const response = await fal.run('seedream/4.0', {
    prompt: prompt,
    width: 768,
    height: 768,
    guidance_scale: 7,
    num_inference_steps: 30,
    seed: Math.floor(Math.random() * 10000),
  });
  return response; // contains image URL(s)
}
```
Replace `seedream/4.0` with the appropriate model string (e.g. `wan-2`, `nano-banana`) depending on the selection. Adjust `width` and `height` to match the aspect ratio chosen by the user.
5. **Return the image** – The API returns a `result` array with URLs to generated images. Download or display the first result. For high‑resolution results choose a larger `num_inference_steps` (30–50) and adjust `guidance_scale` for more adherence to the prompt.
### Advanced options
* **Image input / image‑to‑image** – Some models (e.g., `seedream/4.0` and `nano-banana`) allow you to pass an `image_url` to perform image‑to‑image transformations. Use this when the user uploads a reference picture. You can also specify a `strength` (0–1) to control how much of the original image should remain.
* **Negative prompt** – Higgsfield hides an advanced setting for negative prompts; FAL accepts a `negative_prompt` field. Use your LLM to produce helpful negative terms such as “low quality, blurry, distorted” to reduce unwanted artefacts.
* **Batch generation** – To create multiple options, call the API with `num_images: n` (if supported) or loop over seeds. Present the user with a gallery to choose from.
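Taken together, the options above translate into a few extra request fields. A hedged sketch of assembling such a payload in Python; the field names follow this guide's assumed FAL conventions and should be checked against each model's documentation:
```python
import random

def build_payload(prompt, negative_prompt=None, image_url=None,
                  strength=0.6, num_images=1, seed=None):
    """Assemble an image-generation payload with the advanced options above."""
    payload = {
        'prompt': prompt,
        'num_inference_steps': 30,
        'guidance_scale': 7,
        'seed': seed if seed is not None else random.randint(0, 10_000),
    }
    if negative_prompt:
        payload['negative_prompt'] = negative_prompt  # e.g. "low quality, blurry"
    if image_url:
        payload['image_url'] = image_url              # image-to-image mode
        payload['strength'] = strength                # 0-1: how far to drift from the source
    if num_images > 1:
        payload['num_images'] = num_images            # batch generation, if the model supports it
    return payload

# If num_images is unsupported, loop over different seeds instead and collect the results.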
### Integrating with a web service
To build a front‑end similar to Higgsfield’s Create Image page:
1. Use a UI framework (React, Vue or plain HTML) to create input fields for prompt, image upload and settings.
2. On submission, send the data to a backend endpoint where you call the FAL API as described above.
3. While awaiting the result, show a spinner or progress bar. FAL responses typically take 5–30 seconds depending on the model and settings.
4. Display the returned images in a grid with buttons to upscale (call `topaz-upscale`) or animate (link to video modules).
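As a sketch of steps 2–3, a minimal backend endpoint (Flask here, purely illustrative) that accepts the form data and forwards it to the assumed FAL REST endpoint used throughout this guide:
```python
import os
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
FAL_URL = 'https://api.fal.ai/v1/seedream/4.0'  # assumed endpoint, as in the example above

@app.route('/api/generate-image', methods=['POST'])
def generate_image_endpoint():
    data = request.get_json(force=True)
    payload = {
        'prompt': data['prompt'],
        'width': int(data.get('width', 768)),
        'height': int(data.get('height', 768)),
        'num_inference_steps': 30,
    }
    headers = {'Authorization': f'Key {os.environ["FAL_KEY"]}'}
    result = requests.post(FAL_URL, json=payload, headers=headers, timeout=120)  # may take 5-30 s
    return jsonify(result.json())

if __name__ == '__main__':
    app.run(port=8000)
```
The front‑end polls or awaits this endpoint and renders the returned image URLs in a grid.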
By combining prompt engineering via OpenRouter and FAL’s state‑of‑the‑art image models you can re‑create the intuitive and powerful **Create Image** module found on Higgsfield.ai【386279355400995†screenshot】.
# Higgsfield – Create Video module
The **Create Video** page on Higgsfield is a powerful interface for generating short clips from text prompts and/or reference images. Users can upload an image as a starting frame, write a scene description, enable an “enhance” toggle, and choose from a list of models. Available models include Higgsfield Lite, Standard and Turbo, Google Veo 3, Kling 2.1 (including a Master version), Minimax Hailuo 02, Seedance Pro, and Wan 2.2, each with different resolutions and durations【151309751194989†screenshot】【68658749472542†screenshot】【174166847387648†screenshot】. Advanced settings allow control over duration (3 s or 5 s), seed and steps.
## Selecting a model
Each model corresponds to a video diffusion or generative model exposed via FAL:
* **Higgsfield Lite / Standard / Turbo** – map to FAL’s `seedance` or `seedance-fast`. Turbo versions prioritise speed over quality.
* **Google Veo 3** – available through FAL as `veo-3` or `veo-3-fast` for the fast variant.
* **Kling 2.1** – accessible via `kling-2.1` for general generation and `kling-2.1-master` for longer durations or higher fidelity.
* **Minimax Hailuo 02** – available as `hailuo-02` and `hailuo-02-fast`.
* **Seedance Pro** – corresponds to an improved `seedance-pro` model.
* **Wan 2.2** – accessible via `wan-2.2` and `wan-2.2-fast`.
Select a model based on desired trade‑offs: Veo 3 produces realistic cinematic shots, Kling excels at action and expressive motion, Hailuo 02 offers strong character consistency, and Wan provides stylised, anime‑like videos.
## Workflow
1. **Gather inputs** – Present fields for the user to:
* **Prompt** – a description of the scene (“hero running through a forest at sunset, slow‑motion”). Use your LLM to refine this into a detailed and cinematic prompt. Include camera movements (“tracking shot”, “close‑up”), lighting and mood.
* **Reference image** – optional. If provided, use it as the starting frame (`frame_0`). This ensures character or environment consistency. Upload the image to storage and get its URL.
* **Duration** – choose between 3 s and 5 s as per the interface【151309751194989†screenshot】.
* **Seed** – optional random seed for reproducibility.
2. **Call the video model** – Example using FAL’s API in Python:
```python
import requests
import os
def generate_video(model_name, prompt, image_url=None, duration=3, seed=None, steps=50):
    api_url = f'https://api.fal.ai/v1/{model_name}'
    payload = {
        'prompt': prompt,
        'num_inference_steps': steps,
        'frames': int(duration * 15),  # 15 FPS assumed; adjust per model
    }
    if image_url:
        payload['frame_0'] = image_url
    if seed is not None:
        payload['seed'] = seed
    headers = {'Authorization': f'Key {os.environ["FAL_KEY"]}'}
    response = requests.post(api_url, json=payload, headers=headers)
    return response.json()['result'][0]  # returns a video URL
```
Replace `model_name` with `veo-3`, `kling-2.1`, `hailuo-02`, etc. Some models may require additional parameters such as aspect ratio or motion intensity. Check FAL’s documentation for specifics.
3. **Enhancement (optional)** – Higgsfield offers an “Enhance” toggle which likely adds post‑processing such as frame interpolation or upscaling. To replicate, after generating the video call a frame interpolation or super‑resolution model (e.g. `video-superres`). You can also convert the generated frames to higher resolution individually using `topaz-upscale` and then reassemble them into a video using ffmpeg.
4. **Return the clip** – The API returns a URL to the video file (usually MP4 or WEBM). Present the video player with controls to download or iterate using a different seed. Provide a loading indicator during generation (which may take 30–90 seconds depending on the model).
## Fine‑tuning and advanced controls
* **Camera controls** – Higgsfield’s Explore section contains camera‑motion categories like “Boom up”, “Zoom in/out”, etc. You can mimic these by adding camera instructions to the prompt (“dolly zoom in on the subject”). Use the LLM to suggest shot types.
* **Action movements** – For dynamic scenes (e.g. explosions, action), choose models like Kling 2.1 or Seedance Pro. Add verbs and descriptors (“bullet time spin”, “dramatic slow‑motion effect”) to the prompt.
* **Seed & steps** – Expose a seed input for reproducibility. Increase `num_inference_steps` for more detailed frames but at the cost of time. Keep seeds consistent to iterate on prompts while preserving motion patterns.
By combining robust prompt engineering via your LLM, careful model selection and FAL’s video generation APIs you can rebuild the Create Video module from Higgsfield with control over duration, quality and style【151309751194989†screenshot】【68658749472542†screenshot】.
# Higgsfield – Draw to Edit module
Higgsfield’s **Draw to Edit** tool converts a simple sketch into a fully rendered image. On the website, clicking **Draw to Edit** opens a modal describing the feature: “From sketch to picture in seconds. No prompt needed.” with buttons to upload a sketch or create a blank canvas【987600533325509†screenshot】. The same interface offers a second tab labelled **Draw to Video** for turning sketches into animated clips (see separate file).
This feature can be reproduced using a diffusion model conditioned on an edge map or scribble. FAL exposes such capabilities via **ControlNet** models. Below is a high‑level guide to rebuilding the module.
## Workflow
1. **Canvas for drawing** – Provide a drawing canvas to the user. Use an HTML `<canvas>` element with tools for freehand drawing, erasing and adjusting stroke size. When the user finishes, export the canvas as a PNG.
2. **Optional prompt** – Although Higgsfield claims no prompt is needed, offering an optional text input helps direct the style (e.g. “cartoon style”, “realistic painting”). Use your LLM to enhance this description if provided.
3. **Pre‑process the sketch** – Convert the user’s drawing into a binary or edge map. ControlNet models expect a high‑contrast outline. Use Python’s Pillow or OpenCV to desaturate and threshold the image:
```python
from PIL import Image
import numpy as np
def preprocess_sketch(path):
    img = Image.open(path).convert('L')            # convert to grayscale
    arr = np.array(img)
    arr = ((arr > 200) * 255).astype(np.uint8)     # simple threshold to a binary edge map
    Image.fromarray(arr).save('edge.png')
    return 'edge.png'
```
4. **Call the ControlNet scribble model** – FAL offers `controlnet-scribble` (or `controlnet/canny`) which accepts an `image_url` and a prompt. For example:
```python
import requests
import os
FAL_KEY = os.environ['FAL_KEY']
def generate_from_scribble(edge_url, prompt=''):
    url = 'https://api.fal.ai/v1/controlnet/scribble'
    headers = {'Authorization': f'Key {FAL_KEY}'}
    payload = {
        'image_url': edge_url,
        'prompt': prompt or 'masterpiece, highly detailed',
        'num_inference_steps': 40,
        'guidance_scale': 7,
    }
    response = requests.post(url, json=payload, headers=headers)
    return response.json()['result'][0]
```
If the user uploads a sketch rather than drawing, upload it to cloud storage (e.g. AWS S3) and pass the publicly accessible URL to `image_url` (an upload sketch follows this list).
5. **Post‑process and display** – The API returns a URL to the generated image. Download and display it back to the user. Provide controls to refine (regenerate with different seed) or upscale (call Topaz model) and to animate the result.
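Step 4 needs a URL the FAL API can fetch. A minimal boto3 sketch (the bucket name is a placeholder) that uploads the PNG and returns a time‑limited pre‑signed URL to pass as `image_url`:
```python
import uuid
import boto3

def upload_sketch(local_path, bucket='my-sketch-uploads'):
    """Upload a sketch PNG to S3 and return a URL that the FAL API can fetch."""
    key = f'sketches/{uuid.uuid4().hex}.png'
    s3 = boto3.client('s3')
    s3.upload_file(local_path, bucket, key, ExtraArgs={'ContentType': 'image/png'})
    # A pre-signed URL avoids making the bucket public; it expires after one hour.
    return s3.generate_presigned_url(
        'get_object', Params={'Bucket': bucket, 'Key': key}, ExpiresIn=3600
    )
```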
## Leveraging the LLM
OpenRouter’s LLM can help generate better prompts for ControlNet. If the user leaves the prompt empty, ask the LLM to guess the intended subject from the sketch (e.g. “sketch of a mountain, snowy peaks”) by describing shapes. You can also ask the LLM to suggest negative prompts to avoid undesirable outputs.
## Advanced settings
* **Strength** – control how much the scribble influences the final image. In FAL models you can set `scribble_strength` (0–1). A higher value keeps the outline more faithfully.
* **Seed** – allow users to set a random seed for reproducibility.
* **Sampler** – advanced users can choose sampling methods (DDIM, Euler, DPM). Expose this through a dropdown.
By combining an HTML canvas, pre‑processing the user’s sketch and calling FAL’s ControlNet scribble model you can rebuild the Draw to Edit module exactly as it appears on Higgsfield【987600533325509†screenshot】.
# Higgsfield – Draw to Video module
**Draw to Video** extends the Draw to Edit concept to motion pictures. The modal for this tool declares: “Your sketch turns into a cinema in a second. No prompt needed.” with options to upload media or create a blank canvas【636994195977727†screenshot】. Users draw simple lines indicating motion or composition, and the model generates a short clip following those contours.
## Principle
The underlying technique is similar to ControlNet for videos: a diffusion model is conditioned on a sequence of edge maps or scribbles that direct motion. The model interprets the drawn strokes as trajectories for objects and the camera, filling in frames with plausible details.
## Workflow
1. **Sketch acquisition** – Provide a drawing canvas for the user to sketch the layout and key motion paths. Use multiple frames or layers if you wish to capture movement over time. Alternatively, allow the user to upload a series of sketches (one per frame).
2. **Pre‑process sketches** – Convert the drawing into a binary edge map for each frame. For a single static sketch, replicate it across the desired frame count.
3. **Optional prompt** – Although the Higgsfield interface implies that no prompt is required, allowing a text description will greatly improve results. Use your LLM to expand short phrases into cinematic directions (“a dragon flies across the sky, camera pans up”).
4. **Select a video model** – Choose a FAL model capable of video generation conditioned on motion cues. Options include:
* `kling-2.1-draw` – a Kling variant that accepts a base image or sketch and generates a video following drawn lines.
* `veo-3-draw` – Veo 3 variant for sketch inputs.
* `seedance-draw` – for simpler scenes.
5. **Call the API** – Example using FAL (pseudo‑code as endpoints may vary):
```python
import requests
import os
def draw_to_video(model, sketch_url, prompt='', duration=3):
    api_url = f'https://api.fal.ai/v1/{model}'
    payload = {
        'sketch_url': sketch_url,
        'prompt': prompt,
        'frames': int(duration * 15),
        'num_inference_steps': 40,
    }
    headers = {'Authorization': f'Key {os.environ["FAL_KEY"]}'}
    return requests.post(api_url, json=payload, headers=headers).json()['result'][0]
```
Upload the sketch as an image file and obtain its URL. If multiple sketches are used, supply them as `sketch_sequence_urls`.
6. **Post‑processing** – The API returns a video. Optionally run a stabilisation or interpolation step to smooth motion. Use `ffmpeg` to add sound effects or music.
## Tips from Higgsfield’s blog
Higgsfield’s tutorial emphasises that using high‑quality, coherent sketches yields better results. For product shots, draw the product’s outline and motion path; avoid clutter【405327199234345†L69-L124】. The blog also notes that models like Veo 3 and Hailuo 02 support this feature, and the resulting video can be combined with product placement for advertising.
## Combining with prompts
If the user provides a prompt, your LLM should enrich it with cinematic details and align it with the sketch. For example, if the sketch shows an arrow moving left to right, the prompt might be “a car drives from left to right through a neon city, cinematic lighting”. Use negative prompts to exclude unwanted elements.
By converting user drawings into edge maps, enriching them with prompts and calling FAL’s sketch‑conditioned video models you can replicate the Draw to Video module from Higgsfield【636994195977727†screenshot】.
# Higgsfield – Edit Image (Inpaint) module
Within the **Edit** menu Higgsfield offers a tool labelled **Edit Image** or **Soul Inpaint**. Selecting this option opens a canvas with an upload button and a description: “Change outfits, edit details, or restyle your image in no time”【566810496532437†screenshot】. The user draws a mask over the area to modify (e.g. clothing, background) and supplies a new description. Higgsfield uses a model trained on their “Soul” characters to perform inpainting.
To recreate this functionality you can employ FAL’s inpainting models. There are two common approaches:
1. **Diffusion inpainting** – Models such as `controlnet/inpaint` or `stable-diffusion/inpaint` accept an original image, a mask and a prompt describing the desired alteration. They fill the masked region while preserving the unmasked areas.
2. **LoRA‑based character inpaint** – If you need to maintain a specific character’s likeness (as Soul Inpaint does), train a LoRA on the character images and use it as additional conditioning in the inpainting model.
## Workflow
1. **Collect inputs** – The user uploads the original image. Display the image on an HTML canvas and allow them to draw a mask using a brush tool. The mask should be saved as a binary image (white where editing is requested, black elsewhere).
2. **Define the prompt** – Ask the user for a textual description of the desired change. Use your OpenRouter LLM to expand the description. For example, “add a red leather jacket” can be elaborated to “replace existing top with a stylish red leather jacket, shining texture, realistic folds”.
3. **Call the inpainting API** – Example using FAL’s `controlnet/inpaint` model in Python:
```python
import requests
import os
FAL_KEY = os.environ['FAL_KEY']
api_url = 'https://api.fal.ai/v1/controlnet/inpaint'
def inpaint_image(image_url, mask_url, prompt, negative_prompt=''):
    headers = {'Authorization': f'Key {FAL_KEY}'}
    payload = {
        'image_url': image_url,
        'mask_url': mask_url,
        'prompt': prompt,
        'negative_prompt': negative_prompt,
        'num_inference_steps': 40,
        'guidance_scale': 7,
    }
    response = requests.post(api_url, json=payload, headers=headers)
    return response.json()['result'][0]
```
Upload both the original image and the mask to a storage service and supply their URLs. If you have a LoRA checkpoint for the character, include it via the `lora_urls` parameter (see FAL documentation) and set `lora_scale` to control its influence.
4. **Preview and iterate** – Show the returned image to the user. Provide options to adjust the mask, tweak the prompt or random seed and regenerate until satisfied.
5. **Post‑processing** – If the new region doesn’t blend perfectly, you can apply a smoothing filter or run a second pass with a higher `strength` value to integrate the edit more seamlessly.
## Multi‑reference inpainting
Higgsfield also offers a **Multi Reference** variant that allows up to four reference images to improve character consistency【897786170985192†screenshot】. To replicate this, extend the payload with additional images and combine them into a composite prompt. FAL’s inpainting models accept `reference_images` or `influence_images` arrays; each image helps guide the output. When using multiple references, compute embeddings of the face region (via `dlib` or `face_recognition`) and match them to ensure similarity.
## Tips
* Use a high‑resolution base image for better details. After inpainting, call the `upscale` module to enhance sharpness.
* Suggest negative prompts to avoid unwanted results (“poorly drawn hands, blurry, distortion”).
* Keep the mask feathered (soft edges) to blend the edit smoothly.
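Feathering can be applied before uploading the mask. A small Pillow sketch, assuming the mask is a black‑and‑white PNG:
```python
from PIL import Image, ImageFilter

def feather_mask(mask_path, out_path='mask_feathered.png', radius=8):
    """Soften mask edges with a Gaussian blur so the inpainted edit blends smoothly."""
    mask = Image.open(mask_path).convert('L')  # grayscale: white marks the edit area
    feathered = mask.filter(ImageFilter.GaussianBlur(radius))
    feathered.save(out_path)
    return out_path
```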
By integrating an interactive masking interface with FAL’s inpainting API and optional LoRA conditioning, you can reproduce the powerful Edit Image/Soul Inpaint module from Higgsfield【566810496532437†screenshot】.
# Higgsfield – Fashion Factory module
**Fashion Factory** is a multi‑step wizard that guides users through producing a cohesive photo set combining a model (Soul ID) and clothing or products. When invoked (via `?modal-fashion-factory=true`), the interface presents steps: **Template**, **Character**, **Generation**, **Clothing**, and **Photo Set**. In the first step, users pick an aesthetic such as “Ultra Bold”, “Blue‑Green Gradient”, “Street Style Paparazzi”, or “Red Room”【245125773438469†screenshot】. Subsequent steps allow selection or creation of a *Soul ID* character, generating base images, dressing them in different outfits and assembling a cohesive set.
This tool combines template‑driven style generation with subject consistency. To reproduce it using FAL models and your OpenRouter LLM, follow these guidelines:
## 1. Select a template
Create a catalogue of aesthetics (e.g. bold studio, minimal gradient, paparazzi street style). For each aesthetic prepare a short style prompt describing lighting, background, colour palette and mood. You can store these in a JSON file. For example:
```json
{
"Ultra Bold": "studio photo, strong coloured lights, high contrast, fashion editorial",
"Blue‑Green Gradient": "photo studio backdrop with blue and green gradient, soft shadows",
"Street Style Paparazzi": "urban street at night, paparazzi flash, candid pose",
"Red Room": "rich red interior, dramatic lighting, glossy floor"
}
```
When the user selects a template, fetch the corresponding style description. Your LLM can enrich this description based on user input (“winter clothes”) and add descriptors like mood or camera angle. Use the final style prompt in all subsequent generation steps to maintain consistency.
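A small sketch of this step: load the template catalogue, pick the selected aesthetic and compose the style prompt (the enrichment step can reuse whatever OpenRouter helper you built for the Assist module):
```python
import json

def build_style_prompt(template_name, user_notes='', path='templates.json'):
    """Combine a template's base style prompt with optional user notes."""
    with open(path) as f:
        templates = json.load(f)
    base = templates[template_name]  # e.g. "studio photo, strong coloured lights, ..."
    parts = [base]
    if user_notes:                   # e.g. "winter clothes"
        parts.append(user_notes)
    return ', '.join(parts)

# Example: build_style_prompt('Ultra Bold', 'winter clothes, three-quarter angle')
```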
## 2. Character (Soul ID)
The original Fashion Factory expects a trained **Soul ID** character to maintain facial consistency across the set. If the user already has a character (e.g. a LoRA or set of reference images), use that as input to the image models. If not, allow them to upload a single portrait. Use FAL’s `image-to-image` features (e.g. `seedream/4.0` with `image_url` and low `strength`) to adapt the portrait to the selected style. This becomes the “base” character image.
For a more robust solution you can fine‑tune a LoRA or DreamBooth model on the uploaded images using FAL’s `fal/fine-tune` service (if available). After training, call the resulting model for all subsequent generations.
## 3. Generation of photo set
For each shot in the photo set:
1. **Define the scene** – Use the template’s style prompt plus specific shot instructions (e.g. “full‑body shot on a chair”, “headshot with product in hand”). Use your LLM to vary angles and poses for diversity while staying on theme.
2. **Call the FAL image model** – Use `seedream/4.0` or `nano-banana` with the style prompt and the character image as `image_url` for consistency. Specify resolution (e.g. 1024×1536 for portrait). Example call:
```python
payload = {
    'prompt': f"{style_description}, {shot_description}",
    'image_url': character_url,  # from previous step
    'strength': 0.35,            # keep character likeness
    'guidance_scale': 7,
    'num_inference_steps': 30,
}
result = fal.run('seedream/4.0', payload)
```
3. **Dress the model** – If the user wants to showcase clothing, you can inpaint garments onto the base image. After generating the base, detect the clothing region using a segmentation model (e.g. torchvision's `fcn_resnet50` or a dedicated human‑parsing model) and call FAL's `controlnet/inpaint` with a prompt describing the desired outfit (“red evening gown with sequins”). Alternatively, train a LoRA on clothing categories.
4. **Repeat** – Create multiple images with different outfits and poses to form a set.
## 4. Assembling the photo set
Once several images are generated, collate them into a gallery with consistent aspect ratios and colour grading. You can offer cropping and upscaling via `topaz-upscale` to ensure high resolution. Provide download options for the full set.
## Tips and enhancements
* **Consistency** – Use the same random seed or similar seeds across images to maintain facial likeness. Lower the `strength` parameter gradually if outputs drift.
* **Variations** – Provide a slider for variation amount. Higher variation reinterprets the prompt, lower variation stays close to the previous image.
* **Negative prompts** – Use negative prompts like “blurry, low quality, extra limbs” to maintain quality.
* **Use the LLM** – Ask the LLM to suggest creative outfits or accessories that match the brand identity. It can also propose background themes.
By structuring the process into template selection, character preparation and iterative generation you can rebuild Higgsfield’s Fashion Factory. The key is to combine template‑based style prompts with character conditioning to achieve a cohesive photo set【245125773438469†screenshot】.
# Higgsfield – Image Upscale module
Higgsfield’s **Upscale** feature provides high‑resolution enhancement for images. When selected from the **Edit** menu, the interface opens a panel with scale factors of ×1, ×2, ×4, ×8 and ×16 and an **Upscale** button【10040261587628†screenshot】. The underlying model is labelled *Topaz*, suggesting the integration of the Topaz/Gigapixel technology or a similar super‑resolution model.
To reproduce this module using FAL you can call a dedicated super‑resolution model. FAL hosts models such as `upscale/remini` and `upscale/gigapixel`. Both accept an input image and output a higher resolution version.
## Workflow
1. **Upload or select an image** – Provide a file chooser or accept a URL. Convert the image into a format accepted by the API (e.g. PNG or JPEG). If the user is upscaling a previously generated image, pass its URL directly.
2. **Choose scale factor** – Expose options like ×2, ×4 and ×8. Higher factors yield larger images but require more processing time.
3. **Call the FAL upscaling model** – Use Python’s `requests` (or `@fal-ai/client` in Node) to call the API. Example in Python:
```python
import requests
import os
def upscale_image(image_url, scale=4):
    api_url = 'https://api.fal.ai/v1/upscale/remini'
    headers = {'Authorization': f'Key {os.environ["FAL_KEY"]}'}
    payload = {
        'image_url': image_url,
        'scale': scale,  # 2, 4, or 8
    }
    response = requests.post(api_url, json=payload, headers=headers)
    return response.json()['result'][0]
```
Replace `remini` with `gigapixel` if you prefer that model. Both accept `scale` values of 2, 4 or 8. If the user selects ×16 you can perform two successive ×4 upscales.
4. **Deliver the result** – The API returns a URL to the upscaled image. Display it next to the original for comparison. Provide a download option.
## Handling high resolutions
Upscaling to very large sizes (e.g. 4K or 8K) may exceed the API’s limits. In that case, split the image into tiles, upscale each tile individually and then merge them back together, as sketched below. Use a library like Pillow for tiling and merging. Always maintain the aspect ratio to prevent distortion.
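A hedged sketch of the tiling approach with Pillow; `upscale_tile` is a stand‑in for whichever upscaling call you use (for example, uploading the tile and calling the `upscale_image` helper above):
```python
from PIL import Image

def upscale_by_tiles(path, out_path, tile=512, scale=4, upscale_tile=None):
    """Split an image into tiles, upscale each one, and paste them into a larger canvas."""
    src = Image.open(path)
    w, h = src.size
    out = Image.new(src.mode, (w * scale, h * scale))
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            box = (left, top, min(left + tile, w), min(top + tile, h))
            piece = src.crop(box)
            big = upscale_tile(piece, scale)  # must return a PIL image enlarged by `scale`
            out.paste(big, (left * scale, top * scale))
    out.save(out_path)
```
In practice, give the tiles a small overlap and blend the seams; otherwise tile borders can become visible after upscaling.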
## Integrating with other modules
The Upscale tool is often used as a final step after generation or editing. In your pipeline, call the upscaler on the final image before returning it to the user. When using the **Create Image** module, you can add a checkbox “Upscale after generation” to automatically run the upscaler once the diffusion model completes.
By exposing a simple scale selector and calling FAL’s super‑resolution API, you can replicate Higgsfield’s Upscale module and provide crisp, high‑resolution outputs【10040261587628†screenshot】.
# Higgsfield – Multi Reference module
The **Multi Reference** tool is an advanced editing option under Higgsfield’s **Edit** menu. It allows users to upload up to four reference images to guide an inpainting operation. The interface describes it as: “Bring in up to 4 references to get improved accuracy, perfect poses, or to mix your favourite moments together”【897786170985192†screenshot】. This feature is particularly useful when editing a character consistently across multiple images or combining elements from several sources.
## Understanding the problem
In standard inpainting, only the masked area and a text prompt guide the model. When maintaining the likeness of a specific person or character, a single reference may not capture all angles or expressions. Multi‑reference solves this by feeding multiple images of the subject into the model, allowing it to learn a richer representation.
## Implementation strategy
1. **Collect reference images** – Allow the user to upload up to four images. These should show the subject from different angles or with different expressions. Use a face detector (e.g. `dlib` or `face_recognition`) to verify that the same face appears in each image and crop the head region if necessary.
2. **Embed references** – Compute latent embeddings for each reference image using a pre‑trained model (CLIP or a face embedding network). FAL may accept `reference_images` directly; otherwise, you can pass a concatenated image grid or average the embeddings.
3. **Prepare the base image and mask** – As with the regular inpainting module, the user uploads the image to edit and draws a mask. The mask designates the area to change (e.g. replacing clothing or background).
4. **Compose the prompt** – Use your LLM to convert the user’s description into a detailed prompt. Include guidance such as “maintain the subject’s likeness” and “use the style of the reference images”. If mixing moments, ask the LLM to blend descriptors from each reference (“combine the smile from the first photo with the hairstyle of the second”).
5. **Call the multi‑reference API** – FAL’s generic inpainting endpoints allow additional `reference_images` or `influence_images` arrays. Example:
```python
import requests
import os
def multi_reference_inpaint(image_url, mask_url, references, prompt):
    api_url = 'https://api.fal.ai/v1/controlnet/inpaint'
    headers = {'Authorization': f'Key {os.environ["FAL_KEY"]}'}
    payload = {
        'image_url': image_url,
        'mask_url': mask_url,
        'prompt': prompt,
        'reference_images': references,  # list of URLs
        'num_inference_steps': 50,
        'guidance_scale': 8,
    }
    return requests.post(api_url, json=payload, headers=headers).json()['result'][0]
```
When the API does not support multiple references directly, you can merge them into a single image (e.g. create a 2×2 grid) and supply it as one reference. Alternatively, fine‑tune a LoRA on all reference images.
6. **Review and iterate** – Show the edited image to the user. Provide options to adjust the prompt, reorder reference importance or swap out images. A slider can control how strongly references influence the final result.
## Tips
* Ensure all reference images have similar lighting and style to avoid conflicting cues.
* Encourage the user to upload varied yet consistent angles (front, profile, three‑quarter) for more robust results.
* Use face embeddings to weigh each reference by similarity to the target region. Feed a weighted average embedding to the diffusion model to emphasise the most relevant reference.
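A sketch of that weighting idea using the `face_recognition` library; the inverse‑distance weighting scheme itself is illustrative:
```python
import numpy as np
import face_recognition

def weighted_reference_embedding(reference_paths, target_path):
    """Weight each reference face by its similarity to the face in the target image."""
    target_img = face_recognition.load_image_file(target_path)
    target_enc = face_recognition.face_encodings(target_img)[0]
    encodings, weights = [], []
    for path in reference_paths:
        img = face_recognition.load_image_file(path)
        faces = face_recognition.face_encodings(img)
        if not faces:
            continue  # skip references with no detectable face
        enc = faces[0]
        distance = np.linalg.norm(enc - target_enc)
        encodings.append(enc)
        weights.append(1.0 / (distance + 1e-6))  # closer faces get more weight
    weights = np.array(weights) / np.sum(weights)
    return np.average(np.stack(encodings), axis=0, weights=weights)
```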
By combining multiple reference images, careful prompt engineering and FAL’s inpainting capabilities you can replicate the Multi Reference editing experience described on Higgsfield【897786170985192†screenshot】.
# Higgsfield – Nano Banana Edit module
**Nano Banana Edit** is an editing tool within Higgsfield’s **Edit** menu that allows users to modify images while preserving a character’s style and identity. The description reads: “Maintain one consistent style and character across up to 8 references”【107715456764548†screenshot】. This suggests that the underlying model uses the Nano Banana diffusion architecture along with multiple reference images to ensure stylistic consistency during inpainting.
## Model background
Nano Banana is a compact diffusion model optimised for stylised images and identity preservation. It accepts prompts and reference images to guide generation or editing. FAL exposes endpoints like `nano-banana` for text‑to‑image and `nano-banana/edit` or `nano-banana/inpaint` for editing. These models can ingest multiple reference images (up to eight) to capture a subject’s appearance and style.
## Reproducing the module
1. **Collect reference images** – Allow the user to upload up to eight pictures of the subject. These should reflect the desired appearance and style. Preprocess them by cropping the subject’s face or full body, then upload to a storage service and note the URLs.
2. **Upload the target image and mask** – The user uploads the image to modify and draws a mask around the area to edit (clothing, hair, background, etc.). Save the mask as a binary image.
3. **Craft the prompt** – Combine the user’s description with a style extracted from the references. Use your LLM to summarise the stylistic elements (e.g. “impressionist brush strokes”, “comic‑book shading”) and include them in the prompt. Also list any specific changes (“replace jacket with leather coat”).
4. **Call the Nano Banana editing API** – Example in Python:
```python
import requests
import os
def nano_banana_edit(image_url, mask_url, reference_urls, prompt):
    api_url = 'https://api.fal.ai/v1/nano-banana/edit'
    headers = {'Authorization': f'Key {os.environ["FAL_KEY"]}'}
    payload = {
        'image_url': image_url,
        'mask_url': mask_url,
        'prompt': prompt,
        'reference_images': reference_urls,  # up to 8
        'num_inference_steps': 50,
        'guidance_scale': 8,
        'style_strength': 0.5,  # controls how much the style influences the edit
    }
    response = requests.post(api_url, json=payload, headers=headers)
    return response.json()['result'][0]
```
Adjust `style_strength` to blend between the original image and the reference style. A higher value yields a stronger stylised effect.
5. **Iterate** – Present the result to the user. Allow them to adjust which references influence the edit by toggling them on or off or changing their order. Ask the LLM to suggest alternative prompts or negative prompts if the output isn’t satisfactory.
6. **Final touches** – Use the Upscale module to enhance resolution. If multiple edited images are needed (e.g. for a series), loop over different prompts or seeds while keeping the reference set constant to maintain consistency.
## Tips
* Choose references that are visually cohesive; mixing radically different styles may produce chaotic results.
* If the subject’s identity drifts, increase `style_strength` and include a stronger identity description in the prompt (“the same person as in the reference images”).
* For subtle edits (e.g. changing accessories), use a small mask and a low `guidance_scale` to prevent the entire image from changing.
By leveraging FAL’s Nano Banana editing endpoint with multiple reference images you can replicate the Nano Banana Edit module and achieve consistent, stylised edits across different scenes【107715456764548†screenshot】.
# Higgsfield – Product & Banana Placement modules
Under the **Edit** menu Higgsfield provides two related tools: **Banana Placement** and **Product Placement**. Both involve drawing a mask on an uploaded image and inserting, replacing or removing an object. The Banana Placement card describes the tool as offering “upgraded tools, effortless control, and stunning visuals—outline, tweak, done”【672258844854324†screenshot】. Product Placement invites users to “add products, swap clothes, and fix images — just draw a mask, adjust, and drop your edit”【470109149434081†screenshot】. Essentially, they are interactive inpainting tools tailored to adding new objects.
## Concept
These modules rely on an object insertion workflow. You supply an existing image, draw a mask to indicate where a new object should appear, and describe the object. A diffusion model then synthesises the object within the masked area while blending it seamlessly with the rest of the scene. Higgsfield differentiates between the two names to emphasise improvements (Banana Placement) or general purpose (Product Placement), but the underlying mechanism is similar.
## Implementation steps
1. **Image upload and masking** – Provide a UI for users to upload an image and draw a free‑form mask. This mask delineates the region for the object. Save the mask as a binary PNG. Optionally allow polygon selection for more precise boundaries.
2. **Prompt definition** – Ask the user to describe the object to insert (e.g. “yellow banana on a table” or “white sneakers facing forward”). Use your LLM to elaborate: include size, perspective, lighting, and style to match the original scene (“realistic, consistent lighting, matching shadows”). Include negative prompts (“no blur, no multiple objects”) to avoid mistakes.
3. **Call a diffusion model with inpainting** – Use FAL’s `controlnet/inpaint` or `stable-diffusion/inpaint` models. Provide the original image, mask and prompt. Example:
```python
import requests
import os
def insert_object(image_url, mask_url, prompt):
    api_url = 'https://api.fal.ai/v1/controlnet/inpaint'
    headers = {'Authorization': f'Key {os.environ["FAL_KEY"]}'}
    payload = {
        'image_url': image_url,
        'mask_url': mask_url,
        'prompt': prompt,
        'num_inference_steps': 50,
        'guidance_scale': 8,
        'strength': 0.9,  # strong influence on masked region
    }
    return requests.post(api_url, json=payload, headers=headers).json()['result'][0]
```
4. **Adjust position and scale** – Sometimes the synthesised object may not align perfectly. Provide controls for the user to re‑draw or adjust the mask and regenerate. Alternatively, implement a bounding‑box transformation after generation to reposition the inserted object. FAL does not reposition objects automatically, so careful prompt engineering is necessary (“place the object centred in the marked area”).
5. **Multiple objects and clothing swaps** – For clothing swaps (e.g. changing a shirt), mask the clothing region and describe the new garment. Use the same inpainting call but emphasise seamless integration with body contours. For multiple objects, either generate them together in one call or perform multiple passes with separate masks.
6. **Banana Placement enhancements** – The “Banana” version appears to incorporate improved control and visual guidance. To emulate this, implement interactive preview thumbnails: after generating the object, overlay it semi‑transparently on the original image before final compositing. Provide sliders for rotation and scale to fine‑tune placement. Once confirmed, merge the layers and return the final image.
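The preview overlay in step 6 can be approximated server‑side with Pillow. A minimal sketch that scales, rotates and semi‑transparently composites a cut‑out object (an RGBA PNG) over the original for user confirmation before the final merge:
```python
from PIL import Image

def preview_overlay(base_path, object_path, position, scale=1.0, angle=0, alpha=0.6):
    """Composite a generated object semi-transparently onto the base image for preview."""
    base = Image.open(base_path).convert('RGBA')
    obj = Image.open(object_path).convert('RGBA')
    if scale != 1.0:
        obj = obj.resize((int(obj.width * scale), int(obj.height * scale)))
    if angle:
        obj = obj.rotate(angle, expand=True)
    # Scale the object's existing alpha channel so transparency is preserved but faded.
    faded_alpha = obj.getchannel('A').point(lambda v: int(v * alpha))
    obj.putalpha(faded_alpha)
    base.alpha_composite(obj, dest=position)  # position = (x, y) of the top-left corner
    return base
```
Once the user confirms placement, repeat the composite at full opacity (or re‑run the inpainting call with the adjusted mask) to produce the final image.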
## Tips
* Use high‑quality source images; low resolution results in blurry insertions.
* Encourage the user to draw a precise mask that matches the object’s outline. A poor mask can cause bleeding or unnatural blending.
* If the inserted object should cast shadows or reflect light, include those details in the prompt (“with natural soft shadow on the table”).
By leveraging FAL’s inpainting models and a user‑friendly masking interface you can replicate the Banana Placement and Product Placement modules from Higgsfield【672258844854324†screenshot】【470109149434081†screenshot】. The key is to marry accurate masking with rich prompt engineering via your OpenRouter LLM to achieve realistic object insertion.
# Higgsfield – Soul ID Character module
Higgsfield promotes a **Soul ID** system allowing users to create a persistent digital character. This character can then be used in image and video generation modules to maintain a consistent identity across content. Attempts to access the Soul ID page (`/image/soul-id`) during exploration resulted in an “Oops Something went wrong” message【140959722222085†screenshot】, suggesting that the feature may be restricted or under development. Nevertheless, the concept aligns with the growing trend of training personalised models (e.g. DreamBooth or LoRA) for individual avatars.
## Conceptual overview
* **Persistent identity** – A Soul ID represents your likeness or brand style. Once trained, the model can generate images and videos where you appear in different poses, outfits and settings while retaining recognisable facial features and style.
* **Integration** – After training, the Soul ID can be selected within other modules (Create Image, Create Video, Fashion Factory, Multi Reference) to ensure consistency.
## Building your own Soul ID
To replicate this functionality using FAL’s infrastructure:
1. **Gather training data** – Collect 10–30 high‑quality images of the subject (yourself or a character) showing various angles, expressions and outfits. The images should have consistent lighting and minimal background clutter.
2. **Annotate and preprocess** – Crop images to focus on the subject. Resize them to a consistent size (512×512 or 768×768). Optionally remove backgrounds to help the model focus on the face and clothing.
3. **Fine‑tune a diffusion model** – Use FAL’s fine‑tuning service (if available) or an open‑source DreamBooth/LoRA pipeline to train a custom model on your images. Provide a unique token (e.g. `soulid_name`) as the activation word. The training script might look like:
```bash
# Example using DreamBooth script
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=seedream/4.0 \
  --instance_data_dir=/path/to/your/images \
  --instance_prompt="a photo of soulid_name person" \
  --output_dir=/output/soulid_model \
  --resolution=512 \
  --max_train_steps=800
```
FAL’s documentation may provide a simplified API for fine‑tuning: you upload your images and receive a model ID after training.
4. **Use the trained model** – Once training is complete, call the model via FAL by passing prompts that include your activation word. For example:
```python
result = fal.run('soulid_model_id', {
    'prompt': 'a portrait of soulid_name wearing a suit, studio lighting',
    'num_inference_steps': 30,
})
```
To use it in videos, select the model in the Create Video or Talk modules and ensure your prompts contain the activation word. This will maintain your likeness across frames.
## Considerations
* **Data privacy** – Only use images you own and consent to. Avoid training on sensitive or private photos.
* **Model size and cost** – Fine‑tuning can be resource intensive. FAL may charge credits for training and storing custom models.
* **Ethical use** – Don’t impersonate others without permission. Disclose when you use AI‑generated avatars in public content.
By training a custom diffusion model on your own images and integrating it into your generation pipeline, you can create a Soul ID‑like character that appears consistently across images and videos, recreating the intended functionality of Higgsfield’s Soul ID module even though the original page was unavailable【140959722222085†screenshot】.
# Higgsfield – Talking Avatar (Speak) module
Higgsfield’s **Speak** page turns a still image into a talking avatar synchronised to speech. The interface allows the user to upload or select a portrait, provide audio or text for speech, write a prompt to describe tone or expression, and choose a model. Available models include **Kling Speak**, **Higgsfield Speak 2.0**, and **Google Veo 3** (fast and regular). The page displays a preview of the talking head and fields for audio input【426874870666297†screenshot】.
## Core concept
The module performs **audio‑driven facial animation**: given a reference image and audio, generate a video in which the character’s lips and expressions match the speech. Some models also allow text input which is synthesised into speech and lip movements simultaneously.
## Mapping to FAL models
FAL offers endpoints such as `kling-speak`, `veo-3-speech`, and `speak-2.0`. These models require:
* **image_url** – a portrait of the character. Choose a high‑resolution front‑facing shot; the background will be kept or blurred depending on the model.
* **audio_url** or **text** – either supply pre‑recorded audio or a text script. When text is provided, the model uses an internal TTS engine to generate the voice.
* **prompt** – optional guidance about mood, emotion or style (“excited and friendly tone”). The OpenRouter LLM can help craft expressive prompts.
## Workflow
1. **Collect inputs** – Ask the user to upload a portrait image and either an audio file (MP3/WAV) or a text script. Provide a dropdown for selecting the model.
2. **Generate or process audio** – If the user supplies text, call a TTS engine (e.g. via OpenAI’s TTS or FAL’s built‑in speech) to produce a speech audio file (a TTS sketch follows this list). If they upload audio, ensure it is trimmed and normalised.
3. **Call the speaking model** – Example using FAL’s `kling-speak` model in Python:
```python
import requests
import os
def create_talking_avatar(image_url, audio_url=None, text=None, model='kling-speak', prompt=''):
    api_url = f'https://api.fal.ai/v1/{model}'
    payload = {
        'image_url': image_url,
        'prompt': prompt,
    }
    if audio_url:
        payload['audio_url'] = audio_url
    elif text:
        payload['text'] = text  # model will synthesise speech
    headers = {'Authorization': f'Key {os.environ["FAL_KEY"]}'}
    response = requests.post(api_url, json=payload, headers=headers)
    return response.json()['result'][0]  # returns video URL
```
For other models like `veo-3-speech`, adjust the endpoint accordingly. Some models support language selection or voice style; include these options in the payload if available.
4. **Preview and refine** – Display the generated video to the user. Provide controls to re‑generate with a different emotional prompt or adjust the script. If lip‑sync issues occur, adjust the prompt (“match lip movements precisely”). Use LLM suggestions for more natural expressions.
5. **Combine with other modules** – The speaking avatar can be incorporated into the **UGC Factory** or **Create Video** modules by placing the talking character into a larger scene. To do this, generate the talking head video with transparent background (if supported) and composite it over the scene using ffmpeg or a video editing library.
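For step 2, when only a script is provided, here is a hedged sketch of text‑to‑speech using the OpenAI Python SDK; the model and voice names are examples, and any TTS service that returns an audio file fits the same slot:
```python
from openai import OpenAI

def synthesise_speech(text, out_path='narration.mp3', voice='alloy'):
    """Turn a script into an MP3 that can be uploaded and passed as audio_url."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.audio.speech.create(model='tts-1', voice=voice, input=text)
    with open(out_path, 'wb') as f:
        f.write(response.content)  # binary audio payload
    return out_path
```
Upload the resulting file to storage and pass its URL as `audio_url` to the speaking model.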
## Tips
* Choose a portrait where the face is clearly visible and front‑facing for best lip‑sync performance.
* Keep audio lengths within the model’s supported duration (typically up to 30 s). For longer speeches, split into segments and generate separate clips.
* Use the prompt to control facial expressions (“serious and authoritative”) or head movement (“slight nodding”).
By combining audio or text with a reference image and using FAL’s speaking models you can recreate Higgsfield’s talking avatar feature【426874870666297†screenshot】, enabling lifelike video messages from static photos.
# Higgsfield – UGC Factory module
**UGC Factory** is a multi‑step video creation wizard designed to produce user‑generated content (UGC) style videos. Visiting `?ugc-studio=true` opens a panel with steps: **Template**, **Image**, **Action**, **Audio text**, **Audio settings**, and **Background**【54652604494918†screenshot】. Templates are categorised (GENERAL, SELFIE, SELLING, ASMR, PODCAST, CAR TALKING, etc.) and determine camera motion and framing. This module is ideal for creating social‑media ready clips featuring a talking avatar interacting with a product or performing an action.
## Deconstructing the wizard
1. **Template selection** – The user chooses a template that defines shot composition (e.g. selfie close‑up, product reveal, ASMR). Each template implies a default motion path (hand‑held look, slow zoom) and a recommended video model. For example, some templates pair with `Google Veo 3 Fast` or `Minimax Hailuo 02`【54652604494918†screenshot】. In your implementation, create a JSON mapping of template names to model identifiers and motion presets (a sketch of such a mapping follows this list).
2. **Image / product upload** – The next step prompts the user to upload a product or character image. This image becomes the subject of the video. Use FAL’s speaking or video models with an `image_url` parameter to place the subject in the scene. For product shots, ensure the image has a transparent background or mask out the background.
3. **Action definition** – The user describes the action (“man pulls out a bottle”, “woman applies cream to her face”). Use your OpenRouter LLM to rewrite this into a cinematic prompt and to infer the necessary camera and actor movements. Combine this with the template’s preset motion.
4. **Audio text and settings** – Users can either write a script for narration or choose to generate voice automatically. Use a TTS model to convert text to speech. Offer settings for voice gender, tone and language. Some templates (e.g. ASMR) may require whispering or soft tones.
5. **Background and environment** – Finally, the user selects or describes the background. Provide options for plain colours, realistic rooms or abstract environments. Use the LLM to align the background description with the product and template style.
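A minimal sketch of the template mapping mentioned in step 1; the template names echo the interface categories, while the model identifiers, motion presets and durations are this guide's assumptions:
```python
TEMPLATE_PRESETS = {
    'Selfie': {
        'model': 'veo-3-fast',
        'motion': 'hand-held look, slight camera shake, close-up framing',
        'duration': 3,
    },
    'Selling': {
        'model': 'hailuo-02-fast',
        'motion': 'slow zoom onto the product, steady framing',
        'duration': 5,
    },
    'ASMR': {
        'model': 'seedance-pro',
        'motion': 'static macro shot, shallow depth of field',
        'duration': 5,
    },
}

def resolve_template(name):
    """Look up the model, motion preset and duration for a chosen template."""
    preset = TEMPLATE_PRESETS[name]
    return preset['model'], preset['motion'], preset['duration']
```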
## Generating the video
Once all fields are collected:
1. **Compose the prompt** – Combine the template’s style, action description, background and any additional notes into a single prompt. For example: “a first‑person selfie video of a young woman applying face cream in a bright bathroom, gentle camera shake, warm lighting, soft focus”. Include voice instructions (“speak in a calm and friendly tone”).
2. **Select the model** – Use the mapping from template to model to choose between `veo-3-fast`, `hailuo-02-fast`, `seedance-pro` or `kling-2.1`. Fast models are ideal for UGC due to shorter durations (3–5 s)【54652604494918†screenshot】.
3. **Prepare assets** – If using a talking avatar, first generate a speaking head with the TTS script (see the **Talking Avatar** module). Then composite it into the UGC scene. For product inserts, use the **Product Placement** or **Banana Placement** workflow to integrate the product image into the generated frames【405327199234345†L69-L124】.
4. **Call the video API** – Use FAL’s video model endpoint similar to the **Create Video** module, but pass additional parameters for `voice_audio_url` or `frame_0` (the product image). Example:
```python
payload = {
    'prompt': combined_prompt,
    'frame_0': product_image_url,
    'voice_audio_url': narration_audio_url,
    'frames': 45,  # 3 s at 15 fps
    'num_inference_steps': 40,
}
result = fal.run('veo-3-fast', payload)
```
5. **Preview and refine** – Present the generated video. Allow the user to adjust the action or script and regenerate. Provide options to change template or background without starting over.
## Additional considerations
* **Aspect ratios** – Social media platforms favour portrait (9:16) or square (1:1) videos. Ensure the selected model supports the chosen aspect ratio or crop the output accordingly.
* **Duration** – Limit videos to 3–5 seconds as per template guidelines. Longer durations require more computation and may reduce realism.
* **Voice synchronisation** – When combining speaking avatars with product actions, align the audio track with visual cues (e.g. the product appears as it is mentioned). Use a timeline editor or script your prompts precisely.
By orchestrating template‑based prompts, product insertion and lip‑synced narration using FAL’s video and speech models, you can rebuild the UGC Factory wizard found on Higgsfield. The step‑by‑step approach ensures novices can craft professional‑looking UGC videos【54652604494918†screenshot】【405327199234345†L69-L124】.
# Higgsfield – Video Upscale module
Although not fully explored during the site walkthrough, Higgsfield advertises a **Video Upscale** feature alongside its image and video generation tools. The concept is to enhance the resolution and clarity of generated or uploaded clips. While the interface resembles the image Upscale panel (offering scale factors like ×2 or ×4), the underlying process involves applying super‑resolution to each frame and reconstructing the video.
## Approach
To replicate video upscaling using FAL and open‑source tooling:
1. **Split the video into frames** – Use `ffmpeg` to extract individual frames from the original video. For example:
```bash
ffmpeg -i input.mp4 -qscale:v 1 frames/frame_%05d.png
```
This writes high‑quality PNGs into a `frames` directory.
2. **Upscale each frame** – Call a FAL super‑resolution model on each frame. Reuse the same technique described in the **Image Upscale** module, but automate it in a loop:
```python
import glob
from multiprocessing.pool import ThreadPool

from your_upscale_function import upscale_image  # from image_upscale.md

def process_frame(path):
    url = upload_to_storage(path)               # upload frame to S3 or similar (your own helper)
    upscaled_url = upscale_image(url, scale=4)  # calls FAL
    download(upscaled_url, path.replace('frames', 'frames_upscaled'))  # your own download helper

paths = sorted(glob.glob('frames/*.png'))
ThreadPool(4).map(process_frame, paths)
```
Adjust the number of worker threads based on your API rate limits.
3. **Reassemble the video** – Use `ffmpeg` to stitch the upscaled frames back into a video. Ensure the frame rate matches the original (e.g. 15 fps):
```bash
ffmpeg -framerate 15 -i frames_upscaled/frame_%05d.png -c:v libx264 -pix_fmt yuv420p upscaled.mp4
```
4. **(Optional) Audio handling** – Extract the original audio track with `ffmpeg -i input.mp4 -vn -acodec copy audio.aac` and merge it with the upscaled video: `ffmpeg -i upscaled.mp4 -i audio.aac -c:v copy -c:a aac -strict experimental final_output.mp4`.
## Model selection
FAL offers video‑specific super‑resolution models (e.g. `video-superres`) that process short clips directly. If available, you can skip splitting frames and call:
```python
payload = {'video_url': input_video_url, 'scale': 4}
result = fal.run('video-superres', payload)
```
where `input_video_url` is a publicly accessible link to your video. The response will contain a URL to the upscaled video. Check FAL’s documentation for supported scales and durations.
## Integration with generation workflows
* When using the **Create Video** module, add an “Upscale after generation” toggle. If enabled, automatically run the upscaling pipeline after receiving the generated video.
* For talking avatars produced by the **Speak** module, upscaling can improve facial details and reduce compression artefacts.
While the exact Higgsfield interface wasn’t explored, the above steps provide a practical blueprint for implementing a Video Upscale feature using FAL and standard video processing tools.