withoutBG OSS: Open Source Background Removal Model & Training Pipeline

Most AI background removers are closed boxes. withoutBG OSS is open. You can see exactly how it is trained, reproduce it, and adapt it for your needs. The pipeline trains two models over three datasets, moving from a coarse matte to a clean, production-ready alpha mask.

High-Level Flow

The training pipeline has three stages. First, a MattingModel takes RGB and depth inputs and produces a coarse alpha matte. Second, a RefinerModel trained on in-the-wild data refines that matte. Third, the same RefinerModel is trained further on synthetic data to produce the final high-quality alpha matte.

Step 1: MattingModel (Dataset: Matting)

Goal: get a coarse alpha matte from RGB plus depth.

Input

Input Concatenation

Concatenate RGB image with inverse depth to form a 4-channel tensor

# Inputs
# I: RGB image, shape (H, W, 3), normalized to [0, 1]
# D: inverse depth, shape (H, W, 1), normalized to [0, 1]
X = concat([I, D])  # (H, W, 4)
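The concatenation above can be sketched in NumPy as follows. This is a minimal illustration, not the project's loader: the helper names `normalize_inverse_depth` and `make_matting_input` are hypothetical, and both the min-max scaling and the assumption that the depth map arrives as positive metric depth are choices made here rather than details stated in the source.

```python
import numpy as np


def normalize_inverse_depth(depth, eps=1e-6):
    """Convert positive depth to inverse depth, then min-max scale to [0, 1].

    Min-max scaling is an assumption; the source only says D is normalized.
    """
    inv = 1.0 / np.maximum(depth.astype(np.float32), eps)
    lo, hi = inv.min(), inv.max()
    return (inv - lo) / max(hi - lo, eps)


def make_matting_input(rgb_uint8, depth):
    """Stack RGB and inverse depth into an (H, W, 4) float32 array."""
    I = rgb_uint8.astype(np.float32) / 255.0        # (H, W, 3) in [0, 1]
    D = normalize_inverse_depth(depth)[..., None]   # (H, W, 1) in [0, 1]
    return np.concatenate([I, D], axis=-1)          # (H, W, 4)


# Tiny synthetic example so the sketch runs on its own.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
depth = rng.uniform(0.5, 10.0, size=(8, 8))
X = make_matting_input(rgb, depth)
print(X.shape, X.dtype)
```

If your depth estimator already outputs inverse depth (disparity), skip the reciprocal and only rescale.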

Target

Training Target

Supervise with ground truth alpha

Y = alpha_gt  # 1 channel

Output

Forward Pass

Predict alpha and clamp to [0, 1]

alpha_raw  = MattingModel(X)
alpha_pred = clamp(alpha_raw, 0.0, 1.0)
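To make the clamp step concrete, here is a small NumPy sketch. `standin_matting_model` is a hypothetical placeholder, not the real network, and the L1 loss is an assumption for this stage: the source only says the model is supervised with ground truth alpha.

```python
import numpy as np


def standin_matting_model(X):
    """Hypothetical placeholder for MattingModel: (H, W, 4) -> (H, W, 1).

    It deliberately returns values outside [0, 1] so the clamp has work to do.
    """
    return 1.5 * X[..., 3:4] - 0.25


def l1(a, b):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


rng = np.random.default_rng(0)
X = rng.random((8, 8, 4), dtype=np.float32)
alpha_gt = (X[..., 3:4] > 0.5).astype(np.float32)   # toy ground truth

raw = standin_matting_model(X)          # unconstrained network output
alpha_pred = np.clip(raw, 0.0, 1.0)     # valid alpha values only
loss = l1(alpha_pred, alpha_gt)         # assumed supervised loss
print(alpha_pred.shape, loss)
```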

After training:

Inference

Cache alpha_coarse for later stages

alpha_coarse = MattingModel.infer(I, D)
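One simple way to cache the coarse mattes is one `.npy` file per sample. The sketch below is an assumption about storage, not the project's actual format; `cache_alpha`, `load_alpha`, and the `sample_0001` id are hypothetical names.

```python
import tempfile
from pathlib import Path

import numpy as np


def cache_alpha(cache_dir, sample_id, alpha_coarse):
    """Write one coarse matte to <cache_dir>/<sample_id>.npy and return the path."""
    path = Path(cache_dir) / f"{sample_id}.npy"
    np.save(path, alpha_coarse.astype(np.float32))
    return path


def load_alpha(cache_dir, sample_id):
    """Read a cached coarse matte back into memory."""
    return np.load(Path(cache_dir) / f"{sample_id}.npy")


# Round trip through a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    alpha_coarse = np.linspace(0.0, 1.0, 16, dtype=np.float32).reshape(4, 4, 1)
    cache_alpha(tmp, "sample_0001", alpha_coarse)
    restored = load_alpha(tmp, "sample_0001")

print(np.array_equal(restored, alpha_coarse))
```

Caching means the refiner stages never need to run the matting model during training.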

Step 2: RefinerModel (Dataset: RefinerInTheWild)

Goal: sharpen edges and fix coarse matte issues.

Input

Input and Output

Refiner consumes RGB, depth, and the coarse alpha

# Inputs
alpha_coarse = from_step_1
X = concat([I, D, alpha_coarse])  # (H, W, 5)

# Output
delta_alpha = RefinerModel(X)
alpha_pred  = clamp(alpha_coarse.detach() + delta_alpha, 0.0, 1.0)
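The residual formulation can be sketched in NumPy like this. `standin_refiner` is a hypothetical placeholder for RefinerModel. NumPy has no autograd, so the `.detach()` call has no counterpart here; in a framework such as PyTorch it would stop the refiner's loss from updating the matting model.

```python
import numpy as np


def standin_refiner(X):
    """Hypothetical placeholder for RefinerModel: (H, W, 5) -> residual (H, W, 1)."""
    alpha_coarse = X[..., 4:5]
    # Push values away from 0.5, a crude stand-in for edge sharpening.
    return 0.5 * (alpha_coarse - 0.5)


def refine(I, D, alpha_coarse, refiner):
    """Predict a correction to the coarse matte and clamp the sum to [0, 1]."""
    X = np.concatenate([I, D, alpha_coarse], axis=-1)   # (H, W, 5)
    delta_alpha = refiner(X)
    return np.clip(alpha_coarse + delta_alpha, 0.0, 1.0)


rng = np.random.default_rng(0)
I = rng.random((8, 8, 3), dtype=np.float32)
D = rng.random((8, 8, 1), dtype=np.float32)
alpha_coarse = rng.random((8, 8, 1), dtype=np.float32)
alpha_pred = refine(I, D, alpha_coarse, standin_refiner)
print(alpha_pred.shape)
```

Predicting a residual means a refiner that outputs zeros reproduces the coarse matte unchanged, which gives training a sensible starting point.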

Target

  • Use alpha_gt if available
  • Otherwise, rely on self-supervised or consistency losses
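The choice between the two targets can be sketched as below. The source does not say which consistency loss is used, so the horizontal-flip consistency term here is purely an illustrative assumption, and `refiner_loss`, `flip_consistency_loss`, and `predict` are hypothetical names.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def flip_consistency_loss(predict, X):
    """Assumed consistency term: a mirrored input should yield the mirrored matte."""
    alpha = predict(X)
    alpha_from_flipped = predict(X[:, ::-1, :])[:, ::-1, :]
    return l1(alpha, alpha_from_flipped)


def refiner_loss(predict, X, alpha_gt=None):
    """Supervised L1 when alpha_gt exists, otherwise the consistency term."""
    if alpha_gt is not None:
        return l1(predict(X), alpha_gt)
    return flip_consistency_loss(predict, X)


def predict(X):
    # Hypothetical stand-in that works per pixel, so it is exactly flip-consistent.
    return np.clip(X[..., 4:5], 0.0, 1.0)


rng = np.random.default_rng(0)
X = rng.random((8, 8, 5), dtype=np.float32)
supervised = refiner_loss(predict, X, alpha_gt=np.ones((8, 8, 1), dtype=np.float32))
unsupervised = refiner_loss(predict, X)
print(supervised, unsupervised)
```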

Step 3: RefinerModel (Dataset: RefinerSynthetic)

Goal: learn from perfect synthetic foreground and background pairs to improve transparency and fine edges.

Output and Losses

Recomposition and Loss

Combine alpha with synthetic F and B, and optimize alpha and compositional losses

# Same input as in-the-wild stage
# Synthetic dataset also provides F (foreground), B (background)

delta_alpha = RefinerModel(X)
alpha_pred  = clamp(alpha_coarse.detach() + delta_alpha, 0.0, 1.0)

# Recompose
I_recon = alpha_pred * F + (1.0 - alpha_pred) * B

# Losses
L_alpha = l1(alpha_pred, alpha_gt)
L_comp  = l1(I_recon, I)
L_total = lambda_alpha * L_alpha + lambda_comp * L_comp
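The recomposition and the two loss terms can be checked numerically with a short NumPy sketch. The default weights of 1.0 are placeholders chosen for illustration; the source does not give values for `lambda_alpha` or `lambda_comp`.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def composite(alpha, F, B):
    """Standard alpha compositing: alpha * F + (1 - alpha) * B."""
    return alpha * F + (1.0 - alpha) * B


def total_loss(alpha_pred, alpha_gt, F, B, I, lambda_alpha=1.0, lambda_comp=1.0):
    """Return (L_total, L_alpha, L_comp) for one sample."""
    L_alpha = l1(alpha_pred, alpha_gt)
    L_comp = l1(composite(alpha_pred, F, B), I)
    return lambda_alpha * L_alpha + lambda_comp * L_comp, L_alpha, L_comp


rng = np.random.default_rng(0)
F = rng.random((8, 8, 3))
B = rng.random((8, 8, 3))
alpha_gt = rng.random((8, 8, 1))
I = composite(alpha_gt, F, B)           # synthetic image built from known layers

# A perfect prediction gives zero loss; a shifted one is penalized by both terms.
perfect, _, _ = total_loss(alpha_gt, alpha_gt, F, B, I)
shifted = np.clip(alpha_gt + 0.1, 0.0, 1.0)
off, L_alpha, L_comp = total_loss(shifted, alpha_gt, F, B, I)
print(perfect, off)
```

The compositional term only makes sense on synthetic data, because it needs the exact F and B that produced I.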

Data Augmentation

Crop all tensors in sync to 256x256 and sample crops that cross alpha boundaries more often. This improves edge quality and robustness.

Boundary-Aware Cropping

Random synchronized cropping that prefers boundary crossings

# Random, synchronized crops that prefer crossing alpha boundaries
X_crop, Y_crop = RandomCrop(X, Y, size=(256, 256))
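One possible way to bias crops toward boundaries is sketched below. It is not the project's sampler. The function name `boundary_aware_crop`, the `p_boundary` parameter, and the definition of a boundary pixel as one with strictly fractional alpha are all assumptions made for illustration.

```python
import numpy as np


def boundary_aware_crop(X, Y, size=(256, 256), p_boundary=0.75, rng=None):
    """Crop X and Y with the same window.

    With probability p_boundary the window is centered (as far as the image
    borders allow) on a pixel whose alpha is strictly between 0 and 1.
    Otherwise the window position is uniform random.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = Y.shape[:2]
    ch, cw = size
    if H < ch or W < cw:
        raise ValueError("image is smaller than the crop size")

    ys, xs = np.nonzero((Y[..., 0] > 0.0) & (Y[..., 0] < 1.0))
    if len(ys) > 0 and rng.random() < p_boundary:
        k = rng.integers(len(ys))
        top = int(np.clip(ys[k] - ch // 2, 0, H - ch))
        left = int(np.clip(xs[k] - cw // 2, 0, W - cw))
    else:
        top = int(rng.integers(0, H - ch + 1))
        left = int(rng.integers(0, W - cw + 1))

    window = (slice(top, top + ch), slice(left, left + cw))
    return X[window], Y[window]       # same window keeps tensors in sync


rng = np.random.default_rng(0)
Y = np.zeros((512, 512, 1), dtype=np.float32)
Y[:, 300:] = 1.0
Y[:, 296:300] = 0.5                   # thin soft edge
X = rng.random((512, 512, 5), dtype=np.float32)
X_crop, Y_crop = boundary_aware_crop(X, Y, p_boundary=1.0, rng=rng)
print(X_crop.shape, Y_crop.shape)
```

Keeping `p_boundary` below 1.0 leaves some pure foreground and pure background crops in the mix, so the model still sees easy regions.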

Quick Channel Reference

| Stage | Input Channels | Target |
| --- | --- | --- |
| Matting | [R, G, B, D] | alpha_gt |
| RefinerInTheWild | [R, G, B, D, alpha_coarse] | alpha_gt or consistency loss |
| RefinerSynthetic | [R, G, B, D, alpha_coarse] | alpha_gt plus compositional loss |

Why Depth Helps

Depth is like a free cheat sheet for segmentation. If the model knows what is closer, it can often guess what is the subject. Transparent objects, fine hair, and low-contrast clothing become less of a nightmare.

Reproduce This

  1. Train MattingModel on a high-quality matting dataset with depth priors
  2. Cache alpha_coarse for your dataset
  3. Train RefinerModel first on in-the-wild data, then on synthetic data
  4. Tune lambda_alpha and lambda_comp to trade off edge sharpness vs global consistency