withoutBG OSS: Open Source Background Removal Model & Training Pipeline
Most AI background removers are closed boxes. withoutBG OSS is open. You can see exactly how it is trained, reproduce it, and adapt it for your needs. The pipeline trains two models over three datasets, moving from a coarse matte to a clean, production-ready alpha mask.
High-Level Flow
The training pipeline has three stages. First, a MattingModel takes RGB and inverse-depth inputs and produces a coarse alpha matte. Second, a RefinerModel trained on in-the-wild data refines that matte. Third, the same RefinerModel is trained further on synthetic data to produce the final high-quality alpha matte.
Step 1: MattingModel (Dataset: Matting)
Goal: get a coarse alpha matte from RGB plus depth.
Input
Input Concatenation
Concatenate RGB image with inverse depth to form a 4-channel tensor
```python
# Inputs
# I: RGB image, shape (H, W, 3), normalized to [0, 1]
# D: inverse depth, shape (H, W, 1), normalized to [0, 1]
X = concat([I, D])  # (H, W, 4)
```
Target
Training Target
Supervise with ground truth alpha
```python
Y = alpha_gt  # 1 channel
```
Output
Forward Pass
Predict alpha and clamp to [0, 1]
```python
delta_alpha = MattingModel(X)
alpha_pred = clamp(delta_alpha, 0.0, 1.0)
```
After training:
Inference
Cache alpha_coarse for later stages
```python
alpha_coarse = MattingModel.infer(I, D)
```
Step 2: RefinerModel (Dataset: RefinerInTheWild)
Goal: sharpen edges and fix coarse matte issues.
Input
Input and Output
```python
# Inputs
alpha_coarse = from_step_1
X = concat([I, D, alpha_coarse])  # (H, W, 5)

# Output
delta_alpha = RefinerModel(X)
alpha_pred = clamp(alpha_coarse.detach() + delta_alpha, 0.0, 1.0)
```
Target
- Use alpha_gt if available
- Otherwise, rely on self-supervised or consistency losses
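
The residual update used in this stage can be sketched in NumPy as follows. The `toy_refiner` function is a hypothetical stand-in for RefinerModel, not the real network, and the `refine` helper name is also an assumption. NumPy has no autograd, so `alpha_coarse.detach()` reduces to treating the coarse matte as a fixed input: the refiner learns a correction, and no gradient flows back into the MattingModel.

```python
import numpy as np

def toy_refiner(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for RefinerModel: returns a small correction.

    x has shape (H, W, 5) with channels [R, G, B, D, alpha_coarse].
    """
    alpha_coarse = x[..., 4:5]
    # Push values away from 0.5, a crude way to sharpen soft edges.
    return 0.2 * (alpha_coarse - 0.5)

def refine(image, inv_depth, alpha_coarse, refiner=toy_refiner):
    """Residual refinement: alpha_pred = clamp(alpha_coarse + delta_alpha, 0, 1)."""
    x = np.concatenate([image, inv_depth, alpha_coarse], axis=-1)  # (H, W, 5)
    delta_alpha = refiner(x)
    return np.clip(alpha_coarse + delta_alpha, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))         # RGB in [0, 1]
inv_depth = rng.random((4, 4, 1))     # inverse depth in [0, 1]
alpha_coarse = rng.random((4, 4, 1))  # cached output of Step 1
alpha_pred = refine(image, inv_depth, alpha_coarse)
```

Because the refiner predicts a correction rather than a full matte, a zero output leaves the coarse matte unchanged.
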
Step 3: RefinerModel (Dataset: RefinerSynthetic)
Goal: learn from perfect synthetic foreground and background pairs to improve transparency and fine edges.
Output and Losses
Recomposition and Loss
Combine alpha with synthetic F and B, and optimize alpha and compositional losses
```python
# Same input as in-the-wild stage
# Synthetic dataset also provides F (foreground), B (background)

delta_alpha = RefinerModel(X)
alpha_pred = clamp(alpha_coarse.detach() + delta_alpha, 0.0, 1.0)

# Recompose
I_recon = alpha_pred * F + (1.0 - alpha_pred) * B

# Losses
L_alpha = l1(alpha_pred, alpha_gt)
L_comp = l1(I_recon, I)
L_total = lambda_alpha * L_alpha + lambda_comp * L_comp
```
Data Augmentation
Crop all tensors in sync to 256x256 and sample crops that cross alpha boundaries more often. This improves edge quality and robustness.
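
One way to bias crops toward alpha boundaries is sketched below in NumPy. The boundary test (alpha strictly between 0 and 1), the `boundary_prob` parameter, and the function name are assumptions made for illustration; the pipeline's actual sampling rule may differ, and a hard-edged matte with no soft pixels would need a different boundary test.

```python
import numpy as np

def boundary_aware_crop(x, y, size=(256, 256), boundary_prob=0.7, rng=None):
    """Crop x (H, W, C) and y (H, W, 1) with the same window.

    With probability boundary_prob the window is centered, as far as the
    image borders allow, on a pixel whose alpha is strictly between 0 and 1.
    Otherwise the window is sampled uniformly.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = y.shape[:2]
    ch, cw = size
    if h < ch or w < cw:
        raise ValueError("image is smaller than the crop size")
    alpha = y[..., 0]
    ys, xs = np.nonzero((alpha > 0.0) & (alpha < 1.0))  # soft-edge pixels
    if len(ys) > 0 and rng.random() < boundary_prob:
        i = rng.integers(len(ys))
        top = int(np.clip(ys[i] - ch // 2, 0, h - ch))
        left = int(np.clip(xs[i] - cw // 2, 0, w - cw))
    else:
        top = int(rng.integers(0, h - ch + 1))
        left = int(rng.integers(0, w - cw + 1))
    window = (slice(top, top + ch), slice(left, left + cw))
    return x[window], y[window]  # identical window keeps tensors in sync
```

Passing `boundary_prob=0.0` recovers plain uniform random cropping, which makes it easy to compare the two sampling strategies.
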
Boundary-Aware Cropping
Random synchronized cropping that prefers boundary crossings
```python
# Random, synchronized crops that prefer crossing alpha boundaries
X_crop, Y_crop = RandomCrop(X, Y, size=(256, 256))
```
Quick Channel Reference
| Stage | Input Channels | Target |
|---|---|---|
| Matting | [R, G, B, D] | alpha_gt |
| RefinerInTheWild | [R, G, B, D, alpha_coarse] | alpha_gt or consistency loss |
| RefinerSynthetic | [R, G, B, D, alpha_coarse] | alpha_gt plus compositional loss |
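
The channel layouts in the table can be assembled as in the NumPy sketch below. The helper names and the min-max normalization of inverse depth are assumptions for illustration; the source only states that both inputs are normalized to [0, 1].

```python
import numpy as np

def normalize_inverse_depth(depth, eps=1e-6):
    """Convert a positive depth map (H, W) to inverse depth in [0, 1].

    Min-max scaling is an assumption; any mapping to [0, 1] fits the table.
    """
    inv = 1.0 / np.maximum(depth, eps)
    span = inv.max() - inv.min()
    inv = (inv - inv.min()) / max(span, eps)
    return inv[..., None].astype(np.float32)  # (H, W, 1)

def matting_input(rgb_uint8, depth):
    """[R, G, B, D] input for the Matting stage, shape (H, W, 4)."""
    rgb = rgb_uint8.astype(np.float32) / 255.0
    return np.concatenate([rgb, normalize_inverse_depth(depth)], axis=-1)

def refiner_input(rgb_uint8, depth, alpha_coarse):
    """[R, G, B, D, alpha_coarse] input for both Refiner stages, shape (H, W, 5)."""
    base = matting_input(rgb_uint8, depth)
    return np.concatenate([base, alpha_coarse.astype(np.float32)], axis=-1)
```

With this scaling the nearest pixel maps to 1 and the farthest to 0, so larger values in the D channel mean "closer to the camera".
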
Why Depth Helps
Depth is like a free cheat sheet for segmentation. If the model knows what is closer to the camera, it can often guess which object is the subject. Transparent objects, fine hair, and low-contrast clothing become less of a nightmare.
Reproduce This
- Train MattingModel on a high-quality matting dataset with depth priors
- Cache alpha_coarse for your dataset
- Train RefinerModel on in-the-wild data, then continue training it on synthetic data
- Tune lambda_alpha and lambda_comp to trade off edge sharpness vs global consistency
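
To see how `lambda_alpha` and `lambda_comp` enter the objective, the Step 3 loss can be sketched in NumPy as below. Mean absolute error is assumed for `l1`, the function name is hypothetical, and the default weights of 1.0 are placeholders rather than values from the pipeline.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error (assumed reduction for the l1 term)."""
    return float(np.mean(np.abs(a - b)))

def refiner_synthetic_loss(alpha_pred, alpha_gt, fg, bg, image,
                           lambda_alpha=1.0, lambda_comp=1.0):
    """Return (L_total, L_alpha, L_comp) for the synthetic stage."""
    # alpha has shape (H, W, 1) and broadcasts over the RGB channels.
    recon = alpha_pred * fg + (1.0 - alpha_pred) * bg
    loss_alpha = l1(alpha_pred, alpha_gt)
    loss_comp = l1(recon, image)
    total = lambda_alpha * loss_alpha + lambda_comp * loss_comp
    return total, loss_alpha, loss_comp

rng = np.random.default_rng(0)
fg, bg = rng.random((8, 8, 3)), rng.random((8, 8, 3))
alpha_gt = rng.random((8, 8, 1))
image = alpha_gt * fg + (1.0 - alpha_gt) * bg  # synthetic composite
total, l_alpha, l_comp = refiner_synthetic_loss(alpha_gt, alpha_gt, fg, bg, image)
```

Logging `L_alpha` and `L_comp` separately, as the tuple return allows, makes it easier to see which term a change in the weights is actually moving.
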