Self-host with Docker#

Run the withoutBG open weights v3 model on your own hardware. Four published images cover the two common deployment shapes: API only or web app + API, in CPU or GPU variants.

All images bundle the ONNX model (v4.1.0), the FastAPI inference service, and license metadata. No Hugging Face token or separate model download is needed at runtime.

Pick the right image#

Start here if you are not sure which one to pull.

| You want to… | Pull this image | Port |
| --- | --- | --- |
| Try background removal in a browser on any machine | app-cpu | 8080 |
| Give non-technical users a drag-and-drop UI | app-cpu or app-gpu | 8080 |
| Call the API from your own app, script, or workflow | service-cpu or service-gpu | 8000 |
| Deploy behind a load balancer or in Kubernetes | service-cpu or service-gpu | 8000 |
| Process images as fast as possible on an NVIDIA GPU | service-gpu or app-gpu | 8000 / 8080 |
| Run on a VPS, laptop, or CI runner with no GPU | service-cpu or app-cpu | 8000 / 8080 |

Default recommendation: pull withoutbg/withoutbg-openweights-v3-app-cpu:latest if you want to see it working in under a minute. Switch to service-* when you are integrating programmatically and do not need the UI.

When Docker is not the best fit:

Quick start#

Web app (most people start here)#

Open http://localhost:8080 after the model warms up.

docker run --rm -p 8080:8080 withoutbg/withoutbg-openweights-v3-app-cpu:latest

First startup loads and warms the model (usually 10–30 seconds on CPU). Wait until /health returns {"status":"ok"} before uploading large batches.

API only#

Headless FastAPI server on port 8000.

docker run --rm -p 8000:8000 withoutbg/withoutbg-openweights-v3-service-cpu:latest

On app images, prefix paths with /api (e.g. /api/v1/remove-background on port 8080).
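The path rule above can be captured in a small helper. This is a minimal sketch, assuming the default ports (8000 for service images, 8080 for app images) are mapped unchanged on the host; `api_url` is a hypothetical name and not part of any withoutBG package.

```python
def api_url(host: str, path: str, app_image: bool = False) -> str:
    """Build an endpoint URL for a service-* or app-* container.

    Assumptions: service images expose the API directly on port 8000,
    app images proxy the same API under /api on port 8080, and the
    host port mapping matches the container port.
    """
    port = 8080 if app_image else 8000
    prefix = "/api" if app_image else ""
    return f"http://{host}:{port}{prefix}{path}"


print(api_url("localhost", "/v1/remove-background"))
print(api_url("localhost", "/v1/remove-background", app_image=True))
```

If you remap the host port (for example `-p 8081:8080`), the port in the URL has to change with it.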

API examples

# Liveness
curl http://localhost:8000/health

# Readiness (503 until model is warmed up)
curl http://localhost:8000/ready

# Remove background — works on macOS and Linux
IMAGE_B64=$(base64 < photo.jpg | tr -d '\n')
curl -X POST http://localhost:8000/v1/remove-background \
  -H "Content-Type: application/json" \
  -d "{\"image\":\"${IMAGE_B64}\"}"

Request body:

Request schema

{
  "image": "<raw base64 string, or data:image/jpeg;base64,...>"
}
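For callers that are not using curl, the request body can be built in a few lines. The sketch below uses only the Python standard library; `build_request_body` is a hypothetical helper name, and the only field it sets is the `image` field from the schema above.

```python
import base64
import json
from typing import Optional


def build_request_body(image_bytes: bytes, mime_type: Optional[str] = None) -> str:
    """Return the JSON body for POST /v1/remove-background.

    With mime_type=None the image is sent as a raw base64 string.
    With a MIME type (e.g. "image/jpeg") it is sent as a data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    if mime_type is not None:
        encoded = f"data:{mime_type};base64,{encoded}"
    return json.dumps({"image": encoded})


# Tiny stand-in payload; in practice read the bytes from an image file.
print(build_request_body(b"abc"))
print(build_request_body(b"abc", mime_type="image/jpeg"))
```

Base64 encoding inflates the payload to roughly four thirds of the file size, which is worth keeping in mind when a proxy enforces a body size limit.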

Response fields:

| Field | Description |
| --- | --- |
| processed | Transparent PNG cutout as a data URL |
| alphaMatte | Grayscale alpha matte as a data URL |
| latencyMs | Server-side inference time in milliseconds |

Example response:

Response schema

{
  "processed": "data:image/png;base64,...",
  "alphaMatte": "data:image/png;base64,...",
  "latencyMs": 842
}
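Both image fields arrive as data URLs, so a client has to strip the header and decode the base64 payload before writing a PNG file. A minimal sketch, assuming the response has exactly the three fields shown above; `RemovalResult`, `decode_data_url`, and `parse_response` are hypothetical names.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class RemovalResult:
    cutout_png: bytes       # decoded "processed" field
    alpha_matte_png: bytes  # decoded "alphaMatte" field
    latency_ms: float       # "latencyMs" as reported by the server


def decode_data_url(data_url: str) -> bytes:
    """Decode a base64 data URL such as data:image/png;base64,... to bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)


def parse_response(body: str) -> RemovalResult:
    """Parse the JSON response body into decoded PNG bytes plus latency."""
    payload = json.loads(body)
    return RemovalResult(
        cutout_png=decode_data_url(payload["processed"]),
        alpha_matte_png=decode_data_url(payload["alphaMatte"]),
        latency_ms=float(payload["latencyMs"]),
    )
```

The decoded `cutout_png` bytes can be written straight to disk, for example with `open("cutout.png", "wb").write(result.cutout_png)`.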

Error responses:

| Status | Meaning |
| --- | --- |
| 503 | Model not ready. Warmup still running; retry after /ready returns 200. |
| 400 | Invalid or undecodable image payload. |
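A client can avoid the 503 case by polling the readiness endpoint before sending work. The following is a sketch using only the standard library; the function names, the 120-second timeout, and the 2-second interval are illustrative assumptions, not documented defaults.

```python
import time
import urllib.error
import urllib.request


def ready_status(url: str) -> int:
    """Return the HTTP status of a GET to the readiness URL, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the model is still warming up
    except OSError:
        return 0  # connection refused or reset: container not listening yet


def wait_until_ready(url, timeout_s=120.0, interval_s=2.0,
                     probe=ready_status, sleep=time.sleep) -> bool:
    """Poll until the probe returns 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Typical use would be `wait_until_ready("http://localhost:8000/ready")` right after `docker run`. The `probe` and `sleep` parameters exist so the loop can be exercised without a running container.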

Full schema and try-it-out UI: http://localhost:8000/docs

GPU images#

Requires an NVIDIA GPU and the NVIDIA Container Toolkit.

GPU run commands

# API
docker run --rm --gpus all -p 8000:8000 \
  withoutbg/withoutbg-openweights-v3-service-gpu:latest

# Web app
docker run --rm --gpus all -p 8080:8080 \
  withoutbg/withoutbg-openweights-v3-app-gpu:latest

If the container starts but inference is slow, check logs for the active ONNX Runtime provider. GPU images require CUDAExecutionProvider; they will fail fast at startup if CUDA is unavailable.

The four images#

Docker Hub namespace: withoutbg/withoutbg-openweights-v3-*

Service images (API only)#

Headless FastAPI server. Use these for backends, batch jobs, microservices, and production deployments.

| Image | Runtime | Typical use |
| --- | --- | --- |
| withoutbg/withoutbg-openweights-v3-service-cpu | ONNX Runtime CPU | Laptops, small VPS, CI, dev/staging |
| withoutbg/withoutbg-openweights-v3-service-gpu | ONNX Runtime CUDA | Workstations and servers with NVIDIA GPU |

Endpoints (direct on port 8000):

| Path | Method | Purpose |
| --- | --- | --- |
| /health | GET | Liveness: process is up |
| /ready | GET | Readiness: model loaded and warmed up |
| /v1/remove-background | POST | Remove background |
| /v1/licenses | GET | Product and upstream licenses |
| /docs | GET | Interactive OpenAPI / Swagger UI |

App images (web UI + API)#

Same inference service, plus a static Next.js UI served by nginx. The API is proxied under /api.

| Image | Runtime | Typical use |
| --- | --- | --- |
| withoutbg/withoutbg-openweights-v3-app-cpu | CPU | Local demos, internal tools, small teams |
| withoutbg/withoutbg-openweights-v3-app-gpu | CUDA | GPU-accelerated browser workflow |

What you get on port 8080:

  • / — drag-and-drop background removal UI
  • /api/v1/remove-background — same API as the service image
  • /health — proxied liveness check

At a glance#

Service vs app#

| | Service (service-*) | App (app-*) |
| --- | --- | --- |
| Includes UI | No | Yes (drag-and-drop editor) |
| Default port | 8000 | 8080 |
| API base path | /v1/... | /api/v1/... |
| Image size | Smaller (no static UI assets) | Slightly larger |
| Best for | Integrations, automation, K8s | Demos, internal tools, manual QA |

CPU vs GPU#

| | CPU | GPU |
| --- | --- | --- |
| Setup | docker run only | NVIDIA driver + Container Toolkit |
| Image size | Smaller (slim Python base) | Larger (CUDA runtime) |
| Cost | Runs anywhere | Needs NVIDIA hardware |
| Best for | Dev, low volume, edge | Batch processing, high volume, latency-sensitive |

Docker Compose#

For local or server deployment with published images:

compose.yaml

services:
  app:
    image: withoutbg/withoutbg-openweights-v3-app-cpu:latest
    ports:
      - "8080:8080"
    restart: unless-stopped

  # Uncomment for API-only deployment:
  # api:
  #   image: withoutbg/withoutbg-openweights-v3-service-cpu:latest
  #   ports:
  #     - "8000:8000"
  #   restart: unless-stopped

  # GPU variant (requires nvidia-container-toolkit):
  # api-gpu:
  #   image: withoutbg/withoutbg-openweights-v3-service-gpu:latest
  #   ports:
  #     - "8000:8000"
  #   deploy:
  #     resources:
  #       reservations:
  #         devices:
  #           - driver: nvidia
  #             count: all
  #             capabilities: [gpu]

Save as compose.yaml, then:

Start compose stack

docker compose up -d

The source repository also ships a docker-compose.yml wired to locally built image names. Use that when developing from source with docker buildx bake.

Image tags#

| Tag | When to use |
| --- | --- |
| :latest | Convenience for local testing and demos |
| :sha-XXXXXXX | Pin to an exact build from CI (recommended for production) |
| :3.x.y | Semver pin, published when a v3.* git tag is pushed |

For reproducible deployments, prefer a sha-* or semver tag over latest.
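A deployment script can enforce this rule before it pulls anything. The check below is a rough sketch rather than a full Docker reference parser; `is_pinned` is a hypothetical helper, and it treats a missing tag as unpinned because Docker then defaults to `latest`.

```python
def is_pinned(image_ref: str) -> bool:
    """Return True if the image reference uses a digest or a non-latest tag."""
    if "@" in image_ref:
        return True  # digest reference such as name@sha256:...
    # Only look for a tag in the last path segment, so a registry port
    # like registry:5000/name is not mistaken for a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag: Docker falls back to latest
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "latest"


# sha-XXXXXXX is the placeholder pattern from the tag table, not a real tag.
print(is_pinned("withoutbg/withoutbg-openweights-v3-service-cpu:latest"))
print(is_pinned("withoutbg/withoutbg-openweights-v3-service-cpu:sha-XXXXXXX"))
```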

System requirements#

Rough guidance. Actual latency depends on image size and hardware.

CPU images#

  • RAM: 4 GB minimum, 8 GB recommended (model + ONNX Runtime + FastAPI)
  • Disk: ~2 GB pulled image size
  • CPU: Any x86_64 or arm64 host that runs Docker; inference is slower on low-core machines
  • GPU: Not required

Expect roughly 2–15 seconds per image on a modern laptop CPU for a typical photo.

GPU images#

  • GPU: NVIDIA GPU with CUDA support (CUDA 12.x runtime in image)
  • Driver: Recent NVIDIA driver compatible with CUDA 12.4
  • RAM: 8 GB system RAM recommended
  • Disk: ~5–8 GB pulled image size (CUDA base layer)

GPU images are significantly faster for repeated inference but have a larger footprint and require NVIDIA Container Toolkit on the host.

Production notes#

Health checks#

  • Use /health for liveness (is the process running?).
  • Use /ready for readiness (is the model loaded?). It returns 503 until warmup completes. This matters for a Kubernetes readinessProbe, which keeps traffic away from a cold container.

Kubernetes probes (service image on port 8000):

Kubernetes probes

readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10

Scaling#

  • Each container runs a single uvicorn worker with one loaded model session. Scale horizontally by running multiple containers behind a load balancer rather than increasing --workers.
  • Inference is serialized per container (thread lock on the model). For high throughput, run several replicas.
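Because inference is serialized, one container can complete at most one image per inference interval, which gives a simple lower bound on replica count. The sketch below works that arithmetic; `replicas_needed` is a hypothetical helper, and it ignores request decoding, PNG encoding, and network time, so treat the result as a floor rather than a sizing guarantee.

```python
import math


def replicas_needed(target_images_per_s: float, latency_ms: float) -> int:
    """Lower bound on replicas when each container handles one image at a time."""
    per_replica_rate = 1000.0 / latency_ms  # images per second per container
    return max(1, math.ceil(target_images_per_s / per_replica_rate))


# Using the latencyMs value from the example response (842 ms):
# one replica handles about 1.19 images/s, so 5 images/s needs 5 replicas.
print(replicas_needed(5, 842))
```

Measure `latencyMs` on your own hardware and image sizes before relying on the number.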

Security#

  • Images run as non-root user withoutbg (uid 1000).
  • No outbound network is required after the image is built. The model is baked in at build time.
  • Put a reverse proxy (nginx, Caddy, Traefik) in front for TLS termination if exposed beyond localhost.

Request limits#

  • App images accept uploads up to 50 MB via nginx (client_max_body_size).
  • Input images are letterboxed to 1024×1024 for inference; output is returned at the original resolution.
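To make the letterboxing step concrete, the geometry can be worked out by hand. This sketch assumes the conventional approach (aspect-preserving resize, then centered padding to a square); the service's exact rounding and padding placement are not documented here, and `letterbox_geometry` is a hypothetical name.

```python
def letterbox_geometry(width: int, height: int, target: int = 1024):
    """Return (scale, (new_w, new_h), (pad_x, pad_y)) for a centered letterbox."""
    # Scale so the longer side fits the target while keeping aspect ratio.
    scale = min(target / width, target / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Split the leftover space evenly; any odd pixel goes to the far side.
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)


# A 4000x3000 photo is resized to 1024x768 with 128 px of padding
# above and below.
print(letterbox_geometry(4000, 3000))
```

Since the output is returned at the original resolution, input size mainly affects upload size and decode time rather than the resolution the model sees.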

Troubleshooting#

| Symptom | Likely cause and fix |
| --- | --- |
| 503 on /ready or inference | Model still warming up (10–30 s on CPU). Poll /ready until it returns 200 before sending traffic. |
| GPU container exits immediately | CUDA not available on the host. Install NVIDIA drivers and Container Toolkit, or use a CPU image. |
| Inference slower than expected on GPU | Check container logs for the ONNX Runtime provider. You want CUDAExecutionProvider, not CPUExecutionProvider. |
| Port already allocated | Change the host port mapping, e.g. -p 8081:8080. |
| 400 on /v1/remove-background | Image payload is not valid base64, or the decoded bytes are not a supported image format (JPEG, PNG, WebP, etc.). |
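When chasing a 400, it can help to check the file's leading bytes before encoding it. The sketch below recognizes only the three formats named in the table by their standard file signatures; `sniff_image_format` is a hypothetical helper, and a `None` result does not prove the server would reject the file, since other formats may be accepted.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess JPEG, PNG, or WebP from the file signature, else return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(sniff_image_format(b"<html>not an image</html>"))
```

A common cause of the `None` case is an HTML error page or a truncated download saved under an image extension.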

Build from source#

To build locally instead of pulling from Docker Hub:

Build from source

git clone https://github.com/withoutbg/withoutbg-inference.git
cd withoutbg-inference
docker buildx bake -f docker-bake.hcl          # all four images
docker buildx bake -f docker-bake.hcl app-cpu  # single target
docker compose up app-cpu                       # run locally built image

CI publishes to Docker Hub on every push to main and on v3.* release tags.

| Resource | Link |
| --- | --- |
| Open weights model overview | /open-weights-model |
| Model on Hugging Face | withoutbg/withoutbg-openweights-onnx |
| Source code | withoutbg/withoutbg-inference |
| Python package | /docs/open-weights-model/python |
| Open model license | /open-weights-model/license |
| Third-party notices | THIRD_PARTY_NOTICES.md |

Docker Hub repos#