Vespa · Reverse image search

Find the needle in a million photos.

Drop any image. We turn it into a 768-dimensional embedding with DINOv2, then ask Vespa to return the most visually similar photographs from ImageNet-1k.

DINOv2 base · 768d · bfloat16 · Vespa HNSW · cosine similarity
How a reverse image search runs

The query image

Whatever you upload: a selfie, a sunset, a sneaker. We resize its longer side to 224px and feed the raw pixels into the encoder.

PIL.Image → processor(images, return_tensors='pt')
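A minimal sketch of that step, assuming the Hugging Face transformers library and the facebook/dinov2-base checkpoint (the page only says "DINOv2 base"; the resize policy shown is the processor's default, not necessarily the longer-side rule above):

from PIL import Image
from transformers import AutoImageProcessor

# Assumed checkpoint name; substitute the one actually deployed.
processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")

image = Image.open("query.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")  # pixel_values: (1, 3, 224, 224)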

DINOv2 embedding

A self-supervised Vision Transformer turns the image into a single 768-dim vector. We L2-normalize it so cosine similarity and dot product agree.

model(**inputs).last_hidden_state[:,0]
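Continuing the sketch with the same assumed checkpoint: the CLS token of the final layer is the image vector, L2-normalized so cosine similarity reduces to a dot product.

import torch
from transformers import AutoModel

# Assumed checkpoint, matching the processor above.
model = AutoModel.from_pretrained("facebook/dinov2-base").eval()

with torch.no_grad():
    outputs = model(**inputs)

embedding = outputs.last_hidden_state[:, 0]            # CLS token, shape (1, 768)
embedding = torch.nn.functional.normalize(embedding)   # unit L2 norm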

Vespa nearest-neighbor

Vespa stores every ImageNet vector in a bfloat16 HNSW graph. A single YQL query walks the graph and returns the K closest points by prenormalized-angular distance.

{targetHits:100}nearestNeighbor(embedding, q)
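A hypothetical way to run that query with the pyvespa client, assuming a deployed application whose schema uses the embedding field and closeness rank-profile named on this page:

from vespa.application import Vespa

app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

response = app.query(body={
    "yql": "select * from sources * where {targetHits:100}nearestNeighbor(embedding, q)",
    "ranking": "closeness",
    "input.query(q)": embedding[0].tolist(),  # the normalized 768-dim query vector
    "hits": 10,
})

for hit in response.hits:
    print(hit["relevance"], hit["id"])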

Top-K results

The closest vectors become the nearest images, ranked by similarity. If your query image already lives in the index, it comes back as the first result.

rank-profile closeness · first-phase
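For reference, a schema sketch consistent with the setup described above; the tensor type and HNSW tuning values are assumptions, not the deployed configuration:

field embedding type tensor<bfloat16>(x[768]) {
    indexing: attribute | index
    attribute {
        distance-metric: prenormalized-angular
    }
    index {
        hnsw {
            # assumed tuning, not taken from the page
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 200
        }
    }
}

rank-profile closeness {
    inputs {
        query(q) tensor<float>(x[768])
    }
    first-phase {
        expression: closeness(field, embedding)
    }
}

With prenormalized-angular, Vespa skips re-normalizing vectors at query time, which is why the embeddings are L2-normalized before indexing and querying.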