Vector Search for Vehicle Matching: How Semantic Search Is Changing Used Car Discovery


Traditional search works by matching. You specify criteria — make, model, year, mileage, fuel type — and the system returns everything that satisfies all of them. It is deterministic, predictable, and effective for simple queries. It is also brittle: a car that is almost exactly what you want, but deviates on one field, does not appear in results at all.

Vector search works differently. Instead of matching criteria, it measures similarity. Two vehicles can be "near" each other in meaning even if they do not share identical field values. This article explains how that works, why it matters for vehicle discovery in a dealer context, and what the practical implications are for stock recommendation systems.


Why traditional keyword search fails for vehicle discovery

Consider a dealer with a strong buying record for Range Rover Sport MHEVs: black, 2021 to 2023, below 40,000 miles, Grade 1 or better. They run a search using these exact criteria. They get results — but only results that match every field exactly. A Defender MHEV with an almost identical spec profile does not appear, because the model field does not match. A car that matches on every dimension except colour does not appear either.

The binary nature of keyword matching means you either match or you do not. There is no concept of "close." There is no mechanism for the search to say: "You asked for X, and we do not have X, but here are three things that are very similar to X and might be worth considering."

For dealer buying — where the goal is to find the best available opportunity in the market right now — this means systematically missing relevant stock because the search terms do not perfectly align with the listing data.

What vector embeddings are — and why they are useful for vehicle data

An embedding is a way of representing an object — in this case, a vehicle — as a list of numbers. Specifically, as a point in a high-dimensional mathematical space. The key property of a good embedding is that objects which are similar in meaning end up close together in that space, and objects which are dissimilar end up far apart.

For vehicles, this means that a black Range Rover Sport MHEV 2022 with 35,000 miles and full service history occupies a point in embedding space. A black Defender MHEV 2022 with 38,000 miles and full service history occupies a nearby point — because the two vehicles share most of the attributes that matter for buying decisions. A diesel Vauxhall Astra 2018 with 90,000 miles occupies a very distant point.

Once vehicles are represented as embeddings, finding similar vehicles becomes a matter of measuring distance in that space — a well-defined mathematical operation that can be performed efficiently at scale across tens of thousands of listings.
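The distance measurement above can be sketched in a few lines. This is a toy illustration: the four-dimensional vectors and their values are invented for the example (real embeddings have hundreds of dimensions and are produced by an encoding model), but the cosine-similarity arithmetic is the standard operation.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: close to 1.0 means very similar, lower means less so."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional embeddings (illustrative values only).
range_rover_sport = [0.82, 0.41, 0.33, 0.20]
defender          = [0.79, 0.45, 0.30, 0.22]
old_astra         = [0.10, 0.90, 0.05, 0.70]

print(cosine_similarity(range_rover_sport, defender))   # close to 1.0
print(cosine_similarity(range_rover_sport, old_astra))  # much lower
```

The two similar vehicles score near 1.0; the dissimilar one scores far lower. "Nearest neighbour" search is simply finding the listings that maximise this score (or minimise a distance) against a target vector.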

How vehicle attributes get turned into embedding vectors

The process of encoding a vehicle into an embedding vector involves normalising structured fields — make, model, derivative, year, mileage, fuel type, colour, condition grade — into numerical representations. Categorical fields like colour require mapping to a common representation before encoding ("Santorini Black" and "Narvik Black" should be understood as essentially equivalent).

Richer attributes — full specification lists, feature flags, service history notes — can also be included, giving the embedding a more complete picture of the vehicle. The more information encoded into the vector, the more nuanced the similarity comparisons become.

The resulting vector is typically a list of hundreds or thousands of numbers. No single number has an obvious meaning in isolation. The meaning lives in the relationships between vectors — how close or far one vehicle is from another across all of those dimensions simultaneously.
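A minimal sketch of the normalisation step, under stated assumptions: the field names, scaling ranges, and colour map below are all illustrative, and a production system would typically use a learned embedding model rather than hand-built features. The sketch shows the two moves the text describes: scaling numeric fields to a common range, and canonicalising categorical values like colour before encoding.

```python
# Illustrative colour canonicalisation: trade names map to a common value.
COLOUR_CANONICAL = {
    "santorini black": "black",
    "narvik black": "black",
    "black": "black",
    "fuji white": "white",
}

FUEL_TYPES = ["petrol", "diesel", "mhev", "phev", "bev"]

def encode_vehicle(year: int, mileage: int, fuel: str, colour: str) -> list[float]:
    # Numeric fields: scale to roughly [0, 1] so no single field dominates distance.
    year_dim = (year - 2000) / 30.0
    mileage_dim = min(mileage, 150_000) / 150_000.0
    # Categorical fields: canonicalise, then one-hot encode.
    fuel_dims = [1.0 if fuel.lower() == f else 0.0 for f in FUEL_TYPES]
    colour_dim = 1.0 if COLOUR_CANONICAL.get(colour.lower()) == "black" else 0.0
    return [year_dim, mileage_dim, colour_dim, *fuel_dims]

v1 = encode_vehicle(2022, 35_000, "MHEV", "Santorini Black")
v2 = encode_vehicle(2022, 38_000, "MHEV", "Narvik Black")
```

Note how the two black MHEVs end up with nearly identical vectors even though their colour strings differ: the canonicalisation step is what makes "Santorini Black" and "Narvik Black" land on the same dimension.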

pgvector and Supabase — the infrastructure behind similarity search at scale

Storing and querying embedding vectors at the scale a live stock recommendation system demands calls for specialised infrastructure. This is where pgvector and Supabase come in.

pgvector is an extension for PostgreSQL that adds native support for vector storage and similarity search. Instead of treating vehicle embeddings as opaque blobs of data, pgvector allows you to store them as a native column type and run nearest-neighbour queries against them directly in the database: "find me the 20 vehicles most similar to this target vector."
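A hedged sketch of what such a query looks like. The `vehicles` table and `embedding` column are hypothetical names for illustration; the operators are pgvector's own (`<=>` is cosine distance, `<->` Euclidean, `<#>` negative inner product). The helper just builds the SQL string; executing it requires a live PostgreSQL instance with the extension installed.

```python
# Assumes a hypothetical table: vehicles(id, make, model, derivative, embedding vector(N))
def similar_vehicles_sql(k: int = 20) -> str:
    """Build a pgvector nearest-neighbour query for the k most similar vehicles."""
    return (
        "SELECT id, make, model, derivative, "
        "embedding <=> %(target)s::vector AS distance "
        "FROM vehicles "
        "ORDER BY embedding <=> %(target)s::vector "
        f"LIMIT {k}"
    )

# With a driver such as psycopg, this would be executed roughly as:
#   cur.execute(similar_vehicles_sql(), {"target": target_embedding})
```

Ordering by the distance operator is what lets pgvector use an approximate-nearest-neighbour index (HNSW or IVFFlat) instead of scanning every row.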

Supabase is a managed platform built on top of PostgreSQL that provides the operational infrastructure — hosting, authentication, real-time capabilities, API generation — that turns a database into a production application. The combination of pgvector for vector operations and Supabase for platform infrastructure gives a vehicle matching system a solid, scalable foundation without requiring bespoke database engineering.

The practical result is that nearest-neighbour queries across a stock database of tens of thousands of vehicles can be performed in milliseconds — fast enough to run in real time as new stock arrives.

Where vector search works well and where rule-based filtering still wins

Vector search is not a replacement for rule-based filtering — it is a complement to it.

Rule-based filtering is best for hard constraints: non-negotiable criteria that must be satisfied regardless of similarity. No outstanding finance. No insurance write-off category markers. Grade 1 minimum on certain models. Maximum mileage-per-year ratios. These are binary conditions and should be applied as a hard pre-filter before similarity search runs.

Vector search is best for ranking what remains after hard filters have been applied. Of the cars that pass your non-negotiable criteria, which ones are most similar to the vehicles you consistently choose to buy?

A well-designed system uses both: rules to eliminate the clearly inappropriate, vectors to rank the candidates that survive.
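The rules-then-vectors pipeline can be sketched as follows. The vehicle record shape, the specific hard-constraint checks, and the buyer-profile vector are all illustrative assumptions, not the production schema; the point is the two-stage structure.

```python
import math

def passes_hard_filters(v: dict) -> bool:
    # Binary, non-negotiable conditions, applied before any similarity scoring.
    return (
        not v["outstanding_finance"]
        and v["category_marker"] is None
        and v["grade"] <= 2  # illustrative: lower grade number = better condition
    )

def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(stock: list[dict], buyer_profile: list[float], k: int = 3) -> list[dict]:
    # Stage 1: rules eliminate the clearly inappropriate.
    candidates = [v for v in stock if passes_hard_filters(v)]
    # Stage 2: vectors rank the candidates that survive.
    candidates.sort(key=lambda v: distance(v["embedding"], buyer_profile))
    return candidates[:k]
```

A car with outstanding finance never reaches the ranking stage, no matter how similar its embedding is to the buyer's profile, which is exactly the division of labour the text describes.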

Real-world results — what semantic matching surfaces that keyword search misses

The practical difference shows up most clearly at the edges of a buying brief — in the cars that a keyword search would exclude but that an experienced buyer would recognise as relevant.

A dealer whose buying history is concentrated in Range Rover Sport MHEV stock might, through vector similarity, see a Defender MHEV surfaced with an almost identical spec profile and a comparable price-to-CAP ratio. The model field differs; the commercial profile is nearly identical. A keyword search would never show this car. A similarity search surfaces it because the embedding captures what the keyword filter cannot.

Similarly, a search against a target spec where a panoramic roof is a key feature might surface vehicles whose listings describe that option under a different name — "glass panoramic sunroof," "panoramic glazed roof," "electric panoramic roof" — variants that a keyword match on a single term would miss.

These are not exotic edge cases. They are the kinds of near-misses that characterise keyword search at scale. Over hundreds of buying decisions per month, the cumulative impact of systematically finding relevant stock that keyword search misses is meaningful.


Reco Engine uses pgvector-powered similarity search to match auction and online stock against your portfolio — going beyond keyword filters to surface genuinely relevant cars. Learn more on the founding members page.