intro
Let’s learn everything you need to know about pgvectorscale, the extension that complements pgvector for managing vector data in PostgreSQL.
Today, vectors are everywhere, and PostgreSQL is among the databases that support them thanks to the pgvector extension. The problem is that pgvector by itself does not always deliver the highest level of performance. That is exactly why pgvectorscale was created as a complementary extension!
In this complete guide, you will learn what pgvectorscale is, how to install it, the advantages it offers, and see some practical examples of how to use it.
Let’s dive in!
What is pgvectorscale?
pgvectorscale is an open-source extension for PostgreSQL that builds on top of pgvector. Its end goal is to deliver higher-performance embedding search and more cost-efficient storage for AI applications.
More specifically, pgvectorscale complements the pgvector extension by introducing the following innovations:
For more information, refer to the official presentation blog post.
Note that pgvectorscale must be used in a PostgreSQL setup where pgvector is already installed. If you are not familiar with the latter extension, read our guide on pgvector. Unlike pgvector, which is written in C, pgvectorscale leverages Rust through the PGRX framework.
In short, pgvector enables storing and querying vector embeddings directly within PostgreSQL.
How and When to Use pgvectorscale
In the next two subchapters, we will explore how to get started with pgvectorscale and why you should consider using it.
Installation Procedure
pgvectorscale is developed by TigerData, a company that creates extensions to make PostgreSQL a valid open-source alternative to Pinecone for AI data storing. TigerData is also the team behind TimescaleDB, an open-source time-series database built on top of PostgreSQL.
That is the reason why the official installation instructions for pgvectorscale primarily focus on TimescaleDB. However, we will also cover how to install it on plain PostgreSQL.
To install pgvectorscale with TimescaleDB, refer to these official guides:
If you want to install pgvectorscale on vanilla PostgreSQL, start by installing it from the source with this Bash script:
1
# install Rust
2
curl --proto '=https' --tlsv1.2 -sSf <https://sh.rustup.rs> | sh
3
4
# download pgvectorscale
5
cd /tmp
6
git clone --branch <version> <https://github.com/timescale/pgvectorscale>
7
cd pgvectorscale/pgvectorscale
8
# install cargo-pgrx with the same version as pgrx
9
cargo install --locked cargo-pgrx --version $(cargo metadata --format-version 1 | jq -r '.packages[] | select(.name == "pgrx") | .version')
10
cargo pgrx init --pg17 pg_config
11
# build and install pgvectorscale
12
cargo pgrx install --release
Now, register the extension in your Postgres server by connecting to your server using a PostgreSQL database client like DbVisualizer and running this CREATE EXTENSION query:
1
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

Note: The CASCADE option ensures that pgvector (which pgvectorscale depends on) is installed automatically if it is not already present.
Great! You just installed pgvectorscale in PostgreSQL.
Main Reasons to Use pgvectorscale
pgvectorscale brings faster and more cost-effective vector search capabilities directly into PostgreSQL, improving a lot over pgvector. This can make a PostgreSQL setup a strong alternative to managed services like Pinecone, achieving better performance at a fraction of the cost.
In particular, pgvectorscale is perfect for building AI-driven applications that rely on efficient similarity search across large datasets. That is true when you want full control over your database stack without paying the high fees that come with third-party vector databases.
For more guidance, follow the official tutorial that shows how to use it to build a fast, AI-powered recommendation system in PostgreSQL.
pgvectorscale: Main Features and Capabilities
Discover the main features, options, and innovations introduced by pgvectorscale.
StreamingDiskANN Indexing in pgvectorscale
First, start by creating a table with an embedding column, as below:
1
CREATE TABLE IF NOT EXISTS document_embedding (
2
id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
3
metadata JSONB,
4
contents TEXT,
5
embedding VECTOR(1536)
6
);
Note that the embedding column must be of type VECTOR, a data type introduced by the pgvector extension.
You now have to populate the embedding column using an AI or ML pipeline. Find out more about this process in our article on vector databases.
To enable fast similarity search, create a StreamingDiskANN index on the embedding column:
1
CREATE INDEX document_embedding_idx
2
ON document_embedding
3
USING diskann (embedding vector_cosine_ops);
Note the USING diskann (embedding vector_cosine_ops) syntax. USING diskann tells Postgres to build the index using the StreamingDiskANN method provided by pgvectorscale, instead of a traditional B-tree or GiST index.
Then, (embedding vector_cosine_ops) defines the column and the operator class to use, which is vector_cosine_ops in this case. vector_cosine_ops is an operator class provided by pgvectorscale to compute cosine distance for nearest neighbor search.
To retrieve the top 10 closest vectors, you can now write:
1
SELECT *
2
FROM document_embedding
3
ORDER BY embedding <=> $1
4
LIMIT 10;

Note: This syntax mirrors pgvector . pgvectorscale supports cosine (<=>), L2 (<->), and inner product (<#>) distances.
Filtered Vector Search with Labels and Metadata
One of pgvectorscale’s standout innovations is filtered vector search. This features combines similarity search with structured filters.
There are two main types of filtering you can combine in a single query when using pgvectorscale:
In label-based filtering, you use label arrays for categories, tags, or access controls. When you build the diskann index, make sure it includes the label column:
1
CREATE TABLE documents (
2
id SERIAL PRIMARY KEY,
3
embedding VECTOR(1536),
4
labels SMALLINT[], -- category labels
5
status TEXT,
6
created_at TIMESTAMPTZ
7
);
8
9
CREATE INDEX ON documents
10
USING diskann (embedding vector_cosine_ops, labels);
Now, you can run filtered similarity queries that execute directly inside the index for low latency:
1
SELECT *
2
FROM documents
3
WHERE labels && ARRAY[1, 3] -- has label 1 or 3
4
ORDER BY embedding <=> '[...]' -- replace [...] with your query vector
5
LIMIT 10;
This approach leverages the DiskANN algorithm combined with Microsoft’s filtered search technique, delivering both high recall and speed on large datasets.
Instead, thanks to arbitrary WHERE filtering, you can also combine similarity search with any standard SQL filters:
1
SELECT *
2
FROM documents
3
WHERE status = 'active' AND created_at > '2024-01-01' -- sample WHERE filter
4
ORDER BY embedding <=> '[...]' -- replace [...] with your query vector
5
LIMIT 10;
Note: The filtering conditions are applied after the vector search. That means the entire similarity search runs first, and then the results are filtered by the WHERE clause. For tighter, performance-critical filtering, prefer label-based filtering.
Tips and Other Things You Need to Know about pgvectorscale
Explore the remaining aspects you should know to master pgvectorscale.
Adding Semantic Meaning to Labels
Labels must be stored as smallint[] for index efficiency, but you can still map them to meaningful names using a separate table, as shown below:
1
CREATE TABLE label_definitions (
2
id INTEGER PRIMARY KEY,
3
name TEXT,
4
description TEXT,
5
attributes JSONB
6
);
7
8
INSERT INTO label_definitions (id, name, description)
9
VALUES (1, 'science', 'Scientific content'),
10
(2, 'technology', 'Tech content'),
11
(3, 'business', 'Business content');
Then apply a JOIN clause to enrich results:
1
SELECT d.*, array_agg(l.name) AS label_names
2
FROM documents d
3
JOIN label_definitions l ON l.id = ANY(d.labels)
4
WHERE d.labels && ARRAY[1] -- 'science'
5
GROUP BY d.id
6
ORDER BY d.embedding <=> '[...]' -- replace [...] with your query vector
7
LIMIT 10;
This lets you keep performance while your query returns more readable categories.
Tuning pgvectorscale for Your Workload
pgvectorscale exposes both build-time and query-time parameters so you can adjust the speed vs. accuracy tradeoff.
When you create the index, customize options like num_neighbors, num_dimensions, and storage_layout. Apply these options through a WITH clause, as in this example:
1
CREATE INDEX document_embedding_idx
2
ON document_embedding
3
USING diskann (embedding)
4
WITH (num_neighbors=50); -- custom options
You may also want to increase maintenance_work_mem during index creation to handle large datasets:
1
SET maintenance_work_mem = '2GB';
Other Aspects
Conclusion
In this blog post, you will learn that pgvectorscale is a Postgres extension that builds on pgvector to achieve high-performance and cost-effective vector search. It works with both TimescaleDB and PostgreSQL, expanding them with optimized search capabilities for AI applications.
As demonstrated in this article, PostgreSQL database management becomes easier with a feature-rich visual tool like DbVisualizer. Here, we only scratch the surface of all the features it offers, including query optimization, drag-and-drop query building, ERD-style schema generation, streamlined import/export, and more. Download DbVisualizer for free today!
FAQ
What is the difference between pgvector and pgvectorscale?
Find the answer in the pgvector vs pgvectorscale comparison table below:
| pgvector | pgvectorscale | |
|---|---|---|
| Developed by | Open-source community | TigerData/TimescaleDB team |
| Programming language | C | Rust |
| Requirements | — | Rust, pgvector |
| Purpose | Vector data extension for PostgreSQL | Complements pgvector for enhanced performance |
| Index types supported | IVF, HNSW, Flat | StreamingDiskANN |
| Compression | None / basic | SBQ compression to reduce storage and speed up traversal |
| Filtering support | Basic (WHERE clause) | Label-based filtered vector search + arbitrary WHERE clause filtering |
| Use case focus | General vector similarity search in Postgres | Large-scale, cost-efficient AI embedding workloads in Postgres |
| GitHub stars | 16.6.k+ | 2.1k+ |
Is pgvectorscale open source?
Yes, pgvectorscale is an open-source project developed by the TimescaleDB team using Rust. As of this writing, the PostgreSQL extension for vector search has 2.1k stars on GitHub and 16 contributors.
What performance advantages do pgvector and pgvectorscale provide?
According to a benchmark by the TimescaleDB team on a dataset of 50 million Cohere embeddings (each with 768 dimensions), PostgreSQL with pgvector and pgvectorscale delivers 28x lower p95 latency and 16x higher query throughput than Pinecone’s storage-optimized (s1) index for approximate nearest neighbor queries at 99% recall.
What is TimescaleDB?
TimescaleDB is an open-source time-series database built on top of PostgreSQL. In detail, it extends Postgres's capabilities to handle time-series data efficiently. Some of the main features it offers include automatic partitioning, optimized data storage, and fast query performance for time-based data. At the time of writing, the project has more than 19k stars on GitHub.
Does pgvectorscale work with vanilla PostgreSQL, or only with TimescaleDB?
pgvectorscale can be used with vanilla PostgreSQL. So, to take advantage of it, you do not necessarily need TimescaleDB specifically for pgvectorscale to function. In other words, you can install and use the extension also on your local or remote Postgres setup.
Where can I learn more about PostgreSQL database extensions?
To learn more about database extensions, follow our blog and stay up to date by reading the best SQL books.

