AI
POSTGRESQL
Vectors

pgvectorscale: An Extension for Improved Vector Search in Postgres

intro

Let’s learn everything you need to know about pgvectorscale, the extension that complements pgvector for managing vector data in PostgreSQL.

Tools used in the tutorial
Tool Description Link
Dbvisualizer DBVISUALIZER
TOP RATED DATABASE MANAGEMENT TOOL AND SQL CLIENT

Today, vectors are everywhere, and PostgreSQL is among the databases that support them thanks to the pgvector extension. The problem is that pgvector by itself does not always deliver the highest level of performance. That is exactly why pgvectorscale was created as a complementary extension!

In this complete guide, you will learn what pgvectorscale is, how to install it, the advantages it offers, and see some practical examples of how to use it.

Let’s dive in!

What is pgvectorscale?

pgvectorscale is an open-source extension for PostgreSQL that builds on top of pgvector. Its end goal is to deliver higher-performance embedding search and more cost-efficient storage for AI applications.

More specifically, pgvectorscale complements the pgvector extension by introducing the following innovations:

  • StreamingDiskANN Index: A specialized index built for high-performance, cost-efficient ANN (Approximate Nearest Neighbor) search. The extension stores part of the index on disk, so that it can scale to much larger datasets.
  • Statistical Binary Quantization (SBQ): A compression technique that reduces storage requirements and speeds up index traversal.
  • Label-based filtered vector search: Opens the door to efficient filtering based on labels. It combines vector similarity search with metadata constraints.

For more information, refer to the official presentation blog post.

Note that pgvectorscale must be used in a PostgreSQL setup where pgvector is already installed. If you are not familiar with the latter extension, read our guide on pgvector. Unlike pgvector, which is written in C, pgvectorscale leverages Rust through the PGRX framework.

In short, pgvector enables storing and querying vector embeddings directly within PostgreSQL.

How and When to Use pgvectorscale

In the next two subchapters, we will explore how to get started with pgvectorscale and why you should consider using it.

Installation Procedure

pgvectorscale is developed by TigerData, a company that creates extensions to make PostgreSQL a valid open-source alternative to Pinecone for AI data storing. TigerData is also the team behind TimescaleDB, an open-source time-series database built on top of PostgreSQL.

That is the reason why the official installation instructions for pgvectorscale primarily focus on TimescaleDB. However, we will also cover how to install it on plain PostgreSQL.

To install pgvectorscale with TimescaleDB, refer to these official guides:

If you want to install pgvectorscale on vanilla PostgreSQL, start by installing it from the source with this Bash script:

Copy
        
1 # install Rust 2 curl --proto '=https' --tlsv1.2 -sSf <https://sh.rustup.rs> | sh 3 4 # download pgvectorscale 5 cd /tmp 6 git clone --branch <version> <https://github.com/timescale/pgvectorscale> 7 cd pgvectorscale/pgvectorscale 8 # install cargo-pgrx with the same version as pgrx 9 cargo install --locked cargo-pgrx --version $(cargo metadata --format-version 1 | jq -r '.packages[] | select(.name == "pgrx") | .version') 10 cargo pgrx init --pg17 pg_config 11 # build and install pgvectorscale 12 cargo pgrx install --release

Now, register the extension in your Postgres server by connecting to your server using a PostgreSQL database client like DbVisualizer and running this CREATE EXTENSION query:

Copy
        
1 CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
Executing the CREATE EXTENSION query in DbVisualizer
Executing the CREATE EXTENSION query in DbVisualizer

Note: The CASCADE option ensures that pgvector (which pgvectorscale depends on) is installed automatically if it is not already present.

Great! You just installed pgvectorscale in PostgreSQL.

Main Reasons to Use pgvectorscale

pgvectorscale brings faster and more cost-effective vector search capabilities directly into PostgreSQL, improving a lot over pgvector. This can make a PostgreSQL setup a strong alternative to managed services like Pinecone, achieving better performance at a fraction of the cost.

In particular, pgvectorscale is perfect for building AI-driven applications that rely on efficient similarity search across large datasets. That is true when you want full control over your database stack without paying the high fees that come with third-party vector databases.

For more guidance, follow the official tutorial that shows how to use it to build a fast, AI-powered recommendation system in PostgreSQL.

pgvectorscale: Main Features and Capabilities

Discover the main features, options, and innovations introduced by pgvectorscale.

StreamingDiskANN Indexing in pgvectorscale

First, start by creating a table with an embedding column, as below:

Copy
        
1 CREATE TABLE IF NOT EXISTS document_embedding ( 2 id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, 3 metadata JSONB, 4 contents TEXT, 5 embedding VECTOR(1536) 6 );

Note that the embedding column must be of type VECTOR, a data type introduced by the pgvector extension.

You now have to populate the embedding column using an AI or ML pipeline. Find out more about this process in our article on vector databases.

To enable fast similarity search, create a StreamingDiskANN index on the embedding column:

Copy
        
1 CREATE INDEX document_embedding_idx 2 ON document_embedding 3 USING diskann (embedding vector_cosine_ops);

Note the USING diskann (embedding vector_cosine_ops) syntax. USING diskann tells Postgres to build the index using the StreamingDiskANN method provided by pgvectorscale, instead of a traditional B-tree or GiST index.

Then, (embedding vector_cosine_ops) defines the column and the operator class to use, which is vector_cosine_ops in this case. vector_cosine_ops is an operator class provided by pgvectorscale to compute cosine distance for nearest neighbor search.

To retrieve the top 10 closest vectors, you can now write:

Copy
        
1 SELECT * 2 FROM document_embedding 3 ORDER BY embedding <=> $1 4 LIMIT 10;
The similarity search query in DbVisualizer
The similarity search query in DbVisualizer

Note: This syntax mirrors pgvector . pgvectorscale supports cosine (<=>), L2 (<->), and inner product (<#>) distances.

Filtered Vector Search with Labels and Metadata

One of pgvectorscale’s standout innovations is filtered vector search. This features combines similarity search with structured filters.

There are two main types of filtering you can combine in a single query when using pgvectorscale:

  1. Label-based filtering with the diskann index: Provides optimized, low-latency filtering directly within the index by using label arrays.
  2. Arbitrary WHERE clause filtering: Applies standard SQL filters after the vector similarity search, acting as post-filtering.

In label-based filtering, you use label arrays for categories, tags, or access controls. When you build the diskann index, make sure it includes the label column:

Copy
        
1 CREATE TABLE documents ( 2 id SERIAL PRIMARY KEY, 3 embedding VECTOR(1536), 4 labels SMALLINT[], -- category labels 5 status TEXT, 6 created_at TIMESTAMPTZ 7 ); 8 9 CREATE INDEX ON documents 10 USING diskann (embedding vector_cosine_ops, labels);

Now, you can run filtered similarity queries that execute directly inside the index for low latency:

Copy
        
1 SELECT * 2 FROM documents 3 WHERE labels && ARRAY[1, 3] -- has label 1 or 3 4 ORDER BY embedding <=> '[...]' -- replace [...] with your query vector 5 LIMIT 10;

This approach leverages the DiskANN algorithm combined with Microsoft’s filtered search technique, delivering both high recall and speed on large datasets.

Instead, thanks to arbitrary WHERE filtering, you can also combine similarity search with any standard SQL filters:

Copy
        
1 SELECT * 2 FROM documents 3 WHERE status = 'active' AND created_at > '2024-01-01' -- sample WHERE filter 4 ORDER BY embedding <=> '[...]' -- replace [...] with your query vector 5 LIMIT 10;

Note: The filtering conditions are applied after the vector search. That means the entire similarity search runs first, and then the results are filtered by the WHERE clause. For tighter, performance-critical filtering, prefer label-based filtering.

Tips and Other Things You Need to Know about pgvectorscale

Explore the remaining aspects you should know to master pgvectorscale.

Adding Semantic Meaning to Labels

Labels must be stored as smallint[] for index efficiency, but you can still map them to meaningful names using a separate table, as shown below:

Copy
        
1 CREATE TABLE label_definitions ( 2 id INTEGER PRIMARY KEY, 3 name TEXT, 4 description TEXT, 5 attributes JSONB 6 ); 7 8 INSERT INTO label_definitions (id, name, description) 9 VALUES (1, 'science', 'Scientific content'), 10 (2, 'technology', 'Tech content'), 11 (3, 'business', 'Business content');

Then apply a JOIN clause to enrich results:

Copy
        
1 SELECT d.*, array_agg(l.name) AS label_names 2 FROM documents d 3 JOIN label_definitions l ON l.id = ANY(d.labels) 4 WHERE d.labels && ARRAY[1] -- 'science' 5 GROUP BY d.id 6 ORDER BY d.embedding <=> '[...]' -- replace [...] with your query vector 7 LIMIT 10;

This lets you keep performance while your query returns more readable categories.

Tuning pgvectorscale for Your Workload

pgvectorscale exposes both build-time and query-time parameters so you can adjust the speed vs. accuracy tradeoff.

When you create the index, customize options like num_neighbors, num_dimensions, and storage_layout. Apply these options through a WITH clause, as in this example:

Copy
        
1 CREATE INDEX document_embedding_idx 2 ON document_embedding 3 USING diskann (embedding) 4 WITH (num_neighbors=50); -- custom options

You may also want to increase maintenance_work_mem during index creation to handle large datasets:

Copy
        
1 SET maintenance_work_mem = '2GB';

Other Aspects

  • NULL vectors are not indexed.
  • NULL labels become empty arrays.
  • NULL inside arrays is ignored.
  • pgvectorscale's diskann indexes use relaxed ordering. That means results can be slightly out of order to optimize for speed. For strict ordering, use a materialized CTE, as explained in the docs. For more information, follow our complete walkthrough on the PostgreSQL CTE mechanism.
  • Creating an index on an UNLOGGED table is currently not supported.

Conclusion

In this blog post, you will learn that pgvectorscale is a Postgres extension that builds on pgvector to achieve high-performance and cost-effective vector search. It works with both TimescaleDB and PostgreSQL, expanding them with optimized search capabilities for AI applications.

As demonstrated in this article, PostgreSQL database management becomes easier with a feature-rich visual tool like DbVisualizer. Here, we only scratch the surface of all the features it offers, including query optimization, drag-and-drop query building, ERD-style schema generation, streamlined import/export, and more. Download DbVisualizer for free today!

FAQ

What is the difference between pgvector and pgvectorscale?

Find the answer in the pgvector vs pgvectorscale comparison table below:

pgvectorpgvectorscale
Developed byOpen-source communityTigerData/TimescaleDB team
Programming languageCRust
RequirementsRust, pgvector
PurposeVector data extension for PostgreSQLComplements pgvector for enhanced performance
Index types supportedIVF, HNSW, FlatStreamingDiskANN
CompressionNone / basicSBQ compression to reduce storage and speed up traversal
Filtering supportBasic (WHERE clause)Label-based filtered vector search + arbitrary WHERE clause filtering
Use case focusGeneral vector similarity search in PostgresLarge-scale, cost-efficient AI embedding workloads in Postgres
GitHub stars16.6.k+2.1k+

Is pgvectorscale open source?

Yes, pgvectorscale is an open-source project developed by the TimescaleDB team using Rust. As of this writing, the PostgreSQL extension for vector search has 2.1k stars on GitHub and 16 contributors.

What performance advantages do pgvector and pgvectorscale provide?

According to a benchmark by the TimescaleDB team on a dataset of 50 million Cohere embeddings (each with 768 dimensions), PostgreSQL with pgvector and pgvectorscale delivers 28x lower p95 latency and 16x higher query throughput than Pinecone’s storage-optimized (s1) index for approximate nearest neighbor queries at 99% recall.

What is TimescaleDB?

TimescaleDB is an open-source time-series database built on top of PostgreSQL. In detail, it extends Postgres's capabilities to handle time-series data efficiently. Some of the main features it offers include automatic partitioning, optimized data storage, and fast query performance for time-based data. At the time of writing, the project has more than 19k stars on GitHub.

Does pgvectorscale work with vanilla PostgreSQL, or only with TimescaleDB?

pgvectorscale can be used with vanilla PostgreSQL. So, to take advantage of it, you do not necessarily need TimescaleDB specifically for pgvectorscale to function. In other words, you can install and use the extension also on your local or remote Postgres setup.

Where can I learn more about PostgreSQL database extensions?

To learn more about database extensions, follow our blog and stay up to date by reading the best SQL books.

Dbvis download link img
About the author
Antonello Zanini

Antonello is a software engineer, and often refers to himself as a technology bishop. His mission is to spread knowledge through writing.

The Table Icon
Sign up to receive The Table's roundup
More from the table
Title Author Tags Length Published
title

PostgreSQL Index Mechanism: A Guide to Database Performance Optimization

author Leslie S. Gyamfi tags POSTGRESQL 8 min 2025-09-02
title

UUIDv7 in PostgreSQL 18: What You Need to Know

author Antonello Zanini tags POSTGRESQL 8 min 2025-09-01
title

SQL Server Vector Data Type, Search, and Indexing

author Antonello Zanini tags AI SQL SERVER Vectors 8 min 2025-08-25
title

PostgreSQL ISNULL Equivalent: COALESCE, CASE, and NULLIF

author TheTable tags POSTGRESQL 5 min 2025-08-19
title

SQL IS NOT NULL Condition: Definitive Guide

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 6 min 2025-08-13
title

Can a Primary Key Be NULL in an SQL Table?

author TheTable tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 5 min 2025-08-12
title

ISNULL vs COALESCE: Comparing NULL Handling Functions

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 6 min 2025-08-06
title

The SELECT INTO TEMP TABLE Mechanism in SQL

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 6 min 2025-08-05
title

Oracle 23ai: What’s New? Everything You Need to Know at a Glance

author Antonello Zanini tags AI ORACLE 7 min 2025-08-04
title

PostgreSQL TRUNCATE TABLE Statement: A Guide

author Leslie S. Gyamfi tags POSTGRESQL 6 min 2025-07-30

The content provided on dbvis.com/thetable, including but not limited to code and examples, is intended for educational and informational purposes only. We do not make any warranties or representations of any kind. Read more here.