PostgreSQL

Overview

PostgreSQL is a powerful, open-source relational database known for its reliability, extensibility, and standards compliance. In Zenskar, you can use PostgreSQL as a data source to import transactional, usage, or customer data for metering, billing, and analytical reporting.


Prerequisites

Create a dedicated read-only PostgreSQL user for replicating data. Alternatively, you can use an existing user:

CREATE USER <user_name> PASSWORD 'your_password_here';

Provide the user read-only access to relevant schemas and tables:

GRANT USAGE ON SCHEMA <schema_name> TO <user_name>;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <user_name>;
ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name> GRANT SELECT ON TABLES TO <user_name>;
📖

You must re-run the above command for each schema from which you expect to replicate data.

🚧

You must allow inbound traffic from Zenskar IP addresses .


Set up a PostgreSQL data-source connector via Zenskar dashboard

  1. Log into your Zenskar dashboard.
  2. In the left side bar, click Usage > Data Sources.
  3. In the top-right corner, click + ADD DATA SOURCE.
  4. On the Add New Data Source page, configure the PostgreSQL connector.
  5. Click on the SAVE SOURCE button.

Connector configuration

General configuration

FieldDescriptionRequired
Source NameEnter a unique name for this data source.Yes
Source TypeSelect PostgreSQL from the dropdown menu.Yes

Connector configuration

FieldDescriptionRequired
HostThe hostname or IP address of the PostgreSQL server.Yes
PortThe port on which PostgreSQL listens for incoming connections.Yes
SchemasThe schema you want to replicate in Zenskar.Yes
Database nameThe name of the PostgreSQL database to connect to.Yes
UsernameThe username of the PostgreSQL database.Yes
PasswordThe password for the database user.Yes
SSL ModesSSL configuration mode: Available options are: Disable, Allow, Prefer, Require, Verify-CA, and Verify-Full.Conditional

Set up a PostgreSQL data-source connector via API

💡

The connector_config object is the only part of the request that differs across connector types (such as BigQuery, Snowflake, or Redshift). The create data-source connector API reference provides a generic overview, while this document explains the PostgreSQL-specific structure of connector_config.

Request example

curl --location 'https://api.zenskar.com/datasources' \
  -H 'x-api-key: <your-api-key>' \
  -H 'apiversion: 20240301' \
  -H 'Content-Type: application/json' \
  -d '{
    "name":"test_postgresql",
    "connector_type":"Postgres",
    "destination":"Postgres",
    "status":"active",
    "connector_config":{
        "port":5432,
        "schemas":["public"],
        "ssl":true,
        "ssl_mode":{
            "mode":"require",
            "client_certificate":"certificate",
            "client_key":"key"},
        "database":"db",
        "username":"zenskar",
        "password":"password",
        "host":"acme"},
        "source_definition_id":"decd3333-5647-4c0b-adf4-da0e75f5a750",
        "remote_conn":true
  }'

PostgreSQL-specific connector configuration

The connector_config object contains configuration fields specific to the PostgreSQL data source.

FieldDescriptionRequired
portThe port on which the PostgreSQL server listens for incoming connections (default: 5432).Yes
schemasList of schemas to be replicated from the PostgreSQL database.Yes
sslIndicates whether SSL should be used for the connection (true or false).Yes
ssl_mode.modeSSL mode to use for the connection. Supported values: disable, allow, prefer, require, verify-ca, verify-full.Conditional (required if ssl = true)
ssl_mode.client_certificateThe client certificate to use when connecting over SSL.Conditional (required for verify-ca and verify-full)
ssl_mode.client_keyThe private key associated with the client certificate.Conditional (required for verify-ca and verify-full)
databaseThe name of the PostgreSQL database to connect to.Yes
usernameThe username used to authenticate with the PostgreSQL database.Yes
passwordThe password for the specified database user.Yes
hostThe hostname or IP address of the PostgreSQL server.Yes
source_definition_idThe unique identifier for the PostgreSQL connector definition in Zenskar.Yes
remote_connIndicates whether a remote (SSH tunnel or bastion) connection should be used.No

Addendum

SSL modes in PostgreSQL

SSL modeDescription
disableSSL is disabled. The connection between the client (Zenskar) and the PostgreSQL server is unencrypted.
allowTries to connect without SSL first; if that fails, retries with SSL. This mode allows either encrypted or unencrypted connections depending on server configuration.
prefer(Default behavior in many clients.) Tries to connect with SSL first, but falls back to a non-SSL connection if SSL isn’t available.
requireAlways uses SSL to encrypt the connection, but does not validate the server certificate. Ensures encryption but not identity verification.
verify-caEstablishes an SSL connection and verifies that the server’s certificate is signed by a trusted Certificate Authority (CA). Protects against untrusted certificates.
verify-fullEstablishes an SSL connection, verifies the certificate’s CA, and checks that the server hostname matches the certificate. This is the most secure mode.
❗️

If SSL mode is set to Require, the connection will fail if the PostgreSQL instance doesn’t support SSL.

Data replication

Change Data Capture (CDC) replication

Zenskar uses logical replication of PostgreSQL’s write-ahead log (WAL) to incrementally capture data changes, including deletions.

You should configure the PostgreSQL data source with CDC when:

  • You need a record of deletions.
  • Your database is large (500 GB or more).
  • Your table has a primary key but lacks a suitable incremental cursor field (for example, updated_at).
⚠️

CDC must connect to the primary database. Configuring CDC on a replica host will result in failures.

Set up CDC

  1. Grant REPLICATION privileges to the read-only user:

    ALTER USER <user_name> REPLICATION;
  2. Enable logical replication on your PostgreSQL database.

    • For self-hosted or VM environments (bare metal, EC2, GCE, Docker): Edit the postgresql.conf file:

      ParameterDescriptionSet value to
      wal_levelType of logging in the WALlogical
      max_wal_sendersMaximum number of WAL sender processesMinimum 1
      max_replication_slotsMaximum number of replication slots allowed1 (or higher if other services read from WAL)
    • For AWS RDS or Aurora:

      • Go to the Configuration tab of your DB cluster.
      • Locate your cluster parameter group and set rds.logical_replication = 1.
      • Restart your instance manually or during the next maintenance window.
    • For Azure Database for PostgreSQL:

      • Change the replication mode to logical from the Azure portal, or run:

        az postgres server configuration set --resource-group <group> \
          --server-name <server> \
          --name azure.replication_support \
          --value logical
        az postgres server restart --resource-group <group> --name <server>
  3. Create a dedicated replication slot for Zenskar using the pgoutput plugin:

    SELECT pg_create_logical_replication_slot('zenskar_slot', 'pgoutput');

    The output will include the slot name to be filled in the Zenskar setup page.

  4. Define replication identities for all tables you want to replicate:

    ALTER TABLE <table_name> REPLICA IDENTITY DEFAULT;

    Use FULL for tables with large or TOASTable fields:

    ALTER TABLE <table_name> REPLICA IDENTITY FULL;
  5. Create a publication including all tables to be replicated:

    CREATE PUBLICATION zenskar_publication FOR TABLE <tbl1, tbl2, tbl3>;

    The publication name is customizable.

💡

The Zenskar UI allows table selection for CDC, but if a table is not part of the publication, it will not be replicated. If a table lacks a replication identity, Zenskar can create it automatically if the user has sufficient permissions.

Xmin replication

The xmin method is a cursorless replication mode that tracks inserts and updates using PostgreSQL’s internal xmin system column.

Use xmin when:

  • There’s no suitable cursor field (for example, updated_at).
  • You want to replace a full-refresh sync.
  • You are replicating PostgreSQL tables smaller than 500 GB.
  • You are not replicating non-materialized views (these are unsupported by xmin).