PostgreSQL

Overview

PostgreSQL is a powerful, open-source relational database known for its reliability, extensibility, and standards compliance. In Zenskar, you can use PostgreSQL as a data source to import transactional, usage, or customer data for metering, billing, and analytical reporting.

Prerequisites

Create a dedicated read-only PostgreSQL user for replicating data. Alternatively, you can use an existing user:

CREATE USER <user_name> PASSWORD 'your_password_here';

Provide the user read-only access to relevant schemas and tables:

GRANT USAGE ON SCHEMA <schema_name> TO <user_name>;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <user_name>;
ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name> GRANT SELECT ON TABLES TO <user_name>;

📖
You must re-run the above command for each schema from which you expect to replicate data.

🚧
You must allow inbound traffic from Zenskar IP addresses .

Set up a PostgreSQL data-source connector via Zenskar dashboard

Log into your Zenskar dashboard.
In the left side bar, click Usage > Data Sources.
In the top-right corner, click + ADD DATA SOURCE.
On the Add New Data Source page, configure the PostgreSQL connector.
Click on the SAVE SOURCE button.

Connector configuration

General configuration

Field	Description	Required
Source Name	Enter a unique name for this data source.	Yes
Source Type	Select PostgreSQL from the dropdown menu.	Yes

Connector configuration

Field	Description	Required
Host	The hostname or IP address of the PostgreSQL server.	Yes
Port	The port on which PostgreSQL listens for incoming connections.	Yes
Schemas	The schema you want to replicate in Zenskar.	Yes
Database name	The name of the PostgreSQL database to connect to.	Yes
Username	The username of the PostgreSQL database.	Yes
Password	The password for the database user.	Yes
SSL Modes	SSL configuration mode: Available options are: Disable, Allow, Prefer, Require, Verify-CA, and Verify-Full.	Conditional

Set up a PostgreSQL data-source connector via API

Refer to the create data-source connector API reference.

💡
The connector_config object is the only part of the request that differs across connector types (such as BigQuery, Snowflake, or Redshift). The create data-source connector API reference provides a generic overview, while this document explains the PostgreSQL-specific structure of connector_config.

Request example

curl --location 'https://api.zenskar.com/datasources' \
  -H 'x-api-key: <your-api-key>' \
  -H 'apiversion: 20240301' \
  -H 'Content-Type: application/json' \
  -d '{
    "name":"test_postgresql",
    "connector_type":"Postgres",
    "destination":"Postgres",
    "status":"active",
    "connector_config":{
        "port":5432,
        "schemas":["public"],
        "ssl":true,
        "ssl_mode":{
            "mode":"require",
            "client_certificate":"certificate",
            "client_key":"key"},
        "database":"db",
        "username":"zenskar",
        "password":"password",
        "host":"acme"},
        "source_definition_id":"decd3333-5647-4c0b-adf4-da0e75f5a750",
        "remote_conn":true
  }'

PostgreSQL-specific connector configuration

The connector_config object contains configuration fields specific to the PostgreSQL data source.

Field	Description	Required
`port`	The port on which the PostgreSQL server listens for incoming connections (default: `5432`).	Yes
`schemas`	List of schemas to be replicated from the PostgreSQL database.	Yes
`ssl`	Indicates whether SSL should be used for the connection (`true` or `false`).	Yes
`ssl_mode.mode`	SSL mode to use for the connection. Supported values: `disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`.	Conditional (required if `ssl` = `true`)
`ssl_mode.client_certificate`	The client certificate to use when connecting over SSL.	Conditional (required for `verify-ca` and `verify-full`)
`ssl_mode.client_key`	The private key associated with the client certificate.	Conditional (required for `verify-ca` and `verify-full`)
`database`	The name of the PostgreSQL database to connect to.	Yes
`username`	The username used to authenticate with the PostgreSQL database.	Yes
`password`	The password for the specified database user.	Yes
`host`	The hostname or IP address of the PostgreSQL server.	Yes
`source_definition_id`	The unique identifier for the PostgreSQL connector definition in Zenskar.	Yes
`remote_conn`	Indicates whether a remote (SSH tunnel or bastion) connection should be used.	No

Addendum

SSL modes in PostgreSQL

SSL mode	Description
disable	SSL is disabled. The connection between the client (Zenskar) and the PostgreSQL server is unencrypted.
allow	Tries to connect without SSL first; if that fails, retries with SSL. This mode allows either encrypted or unencrypted connections depending on server configuration.
prefer	(Default behavior in many clients.) Tries to connect with SSL first, but falls back to a non-SSL connection if SSL isn’t available.
require	Always uses SSL to encrypt the connection, but does not validate the server certificate. Ensures encryption but not identity verification.
verify-ca	Establishes an SSL connection and verifies that the server’s certificate is signed by a trusted Certificate Authority (CA). Protects against untrusted certificates.
verify-full	Establishes an SSL connection, verifies the certificate’s CA, and checks that the server hostname matches the certificate. This is the most secure mode.

❗️
If SSL mode is set to Require, the connection will fail if the PostgreSQL instance doesn’t support SSL.

Data replication

Change Data Capture (CDC) replication

Zenskar uses logical replication of PostgreSQL’s write-ahead log (WAL) to incrementally capture data changes, including deletions.

You should configure the PostgreSQL data source with CDC when:

You need a record of deletions.
Your database is large (500 GB or more).
Your table has a primary key but lacks a suitable incremental cursor field (for example, updated_at).

⚠️
CDC must connect to the primary database. Configuring CDC on a replica host will result in failures.

Set up CDC

Grant REPLICATION privileges to the read-only user:
```
ALTER USER <user_name> REPLICATION;
```

Enable logical replication on your PostgreSQL database.

For self-hosted or VM environments (bare metal, EC2, GCE, Docker): Edit the postgresql.conf file:

Parameter	Description	Set value to
`wal_level`	Type of logging in the WAL	`logical`
`max_wal_senders`	Maximum number of WAL sender processes	Minimum `1`
`max_replication_slots`	Maximum number of replication slots allowed	`1` (or higher if other services read from WAL)

For AWS RDS or Aurora:
- Go to the Configuration tab of your DB cluster.
- Locate your cluster parameter group and set rds.logical_replication = 1.
- Restart your instance manually or during the next maintenance window.

For Azure Database for PostgreSQL:

Change the replication mode to logical from the Azure portal, or run:

az postgres server configuration set --resource-group <group> \
  --server-name <server> \
  --name azure.replication_support \
  --value logical
az postgres server restart --resource-group <group> --name <server>

Create a dedicated replication slot for Zenskar using the pgoutput plugin:
```
SELECT pg_create_logical_replication_slot('zenskar_slot', 'pgoutput');
```
The output will include the slot name to be filled in the Zenskar setup page.
Define replication identities for all tables you want to replicate:
```
ALTER TABLE <table_name> REPLICA IDENTITY DEFAULT;
```
Use FULL for tables with large or TOASTable fields:
```
ALTER TABLE <table_name> REPLICA IDENTITY FULL;
```
Create a publication including all tables to be replicated:
```
CREATE PUBLICATION zenskar_publication FOR TABLE <tbl1, tbl2, tbl3>;
```
The publication name is customizable.

💡
The Zenskar UI allows table selection for CDC, but if a table is not part of the publication, it will not be replicated. If a table lacks a replication identity, Zenskar can create it automatically if the user has sufficient permissions.

Xmin replication

The xmin method is a cursorless replication mode that tracks inserts and updates using PostgreSQL’s internal xmin system column.

Use xmin when:

There’s no suitable cursor field (for example, updated_at).
You want to replace a full-refresh sync.
You are replicating PostgreSQL tables smaller than 500 GB.
You are not replicating non-materialized views (these are unsupported by xmin).