Anonymous Dumps

PostgreSQL Anonymous Dumps

Transparent Anonymous Dumps

To export the anonymized data from a database, follow these 3 steps:

1. Create a masked user

CREATE ROLE anon_dumper LOGIN PASSWORD 'x';
ALTER ROLE anon_dumper SET anon.transparent_dynamic_masking = True;
SECURITY LABEL FOR anon ON ROLE anon_dumper IS 'MASKED';

NOTE: You can replace the name anon_dumper by another name.

2. Grant read access to that masked user

GRANT pg_read_all_data to anon_dumper;

NOTE: If you are running PostgreSQL 13 or if you want a more fine-grained access policy you can grant access more precisely, for instance:

GRANT USAGE ON SCHEMA public TO anon_dumper;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO anon_dumper;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO anon_dumper;

GRANT USAGE ON SCHEMA foo TO anon_dumper;
GRANT SELECT ON ALL TABLES IN SCHEMA foo TO anon_dumper;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA foo TO anon_dumper;

3. Launch pg_dump with the masked user

Now to export the anonymous data from a database named foo, let's use pg_dump:

pg_dump foo \
        --user anon_dumper \
        --no-security-labels \
        --exclude-extension="anon" \
        --file=foo_anonymized.sql

NOTES:

  • linebreaks are here for readability

  • --no-security-labels will remove the masking rules from the anonymous dump. This is really important because masked users should not have access to the masking policy.

  • --exclude-extension is only available with pg_dump 17 and later. As an alternative you can use --extension plpgsql.

  • --format=custom is supported

Anonymizing an SQL file

In previous versions of the documentation, this method was also called « anonymizing black box ».

You can also apply masking rules directly on a database backup file !

The PostgreSQL Anonymizer docker image contains a specific entrypoint script called /dump.sh. You pass the original data and the masking rules to to that /dump.sh script and it will return an anonymized dump.

Here's an example in 4 steps:

Step 1: Dump your original data (for instance dump.sql)

pg_dump --format=plain [...] my_db > dump.sql

Note this method only works with plain sql format (-Fp). You cannot use the custom format (-Fc) and the directory format (-Fd) here.

If you want to maintain the owners and grants, you need export them with pg_dumpall --roles-only like this:

(pg_dumpall -Fp [...] --roles-only && pg_dump -Fp [...] my_db ) > dump.sql

Step 2: Write your masking rules in a separate file (for instance rules.sql)


SECURITY LABEL FOR anon ON COLUMN people.lastname
IS 'MASKED WITH FUNCTION anon.dummy_last_name()';

-- etc.

Step 3: Pass the dump and the rules through the docker image and receive an anonymized dump !

IMG=registry.gitlab.com/dalibo/postgresql_anonymizer
ANON="docker run --rm -i $IMG /dump.sh"
cat dump.sql rules.sql | $ANON > anon_dump.sql

(this last step is written on 3 lines for clarity)

NB: You can also gather step 1 and step 3 in a single command:

(pg_dumpall --roles-only && pg_dump my_db && cat rules.sql) | $ANON > anon_dump.sql

NOTES:

You can use most the pg_dump output options with the /dump.sh script, for instance:

cat dump.sql rules.sql | $ANON --data-only --inserts > anon_dump.sql

DEPRECATED : pg_dump_anon.sh and pg_dump_anon

In version 0.x, the anonymous dumps were done with a shell script named pg_dump_anon.sh. In version 1.x it was done with a golang script named pg_dump_anon. Both commands are now deprecated.

However pg_dump_anon is kept for backward compatibility. If you are still using pg_dump_anon, you should switch to the pg_dump method described above as soon as possible.