Put on your Masks !
The main idea of this extension is to implement the concept of Privacy by Design, which is principle imposed by the Article 25 of the GDPR.
With PostgreSQL Anonymizer, you can declare a masking policy which is a set of masking rules stored inside the database model and applied to various database objects.
The data masking rules should be written by the people who develop the application because they have the best knowledge of how the data model works. Therefore masking rules must be implemented directly inside the database schema.
This allows to mask the data directly inside the PostgreSQL instance without using an external tool and thus limiting the exposure and the risks of data leak.
The data masking rules are declared simply by using security labels:
CREATE TABLE player( id SERIAL, name TEXT, total_points INT, highest_score INT);
INSERT INTO player VALUES
( 1, 'Kareem Abdul-Jabbar', 38387, 55),
( 5, 'Michael Jordan', 32292, 69);
SECURITY LABEL FOR anon ON COLUMN player.name
IS 'MASKED WITH FUNCTION anon.fake_last_name()';
SECURITY LABEL FOR anon ON COLUMN player.id
IS 'MASKED WITH VALUE NULL';
Escaping String literals
As you may have noticed the masking rule definitions are placed between single quotes. Therefore if you need to use a string inside a masking rule, you need to use C-Style escapes like this:
SECURITY LABEL FOR anon ON COLUMN player.name
IS E'MASKED WITH VALUE \'CONFIDENTIAL\'';
Or use dollar quoting which is easier to read:
SECURITY LABEL FOR anon ON COLUMN player.name
IS 'MASKED WITH VALUE $$CONFIDENTIAL$$';
Listing masking rules
To display all the masking rules declared in the current database, check out
the anon.pg_masking_rules
:
SELECT * FROM anon.pg_masking_rules;
Debugging masking rules
When an error occurs to due a wrong masking rule, you can get more detailed
information about the problem by setting client_min_messages
to DEBUG
and
you will get useful details
postgres=# SET client_min_messages=DEBUG;
SET
postgres=# SELECT anon.anonymize_database();
DEBUG: Anonymize table public.bar with firstname = anon.fake_first_name()
DEBUG: Anonymize table public.foo with id = NULL
ERROR: Cannot mask a "NOT NULL" column with a NULL value
HINT: If privacy_by_design is enabled, add a default value to the column
CONTEXT: PL/pgSQL function anon.anonymize_table(regclass) line 47 at RAISE
SQL function "anonymize_database" statement 1
Removing a masking rule
You can simply erase a masking rule like this:
SECURITY LABEL FOR anon ON COLUMN player.name IS NULL;
To remove all rules at once, you can use:
SELECT anon.remove_masks_for_all_columns();
Multiple Masking Policies
By default, there is only one masking policy named 'anon'. Most of the times, a single policy is enough. However in more complex situations, the database owner may want to define different sets of masking rules for different use cases.
This can be achieved by declaring multiple masking policies.
For instance, we can add 2 new policies with:
ALTER DATABASE foo SET anon.masking_policies TO 'devtests, analytics';
Important: You need to reconnect to the database so that the change takes effect !
We can now define a "devtests" policy for a developer name "devin". Devin wants to run CI tests on his code using fake/random data.
SECURITY LABEL FOR devtests ON COLUMN player.name
IS 'MASKED WITH FUNCTION anon.fake_last_name()';
SECURITY LABEL FOR devtests ON COLUMN player.highest_score
IS 'MASKED WITH FUNCTION anon.random_int_between(0,50)';
SECURITY LABEL FOR devtests ON ROLE devin IS 'MASKED';
We can also define an "analytics" for a data scientist name "Anna". Anna needs
to run global stats over the dataset, she want to keep the real value on the
highest_score
column but she does not need to know the players names
SECURITY LABEL FOR analytics ON COLUMN player.name
IS 'MASKED WITH VALUE NULL';
SECURITY LABEL FOR analytics ON ROLE anna IS 'MASKED';
Only one policy can be applied to a role. If you define that a role is masked in several masking policies, only the first one in the list will be applied.
The "anon" policy is always declared and cannot be removed.
If you declare a function as TRUSTED
, it will be trusted for all masking
policies.
Limitations
- The masking rules are NOT INHERITED ! If you have split a table into multiple partitions, you need to declare the masking rules for each partition.