Privacy By Default
This feature is considered in beta and not ready for production until version 2.O is published.
Use with care.
The GDPR regulation (and other privacy laws) introduces the concept of data protection by default. In a nutshell, it means that by default, organisations should ensure that data is processed with the highest privacy protection so that by default personal data isn’t made accessible to an indefinite number of persons.
By applying this principle to anonymization, we end up with the idea of privacy by default which basically means that all columns of all tables should be masked by default, without having to declare a masking rule for each of them.
To enable this feature, simply set the option
Imagine a database named
foo with a basic table containing HTTP logs:
# SELECT * FROM access_logs LIMIT 1; date_open | ip_addr | url | browser_agent ---------------------+-----------------+------------+------------------------------ 2009-01-08 00:00:00 | 192.168.100.128 | /home.html | Mozilla/5.0 (Windows; en_US) (1 row)
Now let's activate privacy by default:
ALTER DATABASE foo SET anon.privacy_by_default = True;
The setting will be applied for the next sessions and we can now anonymize the table without writing any masking rule.
# SELECT anon.anonymize_database(); anonymize_database -------------------- t # SELECT * FROM access_logs LIMIT 1; date_open | ip_addr | url | browser_agent -----------+---------+-----+--------------- | | | unkown
As we can see, when the
anon.privacy_by_default is defined all the values will
be replaced by the column's default value or NULL. The entire dataset is
Now instead of writing rules to mask the sensible columns, we will write rules to unmask the ones we want to allow.
For instance, let's say that we want to keep the authentic value of the
field, we can simply "unmask" the column like this:
SECURITY LABEL FOR anon ON COLUMN access_logs.url IS 'NOT MASKED';
This can also be achieved by a masking rule that will replace the value with itself:
SECURITY LABEL FOR anon ON COLUMN access_logs.url IS 'MASKED WITH VALUE url';
Now we'd like to unmask the
date_open field in the anonymized dataset but
we need to generalize the dates to keep only the year:
SECURITY LABEL FOR anon ON COLUMN access_logs.date_open IS 'MASKED WITH FUNCTION make_date(EXTRACT(year FROM date_open)::INT,1,1)';
Caveat: Add a DEFAULT to the NOT NULL columns
It is a bit ironic that the
anon.privacy_by_default parameter is not
enabled by default. This reason is simple: activating this option may or may
not lead to contraint violations depending on the columns constraints placed
in the database model.
Let's say we want to add a
NOT NULL constraint on the
ALTER TABLE public.access_logs ALTER COLUMN date_open SET NOT NULL;
Now if we try to anonymize the table, we get the following violation:
SELECT anon.anonymize_table('public.access_logs') as test4; ERROR: Cannot mask a "NOT NULL" column with a NULL value HINT: If privacy_by_design is enabled, add a default value to the column
The solution here is simply to define a default value and this value will be
used for the
ALTER TABLE public.access_logs ALTER COLUMN date_open SET DEFAULT now();
Other constraints (foreign keys, UNIQUE, CHECK, etc.) should work fine without a DEFAULT value.