Using pre-trained TensorFlow models to remove vehicles and people from images
git clone https://github.com/MyosQ/image-anonymisation.git
(Requires git
to be installed)
pip install -r requirements.txt
The program will traverse the file-tree rooted at the input folder, and mask all .jpg images within the tree. The masked images will be written to an output directory with identical structure as the input folder. The program should be executed as a python-module from the root directory:
usage: python -m src.main [-h] [-i INPUT_FOLDER] [-o OUTPUT_FOLDER] [-a ARCHIVE_FOLDER]
[-l LOG_FOLDER] [--skip-clear-cache] [-k CONFIG_FILE]
Image anonymisation
optional arguments:
-h, --help show this help message and exit
-i INPUT_FOLDER, --input-folder INPUT_FOLDER
Base directory for input images.
-o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
Base directory for masked (output) images and metadata
files
-a ARCHIVE_FOLDER, --archive-folder ARCHIVE_FOLDER
Optional base directory for archiving original images.
-l LOG_FOLDER, --log-folder LOG_FOLDER
Optional path to directory of log file. The log file
will be named <log\folder>\<timestamp> <hostname>.log
--skip-clear-cache Disables the clearing of cache files at startup.
-k CONFIG_FILE Path to custom configuration file. See the README for
details. Default is config\default_config.yml
The user-specifiable configuration parameters can be found in config/default_config.yml. The available parameters are listed below.
draw_mask
: Apply the mask to the output image?delete_input
: Delete the original image from the input directory when the masking is completed?force_remask
: Recompute masks even though a .webp file exists in the output folder.lazy_paths
: Whenlazy_paths = True
, traverse the file tree during the masking process. Otherwise, all paths will be identified and stored before the masking starts.file_access_retry_seconds
: Number of seconds to wait before (re)trying to access a file/directory which cannot currently be reached. This applies to both reading input files, and writing output files.file_access_timeout_seconds
: Total number of seconds to wait before giving up on accessing a file/directory which cannot currently be reached. This also applies to both reading input files, and writing output files.datetime_format
: Timestamp format. See https://docs.python.org/3.7/library/datetime.html#strftime-strptime-behavior for more information.log_file_name
: Name of the log file.{datetime}
will be replaced with a timestamp formatted asdatetime_format
.{hostname}
will be replaced with the host name.log_level
: Logging level for the application. This controls the log level for terminal logging and file logging (if it is enabled). Must be one of {"DEBUG", "INFO", "WARNING", "ERROR"}.application_version
: Version number for the application. Will be written to JSON files and database.exif_mappenavn
: Formatter formappenavn
in the JSON file.relative_input_dir
is the path to the folder containing the image, relative toexif_top_dir
below. For instance, if the image is located atC:\Foo\Bar\Baz\Hello\World.jpg
, andexif_top_dir = Bar
, thenrelative_input_dir
will beBaz\Hello
.exif_top_dir
: Top directory forrelative_input_dir
. See above for an explanation.
remote_json
: Write the EXIF .json file to the output (remote) directory?local_json
: Write the EXIF .json file to the input (local) directory?archive_json
: Write the EXIF .json file to the archive directory?remote_mask
: Write mask file to the output (remote) directory?local_mask
: Write the mask file to the input (local) directory?archive_mask
: Write mask file to the archive directory?error_output_dir
: Directory to which to write faulty images. Default: /errors
enable_async
: Enable asynchronous post-processing? When True, the file exports (anonymised image, mask file and JSON file) will be executed asynchronously in order to increase processing speed.max_num_async_workers
: Maximum number of asynchronous workers allowed to be active simultaneously. Should be <= (CPU core count - 1)
model_type
: Type of masking model. Currently, there are three available models with varying speed and accuracy. The slowest model produces the most accurate masks, while the masks from the medium model are slightly worse. The masks from the "Fast" model are currently not recommended due to poor quality. Must be either "Slow", "Medium" or "Fast". "Medium" is recommended. Default: "Medium"mask_dilation_pixels
: Approximate number of pixels for mask dilation. This will help ensure that an identified object is completely covered by the corresponding mask. Setmask_dilation_pixels = 0
to disable mask dilation. Default:4
max_num_pixels
: Maximum number of pixels in images to be processed by the masking model. If the number of pixels exceeds this value, it will be resized before the masker is applied. This will NOT change the resolution of the output image.
mask_color
: "RGB tuple (0-255) indicating the masking color. Setting this option will override the colors specified below. Example: Settingmask_color = [50, 50, 50]
will make all masks dark gray.blur
: Blurring coefficient (1-100) which specifies the degree of blurring to apply within the mask. When this parameter is specified, the image will be blurred, and not masked with a specific color. Setblur = None
to disable blurring, and use colored masks instead. Default:15
gray_blur
: Convert the image to grayscale before blurring? (Ignored if blurring is disabled) Default:True
normalized_gray_blur
: Normalize the gray level within each mask after blurring? This will make bright colors indistinguishable from dark colors. NOTE: Requiresgray_blur=True
Default: True
uncaught_exception_email
: Send an email if the program exits abnormally due to an uncaught exception.processing_error_email
: Send an email if a processing error is encountered, but the program is able to continuefinished_email
: Send an email when the anonymisation finishes normally.email_attach_log
: Attach the log file to emails?
write_exif_to_db
: Write the EXIF data to the database?db_max_n_accumulated_rows
: Maximum number of rows to accumulate locally before writing all accumulated rows to the database.db_max_n_errors
: If the number of failed insertions/updates exceeds this number, a RuntimeError will be raised.db_max_cache_size
: If the number of cached rows exceeds this number, a RuntimeError will be raised.
The application supports custom configuration files with the same structure as config/default_config.yml
.
Note that custom configuration files should define all variables defined in config/default_config.yml
.
Use the -k
argument to specify a custom config file. (See Usage for details.)
The application can send an email notification on an abnormal exit, a processing error, or on completion. These noticifations can be enabled/disabled
with the flags uncaught_exception_email
, processing_error_email
and finished_email
, available in config.py
. The email sending feature requires a
sender, receiver(s), and an SMTP-server in order to work. These can be specified by creating a file named email_config.py
in the config
directory, which
contains the following:
# Sender's address
from_address = "[email protected]"
# Receiver address(es)
to_addresses = ["[email protected]", "[email protected]", ...]
# SMTP-server address
smtp_host = "<smtp server address>"
# SMTP-server port
port = <smtp port>
Use the scripts.db.create_db_config
script to create a database configuration file:
usage: python -m scripts.db.create_db_config [-h] --user USER --password PASSWORD --dsn DSN
[--schema SCHEMA] --table_name TABLE_NAME
Create the db_config.py file, which configures the database connection.
optional arguments:
-h, --help show this help message and exit
--user USER Database username
--password PASSWORD Database password (will be encrypted)
--dsn DSN Data source name (DSN)
--schema SCHEMA Optional schema. Default is None
--table_name TABLE_NAME
Database table name.
The program expects to find the table layout in the YAML file config/db_tables/<table_name>.yml
. The file should contain the following keys:
pk_column
: The name of thePRIMARY KEY
column.columns
: A list of columns, where each element has the keys:name
: Name of the column.dtype
: Oracle SQL datatype for the column.formatter
: Name of a function in formatters.py, which returns the column value from the given JSON-contents.extra
: Extra column contstraints, such asNOT NULL
orPRIMARY KEY
.spatial_metadata
: This is only required ifdtype
isSDO_GEOMETRY
. Contains geometric metadata about the objects in the column. Expected keys are:dimension
: Number of dimensions. Must be2
or3
.srid
: SRID for the object's coordinate system.dim_elements
: A list where each element hasname
,min
,max
andtol
. The elements are used to create theDIMINFO
array in the spatial metadata table.
For a table named my_table
, the contents of config/db_tables/my_table.yml
might look like:
pk_column: UUID
columns:
# ID column. Used as primary key
- name: UUID
dtype: VARCHAR(255)
formatter: uuid
extra: PRIMARY KEY
# Timestamp column
- name: Timestamp
dtype: DATE
formatter: timestamp
extra: NOT NULL
# Optional position column
- name: Position
dtype: SDO_GEOMETRY
formatter: position
extra:
spatial_metadata:
dimension: 2
srid: 4326
dim_elements:
- name: Longitude
min: -180
max: 180
tol: 0.5
- name: Latitude
min: -90
max: 90
tol: 0.5
Note that the example above expects to find the functions uuid
, timestamp
and position
, in src.db.formatters
.
When the parameters above have been configured correctly, the EXIF data can be written to the database by using the json_to_db
script:
python -m scripts.db.json_to_db -i <base input folder>
This will recursively traverse <base input folder>
, read all .json files, and write the contents to the specified database.
Database writing can also be done automatically during anonymisation. This is enabled by setting write_exif_to_db = True
in config.py
.
The tests/
directory provides a large number of tests which can be used to check that the application works as expected. Use the pytest
command
to run the tests:
pytest tests
Note that this will skip the tests marked as slow
and db
. Add the --run-slow
to run the slow
tests, and --run-db
to run the db
tests.
The tests marked with db
requires a test database to be running locally. The test database is a
Single instance Oracle database (18c XE), running in a docker container.
Docker is therefore required to build and run the test database.
To build the docker image, run:
.\tests\db\setup\build.ps1
To start the test database, run:
.\tests\db\setup\start.ps1
Note that the tests marked with db
will fail if the test database is not running.
The following extra scripts are available:
scripts.create_json
: Traverses a directory tree and creates JSON-files for all.jpg
files found in the tree.scripts.check_folders
: Traverses a set of input/output/archive folders and checks that all files are present/not present, as specified in the config file.scripts.evaluate
: Evaluates the current model on a specified testing dataset. Requirespycocotools
to be installed.scripts.db.create_table
: Creates the specified database table.scripts.db.insert_geom_metadata
: Inserts the appropriate metadata for the specified table into theMDSYS.USER_GEOM_METADATA
view.scripts.db.json_to_db
: Traverses a directory tree and writes the contents of all found.json
files to the specified database table.
Each script can be invoked by running:
python -m <script> <args>
Use the -h
argument to get a description for each script, and a list of possible arguments.