Repo containing MPhil thesis code for link prediction in supply chain knowledge graphs using Graph Representation Learning.
You can get started with recreating the GNN analysis by running
pip install -r requirements.txt
followed by running the training and testing scripts with:
python3 main.py --cpu
(alternatively `--gpu` if you are lucky enough to have one)
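
For orientation, the device flags could be handled along the following lines. This is only a sketch under assumptions: the actual argument handling in `main.py` is not shown in this README, `parse_device` is a hypothetical helper, and the CPU fallback is a guess. Only the `--cpu` and `--gpu` flag names come from the commands above.

```python
import argparse


def parse_device(argv=None):
    """Hypothetical helper: map --cpu / --gpu onto a torch-style device string."""
    parser = argparse.ArgumentParser(description="Train and test the GNN")
    # The two flags are mutually exclusive: passing both is an error.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--cpu", action="store_true", help="run on the CPU")
    group.add_argument("--gpu", action="store_true", help="run on a CUDA GPU")
    args = parser.parse_args(argv)
    # Assumption: fall back to the CPU when neither flag is given.
    return "cuda" if args.gpu else "cpu"


# Passing the arguments explicitly keeps the example independent of sys.argv.
print(parse_device(["--gpu"]))  # prints "cuda"
```

The returned string is in the form that `torch.device` accepts, so a training script could pass it straight through when moving the model and tensors.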
Modern supply chains lend themselves to a knowledge graph (KG) representation because of the rich metadata available on companies' certifications, locations, buying and selling relationships, and capabilities. A KG representation lets companies interrogate their supply chains in novel ways, for example finding alternative suppliers, building new relationships, or removing relationships in nefarious instances. The image in `images/kg_extract.png` shows an extract of the KG built for an automotive supply chain.
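
As a rough illustration of the "alternative suppliers" query, the sketch below stores a toy KG as (head, relation, tail) triples and looks up other companies that make the same products as a buyer's current suppliers. The relation names follow the ontology in this README, but the supplier names, the triples, and the `alternative_suppliers` helper are made up for illustration. It is a plain lookup over known edges, not the GNN link prediction the repo implements.

```python
from collections import defaultdict

# Toy triples in (head, relation, tail) form. The supplier names are
# hypothetical; they are not taken from the real dataset.
TRIPLES = [
    ("General Motors", "buys_from", "Supplier A"),
    ("Supplier A", "makes_product", "Floor mat"),
    ("Supplier B", "makes_product", "Floor mat"),
    ("Supplier C", "makes_product", "Floor mat"),
    ("Supplier B", "has_cert", "ISO9001"),
    ("Supplier C", "has_capability", "Machining"),
]


def build_index(triples):
    """Index tails by (head, relation) and heads by (relation, tail)."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for head, rel, tail in triples:
        out_edges[(head, rel)].add(tail)
        in_edges[(rel, tail)].add(head)
    return out_edges, in_edges


def alternative_suppliers(buyer, triples, required_cert=None):
    """Companies that make a product the buyer's current suppliers make,
    excluding those current suppliers; optionally filter by certification."""
    out_edges, in_edges = build_index(triples)
    current = out_edges[(buyer, "buys_from")]
    # Products the buyer already sources through its current suppliers.
    products = set()
    for supplier in current:
        products |= out_edges[(supplier, "makes_product")]
    # Every company making one of those products is a candidate.
    candidates = set()
    for product in products:
        candidates |= in_edges[("makes_product", product)]
    candidates -= current
    candidates.discard(buyer)
    if required_cert is not None:
        candidates = {
            c for c in candidates if required_cert in out_edges[(c, "has_cert")]
        }
    return sorted(candidates)


print(alternative_suppliers("General Motors", TRIPLES))
print(alternative_suppliers("General Motors", TRIPLES, required_cert="ISO9001"))
```

Link prediction goes one step further than this lookup: instead of reading existing edges, it scores edges that are absent from the graph, such as a plausible but unrecorded `buys_from` relationship.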
The ontology of the graph is given as:
Nodes | Number |
---|---|
company (e.g. General Motors) | 119,599 |
product (e.g. Floor mat) | 119,618 |
capability (e.g. Machining) | 36 |
certification (e.g. ISO9001) | 9 |
Edges in the ontology
Edges | Number |
---|---|
('capability', 'capability_produces', 'product') | 21,857 |
('company', 'buys_from', 'company') | 88,997 |
('company', 'has_capability', 'capability') | 83,787 |
('company', 'has_cert', 'certification') | 32,654 |
('company', 'located_in', 'country') | 40,421 |
('company', 'makes_product', 'product') | 119,618 |
('product', 'complimentary_product_to', 'product') | 260,658 |
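
The two tables can be written down as plain Python dictionaries, keyed by node type and by (source, relation, destination) triple, which is the same form the edge table uses. The sketch below copies the counts from the tables and runs a simple consistency check; the helper name `undeclared_node_types` is hypothetical. The check flags one gap: `country` is used by `located_in` but has no count in the node table.

```python
# Counts copied from the node and edge tables in this README.
NODE_COUNTS = {
    "company": 119_599,
    "product": 119_618,
    "capability": 36,
    "certification": 9,
}

EDGE_COUNTS = {
    ("capability", "capability_produces", "product"): 21_857,
    ("company", "buys_from", "company"): 88_997,
    ("company", "has_capability", "capability"): 83_787,
    ("company", "has_cert", "certification"): 32_654,
    ("company", "located_in", "country"): 40_421,
    ("company", "makes_product", "product"): 119_618,
    ("product", "complimentary_product_to", "product"): 260_658,
}


def undeclared_node_types(node_counts, edge_counts):
    """Node types used by an edge but missing from the node table."""
    used = {t for src, _, dst in edge_counts for t in (src, dst)}
    return sorted(used - set(node_counts))


total_edges = sum(EDGE_COUNTS.values())
missing = undeclared_node_types(NODE_COUNTS, EDGE_COUNTS)
print(f"{total_edges:,} edges across {len(EDGE_COUNTS)} relation types")
print("node types without a listed count:", missing)
# 647,992 edges across 7 relation types
# node types without a listed count: ['country']
```
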
If you find this implementation useful, please consider citing the following article.
@article{
author = {Aziz, Ajmal and Kosasih, Edward and Griffiths, Ryan-Rhys and Brintrup, Alexandra},
journal = {International Conference on Machine Learning (ICML) workshop on ML4Data},
year = {2021},
month = {07},
title = {Data Considerations in Graph Representation Learning for Supply Chain Networks}
}
The file structure is laid out as follows:
├── README.md  # The top-level README
├── config  # Run Project configurations
│   ├── config.yml  # For changing run parameters (e.g. number of epochs)
│   └── sweep_config.yml
├── data  # Ask for GDrive Access
│   ├── 01_raw  # Data from third party sources.
│   │   ├── raw_df.pkl
│   │   └── supplier_product_df.parquet
│   ├── 02_intermediate  # Intermediate data that has been transformed.
│   │   ├── G.pickle
│   │   ├── bG.pickle
│   │   ├── cG.pickle
│   │   ├── dataset
│   │   ├── dataset.pickle
│   │   └── marklinesEdges.p
│   ├── 03_models  # Saved GNN models
│   └── 04_results  # Results from the analysis
├── images
│   └── kg_extract.png
├── main.py
├── notebooks  # Exploratory notebooks
│   ├── 1_analyse_dgl_creation.ipynb
│   └── 2_parameter_sweep-Copy1.ipynb
├── requirements.txt
└── src
    ├── __init__.py
    ├── common
    │   └── formats.py
    ├── exploration  # Exploring data (e.g. degree distributions)
    │   ├── Marklines.py
    │   ├── __init__.py
    │   ├── dataset.py
    │   ├── visualise_graph.py
    │   └── visualise_knowledge_graph.py
    ├── ingestion  # Data loaders and utils for torch
    │   ├── __init__.py
    │   ├── dataloader.py
    │   ├── dataset.py
    │   ├── dgl_dataset.py
    │   └── utils.py
    ├── managers  # Training and testing managers in torch
    │   ├── evaluator.py
    │   └── trainer.py
    ├── model  # DGL Models
    │   ├── __init__.py
    │   └── dgl
    │       ├── StochasticRGCN.py
    │       ├── __init__.py
    │       ├── __pycache__
    │       │   ├── StochasticRGCN.cpython-39.pyc
    │       │   ├── __init__.cpython-39.pyc
    │       │   └── layers.cpython-39.pyc
    │       └── layers.py