
AWS Data Mesh Helper Library

The AWS Data Mesh Helper library provides automation around the most common tasks that customers need to perform to implement a data mesh architecture on AWS. A data mesh on AWS uses a central AWS Account (the mesh account) to store the metadata associated with Data Products created by data Producers. This allows other AWS Accounts to act as Consumers, and to request Subscriptions, which must be approved by Producers. Upon approval, the approved grants are provided to the Consumer and can be imported into their AWS Account.

View aws-data-mesh-utils on PyPI

Definition of Terms

  • Data Mesh - An architectural pattern which provides a centralised environment in which the data sharing contract is managed. Data stays within Producer AWS Accounts, and they own the lifecycle of granting Subscriptions.
  • Producer - Any entity which offers a Data Product through the Data Mesh
  • Consumer - Any entity who subscribes to a Data Product in the Data Mesh
  • Subscription - The central record and associated AWS Lake Formation permissions linking a Data Product to a Consumer
  • Data Product - Today, a Data Product is scoped to be only an AWS Lake Formation Table or Database. In future this definition may expand.

The Workflow

To get started, you must first enable an AWS Account as the Data Mesh Account. This is where you will store all Lake Formation metadata about the Data Products which are offered to Consumers. Within this Account, there exist IAM Roles for Producer and Consumer which allow any AWS Identity with access to them to perform tasks within the Data Mesh.

Once you have set up an Account as the Data Mesh, you can then activate another AWS Account as a Producer, Consumer, or both. All of these tasks are performed by the Data Mesh Admin, which is accessible through an additional IAM Role or as any Administrator Identity within the mesh Account. Once completed, end users can perform the following Data Mesh tasks:

Data Mesh Tasks

Producer

  • Create Data Product - Exposes a Lake Formation Database and/or one-or-more Tables as Data Products
  • Approve/Deny Subscription Request - Allows a Producer to approve a set of permissions against a Data Product
  • Modify Subscription - Allows a Producer to expand or reduce the scope of a Consumer's access to a Data Product

Data Mesh Administrator

  • Initialize Mesh Account - Sets up an AWS Account to act as the central Data Mesh governance account
  • Initialize Producer Account - Sets up an AWS Account to act as a Data Producer
  • Initialize Consumer Account - Sets up an AWS Account to act as a Data Consumer
  • Enable Account as Producer - Identifies an account as a Producer within the Data Mesh Account
  • Enable Account as Consumer - Identifies an account as a Consumer within the Data Mesh Account

Consumer

  • Request Access to Product - Creates a request for access to a Data Product, including the requested grants
  • Finalize Subscription - Once a subscription has been granted for a data product, imports the metadata into the Consumer Account
  • List Product Access - Lists which subscriptions are available to the consumer, including the status of each request

The following general functionality is available to any Data Mesh role (a hedged usage sketch follows the list):

  • Delete Subscription - Allows a Consumer or Producer to delete a Subscription request. Can be used at any time. Please note the Subscription is not deleted, but instead is archived.
  • List Subscriptions - Lists all Subscriptions and their associated status for any number of filters
  • Get Subscription - Retrieves a single Subscription
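
For orientation, here is a hedged sketch of invoking these shared operations from one of the persona classes; the method names (get_subscription, delete_subscription) are assumptions inferred from the task names above, not verified signatures:

from data_mesh_util import DataMeshConsumer as dmc

# Hedged sketch: the method names below are inferred from the task names
# above - verify them against the module source before relying on them.
data_mesh_consumer = dmc.DataMeshConsumer(
    data_mesh_account_id='insert data mesh account number here',
    region_name='us-east-1',
    use_credentials={
        "AccountId": "The Consumer AWS Account ID",
        "AccessKeyId": "Your access key",
        "SecretAccessKey": "Your secret key"
    }
)

# retrieve a single Subscription by its ID (hypothetical method name)
subscription = data_mesh_consumer.get_subscription(subscription_id='a subscription id')

# archive (not hard-delete) a Subscription request (hypothetical method name)
data_mesh_consumer.delete_subscription(subscription_id='a subscription id')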

Using Data Mesh Utils

This module is provided as a Python package, and also includes a command line utility (cli) for convenience. See the project's cli documentation for usage instructions.

Overall System Architecture

The following diagram depicts the overall system architecture associated with a Data Mesh that is in use by separate Producer and Consumer Accounts:

Architecture

In this architecture, we can see that the data mesh is configured in AWS Account 555555555555, and contains a set of IAM Roles which allow identities within producer and consumer accounts to access the mesh. This includes:

  • DataMeshManager: IAM Role allowing administration of the Data Mesh itself
  • DataMeshAdminProducer: IAM Role enabling the assuming Identity to act as a Producer
  • DataMeshAdminConsumer: IAM Role enabling the assuming Identity to act as a Consumer
  • DataMeshAdminReadOnly: IAM Role that can be used for reading Metadata from the Data Mesh Account (only)

For testing and simplicity, every IAM Role in the solution is accompanied by a single IAM User who is a member of a Group specific to the function. This will enable you to add users to this Group should you wish to, rather than using a programmatic approach. IAM Roles are backed by an IAM Policy of the same name as the Role, and all objects in the IAM stack for AWS Data Mesh reside at path /AwsDataMesh/.

You can then see that there is a Producer Account 111111111111 which has been enabled to act as a Producer. Within this account we see a similar approach to IAM principals, with the creation of a DataMeshProducer IAM Role which is accompanied by an associated user and group. When configured, the DataMeshProducer group is granted rights to assume the DataMeshAdminProducer-<account id> role in the data mesh Account.

Similarly, we have a consumer Account 999999999999. This Account also includes IAM objects to enable data mesh access, including the DataMeshConsumer IAM Role, and associated IAM users and groups. Only the DataMeshConsumer role may assume the DataMeshAdminConsumer-<account id> role in the data mesh Account.
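
For illustration, the sketch below shows how an identity in the consumer account might assume the mesh-side role with boto3 STS; the exact role name and /AwsDataMesh/ path are assumptions based on the naming described above:

import boto3

# Hypothetical sketch: the ARN pattern below follows the role naming and
# /AwsDataMesh/ path described above - substitute your real account IDs.
mesh_account_id = '555555555555'
consumer_account_id = '999999999999'
role_arn = (f'arn:aws:iam::{mesh_account_id}:role/AwsDataMesh/'
            f'DataMeshAdminConsumer-{consumer_account_id}')

sts = boto3.client('sts')
assumed = sts.assume_role(RoleArn=role_arn, RoleSessionName='data-mesh-consumer')
creds = assumed['Credentials']

# The temporary credentials can then back further boto3 clients, or be
# supplied to this library's use_credentials parameter.
session = boto3.Session(
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken']
)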

All information around current or pending subscriptions is stored in DynamoDB, in table AwsDataMeshSubscriptions. This table is secured to allow only those operations which Producer or Consumer roles may execute, and stores the overall lifecycle for Subscriptions.

Library Structure

This functionality is presented to customers as a Python library to allow maximum re-use. It is divided into modules, each specific to a persona within the overall Data Mesh architecture:

  • src
    • data_mesh_util
      • DataMeshAdmin.py - Includes functionality to be performed by the Administrative function for each account type
      • DataMeshProducer.py - Includes functionality performed by the Producer persona, to create and manage Data Products and manage subscriptions for their products
      • DataMeshConsumer.py - Includes functionality allowing principals to subscribe to Data Products
      • DataMeshMacros.py - Includes functions that span accounts using multiple credentials
    • lib
      • constants.py - Contains constant values used in user or class interaction
      • SubscriberTracker.py - Class that manages data product Subscription status
      • ApiAutomator.py - Helper class that automates API requests against AWS Accounts
      • utils.py - Various utility functions shared across the codebase
    • resource - Pystache templates used to generate IAM policies
  • examples - Examples of how to use the module. Simplifies credentials configuration by using a credentials file with the following structure:

Example Credentials File

To run these functions, you must provide identities that can operate on the producer, consumer, or mesh accounts. These can be configured in a credentials file for simplicity, with the following structure:

{
  "AWS_REGION": "us-east-1",
  "Mesh": {
    "AccountId": "",
    "AccessKeyId": "",
    "SecretAccessKey": ""
  },
  "Producer": {
    "AccountId": "",
    "AccessKeyId": "",
    "SecretAccessKey": ""
  },
  "ProducerAdmin":{
    "AccountId": "",
    "AccessKeyId": "",
    "SecretAccessKey": ""
  },
  "Consumer": {
    "AccountId": "",
    "AccessKeyId": "",
    "SecretAccessKey": ""
  },
  "ConsumerAdmin": {
    "AccountId": "",
    "AccessKeyId": "",
    "SecretAccessKey": ""
  }
}

This file includes the following identities:

  • Mesh - Administrative identity used to configure and manage central Data Mesh objects like catalogs and shared tables. This identity is required for initializing the Data Mesh infrastructure.
  • ProducerAdmin - Administrative identity used to set up an account as a data producer. This identity is only used to enable an AWS account on initial setup.
  • ConsumerAdmin - Administrative identity used to set up an account as a data consumer. This identity is only used to enable an AWS account on initial setup.
  • Producer - Identity used for day-to-day producer tasks such as create-data-product, approve-access-request and modify-subscription. In general, you should use the pre-installed DataMeshProducer user or those users who are part of the DataMeshProducerGroup in the Producer AWS Account.
  • Consumer - Identity used for day-to-day consumer tasks such as request-access and import-subscription. In general, you should use the pre-installed DataMeshConsumer user or those users who are part of the DataMeshConsumerGroup in the Consumer AWS Account.

For the example usage scripts, you can configure a file on your filesystem and reference it through the CredentialsFile environment variable. For the cli, you can provide the path to this file using the --credentials-file argument. Please make sure not to add this file to any publicly shared resources such as git forks of the codebase!
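
For illustration, a minimal sketch of loading this file in your own script via the CredentialsFile environment variable (the loader itself is just an example):

import json
import os

# Load the credentials file referenced by the CredentialsFile environment
# variable (the same variable the example scripts read).
with open(os.environ['CredentialsFile']) as f:
    all_creds = json.load(f)

aws_region = all_creds['AWS_REGION']
producer_credentials = all_creds['Producer']  # or 'Consumer', 'Mesh', etc.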

Getting Started

To install AWS Data Mesh Utils, install from PyPI:

Step 0.0 - Install Data Mesh Utils Helper Library

pip install aws-data-mesh-utils

Step 1.0 - Install the Data Mesh

To start using Data Mesh Utils, you must first configure an AWS Account to act as the data mesh account. Today, due to how Lake Formation manages Resource Links, you must have a single data mesh account per region where you want to share data. In future we'll look to support multi-region sharing.

The Data Mesh installer must be run as (1) an AWS Administrative identity which (2) has Lake Formation Data Lake Admin permissions granted. This activity will only be done once. When you have granted the needed permissions, run the Data Mesh Installer with:

import logging
from data_mesh_util import DataMeshAdmin as dmu

'''
Script to configure a set of accounts as a central data mesh. Mesh credentials must have AdministratorAccess and Data Lake Admin permissions.
'''

data_mesh_account = 'insert data mesh account number here'
aws_region = 'insert the AWS Region you want to install into'
credentials = {
    "AccessKeyId": "your access key",
    "SecretAccessKey": "your secret key",
    "SessionToken": "optional - a session token, if you are using an IAM Role & temporary credentials"
}

# create the data mesh
mesh_admin = dmu.DataMeshAdmin(
    data_mesh_account_id=data_mesh_account,
    region_name=aws_region,
    log_level=logging.DEBUG,
    use_credentials=credentials
)
mesh_admin.initialize_mesh_account()

or

./data-mesh-cli install-mesh-objects --credentials-file <my credentials file> ...

You can also use examples/0_setup_central_account.py as an example to build your own application.

If you get an error that looks like:

An error occurred (AccessDeniedException) when calling the PutDataLakeSettings operation: User: arn:aws:iam::<account>:user/<user> is not authorized to perform: lakeformation:PutDataLakeSettings on resource: arn:aws:lakeformation:us-east-1:<account>:catalog:<account> with an explicit deny in an identity-based policy

This probably means that you have attached the AWSLakeFormationDataAdmin IAM policy to your user, which prevents you from setting data lake permissions.
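
To see which principals are currently registered as Data Lake Admins, you can call the standard Lake Formation API directly with boto3 (plain AWS SDK usage, not part of this library):

import boto3

# List the current Data Lake Admins for this account's catalog.
lf = boto3.client('lakeformation', region_name='us-east-1')
settings = lf.get_data_lake_settings()['DataLakeSettings']
for admin in settings.get('DataLakeAdmins', []):
    print(admin['DataLakePrincipalIdentifier'])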

Step 1.1 - Enable an AWS Account as a Producer

You must configure an account to act as a Producer in order to offer data shares to other accounts. This is an administrative task that is run once per AWS Account. The configured credentials must have AdministratorAccess as well as Lake Formation Data Lake Admin. To setup an account as a Producer, run:

import logging
from data_mesh_util.lib.constants import *
from data_mesh_util import DataMeshMacros as data_mesh_macros

'''
Script to configure an account to act as a data producer. Mesh credentials must have AdministratorAccess and Data Lake Admin permissions.
'''

data_mesh_account = 'insert data mesh account number here'
aws_region = 'the AWS region you are working in'
mesh_credentials = {
    "AccessKeyId": "your access key",
    "SecretAccessKey": "your secret key",
    "SessionToken": "optional - a session token, if you are using an IAM Role & temporary credentials"
}
producer_credentials = {
    "AccountId": "the target AWS Account ID",
    "AccessKeyId": "your access key",
    "SecretAccessKey": "your secret key",
    "SessionToken": "optional - a session token, if you are using an IAM Role & temporary credentials"
}

# create a macro handler which works across accounts
mesh_macros = data_mesh_macros.DataMeshMacros(
    data_mesh_account_id=data_mesh_account,
    region_name=aws_region,
    log_level=logging.DEBUG
)

# configure the producer account
mesh_macros.bootstrap_account(
    account_type=PRODUCER,
    mesh_credentials=mesh_credentials,
    account_credentials=producer_credentials
)

or

./data-mesh-cli enable-account --credentials-file <credentials-file> --account-type producer ...

You can also use examples/0_5_setup_account_as.py as an example to build your own application.

Step 1.2: Enable an AWS Account as a Consumer

Accounts can be both producers and consumers, so you may wish to run this step against the account used above. You may also have Accounts that are Consumer only, and cannot create data shares. This step is only run once per AWS Account and must be run using credentials that have AdministratorAccess as well as being Lake Formation Data Lake Admin:

import logging
from data_mesh_util.lib.constants import *
from data_mesh_util import DataMeshMacros as data_mesh_macros

data_mesh_account = 'insert data mesh account number here'
aws_region = 'the AWS region you are working in'
mesh_credentials = {
    "AccessKeyId": "your access key",
    "SecretAccessKey": "your secret key",
    "SessionToken": "optional - a session token, if you are using an IAM Role & temporary credentials"
}
consumer_credentials = {
    "AccountId": "the target AWS Account ID",
    "AccessKeyId": "your access key",
    "SecretAccessKey": "your secret key",
    "SessionToken": "optional - a session token, if you are using an IAM Role & temporary credentials"
}

# create a macro handler which works across accounts
mesh_macros = data_mesh_macros.DataMeshMacros(
    data_mesh_account_id=data_mesh_account,
    region_name=aws_region,
    log_level=logging.DEBUG
)

# configure the consumer account
mesh_macros.bootstrap_account(
    account_type=CONSUMER,
    mesh_credentials=mesh_credentials,
    account_credentials=consumer_credentials
)

or

./data-mesh-cli enable-account --credentials-file <credentials-file> --account-type consumer ...

Steps 1.1 and 1.2 above can be run for any number of accounts that you require to act as Producers or Consumers. You can also use examples/0_5_setup_account_as.py as an example to build your own application. If you want to provision an account as both Producer and Consumer, use account_type='both' in the above call to bootstrap_account(), as sketched below.
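
For example, a hedged sketch of that call, reusing the mesh_macros handler and mesh_credentials from the steps above (account_credentials stands in for the credentials of the account being enabled, like producer_credentials/consumer_credentials earlier):

# configure a single account as both producer and consumer
mesh_macros.bootstrap_account(
    account_type='both',
    mesh_credentials=mesh_credentials,
    account_credentials=account_credentials
)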

Step 2: Create a Data Product

Data products can be created from one-or-more Glue tables, and the API provides a variety of configuration options to allow you to control how they are exposed. To create a data product:

import logging
from data_mesh_util import DataMeshProducer as dmp

data_mesh_account = 'insert data mesh account number here'
aws_region = 'the AWS region you are working in'
producer_credentials = {
    "AccountId": "The Producer AWS Account ID",
    "AccessKeyId": "Your access key",
    "SecretAccessKey": "Your secret key",
    "SessionToken": "Optional - a session token, if you are using an IAM Role & temporary credentials"
}
data_mesh_producer = dmp.DataMeshProducer(
    data_mesh_account_id=data_mesh_account,
    log_level=logging.DEBUG,
    region_name=aws_region,
    use_credentials=producer_credentials
)

database_name = 'The name of the Glue Catalog Database where the table lives'
table_name = 'The Table Name'
domain_name = 'The name of the Domain which the table should be tagged with'
data_product_name = 'If you are publishing multiple tables, the product name to be used for all'
cron_expr = 'daily'
crawler_role = 'IAM Role that the created Glue Crawler should run as - calling identity must have iam::PassRole on the ARN'
create_public_metadata = True  # True allows any user to see the shared object in the data mesh; False restricts it

data_mesh_producer.create_data_products(
    source_database_name=database_name,
    table_name_regex=table_name,
    domain=domain_name,
    data_product_name=data_product_name,
    create_public_metadata=create_public_metadata,
    sync_mesh_catalog_schedule=cron_expr,
    sync_mesh_crawler_role_arn=crawler_role,
    expose_data_mesh_db_name=None,
    expose_table_references_with_suffix=None,
    use_original_table_name=None
)

or

./data-mesh-cli create-data-product --credentials-file <credentials-file> --source-database-name <database> --table-regex <regular expression matching tables> ...

You can also use examples/1_create_data_product.py as an example to build your own application.

By default, a data product replicates Glue Catalog metadata from the Producer's account into the Data Mesh account. The new tables created in the Data Mesh account are shared back to the Producer account through a new database and resource link, which lets the Producer change objects in the mesh from within their own Account.

Alternatively, some customers may wish to have a single version of their table metadata which resides only within the Data Mesh, for example when datasets are prepared specifically for sharing. In this case, the create-data-product request allows the version of the Table in the Data Mesh to be the only master copy, transparently shared back to the producer. To use this option, call the migrate_tables_to_mesh API instead:

...

data_mesh_producer.migrate_tables_to_mesh(
    source_database_name=database_name,
    table_name_regex=table_name,
    domain=domain_name,
    data_product_name=data_product_name,
    create_public_metadata=True,
    sync_mesh_catalog_schedule=cron_expr,
    sync_mesh_crawler_role_arn=crawler_role
)

Upon completion, you will see that the table in the Producer AWS Account has been replaced with a Resource Link shared from the Mesh Account. Your Producer Account can now query data both from within the data mesh and from its own account, but the security Principal used for Data Mesh Utils may require additional permissions to use Athena or other query services.
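
If you want to verify the share with a query, the sketch below uses the standard Athena API via boto3 (plain AWS SDK usage, not part of this library); the database name, table name, and output location are placeholders:

import boto3

# Run a test query against the resource link database; substitute real names.
athena = boto3.client('athena', region_name='us-east-1')
response = athena.start_query_execution(
    QueryString='SELECT * FROM my_shared_table LIMIT 10',
    QueryExecutionContext={'Database': 'the-resource-link-database'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results-bucket/'}
)
print(response['QueryExecutionId'])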

Step 3: Request access to a Data Product Table

As a consumer, you can view public metadata by assuming the DataMeshAdminReadOnly role in the mesh account. You can then create an access request for data products using:

from data_mesh_util import DataMeshConsumer as dmc

data_mesh_account = 'insert data mesh account number here'
aws_region = 'the AWS region you are working in'
consumer_credentials = {
    "AccountId": "The Consumer AWS Account ID",
    "AccessKeyId": "Your access key",
    "SecretAccessKey": "Your secret key",
    "SessionToken": "Optional - a session token, if you are using an IAM Role & temporary credentials"
}
data_mesh_consumer = dmc.DataMeshConsumer(
    data_mesh_account_id=data_mesh_account,
    region_name=aws_region,
    use_credentials=consumer_credentials
)

owner_account_id = 'the account ID of the producer who owns the objects'
database_name = 'the name of the database containing the objects to subscribe to'
tables = 'The table name or regular expression that matches multiple tables'
request_permissions = ['SELECT', 'DESCRIBE']  # the list of permissions you are asking for

subscription = data_mesh_consumer.request_access_to_product(
    owner_account_id=owner_account_id,
    database_name=database_name,
    tables=tables,
    request_permissions=request_permissions
)
print(subscription.get('SubscriptionId'))

or

./data-mesh-cli request-access --credentials-file <credentials-file> --database-name <database> --tables <table1, table2, table3> --request-permissions <list of permissions requested, including INSERT, SELECT, DESCRIBE, UPDATE, DELETE> ...

You can also use examples/2_consumer_request_access.py as an example to build your own application.
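
After submitting a request, you may want to check its status. The sketch below is hedged: the method name list_product_access is inferred from the consumer task named earlier ("List Product Access"), so verify the exact name and response shape against the module source:

# Hedged sketch, reusing data_mesh_consumer from the example above.
my_subscriptions = data_mesh_consumer.list_product_access()
for sub in my_subscriptions.get('Subscriptions', []):
    print(sub.get('SubscriptionId'), sub.get('Status'))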

Step 4: Grant or Deny Access to the Consumer

In this step, you will grant permissions to the Consumer who has requested access:

from data_mesh_util import DataMeshProducer as dmp

data_mesh_account = 'insert data mesh account number here'
aws_region = 'the AWS region you are working in'
producer_credentials = {
    "AccountId": "The Producer AWS Account ID",
    "AccessKeyId": "Your access key",
    "SecretAccessKey": "Your secret key",
    "SessionToken": "Optional - a session token, if you are using an IAM Role & temporary credentials"
}
data_mesh_producer = dmp.DataMeshProducer(
    data_mesh_account_id=data_mesh_account,
    region_name=aws_region,
    use_credentials=producer_credentials
)

# get the pending access requests
pending_requests = data_mesh_producer.list_pending_access_requests()

# pick one to approve
choose_subscription = pending_requests.get('Subscriptions')[0]

# The subscription ID that the Consumer created and returned from list_pending_access_requests()
subscription_id = choose_subscription.get('SubscriptionId')

# Set the permissions to grant to the Consumer - in this case whatever they asked for
grant_permissions = choose_subscription.get('RequestedGrants')

# List of permissions the consumer can pass on. Usually only DESCRIBE or SELECT
grantable_permissions = ['DESCRIBE','SELECT']

# String value to associate with the approval
approval_notes = 'Enjoy!'

# approve access requested
approval = data_mesh_producer.approve_access_request(
    request_id=subscription_id,
    grant_permissions=grant_permissions,
    grantable_permissions=grantable_permissions,
    decision_notes=approval_notes
)

# or deny access request
approval = data_mesh_producer.deny_access_request(
    request_id=subscription_id,
    decision_notes="no way"
)

or

./data-mesh-cli approve-subscription --credentials-file <credentials-file> --request_id <request id> --notes <notes with the approval> ...

./data-mesh-cli deny-subscription --credentials-file <credentials-file> --request_id <request id> --decision-notes <notes for the denial>

You can also use examples/3_grant_data_product_access.py as an example to build your own application.

Step 5: Import Permissions to Consumer Account

Permissions have been granted, but the Consumer must allow those grants to be imported into their account:

from data_mesh_util import DataMeshConsumer as dmc

data_mesh_account = 'insert data mesh account number here'
aws_region = 'the AWS region you are working in'
consumer_credentials = {
    "AccountId": "The Consumer AWS Account ID",
    "AccessKeyId": "Your access key",
    "SecretAccessKey": "Your secret key",
    "SessionToken": "Optional - a session token, if you are using an IAM Role & temporary credentials"
}
data_mesh_consumer = dmc.DataMeshConsumer(
    data_mesh_account_id=data_mesh_account,
    region_name=aws_region,
    use_credentials=consumer_credentials
)

# use the subscription ID which has been requested
subscription_id = 'GUxwswjEFRzgwow8zqVwGC'

data_mesh_consumer.finalize_subscription(
    subscription_id=subscription_id
)

or

./data-mesh-cli import-subscription --credentials-file <credentials-file> --subscription_id <subscription request id> ...

You can also use examples/4_finalize_subscription.py as an example to build your own application.
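
As a quick sanity check after finalizing, the sketch below uses plain boto3 Glue calls (not part of this library) to list tables in the imported database; the database name is a placeholder:

import boto3

# Confirm the imported resource link database is visible in the Consumer account.
glue = boto3.client('glue', region_name='us-east-1')
for table in glue.get_tables(DatabaseName='the-imported-database')['TableList']:
    print(table['Name'])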


© Amazon Web Services, 2021. All rights reserved.


aws-data-mesh-utils's Issues

DataMeshConsumer role has no access to shared data product

We experimented with the aws-data-mesh-utils to create a simple setup similar to the one outlined in the documentation here.

However, we noticed that the DataMeshConsumer role does not seem to have access to the shared data product (it can see the Glue database, but not the shared Glue table due to insufficient Lake Formation permissions). The reason for this seems to be that the approve-subscription step grants Lake Formation permissions to the consumer account as principal, and not to the DataMeshConsumer IAM role as principal.

Is this the intended setup? From what I can see in the code, the request-access step uses the requesting account ID as the subscriber principal in the DynamoDB table entry and in the subsequent approve-subscription step, this account is used as the principal the Lake Formation permissions are granted to.

Step 2 IAM problem from readme file

Hello,

First of all, thank you for this data mesh repository and the hard work behind it. I have been testing it manually, step by step from the readme file, and I am facing some issues:

  1. I faced this error while running the step 2:
    botocore.errorfactory.AccessDeniedException: An error occurred (AccessDeniedException) when calling the CreateCrawler operation: User: arn:aws:sts::753878601211:assumed-role/DataMeshProducer/AIDA27BVKRX56JUXBMRWP-753878601211-2022-02-03 is not authorized to perform: iam:PassRole on resource: arn:aws:iam::753878601211:role/service-role/AWSGlueServiceRole-datamesh because no identity-based policy allows the iam:PassRole action

I sorted it by manually adding the iam:PassRole to the user.

Also, in my humble opinion, I would recommend clearly stating that the tables variable in step 3 takes a list, since I ran into the same kind of issue by passing a string.

Atomic APIs

Today the APIs are idempotent but not atomic. A failure on one of the API calls leaves resources in an inconsistent state. We need a rollback mechanism to revert the successful operations.

Unable to create Data Product

I have a database and table in the producer account. I am trying to run the following script, as mentioned in your blog:
import logging
from data_mesh_util import DataMeshProducer as dmp

data_mesh_account = '~~~~~~~~'
aws_region = 'us-east-1'
producer_credentials = {
    "AccountId": "#############",
    "AccessKeyId": "###################",
    "SecretAccessKey": "#########################"
}
data_mesh_producer = dmp.DataMeshProducer(
    data_mesh_account_id=data_mesh_account,
    log_level=logging.DEBUG,
    region_name=aws_region,
    use_credentials=producer_credentials
)

database_name = 'redshift'
table_name = 'cars'
domain = None
data_product_name = None
cron_expr = None
crawler_role = None
create_public_metadata = True

data_mesh_producer.create_data_products(
    source_database_name=database_name,
    table_name_regex=table_name,
    domain=domain,
    data_product_name=data_product_name,
    create_public_metadata=True,
    sync_mesh_catalog_schedule=cron_expr,
    sync_mesh_crawler_role_arn=crawler_role,
    expose_data_mesh_db_name=None,
    expose_table_references_with_suffix=None
)

I have tried running it in my producer account, and in the data mesh admin account as well after facing the error, but nothing works.
It throws the following error:

Loaded 3 tables matching None from Glue
Verified Database redshift-175908995626
Validated Data Mesh Database redshift-175908995626
175908995626 Database redshift-175908995626 Permissions:['CREATE_TABLE', 'DESCRIBE']
Granted access on Database redshift-175908995626 to Producer
Verified Database redshift-175908995626
Validated Producer Account Database redshift-175908995626
Existing Table Definition
{'Name': 'cars', 'Owner': '175908995626', 'LastAccessTime': datetime.datetime(2022, 1, 7, 17, 43, 9, tzinfo=tzlocal()), 'Retention': 0, 'StorageDescriptor': {'Columns': [{'Name': 'id', 'Type': 'bigint'}, {'Name': 'car', 'Type': 'string'}], 'Location': 's3://aws-analytics-course/redshift/data/csv/cars/', 'InputFormat': 'org.apache.hadoop.mapred.TextInputFormat', 'OutputFormat': 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', 'Compressed': False, 'NumberOfBuckets': -1, 'SerdeInfo': {'SerializationLibrary': 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe', 'Parameters': {'field.delim': ','}}, 'BucketColumns': [], 'SortColumns': [], 'Parameters': {'CrawlerSchemaDeserializerVersion': '1.0', 'CrawlerSchemaSerializerVersion': '1.0', 'UPDATED_BY_CRAWLER': 'redshift', 'areColumnsQuoted': 'false', 'averageRecordSize': '7', 'classification': 'csv', 'columnsOrdered': 'true', 'compressionType': 'none', 'delimiter': ',', 'objectCount': '1', 'recordCount': '16', 'sizeKey': '112', 'skip.header.line.count': '1', 'typeOfData': 'file'}, 'StoredAsSubDirectories': False}, 'PartitionKeys': [], 'TableType': 'EXTERNAL_TABLE', 'Parameters': {'CrawlerSchemaDeserializerVersion': '1.0', 'CrawlerSchemaSerializerVersion': '1.0', 'UPDATED_BY_CRAWLER': 'redshift', 'areColumnsQuoted': 'false', 'averageRecordSize': '7', 'classification': 'csv', 'columnsOrdered': 'true', 'compressionType': 'none', 'delimiter': ',', 'objectCount': '1', 'recordCount': '16', 'sizeKey': '112', 'skip.header.line.count': '1', 'typeOfData': 'file'}}
Created new Glue Table cars
175908995626 Table cars Column Permissions:['INSERT', 'SELECT', 'ALTER', 'DELETE', 'DESCRIBE'], ['INSERT', 'SELECT', 'ALTER', 'DELETE', 'DESCRIBE'] WITH GRANT OPTION
175908995626 Table cars Permissions:['ALTER', 'DESCRIBE', 'INSERT', 'DELETE'], ['ALTER', 'DESCRIBE', 'INSERT', 'DELETE'] WITH GRANT OPTION
Traceback (most recent call last):
File "create-data-product", line 35, in
expose_table_references_with_suffix=None
File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/DataMeshProducer.py", line 314, in create_data_products
use_original_table_name=use_original_table_name
File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/DataMeshProducer.py", line 145, in _create_mesh_table
grantable_permissions=perms
File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/lib/ApiAutomator.py", line 892, in lf_grant_permissions
grantable_permissions=grantable_permissions
File "/home/cloudshell-user/.local/lib/python3.7/site-packages/data_mesh_util/lib/ApiAutomator.py", line 872, in lf_batch_grant_permissions
Entries=entries
File "/home/cloudshell-user/.local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/cloudshell-user/.local/lib/python3.7/site-packages/botocore/client.py", line 678, in _make_api_call
api_params, operation_model, context=request_context)
File "/home/cloudshell-user/.local/lib/python3.7/site-packages/botocore/client.py", line 726, in _convert_to_request_dict
api_params, operation_model)
File "/home/cloudshell-user/.local/lib/python3.7/site-packages/botocore/validate.py", line 319, in serialize_to_request
raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Missing required parameter in Entries[0]: "Id"
Missing required parameter in Entries[1]: "Id"

I am unable to understand what this error means; I tried everything to solve it but nothing works. Please let me know what this error means, or whether I am doing anything wrong.

My contact details: Email: [email protected]
M: +91-9650819894
Name: Rijul Seth

Recurring error for Consumer Account (lakeformation:GetDataLakeSettings)

Hi all!

For the consumer account, I don't know which policies should be applied for Lake Formation.
This error pops up when I apply too many policies on this account:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "lakeformation:GetDataLakeSettings" ], "Resource": "*" } ] }

File "C:\Users\Anaconda3\lib\site-packages\data_mesh_util\DataMeshAdmin.py", line 381, in _initialize_account_as self._automator.assert_is_data_lake_admin( File "C:\Users\64324\Anaconda3\lib\site-packages\data_mesh_util\lib\ApiAutomator.py", line 668, in assert_is_data_lake_admin raise Exception(f"Principal {principal} is not Data Lake Admin") Exception: Principal arn:aws:iam::[ACCOUNT_ID]:user/Consumer is not Data Lake Admin

However, when I apply too few policies (removing the above policy), another error pops up:

botocore.errorfactory.AccessDeniedException: An error occurred (AccessDeniedException) when calling the GetDataLakeSettings operation: User: arn:aws:iam::[ACCOUNT_ID]:user/AwsDataMesh/DataMeshProducer is not authorized to perform: lakeformation:GetDataLakeSettings on resource: arn:aws:lakeformation:us-east-1:[ACCOUNT_ID]:catalog:[ACCOUNT_ID] because no identity-based policy allows the lakeformation:GetDataLakeSettings action

Even when I use the DataMeshProducer user generated by the DataMeshManager I get this (second) error. And when I manually add lakeformation:GetDataLakeSettings to the permissions of my user, the first error returns.

Could you help me get the right policy structure for the consumer account in this repo?

Kind regards,

Tom

data_mesh_util\lib\utils.py - AttributeError: 'Session' object has no attribute 'client'

First of all I would like to say that we are very grateful for this great contribution, thank you!

I may be doing something wrong because I had a problem performing the step "Step 1.0 - Install the data mesh":

first - error: use_creds=credentials TypeError: __init__() got an unexpected keyword argument 'use_creds'
second - After seeing the class declaration in DataMeshAdmin.py, I replaced use_creds with use_credentials and reran, and then I had a new problem:

"....data_mesh_util\lib\utils.py", line 249, in generate_client
return session.client(service)
AttributeError: 'Session' object has no attribute 'client'

I apologize if this is a beginner's question.

I tried running the examples/0_setup_central_account.py example and got to the same point.

Products not being shown in the consumer side

After building the data mesh architecture, a colleague and I are sharing a solution to the problem of producer tables not showing on the consumer side.

--Context
A colleague and I have run the data-mesh library end to end (the example steps). We encountered a few challenges running this, and had to manually adjust some parts of the code or work around issues we hit along the way. We each built the end-to-end architecture separately, solved the problems we faced in different ways, and started communicating at the end when we faced the same final issue.

--Issue
The issue is that we were not able to share the product to the consumer side after building the architecture. We could see the databases from the producer side created in the central account, but we were not able to share them to the consumer side. The source of the problem seemed to be the Resource Access Manager. When we tried to manually share the resources via Resource Access Manager or Lake Formation, the share would appear as failed, without much information to dig into.

--Solution
The solution is to delete the Glue catalog settings in the central account. This harsh step then allowed us to share the databases and tables from the central account to the consumer side, and allowed the consumer to link a database and access the data. We do not yet know what repercussions (if any) this will have on the rest of the architecture. We also recommend saving what you delete from the Glue settings to an external file, in case you need to reverse the process.

Once you have deleted the Glue settings, you can remove, on the Lake Formation side in the central account, any access to the database (the duplicated database) and table that the consumer is supposed to have. You can do this by navigating to the Data lake permissions tab in the Lake Formation service, selecting the data permission, and clicking revoke. After removing it, grant the access back with the same permissions (to the consumer account) by clicking grant and following the steps there. You will need to do this twice: once for the database, and once for the table(s). Once this is done, you can go to the consumer account's Resource Access Manager and accept the resource share request sent by the central account. That's it, you now have access on the consumer side. Make sure to create a resource link from the database, since it will serve as the base for fine-grained permissions in the consumer account.
