Motivation: As we use multiple GCP projects to execute BigQuery que…

[Feature] Support multiple BigQuery projects (elementary, closed)

elementary-data commented on May 11, 2024

Comments (5)

Maayan-s commented on May 11, 2024

Hi @yu-iskw,
We support cross-project lineage. I think the reason it didn't work is the quoting bug you fixed (thanks for that :)).
The fix is already in master. I will let you know when we release a new version with it later today so you can update and try it.

The flow for creating cross-project lineage would be:

    ## Generate a lineage graph from the query logs of the databases (projects) db1,db2.
    ## Make sure there are no spaces between the project names:
    edr lineage generate -dbs db1,db2

This creates a local file with the lineage from both projects. You can filter on dates as well.
Next, use the lineage command with filters to visualize it:

    ## Visualize the lineage graph created by the previous command
    ## (cached in a local file)
    edr lineage -o true

Note that to create a new local file you need to run edr lineage generate again.
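The two-step flow above can be scripted. The sketch below only assembles the exact commands quoted in this thread (`edr lineage generate -dbs ...` and `edr lineage -o true`); the helper names `build_generate_cmd`, `build_visualize_cmd`, and `run_flow` are hypothetical. It strips stray whitespace from the project IDs because the comma-separated `-dbs` list must not contain spaces.

```python
import shlex
import subprocess
from typing import Iterable, List


def build_generate_cmd(projects: Iterable[str]) -> List[str]:
    """Build the `edr lineage generate` command for several projects (hypothetical helper)."""
    # Strip whitespace around each project ID and drop empty entries, since the
    # comma-separated list passed to -dbs must not contain spaces.
    cleaned = [p.strip() for p in projects if p.strip()]
    if not cleaned:
        raise ValueError("at least one project is required")
    return ["edr", "lineage", "generate", "-dbs", ",".join(cleaned)]


def build_visualize_cmd() -> List[str]:
    """Build the command that renders the cached lineage graph and opens a browser."""
    return ["edr", "lineage", "-o", "true"]


def run_flow(projects: Iterable[str], dry_run: bool = True) -> None:
    """Generate the graph once, then visualize it from the local cache."""
    for cmd in (build_generate_cmd(projects), build_visualize_cmd()):
        if dry_run:
            # Only print the command, e.g. to check the -dbs value before running it.
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if edr exits with a non-zero status.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_flow([" db1", "db2 "])
```

With `dry_run=True` (the default) nothing is executed; the commands are only printed so the project list can be checked first.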

yu-iskw commented on May 11, 2024

Hi @Maayan-s, thank you for the instructions. I had misunderstood --databases as taking BigQuery datasets rather than GCP project IDs. I was able to generate elementary_lineage.html, but edr lineage -o true didn't launch a browser.

Separately, because my lineage contains a large number of tables, the graph doesn't render as expected. I tried zooming in, but without much success. When we generate a graph for a less frequently used project, it works fine.

Also, since elementary uses networkx to handle the graph, I would like to export it as GraphML so that we can visualize it with other graph visualization tools as well. For instance, Gephi can handle millions of nodes with ease, so even a graph with a large number of nodes and edges might be viewable there. What do you think of the idea?
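For reference, networkx already provides `networkx.write_graphml(G, path)` for this. To show what such an export contains, here is a minimal stdlib-only sketch that serializes a directed, table-level lineage graph to GraphML, a format Gephi can import. The input shape (a mapping of upstream table to downstream tables) and the table names are assumptions for illustration, not elementary's internal representation.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Iterable, List

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def lineage_to_graphml(edges: Dict[str, Iterable[str]]) -> str:
    """Serialize {upstream_table: downstream_tables} as a directed GraphML document."""
    # Materialize targets so each iterable can be walked twice (nodes, then edges).
    adjacency: Dict[str, List[str]] = {src: list(dsts) for src, dsts in edges.items()}

    # The GraphML default namespace is written as a plain xmlns attribute on the root.
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="lineage", edgedefault="directed")

    # Declare every table once, in first-seen order, using its name as the node id.
    seen = set()
    for src, dsts in adjacency.items():
        for name in (src, *dsts):
            if name not in seen:
                seen.add(name)
                ET.SubElement(graph, "node", id=name)

    # One directed edge per upstream -> downstream dependency.
    for src, dsts in adjacency.items():
        for dst in dsts:
            ET.SubElement(graph, "edge", source=src, target=dst)

    return ET.tostring(root, encoding="unicode")


def save_graphml(edges: Dict[str, Iterable[str]], path: Path) -> None:
    """Write the GraphML document to disk so Gephi (or networkx.read_graphml) can load it."""
    path.write_text(lineage_to_graphml(edges), encoding="utf-8")
```

With networkx installed, the equivalent on a `DiGraph` is a single `nx.write_graphml(G, "lineage.graphml")` call; the hand-written version is only meant to make the file format visible.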

Maayan-s commented on May 11, 2024

@yu-iskw
Thanks for the recommendations on graph visualization alternatives. We are exploring how to improve the UI, and we know that networkx has its limitations. I'll take a look at your recommendations!
As a way to work around this until then, we separated graph generation (extract queries, parse, build the graph) and visualization (filter and visualize) into two different commands.
Our assumption was that if a graph is too big for networkx, it is also too big for a person to work with.
So the flow is: edr lineage generate builds the full graph (saved as a local pickle file), then edr lineage --table +<table_name>+ (or any other filter) visualizes the part of the graph you want to see at the moment. Because you filter against a local file, you can iterate quickly on the same graph with different filters.
If you try loading the pickle into a different visualization tool and get better results, I would love to hear about it :)
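The point of the split is that filtering runs against a cached local file instead of re-querying the warehouse. A minimal stdlib sketch of that idea follows. It assumes the cache is a plain adjacency mapping (the thread says the real cache is a pickle built around networkx, whose format is not shown here) and assumes `+<table_name>+` means "the table plus everything upstream and downstream", as in dbt's selector syntax. `save_graph`, `load_graph`, and `select_plus` are hypothetical names.

```python
import pickle
from collections import deque
from pathlib import Path
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # assumed shape: upstream table -> set of downstream tables


def save_graph(graph: Graph, path: Path) -> None:
    # The expensive step (extract queries, parse, build graph) runs once; its result is cached.
    with path.open("wb") as f:
        pickle.dump(graph, f)


def load_graph(path: Path) -> Graph:
    # Each new filter only needs this cheap local read.
    with path.open("rb") as f:
        return pickle.load(f)


def _reachable(start: str, adjacency: Graph) -> Set[str]:
    """Breadth-first search over `adjacency`; returns reachable nodes excluding `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def select_plus(graph: Graph, table: str) -> Graph:
    """Subgraph for `+table+`: the table, its ancestors, and its descendants."""
    # Reverse the edges so ancestors can be found with the same BFS.
    reverse: Graph = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)
    keep = {table} | _reachable(table, graph) | _reachable(table, reverse)
    # Keep only edges whose endpoints are both in the selection.
    return {src: {d for d in dsts if d in keep} for src, dsts in graph.items() if src in keep}
```

On a real networkx graph, `networkx.ancestors(G, table)` and `networkx.descendants(G, table)` give the same two node sets without a hand-written BFS.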

The open-browser option (-o true) does not always work; it depends on the process used to execute edr and whether it has permission to launch a browser.
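The same environment dependence shows up in Python's standard `webbrowser` module, whose `webbrowser.open()` returns `False` when no browser could be launched (for example, in a headless container or CI job). A hedged sketch of a fallback: if opening fails, print the location of the generated `elementary_lineage.html` so it can be opened by hand. This does not describe how edr itself opens the browser; the `open_report` helper and its injectable `opener` parameter are hypothetical.

```python
import webbrowser
from pathlib import Path
from typing import Callable


def open_report(report: Path, opener: Callable[[str], bool] = webbrowser.open) -> str:
    """Try to open a local HTML report in a browser; fall back to printing its location."""
    # as_uri() requires an absolute path, so resolve first (yields a file:// URL).
    url = report.resolve().as_uri()
    try:
        opened = opener(url)
    except webbrowser.Error:
        opened = False
    if opened:
        return f"opened {url}"
    # No browser could be launched (headless host, missing permissions, ...).
    message = f"could not launch a browser; open {url} manually"
    print(message)
    return message
```

For example, `open_report(Path("elementary_lineage.html"))` tries the default browser and otherwise prints a `file://` URL. The `opener` parameter exists only so the fallback can be exercised without a real browser.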

yu-iskw commented on May 11, 2024

Separating the features into independent commands sounds good. For my part, it would be great if we could also specify an output format.

yu-iskw commented on May 11, 2024

By the way, now that I understand the feature, I'm closing the issue.
