Comments (5)
Hi @yu-iskw,
We do support cross-project lineage. I think the reason it didn't work is the quoting bug you fixed (thanks for that :)).
The fix is already in master. I will let you know when we release a new version with it later today, so you can update and try it.
The flow for creating a cross-project lineage would be:
## Generate a lineage graph from the query logs of the databases (projects) db1 and db2
## Make sure there is no whitespace between the project names:
edr lineage generate -dbs db1,db2
This will create a local file with the lineage from both projects. You can filter on dates as well.
Next, use the lineage command and its filters to visualize the graph:
## Visualize the lineage graph created in the previous command
## (cached in a local file)
edr lineage -o true
Note that to create a new local file you need to run edr lineage generate again.
Hi @Maayan-s, thank you for the instructions. I had mistakenly assumed that --databases takes BigQuery datasets rather than GCP project IDs. I was able to generate elementary_lineage.html, but edr lineage -o true didn't launch a browser.
Aside from that, because my lineage contains a large number of tables, drawing the graph doesn't work as expected. I tried to zoom in on the graph, but it didn't work well. Of course, when I generate a graph for a less frequently used project, it works.
Also, since elementary uses networkx to handle the graph, I would like to be able to export the graph as GraphML so that we can visualize it with other graph visualization tools as well. For instance, Gephi can deal with millions of nodes with ease. So, even if a graph contains a large number of nodes and edges, we may be able to visualize it. What do you think of the idea?
- https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.graphml.write_graphml.html
- https://gephi.org/
- https://gephi.org/users/supported-graph-formats/
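To make the GraphML idea concrete, here is a minimal sketch of what such an export could look like with networkx. It is not elementary's code: the toy graph, the table names, the `database` attribute, and the output file name are all made up for illustration. Only `nx.write_graphml` and `nx.read_graphml` are real networkx calls.

```python
import os
import tempfile

import networkx as nx

# Toy lineage graph standing in for the real one; table names are hypothetical.
graph = nx.DiGraph()
graph.add_edge("db1.raw.orders", "db1.staging.orders")
graph.add_edge("db1.staging.orders", "db2.marts.revenue")
graph.add_edge("db2.raw.customers", "db2.marts.revenue")

# GraphML stores only simple attribute values (strings, numbers, booleans),
# so richer Python objects would need to be converted before export.
graph.nodes["db2.marts.revenue"]["database"] = "db2"

# Hypothetical output location; Gephi can open .graphml files.
out_path = os.path.join(tempfile.mkdtemp(), "lineage.graphml")
nx.write_graphml(graph, out_path)

# Read the file back to check that the export round-trips.
restored = nx.read_graphml(out_path)
print(restored.number_of_nodes(), restored.number_of_edges())
```

The resulting file can then be opened in Gephi or any other tool that reads GraphML.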
@yu-iskw
Thanks for the recommendations on graph visualization alternatives. We are exploring how to improve the UI, and we know that networkx has its limitations. I'll take a look at your suggestions!
As a means of working around this until then, we separated the graph generation (extract queries, parse, build graph) and the visualization (filter and visualize) into two different commands.
Our assumption was that if the graph is too big for networkx, it is also too big for a person to work with.
So the flow is: run edr lineage generate to create the full graph (this creates a local pickle file), then run edr lineage --table +<table_name>+ or any other filter to visualize the part of the graph you want to see at the moment. This means you can iterate quickly on the same graph with different filters, because you are filtering against a local file.
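As a rough illustration of the filtering step, the sketch below cuts a sub-lineage around one table out of a larger networkx graph. It is not elementary's implementation. It assumes that the leading and trailing `+` in the filter select upstream and downstream dependencies, as in dbt's node selection syntax; the function name `lineage_around` and the sample table names are hypothetical.

```python
import networkx as nx


def lineage_around(graph, table, upstream=True, downstream=True):
    """Return the subgraph of `table` plus its upstream/downstream tables."""
    keep = {table}
    if upstream:
        # Every node that has a directed path into `table`.
        keep |= nx.ancestors(graph, table)
    if downstream:
        # Every node reachable from `table`.
        keep |= nx.descendants(graph, table)
    # Copy so the result is independent of the full graph.
    return graph.subgraph(keep).copy()


# Toy graph: one chain a -> b -> c -> d and an unrelated edge x -> y.
full = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])

both_sides = lineage_around(full, "b")
upstream_only = lineage_around(full, "b", downstream=False)
print(sorted(both_sides.nodes), sorted(upstream_only.nodes))
```

Because the filter runs against an in-memory graph, trying a different table or direction does not require re-reading the query logs.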
If you try using the pickle with a different visualization tool and get better results, I would love to hear about it :)
The open-browser option does not always work; it depends on the process used to run edr and whether that process has permission to launch a browser.
Separating the features into independent commands sounds good. For my part, it would be great if we could also specify an output format.
By the way, now that I understand the feature, I'll close the issue.