
Comments (5)

Maayan-s commented on May 11, 2024

Hi @yu-iskw ,
We support cross-project lineage. I think the reason it didn't work is the quoting bug you fixed (thanks for that :)).
The fix is already in master. I will let you know when we release a new version with it later today so you can update and try it.

The flow for creating a cross-project lineage would be:

## Generate a lineage graph from query logs of the databases (projects) db1,db2
## Make sure there is no whitespace between the project names:

edr lineage generate -dbs db1,db2

This will create a local file with the lineage from both projects. You can filter on dates as well.
Next use the lineage command and filters to visualize:

## Visualize the lineage graph created in the previous command
## (cached in a local file)

edr lineage -o true

Note that to create a new local file you need to run edr lineage generate again.

from elementary.

yu-iskw commented on May 11, 2024

Hi @Maayan-s, thank you for the instructions. I had mistakenly assumed that --databases takes BigQuery datasets, not GCP project IDs. I was able to generate elementary_lineage.html, but edr lineage -o true didn't launch a browser.

Aside from that, because my lineage contains a large number of tables, drawing the graph doesn't work as expected. I tried to zoom in on the graph, but that didn't work well. Of course, when I generate a graph for a less frequently used project, it works.

Also, since elementary uses networkx to handle the graph, I would like to be able to export the graph as GraphML so that we can visualize it with other graph visualization tools as well. For instance, Gephi can deal with millions of nodes with ease. So, even if a graph contains a large number of nodes and edges, we may be able to visualize it. What do you think of the idea?
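To make the GraphML idea concrete, here is a minimal sketch of what such an export could look like. It is not elementary's code: the function name `edges_to_graphml` and the table names are hypothetical, and it only uses the Python standard library. If the lineage is already held in a networkx graph, `networkx.write_graphml` covers the same step.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the GraphML format.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges):
    """Return a GraphML document (as a string) for a directed edge list.

    `edges` is an iterable of (source_table, target_table) pairs.
    This is an illustrative helper, not part of elementary.
    """
    edges = list(edges)
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="lineage", edgedefault="directed")

    # Emit each table once, in first-seen order.
    seen = set()
    for source, target in edges:
        for name in (source, target):
            if name not in seen:
                seen.add(name)
                ET.SubElement(graph, "node", id=name)

    # Emit one <edge> element per lineage dependency.
    for source, target in edges:
        ET.SubElement(graph, "edge", source=source, target=target)

    return ET.tostring(root, encoding="unicode")


# Hypothetical cross-project lineage: db1 feeds db2.
example_edges = [
    ("db1.raw.orders", "db1.staging.orders"),
    ("db1.staging.orders", "db2.marts.revenue"),
]
graphml_text = edges_to_graphml(example_edges)
```

Writing `graphml_text` to a `.graphml` file would give a file that tools with GraphML import, such as Gephi, should be able to read.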



Maayan-s commented on May 11, 2024

@yu-iskw
Thanks for the recommendations on graph visualization alternatives. We are exploring how to improve the UI, and we know that networkx has its limitations. I'll take a look at your recommendations!
As a way to work around this until then, we separated the graph generation (extract queries, parse, build graph) and the visualization (filter and visualize) into two different commands.
Our assumption was that if the graph is too big for networkx, it is also too big for a person to work with.
So the flow is: edr lineage generate to create a big graph (this creates a local pickle file) -> then edr lineage --table +<table_name>+, or any other filter, to visualize the part of the graph you want to see at the moment. This means you can iterate quickly on the same graph with different filters, because you are filtering against a local file.
If you try opening the pickle with a different visualization tool and get better results, I would love to hear about it :)
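The generate-then-filter flow described above can be sketched roughly as follows. This is an illustration only, not elementary's implementation: it assumes that `+<table_name>+` means "the table plus everything upstream and downstream of it" (similar to dbt's selector syntax), and the function names and table names are hypothetical.

```python
from collections import deque


def reachable(start, adjacency):
    """Return every node reachable from `start` by following `adjacency`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def filter_lineage(edges, table):
    """Keep only the edges among `table`, its upstream and its downstream tables."""
    downstream = {}  # source -> set of direct targets
    upstream = {}    # target -> set of direct sources
    for source, target in edges:
        downstream.setdefault(source, set()).add(target)
        upstream.setdefault(target, set()).add(source)

    keep = {table} | reachable(table, upstream) | reachable(table, downstream)
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Hypothetical full graph: one chain of interest plus an unrelated branch.
full_graph = [
    ("db1.raw.orders", "db1.staging.orders"),
    ("db1.staging.orders", "db2.marts.revenue"),
    ("db1.raw.clicks", "db1.staging.clicks"),
]
subgraph = filter_lineage(full_graph, "db1.staging.orders")
```

Because the full edge list is built once and only the filter is re-run, trying a different table is cheap, which is the point of splitting generation from visualization.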

The open-browser option does not always work. It depends on the process you used to execute edr and whether that process has permission to launch a browser.
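When the browser does not launch automatically, the generated HTML file can still be opened by hand. The sketch below shows one way to do that with Python's standard `webbrowser` module. The helper name `open_lineage_report` is hypothetical, and it assumes elementary_lineage.html sits in the current working directory, which may not match where edr actually writes it.

```python
import webbrowser
from pathlib import Path


def open_lineage_report(path="elementary_lineage.html", opener=webbrowser.open):
    """Try to open the lineage HTML file in a browser.

    Returns the file:// URI and whether the opener reported success, so the
    URI can be pasted into a browser manually if launching fails.
    `opener` is injectable to make the helper testable without a browser.
    """
    uri = Path(path).resolve().as_uri()
    opened = bool(opener(uri))
    return uri, opened
```

Calling `open_lineage_report()` returns the URI even when no browser can be launched, so the same limitation described above only costs a copy and paste.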


yu-iskw commented on May 11, 2024

Separating the features into independent commands sounds good. For my part, it would be awesome if we could specify an output format.


yu-iskw commented on May 11, 2024

By the way, now that I understand the feature, I will close the issue.

