oss-aspen / 8knot

Dash app in development to serve open source community visualizations using GitHub data from Augur. Hosted app: https://eightknot.osci.io

License: MIT License

Python 99.06% CSS 0.82% Dockerfile 0.11%

8knot's Issues

UI: add descriptions to visualizations

For new users coming to the page, context on each visualization would be helpful. The format is to be determined by the contributor who works on this, but the following should be done:

Create a consistent format for descriptions that are added to each visualization but visible only when someone hovers over or clicks something. The format should be reproducible for each future visualization.
Write a description for each existing visualization, with guidance on its toggles.

Determine next page to start on for explorer

We should start to think about the next page(s) for explorer once the overview page reaches the "plug and chug" phase for visualizations.
Questions:

What page should be next?
What are some proposed visualizations for this page?
What pages do we plan on doing? How many? In what grouping?

Adding user persistence - settings and history

Various opportunities for users to save instance state relating to their workflow have become evident. Therefore, we ought to see how we can integrate user persistence where possible and explore best practices for doing this with Dash.

We could integrate it into a Javascript / React-based webapp:
https://dash.plotly.com/integrating-dash

Or we could try to embed the Dash app inside of a Flask app:
https://hackersandslackers.com/plotly-dash-with-flask/

Or we could ultimately move to a flask / plotly stack rather than relying on Dash:
https://towardsdatascience.com/web-visualization-with-plotly-and-flask-3660abf9c946

Interestingly, the last option seems to be the most extensible in this respect, but it would take us out of the nice sandbox that Dash allows us to play in.

More to come.

Visualization: Love-orbit

Updates to come with more details. As a starting point while we explore anonymity, we will start with a graph plotting the number of contributors over time with a variable "love score". This will allow for seeing active contributors based on multiple inputs scaled over time (issues, PRs, comments, etc.).

Inspiration: https://github.com/orbit-love/orbit-model
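A minimal sketch of how a per-month "love score" could be computed and thresholded to count active contributors. The weights, the event tuple layout, and the function names are all hypothetical; they are placeholders for illustration and are not taken from the Orbit model.

```python
from collections import defaultdict
from datetime import date

# Hypothetical weights: illustrative only, not taken from the Orbit model.
ACTION_WEIGHTS = {"issue": 1.0, "pr": 3.0, "comment": 0.5}


def love_scores_by_month(events, weights=ACTION_WEIGHTS):
    """Sum weighted activity per (year, month) per contributor.

    `events` is an iterable of (contributor, action, date) tuples.
    Actions missing from `weights` contribute nothing.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for contributor, action, when in events:
        month = (when.year, when.month)
        scores[month][contributor] += weights.get(action, 0.0)
    return {month: dict(per_person) for month, per_person in scores.items()}


def active_contributors(scores, threshold):
    """Count contributors per month whose score meets the threshold."""
    return {
        month: sum(1 for s in per_person.values() if s >= threshold)
        for month, per_person in scores.items()
    }


events = [
    ("alice", "pr", date(2022, 1, 5)),
    ("alice", "comment", date(2022, 1, 9)),
    ("bob", "comment", date(2022, 1, 20)),
    ("bob", "issue", date(2022, 2, 2)),
]
scores = love_scores_by_month(events)
active = active_contributors(scores, threshold=1.0)
```

Making the threshold a user input would give the "variable" part of the score: raising it narrows the graph to the most engaged contributors.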

CHAOSS collaboration trial

As we transition into the visualization creation phase of this project, the goal of explorer is to provide a deeper perspective than other available tooling. The CHAOSS metrics working group has spent time collaborating on the theoretical side of this, the "what to measure and why" side, with diverse perspectives from the open source landscape. More details to come on the breakdown of tasks.

Visualizations to be created (more to be added):

Total issues submitted over time

New Visualization: Response rate to Bugs

Please describe the background and context for this new visualization
Response/Close rate to issues with the bug/defect tag

Describe the perspective you'd like the final visual to give
We would like to show the response rate to an issue that can be assumed to be associated with a bug.

Describe the acceptance criteria for the issue and visualization to be complete
This should be completed in the finalized visualization tool or as a demo testing tools out

Additional context
Add any other context or screenshots about the feature request here.
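As a starting point for the response-rate idea above, here is a rough sketch that computes the fraction of bug-labelled issues with at least one response. The dictionary keys (`labels`, `first_response_at`) are hypothetical and do not reflect the Augur schema.

```python
BUG_LABELS = {"bug", "defect"}  # label names taken from the issue text


def bug_response_rate(issues):
    """Fraction of bug-labelled issues that received at least one response.

    Each issue is a dict with `labels` (iterable of str) and
    `first_response_at` (a timestamp or None); both keys are hypothetical.
    Returns None when there are no bug-labelled issues.
    """
    bugs = [
        issue for issue in issues
        if BUG_LABELS & {label.lower() for label in issue["labels"]}
    ]
    if not bugs:
        return None
    responded = sum(1 for issue in bugs if issue["first_response_at"] is not None)
    return responded / len(bugs)


issues = [
    {"labels": ["bug"], "first_response_at": "2022-01-02"},
    {"labels": ["Defect", "ui"], "first_response_at": None},
    {"labels": ["bug"], "first_response_at": "2022-01-10"},
    {"labels": ["enhancement"], "first_response_at": None},
]
rate = bug_response_rate(issues)
```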

Active contributors by Action

This visualization will show the number of contributors by time interval per action.

User inputs:

selects from drop down which action to look at
chooses # of months for time buckets

This will be implemented using a histogram.
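The bucketing behind such a histogram could look roughly like the sketch below. It assumes events arrive as (contributor, action, date) tuples and that buckets are counted from a chosen start date; both are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def bucket_index(when, start, months_per_bucket):
    """Return the zero-based bucket a date falls into, counting from `start`."""
    months_elapsed = (when.year - start.year) * 12 + (when.month - start.month)
    return months_elapsed // months_per_bucket


def contributors_per_bucket(events, action, start, months_per_bucket):
    """Count distinct contributors who performed `action` in each bucket."""
    seen = defaultdict(set)
    for contributor, event_action, when in events:
        if event_action == action and when >= start:
            seen[bucket_index(when, start, months_per_bucket)].add(contributor)
    return {bucket: len(people) for bucket, people in sorted(seen.items())}


events = [
    ("alice", "pr", date(2022, 1, 10)),
    ("alice", "pr", date(2022, 2, 3)),
    ("bob", "pr", date(2022, 3, 15)),
    ("carol", "pr", date(2022, 4, 1)),
    ("bob", "issue", date(2022, 5, 20)),
]
counts = contributors_per_bucket(events, "pr", date(2022, 1, 1), months_per_bucket=3)
```

The two user inputs map directly onto the `action` and `months_per_bucket` arguments.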

Integrate non-visualization "metrics" to Overview Page

Refer to the miro board design for reference

The work pipeline for non-visualization data points is TBA

Non-visualizations required for completion (shorthand; refer to the Miro board for more details):

Question: Is there a different/better way of doing queries in python/dash than sqlalchemy?

There are no issues with using sqlalchemy; we just want to research other options a little this early in the dev process to make sure this is the correct path.

Search bar options based off of the github repos or organizations in our augur instance

TODO:

Figure out a query to get all GitHub URLs for organizations (only list orgs that have all repos loaded in)
Determine how to identify whether a repo is in an org or just on a user's GitHub account
If the query is resource-intensive (and most likely even if not), determine how to store the query values so that the query is only triggered once, when the app is opened
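One possible way to run the options query only once per process is to memoize it. The sketch below uses `functools.lru_cache` with a stand-in function in place of the real database query; the function names and the example URLs are hypothetical.

```python
from functools import lru_cache

QUERY_CALLS = 0  # counts how often the expensive query actually runs


def run_search_options_query():
    """Stand-in for the expensive database query (hypothetical)."""
    global QUERY_CALLS
    QUERY_CALLS += 1
    return [
        "https://github.com/example-org/repo-a",
        "https://github.com/example-org/repo-b",
    ]


@lru_cache(maxsize=1)
def search_options():
    """Return search bar options, running the query only on first use."""
    return tuple(run_search_options_query())


first = search_options()
second = search_options()
```

Note that this cache lives in a single Python process. If the app is served by several worker processes, each one would run the query once, so a shared cache may be worth considering later.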

Clean up page headers (remove "live update")

Bug Fix: Connection to the Augur instance is severed in between calls.

The connection to Augur sometimes drops and causes a callback exception. Working on minimizing this problem with the pessimistic connection-checking process detailed in the sqlalchemy documentation, using 'pool_pre_ping'.
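For reference, the SQLAlchemy engine flag for pessimistic connection checking is spelled `pool_pre_ping`. The sketch below enables it on an in-memory SQLite engine, which stands in for the Augur database URL (not given here), so the snippet runs without credentials.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite is a stand-in; the real app would pass the Augur database URL.
engine = create_engine("sqlite://", pool_pre_ping=True)


def fetch_one():
    """Run a trivial query; the pool tests the connection before handing it out."""
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar()


result = fetch_one()
```

With this flag the pool issues a lightweight check each time a connection is checked out and transparently replaces connections that have gone stale, at the cost of a small amount of overhead per checkout.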

Question: Can a search bar be connected across pages?

Once the search bar is created on the opening page, is there a way to:

Show the selected options on all pages
Show the selections as boxes with x's (need a better way to describe this) so that people can deselect options, with a callback that triggers the visualizations to reload
Connect the search bar across pages so that it is all linked (some form of "global variable"-esque thing)

Search bar- Object across pages

We want our users to be able to change their search bar inputs on any page they are on, with the change triggering updates across pages. More details to come.

Loading bar

Currently the queries make the page load slowly. We need to add a loading bar to inform users that the app is working and still loading. Add a loading view for the visualizations as well.

https://www.youtube.com/watch?v=t1bKNj021do&list=PLh3I780jNsiS3xlk-eLU2dpW3U-wCq4LW&index=6

Collaborate for UX/UI design guidance

TODO:

meet with tigger and liz to get input
determine color scheme
determine basic design template for pages
design for opening page

dash bootstrap themes

Bug fix: ~12 second wait time for app to load

Apparently not linked to getting or setting data, our app can take upward of 12 seconds to become usable upon initialization.

I'm going to profile the callbacks to determine the origin of this bottleneck.

EDIT:
How will a dynamic callback for the dropdown affect this problem?

Implement the dynamic dropdown
check speed compared to the current solution
verify if the dynamic dropdown is the alternative we should be using

Charming data Dash Overview

@cdolfi will be going through the following videos and documenting notes and reflections here:

Introduction to Dash Plotly Data Visualization in Python - https://youtu.be/hSPmj7mK6ng
Introduction to Plotly Data Visualization - https://youtu.be/_b2KXL0wHQg
The Dash Callback Input, Output, State, and more - https://youtu.be/mTsZL-VmRVE
Dropdown Selector Python Dash Plotly - https://youtu.be/UYH_dNSX1DM
Pie Chart (Dropdowns) Python Dash Plotly - https://youtu.be/iV51JqP6y_Q
All about the Graph Component Python Dash Plotly - https://youtu.be/G8r2BB3GFVY
Complete Guide to Bootstrap Dashboard Apps Dash - https://youtu.be/0mfIK8zxUds

Repo Link: https://github.com/Coding-with-Adam/Dash-by-Plotly (all credit goes to @Coding-with-Adam for content)

New Visualization: Seasonality of Community:

The seasonality of a community pertains to the different time-based cycles in community activity. As described by @GregSutcliffe:

"The day-of-week is useful to maintainers when thinking about things like when to post news, event invites etc, and when to be available/have office hours. #203
Day-of-year gives maintainers some way to adjust expectations for the coming time-horizon (i.e. is it March? I might expect a dip in contribution then, and should not panic when it happens). Clearly this does not work for new projects (STL requires 2 time periods, so 2 years for this) #205
Not shown, but other seasonalities are possible. Hourly? Might reveal geographic information (i.e. if you get peaks at UTC +/-1 then you perhaps have an EU-centric community) #205
Non-periodic things can be done too; holidays are common. Also I once used an STL with "holidays" corresponding to release dates to analyse the effect of new releases on people upgrading from old versions.

The trend is useful to maintainers and contributors alike: the former will want to know how the project is faring, the latter will want to know if the project is alive and worth contributing to."
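The day-of-week and hourly views mentioned in the quote reduce to simple counts over event timestamps. The sketch below shows only that counting step, assuming timestamps are already in UTC; it does not perform an STL decomposition, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def activity_by_weekday(timestamps):
    """Count events per weekday, including days with zero activity."""
    counts = Counter(ts.weekday() for ts in timestamps)
    return {DAY_NAMES[i]: counts.get(i, 0) for i in range(7)}


def activity_by_hour(timestamps):
    """Count events per hour of day (0-23), including empty hours."""
    counts = Counter(ts.hour for ts in timestamps)
    return {hour: counts.get(hour, 0) for hour in range(24)}


timestamps = [
    datetime(2022, 3, 7, 9),    # Monday
    datetime(2022, 3, 7, 14),   # Monday
    datetime(2022, 3, 9, 9),    # Wednesday
    datetime(2022, 3, 12, 22),  # Saturday
]
by_day = activity_by_weekday(timestamps)
by_hour = activity_by_hour(timestamps)
```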

Criteria for acceptance (to be updated):

Determine which of these seasonalities to measure, why, and how
Create a small write-up to go with the charts to make them accessible and understandable for all
Make conscious design choices that make the charts readable for people without a data science or statistics background
Create a notebook with the visuals

How does layering pages work with column width?

The overall view of a Dash app has 12 column units to work with. When we have a layered page view, we have 9 units to work with. When coding the pages (not the index page), do we have 12 units to work with or 9? Will figure out better wording to explain this question.

Bus factor graph

Bus factor-esque graph: show the number/percent of files that have been updated by 1, 2, 3, etc. contributors in x amount of time (input by user)

2 Bucket Bus Factor:

Files changed in past 18 months
Contributors active in past 18 months
Intersection
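One way to read the "files updated by 1, 2, 3, etc. contributors" idea is sketched below. The input layout of (file_path, contributor, date) tuples and the use of 548 days as an approximation of 18 months are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def files_by_contributor_count(changes, as_of, window_days):
    """Map 'number of distinct contributors' to 'number of files'.

    `changes` is an iterable of (file_path, contributor, date) tuples.
    Only changes inside the window ending at `as_of` are counted.
    """
    cutoff = as_of - timedelta(days=window_days)
    authors = defaultdict(set)
    for path, contributor, when in changes:
        if cutoff <= when <= as_of:
            authors[path].add(contributor)
    return dict(Counter(len(people) for people in authors.values()))


changes = [
    ("app.py", "alice", date(2022, 5, 1)),
    ("app.py", "bob", date(2022, 4, 1)),
    ("README.md", "alice", date(2022, 5, 10)),
    ("old.py", "carol", date(2019, 1, 1)),  # outside the window
]
histogram = files_by_contributor_count(changes, date(2022, 6, 1), window_days=548)
```

Files that map to a count of 1 are the ones most exposed if their single recent contributor leaves.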

Current TODOs:

@sgoggins creating initial query for this visualization

Create testing framework

Our application lacks any testing process before it is deployed or merged into dev or main. A difficulty in testing our app is that the data we are using is constantly updating, so we need to target functions of the app that are data-agnostic.

The following are good candidate technical tests:

Augur database is available
Query to Augur with known return (up to max historical time) matches previously expected return value (past data isn't changing underfoot)
Augur query of reasonably large size doesn't take too long
App doesn't raise any runtime errors on deployment (run the app, hooks into logging for graph rendering)
Large query takes any amount of time but doesn't error out.
Moving between tabs w/ web driver changes url route as expected (Selenium)

We should have the long-run ability to test:

different versions of python
different operating systems (good practice for contributor accessibility)

We'll begin by implementing tests in pytest and move to using tox later when we want to test across environments. GitHub Actions handles these OS environments as well, so that's an option.

Dash code review?

As we are starting the development process with Dash, can we find someone with significant Dash experience to do some code review? Especially as @JamesKunstle and I are still getting acquainted with Dash, it would be beneficial to have some experienced eyes on our work.

Bug: Time out error on query call back

Callback error updating contributions.data (2:54:12 PM)

504 Gateway Time-out

The server didn't respond in time.

@sgoggins this is on the query you made, any thoughts? We might have to move to a materialized view.

Alter name of "Drive By" contributor/contributors

Given the loaded meaning behind the term "drive-by", we should probably change the term to something like "infrequent" or "irregular".

Determine the formatting necessary for compatibility between Augur config variables and OpenShift secrets

Feature: allow user to select/deselect repos when org is selected

Add visualizations to pages -- Overview

Refer to the miro board design for reference

The visualizations will be generated via jupyter notebook in the sandiego repo. An issue is to be created for each integration once the corresponding notebook is created.

Visualizations required for completion (shorthand; refer to the Miro board for more details):

Chaoss page: graph formatting

When working on the Dash app locally, the two graphs are on the same row, but in the OpenShift deployment they appear on separate rows.

New visualization: Response/Merge time analysis comparing contributors

There may be some value in comparing the speed and engagement in responding to and merging PRs between the top (10ish?) contributors and the median/mean contributors. This visualization is intentionally not fully specified, so that we can explore different ways of looking at this and determine whether it is useful and in what form.

Acceptance criteria:

Determine if the number of "top contributors" should be a set number or a percent of the contributor count
Response: time to first response should be tracked, but is there value in the number of responses or other metrics in that area?
Determine if mean, median, (or something else) should be the comparison group
Would a histogram of some sort be better for this?
Determine if this should be added to an existing notebook or stand on its own

New Visualization: Bus Factor

Please describe the background and context for this new visualization
The Bus Factor is a compelling metric because it visualizes the question "how many contributors can we lose before a project stalls?" by hypothetically having these people get run over by a bus (more pleasantly, how many would have to win the lottery and decide to move on).

The Bus Factor is the smallest number of people that make 50% of contributions.
Describe the perspective you'd like the final visual to give
https://chaoss.community/metric-bus-factor/

Describe the acceptance criteria for the issue and visualization to be complete
When the issue is taken on, the people working on it should edit this section to describe the steps necessary to complete this as a visualization in the explorer dashboard.
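The definition given above (the smallest number of people that make 50% of contributions) can be sketched as follows. Treating "50%" as "at least 50%" and taking per-contributor counts as a plain dictionary are assumptions of this sketch.

```python
def bus_factor(contributions, share=0.5):
    """Smallest number of contributors whose combined work reaches `share` of the total.

    `contributions` maps contributor name to a contribution count.
    """
    total = sum(contributions.values())
    if total == 0:
        return 0
    running = 0
    # Walk contributors from most to least active, accumulating their counts.
    ranked = sorted(contributions.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        running += count
        if running >= share * total:
            return rank
    return len(contributions)


concentrated = bus_factor({"alice": 50, "bob": 30, "carol": 15, "dave": 5})
spread = bus_factor({"a": 40, "b": 30, "c": 20, "d": 10})
```

A low value means the project depends heavily on a few people; the `share` argument leaves room to try thresholds other than 50%.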

Search Bar- Make outputs accessible across pages

TODOs to come as more is known

Reorganize repo to enable multipage callbacks

Page layouts of non-main pages don't have access to the callbacks of the app object because they're 'lower' in the directory hierarchy than app.py.

The repository needs to be reorganized to fix this, and the best way to do this is likely with URL routing via the index.py invocation page.

Backend OO rework

To support faster integration of visualizations from sandiego-rh/sandiego into sandiego-rh/explorer, it is necessary to flesh out a more organized backend.

Connecting to the Augur DB, setting the dbschema, and getting a repo's id from its name, among other operations, are common to all visualization scripts and ought to be presented to the user as a simple interface.

Component Update: DropDown component on index page.

The Dropdown contributed to ~12 seconds of wait-time for the user because it's doing background work, likely loading data into memory. We would like to speed this up and one option is to make a dynamic callback.

#49

The above issue explores this option.

For this issue, our definition of done is:

Try dynamic callbacks for the Dropdown bar
Ensure case-insensitivity and look into very conservative fuzzy-matching.
Implement the dynamic callback for the dropdown bar, ensure that it's more timely.
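The matching step inside such a dynamic callback could be sketched with the standard library alone. The function name, the 0.85 similarity cutoff, and the example option strings are assumptions; the cutoff is deliberately strict to keep the fuzzy matching conservative.

```python
from difflib import get_close_matches


def match_options(query, options, limit=10, cutoff=0.85):
    """Case-insensitive substring match, falling back to strict fuzzy matching."""
    needle = query.strip().lower()
    if not needle:
        return []
    # First pass: plain case-insensitive substring search.
    hits = [opt for opt in options if needle in opt.lower()]
    if hits:
        return hits[:limit]
    # Second pass: only near-identical strings, to tolerate small typos.
    lowered = {opt.lower(): opt for opt in options}
    close = get_close_matches(needle, list(lowered), n=limit, cutoff=cutoff)
    return [lowered[name] for name in close]


options = ["chaoss/augur", "oss-aspen/8knot", "ansible/ansible"]
exact = match_options("AUGUR", options)
typo = match_options("ansible/ansibel", options)
nothing = match_options("zzz", options)
```

Returning only a capped number of matches per keystroke is what would keep the dropdown from loading every option up front.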

Call back error when opening app online

Add sidebar navigation

https://www.youtube.com/watch?v=ln8dyS2y4Nc

Question: Should we change our page structure to use Dash's new pages functionality?

Discussion is needed on whether we should change our page structure to be consistent with the Dash documentation. Include conversation around using validation_layout.

See resources:
https://community.plotly.com/t/introducing-dash-pages-a-dash-2-x-feature-preview/57775
https://dash.plotly.com/urls
https://www.youtube.com/watch?v=RMBSQ6leonU
https://www.youtube.com/watch?v=sxGO1FAeQwU

ERROR: Exceeds Quota

When storing a large org dash gives the following error: "Failed to execute 'setItem' on 'Storage': Setting the value of 'commits-data' exceeded the quota."

Despite the error, the graph still updates with new data and performs as expected. The only concern is whether the graph is then showing just a subset of the data. More investigation to come.

We need to investigate what the data transfer limitations are and how we need to format queries to work within them.
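As a first diagnostic, the serialized size of a payload can be estimated before it is stored. The 5 MB budget below is an assumption: browser storage limits are commonly in that range but vary, and browsers may count characters rather than UTF-8 bytes, so this is only a rough estimate.

```python
import json

# Assumed budget; the real per-origin storage limit depends on the browser.
ASSUMED_QUOTA_BYTES = 5 * 1024 * 1024


def payload_size_bytes(records):
    """Size of the JSON-serialized payload in UTF-8 bytes."""
    return len(json.dumps(records).encode("utf-8"))


def fits_in_storage(records, quota=ASSUMED_QUOTA_BYTES):
    """True if the serialized payload is within the assumed quota."""
    return payload_size_bytes(records) <= quota


small = [{"repo_id": 1, "commits": 10}]
large = [{"repo_id": i, "sha": "a" * 40} for i in range(200_000)]
small_ok = fits_in_storage(small)
large_ok = fits_in_storage(large)
```

If large orgs routinely exceed the budget, options include aggregating in the query so fewer rows are stored, or keeping the data on the server instead of in the browser.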

Enhancement Idea Home page: Show repo density in orgs

Fix dash deployment on OpenShift, learn how to do so in the future.

Currently, a baby version of Dash Overview page is supposed to be deployed to OpenShift, with the help of Misc, but it's broken. Next steps are to fix this deployment and learn how to deploy on OpenShift in the future.

Code/Repo Clean up

After getting a little farther in the dash app, the following things need to be done to improve the repo and allow for better experience for new developers to come in:

TODO:

Move the db_interface folder
Clean up code that was created for repeating functions, making it applicable to what we know now
Make a py file that holds all callback code for queries, to keep the index page clean
Add comments in the index page explaining where the callbacks are and how to add more
Update all of the inline comments for the current app
Clean code from a design perspective (CSS, formatting) to make sure the code is all consistent

Notes on the last bullet point: We use a lot of different template code with different formatting strategies (class_name, style, etc.). Before things build up too much, let's make this consistent so that later work does not run into odd formatting issues and we have items to use as templates.

Create basic Search bar

This will be the initial step with completing the search bar to be accessible across pages.
TODO:

Determine best dash object for search bar to be connected across many visualizations and pages (is there a single object for it or must it be directly connected to a single graph?)
create search bar with repos/orgs as searchable inputs
allow for multiple selections
show selected options on search page

New Visualization: Metric License Coverage

Please describe the background and context for this new visualization
Determine how many files are covered by licenses and number of files covered by each license

Describe the perspective you'd like the final visual to give
https://chaoss.community/metric-license-coverage/
https://chaoss.community/metric-license-declared/

Describe the acceptance criteria for the issue and visualization to be complete
Not to be done in a notebook. Either used to test a visualization tooling option or completed when the tool is established

Additional context
Add any other context or screenshots about the feature request here.

REMINDER:
Before a visualization issue can be closed, there must be clear documentation in the notebook of the decisions made at each step and the "why." Also, any ML ideas generated from this process should be filed as an issue with the ml request tag.

Should we be using client side callbacks/ docker runtime options?

Query for gathering repo_ids from multiple github urls

Given the input from the search bar, generate the query and any additional code necessary to output the necessary repo_ids for the explorer pages

input could be:

-one or many repos
-one or many orgs
-combination of both
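A rough sketch of a parameterized lookup that covers all three input cases. It uses an in-memory SQLite table with hypothetical column names (`repo_git`, `org_name`) so that it runs on its own; the real Augur schema and database driver may differ.

```python
import sqlite3


def repo_ids_for(conn, repo_urls=(), org_names=()):
    """Return sorted, de-duplicated repo_ids matching the given repo URLs or org names."""
    clauses, params = [], []
    if repo_urls:
        clauses.append(f"repo_git IN ({', '.join('?' for _ in repo_urls)})")
        params.extend(repo_urls)
    if org_names:
        clauses.append(f"org_name IN ({', '.join('?' for _ in org_names)})")
        params.extend(org_names)
    if not clauses:
        return []
    sql = f"SELECT DISTINCT repo_id FROM repo WHERE {' OR '.join(clauses)} ORDER BY repo_id"
    return [row[0] for row in conn.execute(sql, params)]


# Toy table with hypothetical columns; the real Augur schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repo (repo_id INTEGER, repo_git TEXT, org_name TEXT)")
conn.executemany(
    "INSERT INTO repo VALUES (?, ?, ?)",
    [
        (1, "https://github.com/org-a/one", "org-a"),
        (2, "https://github.com/org-a/two", "org-a"),
        (3, "https://github.com/org-b/three", "org-b"),
    ],
)
mixed = repo_ids_for(
    conn, repo_urls=["https://github.com/org-b/three"], org_names=["org-a"]
)
```

Only the placeholders are interpolated into the SQL string; the user-supplied values are passed as bound parameters, which avoids SQL injection from search bar input.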

Enhancement: Inform user of Plotly view change ability

Plotly lets users change the view of a figure, which matters when a table or graph has an edited view and the user wants to see a different subset of the data. Determine how to surface this functionality to users. It may already be a Plotly feature that only needs to be turned on.

UX Enhancements

Through different conversations with @harishpillay and @sgoggins, the following UX suggestions will be implemented:

Notes on the start page to help guide users, a placeholder until we do more design implementations
Add button for GH issue creation: bug report, new visualization request, new repo/org request
loading bar as graphs/search bar updates (see #37 )
Create issue templates to link to the buttons

Add deterministic requirements using pipenv

Right now we install our Dash application requirements from a requirements.txt file.

We ought to have closer control of which versions of modules we use when necessary to ensure that
our production state matches our development state as nearly as possible, avoiding "gotchas" arising
from unvetted library incompatibilities that appear in prod but not in dev.

We can do this by converting from the typical python "pip install -r requirements.txt" workflow to a pipenv workflow.
