glean-dictionary's Introduction

Glean Dictionary

The Glean Dictionary aims to provide a comprehensive index of datasets generated inside Mozilla for applications built using the Glean SDKs.

This project is under active development. For up-to-date information on project structure and governance, see:

https://wiki.mozilla.org/Data/WorkingGroups/GleanDictionary

The production version of the Glean Dictionary is deployed at:

https://dictionary.telemetry.mozilla.org

Getting Started

You should be able to create your own local copy of the dictionary so long as you have Python (version 3.8+) and node.js (version 18+) installed. You will also need npm v8 or greater: run npm install -g npm@latest if you need to upgrade.

Assuming those requirements are met, follow these instructions:

# Create a Python virtual environment and install the dependencies into it.
python3 -m venv venv/
venv/bin/pip install -r requirements.txt

# Build data needed by dashboard
./scripts/gd build-metadata
# Or, on Windows: python3 -m etl build-metadata

# Install npm dependencies and start a local
# instance of the GUI
npm install
npm run dev

If that worked, you should be able to see a local version of the Glean Dictionary at http://localhost:5555

You can speed up the "build data" step considerably by appending the name(s) of the application(s) you want to build metadata for. For example, to build a metadata index for Fenix (Firefox for Android) only, try:

./scripts/gd build-metadata fenix

Search Service

The Glean Dictionary also includes a search service which enables searching through active metrics. Under the hood, this service is implemented with netlify functions. For example:

https://dictionary.telemetry.mozilla.org/api/v1/metrics_search_burnham?search=techno

You can start it up via the netlify command line interface (assuming you have it installed):

netlify dev

If you have generated metadata as described above, you should then be able to test the search functions locally:

http://localhost:8888/api/v1/metrics_search_burnham?search=techno
http://localhost:8888/api/v1/metrics_search_firefox_legacy?search=ms
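
As a rough illustration, the request URLs for the local search functions can be assembled from Python. This is only a sketch: the build_search_url helper is hypothetical, and the metrics_search_<app> naming pattern is inferred from the two example URLs above rather than from any documented API contract.

```python
from urllib.parse import urlencode


def build_search_url(app_name, search, base="http://localhost:8888"):
    """Build a search URL for one application (hypothetical helper).

    Assumption: every endpoint follows the /api/v1/metrics_search_<app>
    pattern seen in the examples above.
    """
    query = urlencode({"search": search})
    return f"{base}/api/v1/metrics_search_{app_name}?{query}"


if __name__ == "__main__":
    print(build_search_url("burnham", "techno"))
```

With netlify dev running, the resulting URL can be opened in a browser or fetched with any HTTP client.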

Storybook

We use Storybook for developing and validating Svelte components used throughout the app. To view the existing list of stories, run:

npm run storybook

Storybook Snapshot Testing

To give us more confidence that changes don't unintentionally break the UI, we run storybook snapshot tests.

You can run them manually as follows:

npm run test:jest

If you intentionally made a change to a component that results in a change to the output of the storybook snapshots, you can re-generate them using the following command:

npm run test:jest -- -u

End-to-End Testing

We use Playwright for our end-to-end tests.

Before testing, download the supported browsers needed for Playwright to execute successfully by running:

npx playwright install

To run the end-to-end tests along with other tests:

npm run test

To run only the Playwright tests:

npx playwright test

ETL Testing

The transforms used by the Glean Dictionary have their own tests. Assuming you've run the set up as described above, you can run these tests by executing:

venv/bin/pytest

Glean Debugging

To enable ping logging, set the GLEAN_LOG_PINGS environment variable:

GLEAN_LOG_PINGS=true npm run dev

To send Glean pings to the debug viewer, set the GLEAN_DEBUG_VIEW_TAG environment variable:

GLEAN_DEBUG_VIEW_TAG=my-tag npm run dev

Deployment

A version of the Glean Dictionary running the development branch (main) is accessible at https://glean-dictionary-dev.netlify.app/ .

The production version of the Glean Dictionary (https://dictionary.telemetry.mozilla.org) is deployed from the production branch on this repository, which usually corresponds to the latest GitHub release. To update the Glean Dictionary to the latest version, follow this procedure:

  • Do a quick test of https://glean-dictionary-dev.netlify.app to make sure it's working as expected.
  • Create a new release off of the main branch:
    • use the auto-generated release notes, omitting dependency updates, unless it's glean.js;
    • use the format vX.Y.Z for the tag, where X.Y.Z is the new version number.
  • From a local checkout (assuming origin is the name of the remote):
    • fetch the newly created tags, git fetch --tags origin;
    • switch to the production branch, git checkout production;
    • bring it in sync with the tag you just created, git merge tags/vX.Y.Z (where X.Y.Z is the new version number);
    • push to the production branch, git push origin production.
  • Wait for the integration tests to pass by monitoring CircleCI.
  • Ensure that https://dictionary.telemetry.mozilla.org is automatically updated to the released version by checking that <HASH> in Built from revision: <HASH> at the bottom of the Glean Dictionary page matches the one reported at the top right of the release page https://github.com/mozilla/glean-dictionary/releases/tag/vX.Y.Z.

Contributing

For more information on contributing, see CONTRIBUTING.md in the root of this repository.

glean-dictionary's People

Contributors

abhi-agg, akkomar, alvesitalo, badboy, brizental, chichi012, chutten, dawoodshahat, dependabot[bot], dexterp37, fbertsch, fenn-cs, harnaman-hk, hngerebara, iinh, jcads, jklukas, joylubega, lilylme, meghajain-1711, mhmohona, nonbinaryfrog, relud, riju19, robhudson, rosahbruno, scholtzan, singhvaishnavi, travis79, wlach

glean-dictionary's Issues

Add a footer to each page

This issue is intended as an onboarding task for potential outreachy applicants. Please do not work on it unless you have completed the initial qualification task and it has been assigned to you.

GLAM has a footer element with some useful information at the bottom of each page:

image

We should have something similar for the Glean dictionary.

You can mostly copy over the existing footer from GLAM (putting it into the src/components/ directory) and then put it into each of the pages we display:

https://github.com/mozilla/glam/blob/77c4689297e3c81f6cf49f9fd6f73a9435d1761b/src/components/regions/Footer.svelte

Some styling should be adjusted and obviously the links should be different (e.g. the link to a slack channel should instead be a link to our channel on Matrix: https://chat.mozilla.org/#/room/#glean-dictionary:mozilla.org)

Add Python linting to CI and make it easy to run

This issue is intended as an initial onboarding task for potential outreachy applicants. Please do not work on it unless you have completed the initial qualification task and it has been assigned to you.

We only have a minimal amount of python code so far, but we will accumulate more over time. To keep the quality level up, we should enable linting with flake8, black, and isort.

This is a somewhat non-trivial issue, as it will involve:

  • Setting up CircleCI to run Python code (it only validates JavaScript at the moment)
  • Writing up the correct linter configuration (always harder than it seems)

Some prior art which might be helpful is the mozregression repository where I recently added some linter code:

mozilla/mozregression#575

Note however that it uses Travis rather than CircleCI. You'll need to do some extra research to get this going with CircleCI.

Have the UI flag when an app is in the prototype stage

In mozilla/probe-scraper#244 we added a flag that signals when an app is still in the prototype stage.

We should have the dictionary show that in the UI somehow.

This bug should be handled in two stages:

  1. Define how we are going to add this flag to the Glean dictionary's UI, discuss the proposed solution (can be done on this issue's comments);
  2. Implement whatever is decided.

Processing invalid links in Bugs and Data Reviews Json

Issue description
The bugs and data reviews hyperlinks for glean.page.path lead to an invalid Bugzilla page.

Steps to reproduce the issue:
Open http://localhost:5000/data/glean-js/metrics/glean.page.path.json and check the bugs and data_reviews keys:
"bugs": ["https://bugzilla.mozilla.org/show_bug.cgi?id=actually-we-dont-have-this"],
"data_reviews": ["https://bugzilla.mozilla.org/show_bug.cgi?id=actually-we-dont-have-this"],

What can be done here:

  1. Check with the data source to find out why the data is wrong, and fix it there.
    OR
  2. Preprocess such noisy data on the UI side.

Solving this issue might also require specifying what the UI page should say when such bugs/links are found.
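
One possible shape for the preprocessing option is sketched below in Python, so it could sit in the metadata-building step. The function names are hypothetical, and the check assumes that a usable Bugzilla link always carries a purely numeric id; Bugzilla bug aliases, if the project relies on them, would need separate handling. Links to other hosts are passed through untouched.

```python
import re
from urllib.parse import parse_qs, urlparse


def is_suspect_bugzilla_link(url):
    """Return True for bugzilla.mozilla.org bug links without a numeric id.

    Assumption: valid links look like .../show_bug.cgi?id=<digits>.
    Links to any other host are never flagged.
    """
    parsed = urlparse(url)
    if parsed.netloc != "bugzilla.mozilla.org":
        return False
    if parsed.path != "/show_bug.cgi":
        return False
    ids = parse_qs(parsed.query).get("id", [])
    return not (len(ids) == 1 and re.fullmatch(r"[0-9]+", ids[0]))


def split_links(links):
    """Separate links into (usable, suspect) lists, keeping their order."""
    usable = [link for link in links if not is_suspect_bugzilla_link(link)]
    suspect = [link for link in links if is_suspect_bugzilla_link(link)]
    return usable, suspect
```

The UI could then render only the usable links and show a short note when the suspect list is not empty.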

Improve filter

When we search for something in the filter box and it doesn't match any item, we are left staring at a blank page. There should be a message saying that the search doesn't match any application or metric.

Display notification emails on application pages

We currently display email addresses on the ping page:

Screen Shot 2020-10-21 at 3 57 49 PM
http://localhost:5000/#!/apps/mozregression/pings/usage

The repository metadata also has this information, however, and it would be good to display it on the application page (http://localhost:5000/#!/apps/mozregression).

As part of this implementation, let's create a component for rendering this type of information (maybe EmailAddresses?) nicely and create a story for it. One nicety we could add would be creating a mailto: URL for the email address, to make it a little easier to send a mail to the relevant party.

Run storybook tests in ci

Storybook snapshot tests in CI (where we verify that stories still render after a pull request) can often catch problems, e.g. it was helpful in the iodide project (see iodide-project/iodide#2506). I'm not sure how to set this up in Svelte but I'm guessing it should be possible. You can see the aforementioned PR for iodide for some ideas.

We should gather all metric/ping types, not just those specified by the applications

This issue is intended as an onboarding task for potential outreachy applicants. Please do not work on it unless you have completed the initial qualification task and it has been assigned to you.

Currently we're only gathering metrics and ping data from the applications themselves, not those specified by the libraries (e.g. the events ping is part of glean-core: https://github.com/mozilla/glean/blob/2261845761251d91b6968f29846dc3aabbc0cc45/glean-core/pings.yaml#L80)

Mozilla Schema Generator (which does something similar to what we do) enumerates dependencies for each application and gathers data for them e.g. https://github.com/mozilla/mozilla-schema-generator/blob/1264148ff82adc3357de33a25541f1313f93449f/mozilla_schema_generator/glean_ping.py#L126

We should modify our build-glean-metadata.py script (https://github.com/mozilla/glean-dictionary/blob/main/scripts/build-glean-metadata.py) with similar logic.
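
A minimal sketch of that merging logic follows. The data shapes here are assumptions made for illustration (an app dictionary with "metrics" and "dependencies" keys, and a library_metrics mapping keyed by library name); the real script and the probe-scraper data may be organised differently.

```python
def collect_metrics(app, library_metrics):
    """Merge an app's own metrics with those defined by its dependencies.

    Hypothetical shapes:
      app = {"metrics": {name: definition}, "dependencies": [library, ...]}
      library_metrics = {library: {name: definition}}
    Definitions from the app itself win over library definitions.
    """
    merged = dict(app.get("metrics", {}))
    for library in app.get("dependencies", []):
        for name, definition in library_metrics.get(library, {}).items():
            # setdefault keeps the first definition seen for each name.
            merged.setdefault(name, definition)
    return merged
```

The same approach would apply to pings, using a second mapping of library name to ping definitions.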

Add a filter box to the applications page

This issue is intended as an onboarding task for potential outreachy applicants. Please do not work on it unless you have completed the initial qualification task and it has been assigned to you.

If you look at the bigquery table view, you'll see that there's a filter box that lets you easily search for the subset of columns that you're interested in:

image

We should have a similar widget on the main page to filter through the list of applications where e.g. putting "firefox" into the box will only show applications with Firefox in the name.
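
The matching rule itself is small. Here it is sketched in Python for brevity, although the real widget would live in a Svelte component; the filter_apps name and the "name" key are assumptions made for illustration.

```python
def filter_apps(apps, filter_text=""):
    """Return the apps whose name contains filter_text, ignoring case.

    Assumption: each app is a dict with a "name" key. An empty or
    whitespace-only filter returns every app.
    """
    needle = filter_text.strip().lower()
    if not needle:
        return list(apps)
    return [app for app in apps if needle in app["name"].lower()]
```

An empty result list is the natural place to show a "no applications match" message instead of a blank page.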

Render markdown in description fields

Currently we're not rendering markdown in Glean description fields, which doesn't look great:

image
http://localhost:5000/#!/apps/mozregression

We should render these types of fields with markdown. For the schema dictionary I used the marked parser, which seems to work pretty well: https://www.npmjs.com/package/marked

It seems most likely that we'll define a svelte component for rendering this type of information, in which case we should add a story for it in: https://github.com/mozilla/glean-dictionary/tree/main/stories

Make Metrics section filterable

While reading the proposal for glean-dictionary I came across this, which proposes that the Metrics section on the ping page should be filterable:

filterable_list_of_metrics

Currently, if we navigate to this page:
http://localhost:5000/#!/apps/fenix

it shows a large list of metrics for the fenix app, which shows that the current Metrics section on the ping page is not filterable.

Screenshot 2020-10-11 at 1 11 55 AM

I think it would be great if we add the FilterInput.svelte component to the metrics section to make it filterable.

Failed to fetch some metrics data for fenix apps.

Issue description

Failed to fetch some metrics data for fenix apps.
The issue arises only when the metric name starts with metrics.any_metric

Steps to reproduce the issue

  1. navigate to http://localhost:5000/#!/apps/fenix

  2. search for a metric that starts with metrics.default_browser. Use this link http://localhost:5000/#!/apps/fenix/metrics/metrics.default_browser

  3. Check the browser's console; it shows this error:
    Uncaught (in promise) TypeError: Failed to fetch

  4. The issue arises only when the metric name starts with metrics.any_metric

What's the expected result?

  • It should show details about the metrics by rendering MetricDetail.svelte component.

What's the actual result?

  • It shows a blank screen.
  • An error on the browser's console.

Screenshots

  • Screenshot 2020-10-12 at 12 07 45 AM

Include more application metadata

We don't have a ton of Glean application metadata currently, but we have more than we're currently displaying (name + description).

In particular, we should include the following:

  • source code URL -- where to look up the source code
  • application id -- the id of the application, corresponding to what we use on the Play Store for Android applications
  • deprecated (if true) -- whether the application is deprecated (no longer under active development)

application id and source code URL should only be displayed in the application detail screen. For now, just display them as a table, as we do for the BigQuery table view e.g. http://localhost:5000/#!/apps/fenix-nightly/tables/activation

"deprecated" should be displayed as a pill (in both the application list and application detail screen). Create a new svelte component using tailwind with a pleasing style. You can see some documentation on how to create a pill here:

https://tailwindcss.com/docs/border-radius#pills-and-circles

BigQuery table link doesn't fit into rest of metadata

I added a link to the BigQuery table view when working on the initial skeleton. With some of the recent changes, it now looks pretty out of place:

Screen Shot 2020-11-09 at 10 20 57 AM

It should be an item in the table below (just after "notification email"). I would propose the following structure:

  • Name: "BigQuery Table"
  • Link: name of the stable table in the table view (e.g. org_mozilla_fenix.activation for fenix activation table)

You may need to update the metadata gathering step in scripts/build-glean-metadata.py to fetch the name of the stable table to put in the ping view.

Add link to github repo of Applications

As per the Glean dictionary proposal :
image

The application page should have a link to the application's source repository, which is currently missing.

Where to add:

  1. Below the Header for application name / Make the Application Name a hyperlink to the source repo
    image

How to add:
We might need a pre-existing database with this information
OR
we would have to store it in JSON format, which would need maintenance whenever an application is added.

Code to change : src/pages/AppDetail.svelte

Testing

Currently there's only an example test. It would be a good idea to add some testing maybe with something like svelte-testing-library along with Jest. This way we can familiarise ourselves with the codebase through writing some tests.

Persist search state in BigQuery table view

In the BigQuery table view we currently allow the user to search through the column names to find the ones of interest:

image
http://localhost:5000/#!/apps/mozphab/tables/usage

It would be very handy if we could persist that search in the URL (and restore it when they visit it), so that people could link to specific views and have the search prepopulated. In the above example, that would be:

http://localhost:5000/#!/apps/mozphab/tables/usage?search=build

To perform this task, have a look at the documentation for page.js, which is what we currently use for routing: https://visionmedia.github.io/page.js/

You will probably need to modify the main router (App.svelte) in addition to the component for the table view.
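
The URL handling can be sketched independently of page.js. The Python below only illustrates the round trip (reading the search term from the hash-bang route and writing it back); the helper names are hypothetical, and with_search deliberately drops any other query parameters to keep the sketch short.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def read_search(url):
    """Return the search term stored after the '?' in the #! route, or ''."""
    fragment = urlsplit(url).fragment
    _, _, query = fragment.partition("?")
    return parse_qs(query).get("search", [""])[0]


def with_search(url, search):
    """Return url with the route's query replaced by ?search=<search>.

    An empty search removes the query entirely.
    """
    base, _, fragment = url.partition("#")
    route = fragment.partition("?")[0]
    if not search:
        return f"{base}#{route}"
    return f"{base}#{route}?{urlencode({'search': search})}"
```

On page load the component would call the equivalent of read_search to prepopulate the filter box, and on every change it would update the address bar with the equivalent of with_search.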

Add info about the type of metric and expiry date on Metric Detail Page

Currently the metric detail page shows only JSON data.

We should add information about:

  1. Type of metric
  2. Expiry date of the metric
  3. Starting point (app) of the metric [fenix/mach/...]

Where:
Below the description of the metric

More to do:

  1. We can also hyperlink the type of metric to its more_info link
  2. Add a hyperlink on the starting point that leads back to the app page, e.g. the "mach/fenix" page in this case.

Don't show expired metrics by default

Many of the probes in the existing probe dictionary are expired or deprecated (and they don't always have build end dates). This causes confusion, since it might not be clear that no data will be available in those probes for currently-released products. While the ability to access historical probes should always exist, we should optimize for what is presumably the common case of looking at new data flowing in.

This might include:

  • A visually obvious identifier for deprecated or expired probes
  • Not returning search results for expired probes (unless checking a box)
  • Ranking search results by date of last update to the probe definition
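
The "hide unless a box is checked" behaviour could look roughly like this. The sketch assumes each metric carries an "expires" field that is either "never", "expired", or an ISO date, and that a metric counts as expired from that date onwards; both points are assumptions and should be checked against the actual metadata.

```python
from datetime import date


def is_expired(metric, today):
    """Return True if the metric should be treated as expired on `today`."""
    expires = metric.get("expires", "never")
    if expires == "never":
        return False
    if expires == "expired":
        return True
    # Assumption: any other value is an ISO date (YYYY-MM-DD), and the
    # metric is considered expired on that date and afterwards.
    return date.fromisoformat(expires) <= today


def visible_metrics(metrics, today, include_expired=False):
    """Hide expired metrics unless the caller explicitly asks for them."""
    return [m for m in metrics if include_expired or not is_expired(m, today)]
```

Passing today in explicitly, rather than reading the clock inside the function, keeps the behaviour easy to test.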

Table search only searches in last part of the component

This issue is intended as an onboarding task for potential outreachy applicants. Please do not work on it unless you have completed the initial qualification task and it has been assigned to you.

Currently the filter search in the table view (e.g. http://localhost:5000/#!/apps/mach/tables/usage) only filters for content in the last "node" in the structure. e.g. if you have

client_info.app_version
client_info.app_display_version
client_info.client_id
...

and you search for "client", it will filter out all of the above except for client_info.client_id. Ideally the search would include all entries whose parent elements have the term in them (in the example above, this would be everything that includes client).

To fix this, you'll need to edit the schemaviewer component:

const filterTextChanged = (filterText = "") => {
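
The intended behaviour can be sketched outside the component. The Python below is only an illustration of the rule "keep a node if it, or any of its ancestors, matches"; the tree shape (dicts with "name" and "children") and the SCHEMA sample are made up for the example and are not the component's real data structure.

```python
# Made-up sample tree mirroring the column names used above.
SCHEMA = [
    {
        "name": "client_info",
        "children": [
            {"name": "app_version"},
            {"name": "app_display_version"},
            {"name": "client_id"},
        ],
    },
    {"name": "ping_info", "children": [{"name": "seq"}]},
]


def filter_nodes(nodes, filter_text, parent_matched=False):
    """Keep nodes that match, have a matching ancestor, or a matching child."""
    needle = filter_text.lower()
    kept = []
    for node in nodes:
        matched = parent_matched or needle in node["name"].lower()
        children = filter_nodes(node.get("children", []), filter_text, matched)
        if matched or children:
            # Copy the node so the original tree is left untouched.
            kept.append({**node, "children": children})
    return kept
```

With this rule, searching for "client" keeps all three client_info columns, while searching for "version" keeps client_info with only its two version columns.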

Metric pages show almost no info

This issue is intended as an onboarding task for potential outreachy applicants. Please do not work on it unless you have completed the initial qualification task and it has been assigned to you.

Despite having a bunch of metadata in them, we show almost nothing about the metrics on the metric page.

For a start, let's show everything the earlier glean dictionary prototype did:

image
https://glean-dictionary.netlify.app/?metric=media_state_play

The code that needs to be modified is here:

https://github.com/mozilla/glean-dictionary/blob/main/src/pages/MetricDetail.svelte

All the information we want to display should already be extracted. If you have the server running locally, go to e.g.:

http://localhost:5000/data/mach/metrics/mach.system.memory.json

This is the dataset corresponding to:

http://localhost:5000/#!/apps/mach/metrics/mach.system.memory

You can skip the "help" parts of this dialog for now. We'll tackle that in a separate issue. Also, don't worry about styling the component too much-- can just use a table for now (like we do in the table view already: http://localhost:5000/#!/apps/mach/tables/usage)

Deploy a copy of Glean Dictionary to protosaur.dev

This bug requires specialized knowledge and access to Mozilla's internal systems, so is not a good issue for contributors

We should deploy a copy of the Glean dictionary to protosaur.dev on a regular basis. protosaur currently requires auth (making it inaccessible to those outside Mozilla), but that should be fixed soon by mozilla/protodash#16

Better handle cases where application/pings/metrics/table do not exist

Currently we just silently fail if the user navigates to an entity that does not exist. e.g.:

http://localhost:5000/#!/apps/burnham2
http://localhost:5000/#!/apps/fenix-nightly/pings/activation-doesnotexist

It would be better if we displayed some kind of friendly error page saying something like "Could not find application burnham2" or "Could not find ping activation-doesnotexist". I don't expect this to happen frequently, but it can happen occasionally (e.g. if an application is added and then withdrawn).

To accomplish this task, you'll want to create a new Svelte component to cover this functionality and update each page to use/display it in the event that fetching information fails.

External link should be opened in new tab

Opening an external link in a new tab allows one to explore the other site as much as they want without having to hit the back button again. It helps to keep focus in one place without losing other websites' information.

Currently, the metrics page opens external links in a new tab. The pings page and tables page need to be updated.

One more thing to note: we should not open internal links in a new tab, because it might confuse users. The Glean Dictionary already lets users navigate back or elsewhere as needed, so keeping them in the same tab helps them understand the navigation flow better.

We should collect all the labels ever used on a labeled metric

In Glean, we want to encourage users to remove deprecated labels from their metrics. But it would still be good to document in the dictionary all the labels that have been used on historical data.

This would basically require going through the entire history for a metric and collecting all of the labels used. Bonus points for flagging the ones that no longer exist in the latest revision.

See https://bugzilla.mozilla.org/show_bug.cgi?id=1587430 for additional context.
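
A minimal sketch of the collection step, assuming the history of a metric is available as a list of definitions ordered oldest first, each with an optional "labels" list. That input shape and the collect_labels name are assumptions for illustration only.

```python
def collect_labels(history):
    """Collect every label ever used, flagging those still in the latest revision.

    history: list of metric definitions, oldest first (assumed shape).
    Returns a list of {"label": ..., "active": ...} in first-seen order.
    """
    seen = {}
    for revision in history:
        for label in revision.get("labels") or []:
            # A dict preserves insertion order, so labels keep the order
            # in which they first appeared.
            seen.setdefault(label, None)
    current = set(history[-1].get("labels") or []) if history else set()
    return [{"label": label, "active": label in current} for label in seen]
```

Labels with "active" set to False are the ones that no longer exist in the latest revision.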

Storybook doesn't render components with styling

This issue is intended as an onboarding task for potential outreachy applicants. Please do not work on it unless you have completed the initial qualification task and it has been assigned to you.

We have a single story for the schema viewer, but it doesn't look great:

image

This is because the Tailwind css components aren't being imported directly into the story. It should be possible to fix this with some configuration changes. This repository may have some hints on how to configure things (my suggestion would be to look at the postcss configuration):

https://github.com/jerriclynsjohn/svelte-storybook-tailwind

Hide applications that are deprecated by default

This issue is intended as an onboarding task for potential outreachy applicants. Please do not work on it unless you have completed the initial qualification task and it has been assigned to you.

Currently we list all Glean applications in the UI by default on the home page (https://github.com/mozilla/glean-dictionary/blob/12ff6ac8f603f1245f6c6dcb9e4be9b85e28b135/src/pages/AppList.svelte), regardless of deprecation status. Instead we should have a checkbox that allows you to show/hide the applications that are deprecated. Something like this:

  • Show deprecated applications

You can use the deprecated property in the apps json file to accomplish this task. Assuming you have the application running, have a look at this JSON payload:

http://localhost:5000/data/apps.json
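
The checkbox logic reduces to a single predicate. It is sketched in Python here for brevity, while the real change would go into the Svelte page; the visible_apps name is hypothetical, and the sketch assumes that a missing deprecated property means the application is not deprecated.

```python
def visible_apps(apps, show_deprecated=False):
    """Return the apps to list, hiding deprecated ones unless requested.

    Assumption: an app without a "deprecated" key is not deprecated.
    """
    return [
        app for app in apps
        if show_deprecated or not app.get("deprecated", False)
    ]
```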

Add comment sections to probe details

This is an off-the-wall idea that came up in the data science team meeting today: it would be super helpful if data scientists had a place to leave comments on probes to discuss their behaviour and share warnings for future travellers!

Maybe the answer is "just use Bugzilla," or maybe there's another place, but this is a place that many data scientists look and so it seems like it could profitably live here.

Moderation or authentication is an obvious concern; possibly this could link out to Discourse threads or some other already-moderated Mozilla space, but it would be great if we could see whether there's a comment available, and ideally what it is.

Run ETL code after a schema deploy

This bug requires specialized knowledge and access to Mozilla's internal systems, so is not a good issue for contributors

The Glean Dictionary currently has some ETL code which you need to run ad hoc to create a bunch of static data assets in https://github.com/mozilla/glean-dictionary/blob/12ff6ac8f603f1245f6c6dcb9e4be9b85e28b135/scripts/build-glean-metadata.py -- we may eventually move to storing this in an Elasticsearch cluster (see discussion here: https://docs.google.com/document/d/1OkTWA3rsSJ0m5g9GDnxXVUMkJP-xJMQk_bDgDq-Z9xM/edit#heading=h.tn5dtaq0zat6) but this seems like the easiest approach for now. While we're in this phase, we should schedule this ETL code to run after a schema deploy and upload its output to the bucket we're using with protosaur (#60). We need to talk to someone from dataops when we're ready to do this.

Ping page has almost no info

This issue is intended as an onboarding task for potential outreachy applicants. Please do not work on it unless you have completed the initial qualification task and it has been assigned to you.

We show only a tiny subset of the data related to the ping on the ping page right now. You can see an example of markdown documentation which more fully represents the metadata we have here:

https://github.com/mozilla/mozregression/blob/master/docs/glean/metrics.md#pings

The code that needs to be modified is here:

https://github.com/mozilla/glean-dictionary/blob/main/src/pages/PingDetail.svelte

All the information we want to display should already be extracted. If you have the server running locally, go to e.g.:

http://localhost:5000/data/mozregression/pings/usage.json

This is the dataset corresponding to:

http://localhost:5000/#!/apps/mozregression/pings/usage

Don't worry about styling the component too much-- can just use a table for now (like we do in the table view already: http://localhost:5000/#!/apps/mach/tables/usage)

Refine the metric page (transferred from old glean dictionary)

cc @spasovski

I know this is a pre-alpha, but I thought to share my feedback on this anyway :-) There's a few small nits that I believe would make this page a bit more digestible (see the relative colored numbers on the image):

  1. I'd change the label to resonate with the wording in the Glean Docs, i.e. relevant bugs
    1a. It also might make sense to drop the links from there, and just use a numbered list of links, for example, something that would look like this markdown [1](link to the first bug/GH issue), [2](...).
  2. It would be great to have tooltips/question marks icons like in GLAM to remind users what these entries mean. They are documented in the Glean docs and, for example, lifetime is usually confusing.
  3. Instead of Timing_distribution, this should probably drop the _ and also link to the proper glean docs (e.g. https://mozilla.github.io/glean/book/user/metrics/timing_distribution.html) - note that the name of the metric type is also the same name of the documentation for that metric type. This is on purpose, so that you can do https://mozilla.github.io/glean/book/user/metrics/{metric_type_name}.html
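
The URL pattern from the last point is easy to apply mechanically. A small sketch, with hypothetical helper names and the base URL taken from the example above:

```python
GLEAN_DOCS_BASE = "https://mozilla.github.io/glean/book/user/metrics"


def metric_type_docs_url(metric_type):
    """Build the Glean docs URL for a metric type such as 'timing_distribution'."""
    return f"{GLEAN_DOCS_BASE}/{metric_type}.html"


def metric_type_label(metric_type):
    """Turn 'timing_distribution' into a label without the underscore."""
    return metric_type.replace("_", " ").capitalize()
```

The label would be shown as the link text and the URL used as its target.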

image
