
buildhub's Introduction

Buildhub

Status: June 11th, 2019

This project is deprecated and will be decommissioned soon. If you're using Buildhub, please migrate to Buildhub2.

Details

Buildhub aims to provide a public database of comprehensive information about releases and builds.

Licence

MPL 2.0

Development

  1. Install Docker
  2. To run tests: make test
  3. To lint check Python code: make lintcheck

Continuous Integration

We use CircleCI for all continuous integration.

Releasing

There are a few pieces to Buildhub.

AWS Lambda job and cron job

Generate a new lambda.zip file by running:

rm lambda.zip
make lambda.zip

This runs a script inside a Docker container to generate the lambda.zip file.

You need to have write access to github.com/mozilla-services/buildhub.

You need a GitHub Personal Access Token with the repo scope. This is used to create GitHub Releases and upload assets to them.

Create a Python virtual environment and install "requests" and "python-decouple" into it.

Run ./bin/make-release.py. You need to set the GITHUB_API_KEY environment variable and specify the "type" of the release as a command-line argument. Choices are:

  • major (e.g. '2.6.9' to '3.0.0')
  • minor (e.g. '2.6.7' to '2.7.0')
  • patch (e.g. '2.6.7' to '2.6.8')

Then do this in your Python virtual environment:

$ GITHUB_API_KEY=895f...ce09 ./bin/make-release.py minor

This will bump the version in setup.py, update CHANGELOG.rst, create a tag, and push that tag to GitHub.

Then, it will create a Release and upload the latest lambda.zip as an attachment to that Release.
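For context, creating the Release and uploading the asset boils down to two GitHub REST API calls. A minimal sketch with requests (hypothetical tag name; this is not the actual make-release.py code):

import os
import requests

headers = {"Authorization": "token " + os.environ["GITHUB_API_KEY"]}

# Create a Release for an already-pushed tag.
release = requests.post(
    "https://api.github.com/repos/mozilla-services/buildhub/releases",
    headers=headers,
    json={"tag_name": "v1.2.3", "name": "v1.2.3"},
).json()

# Upload lambda.zip as an asset attached to that Release.
upload_url = release["upload_url"].split("{")[0] + "?name=lambda.zip"
with open("lambda.zip", "rb") as f:
    requests.post(
        upload_url,
        headers=dict(headers, **{"Content-Type": "application/zip"}),
        data=f,
    )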

You need to file a Bugzilla bug to have the Lambda job upgraded on Stage. Issue #423 is about automating this away.

To upgrade the Lambda job on Stage run:

./bin/deployment-bug.py stage-lambda

To upgrade the cron job and Lambda job on Prod run:

./bin/deployment-bug.py prod

Website UI

Install yarn.

Then run:

$ cd ui
$ yarn install
$ yarn run build
$ rimraf tmp
$ mkdir tmp
$ cp -R build/* tmp/
$ gh-pages -d tmp --add

Note: This only deploys a UI that connects to the prod Kinto; it doesn't deploy a UI that connects to the stage Kinto.

Datadog

Buildhub Performance

buildhub's People

Contributors

bhearsum, g-k, glasserc, leplatrem, magopian, mostlygeek, n1k0, natim, peterbe, pyup-bot, ssetem, willkg


buildhub's Issues

Version parsing issue for (some?) thunderbird records

Of the 1082 records currently in Kinto, 98 have a version parsed as 05-03-00-40-06-comm-aurora-l10n/thunderbird-54.0a2.

If it helps, their archive folder is https://archive.mozilla.org/pub/thunderbird/nightly/2017/05/2017-05-03-00-40-06-comm-aurora-l10n/ (and an example file is https://archive.mozilla.org/pub/thunderbird/nightly/2017/05/2017-05-03-00-40-06-comm-aurora-l10n/thunderbird-54.0a2.is.win32.zip).

Investigate archive candidates JSON metadata

Build info is published as JSON in the candidate build folders and for nightlies, but only for en-US.

https://archive.mozilla.org/pub/firefox/candidates/51.0b6-candidates/build1/linux-x86_64/en-US/firefox-51.0b6.json

https://archive.mozilla.org/pub/firefox/nightly/2017/05/2017-05-01-10-01-39-mozilla-central/firefox-55.0a1.en-US.linux-i686.json

Investigate:

  • What info can we reuse for other locales?
  • How does the Socorro scraper use the list of candidate builds?
  • Is the list of builds fixed, or can it be updated later?

See #18

What are partner releases?

$ tree -L 2 pub/firefox/releases/partners
├── fujitsu-siemens
│   ├── 2.0.0.6
├── google
│   ├── 1.5.0.12
│   ├── 2.0.0.6
├── packardbell
│   ├── 2.0.0.6
├── seznam
│   ├── 2.0.0.6
├── yahoo
│   ├── 1.5.0.6
│   ├── 1.5.0.7
└── yahoo-japan
    ├── 2.0.0.6

Should we handle them or just ignore them?

Link releases together

  • Previous/next beta
  • Previous/next nightly
  • Previous ESR
  • Previous stable
    ...

ultimate goal: diff between releases :)

Obtain the locale revision

Somewhere (ShipIt at least), there is the mercurial changeset of the locale that was used to build the release. Find out how to get that :)

Irrelevant archives for android

The scraping picks up stuff like https://archive.mozilla.org/pub/mobile/releases/21.0b1/android-armv6/multi/fennec-21.0b1.multi.android-arm-armv6.tests.zip and https://archive.mozilla.org/pub/mobile/releases/24.0b1/android/en-US/fennec-24.0b1.en-US.android-arm.crashreporter-symbols.zip, which leads to record ID conflicts.
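One way to avoid this is to skip non-installer artifacts before building records. A minimal sketch (hypothetical helper, assuming simple filename-based filtering):

import re

# Suffixes that are not actual installers and should not produce records.
SKIP_PATTERN = re.compile(r"\.(tests|crashreporter-symbols|checksums)(\.\w+)*$")

def is_build_artifact(filename):
    """Return True for installer-like files, False for tests/symbols/etc."""
    return SKIP_PATTERN.search(filename) is None

print(is_build_artifact("fennec-21.0b1.multi.android-arm-armv6.tests.zip"))            # False
print(is_build_artifact("fennec-24.0b1.en-US.android-arm.crashreporter-symbols.zip"))  # False
print(is_build_artifact("fennec-24.0b1.en-US.android-arm.apk"))                        # True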

Clarify and fix scripts for repository/tree

Currently we don't have clear semantics about tree/channel etc.

I propose that we keep:

  • repository: full repository URL (eg. https://hg.mozilla.org/releases/mozilla-beta)
  • channel: update channel (for AUS, beta, nightly, release)
  • tree: path of repository relative to root (eg. releases/mozilla-beta)→ useful?

Related #17
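For what it's worth, tree can be derived mechanically from repository, which is one argument against storing both. A minimal sketch (assuming hg.mozilla.org-style URLs):

from urllib.parse import urlparse

def tree_from_repository(repository):
    """Derive the tree (path relative to the hg root) from the repository URL."""
    return urlparse(repository).path.strip("/")

print(tree_from_repository("https://hg.mozilla.org/releases/mozilla-beta"))
# releases/mozilla-beta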

Double request sent to Kinto on each filter change

When we select a new filter, it fires a new request to Kinto with the updated filters, but also a "url change" (which ultimately sends another request to Kinto).

This results in two requests to Kinto; we should fix that.

Using the build id search box breaks filtering

Whenever there's a string entered in the build id search box, two things happen:

  1. you can't filter on any other filter (selecting any other filter should reset the build id search box)
  2. it doesn't filter on the build id at all

Optimize build size

Right now the generated app build is 500 kB. With uglifyjs, we could strip that down to ~150 kB.

Deterministic records ids

Currently we let the server choose the record IDs (using POST).

@Natim suggested that we could have a relation between a release and its record.

Idea: we could use the download URL (which is unique) to compute a UUID, for example.
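A minimal sketch of that idea, deriving a name-based UUID from the download URL (hypothetical helper; any stable namespace would do):

import uuid

def record_id_for(download_url):
    """Compute a deterministic record ID from the (unique) download URL."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, download_url))

url = ("https://archive.mozilla.org/pub/firefox/releases/"
       "51.0b6/linux-x86_64/en-US/firefox-51.0b6.tar.bz2")
print(record_id_for(url))  # the same URL always yields the same ID

With deterministic IDs, records could be written with PUT instead of POST, which would also make scraper re-runs idempotent.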

Display size in human readable format in the UI

34443782 bytes is quite hard for me to parse.
After some calculation I understood it was 32.85 MB; that's the value I would like to have in the UI directly. We could however display the bytes value on mouseover (title? acronym?)
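A minimal sketch of such a formatter (hypothetical helper, using 1024-based units to match the numbers above):

def human_size(num_bytes):
    """Format a byte count as a human-readable string (1024-based units)."""
    size = float(num_bytes)
    for unit in ("bytes", "kB", "MB", "GB"):
        if size < 1024:
            return "{:.2f} {}".format(size, unit)
        size /= 1024
    return "{:.2f} TB".format(size)

print(human_size(34443782))  # 32.85 MB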

Realtime updates of buildhub

S3 has the ability to post events when objects are uploaded and deleted. Since archive.mozilla.org is an S3 bucket, whenever a new build is uploaded we can be notified and buildhub updated in real time.

For the flow:

upload -> s3 -> SNS -> SQS <- buildhub daemon -> buildhub kinto

or

upload -> s3 -> lambda -> buildhub kinto

What do you guys think?
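For the second flow, a minimal sketch of what the Lambda handler could look like (assuming an S3 "object created" event and kinto-http.py; bucket and collection names are taken from the other issues here, and build_record_from_url is a hypothetical helper):

import os
import kinto_http

def build_record_from_url(url):
    """Hypothetical: parse a release record out of an archive URL (would reuse the scraper's logic)."""
    ...

def lambda_handler(event, context):
    """Turn an S3 'object created' event into a Buildhub record in Kinto."""
    client = kinto_http.Client(
        server_url=os.environ["KINTO_SERVER"],
        auth=tuple(os.environ["KINTO_AUTH"].split(":", 1)),
    )
    for s3_record in event["Records"]:
        key = s3_record["s3"]["object"]["key"]
        url = "https://archive.mozilla.org/" + key
        data = build_record_from_url(url)
        if data is not None:
            client.create_record(data=data, bucket="build-hub",
                                 collection="releases", if_not_exists=True)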

Malformed folder for archived thunderbird candidates

Traceback (most recent call last):
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 165, in fetch_listing
    data = await fetch_json(session, url)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/.venv/lib/python3.5/site-packages/backoff-1.4.3-py3.5.egg/backoff/_async.py", line 120, in retry
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 160, in fetch_json
    return await response.json()
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/.venv/lib/python3.5/site-packages/aiohttp-2.1.0-py3.5-linux-x86_64.egg/aiohttp/client_reqrep.py", line 722, in json
    headers=self.headers)
aiohttp.client_exceptions.ClientResponseError: 0, message='Attempt to decode JSON with unexpected mimetype: text/html'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/.venv/bin/scrape_archives", line 11, in <module>
    load_entry_point('buildhub', 'console_scripts', 'scrape_archives')()
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 429, in run
    loop.run_until_complete(main(loop))
  File "/usr/lib/python3.5/asyncio/base_events.py", line 466, in run_until_complete
    return future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 420, in main
    await produce(loop, queue, client)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 361, in produce
    await fetch_products(session, queue, PRODUCTS, client)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 238, in fetch_products
    await asyncio.gather(*futures)
  File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 307, in fetch_versions
    return await asyncio.gather(*futures)
  File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 319, in fetch_platforms
    return await asyncio.gather(*futures)
  File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
    result = coro.throw(exc)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 331, in fetch_locales
    return await asyncio.gather(*futures)
  File "/usr/lib/python3.5/asyncio/futures.py", line 380, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 293, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 349, in fetch_files
    metadata = await fetch_release_metadata(session, product, version, platform, locale)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 207, in fetch_release_metadata
    build_folders, _ = await fetch_listing(session, builds_url)
  File "/home/mathieu/Code/Mozilla/buildhub/jobs/buildhub/scrape_archives.py", line 168, in fetch_listing
    raise ValueError("Could not fetch {}: {}".format(url, e))
ValueError: Could not fetch https://archive.mozilla.org/pub/thunderbird/candidates/archived/-candidates/: 0, message='Attempt to decode JSON with unexpected mimetype: text/html'

Hardcode the filter values

We used to build the list of values for filters dynamically, using the list of build records we had at any one point. This made sure we had no missing value, and also that we'd always present a relevant list of filter values to the user, given a list of build records.

However, this isn't scalable: we can't load the full list of build records in memory on the client, and then go through them to build the list of filter values.

A first step towards being able to navigate through the whole data set is to hardcode the filter values.

We're experimenting with using elasticsearch to do the indexing/faceting (see https://github.com/Kinto/kinto-elasticsearch) which might replace those hardcoded values in the future.

Why not pulse?

It seems that pulse does not meet our requirements. For posterity, @leplatrem could you describe the reasons that we decided against using it? Also describe what we prefer to do instead.

Infer channel information?

Of the 1082 records at the time of this writing, there are exactly 0 with a specified "channel" field. Either it's not information we get back from pulse on the messages we're interested in, or there's an issue when storing those in Kinto?
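One possible fallback while this is investigated: infer the channel from the archive URL and version string. A minimal sketch (assumed path/version conventions, probably not exhaustive):

def infer_channel(url, version):
    """Guess the update channel from an archive.mozilla.org URL and a version string."""
    if "/nightly/" in url:
        return "nightly"
    if "esr" in version:
        return "esr"
    if "a" in version:   # e.g. 55.0a1
        return "aurora"
    if "b" in version:   # e.g. 53.0b1
        return "beta"
    return "release"

print(infer_channel("https://archive.mozilla.org/pub/firefox/releases/53.0b1/...", "53.0b1"))
# beta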

Operational Spec for BuildHub Version 1.0

This issue is to discuss how we want to structure the project to make it easy to configure and deploy for the operations team. Currently there are server-side and client-side components. The server-side components are daemons and cron-job-compatible scrapers. The client side is a static HTML website.

Data is stored in a Kinto instance backed by PostgreSQL RDS. This is relatively slow-moving and shouldn't require much operational work other than standing it up and keeping it updated.

Server side code

  • ship a container with all server based code
  • follow dockerflow
  • Build/test container in circle.ci
  • Publish container to hub.docker.com/mozilla/buildhub
  • A single container with all of the scripts / workers / scrapers / etc
  • version.json written into the container, accessible by run.sh version
  • a run.sh file as the container's ENTRYPOINT.

run.sh

Buildhub's Docker container is a toolbox, and run.sh provides an abstraction over the tools inside. Here are some examples of how run.sh works as the ENTRYPOINT:

# Self-documenting:
$ docker run buildhub:master help

# On the CLI or as a cron job:
$ docker run buildhub:master scrape_archives --auth user:password

# As a daemon:
$ docker run --env-file /etc/dockerflow/buildhub-pulse.txt \
    buildhub:master pulse_listener

Web UI

The WebUI (SearchKit: #127) also needs to be deployed.

Debounce the build id search box

At the moment, a request is sent to Kinto on each keypress. We should debounce it instead (only send a request when no keypress has been detected for 200ms or more, or add a "search" button).

WebUI should create permalink URI for resources

The URI address in the browser should update when we change filters. This would make it easier for other web tools (mozreview, bugs, phabricator, etc.) to link back to build/release information.

Which arch for Android ARM?

It seems that scrape_archives is using android-arm as the platform.

However, are there two platforms for Android, android-api-9 and android-api-15? Can we have the same release compatible with both?

Since the platform is used in the ID of the record, we need to make sure that android-api-9 releases won't be overridden by android-api-15 ones and vice versa.

Version filter sometimes results in a 503

Trying to filter on certain version numbers results in the following error message in the console:

An error occured while fetching the build records: KintoError 503 "Service Unavailable" { errno = 201, message = "Service temporary unavailable due to overloading or maintenance, please retry later.", code = 503, error = "Service Unavailable" }

The request sent to Kinto is: https://kinto-ota.dev.mozaws.net/v1/buckets/build-hub/collections/releases/records?_limit=10&_sort=-build.date&target.version=0.8

If however you filter on version 53.0b1 there's no problem: https://kinto-ota.dev.mozaws.net/v1/buckets/build-hub/collections/releases/records?_limit=10&_sort=-build.date&target.version=53.0b1

Add support for Fennec nightly

https://archive.mozilla.org/pub/mobile/nightly/2017/05/2017-05-30-10-01-27-mozilla-central-android-api-15-old-id/fennec-55.0a1.multi.android-arm.apk

The ID generated for this URL is not unique.

It sets the following ID: fennec_2017-05-30-10-01-27_55-0a1_android-arm_multi.

We have an example of another URL that produces the exact same ID, leading to missing builds.

The latter should instead set: fennec_2017-05-30-10-01-27_55-0a1_android-i386_multi

Check that scraping script can be resumed easily

Currently, we take the highest version on the remote storage to resume a previous run.

It should work OK as long as things go well, but if a scraping task crashes, it can leave out some records for specific locales, platforms, or whatever.

We should work out a clear strategy to make idempotence/resuming efficient :)
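A minimal sketch of one possible approach: resume from the most recent build date already stored in Kinto, and rely on deterministic record IDs (see the earlier issue) so that re-inserting existing records is harmless. The kinto-http.py call names and the scrape_archives(newer_than=...) entry point are assumptions, not the current code:

import kinto_http

def scrape_archives(newer_than=None):
    """Hypothetical: yield new build records from archive.mozilla.org (stub)."""
    return []

client = kinto_http.Client(server_url="https://kinto-ota.dev.mozaws.net/v1",
                           auth=("user", "password"))

# Find the most recent build already stored, and only scrape newer folders.
existing = client.get_records(bucket="build-hub", collection="releases",
                              _sort="-build.date", _limit=1)
resume_from = existing[0]["build"]["date"] if existing else None

for record in scrape_archives(newer_than=resume_from):
    # Deterministic IDs plus if_not_exists make re-runs idempotent.
    client.create_record(data=record, bucket="build-hub",
                         collection="releases", if_not_exists=True)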

[ui] Add a page with code snippets

In order to show the possibilities of the underlying APIs, we could have a static page that:

  • Shows a curl example to retrieve the revision of a buildid (for example)
  • Shows kinto-http.(py|js) snippets to query the API (simple stuff like list of locales for a version)
  • Links to Kinto API docs
  • Info to contact us (IRC,

Thoughts?
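As a concrete example of the first bullet, here is roughly what a kinto-http.py snippet on that page could look like: look up the records for a given build id and print the revision. The server URL and field names are taken from other issues in this list; the exact record schema is an assumption:

import kinto_http

client = kinto_http.Client(server_url="https://kinto-ota.dev.mozaws.net/v1")
records = client.get_records(
    bucket="build-hub",
    collection="releases",
    **{"build.id": "20170501100139"},   # example build id
)
for record in records:
    print(record["source"]["revision"], record["target"]["locale"])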

Capture and Service Telemetry's Requirements

Last month (April 2017) I discussed Telemetry's requirements with @mreid-moz. Telemetry's main requirement is being able to cross-reference the build id/release channel in a telemetry ping back to a commit in Mercurial.

Summary of requirements:

  • A single source of truth to identify a build. They often get data from non-official builds and release versions of Firefox.
  • Given a build id/release channel, be able to look up the commit hash for a release.

The blueprint contains the detailed meeting notes.
