
debian-snapshot's Introduction

Snapshot

Create a Debian snapshot service like snapshot.debian.org.

It currently uses the snapshot.debian.org set of timestamps and data to provision the service.

WIP: In the near future, we plan to manage our own set of timestamps and download data directly from deb.debian.org. For now, we stick to the snapshot.debian.org set of timestamps for development and testing and, notably, for portability with metasnap.debian.net.

Snapshot repositories

usage: snapshot.py [-h] [--archive ARCHIVE] [--suite SUITE] [--component COMPONENT] [--arch ARCH] [--timestamp TIMESTAMP] [--check-only] [--provision-db]
                   [--provision-db-only] [--ignore-provisioned] [--no-clean-part-file] [--skip-installer-files] [--verbose] [--debug]
                   local_directory

positional arguments:
  local_directory         Local directory for snapshot.

optional arguments:
  -h, --help              show this help message and exit
  --archive ARCHIVE       Debian archive to snapshot. Default is 'debian' and is the only supported archive right now.
  --suite SUITE           Debian suite to snapshot. Can be used multiple times. Default is 'unstable'
  --component COMPONENT   Debian component to snapshot. Default is 'main'
  --arch ARCH             Debian arch to snapshot. Can be used multiple times.
  --timestamp TIMESTAMP   Timestamp to use for snapshot. Can be used multiple times. Default is all the available timestamps. Timestamps range can be expressed with
                          ':' separator. An empty boundary is allowed and, in this case, it uses the lower or upper value in all the available timestamps. For
                          example: '20200101T000000Z:20210315T085036Z', '20200101T000000Z:' or ':20100101T000000Z'.
  --check-only            Check downloaded files only.
  --provision-db          Provision database.
  --provision-db-only     Provision database only.
  --ignore-provisioned    Ignore already provisioned repodata.
  --no-clean-part-file    Do not clean partially downloaded files.
  --skip-installer-files  Skip download of installer files.
  --verbose               Display logger info messages.
  --debug                 Display logger debug messages.

Examples

  1. Partial timestamps set for the debian archive (default value), the unstable, bookworm and bullseye suites, the amd64, all and source architectures, and the main component (default value), from 20170101T000000Z onward, into local directory /snapshot:
./snapshot.py /snapshot --debug --suite unstable --suite bookworm --suite bullseye --arch amd64 --arch all --arch source --timestamp 20170101T000000Z:

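Note: Pay attention to the trailing ':'. It makes the value an open-ended range starting at 20170101T000000Z rather than a single timestamp.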

  2. Partial timestamps set for the debian archive, the unstable and bullseye suites, the amd64 and all architectures, and the main component, for the 20210221T150011Z and 20210315T085036Z timestamps, into local directory /snapshot:
./snapshot.py /snapshot --debug --suite unstable --suite bullseye --arch amd64 --arch all --timestamp 20210221T150011Z --timestamp 20210315T085036Z
  3. Full timestamps set for Debian unstable, arm64 architecture, into local directory /snapshot:
./snapshot.py /snapshot --debug --suite unstable --arch arm64
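
The range syntax accepted by --timestamp can be illustrated with a short Python sketch. This is not the code used by snapshot.py: the function name select_timestamps is hypothetical, and treating both boundaries as inclusive is an assumption. It relies on the fact that timestamps in the fixed-width YYYYMMDDTHHMMSSZ format sort chronologically when compared as plain strings.

```python
def select_timestamps(spec, available):
    """Return the timestamps from `available` matched by a --timestamp value.

    Hypothetical helper: boundaries are assumed to be inclusive.
    """
    ordered = sorted(available)
    if not ordered:
        return []
    if ":" not in spec:
        # A single timestamp selects itself, if it is available.
        return [ts for ts in ordered if ts == spec]
    lower, upper = spec.split(":", 1)
    # An empty boundary falls back to the lowest or highest available value.
    lower = lower or ordered[0]
    upper = upper or ordered[-1]
    return [ts for ts in ordered if lower <= ts <= upper]
```

For example, select_timestamps("20170101T000000Z:", available) keeps every available timestamp from 20170101T000000Z onward, which is what the trailing ':' in Example 1 expresses.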

Available timestamps

A partial timestamps set (see Example 1) is available at http(s)://snapshot.notset.fr. The only limits are the following (extracted from the Nginx configuration):

limit_conn conn_limit_per_ip 20;
limit_rate 10m;

These limits are meant to let every Debian rebuilder infrastructure scale its builders.

API

The snapshot process extracts repository metadata (Sources.gz and Packages.gz) and stores it in a database. From this database, we expose a machine-readable API similar to that of snapshot.debian.org, but extended. For each file, the database stores its full location in terms of archive, suite, component and timestamp ranges, and the API results expose this information. Unlike snapshot.debian.org, this makes it possible to know the exact repository location of a given file, rather than only the archive name and the first timestamp at which the file was recorded.

We currently expose the following similar endpoints:

URL: /mr/package
HTTP status codes: 200 404 500
Summary: list source package names

URL: /mr/package/<package>
HTTP status codes: 200 404 500
Summary: list all available source versions for this package

URL: /mr/package/<package>/<version>/srcfiles
Options: fileinfo=1 includes fileinfo section
HTTP status codes: 200 404 500
Summary: list all files associated with a source package

URL: /mr/binary/<package>
HTTP status codes: 200 404 500
Summary: list all available binary versions for this package

URL: /mr/binary/<package>/<version>/binfiles
Options: fileinfo=1 includes fileinfo section
HTTP status codes: 200 404 500
Summary: list all files associated with a binary package

URL: /mr/file
HTTP status codes: 200 404 500
Summary: list all files

URL: /mr/file/<sha256>/info
HTTP status codes: 200 404 500
Summary: information about file

URL: /mr/file/<sha256>/download
HTTP status codes: 302 404 500
Summary: download file from hash

URL: /mr/timestamp/<archive_name>
HTTP status codes: 200 404 500
Summary: list all available timestamps for this archive name

URL: /mr/timestamp/<archive_name>/<timestamp_value>
HTTP status codes: 200 404 500
Summary: if <timestamp_value> is 'latest', it returns the latest timestamp value available for the
 requested archive. Else, it returns the closest older timestamp value to <timestamp_value>.
 If an archive with timestamp <timestamp_value> exists, then <timestamp_value> is returned unchanged.

URL: /mr/buildinfo
Options: suite_name=<suite_name> filter results for the given Debian suite
HTTP status codes: 200 404 500
Summary: compute minimal set of timestamps containing all package versions in uploaded buildinfo file

Note: Contrary to snapshot.debian.org, we only use SHA256.
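
As an illustration of how a client might address these endpoints, here is a minimal Python sketch using only the standard library. The helper names mr_url and mr_get are hypothetical and not part of this project. The base URL is the public instance mentioned in this README, and each path segment is percent-encoded as a precaution, since Debian versions can contain characters such as ':' or '+'.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Public instance named in this README; replace with your own deployment.
BASE_URL = "http://snapshot.notset.fr"


def mr_url(*parts, base=BASE_URL, **options):
    """Build a /mr/ endpoint URL from path parts and query options."""
    path = "/".join(quote(str(part), safe="") for part in parts)
    query = "?" + urlencode(options) if options else ""
    return f"{base}/mr/{path}{query}"


def mr_get(*parts, **options):
    """Fetch an endpoint and decode its JSON body (performs a network request)."""
    with urlopen(mr_url(*parts, **options), timeout=30) as response:
        return json.load(response)
```

With these helpers, mr_url("package", "python-designateclient", "2.3.0-2", "srcfiles", fileinfo=1) builds the srcfiles URL with the fileinfo option, and mr_get("timestamp", "debian", "latest") would request the latest timestamp of the debian archive.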

API examples

Get the available timestamps for the debian archive:

{
  "_api": "0.3",
  "_comment": "notset",
  "result": [
    "20170101T032652Z",
    "20170101T092432Z",
    "20170101T153528Z",
(...)
    "20210718T032051Z",
    "20210718T092653Z",
    "20210718T144801Z",
    "20210718T204229Z",
    "20210719T031839Z",
    "20210719T090459Z"
  ]
}
{
  "_api": "0.3",
  "_comment": "notset",
  "result": "20210822T023545Z"
}
{
  "_api": "0.3",
  "_comment": "notset",
  "result": "20191231T170830Z"
}
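
The lookup rule documented for /mr/timestamp/<archive_name>/<timestamp_value> can be sketched in a few lines of Python. This is an illustration of the documented behaviour, not the server's implementation. The function name resolve_timestamp is hypothetical, and returning None when no older timestamp exists is an assumption, since this README does not describe that case.

```python
from bisect import bisect_right


def resolve_timestamp(requested, available):
    """Resolve 'latest' or a timestamp to the closest older available value."""
    ordered = sorted(available)
    if not ordered:
        return None
    if requested == "latest":
        return ordered[-1]
    # bisect_right counts the available timestamps <= requested, so an exact
    # match is returned unchanged and anything else maps to the closest older one.
    index = bisect_right(ordered, requested)
    return ordered[index - 1] if index else None  # None: assumption, see above
```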

Get source files info for python-designateclient package version 2.3.0-2 (http://snapshot.notset.fr/mr/package/python-designateclient/2.3.0-2/srcfiles?fileinfo=1):

{
  "_api": "0.3",
  "_comment": "notset",
  "package": "python-designateclient",
  "version": "2.3.0-2",
  "result": [
    {
      "hash": "240d86861138fbf8a34c1bf96412bf290dc8eae4a560473b0ecee605b8d1288f"
    },
    {
      "hash": "d65b4d861612c0bed42cdecedbcb0c32d886fc27bdc5642399ed410de042ed85"
    },
    {
      "hash": "ffb63b9b69d579fabd05d81a84c679dc396c29a663fcd244b0e8c600257478f3"
    }
  ],
  "fileinfo": {
    "240d86861138fbf8a34c1bf96412bf290dc8eae4a560473b0ecee605b8d1288f": [
      {
        "name": "python-designateclient_2.3.0-2.dsc",
        "path": "/pool/main/p/python-designateclient",
        "size": 3417,
        "archive_name": "debian",
        "suite_name": "buster",
        "component_name": "main",
        "timestamp_ranges": [
          ["20170618T072316Z", "20170821T035341Z"],
          ["20170822T154312Z", "20170922T035316Z"],
          ["20170924T042402Z", "20171024T092932Z"],
          ["20171025T221056Z", "20171106T213509Z"]
        ]
      },
      {
        "name": "python-designateclient_2.3.0-2.dsc",
        "path": "/pool/main/p/python-designateclient",
        "size": 3417,
        "archive_name": "debian",
        "suite_name": "unstable",
        "component_name": "main",
        "timestamp_ranges": [
          ["20170101T032652Z", "20171101T160520Z"]
        ]
      }
    ],
    "d65b4d861612c0bed42cdecedbcb0c32d886fc27bdc5642399ed410de042ed85": [
      {
        "name": "python-designateclient_2.3.0-2.debian.tar.xz",
        "path": "/pool/main/p/python-designateclient",
        "size": 4208,
        "archive_name": "debian",
        "suite_name": "buster",
        "component_name": "main",
        "timestamp_ranges": [
          ["20170618T072316Z", "20170821T035341Z"],
          ["20170822T154312Z", "20170922T035316Z"],
          ["20170924T042402Z", "20171024T092932Z"],
          ["20171025T221056Z", "20171106T213509Z"]
        ]
      },
      {
        "name": "python-designateclient_2.3.0-2.debian.tar.xz",
        "path": "/pool/main/p/python-designateclient",
        "size": 4208,
        "archive_name": "debian",
        "suite_name": "unstable",
        "component_name": "main",
        "timestamp_ranges": [
          ["20170101T032652Z", "20171101T160520Z"]
        ]
      }
    ],
    "ffb63b9b69d579fabd05d81a84c679dc396c29a663fcd244b0e8c600257478f3": [
      {
        "name": "python-designateclient_2.3.0.orig.tar.xz",
        "path": "/pool/main/p/python-designateclient",
        "size": 57008,
        "archive_name": "debian",
        "suite_name": "buster",
        "component_name": "main",
        "timestamp_ranges": [
          ["20170618T072316Z", "20170821T035341Z"],
          ["20170822T154312Z", "20170922T035316Z"],
          ["20170924T042402Z", "20171024T092932Z"],
          ["20171025T221056Z", "20171106T213509Z"]
        ]
      },
      {
        "name": "python-designateclient_2.3.0.orig.tar.xz",
        "path": "/pool/main/p/python-designateclient",
        "size": 57008,
        "archive_name": "debian",
        "suite_name": "unstable",
        "component_name": "main",
        "timestamp_ranges": [
          ["20170101T032652Z", "20171101T160520Z"]
        ]
      }
    ]
  }
}

Get binary files info for python-designateclient package version 2.3.0-2 (http://snapshot.notset.fr/mr/binary/python-designateclient/2.3.0-2/binfiles?fileinfo=1):

{
  "_api": "0.3",
  "_comment": "notset",
  "binary_version": "2.3.0-2",
  "binary": "python-designateclient",
  "result": [
    {
      "hash": "c50880146a09fa6a6f9cd7dfc11d5c0fc1147c673f938d0a667d348f59caf499",
      "architecture": "all"
    }
  ],
  "fileinfo": {
    "c50880146a09fa6a6f9cd7dfc11d5c0fc1147c673f938d0a667d348f59caf499": [
      {
        "name": "python-designateclient_2.3.0-2_all.deb",
        "path": "/pool/main/p/python-designateclient",
        "size": 43340,
        "archive_name": "debian",
        "suite_name": "buster",
        "component_name": "main",
        "timestamp_ranges": [
          ["20170618T072316Z", "20170821T035341Z"],
          ["20170822T154312Z", "20170922T035316Z"],
          ["20170924T042402Z", "20171024T092932Z"],
          ["20171025T221056Z", "20171106T213509Z"]
        ]
      },
      {
        "name": "python-designateclient_2.3.0-2_all.deb",
        "path": "/pool/main/p/python-designateclient",
        "size": 43340,
        "archive_name": "debian",
        "suite_name": "unstable",
        "component_name": "main",
        "timestamp_ranges": [
          ["20170101T032652Z", "20171101T160520Z"]
        ]
      }
    ]
  }
}

For every file, the detailed info gives the archive, suite, component and timestamps in which it has been seen. For a given location, timestamp_ranges is the set of all timestamp ranges during which the file is present. A timestamp range has the format [begin_timestamp, end_timestamp] and contains all the timestamps available for the archive between begin_timestamp and end_timestamp.
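
To show how a client could use timestamp_ranges, the following Python sketch lists the locations in which a file is present at a given timestamp. The function name locations_at is hypothetical, and treating both ends of a range as inclusive is an assumption. The sample entries are copied from the fileinfo output above, reduced to the keys the function reads.

```python
def locations_at(fileinfo_entries, timestamp):
    """Return (archive, suite, component) for entries whose ranges cover timestamp."""
    found = []
    for entry in fileinfo_entries:
        # Fixed-width timestamps compare chronologically as plain strings.
        if any(begin <= timestamp <= end for begin, end in entry["timestamp_ranges"]):
            found.append(
                (entry["archive_name"], entry["suite_name"], entry["component_name"])
            )
    return found


# Entries for python-designateclient_2.3.0-2.dsc, taken from the example above.
entries = [
    {
        "archive_name": "debian",
        "suite_name": "buster",
        "component_name": "main",
        "timestamp_ranges": [
            ["20170618T072316Z", "20170821T035341Z"],
            ["20170822T154312Z", "20170922T035316Z"],
            ["20170924T042402Z", "20171024T092932Z"],
            ["20171025T221056Z", "20171106T213509Z"],
        ],
    },
    {
        "archive_name": "debian",
        "suite_name": "unstable",
        "component_name": "main",
        "timestamp_ranges": [["20170101T032652Z", "20171101T160520Z"]],
    },
]
```

In practice the timestamp passed in would be one of the archive's available timestamps, for instance a value returned by the /mr/timestamp endpoints.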

Compute a minimal set of timestamps containing all package versions referenced in a buildinfo file

  • Example 1 (curl -F 'buildinfo=<-' http://snapshot.notset.fr/mr/buildinfo < bash_5.1-2_amd64.buildinfo):
{
  "_api": "0.3",
  "_comment": "notset: This feature is currently very experimental!",
  "results": [
    {
      "archive_name": "debian",
      "suite_name": "bullseye",
      "component_name": "main",
      "architecture": "amd64",
      "timestamps": [
        "20210101T211102Z",
        "20210110T204103Z",
        "20210116T204022Z",
        "20210208T213147Z"
      ]
    },
    {
      "archive_name": "debian",
      "suite_name": "unstable",
      "component_name": "main",
      "architecture": "amd64",
      "timestamps": [
        "20201230T203527Z",
        "20210106T142920Z"
      ]
    },
    {
      "archive_name": "debian",
      "suite_name": "buster",
      "component_name": "main",
      "architecture": "amd64",
      "timestamps": [
        "20210705T151228Z"
      ]
    }
  ]
}

For every known location, defined by archive_name, suite_name, component_name and available architecture, the result gives the set of timestamps containing all package versions referenced in the provided buildinfo file. Rebuilder software would typically use only one location; depending on the location, more or fewer timestamps must be added to cover all the package dependencies.

  • Example 2 (curl -F 'buildinfo=<-' http://snapshot.notset.fr/mr/buildinfo?suite_name=buster < bash_5.1-2_amd64.buildinfo):
{
  "_api": "0.3",
  "_comment": "notset: This feature is currently very experimental!",
  "results": [
    {
      "archive_name": "debian",
      "suite_name": "buster",
      "component_name": "main",
      "architecture": "amd64",
      "timestamps": [
        "20210705T151228Z"
      ]
    }
  ]
}

The suite_name option filters the results by Debian suite. Additional filtering options will be provided in the near future.
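
One way a rebuilder could consume a /mr/buildinfo response is to pick a single location and turn its timestamps into APT source lines. The Python sketch below is only an illustration under stated assumptions: the helper names pick_location and sources_list_lines are hypothetical, the archive URL layout (<base>/<archive_name>/<timestamp>) is inferred from the snapshot.notset.fr archive URLs quoted elsewhere on this page, and the check-valid-until=no option is included because snapshot Release files are usually past their validity date.

```python
def pick_location(results, suite_name):
    """Return the first location of a buildinfo response matching suite_name."""
    for location in results:
        if location["suite_name"] == suite_name:
            return location
    return None


def sources_list_lines(location, base_url="http://snapshot.notset.fr/archive"):
    """Build one 'deb' line per timestamp of the chosen location.

    base_url and the URL layout are assumptions, not part of the API response.
    """
    return [
        f"deb [check-valid-until=no] {base_url}/{location['archive_name']}/{ts} "
        f"{location['suite_name']} {location['component_name']}"
        for ts in location["timestamps"]
    ]
```

Applied to the "results" list of Example 1 with suite_name "buster", this yields a single line pointing at timestamp 20210705T151228Z.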

Archives from other distributions

QubesOS

We include support for the multi-version repository of QubesOS. This repository provides the QubesOS packages for bullseye and buster. Since these are not snapshots strictly speaking, but a repository holding multiple versions of each package, we reference a unique timestamp, 99990101T000000Z. Archives are named qubes-rX.Y-vm, where rX.Y refers to the Qubes release and vm is the Qubes package set.
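
As a small illustration of this naming scheme, the Python sketch below splits a Qubes archive name into its release and package set. The function name and the exact pattern are assumptions made for the example (the project may validate names differently), and "qubes-r4.1-vm" is only a sample value following the qubes-rX.Y-vm form.

```python
import re

# Unique placeholder timestamp used for Qubes archives, as stated above.
QUBES_TIMESTAMP = "99990101T000000Z"

# Assumed pattern for names of the form qubes-rX.Y-<package-set>.
_QUBES_ARCHIVE = re.compile(r"^qubes-r(?P<release>\d+\.\d+)-(?P<package_set>[a-z0-9]+)$")


def parse_qubes_archive(name):
    """Return (release, package_set) for a Qubes archive name, else None."""
    match = _QUBES_ARCHIVE.match(name)
    return (match["release"], match["package_set"]) if match else None
```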

Installation

In this section, we give a quick installation guide.

For the Snapshot repositories, install the following dependencies:

$ sudo apt install postgresql-13 postgresql-plpython3-13 python3-debian python3-sqlalchemy python3-httpx python3-tenacity

Additionally, for the Snapshot API, install:

$ sudo apt install python3-sqlalchemy python3-psycopg2 python3-flask python3-flask-caching python3-flask-sqlalchemy python3-dateutil uwsgi uwsgi-plugin-python3 nginx-full

In what follows, we assume a user named user. As user, go to the /home/user folder and clone the repository:

$ git clone https://github.com/fepitre/debian-snapshot

Install the snapshot-api.service:

$ sudo cp /home/user/debian-snapshot/api/snapshot-api.service /usr/lib/systemd/system

Note: Ensure that WorkingDirectory in snapshot-api.service points to the api folder inside the cloned git directory, here /home/user/debian-snapshot/api. Also note that in /home/user/debian-snapshot/api/snapshot-api.ini, the configuration file for uWSGI, the application runs with uid = user and gid = www-data. If your user is not named user, adjust the uid value.

Then:

$ sudo systemctl daemon-reload

Create necessary folders:

$ sudo mkdir -p /snapshot /var/run/snapshot /var/log/snapshot

and adjust permissions:

$ sudo chown user:www-data /var/run/snapshot
$ sudo chown postgres:postgres /var/lib/postgresql
$ sudo chown user:user /var/log/snapshot

Copy nginx sample configuration:

$ sudo cp /home/user/debian-snapshot/api/nginx.conf /etc/nginx/

WARNING: This configuration serves only as an example. It has to be adapted and hardened according to your setup.

Initialize the PostgreSQL snapshot database:

$ sudo -u postgres psql < /home/user/debian-snapshot/init_db.psql

Enable the services:

$ sudo systemctl enable postgresql@13-main
$ sudo systemctl enable snapshot-api
$ sudo systemctl enable nginx

In user's crontab (crontab -e), add the following cron job:

0 */3 * * * /home/user/debian-snapshot/scripts/snapshot-mirror-cron.sh

You can now start the services:

$ sudo systemctl start postgresql@13-main
$ sudo systemctl start snapshot-api
$ sudo systemctl start nginx


debian-snapshot's Issues

API interface 'latest' not pointing to latest image

The API of latest currently points to 20220822T155847Z
However, the latest timestamp for debian.txt is at the moment 20220916T090657Z

With 20220916T090657Z I've been able to correctly generate live-build ISO images (smallest-build and gnome).
Did something change for the latest API?

Add a marker to signify that the sync has not completely finished yet

Today an interesting reproducible difference was diagnosed, because the snapshot image was missing the file InRelease http://snapshot.notset.fr/archive/debian/20220212T031344Z/dists/sid/ in the first build.

This can be seen in https://jenkins.debian.net/view/live/job/reproducible_debian_live_build_gnome_sid/81/ (until about 2022-02-21)

See also #9 where earlier incomplete snapshots were encountered.

Would it be possible to place a marker file 'not_completely_synced_yet' when you start syncing the repository, and remove that marker file when the last sync run has completed without errors?
Such a marker does not need to be on the dists level, it could also be on the top level.
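
One possible shape for such a marker, sketched in Python as an illustration of the request rather than as code from this repository: a context manager creates the marker file before the sync starts and removes it only if the sync body finishes without raising. The name sync_marker is hypothetical; the marker file name is the one suggested above.

```python
import contextlib
from pathlib import Path

MARKER_NAME = "not_completely_synced_yet"


@contextlib.contextmanager
def sync_marker(directory):
    """Keep a marker file in `directory` for as long as a sync is incomplete."""
    marker = Path(directory) / MARKER_NAME
    marker.touch()
    yield marker
    # Only reached when the body finished without raising, so a failed
    # sync leaves the marker in place.
    marker.unlink()
```

A sync run would then wrap its download loop in "with sync_marker(snapshot_root):", where snapshot_root is a placeholder for the top-level or per-timestamp directory.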

Please publish announced caching proxy

In this video, https://debconf21.debconf.org/talks/22-making-use-of-snapshotdebianorg-for-fun-and-profit/, they said that they wrote a caching proxy for snapshot.debian.org, with special quirks for snapshot.debian.org. Please publish it. As far as I understand, the software in this repo (https://github.com/fepitre/debian-snapshot) is not that caching proxy. This repo contains a tool that downloads all packages for a given timestamp+arch+suite triple at once. I don't need this. Please publish the caching proxy mentioned in that video.

Now let me describe why I need this.

I'm creating a tool for building Debian images "from the past". I cannot use snapshot.debian.org directly because of its download limits, so I use https://debian.notset.fr/snapshot/archive/debian/. But it doesn't contain suites such as "bullseye-updates". So it seems the only way for me to proceed is to set up my own local caching proxy for snapshot.debian.org, with snapshot.debian.org-specific quirks. I cannot use this repo (https://github.com/fepitre/debian-snapshot) because, as far as I understand, this tool downloads a whole timestamp+arch+suite at once; it is not a caching proxy. It seems I also cannot use a general-purpose caching proxy such as squid, because it is not aware of the snapshot.debian.org quirks. So I'm currently writing my own caching proxy for snapshot.debian.org in Rust. But it would be very good if you simply published the caching proxy you already have.

Add non-free-firmware

Hello Frédéric,

Would it be possible to add the non-free-firmware section (in addition to main)?
I'm currently preparing an update to live-build to use the new firmware section and Jenkins uses snapshot.notset.fr.
I have no need to go back in time to the moment that the new section was created.

With kind regards,
Roland Clobus

Latest snapshot still points to 2022-12-02

Hello Frédéric,

https://snapshot.notset.fr/mr/timestamp/debian/latest still points to 20221202T085321Z, even though it is already 20221207.

Is there no newer snapshot available?

With kind regards,
Roland Clobus

Add 'sid' in addition to 'unstable'

Hello Frédéric,

See:
https://snapshot.notset.fr/archive/debian/20211122T030439Z/dists/
https://snapshot.debian.org/archive/debian/20211122T030439Z/dists/

https://jenkins.debian.net/view/live/job/reproducible_debian_live_build_standard_sid

In the Jenkins tests, I started to use 'sid' instead of 'unstable'. That means that the tests will fail, because on snapshot.notset.fr there is only 'unstable'. The big snapshot server has a symlink to connect 'sid' to 'unstable'.
Would that be something you could/would add?

With kind regards,
Roland

The MR API points to old snapshots

Hello Frédéric,

The MR API functions all point to a version of Christmas 2022, though newer snapshots have been generated.
Could you take a look?

With kind regards,
Roland Clobus

https://snapshot.notset.fr/mr/timestamp/debian/latest -> 20221225T204518Z
https://snapshot.notset.fr/mr/timestamp/debian -> 20221225T204518Z
https://snapshot.notset.fr/by-timestamp/debian.txt -> 20230101T091029Z

https://snapshot.debian.org/archive/debian/?year=2023&month=1 -> 20230101T091029Z

http://debian.notset.fr/snapshot/archive/debian/ works and http://debian.notset.fr/snapshot/archive/debian doesn't

URL http://debian.notset.fr/snapshot/archive/debian/ works and URL http://debian.notset.fr/snapshot/archive/debian doesn't. Here is wget output:

d-user@comp:/tmp$ wget http://debian.notset.fr/snapshot/archive/debian/
--2022-10-22 18:12:17--  http://debian.notset.fr/snapshot/archive/debian/
Resolving debian.notset.fr (debian.notset.fr)... 80.11.163.215
Connecting to debian.notset.fr (debian.notset.fr)|80.11.163.215|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html                                  [      <=>                                                                       ] 818,82K   536KB/s    in 1,5s    

2022-10-22 18:12:19 (536 KB/s) - ‘index.html’ saved [838475]

d-user@comp:/tmp$ wget http://debian.notset.fr/snapshot/archive/debian
--2022-10-22 18:12:21--  http://debian.notset.fr/snapshot/archive/debian
Resolving debian.notset.fr (debian.notset.fr)... 80.11.163.215
Connecting to debian.notset.fr (debian.notset.fr)|80.11.163.215|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://debian.notset.fr/archive/debian/ [following]
--2022-10-22 18:12:21--  http://debian.notset.fr/archive/debian/
Reusing existing connection to debian.notset.fr:80.
HTTP request sent, awaiting response... 404 Not Found
2022-10-22 18:12:21 ERROR 404: Not Found.

P.S. Thanks a lot for the service. I plan to use it. It would be very cool if you also added the stretch suite, as well as stretch-updates and stretch/updates.

Add installer files

As discussed on IRC: could you add the installer files? There is (from my part) no need to go back to 2017, starting 'today' is fine.

A sample URL is:
http://debian.notset.fr/snapshot/archive/debian/20210819T144544Z/dists/bullseye/main/installer-amd64/current/images/cdrom/vmlinuz

All files under /current/images can be downloaded by the live-build script.

For reference: the command that I used is: (based on https://wiki.debian.org/ReproducibleInstalls/LiveImages)
lb config --apt-http-proxy http://localhost:3142 --parent-mirror-bootstrap http://debian.notset.fr/snapshot/archive/debian/20210819T144544Z --parent-mirror-binary http://debian.notset.fr/snapshot/archive/debian/20210819T144544Z --security false --updates false --apt-options "--yes -o Acquire::Check-Valid-Until=false" --distribution bullseye --debian-installer live --cache-packages false

main/dep11 is not mirrored but requested by apt-get update

Dear @fepitre !

Trying to perform a scripted apt-get update on the following /etc/apt/sources.list:

deb [check-valid-until=no] http://debian.notset.fr/snapshot/archive/debian/$SNAPSHOT buster main
deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/$SNAPSHOT buster-updates main
deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/$SNAPSHOT buster-backports main

I am stuck at

http://debian.notset.fr/snapshot/archive/debian/20210617T212009Z/dists/buster/main/dep11/icons-48x48.tar

not mirrored. The main/dep11 directory is present in snapshot.debian.org but absent in the debian.notset.fr mirror.
