swiftpackageindex / swiftpackageindex-server
The Swift Package Index is the place to find Swift packages!
Home Page: https://swiftpackageindex.com
License: Apache License 2.0
I've done a lot of thinking about how we want to show package metadata this weekend, and I'm posting my initial draft designs here for discussion. Even though I'm planning to start implementing some parts of this tomorrow, everything is still up for open and frank review in this thread.
When I started this process, my main goal was to move away from the approach of trying to display all metadata about multiple versions of a package inline in the search results. It's already too much to take in and also doesn't show anywhere near what people need to know to make informed decisions about package quality.
The idea, as discussed previously, was to move to a "page per package" approach giving much more scope for displaying information that people can use to judge the maturity, quality, and maintenance history of a package.
I also rolled back my thinking on displaying metadata from multiple package versions as part of this process. The fact that we capture all metadata from multiple versions is critical, but it's only occasionally useful for people to see. When it's useful, it's instrumental, but that tends to be only around the release of new Swift/operating system versions. The fact that we have this data is a huge advantage, but I want to be smart about displaying it.
So, here's my first draft of the per-package page, where I spent most of my time and thinking this weekend.
Some notes:
You'll see an example of my idea for dealing with changing metadata between significant versions in the Languages and Platforms section. Most packages will use the same version of Swift/iOS/macOS/etc. for every version. It doesn't change that often. So let's only display it when it changes!
If all three significant versions use the same Swift versions, it just gets displayed once. Only if (in this example) the new beta upgraded Swift version would it be shown like it is here.
You'll notice that one of the (seemingly) most essential bits of metadata, the version of Swift that's supported, is quite far down the listing. At first, this worried me too, but I think I'm OK with it after playing with it today. I think if we can get the metadata from git/GitHub about the number of releases, project start date, commits, etc. that gives an excellent indication of how active a project is. Active projects tend to support the latest version of Swift.
The Swift version needs to be there, and it is, but I don't think it's as important as that other metadata.
Here's a basic plan that I think will work well for this line of metadata.
There are metrics in the sentences in this design that we're not currently collecting from the GitHub API. I didn't let that limit what I planned, but I realise we won't be able to implement all of this immediately, and that's OK.
The most obvious thing missing from this design is dependency details. I want to add this, but I feel it can be after an initial release.
I had a quick chat with a friend of mine about this design earlier today. He mentioned a couple of metrics that he uses when evaluating a dependency. "Does it link against UIKit?" was one interesting one. "Does it swizzle any methods?" is another. I think this kind of metric might be really interesting to play with after an initial release.
I've also done some very preliminary thinking on the home page. It's affected by my thoughts on search, which I'll post next, but here it is for reference.
This issue tracks the robust generation of URLs from routes.
We need a way to generate URLs for routes in a standard way, so we can (for example) request the URL for the show action of the package controller for a specific Package and get back a URL that currently looks like this:
/packages/UUID
Once we have a robust way to generate URLs, we should also then supply those generated URLs to the web client via the API response, as mentioned here.
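One way to sketch this is a route enum whose cases carry their parameters, so both server-side rendering and the API response ask the same place for URLs. SiteRoute and its cases are hypothetical names for illustration, not the project's actual routing API:

```swift
import Foundation

// Hedged sketch: `SiteRoute` and its cases are hypothetical names, not the
// project's actual API.
enum SiteRoute {
    case home
    case packageShow(id: UUID)

    // One central place that knows how to turn a route into a URL path.
    var path: String {
        switch self {
        case .home:
            return "/"
        case .packageShow(let id):
            return "/packages/\(id.uuidString)"
        }
    }
}

let id = UUID(uuidString: "DEADBEEF-0000-0000-0000-000000000000")!
assert(SiteRoute.packageShow(id: id).path == "/packages/DEADBEEF-0000-0000-0000-000000000000")
```

The benefit of centralising this is that if the URL scheme changes later, only `path` needs updating, and the API response generation picks up the change for free.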
@daveverwer to look at git command for fetching this.
Triggering repository is our friend https://github.com/SwiftyBeaver/AES256CBC.git
analyze_1 | Fatal error: Error raised at top level: ShellOut encountered an error
analyze_1 | Status code: 128
analyze_1 | Message: "fatal: could not read Username for 'https://github.com': terminal prompts disabled"
analyze_1 | Output: "": file /home/buildnode/jenkins/workspace/oss-swift-5.2-package-linux-ubuntu-18_04/swift/stdlib/public/core/ErrorType.swift, line 200
analyze_1 | Current stack trace:
analyze_1 | 0 libswiftCore.so 0x00007ff392e15db0 swift_reportError + 50
analyze_1 | 1 libswiftCore.so 0x00007ff392e87f60 _swift_stdlib_reportFatalErrorInFile + 115
analyze_1 | 2 libswiftCore.so 0x00007ff392b9cdfe <unavailable> + 1383934
analyze_1 | 3 libswiftCore.so 0x00007ff392b9ca07 <unavailable> + 1382919
analyze_1 | 4 libswiftCore.so 0x00007ff392b9cfe8 <unavailable> + 1384424
analyze_1 | 5 libswiftCore.so 0x00007ff392b9b2d0 _assertionFailure(_:_:file:line:flags:) + 520
analyze_1 | 6 libswiftCore.so 0x00007ff392be3c4c <unavailable> + 1674316
analyze_1 | 7 Run 0x000055e446ed5f51 <unavailable> + 7941969
analyze_1 | 8 libc.so.6 0x00007ff39086eab0 __libc_start_main + 231
analyze_1 | 9 Run 0x000055e4468619ea <unavailable> + 1173994
analyze_1 | 0x7ff39283e88f
analyze_1 | 0x7ff392b9b4e5
analyze_1 | 0x7ff392be3c4b
analyze_1 | 0x55e446ed5f50, main at /build/<compiler-generated>:0
analyze_1 | 0x7ff39086eb96
analyze_1 | 0x55e4468619e9
analyze_1 | 0xffffffffffffffff
swiftpackageindex-server_analyze_1 exited with code 132
Can we handle changes in tags or branches?
First up, how does search currently work in the SwiftPM Library?
The plan was that at some point you might also want to customise search by other useful metadata like, for example:
Without ElasticSearch, some of this becomes much harder (but not impossible) to achieve. As I was thinking about the design of the search form today, something struck me. Even though the colon syntax for searching is fairly standard, not everyone knows about it. So maybe we lean into a basic Postgres full-text search across a de-normalised version of the search data from the default branch, processed only to contain searchable data. Then, some checkboxes/input controls people can tick as they start to search. There's plenty of UI space on that home page for extra controls, and it might aid discovery of the feature.
Not all of this needs implementing straight away, but I want to discuss it now as it affects the design.
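A hedged sketch of what that basic Postgres full-text search could look like; the `search` table and its columns are assumptions, standing in for the de-normalised searchable data from the default branch:

```swift
// Hedged sketch: the `search` table and column names are assumptions.
// Postgres's to_tsvector/plainto_tsquery handle tokenising and stemming;
// ts_rank gives a crude relevance ordering to build on later.
let searchSQL = """
    SELECT package_id, package_name, summary
    FROM search
    WHERE to_tsvector('english', search_text) @@ plainto_tsquery('english', $1)
    ORDER BY ts_rank(to_tsvector('english', search_text),
                     plainto_tsquery('english', $1)) DESC
    """
assert(searchSQL.contains("plainto_tsquery"))
```

The checkbox/input controls could then translate into extra WHERE clauses on the same de-normalised table, rather than needing a separate search engine.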
Steps to reproduce:
At this point, the search box still has the query in it, but the results are gone.
The fix is easy enough. When the DOM has loaded, if there is text in the query field, re-run the search. It could even be as simple as un-hiding the results div, as I bet the results are still there. Needs a bit of testing.
In ReconcilePackageListCommand, the fetchMasterPackageList method currently throws away bad URLs from the fetched JSON. There should never be bad URLs in that file, but if there are, the reconciliation process should report on them.
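One possible shape for this, as a sketch: partition the list instead of silently dropping entries. The function name and the http(s) check are assumptions; the real command may validate differently:

```swift
import Foundation

// Hedged sketch: partition instead of silently dropping. What happens with
// the `invalid` entries (log, error, report) is left to the real command.
func parsePackageList(_ rawList: [String]) -> (valid: [URL], invalid: [String]) {
    var valid: [URL] = []
    var invalid: [String] = []
    for entry in rawList {
        // A "bad URL" here is anything that doesn't parse as an http(s) URL.
        if let url = URL(string: entry), url.scheme == "http" || url.scheme == "https" {
            valid.append(url)
        } else {
            invalid.append(entry)
        }
    }
    return (valid, invalid)
}

let (good, bad) = parsePackageList([
    "https://github.com/vapor/vapor.git",
    "not a url",
])
assert(good.count == 1)
assert(bad == ["not a url"])
// Instead of discarding `bad`, reconciliation can now report it.
```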
I've started having a look at our deployment options.
What I know works without any further research is the following:
.gitlab-ci.yml
I'm doing this for several projects and it works well. The downside is that deployment runs separate-ish. While .gitlab-ci.yml could happily live inside the repo, what it triggers happens elsewhere. (Although arguably it has to happen elsewhere by definition - the Linode box - it's just a question of how much of it does.)
For GitHub Actions, I believe we can do the following:
The slight issue I'm seeing right now is that there's no built-in ssh action (AFAICT), and I'm not excited to use a third-party one that handles private keys as an input. However, I think it shouldn't be too hard to either fork one and use a version we control or just run the key handling inline.
Or maybe I'm too sensitive about the key? Because unless we host our own runner (which we easily could; I'm doing it for three other GH projects), we'll always be passing our private key to some machine we don't control.
Depends on #87 being completed.
activityClause
I've posted this as a question in the Vapor Discord but I wanted to record this here as well, both for tracking and as a call for help, in case someone is coming across this issue here :)
I'm pasting the message I posted on Discord at the end, but since it generalises away some of the specifics, I wanted to mention these here as well.
To reproduce the issue, check out revision cfd9168 and run the test AnalyzerTests.test_basic_analysis. It should fail around 75% of the time with errors like the following:
/Users/sas/Projects/SPI-Server/Tests/AppTests/AnalyzerTests.swift:76: error: -[AppTests.AnalyzerTests test_basic_analysis] : XCTAssertEqual failed: ("[]") is not equal to ("[Optional("foo-2"), Optional("foo-2")]")
/Users/sas/Projects/SPI-Server/Tests/AppTests/AnalyzerTests.swift:77: error: -[AppTests.AnalyzerTests test_basic_analysis] : XCTAssertEqual failed: ("[]") is not equal to ("[Optional("2.0"), Optional("2.1")]")
Test Case '-[AppTests.AnalyzerTests test_basic_analysis]' failed (0.314 seconds).
i.e. the version updates for the second package are not flushed out to the db. The issue seems to be nested futures. I've tried to solve this in various ways (see the Discord message below) but in the end the only way to work around it was to introduce an explicit wait for the updates (revision 2098f12):
let fulfilledUpdates = try updates.wait()
let setStatus = fulfilledUpdates.map { (pkg, updates) -> EventLoopFuture<[Void]> in
EventLoopFuture.whenAllComplete(updates, on: application.db.eventLoop)
.flatMapEach(on: application.db.eventLoop) { result -> EventLoopFuture<Void> in
switch result {
case .success:
pkg.status = .ok
case .failure(let error):
application.logger.error("Analysis error: \(error.localizedDescription)")
pkg.status = .analysisFailed
}
return pkg.save(on: application.db)
}
}.flatten(on: application.db.eventLoop)
Here's the Discord post, for reference:
I've gotten myself into a bit of a bind with nested Futures in that I can't quite figure out how to ensure they all complete by the time a wait returns. (The wait is required for it to be run from a Command - or a test.)
At the start, I'm pulling a set of items from a table, yielding
let items: Future<[Item]> = ... fetch ...
(just going to call EventLoopFuture "Future" for brevity).
I then do some processing which yields a set of db saves per item, such that I'm left with
let updates: Future<[(Item, [Future<Void>])]> = ... update ...
i.e. a set of pairs of one Item with N Future<Void>s for the saves. If I stop here and
try updates.transform(to: ()).wait()
the saves flush out reliably - but that's just luck, right? The transform deals with the outermost future only and the inner ones just happen to be done before. I've run a test for this many, many times and it never failed.
As soon as I now stack on further steps, even as simple as doing nothing and even completely disregarding the new status:
let status: Future<[Void]> = updates.flatMapEach(on: application.db.eventLoop) { _ in
application.db.eventLoop.makeSucceededFuture(())
}
try updates.transform(to: ()).wait() // NB: still just looking at updates
the cracks are beginning to show and some updates don't flush out to the db. (Test now fails ~75% of the time.)
I've taken a few stabs at unnesting the inner futures but the result is the same:
let unnested: Future<[Void]> = updates
.flatMapEach(on: application.db.eventLoop) { (item, updates) -> Future<Void> in
updates.flatten(on: application.db.eventLoop)
}
try unnested.transform(to: ()).wait()
or
let unnested = updates.flatMap { (r: [(Item, [Future<Void>])]) -> Future<Void> in
let s = r.map { (item, updates) in
updates.flatten(on: application.db.eventLoop)
}
return s.flatten(on: application.db.eventLoop)
}
and Future<Void>.andAllComplete(...) instead of flatten, but the result is the same: often, some of the inner futures do not complete by the time the wait returns.
Where am I going wrong? How can I reliably unnest the futures such that every update has flushed when the final wait returns?
It seems that https://github.com/SwiftyBeaver/AES256CBC.git went private and that messes with the analysis:
[ INFO ] cloning https://github.com/SwiftyBeaver/AES256CBC.git to /Users/sas/Projects/SPI-Server/SPI-checkouts/github.com-swiftybeaver-aes256cbc
Username for 'https://github.com': ^C
I've tried various incantations of git -c core.askPass="" clone ... and git -c credentials.helper= clone ..., as well as GIT_ASKPASS and SSH_ASKPASS, but no luck yet - I'm still always getting the prompt.
I thought this might be specific to me running the job on my Mac but it also tripped up the job in the docker container just now.
Need to try and disable the prompt in the docker container, should have more control over the env there.
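One avenue worth trying: git (2.3 and later) honours the GIT_TERMINAL_PROMPT environment variable, which makes it fail immediately instead of prompting for credentials. A sketch of passing it to a child process from Swift; the echo command here is a stand-in for the real git clone invocation:

```swift
import Foundation

// Hedged sketch: git (>= 2.3) honours GIT_TERMINAL_PROMPT=0 and fails fast
// instead of prompting for credentials on the terminal.
var environment = ProcessInfo.processInfo.environment
environment["GIT_TERMINAL_PROMPT"] = "0"

let child = Process()
child.executableURL = URL(fileURLWithPath: "/bin/sh")
// Stand-in command; in the real job this would be the `git clone` invocation.
child.arguments = ["-c", "echo \"prompt disabled: $GIT_TERMINAL_PROMPT\""]
child.environment = environment

let pipe = Pipe()
child.standardOutput = pipe
try child.run()
child.waitUntilExit()

let output = String(data: pipe.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8)!
assert(output == "prompt disabled: 0\n")
```

With the prompt disabled, a clone of a now-private repo should fail with "could not read Username ... terminal prompts disabled", which matches the error the analysis job already surfaces and can then handle as a normal failure.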
Looking through the existing dataset from swiftpm.co and the new dataset from this project, there are a few differences. One important one is that swiftpm.co correctly parses semantic versions that have a pre part, more commonly known as beta versions.
For example, for one package on swiftpm.co:
and for the same package in this project:
As discussed yesterday with @daveverwer, the plan is to move master to vapor-3 and then vapor-4 to be the new master branch.
Happy to do that, just let me know what you prefer.
This needs to go elsewhere - under AppError? - and should be consolidated with online logging. I.e. right now recordError is both online error reporting and setStatus. That's mixing concerns a bit.
Arrow keys up and down to move between results, enter to visit the result package page.
This search score should be used as the ORDER BY for the search API, and will be an initial version of the package quality score.
As discussed in #1
Similar in design to the one on https://swiftpm.co.
Store these in two fields:
I need to rework both stages around their error handling. There are still gaps where errors are not caught, for instance this test fails:
func test_invalidPackageCachePath() throws {
// setup
try savePackages(on: app.db, ["1", "2"], processingStage: .ingestion)
// MUT
try analyze(application: app, limit: 10).wait()
// validation
let packages = try Package.query(on: app.db).sort(\.$url).all().wait()
XCTAssertEqual(packages.map(\.status), [.invalidUrl, .invalidUrl])
}
by raising an invalidPackageCachePath exception out of the analyze call that should not bubble up.
In practice this might not happen, because an invalid url like that doesn't get this far in the chain. However, both stages suffer from being written in the first few days and contain a mix of throwing and Result code which is hard to reason about.
Explore rewriting everything around Result, if possible.
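As a sketch of what the Result-based shape could look like: convert at the throwing boundary with Result(catching:), after which every stage composes with map/flatMap and no error can bubble out unseen. The names AppError, parse and analyzeStage are illustrative, not the project's actual functions:

```swift
import Foundation

// Hedged sketch: `AppError`, `parse`, and `analyzeStage` are illustrative
// names, not the project's actual code.
enum AppError: Error, Equatable {
    case invalidUrl(String)
}

// A throwing helper, as parts of the existing code have.
func parse(_ raw: String) throws -> URL {
    guard let url = URL(string: raw), url.host != nil else {
        throw AppError.invalidUrl(raw)
    }
    return url
}

// Convert at the boundary with Result(catching:); from here on every stage
// composes with map/flatMap and the error travels inside the Result.
func analyzeStage(_ raw: String) -> Result<String, Error> {
    Result { try parse(raw) }.map { $0.host! }
}

assert((try? analyzeStage("https://github.com/vapor/vapor").get()) == "github.com")
if case .failure(let error as AppError) = analyzeStage("1") {
    assert(error == .invalidUrl("1"))
} else {
    assertionFailure("expected invalidUrl")
}
```

The test from above then becomes a check on a Result value rather than a thrown exception, which should make the "status is set, nothing bubbles up" contract explicit.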
Similar in design to the one on https://swiftpm.co.
As discussed here #45 (comment)
I think we're done with it by now @finestructure? Should I delete it?
We need a way to collect and report on errors. Possible ways:
We could (should) of course report into multiple systems
Leaving this as a reminder I still need to do it. I haven't added any JavaScript yet, but will.
This is an additional (or even multiple additional) GitHub API request to fetch:
Owned by the organisation, if possible.
Look how/if this gets reported and if we can clear it up:
2020-05-15T17:39:52.883069388Z [ ERROR ] analysis error: ShellOut encountered an error
2020-05-15T17:39:52.883122027Z Status code: 128
2020-05-15T17:39:52.883138198Z Message: "fatal: Unable to create '/checkouts/github.com-facebook-facebook-ios-sdk/.git/index.lock': File exists.
2020-05-15T17:39:52.883142468Z
2020-05-15T17:39:52.883146168Z Another git process seems to be running in this repository, e.g.
2020-05-15T17:39:52.883149748Z an editor opened by 'git commit'. Please make sure all processes
2020-05-15T17:39:52.883153048Z are terminated then try again. If it still fails, a git process
2020-05-15T17:39:52.883156258Z may have crashed in this repository earlier:
2020-05-15T17:39:52.883159378Z remove the file manually to continue."
2020-05-15T17:39:52.883162458Z Output: ""
As discussed here: #36 (comment)
Before we start converting the project to Vapor, it's going to be worth discussing the database schema. For some background, please see this document for project goals, guiding principles, and scope of this conversion project.
I've uploaded the current database schema for reference. This is the database that the SwiftPM Library currently runs on.
The current schema has force_xxxxx_reset and force_xxxxx_update fields. In the new schema, the online metadata is split into a separate table. Wiping online metadata becomes a case of deleting the relevant record from that table, and forcing a local metadata re-parse becomes a case of deleting the PackageVersion records. This cleans up not just the force_xxxxx fields, but also a lot of the confusing datetime fields.
The current schema has a swift_versions table with a join table package_version_swift_versions to represent a package version being compatible with multiple versions of Swift. In this new design, it's an array type of enums representing the versions of Swift. The advantage of keeping Swift versions in a database table is that it becomes trivial to query packages by Swift version (for example, "Show me all packages compatible with Swift 4.2"); I'm just not sure how often we actually need to do that.
Broke in revision 8dbd087
Recording here what I just posted on Discord:
I've hit an issue where an expression compiles and passes tests on macOS but fails to even compile on Linux:
private extension QueryBuilder where Model == Package {
func filter(for stage: ProcessingStage) -> Self {
switch stage {
case .reconciliation:
fatalError("reconciliation stage does not select candidates")
case .ingestion:
return group(.or) {
$0.filter(\.$processingStage == .reconciliation)
.filter(\.$updatedAt < Current.date().advanced(by: -Constants.reingestionDeadtime))
}
case .analysis:
return filter(\.$processingStage == .ingestion)
}
}
}
The compiler error is:
/host/Sources/App/Models/Package.swift:143:31: error: type of expression is ambiguous without more context
$0.filter(\.$processingStage == .reconciliation)
^~~~~~~~~~~~~~~~~~
Giving more context by qualifying the closure types:
return group(.or) { (query: QueryBuilder<Package>) -> () in
changes the error to
/host/Sources/App/Models/Package.swift:141:35: error: cannot convert value of type '(QueryBuilder<Package>) -> ()' to expected argument type '(QueryBuilder<Package>) throws -> ()'
return group(.or) { (query: QueryBuilder<Package>) -> () in
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
so the throws appears to be the issue. Adding that still doesn't help:
/host/Sources/App/Models/Package.swift:141:39: error: cannot convert value of type '(FluentKit.QueryBuilder<App.Package>) throws -> ()' to expected argument type '(FluentKit.QueryBuilder<App.Package>) throws -> ()'
return try group(.or) { (query: QueryBuilder<Package>) throws -> () in
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The types look fine now. I've looked briefly at the innards of group etc. but there's nothing that stands out as being platform specific. Is this a compiler issue best reported elsewhere?
JSON API to return the following information for a search query:
This API could one day become a proper search API, but for now it just needs these fields.
Source will be from local git repository.
@daveverwer to provide git commands for these.
The GitHub API key running out of API requests isn't a failure of the package, but we currently store metadata_request_failed against the package if that API call fails.
My gut feeling says that if the GitHub API fails because the API is down, or if it returns a 500, or anything that's not the fault of the package, then the record shouldn't be updated at all. We should just move on, and it'll still be a candidate for next time the ingestor runs.
Doing it as we do at the moment means the status field can be a little meaningless.
Of course, if it's a 404, that's fine because it means the project has gone, and I'd expect it to be a metadata_request_failed at that point.
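A sketch of that policy as a small decision function; IngestionOutcome and the status-code buckets are my reading of the intent, not existing code:

```swift
// Hedged sketch: `IngestionOutcome` and the status buckets encode the policy
// described above; the names are illustrative, not existing code.
enum IngestionOutcome: Equatable {
    case updateStatus(String)  // write a new status against the package
    case leaveUnchanged        // not the package's fault; retry next run
}

func outcome(forHTTPStatus status: Int) -> IngestionOutcome {
    switch status {
    case 200..<300: return .updateStatus("ok")
    case 404:       return .updateStatus("metadata_request_failed")  // project gone
    case 403, 429:  return .leaveUnchanged  // rate limited: our key's fault, not the package's
    case 500..<600: return .leaveUnchanged  // GitHub's problem, not the package's
    default:        return .leaveUnchanged
    }
}

assert(outcome(forHTTPStatus: 404) == .updateStatus("metadata_request_failed"))
assert(outcome(forHTTPStatus: 503) == .leaveUnchanged)
assert(outcome(forHTTPStatus: 429) == .leaveUnchanged)
```

Leaving the record untouched on .leaveUnchanged keeps the package a candidate for the next ingestor run, which is exactly the behaviour described above.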
We could implement a non-JavaScript version one day but it's really not worth it. We would need to do an intermediate search results page, and everyone has JS switched on.
Wire up in PackageShow.Model.query
Two parts to this:
In order to achieve this, we should change the top-level JSON object to be a dictionary that includes a results key with the current array in it.
Continuation of discussion that started here
My gut feeling says that the view models shouldn't construct full sentences that are going to be used, but instead populate everything that might be needed.
Maybe something like this:
To construct this sentence:
"In development for over 5 years, with 1,433 commits and 79 releases."
The view model could contain:
- the date of the first commit (Date)
- the number of commits (Int)
- the number of releases (Int)
And to construct this sentence:
"By Christian Noon, Mattt, Jon Shier, Kevin Harwood, and 186 other contributors."
- the contributor names to show (String)
- a link for each contributor (URL)
- the count of other contributors (Int)
Is that what you were thinking?
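A sketch of how the view layer might assemble the first sentence from those raw values; the PackageHistory type and its property names are invented for illustration:

```swift
import Foundation

// Hedged sketch: `PackageHistory` and its property names are invented; the
// real view model fields may differ.
struct PackageHistory {
    var firstCommitDate: Date
    var commitCount: Int
    var releaseCount: Int
}

// The view layer, not the view model, turns raw values into a sentence.
func historySentence(_ history: PackageHistory, now: Date = Date()) -> String {
    let secondsPerYear = 365.25 * 24 * 3600
    let years = Int(now.timeIntervalSince(history.firstCommitDate) / secondsPerYear)
    let formatter = NumberFormatter()
    formatter.numberStyle = .decimal
    formatter.locale = Locale(identifier: "en_US")
    let commits = formatter.string(from: NSNumber(value: history.commitCount))!
    return "In development for over \(years) years, with \(commits) commits and \(history.releaseCount) releases."
}

let history = PackageHistory(
    firstCommitDate: Date(timeIntervalSinceNow: -5.5 * 365.25 * 24 * 3600),
    commitCount: 1_433,
    releaseCount: 79
)
assert(historySentence(history) == "In development for over 5 years, with 1,433 commits and 79 releases.")
```

Keeping the formatting in the view layer means localisation or copy changes never touch the model, which seems to be the point of the proposal.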