swiftpackageindex / swiftpackageindex-server
The Swift Package Index is the place to find Swift packages!
Home Page: https://swiftpackageindex.com
License: Apache License 2.0
I've done a lot of thinking about how we want to show package metadata this weekend, and I'm posting my initial draft designs here for discussion. Even though I'm planning to start implementing some parts of this tomorrow, everything is still up for open and frank review in this thread.
When I started this process, my main goal was to move away from the approach of trying to display all metadata about multiple versions of a package inline in the search results. It's already too much to take in and also doesn't show anywhere near what people need to know to make informed decisions about package quality.
The idea, as discussed previously, was to move to a "page per package" approach giving much more scope for displaying information that people can use to judge the maturity, quality, and maintenance history of a package.
I also rolled back my thinking on displaying metadata from multiple package versions as part of this process. The fact that we capture all metadata from multiple versions is critical, but it's only occasionally useful for people to see. When it's useful, it's instrumental, but that tends to be only around the release of new Swift/operating system versions. The fact that we have this data is a huge advantage, but I want to be smart about displaying it.
So, here's my first draft of the per-package page, where I spent most of my time and thinking this weekend.
Some notes:
You'll see an example of my idea for dealing with changing metadata between significant versions in the Languages and Platforms section. Most packages will use the same version of Swift/iOS/macOS/etc. for every version. It doesn't change that often. So let's only display it when it changes!
If all three significant versions use the same Swift versions, it just gets displayed once. Only if (in this example) the new beta upgraded Swift version would it be shown like it is here.
You'll notice that one of the (seemingly) most essential bits of metadata, the version of Swift that's supported, is quite far down the listing. At first, this worried me too, but I think I'm OK with it after playing with it today. I think if we can get the metadata from git/GitHub about the number of releases, project start date, commits, etc. that gives an excellent indication of how active a project is. Active projects tend to support the latest version of Swift.
The Swift version needs to be there, and it is, but I don't think it's as important as that other metadata.
Here's a basic plan that I think will work well for this line of metadata.
There are metrics in the sentences in this design that we're not currently collecting from the GitHub API. I didn't let that limit what I planned, but I realise we won't be able to implement all of this immediately, and that's OK.
The most obvious thing missing from this design is dependency details. I want to add this, but I feel it can be after an initial release.
I had a quick chat with a friend of mine about this design earlier today. He mentioned a couple of metrics that he uses when evaluating a dependency. "Does it link against UIKit?" was one interesting one. "Does it swizzle any methods?" is another. I think this kind of metric might be really interesting to play with after an initial release.
I've also done some very preliminary thinking on the home page. It's affected by my thoughts on search, which I'll post next, but here it is for reference.
This issue tracks the robust generation of URLs from routes.
We need a way to generate URLs for routes in a standard way, so we can (for example) request the URL for the show action of the package controller for a specific Package and get back a URL that currently looks like this:
/packages/UUID
Once we have a robust way to generate URLs, we should also then supply those generated URLs to the web client via the API response, as mentioned here.
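One way to sketch this is a route enum whose cases carry their parameters, so both server-side rendering and the API response ask the same place for URLs. SiteRoute and its cases are hypothetical names for illustration, not the project's actual routing API:

```swift
import Foundation

// Hedged sketch: `SiteRoute` and its cases are hypothetical names, not the
// project's actual API.
enum SiteRoute {
    case home
    case packageShow(id: UUID)

    // One central place that knows how to turn a route into a URL path.
    var path: String {
        switch self {
        case .home:
            return "/"
        case .packageShow(let id):
            return "/packages/\(id.uuidString)"
        }
    }
}

let id = UUID(uuidString: "DEADBEEF-0000-0000-0000-000000000000")!
assert(SiteRoute.packageShow(id: id).path == "/packages/DEADBEEF-0000-0000-0000-000000000000")
```

The benefit of centralising this is that if the URL scheme changes later, only `path` needs updating, and the API response generation picks up the change for free.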
@daveverwer to look at git command for fetching this.
Triggering repository is our friend https://github.com/SwiftyBeaver/AES256CBC.git
analyze_1 | Fatal error: Error raised at top level: ShellOut encountered an error
analyze_1 | Status code: 128
analyze_1 | Message: "fatal: could not read Username for 'https://github.com': terminal prompts disabled"
analyze_1 | Output: "": file /home/buildnode/jenkins/workspace/oss-swift-5.2-package-linux-ubuntu-18_04/swift/stdlib/public/core/ErrorType.swift, line 200
analyze_1 | Current stack trace:
analyze_1 | 0 libswiftCore.so 0x00007ff392e15db0 swift_reportError + 50
analyze_1 | 1 libswiftCore.so 0x00007ff392e87f60 _swift_stdlib_reportFatalErrorInFile + 115
analyze_1 | 2 libswiftCore.so 0x00007ff392b9cdfe <unavailable> + 1383934
analyze_1 | 3 libswiftCore.so 0x00007ff392b9ca07 <unavailable> + 1382919
analyze_1 | 4 libswiftCore.so 0x00007ff392b9cfe8 <unavailable> + 1384424
analyze_1 | 5 libswiftCore.so 0x00007ff392b9b2d0 _assertionFailure(_:_:file:line:flags:) + 520
analyze_1 | 6 libswiftCore.so 0x00007ff392be3c4c <unavailable> + 1674316
analyze_1 | 7 Run 0x000055e446ed5f51 <unavailable> + 7941969
analyze_1 | 8 libc.so.6 0x00007ff39086eab0 __libc_start_main + 231
analyze_1 | 9 Run 0x000055e4468619ea <unavailable> + 1173994
analyze_1 | 0x7ff39283e88f
analyze_1 | 0x7ff392b9b4e5
analyze_1 | 0x7ff392be3c4b
analyze_1 | 0x55e446ed5f50, main at /build/<compiler-generated>:0
analyze_1 | 0x7ff39086eb96
analyze_1 | 0x55e4468619e9
analyze_1 | 0xffffffffffffffff
swiftpackageindex-server_analyze_1 exited with code 132
Can we handle changes in tags or branches?
First up, how does search currently work in the SwiftPM Library?
The plan was that at some point you might also want to customise search by other useful metadata like, for example:
Without ElasticSearch, some of this becomes much harder (but not impossible) to achieve. As I was thinking about the design of the search form today, something struck me. Even though the colon syntax for searching is fairly standard, not everyone knows about it. So maybe we lean into a basic Postgres full-text search across a de-normalised version of the search data from the default branch, processed only to contain searchable data. Then, some checkboxes/input controls people can tick as they start to search. There's plenty of UI space on that home page for extra controls, and it might aid discovery of the feature.
Not all of this needs implementing straight away, but I want to discuss it now as it affects the design.
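A hedged sketch of what that basic Postgres full-text search could look like; the `search` table and its columns are assumptions, standing in for the de-normalised searchable data from the default branch:

```swift
// Hedged sketch: the `search` table and column names are assumptions.
// Postgres's to_tsvector/plainto_tsquery handle tokenising and stemming;
// ts_rank gives a crude relevance ordering to build on later.
let searchSQL = """
    SELECT package_id, package_name, summary
    FROM search
    WHERE to_tsvector('english', search_text) @@ plainto_tsquery('english', $1)
    ORDER BY ts_rank(to_tsvector('english', search_text),
                     plainto_tsquery('english', $1)) DESC
    """
assert(searchSQL.contains("plainto_tsquery"))
```

The checkbox/input controls could then translate into extra WHERE clauses on the same de-normalised table, rather than needing a separate search engine.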
Steps to reproduce:
At this point, the search box still has the query in it, but the results are gone.
The fix is easy enough. When the DOM has loaded, if there is text in the query field, re-run the search. It could even be as simple as un-hiding the results div, as I bet the results are still there. Needs a bit of testing.
In ReconcilePackageListCommand, the fetchMasterPackageList method currently throws away bad URLs from the fetched JSON. There should never be bad URLs in that file, but if there are, the reconciliation process should report on them.
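One possible shape for this, as a sketch: partition the list instead of silently dropping entries. The function name and the http(s) check are assumptions; the real command may validate differently:

```swift
import Foundation

// Hedged sketch: partition instead of silently dropping. What happens with
// the `invalid` entries (log, error, report) is left to the real command.
func parsePackageList(_ rawList: [String]) -> (valid: [URL], invalid: [String]) {
    var valid: [URL] = []
    var invalid: [String] = []
    for entry in rawList {
        // A "bad URL" here is anything that doesn't parse as an http(s) URL.
        if let url = URL(string: entry), url.scheme == "http" || url.scheme == "https" {
            valid.append(url)
        } else {
            invalid.append(entry)
        }
    }
    return (valid, invalid)
}

let (good, bad) = parsePackageList([
    "https://github.com/vapor/vapor.git",
    "not a url",
])
assert(good.count == 1)
assert(bad == ["not a url"])
// Instead of discarding `bad`, reconciliation can now report it.
```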
I've started having a look at our deployment options.
What I know works without any further research is the following:
.gitlab-ci.yml
I'm doing this for several projects and it works well. The downside is that deployment runs separate-ish. While .gitlab-ci.yml could happily live inside the repo, what it triggers happens elsewhere. (Although arguably it has to happen elsewhere by definition - the Linode box - it's just a question of how much of it does.)
For GitHub Actions, I believe we can do the following:
The slight issue I'm seeing right now is that there's no built-in ssh action (AFAICT), and I'm not excited to use a third-party one that handles private keys as an input. However, I think it shouldn't be too hard to either fork one and use a version we control or just run the key handling inline.
Or maybe I'm too sensitive about the key? Because unless we host our own runner (which we easily could; I'm doing it for three other GH projects), we'll always be passing our private key to some machine we don't control.
Depends on #87 being completed.
activityClause
I've posted this as a question in the Vapor Discord but I wanted to record this here as well, both for tracking and as a call for help, in case someone is coming across this issue here :)
I'm pasting the message I posted on Discord at the end, but since it generalises away some of the specifics, I wanted to mention these here as well.
To reproduce the issue, check out revision cfd9168 and run the test AnalyzerTests.test_basic_analysis. It should fail around 75% of the time with errors like the following:
/Users/sas/Projects/SPI-Server/Tests/AppTests/AnalyzerTests.swift:76: error: -[AppTests.AnalyzerTests test_basic_analysis] : XCTAssertEqual failed: ("[]") is not equal to ("[Optional("foo-2"), Optional("foo-2")]")
/Users/sas/Projects/SPI-Server/Tests/AppTests/AnalyzerTests.swift:77: error: -[AppTests.AnalyzerTests test_basic_analysis] : XCTAssertEqual failed: ("[]") is not equal to ("[Optional("2.0"), Optional("2.1")]")
Test Case '-[AppTests.AnalyzerTests test_basic_analysis]' failed (0.314 seconds).
i.e. the version updates for the second package are not flushed out to the db. The issue seems to be nested futures. I've tried to solve this in various ways (see the Discord message below) but in the end the only way to work around it was to introduce an explicit wait for the updates (revision 2098f12):
let fulfilledUpdates = try updates.wait()
let setStatus = fulfilledUpdates.map { (pkg, updates) -> EventLoopFuture<[Void]> in
EventLoopFuture.whenAllComplete(updates, on: application.db.eventLoop)
.flatMapEach(on: application.db.eventLoop) { result -> EventLoopFuture<Void> in
switch result {
case .success:
pkg.status = .ok
case .failure(let error):
application.logger.error("Analysis error: \(error.localizedDescription)")
pkg.status = .analysisFailed
}
return pkg.save(on: application.db)
}
}.flatten(on: application.db.eventLoop)
Here's the Discord post, for reference:
I've gotten myself into a bit of a bind with nested Futures in that I can't quite figure out how to ensure they all complete by the time a wait returns. (The wait is required for it to be run from a Command - or a test.)
At the start, I'm pulling a set of items from a table, yielding
let items: Future<[Item]> = ... fetch ...
(just going to call EventLoopFuture "Future" for brevity).
I then do some processing which yields a set of db saves per item, such that I'm left with
let updates: Future<[(Item, [Future<Void>])]> = ... update ...
i.e. a set of pairs of one Item with N Future<Void>s for the saves. If I stop here and
try updates.transform(to: ()).wait()
the saves flush out reliably - but that's just luck, right? The transform deals with the outermost future only and the inner ones just happen to be done before. I've run a test for this many, many times and it never failed.
As soon as I now stack on further steps, even as simple as doing nothing and even completely disregarding the new status:
let status: Future<[Void]> = updates.flatMapEach(on: application.db.eventLoop) { _ in
application.db.eventLoop.makeSucceededFuture(())
}
try updates.transform(to: ()).wait() // NB: still just looking at updates
the cracks are beginning to show and some updates don't flush out to the db. (Test now fails ~75% of the time.)
I've taken a few stabs at unnesting the inner futures but the result is the same:
let unnested: Future<[Void]> = updates
.flatMapEach(on: application.db.eventLoop) { (item, updates) -> Future<Void> in
updates.flatten(on: application.db.eventLoop)
}
try unnested.transform(to: ()).wait()
or
let unnested = updates.flatMap { (r: [(Item, [Future<Void>])]) -> Future<Void> in
let s = r.map { (item, updates) in
updates.flatten(on: application.db.eventLoop)
}
return s.flatten(on: application.db.eventLoop)
}
and Future<Void>.andAllComplete(...) instead of flatten, but the result is the same: often, some of the inner futures do not complete by the time the wait returns.
Where am I going wrong? How can I reliably unnest the futures such that every update has flushed when the final wait returns?
It seems that https://github.com/SwiftyBeaver/AES256CBC.git went private and that messes with the analysis:
[ INFO ] cloning https://github.com/SwiftyBeaver/AES256CBC.git to /Users/sas/Projects/SPI-Server/SPI-checkouts/github.com-swiftybeaver-aes256cbc
Username for 'https://github.com': ^C
I've tried various incantations of git -c core.askPass="" clone ... and git -c credentials.helper= clone ..., as well as GIT_ASKPASS and SSH_ASKPASS, but no luck yet - I'm still always getting the prompt.
I thought this might be specific to me running the job on my Mac but it also tripped up the job in the docker container just now.
Need to try and disable the prompt in the docker container, should have more control over the env there.
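One avenue worth trying: git (2.3 and later) honours the GIT_TERMINAL_PROMPT environment variable, which makes it fail immediately instead of prompting for credentials. A sketch of passing it to a child process from Swift; the echo command here is a stand-in for the real git clone invocation:

```swift
import Foundation

// Hedged sketch: git (>= 2.3) honours GIT_TERMINAL_PROMPT=0 and fails fast
// instead of prompting for credentials on the terminal.
var environment = ProcessInfo.processInfo.environment
environment["GIT_TERMINAL_PROMPT"] = "0"

let child = Process()
child.executableURL = URL(fileURLWithPath: "/bin/sh")
// Stand-in command; in the real job this would be the `git clone` invocation.
child.arguments = ["-c", "echo \"prompt disabled: $GIT_TERMINAL_PROMPT\""]
child.environment = environment

let pipe = Pipe()
child.standardOutput = pipe
try child.run()
child.waitUntilExit()

let output = String(data: pipe.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8)!
assert(output == "prompt disabled: 0\n")
```

With the prompt disabled, a clone of a now-private repo should fail with "could not read Username ... terminal prompts disabled", which matches the error the analysis job already surfaces and can then handle as a normal failure.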
Looking through the existing dataset from swiftpm.co and the new dataset from this project, there are a few differences. One important one is that swiftpm.co correctly parses semantic versions that have a pre part, more commonly known as beta versions.
For example, for one package on swiftpm.co:
and for the same package in this project:
As discussed yesterday with @daveverwer, the plan is to move master to vapor-3 and then vapor-4 to be the new master branch.
Happy to do that, just let me know what you prefer.
This needs to go elsewhere - under AppError? - and should be consolidated with online logging. I.e. right now recordError is both online error reporting and setStatus. That's mixing concerns a bit.
Arrow keys up and down to move between results, enter to visit the result package page.
This search score should be used as the ORDER BY for the search API, and will be an initial version of the package quality score.
As discussed in #1
Similar in design to the one on https://swiftpm.co.
Store these in two fields:
I need to rework both stages around their error handling. There are still gaps where errors are not caught, for instance this test fails:
func test_invalidPackageCachePath() throws {
// setup
try savePackages(on: app.db, ["1", "2"], processingStage: .ingestion)
// MUT
try analyze(application: app, limit: 10).wait()
// validation
let packages = try Package.query(on: app.db).sort(\.$url).all().wait()
XCTAssertEqual(packages.map(\.status), [.invalidUrl, .invalidUrl])
}
by raising an invalidPackageCachePath exception out of the analyze call that should not bubble up.
In practice this might not happen, because an invalid url like that doesn't get this far in the chain. However, both stages suffer from being written in the first few days and contain a mix of throwing and Result code which is hard to reason about.
Explore rewriting everything around Result, if possible.
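As a sketch of what the Result-based shape could look like: convert at the throwing boundary with Result(catching:), after which every stage composes with map/flatMap and no error can bubble out unseen. The names AppError, parse and analyzeStage are illustrative, not the project's actual functions:

```swift
import Foundation

// Hedged sketch: `AppError`, `parse`, and `analyzeStage` are illustrative
// names, not the project's actual code.
enum AppError: Error, Equatable {
    case invalidUrl(String)
}

// A throwing helper, as parts of the existing code have.
func parse(_ raw: String) throws -> URL {
    guard let url = URL(string: raw), url.host != nil else {
        throw AppError.invalidUrl(raw)
    }
    return url
}

// Convert at the boundary with Result(catching:); from here on every stage
// composes with map/flatMap and the error travels inside the Result.
func analyzeStage(_ raw: String) -> Result<String, Error> {
    Result { try parse(raw) }.map { $0.host! }
}

assert((try? analyzeStage("https://github.com/vapor/vapor").get()) == "github.com")
if case .failure(let error as AppError) = analyzeStage("1") {
    assert(error == .invalidUrl("1"))
} else {
    assertionFailure("expected invalidUrl")
}
```

The test from above then becomes a check on a Result value rather than a thrown exception, which should make the "status is set, nothing bubbles up" contract explicit.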
Similar in design to the one on https://swiftpm.co.
As discussed here #45 (comment)
I think we're done with it by now @finestructure? Should I delete it?
We need a way to collect and report on errors. Possible ways:
We could (should) of course report into multiple systems
Leaving this as a reminder I still need to do it. I haven't added any JavaScript yet, but will.
This is an additional (or even multiple additional) GitHub API request to fetch:
Owned by the organisation, if possible.
Look how/if this gets reported and if we can clear it up:
2020-05-15T17:39:52.883069388Z [ ERROR ] analysis error: ShellOut encountered an error
2020-05-15T17:39:52.883122027Z Status code: 128
2020-05-15T17:39:52.883138198Z Message: "fatal: Unable to create '/checkouts/github.com-facebook-facebook-ios-sdk/.git/index.lock': File exists.
2020-05-15T17:39:52.883142468Z
2020-05-15T17:39:52.883146168Z Another git process seems to be running in this repository, e.g.
2020-05-15T17:39:52.883149748Z an editor opened by 'git commit'. Please make sure all processes
2020-05-15T17:39:52.883153048Z are terminated then try again. If it still fails, a git process
2020-05-15T17:39:52.883156258Z may have crashed in this repository earlier:
2020-05-15T17:39:52.883159378Z remove the file manually to continue."
2020-05-15T17:39:52.883162458Z Output: ""
As discussed here: #36 (comment)
Before we start converting the project to Vapor, it's going to be worth discussing the database schema. For some background, please see this document for project goals, guiding principles, and scope of this conversion project.
I've uploaded the current database schema for reference. This is the database that the SwiftPM Library currently runs on.
The current schema has force_xxxxx_reset and force_xxxxx_update fields. In the new schema, the online metadata is split into a separate table. Wiping online metadata becomes a case of deleting the relevant record from that table, and forcing a local metadata re-parse becomes a case of deleting the PackageVersion records. This cleans up not just the force_xxxxx fields, but also a lot of the confusing datetime fields.
The current schema has a swift_versions table with a join table package_version_swift_versions to represent a package version being compatible with multiple versions of Swift. In this new design, it's an array type of enums representing the versions of Swift. The advantage of keeping Swift versions in a database table is that it becomes trivial to query packages by Swift version (for example, "Show me all packages compatible with Swift 4.2"); I'm just not sure how often we actually need to do that.
Broke in revision 8dbd087
Recording here what I just posted on Discord:
I've hit an issue where an expression compiles and passes tests on macOS but fails to even compile on Linux:
private extension QueryBuilder where Model == Package {
func filter(for stage: ProcessingStage) -> Self {
switch stage {
case .reconciliation:
fatalError("reconciliation stage does not select candidates")
case .ingestion:
return group(.or) {
$0.filter(\.$processingStage == .reconciliation)
.filter(\.$updatedAt < Current.date().advanced(by: -Constants.reingestionDeadtime))
}
case .analysis:
return filter(\.$processingStage == .ingestion)
}
}
}
The compiler error is:
/host/Sources/App/Models/Package.swift:143:31: error: type of expression is ambiguous without more context
$0.filter(\.$processingStage == .reconciliation)
^~~~~~~~~~~~~~~~~~
Giving more context by qualifying the closure types:
return group(.or) { (query: QueryBuilder<Package>) -> () in
changes the error to
/host/Sources/App/Models/Package.swift:141:35: error: cannot convert value of type '(QueryBuilder<Package>) -> ()' to expected argument type '(QueryBuilder<Package>) throws -> ()'
return group(.or) { (query: QueryBuilder<Package>) -> () in
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
so the throws appears to be the issue. Adding that still doesn't help:
/host/Sources/App/Models/Package.swift:141:39: error: cannot convert value of type '(FluentKit.QueryBuilder<App.Package>) throws -> ()' to expected argument type '(FluentKit.QueryBuilder<App.Package>) throws -> ()'
return try group(.or) { (query: QueryBuilder<Package>) throws -> () in
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The types look fine now. I've looked briefly at the innards of group etc. but there's nothing that stands out as being platform specific. Is this a compiler issue best reported elsewhere?
JSON API to return the following information for a search query:
This API could one day become a proper search API, but for now it just needs these fields.
Source will be from local git repository.
@daveverwer to provide git commands for these.
The GitHub API key running out of API requests isn't a failure of the package, but we currently store metadata_request_failed against the package if that API call fails.
My gut feeling says that if the GitHub API fails because the API is down, or if it returns a 500, or anything that's not the fault of the package, then the record shouldn't be updated at all. We should just move on, and it'll still be a candidate for next time the ingestor runs.
Doing it as we do at the moment means the status field can be a little meaningless.
Of course, if it's a 404, that's fine because it means the project has gone, and I'd expect it to be a metadata_request_failed at that point.
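A sketch of that policy as a small decision function; IngestionOutcome and the status-code buckets are my reading of the intent, not existing code:

```swift
// Hedged sketch: `IngestionOutcome` and the status buckets encode the policy
// described above; the names are illustrative, not existing code.
enum IngestionOutcome: Equatable {
    case updateStatus(String)  // write a new status against the package
    case leaveUnchanged        // not the package's fault; retry next run
}

func outcome(forHTTPStatus status: Int) -> IngestionOutcome {
    switch status {
    case 200..<300: return .updateStatus("ok")
    case 404:       return .updateStatus("metadata_request_failed")  // project gone
    case 403, 429:  return .leaveUnchanged  // rate limited: our key's fault, not the package's
    case 500..<600: return .leaveUnchanged  // GitHub's problem, not the package's
    default:        return .leaveUnchanged
    }
}

assert(outcome(forHTTPStatus: 404) == .updateStatus("metadata_request_failed"))
assert(outcome(forHTTPStatus: 503) == .leaveUnchanged)
assert(outcome(forHTTPStatus: 429) == .leaveUnchanged)
```

Leaving the record untouched on .leaveUnchanged keeps the package a candidate for the next ingestor run, which is exactly the behaviour described above.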
We could implement a non-JavaScript version one day but it's really not worth it. We would need to do an intermediate search results page, and everyone has JS switched on.
Wire up in PackageShow.Model.query
Two parts to this:
In order to achieve this, we should change the top-level JSON object to be a dictionary that includes a results key with the current array in it.
Continuation of discussion that started here
My gut feeling says that the view models shouldn't construct full sentences that are going to be used, but instead populate everything that might be needed.
Maybe something like this:
To construct this sentence:
"In development for over 5 years, with 1,433 commits and 79 releases."
The view model could contain:
- the date of the first commit (Date)
- the number of commits (Int)
- the number of releases (Int)
And to construct this sentence:
"By Christian Noon, Mattt, Jon Shier, Kevin Harwood, and 186 other contributors."
- the contributor names to show (String)
- a link for each contributor (URL)
- the count of other contributors (Int)
Is that what you were thinking?
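A sketch of how the view layer might assemble the first sentence from those raw values; the PackageHistory type and its property names are invented for illustration:

```swift
import Foundation

// Hedged sketch: `PackageHistory` and its property names are invented; the
// real view model fields may differ.
struct PackageHistory {
    var firstCommitDate: Date
    var commitCount: Int
    var releaseCount: Int
}

// The view layer, not the view model, turns raw values into a sentence.
func historySentence(_ history: PackageHistory, now: Date = Date()) -> String {
    let secondsPerYear = 365.25 * 24 * 3600
    let years = Int(now.timeIntervalSince(history.firstCommitDate) / secondsPerYear)
    let formatter = NumberFormatter()
    formatter.numberStyle = .decimal
    formatter.locale = Locale(identifier: "en_US")
    let commits = formatter.string(from: NSNumber(value: history.commitCount))!
    return "In development for over \(years) years, with \(commits) commits and \(history.releaseCount) releases."
}

let history = PackageHistory(
    firstCommitDate: Date(timeIntervalSinceNow: -5.5 * 365.25 * 24 * 3600),
    commitCount: 1_433,
    releaseCount: 79
)
assert(historySentence(history) == "In development for over 5 years, with 1,433 commits and 79 releases.")
```

Keeping the formatting in the view layer means localisation or copy changes never touch the model, which seems to be the point of the proposal.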