ireneknapp / codex

A container for discussion and early exploratory work towards a new package repository for Haskell.

License: BSD 3-Clause "New" or "Revised" License

Haskell 100.00%

codex's Introduction

What This Is

A container for discussion and early exploratory work towards a new package repository for Haskell (or at least ). Our mission is to be transparent about our design process and welcoming of community support and feedback, and to actually ship.

We are not the Hackage 2 project.

We intend to help Hackage 2 succeed, and to provide a vehicle for experimenting with complementary ideas that enable Hackage-style tooling for a rich set of use cases.

We expect to lean heavily on the decisions that its members have made on major design issues, and perhaps even borrow some code as permitted by its license. We are dedicated to a process that everyone can participate in, and to previews becoming usable and useful from early on.

As for specific design decisions, none is so important as to be part of our mission statement here. But see our issue tracker for our latest thinking on some key questions.

What This Is Not

There's no code yet.

codex's Issues

Ideas for distributed hackage

One way to spread load when spinning up new Hackage mirrors would be to use BitTorrent, or something similar.

Do we want to store our blobs in the database, or somewhere else?

I'd actually suggest that we might consider keeping them in Amazon S3. This removes us from the business of building features related to blob storage.

Description of server state

Okay, so the idea is that all state is divided into three parts: configuration, which describes how parts of the system can find each other; database, which is relational information about packages, users, and so on; and blob repository, which is a bunch of, er, blobby files.

I'd like the config files to use JSON (I am a fan of Aeson as the interface to it), because it's reasonably standard and doesn't have any really bizarre or surprising syntactic rules like YAML does. That also saves us time on writing a parser; the other obvious option would be a simple plaintext format that we define.

I'd like the database to use SQLite3, because I know and trust it, and it's trivial to set up and use and even back up and restore. Another sane option would be PostgreSQL, but that has substantially more administrative overhead.

I'd like the blob repository to live in Amazon S3. This makes distribution of files almost trivial, since we can simply grant public access to the appropriate parts of it. I've already poked around a bit and created a possible folder hierarchy we might use; see issue #16. The alternative to S3 would be the local, per-mirror filesystem, but this runs into size constraints, and means that each mirror, which doesn't really need to poke at the contents of packages except when it's in the act of building them, has to do a large up-front download before it can come online. It also introduces complications of synchronizing this state across federated mirrors. Now, S3 is not without its administrative hassles, but they relate to assigning permissions, which feels like a cleaner category of problem to have.
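As a rough illustration of the database part, here is a minimal sketch using sqlite-simple (the layer over direct-sqlite mentioned elsewhere in these issues). The `packages` table, its columns, and the helper names are assumptions made up for this example; the issue only says the database holds "relational information about packages, users, and so on". An in-memory database is used so the sketch leaves no files behind.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Text (Text)
import Database.SQLite.Simple
  (Connection, execute, execute_, query_, withConnection)

-- Hypothetical schema: one row per (package name, version) pair.
createSchema :: Connection -> IO ()
createSchema conn =
  execute_ conn
    "CREATE TABLE IF NOT EXISTS packages \
    \(name TEXT NOT NULL, version TEXT NOT NULL, PRIMARY KEY (name, version))"

-- Record one package version; the values are bound as SQL parameters.
addPackage :: Connection -> Text -> Text -> IO ()
addPackage conn name version =
  execute conn "INSERT INTO packages (name, version) VALUES (?, ?)" (name, version)

-- List every known package version, ordered for stable output.
listPackages :: Connection -> IO [(Text, Text)]
listPackages conn =
  query_ conn "SELECT name, version FROM packages ORDER BY name, version"

main :: IO ()
main = withConnection ":memory:" $ \conn -> do
  createSchema conn
  addPackage conn "example-package" "0.1.0"  -- made-up sample data
  listPackages conn >>= mapM_ print
```

Swapping `":memory:"` for a file path gives the single-file database that makes SQLite3 easy to back up and restore.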

Is distributing binaries in-scope?

Is distributing binaries in-scope? Probably not, eh.

business user remarks / opinions

What follows is a transcript of bullet points from someone using Haskell in their business (and I think it articulates a number of ideas better than I would have).

internal use [editor: of a Hackage-like, e.g. codex], and security policies are pretty much essential features to using it, I believe
It's hard to justify building a package management system today without signatures (for author validation) given the Ruby Debacle
I want a place where I can put a proxy in front of Hackage for any internal packages
And where I can trust the package I downloaded and compiled into my system as the same one being vetted across the community
because how often do you grep your cabal install downloads for use of unsafePerformIO? If you're like me: never

near term plan

There are a lot of nice things we want to do over time, but a dead-easy Hackage mirror is the first step. There's a lot we can build on top of that, but that's really step one.

It would be nice to have an issue here for each feature of the extant Hackage 1...

See http://code.haskell.org/~ross/hackage-scripts for its source.

Make a command-line tool to assist in the creation of a new mirror

See issue #17 first, for background reading. I'm moving forward on the assumption that we're going to use the technologies I suggest there - JSON for a config file, SQLite3 for a database engine, and Amazon S3 for blob storage and distribution - but I could be talked out of any of these.

Make a command-line tool to assist in the creation of a new mirror. I'm envisioning a multi-command executable that takes a subcommand name as its first argument, with the first subcommand to be implemented being "config", which simply asks questions, pokes at the systems it's ostensibly connecting to a little, and spits out a config file.
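The multi-command shape described above could be sketched with nothing but `base`, as follows. The executable name `codex` in the usage string and the `Command` type are assumptions for illustration; only the "config" subcommand is named in the issue, and its body here is a placeholder.

```haskell
module Main (main) where

import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Subcommands the tool understands. Only "config" exists so far;
-- the constructor carries whatever arguments follow the subcommand name.
data Command = Config [String]
  deriving (Eq, Show)

-- The first argument selects the subcommand; the rest are passed through.
parseCommand :: [String] -> Either String Command
parseCommand ("config" : rest) = Right (Config rest)
parseCommand (other : _)       = Left ("unknown subcommand: " ++ other)
parseCommand []                = Left "usage: codex <subcommand> [args]"

-- Placeholder: the real "config" would start the interactive setup.
runCommand :: Command -> IO ()
runCommand (Config _) = putStrLn "config: interactive setup would start here"

main :: IO ()
main = do
  args <- getArgs
  case parseCommand args of
    Right cmd -> runCommand cmd
    Left err  -> hPutStrLn stderr err >> exitFailure
```

Keeping `parseCommand` pure means new subcommands can be added and tested without touching any I/O.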

The config file should at the very least contain S3 credentials and bucket identifier. The credentials are two fields, an access key and a secret. (A "bucket" is the top-level container of stuff in S3.)
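A minimal sketch of that config file as a Haskell record with Aeson instances might look like this. The JSON field names (`s3_access_key`, `s3_secret`, `s3_bucket`) and the record name are assumptions for illustration; the issue only fixes which pieces of information must be present.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Aeson
  (FromJSON (..), ToJSON (..), eitherDecode, encode, object, withObject, (.:), (.=))
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

-- The minimum the config file must hold: S3 credentials plus a bucket.
data Config = Config
  { s3AccessKey :: Text
  , s3Secret    :: Text
  , s3Bucket    :: Text
  } deriving (Eq, Show)

-- Hypothetical key names; any stable naming scheme would do.
instance ToJSON Config where
  toJSON c = object
    [ "s3_access_key" .= s3AccessKey c
    , "s3_secret"     .= s3Secret c
    , "s3_bucket"     .= s3Bucket c
    ]

instance FromJSON Config where
  parseJSON = withObject "Config" $ \o ->
    Config <$> o .: "s3_access_key" <*> o .: "s3_secret" <*> o .: "s3_bucket"

main :: IO ()
main = do
  -- Placeholder credentials, not real ones.
  let cfg   = Config "EXAMPLE-ACCESS-KEY" "EXAMPLE-SECRET" "codex"
      bytes = encode cfg
  BL.putStrLn bytes
  -- Round-trip to show that the written file parses back to the same value.
  print (eitherDecode bytes :: Either String Config)
```

Because `eitherDecode` reports a missing key as an error message, a hand-edited config that drops a field fails loudly at startup rather than later.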

So I'm in a hurry now and want to get all these thoughts down, so I'm going to just describe the flow of the steps I envision "config" doing. I originally thought the command line might be suitable, but now that I see how many steps there are, I'm thinking something more like the "dialog" program, which is that great set of tools that Debian and the Linux kernel makefiles both use for graphical terminal-based configuration.

The reason I think something interactive of this nature is necessary is that I was trying to document these steps and they're fairly error-prone. Plus, Amazon's console is subject to change; its API is not.

"This command will assist you in configuring a new mirror of Codex, a Haskell software-distribution system. You probably don't need to run your own mirror, unless you have code which you wish to publish internally but not to the world at large. I'll assume since you haven't ^Ced out of the program that you wish to continue..."

"First, do you wish to set up the first server in a federation of servers, or a mirror of an existing federation?"

(User chooses first in a federation.)

"Okay. You will need to have an existing Amazon Web Services account. This tool can create the resources it needs therein, which consist of an S3 bucket, an IAM group, and an IAM user with an access key. The tool can also utilize existing resources, if you wish to create them manually. If you wish to go with the automated solution, you will need to supply an access key and secret which will not be stored, only used to create the credentials which will actually be used. Which would you like to do - automated, or manual?"

(User chooses automated.)

"I'm pleased to hear that." [Software should be polite! :D] "What is your access key?" (User does so.) "And your secret?" (User does.) "Checking - okay, these are valid. If there is an existing IAM group you wish to use for the machines in this federation, please select it now; otherwise, just choose "create" to create a new one. The following are the IAM groups extant: ..."

(User chooses "create".)

"Okay. Do you have a preference for the name of this group? If so, specify it now. If not, I will use "codex"."

(User chooses the default.)

"Good. I will use the group "codex" as the IAM group to create my user in. Or have you already created the user? There are no IAM users in the codex group, and fifteen users overall, as follows: ..."

(User chooses "create".)

"I notice that this computer's hostname is "silly-cat-joke". Would you like the user to be named that as well, or do you have a preference, or should it be set to something arbitrary?"

(User chooses "silly-cat-joke".)

"Good. Next, would you like to create an S3 bucket, or use an existing one? There are 3 extant buckets, as follows: ..."

(User chooses "create".)

"What should it be called? If you have no preference, I will use "codex"."

(User chooses the default.)

"All right. The bucket has been created." [Conveniently, we don't need to create directory structure; it doesn't really exist.] "I have also granted the "codex" group the appropriate permissions on it."

"We need to know where to keep our local database. The default is /var/lib/codex/database."

(User chooses the default.)

"Okay. The new config file is written to config.json in the current working directory; move it to wherever your init.d script will be able to find it. Note that this file contains precious information, so don't casually delete it to start over; doing so will leave inaccessible resources that require cleanup work by the federation administrator."
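Most of the questions in the dialogue above follow one pattern: ask, offer a default, and accept an empty answer as "use the default". A minimal line-based sketch of that pattern, using only `base`, is shown below. It is a plain-terminal stand-in rather than the "dialog"-style interface the issue has in mind, the function names are hypothetical, and it does not talk to AWS at all; the default values are the ones quoted in the dialogue.

```haskell
module Main (main) where

import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import System.IO (hFlush, stdout)

-- Pick the user's answer, falling back to the default when the
-- input is empty or only whitespace.
answerOrDefault :: String -> String -> String
answerOrDefault def input =
  case trim input of
    ""     -> def
    answer -> answer
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- Print the question with its default in brackets, then read one line.
promptWithDefault :: String -> String -> IO String
promptWithDefault question def = do
  putStr (question ++ " [" ++ def ++ "] ")
  hFlush stdout
  answerOrDefault def <$> getLine

main :: IO ()
main = do
  group  <- promptWithDefault "Name of the IAM group?" "codex"
  bucket <- promptWithDefault "Name of the S3 bucket?" "codex"
  dbPath <- promptWithDefault "Where should the local database live?" "/var/lib/codex/database"
  putStrLn ("group=" ++ group ++ " bucket=" ++ bucket ++ " database=" ++ dbPath)
```

Separating the pure `answerOrDefault` from the prompting keeps the decision logic testable even if the front end later changes to a full-screen interface.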

It would be nice to have an issue here for each major design decision that was made on the hackage2 mailing list...

Is building a hoogle database in-scope?

Is building a hoogle database in-scope? That would be rather nice, but it's not clear to me how much work it would be or whether it is deployable. If, for example, it were based on acid-state, that would consume far more server resources than we (or anybody) could afford.

Is building documentation in-scope?

Is building documentation in-scope? I think that the documentation repository is the main nice thing about Hackage 1, as it stands, so we really want this to be in scope.

Is building packages in-scope?

Is building packages for testing and documentation purposes in-scope? We should consider this carefully. It may be an "in a few weeks" thing rather than a "right this moment" thing.

What is our backup strategy?

I am the author of direct-sqlite, which is the layer underneath sqlite-simple. I think it would be very little work to add support for SQLite3's online-backup API to both these projects. Then, as long as we store everything in the database (except for configuration, which properly belongs outside it for ops reasons), we can easily create self-contained backups of everything.
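For comparison, a self-contained backup can already be taken without new bindings by shelling out to the `sqlite3` command-line tool, whose `.backup` dot-command copies a live database through the same online-backup mechanism. The sketch below is a stopgap under stated assumptions, not the binding proposed above: it assumes the `sqlite3` executable is on the PATH and that the destination path contains no single quotes, and the function names are hypothetical.

```haskell
module Main (main) where

import System.Environment (getArgs)
import System.Process (callProcess)

-- Arguments for the sqlite3 shell: the database to open, followed by a
-- ".backup" dot-command naming the destination file. The destination is
-- wrapped in single quotes so paths with spaces survive; a path that
-- itself contains a single quote is not handled by this sketch.
backupArgs :: FilePath -> FilePath -> [String]
backupArgs dbPath dest = [dbPath, ".backup '" ++ dest ++ "'"]

-- Run the backup; callProcess throws if sqlite3 exits with a failure code.
backupDatabase :: FilePath -> FilePath -> IO ()
backupDatabase dbPath dest = callProcess "sqlite3" (backupArgs dbPath dest)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [dbPath, dest] -> backupDatabase dbPath dest
    _              -> putStrLn "usage: backup <database> <destination>"
```

An in-process binding would avoid the external dependency and allow incremental progress reporting, which is the advantage of adding the API to direct-sqlite and sqlite-simple directly.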
