
Stile: the Systematics Tests In LEnsing pipeline


Installation instructions

Stile is a pure Python package; no compilation is needed. You can install it using:

pip install stile

Dependencies

To run Stile, you must have:

  • Python 2.7
  • NumPy

We also recommend:

  • TreeCorr, Mike Jarvis's 2-point correlation function code. All of our correlation function tests involve calls to this package.
  • PyFITS/Astropy to handle FITS tables and images. Stile can run on ASCII tables, but doing so is much slower.
  • matplotlib to generate plots.

Note that these dependencies will be automatically installed by pip. More dependencies may be added in the future.
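
If you prefer to install the recommended packages explicitly, pip can handle those as well (package names as published on PyPI):

pip install treecorr astropy matplotlib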


Documentation

The documentation is available online at http://stile.readthedocs.io/. You can also build it using Sphinx in the doc/ directory.


Current functionality

Right now, Stile can:

  • Generate an average shear signal around a set of points (if you've installed corr2), given two catalogs that you specify in a particular way, and plot the results (if you have matplotlib).
  • Perform a number of basic statistics measurements and print the results.
  • Make whisker plots.
  • Make scatter plots with trendlines.
  • Generate histograms.
  • Perform any of the above tests for data with different binning schemes applied, using a simple method of specifying the bins.
  • Interface with sufficiently recent versions of the HSC pipeline.
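
As a flavor of the interface, here is a minimal, hypothetical sketch of running one of the basic statistics tests. The class name StatSysTest appears elsewhere in this document, but the constructor and call signature shown here are assumptions, not the documented API:

# Hypothetical sketch only; the real Stile API may differ.
import numpy
import stile

# A formatted (structured) NumPy array with a 'g1' field.
data = numpy.array([(0.01,), (-0.02,), (0.005,)], dtype=[('g1', float)])

stat_test = stile.StatSysTest(field='g1')  # assumed constructor
results = stat_test(data)                  # assumed to return summary statistics
print(results)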

Wishlist

Over the upcoming months, we plan to add:

  • A more flexible way of specifying your data set than hard-coding the file names in your script.
  • Automatic drivers to run as many tests as your data set allows.
  • A larger suite of example code.
  • Tests on images, and the utilities to make those tests easier.

stile's People

Contributors

dr-guangtou, eiffl, hironaomiyatake, msimet, rmandelb, rmjarvis, talljimbo


stile's Issues

Driver functions

We should expand the simple driver prototype used in examples/example_run.py into something that will run Stile flexibly via command-line or config interfaces.

I have some ideas for this already, so I'll plan on doing it after finishing the HSC data handler.

Failing gracefully

@HironaoMiyatake pointed out that if one of our SysTests fails, the whole program fails out; it would be nice to change this so it completes all the tests it can with the available data. (Certainly for the main drivers--not sure how this would work for user scripts.)
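
A minimal sketch of what the graceful version of a driver loop might look like (the names here are placeholders, not real driver code):

# Placeholder sketch: run every test, collect failures instead of aborting.
results, failures = [], []
for sys_test in sys_tests:      # sys_tests: a list of SysTest-like callables
    try:
        results.append(sys_test(data))
    except Exception as err:    # one failing test shouldn't kill the run
        failures.append((sys_test, err))
if failures:
    print("%d test(s) failed; all other tests were completed." % len(failures))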

Default parameters for correlation functions?

At the moment, when you run a correlation function test, you must specify the units of the ra and dec fields, plus the number of bins, min/max separation, and units for the output correlation function.

I'm wondering if it's worth trying to work out some simple defaults for these values. Getting a max ra range and dec range (or x and y range) could give us a max separation, we could work out a decent guess of a min separation based on the number of objects in the given rectangle, set a default number of bins (maybe based on the number of points available), and as long as the units are all the same it doesn't really matter what we call them. It likely wouldn't be perfect, especially for non-rectangular areas, but might be easier than figuring out these values for every correlation function you'd like to run. (They could all still be specified, of course--these would just be used for unspecified quantities.)
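
A rough sketch of how such defaults might be derived from the catalog itself; the heuristics and names are purely illustrative:

import numpy

def guess_corr_defaults(ra, dec, nbins=20):
    """Illustrative guesses for correlation-function binning from positions.

    Assumes ra/dec (or x/y) share the same units; returns (min_sep,
    max_sep, nbins) in those units.
    """
    # Maximum separation: the diagonal of the bounding rectangle.
    max_sep = numpy.hypot(ra.max() - ra.min(), dec.max() - dec.min())
    # Minimum separation: roughly the mean inter-object spacing.
    area = (ra.max() - ra.min()) * (dec.max() - dec.min())
    min_sep = numpy.sqrt(area / len(ra))
    return min_sep, max_sep, nbins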

The counterargument, which I think is a big one, is that this is opaque to the end user and it might not be obvious if we picked something really wrong from the endpoints. Plus, the max separation for a xi-type correlation function is probably bigger than the desired max separation for e.g. tangential shear around bright stars.

Thoughts?

ability to plot requirements curves

For some of the statistics we will calculate, it would be useful to be able to plot a tabulated requirements curve on top of the plot.

This could be kind of annoying to figure out, but Melanie has some idea how to do it.
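
A minimal matplotlib sketch of overplotting a tabulated requirements curve; the file name, columns, and measured values are hypothetical:

import numpy
import matplotlib.pyplot as plt

# Placeholder measurement; in practice this comes from a correlation test.
sep = numpy.logspace(0, 2, 10)
xi = 1e-4 / sep

# Hypothetical two-column table: separation, requirement level.
req_sep, req_val = numpy.loadtxt('requirements.dat', unpack=True)

fig, ax = plt.subplots()
ax.loglog(sep, xi, 'o', label='measured')
ax.loglog(req_sep, req_val, 'k--', label='requirement')
ax.set_xlabel('separation')
ax.set_ylabel('xi')
ax.legend()
fig.savefig('xi_with_requirement.png')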

PSF-residual correlations systematics tests

Write some code to generate PSF residuals and add some correlation function tests that use them (called the Rowe statistics by DES people). This requires a little care since we may have files already written to disk.

priorities for January

This is an issue for collecting priorities for Stile work in January.

@msimet, you mentioned some priorities to me verbally. @HironaoMiyatake, do you have any thoughts? Are there any important tasks that don't yet have a person to work on them?

Add parameters to config in HSC module

Related to a bug @HironaoMiyatake found with our PSF flags w/r/t some changes in the HSC pipeline, I thought I'd open this issue for moving some more parameters to config objects for the HSC/LSST module. The particular problem was caused by some hard-coded PSF flags:

extra_mask = numpy.array([src.get('flux.psf.flags') == 0 for src in data])

which also checked another flag originally; the second flag was removed with some pipeline updates, so running this code failed on the new rerun.

This one's easy enough to move to the config object, but as long as we're doing it, I thought we should check if there's anything else hard-coded that folks would like to have configurable instead.
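
For illustration, the hard-coded mask above might become something like this; the config attribute name is hypothetical:

# Sketch: read the flag names from the config instead of hard-coding them.
# 'psf_flags' is a hypothetical config attribute, e.g. ['flux.psf.flags'].
flags = config.psf_flags
extra_mask = numpy.array([all(src.get(flag) == 0 for flag in flags)
                          for src in data])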

This isn't high-priority, so feel free to add things as they occur to you...

Switch to `shape.hsm` for galaxies in HSC module

We should switch to using shape.hsm, the PSF-corrected shape measurements, for galaxies in the HSC module.

This is a little tricky, because right now we assume that anything that wants, for example, a g1 can use the same routine, but we don't want to use shape.hsm for stars. However, it should be doable.

Table writing doesn't write the proper columns for heterogeneous data sets

This is a bug I just discovered in the way MakeCorr2FileKwargs works. At the moment, if it has to write some files to disk, it only writes the fields that are in all the data sets it's writing--but that's a silly thing to do when performing, e.g., a galaxy-galaxy lensing computation, where we only have positions and not shears for one of the two data sets. So right now any point-shear correlation functions fail, because it doesn't write out the shears for the second data set. Whoops!

I'll fix this on this branch.

directory cleanup

On one of the pull requests (#10) the decision was made to change the test/ directory to examples/, to avoid confusion with the unit test directory tests/. But the test/ directory hasn't gone away; it's still under version control. I'd like to (1) move its .gitignore into examples/ and (2) delete test/. Any objections? Melanie, are you okay with me doing this on master, since it's such a basic cleanup, or do you want it on a branch with a PR?

separate test classes vs. a single unified class with keywords

As discussed on pull requests #29 and #30, there is a stylistic issue that we should decide on sooner rather than later:

When making test classes for the kinds of tests that might be done with a bunch of different quantities, do we
(a) Make separate tests for each of the different quantities (as for the correlation function tests, whisker plots, and scatter plots), or
(b) Make a single test that has keyword arguments to decide which quantity is being tested.

My gut reaction to these PRs was to prefer something like (b). I don't like having 10 (or whatever) different class names to juggle. A single class seems cleaner to me, and would enable code like:

for plot_type in ['value', 'residual']:
    for quantity in ['g1', 'g2', 'sigma']:
        WhateverWeCallThisThing(quantity=quantity, plot_type=plot_type, ...)

instead of making 6 separate calls to classes with different names.

However:

  • I know this is a matter of taste, so others may not agree.
  • There is a matter of "where is it important for us to put in effort right now?", and I suspect most of you will say, with good justification, that it's more important to continue enabling new functionality rather than refactoring existing functionality. (This could change how we put in new stuff, though, and we could plan to refactor old stuff later if we want.)
  • I'm not sure if there is any aspect of my suggestion that would not play nicely with config scripts, or with the HSC pipeline connection.
  • We could imagine enabling both modes of calling the tests, i.e., have both a general class and the more specific ones, where the general one would parse the keywords to call the appropriate specific one. This would violate the whole "only one way to do things" plan.

Thoughts? @TallJimbo, I'm not sure if you saw the discussion on the PR, so I wanted to bring this to your attention too...

Scatter plots

We need to add functionality for making scatter plots. I plan to add a class for general scatter plots, and then exercise it by making star vs. PSF and residual (star - PSF) vs. PSF plots.
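
A minimal matplotlib sketch of the kind of plot described, a scatter plot with a simple linear trendline; the data here are placeholders:

import numpy
import matplotlib.pyplot as plt

# Placeholder data: star - PSF size residual vs. PSF size.
psf_size = numpy.random.uniform(0.5, 1.0, 500)
residual = 0.02 * (psf_size - 0.75) + numpy.random.normal(0, 0.01, 500)

slope, intercept = numpy.polyfit(psf_size, residual, 1)  # linear trendline
fig, ax = plt.subplots()
ax.plot(psf_size, residual, '.', alpha=0.5)
ax.plot(psf_size, slope * psf_size + intercept, 'r-')
ax.set_xlabel('PSF size')
ax.set_ylabel('star - PSF size')
fig.savefig('scatter_with_trendline.png')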

multiple single exposure results for each galaxy/star

@HironaoMiyatake , I'm opening this based on our e-mailed discussion. It was not clear to me if you planned on doing this off on some branch, or if you weren't sure how to do it, but in any case I thought it worth putting the idea on here.

For some systematics tests, we might want to be able to get multiple instances of the same object (from different exposures). For example, to estimate the effective rho_1 after combining exposures, if we don't quite trust the coadd PSF yet then using multiple instances of the same object might be a fair way to do it.

Code review of master + more unit tests

I threw a bunch of stuff up on master so we could branch from it, but I'm sure it has errors in it currently.

This issue is for me to write unit tests for the files that currently do not have them (stile_utils.py and tests.py--data_handler.py doesn't currently have anything non-abstract in it). It's also for other people to point out errors with the code as it stands and for me (or volunteers) to fix them.

Updating HSC flags

We just had a discussion about which flags we should actually be using in the HSC module--this is a reminder to make these updates.

Generate randoms for HSC data sets

Write code to generate randoms corresponding to the HSC data products.

I think the right thing to do here is to:

  • For CCD/patch level tests, generate randoms on a per-CCD basis with the same WCS and number density as the data (times whatever random multiplier we want).
  • For visit-level tests, still generate the randoms on a CCD-level basis with the given WCS, but get the number density from the average number density of the whole visit or tract, plus Poisson noise. This way we don't have to try to reject chip gaps.
  • For tract-level tests...probably still generate on a patch-by-patch basis in case we don't include all the patches within the tract.

The last thing to be concerned about is rejection masks. We don't need to worry about single masked pixels, but we should handle overlap regions, for example. I'd rather not read in the masks if we don't have to, but I'm not sure there's a better way to do it.
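
A toy sketch of the per-CCD idea, in flat pixel coordinates with no WCS or mask handling; everything here is illustrative:

import numpy

def ccd_randoms(n_data, width, height, multiplier=10, mean_density=None):
    """Toy random generator for one CCD footprint (pixel coordinates).

    If mean_density is given (objects per pixel^2, e.g. a visit-level
    average), draw the count from a Poisson distribution instead of
    matching the CCD's own object count.
    """
    if mean_density is None:
        n = n_data * multiplier
    else:
        n = numpy.random.poisson(mean_density * width * height * multiplier)
    x = numpy.random.uniform(0, width, n)
    y = numpy.random.uniform(0, height, n)
    return x, y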

Reduce the number of computeShapes calls

When looking into #39, we found that computeShapes is one of the bottlenecks, and that the number of calls to this function can be reduced. For example, if we run a star whisker test and then a PSF whisker test, computeShapes is called separately for each test. If we instead collect all the shapes needed for the tests in a run and retrieve them with a single call to computeShapes, we can reduce the run time.
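
One simple way to get single-call behavior is to cache the shapes the first time they are computed; a sketch, with placeholder names (computeShapes stands in for the pipeline call):

# Sketch: compute shapes once per catalog and reuse them across SysTests.
_shape_cache = {}

def get_shapes(catalog_id, catalog):
    # catalog_id: anything hashable that uniquely identifies the catalog.
    if catalog_id not in _shape_cache:
        _shape_cache[catalog_id] = computeShapes(catalog)  # the expensive call
    return _shape_cache[catalog_id]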

Terminology agreement

One annoying thing in the code currently is that I didn't keep strict terminology agreement; this issue is to remind me to fix:

  • columns and fields -> fields
  • options, params, args and kwargs -> args and kwargs
  • recarray -> either call them formatted arrays, or actually enforce that they're numpy.recarrays.

Binning in the HSC module

We have some code set up to bin our data in the main Stile code, but don't have any way to utilize that in the HSC module. We should think about how to include it.

File output: simple change

Hironao would like to change the file output so that the filename automatically includes visit/CCD or tract/patch or some other unique-ish identifier, so that plots don't get overwritten when you make a new one for a new CCD. I believe he said this should be a simple change that he can do. Is that right, @HironaoMiyatake ?

Investigate switch to `numpy.rec.fromarrays`

Just noticed in PR #62 that @HironaoMiyatake has a different way of making formatted arrays (well, recarrays) than the way we currently do it: the function numpy.rec.fromarrays. I see here that the major difference between formatted numpy arrays and recarrays is that access in recarrays is a little slower, so if you loop a lot it can be a time sink.

This is a reminder to myself to investigate (once the higher-priority items are done) exactly how much it slows us down to switch to recarrays, because I'd rather use numpy code if we can, but since it would be a change in how our data is stored, it could potentially slow us down quite a bit.
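
For reference, the two constructions being compared; this is standard NumPy, nothing Stile-specific:

import numpy

x = numpy.arange(5.0)
y = x ** 2

# A recarray via numpy.rec.fromarrays: fields are also attributes.
rec = numpy.rec.fromarrays([x, y], names=['x', 'y'])
print(rec.x)     # attribute access (the slightly slower machinery)

# A plain structured ("formatted") array: fields are keys only.
arr = numpy.zeros(5, dtype=[('x', float), ('y', float)])
arr['x'], arr['y'] = x, y
print(arr['x'])  # item access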

Matplotlib settings

Do we want to alter the way Matplotlib plots look by default?

There's a set of suggestions here, some of which I use. The ones I'm most likely to use in my own work are A) always using Computer Modern Roman (the LaTeX font) for labels and captions, even when not in TeX mode, B) tight_layout() (which is already used in at least some of our plotting routines), and C) setting the plot aspect ratio to be the golden ratio (which admittedly is a bit silly).
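
For concreteness, the settings under discussion would look something like this via rcParams; the specific values are illustrative:

import matplotlib
import matplotlib.pyplot as plt

# (A) Computer Modern-style serif fonts without full TeX mode.
matplotlib.rcParams['font.family'] = 'serif'
matplotlib.rcParams['mathtext.fontset'] = 'cm'
# (C) Golden-ratio figure shape.
golden = (1 + 5 ** 0.5) / 2
matplotlib.rcParams['figure.figsize'] = (8.0, 8.0 / golden)

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
fig.tight_layout()  # (B) already used in some of our plotting routines
fig.savefig('style_demo.png')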

I can see this being annoying for those who have their own favored Matplotlib settings, however, in which case we'll be overriding them.

Unit tests for graphical SysTests

Hi folks,

I just realized that we don't have any unit tests for the ScatterPlot or WhiskerPlot sys tests. We clearly should have some, but the question is, since the main outputs are graphics...how do we test them?

We discussed this a bit back in the early planning stages of Stile, and I think the best idea I heard was to have a 'canonical' image stored in the repository, and then generate a 'test' image that reproduces that saved image in one half and plots the to-be-tested graphic in the other half. Users would then have to look at the generated images and verify that they were qualitatively the same. I can look into code that does this, unless somebody has another suggestion? (Another idea might be to dig into exactly what data is stored by matplotlib and how, but that may be nitty gritty stuff that wouldn't be too enlightening--hard to know unless we try.)
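
A rough sketch of the side-by-side idea; the file names are hypothetical:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Left: the stored 'canonical' image; right: the freshly generated one.
canonical = mpimg.imread('canonical_whisker.png')
new = mpimg.imread('test_whisker.png')

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(canonical); ax1.set_title('canonical'); ax1.axis('off')
ax2.imshow(new); ax2.set_title('to be checked'); ax2.axis('off')
fig.savefig('comparison.png')  # a human then checks these match qualitatively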

There are probably also numeric routines in those specific SysTests that should be unit tested as well, but I admit I haven't looked too closely for what they should be.

Speeding up HSC interface

We have to speed up the HSC interface, since it takes ~1 hour to process the entire FOV!
@msimet summarized what we could do as follows, roughly in order of priority.

  1. Make sure we can use C++ loops on the catalogs, not Python. (snip) My guess is that this will solve at least half of our problems.
  2. Figure out the catalog schema ahead of time and get Key objects, rather than using a string every time we need to access a new column. Jim said he could show us how to do this, but it might involve a big code refactor. (snip)
  3. I don't think the masking to NaNs is a big time sink, but if it is, we could try just computing every quantity for every entry in the source catalog. I doubt that would be faster, but it's possible.

I'll start looking into 1.

Terminology for systematics tests

@TallJimbo brought up the fact that using "tests" to refer to our systematics tests was confusing when we also have "unit tests" and other software quality controls under the same name. This issue is therefore: brainstorm new names!

My first suggestion: "Diagnostic."

Whisker plots

We need some generic code to generate whisker plots. (We may want to hold off on this until we make a decision on #3, since this--more than other systematics tests--is a question of presentation.)
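
The natural matplotlib building block here is quiver with the arrowheads suppressed; a minimal sketch with placeholder data:

import numpy
import matplotlib.pyplot as plt

# Placeholder positions and ellipticities.
x, y = numpy.random.uniform(0, 2048, (2, 300))
g1, g2 = numpy.random.normal(0, 0.05, (2, 300))

# Whisker length = |g|; angle = half the shear position angle.
g = numpy.hypot(g1, g2)
theta = 0.5 * numpy.arctan2(g2, g1)
fig, ax = plt.subplots()
ax.quiver(x, y, g * numpy.cos(theta), g * numpy.sin(theta),
          headwidth=0, headlength=0, headaxislength=0, pivot='middle')
ax.set_aspect('equal')
fig.savefig('whisker.png')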

DataHandler for HSC

One of the big issues: a data handler that interfaces with the HSC/LSST pipeline.

I'm happy to volunteer for this one.

Presentation of results

We need to think about how results will be shown once the tests are run. I think there was consensus that we should output a data file containing the data shown in the plot for people who wish to use other plotting programs, but it would be nice to generate plots automatically (or on demand) as well.

Ideas so far:

  • Matplotlib
    • Pro: familiar to most scientific users
    • Con: Slow
  • displayQA from the LSST/HSC code
    • Pro: Interactive; plots on demand so it wastes less computer time making unimportant plots
    • Con: Requires ability to run web server (which at least @rmandelb and I won't have on the computer where our data lives)

Or, obviously, both of those. But in either case we'll need to write code that interfaces with those packages.

detect.is-primary for HSC patch-level tests?

First of all, a note that I just pushed commit c2131a2, which is a bugfix for the case where we're trying to mask on detect.is-primary. (The previous implementation didn't work if the catalog had already been masked by other flags.)

Second: right now we mask on detect.is-primary for both the patch- and tract-level tests. My question is whether we want to do that for the patch-level tests. At the tract level we need it to avoid duplication; at the patch level, do we want to check the performance of objects that were analyzed in the patch but won't be used in our final analysis?

I/O timing issues

I was just noticing, as I looked at the TreeCorr README file, that it uses a package called pandas instead of numpy for fast text-file I/O:

This package significantly speeds up the reading of ASCII input catalogs over the numpy functions loadtxt or genfromtxt.

I see that we use genfromtxt. Do you have any sense for how much the file I/O is hitting us for typical uses of Stile? If it's a problem we might consider adding this dependency.
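
For a sense of the two code paths (both are real APIs; the speed claim is TreeCorr's, not something measured here):

import numpy
import pandas

# Current path: numpy's reader.
data_np = numpy.genfromtxt('catalog.dat', names=True)

# Possible alternative: pandas, then back to a formatted numpy array.
df = pandas.read_csv('catalog.dat', delim_whitespace=True)
data_pd = df.to_records(index=False)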

Histogram implementation

Some systematics tests involve histograms; this branch is to write the basic code that all specific histogram objects will inherit from.

We can probably use numpy.histogram to do this; I think the more important question is how we set defaults like number of bins to be used. And it would be nice to have a generic plotting instance too.
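
numpy.histogram plus a simple matplotlib step plot covers the basic case; a sketch in which the default choices are placeholders:

import numpy
import matplotlib.pyplot as plt

values = numpy.random.normal(0, 0.05, 1000)        # placeholder data, e.g. g1
counts, edges = numpy.histogram(values, bins=50)   # default nbins is the open question

fig, ax = plt.subplots()
ax.step(edges[:-1], counts, where='post')
ax.set_xlabel('g1')
ax.set_ylabel('N')
fig.savefig('histogram.png')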

Move from corr2 to corr2 v3.0 aka TreeCorr

As many of you know, we rely right now on the program corr2 written by Mike Jarvis (@rmjarvis) to do all of our correlation functions. corr2 is a compiled program, and we have been writing catalogs to disk and calling the program via the subprocess module in order to use it.

Today, a new version (v3.0) was released, including a Python interface called treecorr. It's a GitHub repository: https://github.com/rmjarvis/TreeCorr. I propose that we move to this version of the code rather than the old compiled v2.x. It has a number of benefits:

  • It will use the catalogs we already have in memory (see the sketch after this list).
    • No need to write catalogs to disk and then read them back in.
    • No further need for a large chunk of code I was never very comfortable with (the function that coerces all the catalogs to the same format and writes them out). As a side benefit, not needing this function renders bug #23 obsolete, and also makes issue #28 easier, among others.
    • No need to make users who write their own data interfaces figure out when it's easier to send a filename and when it's easier to send an array. We'll just always use arrays.
  • Native Python wrappers mean we can remove some of the code in corr2_utils.py and rely on the package's own built-in functions, so we won't be as vulnerable to minor changes in the package.
  • Installation is easier than corr2 installation (which was already very simple). You can pip install TreeCorr or clone/download the repo and use the provided installation script. CFITSIO is no longer required.
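
For illustration, a point-shear (NG) correlation through the in-memory Python interface looks roughly like this; the catalog contents are placeholders, and the calls follow TreeCorr's documented v3 interface:

import numpy
import treecorr

# Placeholder catalogs; in Stile these would come straight from memory.
lens_ra, lens_dec = numpy.random.uniform(0, 1, (2, 100))
src_ra, src_dec = numpy.random.uniform(0, 1, (2, 1000))
src_g1, src_g2 = numpy.random.normal(0, 0.2, (2, 1000))

lens_cat = treecorr.Catalog(ra=lens_ra, dec=lens_dec,
                            ra_units='deg', dec_units='deg')
src_cat = treecorr.Catalog(ra=src_ra, dec=src_dec, g1=src_g1, g2=src_g2,
                           ra_units='deg', dec_units='deg')

ng = treecorr.NGCorrelation(min_sep=0.5, max_sep=20., nbins=8,
                            sep_units='arcmin')
ng.process(lens_cat, src_cat)  # tangential shear around the lens points
print(ng.xi)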

Mike was kind enough to let me investigate this version before it was released, so shortly I'll be pushing a branch that has some working code on it, although it's not flexible enough yet (has some hardcoded things just so I could check the script functionality).

I suppose I should back up: does anyone have any objections to this change? If so, please do let me know!

StatSysTests in the HSC module

I have a better idea for how to run the StatSysTests through the HSC pipeline, and this is a reminder to myself to do it!

File output: into directory structure

Hironao had an idea for where to put output files that will be harder to implement and certainly isn't critical right now, but could be interesting for the future.

His idea is to make an option to directly output the file into the data directory structure, e.g., if we're using an HSC catalog from rerun/miyatake/test/0001228/49/src.fits (where the numbers give visit and CCD number) then the plots related to that visit and CCD could go directly into the directory with the catalog.

Overall design plan

This thread is a place to debate some overall design decisions for Stile (or whatever we decide to call it, if we want to change the name).

As I've been thinking about this, I've been aiming to maximize the possible user base, which drives a couple of design decisions: (1) the interface with the data should be as separate as possible from the test functions, and (2) we should include enough code that you can run all of the tests without knowing Python, via config files/command-line options. (2) is low on the list of priorities at the moment, but I thought it would be helpful to keep in mind as we work out the structure.

Brief overview of the architecture, as I've conceived of it so far:

  • Tests. We can slice and dice the tests in several ways, but I think the best is by what they're doing internally. We have correlation-function type tests (e.g. shear around random points, autocorrelation of star shapes); statistical quantity tests for both scalar quantities (mean/median, variance) and combinations of scalar quantities (Pearson coefficients); histograms; whisker plots; etc. So I think it makes sense to have a generic Test object, then more specific CorrelationFunctionTest, WhiskerTest, etc. objects with some of the helper functions & output functions for that kind of test, and then specific implementations for the various tests we decide to include. (I have a partially-written CorrelationFunctionTest, for example, and then a specific correlation test calls the corr2 wrapper from the parent class with the type of correlation function & the specific quantities we need.) The tests should know what kind of data they need to run. [There are at least three flavors: a raw data type (catalog, image); a time domain (single image, multiepoch...]
  • Data handling. We need a data handler class which can deliver the base unit of data that a lensing test needs, given the constraints of imposed binning functions. Things it should be able to do include: return the types of data it contains; list all the data sets that meet given parameters; read in those data sets, bin them given (a) binning function(s), and turn them into data types the rest of the program understands; give the location of the file on disk (if it exists as such and if no bins are defined) so that external programs like corr2 can ingest them without us having to read & rewrite an identical file. It also needs to be able to serve multiple data sets which should be analyzed together (a lens sample and a source sample; for multiepoch tests, all the individual exposures that went into one coadd; etc). I think the output files should also be handled by a data handler (with default implementation available) so users who want to insert the results in a framework they already have can do that at the same time. In the short term, we're essentially writing a less-flexible wrapper for the HSC butler.
  • Binning. I think we should include a binning structure within the code natively, as systematics can show up in some regimes of data and not others, and I don't think we want to define whole new data handler classes just to limit which data is actually produced. This can also simplify some of our own coding--e.g., you might want to do star/galaxy separation tests as a function of Galactic latitude, and it would be nice to just reuse the same code and add an object that will do latitude bins (see the sketch after this list).
  • Corr2. A lot of these tests involve correlation functions, and Mike Jarvis has a nice, fast, straightforward-for-the-user correlation function code, corr2 (http://code.google.com/p/mjarvis/), so we should make use of that. A lot of the coding I've already done involves the corr2 interface, in fact, though it's not very well tested right now.
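
To make the binning idea concrete, here is a toy version of a bin object; the real interface is still to be designed, and the names here are invented:

import numpy

class BinList(object):
    """Toy bin object: split a formatted array on one field's bin edges."""
    def __init__(self, field, edges):
        self.field, self.edges = field, edges
    def __call__(self, data):
        idx = numpy.digitize(data[self.field], self.edges)
        # Yield one sub-catalog per bin between consecutive edges.
        for i in range(1, len(self.edges)):
            yield data[idx == i]

# e.g., three bins of Galactic latitude 'b' for a star/galaxy separation test:
# for subset in BinList('b', numpy.array([-90., -30., 30., 90.]))(data): ...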

Things I have put less thought into:

Exactly what kind of outputs we want to generate for each of the tests, and how we go about doing it. The displayQA framework produces some really nice outputs, but it requires running a web server; I know that the machine where I plan to run the HSC pipeline won't let me do that due to security concerns. I also worry a little about defining a hard pass/fail metric for a lot of these tests. On the other hand, as I said, very nice outputs, and less code to write...

Code that exists:

I just put up some simple file I/O (copied from another project) and the corr2 interface, which is mostly untested but has a lot of what we need I think. Over the next week or so I'll clean up and post:

  • a file tests.py including some abstract interfaces & documentation for the systematics tests, plus a mostly-complete implementation of the CorrelationFunctionTest template & a sketchy implementation of a test that uses it (signals around objects).
  • some pieces for the binning structure--I already have similar code for this, so I think I can make it line up with what we need on a short time scale.
  • some first ideas for a DataHandler class (documentation of what it should do, without implementation)

Next comment: some notes on coding conventions.

Residual size vs magnitude plots

It would be useful to add functionality to generate a scatter plot of residual (star - PSF) size vs. magnitude to investigate the brighter-fatter effect. I'll write code based on #43, since I am now working closely with the latest reduction.

HSC pipeline single-visit runs with multiple associated coadds

@HironaoMiyatake noticed yesterday that if we run StileVisit.py on a single visit which has multiple associated coadds, we end up running the data multiple times, as if the same CCD with a different coadd calibration was a separate and independent CCD. (As a reminder, when possible we use the broader-level coadd calibration instead of the single-pass calibration, which is why we're touching the coadds at all here.)

This is obviously not desired behavior; the question is what to do about it?

Options:

  • Allow people to select a specific coadd they'd like to use; if they don't, have some algorithm to pick which coadd will be used.
  • Separate by coadd as well as by visit, make the filenames represent this, and run the data once per associated coadd.
  • Something else?

Happy to take thoughts even from people who haven't worked with this aspect of the code: it's sort of a philosophical choice (do we want to test all the possible calibrations? If we don't, is it misleading to merely select one?).

Documentation

We agreed in issue #1 to use sphinx for documentation; this issue is to remind us to set that up (at some point in the future when we have more functionality, probably).

Outputs for displayQA

As we discussed before, we decided to use displayQA to show a bunch of systematics plots. I am thinking about what is needed for this; here is the list. For now I restrict myself to tests on each field (not on the coadd).

  1. A visit-level summary plot (the large figure on the left of the displayQA webpage): this can be obtained by running StileVisit.py.
  2. CCD-level plots: these can be obtained by running StileCCD.py. However, we would have to run Stile for each CCD, which duplicates the time spent reading data given that we also run StileVisit.py. This could be avoided by adding an option to StileVisit.py to run the CCD-level tests as well.
  3. Statistics for each CCD: we want the mean/median and scatter for the table under the summary plot and for the bad-CCD indicator on the right of the displayQA webpage. StatSysTest could do this, but we would have to add functionality to write the results to file.

I think 3 is urgent for the displayQA work. For 2, it is less obvious: adding this feature to StileVisit.py would duplicate what StileCCD.py already does. Anyway, please let me know what you think.

Other shape measurements in HSC module

At the moment, the HSC pipeline only has shape.sdss and shape.hsm. This is a placeholder issue to remind us to make the shape implementation a bit more abstract so we can use other shape measurements when they exist; at the moment it is low-priority, since other shape measurements aren't yet implemented...

Installation script

Since we're trying to recruit more people into using Stile, it would be nice to have an installation script.

I'm going to do this today.

Copyright/license notices

We should probably put copyright/license notices in the code files. Anyone know of a good template?

do we have what we need for this calculation?

Hi @HironaoMiyatake and @msimet -

While I've been busy with teaching I've lost track of what functionality is / is not present.

For the systematics requirements calculation, I would like to calculate the autocorrelation function of star shapes as a function of separation.

I would like this for single exposures (NOT coadd), using the entire FOV. Ideally, it would be nice to see this for a few different pointings, which could give us a bunch of measurements (perhaps a few exposures per pointing?).

Do we have what we need for this? I believe we had defined a fiducial set of exposures for PSF tests, and I would like to use that set of exposures even though this isn't a PSF test (since it's just the star shapes, not the model residuals).
