msimet / Stile
Stile: the Systematics Tests In Lensing pipeline
License: BSD 3-Clause "New" or "Revised" License
This is an issue for collecting priorities for Stile work in January.
@msimet , you mentioned some priorities to me verbally. @HironaoMiyatake , do you have any thoughts? Are there any important tasks that don't yet have a person to work on them?
I was just noticing, as I looked at the TreeCorr README file, that it uses a package called pandas instead of numpy for fast text file I/O:
This package significantly speeds up the reading of ASCII input catalogs over the numpy functions loadtxt or genfromtxt.
I see that we use genfromtxt. Do you have any sense for how much the file I/O is hitting us for typical uses of Stile? If it's a problem we might consider adding this dependency.
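If we want to measure the difference before committing to the dependency, here is a minimal sketch of what the swap might look like. The two-row catalog is made up for illustration; the relevant pandas feature is read_csv's fast whitespace-delimited parsing:

```python
import io

import numpy
import pandas

# A tiny in-memory stand-in for an ASCII catalog (columns are hypothetical).
text = "1.0 2.0 0.01\n3.0 4.0 -0.02\n"

# Current approach: numpy's text readers.
arr = numpy.loadtxt(io.StringIO(text))

# Possible replacement: pandas' C-based reader, converted back to an ndarray.
df = pandas.read_csv(io.StringIO(text), sep=r'\s+', header=None)
arr2 = df.to_numpy()
```

For real catalogs we'd time both readers on a representative file before deciding; this only shows that the results agree.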
As we have discussed before, we decided to use displayQA for showing a bunch of systematics plots. I am thinking about what is needed for this. Here is the list. Currently I restrict myself to tests for each field (not for coadd).
1. StileVisit.py
2. StileCCD.py. However, we have to run Stile for each CCD, which duplicates the time spent reading data, given that we run StileVisit.py as well. This could be avoided by adding an option to StileVisit.py to run the CCD-level tests too.
3. StatSysTest could do that, but we have to add the functionality to write the results to file.

I think 3 is urgent for the displayQA work. For 2, it is not obvious: this feature in StileVisit.py would duplicate what we already have in StileCCD.py. Anyway, please let me know what you think.
Related to a bug @HironaoMiyatake found with our PSF flags w/r/t some changes in the HSC pipeline, I thought I'd open this issue for moving some more parameters to config objects for the HSC/LSST module. The particular problem was caused by some hard-coded PSF flags:
extra_mask = numpy.array([src.get('flux.psf.flags')==0 for src in data])
which also checked another flag originally; the second flag was removed with some pipeline updates, so running this code failed on the new rerun.
This one's easy enough to move to the config object, but as long as we're doing it, I thought we should check if there's anything else hard-coded that folks would like to have configurable instead.
This isn't high-priority, so feel free to add things as they occur to you...
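As a sketch of what the config-driven version might look like: here the Src class and the 'psf_flags' config key are stand-ins, not the real pipeline source objects or the Stile config API, but the idea is that the flag list lives in configuration rather than in the code:

```python
import numpy

# Hypothetical stand-in for a pipeline source record with a .get() method.
class Src:
    def __init__(self, flags):
        self._flags = flags
    def get(self, key):
        return self._flags.get(key, 0)

# Instead of hard-coding 'flux.psf.flags', read the flag names from config.
config = {'psf_flags': ['flux.psf.flags']}

def make_extra_mask(data, config):
    """True where every configured flag is unset for a source."""
    mask = numpy.ones(len(data), dtype=bool)
    for flag in config['psf_flags']:
        mask &= numpy.array([src.get(flag) == 0 for src in data])
    return mask

data = [Src({'flux.psf.flags': 0}), Src({'flux.psf.flags': 1})]
extra_mask = make_extra_mask(data, config)
```

If a flag is later removed from the pipeline, only the config entry needs to change, which is the failure mode that bit us here.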
We need to think about how results will be shown once the tests are run. I think there was consensus that we should output a data file containing the data shown in the plot for people who wish to use other plotting programs, but it would be nice to generate plots automatically (or on demand) as well.
Ideas so far:
Or, obviously, both of those. But in either case we'll need to write code that interfaces with those packages.
@HironaoMiyatake requested a way to analyze multiple visits or tracts at once (currently we only do one at a time).
Write some code to generate PSF residuals and add some correlation function tests that use them (called the Rowe statistics by DES people). This requires a little care since we may have files already written to disk.
We cannot use HSC data since it is not public. Instead, we want to add HSC pipeline products of some public S-Cam data.
@TallJimbo brought up the fact that using "tests" to refer to our systematics tests was confusing when we also have "unit tests" and other software quality controls under the same name. This issue is therefore: brainstorm new names!
My first suggestion: "Diagnostic."
First of all, a note that I just pushed commit c2131a2, which is a bugfix for the case where we're trying to mask on detect.is-primary. (The previous implementation didn't work if the catalog had already been masked by other flags.)

Second, right now we mask on detect.is-primary for both the patch- and tract-level tests. My question is whether we want to do that for the patch-level test. At the tract level we need it to avoid duplication; at the patch level, do we want to be checking the performance of things that were analyzed in the patch, but won't be used for our final analysis?
Since we're trying to recruit more people into using Stile, it would be nice to have an installation script.
I'm going to do this today.
We agreed in issue #1 to use sphinx for documentation; this issue is to remind us to set that up (at some point in the future when we have more functionality, probably).
Expand the current CCD/visit HSC interface to include the coadd data divisions, tract and patch.
Hi @HironaoMiyatake and @msimet -
While I've been busy with teaching I've lost track of what functionality is / is not present.
For the systematics requirements calculation, I would like to calculate the autocorrelation function of star shapes as a function of separation.
I would like this for single exposures (NOT coadd), using the entire FOV. Ideally, it would be nice to see this for a few different pointings, which could give us a bunch of measurements (perhaps a few exposures per pointing?).
Do we have what we need for this? I believe we had defined a fiducial set of exposures for PSF tests, and I would like to use that set of exposures even though this isn't a PSF test (since it's just the star shapes, not the model residuals).
We just had a discussion about which flags we should actually be using in the HSC module--this is a reminder to make these updates.
One annoying thing in the code currently is that I didn't keep strict terminology agreement; this issue is to remind me to fix:
At the moment, the HSC pipeline only has shape.sdss and shape.hsm. This is a placeholder issue to remind us to make the shape implementation a bit more abstract so we can use other shape measurements when they exist; at the moment it is low-priority, since other shape measurements aren't yet implemented...
Hi folks,
I just realized that we don't have any unit tests for the ScatterPlot or WhiskerPlot sys tests. We clearly should have some, but the question is, since the main outputs are graphics... how do we test them?
We discussed this a bit back in the early planning stages of Stile, and I think the best idea I heard was to have a 'canonical' image stored in the repository, and then generate a 'test' image that reproduces that saved image in one half and plots the to-be-tested graphic in the other half. Users would then have to look at the generated images and verify that they were qualitatively the same. I can look into code that does this, unless somebody has another suggestion? (Another idea might be to dig into exactly what data is stored by matplotlib and how, but that may be nitty gritty stuff that wouldn't be too enlightening--hard to know unless we try.)
There are probably also numeric routines in those specific SysTests that should be unit tested as well, but I admit I haven't looked too closely for what they should be.
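One programmatic alternative to the half-and-half human check would be to render both the canonical and candidate plots to pixel buffers and compare them numerically. A minimal sketch (the plot itself is a dummy; matplotlib also ships matplotlib.testing.compare.compare_images for file-based comparison with a tolerance):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend: render to memory, no display needed
import matplotlib.pyplot as plt
import numpy

def render_plot(x, y):
    """Render a simple scatter plot to an RGBA pixel array."""
    fig, ax = plt.subplots(figsize=(2, 2), dpi=50)
    ax.scatter(x, y)
    fig.canvas.draw()
    img = numpy.asarray(fig.canvas.buffer_rgba()).copy()
    plt.close(fig)
    return img

x = numpy.linspace(0.0, 1.0, 10)
canonical = render_plot(x, x**2)   # stands in for the stored reference image
candidate = render_plot(x, x**2)   # freshly generated output to be checked

identical = numpy.array_equal(canonical, candidate)
```

The caveat is that pixel-exact comparison is fragile across matplotlib versions and fonts, which is an argument for keeping the qualitative side-by-side check as a fallback.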
Hironao would like to change the file output so that the filename automatically includes visit/CCD or tract/patch or some other unique-ish identifier, so that plots don't get overwritten when you make a new one for a new CCD. I believe he said this should be a simple change that he can do. Is that right, @HironaoMiyatake ?
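A sketch of what such a filename scheme could look like; the helper name and the label format are hypothetical, with the argument names mirroring the data divisions we use:

```python
def output_filename(test_name, visit=None, ccd=None, tract=None, patch=None,
                    ext='png'):
    """Build a unique-ish plot filename from whichever identifiers are set."""
    parts = [test_name]
    for label, value in (('visit', visit), ('ccd', ccd),
                         ('tract', tract), ('patch', patch)):
        if value is not None:
            parts.append('%s-%s' % (label, value))
    return '_'.join(parts) + '.' + ext

name = output_filename('whisker', visit=1228, ccd=49)
```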
It would be useful to add functionality to generate a scatter plot that shows residual (star - PSF) size vs. magnitude, to investigate the brighter-fatter effect. I'll write code based on #43, since I am now closely working with the latest reduction.
As discussed on pull requests #29 and #30 , there is a stylistic issue that we should decide on sooner rather than later:
When making test classes for the kinds of tests that might be done with a bunch of different quantities, do we
(a) Make separate tests for each of the different quantities (as for the correlation function tests, whisker plots, and scatter plots), or
(b) Make a single test that has keyword arguments to decide which quantity is being tested.
My gut reaction to these PRs was to prefer something like (b). I don't like having 10 (or whatever) different class names to juggle. A single class seems cleaner to me, and would enable code like:
for plot_type in ['value', 'residual']:
    for quantity in ['g1', 'g2', 'sigma']:
        WhateverWeCallThisThing(quantity=quantity, plot_type=plot_type, ...)
instead of making 6 separate calls to classes with different names.
However:
Thoughts? @TallJimbo , I'm not sure if you saw the discussion on the PR, so I wanted to bring this to your attention too...
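For concreteness, here is a minimal sketch of what option (b) could look like. The class name is deliberately a placeholder, and the validation and short_name details are assumptions, not a proposed API:

```python
class ScatterPlotSysTest:
    """One test class; keyword arguments select the quantity and plot type."""
    VALID_QUANTITIES = ('g1', 'g2', 'sigma')
    VALID_PLOT_TYPES = ('value', 'residual')

    def __init__(self, quantity, plot_type):
        if quantity not in self.VALID_QUANTITIES:
            raise ValueError("unknown quantity: %s" % quantity)
        if plot_type not in self.VALID_PLOT_TYPES:
            raise ValueError("unknown plot_type: %s" % plot_type)
        self.quantity = quantity
        self.plot_type = plot_type

    def short_name(self):
        # e.g. used to build output filenames
        return "scatter_%s_%s" % (self.plot_type, self.quantity)

# The loop from the comment above then builds all six variants in one go.
tests = [ScatterPlotSysTest(quantity=q, plot_type=p)
         for p in ('value', 'residual')
         for q in ('g1', 'g2', 'sigma')]
```

One design note: explicit validation of the keyword values gives back some of the safety that separate class names would provide for free.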
@HironaoMiyatake noticed yesterday that if we run StileVisit.py on a single visit which has multiple associated coadds, we end up running the data multiple times, as if the same CCD with a different coadd calibration were a separate and independent CCD. (As a reminder: when possible, we use the broader-level coadd calibration instead of the single-pass calibration, which is why we're touching the coadds at all here.)
This is obviously not desired behavior; the question is what to do about it?
Options:
Happy to take thoughts even from people who haven't worked with this aspect of the code: it's sort of a philosophical choice (do we want to test all the possible calibrations? If we don't, is it misleading to merely select one?).
On one of the pull requests (#10) the decision was made to change the test/ directory to examples/, to avoid confusion with the unit test directory tests/. But the test/ directory hasn't gone away; it's still under version control. I'd like to (1) move its .gitignore into examples/ and (2) delete test/. Any objections? Melanie, are you okay with me doing this on master since it's such a basic cleanup, or do you want it on a branch with a PR?
Just noticed in PR #62 that @HironaoMiyatake has a different way of making formatted arrays (well, recarrays) than the way we currently do it: the function numpy.rec.fromarrays. I see here that the major difference between formatted numpy arrays and recarrays is that field access in recarrays is a little slower, so if you loop a lot it can be a time sink.
This is a reminder to myself to investigate (once the higher-priority items are done) exactly how much it slows us down to switch to recarrays, because I'd rather use numpy code if we can, but since it would be a change in how our data is stored, it could potentially slow us down quite a bit.
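For reference, a small sketch of the two flavors side by side; the field names are made up, and the actual timing comparison would be a timeit run over our real access patterns:

```python
import numpy

x = numpy.arange(1000, dtype=float)
y = numpy.arange(1000, dtype=float) * 2

# Structured ("formatted") array: fields accessed by string key only.
structured = numpy.zeros(1000, dtype=[('ra', float), ('dec', float)])
structured['ra'] = x
structured['dec'] = y

# recarray, as built by numpy.rec.fromarrays: also allows attribute access
# (rec.ra), which is the convenience that costs a slower lookup per access.
rec = numpy.rec.fromarrays([x, y], names='ra,dec')

same_ra = numpy.array_equal(structured['ra'], rec.ra)
same_dec = numpy.array_equal(structured['dec'], rec['dec'])
```

Note that string-key access (rec['dec']) still works on recarrays, so a switch would not have to change every call site at once.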
At the moment, when you run a correlation function test, you must specify the units of the ra and dec fields, plus the number of bins, the min/max separation, and the units for the output correlation function.
I'm wondering if it's worth trying to work out some simple defaults for these values. Getting a max ra range and dec range (or x and y range) could give us a max separation, we could work out a decent guess of a min separation based on the number of objects in the given rectangle, set a default number of bins (maybe based on the number of points available), and as long as the units are all the same it doesn't really matter what we call them. It likely wouldn't be perfect, especially for non-rectangular areas, but might be easier than figuring out these values for every correlation function you'd like to run. (They could all still be specified, of course--these would just be used for unspecified quantities.)
The counterargument, which I think is a big one, is that this is opaque to the end user and it might not be obvious if we picked something really wrong from the endpoints. Plus, the max separation for a xi-type correlation function is probably bigger than the desired max separation for e.g. tangential shear around bright stars.
Thoughts?
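To make the defaults idea concrete, here's a rough sketch of the kind of guessing described above. The heuristics (bounding-box diagonal for the max separation, mean interparticle spacing for the min) are assumptions for illustration, not a proposal for exact formulas:

```python
import numpy

def default_corr_params(x, y, nbins=20):
    """Guess correlation-function binning defaults from the data footprint."""
    dx = x.max() - x.min()
    dy = y.max() - y.min()
    # Max separation: the diagonal of the bounding box.
    max_sep = numpy.hypot(dx, dy)
    # Min separation: mean spacing of n points over a dx*dy rectangle.
    min_sep = numpy.sqrt(dx * dy / len(x))
    return {'min_sep': min_sep, 'max_sep': max_sep, 'nbins': nbins}

rng = numpy.random.RandomState(42)
x = rng.uniform(0.0, 10.0, 1000)
y = rng.uniform(0.0, 10.0, 1000)
params = default_corr_params(x, y)
```

Any value the user passes explicitly would override the guess, so this only fills in unspecified quantities.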
@HironaoMiyatake pointed out that if one of our SysTests fails, the whole program fails out; it would be nice to change this so it completes all the tests it can with the available data. (Certainly for the main drivers--not sure how this would work for user scripts.)
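A sketch of the driver-level fix: run every test inside a try/except, collect the failures, and report them at the end instead of propagating the first exception. The test names and callables here are placeholders:

```python
def run_all(sys_tests, data):
    """Run each (name, test) pair; collect results and failures separately."""
    results, failures = {}, {}
    for name, test in sys_tests:
        try:
            results[name] = test(data)
        except Exception as err:  # deliberately broad: any one test may fail
            failures[name] = err
    return results, failures

def good_test(data):
    return sum(data)

def bad_test(data):
    raise RuntimeError("missing column")

results, failures = run_all([('good', good_test), ('bad', bad_test)],
                            [1, 2, 3])
```

The main drivers could then print or log the failures dict after the run; how this interacts with user-written scripts is the open question from above.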
@HironaoMiyatake , I'm opening this based on our e-mailed discussion. It was not clear to me if you planned on doing this off on some branch, or if you weren't sure how to do it, but in any case I thought it worth putting the idea on here.
For some systematics tests, we might want to be able to get multiple instances of the same object (from different exposures). For example, to estimate the effective rho_1
after combining exposures, if we don't quite trust the coadd PSF yet then using multiple instances of the same object might be a fair way to do it.
We should switch to using shape.hsm, the PSF-corrected shape measurements, for galaxies in the HSC module.

This is a little tricky, because right now we assume anything that wants (for example) a g1 can use the same routine, but we don't want to use shape.hsm for stars. However, it should be doable.
This is a bug I just discovered in the way MakeCorr2FileKwargs works. At the moment, if it has to write some files to disk, it only writes the fields that are in all the data sets it's writing. But that's a silly thing to do when performing, e.g., a galaxy-galaxy lensing computation, where we only have positions, and not shears, for one of the two data sets. So right now any point-shear correlation function fails, because it doesn't write out the shears for the second data set. Whoops!
I'll fix this on this branch.
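A toy illustration of the bug and the intended behavior (the field sets are hypothetical; the real code works on catalog schemas, not bare sets):

```python
# In a point-shear (galaxy-galaxy lensing) setup, the lens catalog has only
# positions while the source catalog also carries shears.
lens_fields = {'ra', 'dec'}
source_fields = {'ra', 'dec', 'g1', 'g2'}

# Buggy behavior: only fields common to ALL data sets get written,
# which silently drops the shears from the source catalog.
common = lens_fields & source_fields

# Intended behavior: write each data set with the fields it actually has.
per_dataset = [sorted(lens_fields), sorted(source_fields)]
```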
We should probably put copyright/license notices in the code files. Anyone know of a good template?
We need some generic code to generate whisker plots. (We may want to hold off on this until we make a decision on #3, since this--more than other systematics tests--is a question of presentation.)
We have to speed up the HSC interface, since it takes ~1 hour to process the entire FOV!
@msimet summarized what we could do as follows, roughly in order of priority.
I'll start looking into 1.
This thread is a place to debate some overall design decisions for Stile (or whatever we decide to call it, if we want to change the name).
As I've been thinking about this, I've been aiming to maximize the possible user base, which drives a couple of design decisions: 1, that the interface with the data should be as separate as possible from the test functions, and 2, that we want to include enough code that you can run all of the tests without knowing Python, via config files/command-line options. 2 is low on the list of priorities at the moment, but I thought it would be helpful to keep in mind as we're working out the structure.
Brief overview of the architecture, as I've conceived of it so far:
Things I have put less thought into:
Exactly what kind of outputs we want to generate for each of the tests, and how we go about doing it. The displayQA framework produces some really nice outputs, but it requires running a web server; I know that the machine where I plan to run the HSC pipeline won't let me do that due to security concerns. I also worry a little about defining a hard pass/fail metric for a lot of these tests. On the other hand, as I said, very nice outputs, and less code to write...
Code that exists:
I just put up some simple file I/O (copied from another project) and the corr2 interface, which is mostly untested but has a lot of what we need I think. Over the next week or so I'll clean up and post:
Next comment: some notes on coding conventions.
Generic code for computing a bunch of statistical quantities: median, mean, standard deviation, etc.
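A minimal sketch of such a routine; this is not the actual StatSysTest interface, just the shape the computation could take:

```python
import numpy

def compute_stats(values):
    """Summary statistics for any numeric array-like."""
    values = numpy.asarray(values, dtype=float)
    return {
        'n': values.size,
        'mean': float(numpy.mean(values)),
        'median': float(numpy.median(values)),
        'stddev': float(numpy.std(values)),
        'min': float(values.min()),
        'max': float(values.max()),
    }

stats = compute_stats([1.0, 2.0, 3.0, 4.0])
```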
One of this big issues: a data handler that interfaces with the HSC/LSST pipeline.
I'm happy to volunteer for this one.
I notice that after the PR that merged branch #2 into master, another commit seems to have been made on that branch, so it is listed as being ahead of master on https://github.com/msimet/Stile/branches. Was that supposed to happen or is something weird going on?
For some of the statistics we will calculate, it would be useful to be able to plot a tabulated requirements curve on top of the plot.
This could be kind of annoying to figure out, but Melanie has some idea how to do it.
We should work out how to change which tests are run via the command line or config. Right now we have to hard-code that.
Write code to generate randoms corresponding to the HSC data products.
I think the right thing to do here is to:
The last thing to be concerned about is rejection masks. We don't need to worry about single pixels, but we should worry about overlap regions, for example. I'd rather not read in the masks if we don't have to, but I'm not sure there's a better way to do it.
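For the uniform-randoms part of this, a minimal sketch for an ra/dec box (in degrees). To be uniform on the sphere, we sample uniformly in sin(dec) rather than dec; this sketch deliberately ignores rejection masks, which would be applied afterwards:

```python
import numpy

def make_randoms(ra_min, ra_max, dec_min, dec_max, n, seed=0):
    """Uniform random points on the sphere within an ra/dec box (degrees)."""
    rng = numpy.random.RandomState(seed)
    ra = rng.uniform(ra_min, ra_max, n)
    # Uniform in sin(dec) gives uniform surface density on the sphere.
    sin_dec = rng.uniform(numpy.sin(numpy.radians(dec_min)),
                          numpy.sin(numpy.radians(dec_max)), n)
    dec = numpy.degrees(numpy.arcsin(sin_dec))
    return ra, dec

ra, dec = make_randoms(30.0, 31.0, -5.0, -4.0, 10000)
```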
We should expand the simple driver prototype used in examples/example_run.py into something that will run Stile flexibly via command-line or config interfaces.
I have some ideas for this already, so I'll plan on doing it after finishing the HSC data handler.
We need to add functionality to make scatter plots. I plan to add a class for making general scatter plots, and then play with it by making star vs. PSF and residual (star - PSF) vs. PSF plots.
We have some code set up to bin our data in the main Stile code, but don't have any way to utilize that in the HSC module. We should think about how to include it.
As many of you know, we rely right now on the program corr2, written by Mike Jarvis (@rmjarvis), to do all of our correlation functions. corr2 is a compiled program, and we have been writing catalogs to disk and calling the program via the subprocess module in order to use it.
Today, a new version (v3.0) was released, including a Python interface called treecorr. It's a GitHub repository: https://github.com/rmjarvis/TreeCorr. I propose that we move to this version of the code rather than the old compiled v2.x. It has a number of benefits:

- We can drop most of corr2_utils.py and rely on the package's own built-in functions, so we won't be as vulnerable to minor changes in the package.
- Installation is even simpler than the corr2 installation (which was already very simple): you can pip install TreeCorr, or clone/download the repo and use the provided installation script. CFITSIO is no longer required.

Mike was kind enough to let me investigate this version before it was released, so shortly I'll be pushing a branch that has some working code on it, although it's not flexible enough yet (it has some hardcoded things just so I could check the script functionality).
I suppose I should back up: does anyone have any objections to this change? If so, please do let me know!
Hironao had an idea for where to put output files that will be harder to implement and certainly isn't critical right now, but could be interesting for the future.
His idea is to make an option to output the file directly into the data directory structure; e.g., if we're using an HSC catalog from rerun/miyatake/test/0001228/49/src.fits (where the numbers give the visit and CCD number), then the plots related to that visit and CCD could go directly into the directory with the catalog.
I threw a bunch of stuff up on master so we could branch from it, but I'm sure it has errors in it currently.
This issue is for me to write unit tests for the files that currently do not have them (stile_utils.py and tests.py--data_handler.py doesn't currently have anything non-abstract in it). It's also for other people to point out errors with the code as it stands and for me (or volunteers) to fix them.
I have a better idea for how to run the StatSysTests through the HSC pipeline, and this is a reminder to myself to do it!
Do we want to alter the way Matplotlib plots look by default?
There's a set of suggestions here, some of which I use. The ones I'm most likely to use in my own work are (A) always using Computer Modern Roman (the LaTeX font) for labels and captions, even when not in TeX mode; (B) tight_layout() (which is already used in at least some of our plotting routines); and (C) setting the plot aspect ratio to be the golden ratio (which admittedly is a bit silly).
I can see this being annoying for those who have their own favored Matplotlib settings, however, since we would be overriding them.
Some systematics tests involve histograms; this branch is to write the basic code that all specific histogram objects will inherit from.
We can probably use numpy.histogram to do this; I think the more important question is how we set defaults like number of bins to be used. And it would be nice to have a generic plotting instance too.
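A minimal sketch of a base class that specific histogram tests could inherit from; the class names are placeholders, and the default of letting numpy choose the bin count via bins='auto' is one possible answer to the defaults question:

```python
import numpy

class BaseHistogramSysTest:
    """Base class for histogram-style systematics tests (a sketch)."""
    def __init__(self, bins='auto'):
        # 'auto' lets numpy.histogram pick a reasonable bin count.
        self.bins = bins

    def compute(self, values):
        counts, edges = numpy.histogram(values, bins=self.bins)
        return counts, edges

class SigmaHistogram(BaseHistogramSysTest):
    field = 'sigma'  # which catalog column this subclass histograms

hist = SigmaHistogram(bins=10)
counts, edges = hist.compute(numpy.linspace(0.0, 1.0, 100))
```

A generic plotting method on the base class could then take (counts, edges) and render them, keeping the presentation in one place.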
Eventually we will want to be able to limit the HSC-pipeline galaxies we use in Stile by their resolution, S/N, etc; this should also be configurable.
When looking into #39, we found that computeShapes is one of the bottlenecks for speeding things up, and that the number of calls to this function can be reduced. For example, it seems that if we run the star whisker test and then the PSF whisker test, computeShapes is called separately for each test. If we collect all the shapes needed for the tests in a run, and then get those shapes with a single call to computeShapes, it would reduce the time.
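The caching idea above can be sketched as a lazily-computed shape store shared between tests. The shape computation here is a dummy stand-in for the real (expensive) pipeline call:

```python
# Counter so we can verify the expensive call happens only once.
calls = {'n': 0}

def compute_shapes(catalog):
    """Dummy stand-in for the expensive computeShapes-like routine."""
    calls['n'] += 1
    return {obj: obj * 2 for obj in catalog}  # placeholder "shape" per object

class ShapeCache:
    """Compute shapes for a catalog once; hand them to every test in the run."""
    def __init__(self, catalog):
        self.catalog = catalog
        self._shapes = None

    def shapes(self):
        if self._shapes is None:       # computed lazily, exactly once
            self._shapes = compute_shapes(self.catalog)
        return self._shapes

cache = ShapeCache([1, 2, 3])
star_whisker_input = cache.shapes()    # first test triggers the computation
psf_whisker_input = cache.shapes()     # second test reuses the cached result
```

The real version would key the cache on which shape fields each test needs, so one pass can gather the union of requirements up front.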