
detrending's Introduction

Light curves & detrending workgroup at #exosamsi

Information

Room 219

Office Hours 9am

Members

How to join the exosamsi organization

One option: star this repository and I'll add you to the organization. Alternatively, if you want to have more fun with git+GitHub:

  • Figure out how to add your name to the members list in this file 😄.
  • Then submit a pull request.

How to contribute to the repository

Once you're a member of the organization, you should be able to push to the main repository. To get a copy you can push to, clone the main repository by running:

git clone https://github.com/exosamsi/detrending.git

or, if you already have a checked-out copy, you can point it at the main repository by running:

git remote set-url origin https://github.com/exosamsi/detrending.git

Here, origin is the name of the remote: it's the default name git gives to the repository you originally cloned.

detrending's People

Contributors

aprsa, benmontet, davidwhogg, dfm, hpparvi, jessielchristiansen, mrtommyb, pdbaines, ruthangus, rwolpert, saturnaxis


detrending's Issues

Injected data and earth pointing

Tom:
Hi team,
I've been playing with injecting and recovering 1-year orbital period planets. I started with a dumb median filter but kept getting spurious detections right after earth points, caused by faulty filtering at the data gaps.
I then tried Gal's detrender and am finding that it causes massive spikes after every earth point. What am I doing wrong? I've attached a plot showing the dumb median filter in red and Gal's in blue. I tried playing with the window size, with little success.

By the way, the data files contain a column called SAP_QUALITY. The values in this column encode events, i.e. there is a numerical flag for 'earth-point just happened'. For now I've started just throwing away the data for two days after every earth point.

Cheers,
Tom
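
A minimal sketch of the masking Tom describes, assuming SAP_QUALITY is a bitmask and that the earth-point flag has value 8 (that's my reading of the Kepler Archive Manual; verify before relying on it):

import numpy as np

EARTH_POINT = 8  # assumed flag value; check against the Kepler Archive Manual

def mask_after_earth_points(time, flux, quality, pad=2.0):
    """Drop `pad` days of data after each cadence flagged as an earth point."""
    keep = np.ones(len(time), dtype=bool)
    for t0 in time[(quality.astype(int) & EARTH_POINT) != 0]:
        keep &= (time < t0) | (time >= t0 + pad)
    return time[keep], flux[keep]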

Bekki:
We're working on various approaches to avoid detections at data gaps. For example, what I'm doing now is (a minimal sketch of steps 1-2 follows below):

  1. Identify change points based on time gaps and/or big flux jumps
  2. Detrend each segment between change points separately, using a running median filter with a 3-day width
  3. Delete data 3 days before and after each gap
  4. Use wavelets to interpolate between gaps

I think others are working on different things, but basically, for now, most of us are sacrificing data near gaps.
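
A minimal sketch of steps 1-2, assuming change points come from time gaps only (flux-jump detection omitted) and Kepler long-cadence sampling of about 29.4 minutes (0.0204 days); names and defaults here are illustrative:

import numpy as np
from scipy.ndimage import median_filter

def detrend_segments(time, flux, gap_days=0.5, window_days=3.0, cadence=0.0204):
    """Split at time gaps, then running-median detrend each segment."""
    breaks = np.where(np.diff(time) > gap_days)[0] + 1   # step 1: change points
    width = max(3, int(window_days / cadence) | 1)       # odd window, in cadences
    detrended = np.empty_like(flux)
    for seg in np.split(np.arange(len(time)), breaks):   # step 2: per-segment
        trend = median_filter(flux[seg], size=width, mode='nearest')
        detrended[seg] = flux[seg] / trend
    return detrended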

Ruth is working with Gal's median filter too and I think she's implementing something to try to address this issue.

Tom:
Thanks for the update Bekki. I too am finding that sacrificing data near gaps is a necessary evil.
Why do you remove data before the gaps? Those data shouldn't be affected by the nasty 'thermal ramps'; the spacecraft doesn't know it's about to do an earth-point.
(I guess we should probably be having this conversation on GitHub, oops.)

Hope all is well there with you guys! It’s rainy here in SF!! in June!!

Papers we may be working on

Title: Just use the PDC SAP
Abstract: Who needs a fancy detrending algorithm?

Title: A wavelet-based transit likelihood? No thanks.
Abstract: Chi-squared was good enough for our grand-advisors.

Title: The CMB or the Kepler CCD array?
Abstract: We know the temperature of one of these pretty well.

Title: A Gaussian approach to transit detection
Abstract: We drew a box around the transit light curves and assumed that the flux through the surface was proportional to the number of planets inside.

Title: God wouldn't make other planets like the Earth.
Abstract: We set the prior to zero and didn't find any.

Build issues

From @benmontet

On EPD numpy 1.6.something, the build fails with the errors:

/usr/include/string.h:548:5: error: unknown type name ‘__locale_t’
/usr/include/string.h:552:18: error: unknown type name ‘__locale_t’
untrendy/_untrendy.c: In function ‘untrendy_find_discontinuities’:
untrendy/_untrendy.c:27:46: error: ‘NPY_ARRAY_IN_ARRAY’ undeclared (first use in this function)

A list of Earth-like candidates? You wish

I wouldn't dignify these with the term "candidates", but in the pound stars folder I've put a zipped "gallery" of PNG images of the two most promising signals for each "pound star" (i.e. 72 bright, low-noise stars with known small, short-period planets). You'll see some examples of possible signals, and even more examples of detrending failures and transit-duration-timescale stellar variability (e.g. an especially deep wiggle among tons of similar wiggles).

A couple of examples (attached images): 7286173_2 and 7364176_1.

The colors represent a rainbow of time, from early (purple) to late (red). If the transit seems to be mostly one color (i.e. it only occurs at one time), or shows different depths at different times/colors, be very suspicious.

Supposedly my program automatically masked out the signals of the known planets in each system, but I haven't checked that this succeeded for every star (it worked for the several that I checked). Also, it's very possible that my detrending introduced transit signals or distorted true transit signals (see my other post, coming soon, about the recovery rate of Dan's injected planets; about 24% of planets are being detected, and all are above 120 ppm transit depth). Speaking of which, Ruth and Billy have been investigating how detrending distorts transits; you'll see that in our slides when we post them (along with some interesting points and results from Hannu).

If any of these not-really-candidates catches your eye, let me know if you'd like me to take a closer look. Let me know if you'd like any more information or even if you'd like to have a "how the other half lives" type experience by looking at some of my IDL code.

The exact process through which they were obtained (a minimal sketch of steps 4-5 follows below):

  1. Detrend the PDC using a running median filter on each segment (between gaps)
  2. Trim away the edges of each segment
  3. Use wavelets to interpolate between gaps
  4. At each datapoint, remove a 13 hour/80 ppm transit and record the delta likelihood if it is greater than zero [Note: this is equivalent to imposing a super-strict prior; I'm going to try imposing a weaker prior soon]
  5. Fold and sum the likelihoods to identify the best period

As you'll see in the other post, at the moment the chi^2 likelihood is working better than (my implementation of) the wavelet likelihood for this search process (i.e. performing better at recovering Dan's planets), but I'm hopeful that can be improved on.
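
A minimal sketch of steps 4-5, assuming flux normalized to 1, Gaussian errors, and a box-shaped transit; all names and defaults here are illustrative, not Bekki's actual code:

import numpy as np

def box_delta_loglike(time, flux, err, t0, duration=13.0 / 24.0, depth=80e-6):
    """Delta log-likelihood from inserting a box transit of given depth at t0."""
    m = np.abs(time - t0) < 0.5 * duration
    r = flux[m] - 1.0  # residuals from the flat (no-transit) model
    # 0.5 * (chi^2 of flat model - chi^2 of box model with in-transit flux 1 - depth)
    return 0.5 * (np.sum((r / err[m]) ** 2) - np.sum(((r + depth) / err[m]) ** 2))

def fold_and_sum(time, dll, periods, n_bins=200):
    """Fold the positive delta log-likelihoods at each trial period; return the peak."""
    dll = np.clip(dll, 0.0, None)  # only improvements count (the 'strict prior')
    scores = np.empty(len(periods))
    for i, period in enumerate(periods):
        bins = ((time % period) / period * n_bins).astype(int)
        scores[i] = np.bincount(bins, weights=dll, minlength=n_bins).max()
    return scores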

All the best,

Bekki

Tutorial for working with sandbox data

Hey @saturnaxis, can you add a little tutorial for how to interact with the Sandbox data? You've already pretty much written it on the SAMSI group.

How about a README.md file in the sandbox directory of this repository?

Make the Savitzky-Golay filter an importable module

I want to import the Savitzky-Golay filtering as a module and then call it like

import SGfilt  # obviously pick whatever name you choose
SGfilt.do_filtering(time, flux, **kwargs)

where the keywords take sensible defaults.
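
A minimal sketch of such a module, assuming scipy's savgol_filter as the backend and Kepler long-cadence sampling for the default window (the module and function names are the requester's placeholders; everything else is an assumption):

# SGfilt.py
from scipy.signal import savgol_filter

def do_filtering(time, flux, window_days=2.0, polyorder=3, cadence=0.0204):
    """Return flux divided by a Savitzky-Golay trend; keywords take sensible defaults."""
    # assumes nearly uniform sampling; `time` is kept to match the requested signature
    window = max(polyorder + 2, int(window_days / cadence))
    if window % 2 == 0:  # savgol_filter requires an odd window length
        window += 1
    trend = savgol_filter(flux, window, polyorder)
    return flux / trend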

Bekki, WTF are you doing?

@dawsonri how are you so awesome at finding candidates? What method are you using and what are your tricks? If there is a paper, point us to it. If there isn't, can we help you write it?

Proposal for common data format

We need a way of comparing the output of all of the detrending algorithms. This will (probably) involve something including but not limited to:

  • running a standardized search algorithm on all the different outputs
  • visualizing the results of the different methods in the same way/simultaneously
  • other things?

Anything that we do will benefit from a common output format for the codes (for obvious reasons).

I see 2 main options:

  1. ASCII tables (gasp!) with specified columns (kbjd, detrended_flux, detrended_flux_uncert, ...)
  2. FITS tables with the same format as the original Kepler data products (including the relevant metadata) with added columns with the same information as above

The first option is far easier to implement in any programming language (lowering the barrier to entry), so I'm inclined to go with that, but the second seems more useful (and self-contained) for the search phase, depending on what we decide to do.
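
A minimal sketch of option 1's round trip, assuming the three columns named above (the function names are hypothetical):

import numpy as np

COLUMNS = "kbjd detrended_flux detrended_flux_uncert"

def write_detrended(path, kbjd, flux, uncert):
    """Write the proposed three-column ASCII table with a header comment."""
    np.savetxt(path, np.column_stack([kbjd, flux, uncert]), header=COLUMNS)

def read_detrended(path):
    """Read the table back; any language with a whitespace-table reader can too."""
    return np.loadtxt(path, unpack=True)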

Thoughts?
