eqcorrscan / eqcorrscan

Earthquake detection and analysis in Python.

Home Page: https://eqcorrscan.readthedocs.io/en/latest/

License: Other

Python 58.29% Makefile 0.35% Batchfile 0.52% PowerShell 0.11% Shell 0.20% C 3.08% Jupyter Notebook 37.46%
cross-correlation seismology earthquakes subspace python template-matching repeating-earthquakes detection earthquake-detection matched-filtering

eqcorrscan's Introduction

EQcorrscan

A Python package for the detection and analysis of repeating and near-repeating earthquakes.

Citation:

We have a manuscript on the development of EQcorrscan; if you make use of EQcorrscan, please cite the following paper:

Chamberlain, C. J., Hopp, C. J., Boese, C. M., Warren-Smith, E., Chambers, D., Chu, S. X., Michailos, K., & Townend, J. (2017). EQcorrscan: Repeating and near-repeating earthquake detection and analysis in Python. Seismological Research Letters.

Where possible, please also cite the version number you used: DOI

Installation

The easiest way to install EQcorrscan is through Anaconda.

Instructions for installing EQcorrscan and the required dependency, FFTW, are linked from the docs.
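For most users this is a single command; for example, assuming the package is published on the conda-forge channel (as described in the install docs):

    conda install -c conda-forge eqcorrscan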

Updates

If you want to be kept informed about releases, bug-tracking and enhancements without having to keep checking GitHub, subscribe to our Google group.

Documentation

The full documentation for this package can be found here: Docs. Any errors, including typos and missing sections, can either be fixed by you or flagged in the issues tab here. We host our docs on Read the Docs, which uses Sphinx to build from the docstrings in the code, so it is simple to keep the docs matched to the code by changing the docstrings.

Contributing

Please fork this project and work on your fork, then create a pull request to merge back into this main repository. Please create your branch from develop.

When you make changes, please run the tests in the test directory to ensure everything merges with minimum effort. If there is not yet a test covering your changes, please write one.

Please document your functions following the style of the existing docstrings; these will then be built into the main documentation using Sphinx.

Functionality

This package contains routines that enable the user to conduct matched-filter earthquake detections, using ObsPy bindings for reading and writing seismic data, as well as subspace detection, brightness source-scanning, relative moment calculation using singular-value decomposition, and correlation pick-adjustment for similar events.

Also within this package are:

  • Clustering routines for seismic data;
  • Peak finding algorithm (basic, but appropriate for noisy data);
  • Automatic amplitude picker for local magnitude scale;
  • obspy.core.event integration, which opens up lots of other functions (SeisHub, hypoDDpy etc.);
  • Stacking routines including phase-weighted stacking based on Thurber et al. (2014);
  • Brightness based template creation based on the work of Frank et al. (2014);
  • Singular Value Decomposition derived magnitude calculations based on Rubinstein & Ellsworth (2010).

The code base has grown to be quite large; it is probably worth having a look at the docs to check what functions we have. We are writing a series of tutorials, included alongside the EQcorrscan API docs, to highlight key functions.

A note on correlation precision

EQcorrscan computes normalised cross-correlations in the frequency domain using FFTW (the Fastest Fourier Transform in the West). Internally the C routines enforce double precision (64-bit floating point numbers) for all aspects of the cross-correlations, although input and output are 32-bit floats. In testing, results are accurate to within ~0.0001 of time-domain cross-correlation results.
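In practice that means float32 in and out with float64 accumulation internally. A minimal sketch of the same idea using scipy (not the actual FFTW-backed routine) looks like this:

    import numpy as np
    from scipy.signal import fftconvolve

    def normxcorr_sketch(template, series):
        # Accumulate in float64, take and return float32, mirroring the
        # precision handling described above (illustrative only).
        t = template.astype(np.float64)
        s = series.astype(np.float64)
        n = len(t)
        t = (t - t.mean()) / (t.std() * n)  # demean and scale template
        # Frequency-domain correlation of the series with the template
        corr = fftconvolve(s, t[::-1], mode="valid")
        # Sliding standard deviation of the series via cumulative sums
        c1 = np.concatenate(([0.0], np.cumsum(s)))
        c2 = np.concatenate(([0.0], np.cumsum(s * s)))
        mean = (c1[n:] - c1[:-n]) / n
        var = (c2[n:] - c2[:-n]) / n - mean ** 2
        std = np.sqrt(np.maximum(var, 0.0))
        with np.errstate(divide="ignore", invalid="ignore"):
            cc = np.where(std > 0, corr / std, 0.0)
        return cc.astype(np.float32)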

Test status

Note that tests on Travis and AppVeyor are run daily on master as cron jobs, and failures may reflect time-out issues.


Licence

This package is written and maintained by the EQcorrscan developers and is distributed under the GNU LGPL. Copyright EQcorrscan developers 2018.

Funding


Continued development of the EQcorrscan package is directly supported by RCET, the Rapid Characterisation of Earthquakes and Tsunami programme, funded by the New Zealand Ministry of Business, Innovation and Employment Endeavour Fund.

Development is indirectly funded by grants from Toku Tū Ake: EQC and a Rutherford Discovery Fellowship.

eqcorrscan's People

Contributors

calum-chamberlain, chrisdjscott, cjhopp, d-chambers, dependabot[bot], emilyws1, flixha, gitter-badger, halauwet, ikahbasi, snyk-bot, stickler-ci, tjnewton, xiansch, yvonnefroehlich


eqcorrscan's Issues

Bug in spec_trace plot

utils.plotting.spec_trace does not plot interactively; this means that you can't do much with it, nor can the example be rendered by Sphinx.

Party.decluster behaving as expected?

Hey @calum-chamberlain, correct me if I'm wrong, but I don't think decluster() is working as expected.

The docstring states that the function should take the highest single-channel correlation value, but nowhere does the function reference Detection.no_chans in order to do the averaging. I think it's currently taking the detection with the highest raw cross-correlation sum.

If I haven't missed anything, would you agree with adding a flag which allows for clustering on either 'avg_single_chan' or 'cc_sum'?
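If so, a sketch of what that flag could look like (the metric names and the use of Detection.detect_val here are assumptions, not the existing API):

    def decluster_metric(detection, metric="avg_single_chan"):
        # Score used to rank detections when declustering (hypothetical)
        if metric == "avg_single_chan":
            # average per-channel correlation
            return abs(detection.detect_val) / detection.no_chans
        if metric == "cc_sum":
            # raw network cross-correlation sum (current behaviour?)
            return abs(detection.detect_val)
        raise ValueError("Unknown metric: {0}".format(metric))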

Using eqcorrscan

Dear all

It is not really an issue with the code, but with me.

I would really like to try using eqcorrscan. I have already tried some other codes written in Python and ObsPy modules that do exactly the same thing, so it would be nice to compare the results. From reading about this project, I saw that it provides me with additional options, especially in the Utils section.

Anyway, I am a beginner in Python, so this is the main problem that I have.

I started with the tutorial on template creation.
If I understand the _template_gen function correctly, there are a few things that I need to prepare:

picks: this is a list of picks. What should this list look like? Maybe I missed it somewhere in the manual. Probably something like STA, CHAN, PHASE, TIME?

st: these are (let's say) MiniSEED files for the stations and channels on the day of the template, so something like glob.glob('path/to/files/*')?
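For reference, here is my current understanding of the inputs in ObsPy terms (the paths are placeholders, and this is a sketch rather than something from the manual):

    from obspy import read, read_events

    # Daylong waveforms for the template day, as a single Stream
    st = read('path/to/files/*')

    # Picks are obspy.core.event.Pick objects: each carries the station
    # and channel (waveform_id), the phase_hint and the pick time
    catalog = read_events('path/to/event.xml')
    picks = catalog[0].picks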

Thank you very much!
Blaz

Matched-filter should not enforce day long data

There is no good reason for this restriction, but some things need to be changed to remove it:

  • Remove Error in match_filter.match_filter;
  • Change template generation functions to have a length option for how much data to process;
  • Change the documentation to show that, rather than enforcing day-length, templates should be cut from data of the same length as used in match_filter;
  • Generate tests comparing different detections for different lengths of data;
  • Get it out there 👍

This will facilitate:

  • Lower memory usage, allowing for more parallel processing and for smaller systems (e.g. single-board computers);
  • Near real-time applications where data chunks are small.

Bugs

This issue thread will be kept open to track bugs found before release 0.0.10. Please add bugs here, even if you have fixed them, as fixes will not be included in master until the next release.

event.comments in Template creation

@calum-chamberlain: In creating Template objects, if an associated event has no Comment objects in event.comments, then no new Comment is added. A comment like the following is necessary when reading the tarball back in, otherwise template.event is NoneType:

    Comment(text="eqcorrscan_template_" + temp_name,
            creation_info=CreationInfo(agency='eqcorrscan',
                                       author=getpass.getuser()))

Is len(event.comments) > 0 a necessary condition?

    if len(event.comments) > 0 and \
            "eqcorrscan_template_" + temp_name not in \
            [c.text for c in event.comments]:

Test style guide

I notice most (maybe all) of EQcorrscan's tests are written in the unittest style, with test classes inheriting from unittest.TestCase.

Is the unittest style required, or is it alright to use pytest-style tests, which allow fixtures for dependency injection and plain asserts?

Personally I find the latter style to be much nicer to work with.
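For concreteness, the two styles side by side (the example assertions are illustrative only):

    import unittest
    import pytest
    import obspy

    # unittest style, as currently used in EQcorrscan
    class TestStream(unittest.TestCase):
        def test_trace_count(self):
            st = obspy.read()  # ObsPy's bundled example stream
            self.assertEqual(len(st), 3)

    # pytest style: fixtures for dependency injection, plain asserts
    @pytest.fixture
    def stream():
        return obspy.read()

    def test_trace_count(stream):
        assert len(stream) == 3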

Logging

Logging is not handled well in EQcorrscan. Currently all the debug output comes from print statements; we should make use of the logging module. The issue lies in multiprocessing, but we could try to make use of the multiprocessing-logging module.

Submodules to address are below. Other routines are usually run interactively, and there prints are useful, although the log output could also go to screen...

  • match_filter;
  • bright_lights;
  • lag_calc;
  • subspace.
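A minimal sketch of wiring that in, assuming the multiprocessing-logging package and its install_mp_handler() entry point:

    import logging
    import multiprocessing_logging  # pip install multiprocessing-logging

    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(processName)s %(name)s: %(message)s")
    # Wrap existing handlers so that records emitted in worker
    # processes are funnelled back to the main process safely.
    multiprocessing_logging.install_mp_handler()

    logger = logging.getLogger("eqcorrscan.core.match_filter")
    logger.debug("This would replace a debug print statement")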

QuakeML integration

I'm going to go ahead and start working on QuakeML integration.

Read/write to and from Catalog classes should be straightforward via built-in ObsPy methods; the more involved bit, I think, is creating new Event objects from new matched-filter detections.

What's the word on the features/location branch?

detection_multiplot

I tried plotting the detections with detection_multiplot and saving in PDF format. There appears to be a row of 0's printed over the times on the x-axis so that they are illegible (see attached), and I am not sure what could be causing that.
detection_2012-04-02T11:49:48.250000Z.pdf

Also, would it be possible to plot the templates and data with some vertical space between them, so that the template doesn't block out the data waveform?
Thanks

Unexpected results from Party.__add__

A misplaced variable assignment is giving unexpected results when adding two Parties together here.

For example, let's take two Party objects which contain a mix of families: some with templates common to both and others unique to a particular Party. If we add these Party objects, self.__add__ works well for all templates/families that are common between them, but fails to add any unique templates/families after the first common template is encountered, because the variable added has not been reset for the new family loop.
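Illustratively, the fix is to reset the flag for every family in the outer loop; a sketch with assumed variable names, not the actual source:

    for family in other.families:
        added = False  # must be reset per family, not once before the loop
        for existing in self.families:
            if family.template == existing.template:
                existing.detections += family.detections
                added = True
                break
        if not added:
            self.families.append(family)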

Release 0.2.0

This issue describes the steps needed before release 0.2.0.

Note that this release number was chosen because there are significant updates to the API; while the old API generally remains, there is an additional object-oriented layer added to match_filter.

  • Finish tests for #94 ;
  • Update and check docs;
  • Update and check website;
    - [ ] Get conda installs working;
  • Finalise and merge #93 ;
  • Check and merge #106 ;
  • Check that installs work as documented;
    - [ ] Work out how to skip tests if download times out;
    - [ ] Work out how to ignore fail for lag-calc at 4.7902.

Once this is done the manuscript should get out there.

Tutorials

The following is a to-do list of tutorials to write:

  • template generation;
  • matched-filter detection and plotting results;
  • calculating optimum lag-times for relocation @cjhopp;
  • magnitude calculation by SVD;
  • magnitude calculation by auto amplitude picking (download response info from GeoNet);
  • clustering events and stacking;

More...?

Feature addition

The following features should be added:

  • SNR frequency plotting, possibly similar to PPSD, e.g. the ratio of a possible template (signal) over noise in different frequency bands; either do it in the time domain with filtered data, or divide the FFTs?
  • Template spatial resolution test;
  • Synthetic detection threshold testing;
  • autocorrelation template detections;
  • cross-station template detection.
  • Subspace automatic thresholding from user-defined false alarm probability
  • Add PCA relative moment calculation per Shelly et al., 2016
  • Break clustering.cluster() into two functions: one calculates the distance matrix, the other does the clustering (see the sketch after this list).
  • Add more options for clustering
    • options for method for hierarchical
    • divisive clustering
    • k-means
    • whatever else could be useful!
  • Use logging for debug statements (user to supply a logger, or create one internally using a debug= variable to set the level); being worked on in #225
  • Subspace classes to mimic Tribe, Template, Party and Family; see #154
  • Subspace to make use of correlate functions;
  • lag_calc should have the ability to re-cut templates (e.g. you might use a 6s template with both P and S phases in for match-filtering, but you might then want to separate the P and S phases for relocation).
  • client_detect could run a background download thread to download (and process) the next set of data while detection happens... This would potentially be tricky in terms of memory usage, so having an option to disable it would be good.
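A sketch of the proposed clustering.cluster() split (the function names and the correlation helper are assumptions, not existing API):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def distance_matrix(stream_list, xcorr_func):
        # Pairwise (1 - max normalised cross-correlation) distances;
        # xcorr_func(st1, st2) -> peak correlation, supplied by caller.
        n = len(stream_list)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                dist[i, j] = dist[j, i] = 1.0 - xcorr_func(
                    stream_list[i], stream_list[j])
        return dist

    def cluster_from_distances(dist, corr_thresh, method="single"):
        # Hierarchical clustering on a precomputed distance matrix;
        # method is exposed so other linkages can be tried (see above).
        Z = linkage(squareform(dist), method=method)
        return fcluster(Z, t=1.0 - corr_thresh, criterion="distance")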

Comments from @nackerley:

  • Make it work better on cluster computers - use a database of some form for detections (pandas?), and possibly Dask for scheduling?
  • Allow KeyboardInterrupt for correlate functions/match-filter; see #241
  • Propagate Timeouts from Client methods into match-filter.
  • Add ability to download from multiple clients in client_detect method.

Release v0.0-alpha.1 requirements

I hope to release an early version of the software in working form from the release_prep branch with the tag v0.0-alpha.1. The intended audience for this release is other people within the department; to get the codes into a good enough state for them to use, I will need to:

  • Complete a usable tutorial.rst document and tutorial.py script designed to run the main match_filter.py routine;
  • Update documentation for all codes;
    • Put licence line in doc of all files
  • Arrive at a fully tested and working brightness function;
  • Provide a test dataset:
    • One day of day-long data
    • One S-file from which to generate a template

Last trace always has no data, leading to an error in norm determination in linstack = stacking.linstack(traces)

Here is the error message, always for the last trace:

linstack = stacking.linstack(traces)
  File "/usr/local/lib/python2.7/dist-packages/eqcorrscan/utils/stacking.py", line 63, in linstack
    tr.data = np.sum((norm, tr.data), axis=0)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 1844, in sum
    out=out, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 32, in _sum
    return umr_sum(a, axis, dtype, out, keepdims)
ValueError: operands could not be broadcast together with shapes (0,) (180000,) 

Here is my screen output, with many print commands added for more info:

1 Trace(s) in Stream:
AF.WHYM.03.SHZ | 2009-03-16T17:32:26.770000Z - 2009-03-16T17:32:26.770000Z | 200.0 Hz, 0 samples
0
matchtr []
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:70: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
norm []
norm_nantonum []
[] 180000

clustering.cluster() output

The variable groups is not populated correctly at the end of clustering.cluster():

At line 181 to the end of the function, the inner loop is not allowed to run through all indices due to a 'break'.

The following works as line 181 to the end of the function:

for group_id in group_ids:
    group = []
    for ind in indeces:
        if ind[0] == group_id:
            group.append(stream_list[ind[1]])
    groups.append(group)
return groups

can't see plots after generating templates?

I would like to plot the cut templates after generating them. I tried commenting out plt.close() in plotting.py's pretty_template_plot(), but nothing appears to show. Setting save=True also doesn't seem to actually save the plots. I am using the Qt4Agg backend for matplotlib.

Thanks

OpenCv vs scipy.fftconvolve, is it really worth it?

Hello,

First of all, great work! I am very impressed with the project so far. The documentation is excellent, the source code looks clean, and the tests seem to be coming along well.

I was getting started with EQcorrscan on Python 3 and had a very hard time getting OpenCV compiled and set up. After about an hour I just gave up and switched to a Python 2.7 conda env so I could use conda to install OpenCV, as stated in the EQcorrscan documentation.

However, I was curious whether OpenCV would really be much more efficient than the tools that ship with Anaconda, so I compared the OpenCV-backed function from EQcorrscan with a function based around scipy.signal.fftconvolve.

On my machine (Ubuntu 16.04, 16-core i7) I found the scipy version was about twice as fast as the OpenCV version when using the default stream as a template and randomly generated search spaces ranging from 60 seconds to 24 hours (assuming a 100 Hz sampling rate). Here are the results: normxcorr2 is the function from EQcorrscan, normxcorr1 is the scipy function, and the left column is the duration of the search space.

| search space (seconds) | normxcorr1 | normxcorr2 | difference (%) |
| --- | --- | --- | --- |
| 60 | 0.000537 | 0.000769 | 43.140351 |
| 12394 | 0.101550 | 0.236562 | 132.951964 |
| 24728 | 0.235110 | 0.467714 | 98.934310 |
| 37062 | 0.349631 | 0.712785 | 103.867752 |
| 49397 | 0.481285 | 0.939461 | 95.198439 |
| 61731 | 0.587475 | 1.194135 | 103.265641 |
| 74065 | 0.772260 | 1.417412 | 83.540794 |
| 86400 | 0.888662 | 1.654045 | 86.127562 |

I, of course, didn't compile OpenCV on my machine, so perhaps it would be more efficient if I had. On the flip side, I didn't compile numpy or any of the other packages either, so I feel it might be a fair comparison.

Anyway, I have attached the notebook for timing the two functions. I would be curious to see if anyone else gets similar results. Regardless, I would recommend including a cross-correlation function that doesn't require OpenCV as an option, to make getting started, especially on Python 3, a bit easier.
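For anyone wanting a rough reproduction without the notebook, a harness along these lines compares the two backends (a sketch: cv2.matchTemplate with TM_CCOEFF_NORMED stands in for the OpenCV-backed routine, and the raw fftconvolve call times only the correlation, not the normalisation):

    import time
    import numpy as np
    import cv2  # OpenCV
    from scipy.signal import fftconvolve

    template = np.random.randn(400).astype(np.float32)  # 4 s at 100 Hz
    for duration in (60, 3600, 86400):  # search-space length in seconds
        search = np.random.randn(duration * 100).astype(np.float32)

        tic = time.perf_counter()
        # raw frequency-domain correlation; a normalised version would
        # add sliding-window sums on top of this
        fftconvolve(search, template[::-1], mode="valid")
        scipy_time = time.perf_counter() - tic

        tic = time.perf_counter()
        # normalised correlation, as OpenCV computes it
        cv2.matchTemplate(search.reshape(1, -1), template.reshape(1, -1),
                          cv2.TM_CCOEFF_NORMED)
        opencv_time = time.perf_counter() - tic

        print(duration, scipy_time, opencv_time)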

I also have some ideas that should significantly speed up the subspace detection that I will share when I get a chance.

Keep up the good work.

profile_numpy_v_opencv.ipynb.zip

duplicate functions

Detections_to_catalog and get_catalog have the same source code.

detection.write should also write out a catalog as well as a detection file; otherwise the catalog is lost.

S-file writing bugs

  • Conversion of depths from km to m;
  • Force three letter agency ID;
  • Magnitude conversion id errors;

eqcorrscan.utils.clustering.group_delays

Hi.

I tried running group_delays:

from obspy import read
import glob
import os
from eqcorrscan.utils.clustering import group_delays

path = '/home/blaz/Documents/eqcorrscan/Postojna/template'
stream_files = glob.glob(os.path.join(path, '*'))
stream_list = [(read(stream_file), i)
               for i, stream_file in enumerate(stream_files) ]

groups = group_delays(stream_list=stream_list)

the error I get is:

Working on waveform 0 of 85
Traceback (most recent call last):
  File "group_delays.py", line 17, in <module>
    groups = group_delays(stream_list=stream_list)
  File "/home/blaz/anaconda2/lib/python2.7/site-packages/EQcorrscan-0.103rc0-py2.7.egg/eqcorrscan/utils/clustering.py", line 227, in group_delays
    starttimes.append(tr.stats.starttime)
AttributeError: 'Stream' object has no attribute 'stats'

So basically, it treats the items of my stream_list as Streams, not Traces (as I understand from the source), hence no attribute 'stats' is found.

I checked whether there was something wrong with my templates, so I ran this:

from obspy import read
import glob
import os
from eqcorrscan.utils.clustering import group_delays

path = '/home/blaz/Documents/eqcorrscan/Postojna/template'
stream_files = glob.glob(os.path.join(path, '*'))
stream_list = [(read(stream_file), i)
               for i, stream_file in enumerate(stream_files) ]
for stream in stream_list:
  for tr in stream[0]:
    print tr.stats.starttime
#groups = group_delays(stream_list=stream_list)

It prints the start times of the traces without problems, so my templates are OK, I guess.

Thanks for the help,
Blaz

Memory allocation error when running large grids.

Traceback (most recent call last):
  File "./LFE_brightness_search.py", line 130, in <module>
    brightdef.coherance)
  File "/home/calumch/my_programs/Building/EQcorrscan/core/bright_lights.py", line 355, in brightness
    energy=np.array([np.array([np.nan]*len(stream[0]))]*len(nodes))
MemoryError

PyPA registry for version 0.0-alpha.2

In the interest of moving this away from being my own personal set of codes and into the realm of useful Python packages, this project should change the way it is set up and integrate with PyPA tooling. Instructions for packaging and distributing are here: https://packaging.python.org/en/latest/distributing.html

To this end I need to do the following:

  • Write a setup.py file;
  • Write a setup.cfg file;
  • Migrate my parameter file set-up and my scripts to a development branch and clean master;
    - Write tests for the install and to check that things work;
  • Think about wheels: I think this will be a platform wheel (Windows isn't supported at the moment);
  • Think of a useful way to allow the user to access functions... Remove default parameter access from core and put them as variables?
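A minimal setup.py of the kind described might look like the following (illustrative only; the metadata and dependency list here are assumptions, not the project's actual setup script):

    from setuptools import setup, find_packages

    setup(
        name="EQcorrscan",
        version="0.0a2",
        description="Earthquake detection and analysis in Python",
        url="https://eqcorrscan.readthedocs.io/en/latest/",
        license="LGPL",
        packages=find_packages(),
        install_requires=["obspy", "numpy", "scipy", "matplotlib"],
    )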

Failing tests due to timeout

Annoying test failures due to timeouts should be handled better. We can use pytest skip conditions to catch timeout errors, but I really want all the tests to run on the CI for pull requests.

It might be good to work out a way to include the skip for timeouts on all test runs apart from pull requests.
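A sketch of that pattern (the URL, exception type and environment variable are placeholders, not an existing convention in this project):

    import os
    import pytest
    import requests

    # Only skip on timeout outside pull-request builds (hypothetical flag)
    SKIP_ON_TIMEOUT = os.environ.get("CI_EVENT") != "pull_request"

    def test_remote_data():
        try:
            resp = requests.get("https://service.geonet.org.nz", timeout=10)
        except requests.exceptions.Timeout:
            if SKIP_ON_TIMEOUT:
                pytest.skip("Remote service timed out")
            raise
        assert resp.ok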

Tests for all utils and core functions

We need to implement tests for all core and utils functions to ensure they meet requirements and behave as expected. The following modules need tests:

  • utils:
    • utils.sfile_util;
    • utils.catalogue_to_dd;
    • utils.clustering;
    • utils.findpeaks;
    • utils.mag_calc;
    • utils.pre_processing;
    • utils.seismo_logs;
    • utils.stacking;
    • utils.synth_seis;
    • utils.sac_util
  • core:
    • core.bright_lights;
    • core.match_filter;
    • core.template_gen;
    • core.lag_calc.

Possible _group_detect issue when handling bad daylong data.

In the case where st[0] of daylong data starts the day before (cough GeoNet cough), this line ensures that when a given trace isn't daylong, pre_processing.process will actually create a daylong trace of zeros for the day before (I think...).

I'm still not entirely sure this is the case, but I've been getting output like this:

/projects/nesi00228/EQcorrscan/eqcorrscan/utils/pre_processing.py:479: UserWarning: Time headers do not match expected date: 2012-02-08T00:00:00.000000Z
/projects/nesi00228/EQcorrscan/eqcorrscan/utils/pre_processing.py:479: UserWarning: Time headers do not match expected date: 2012-02-06T23:59:59.998394Z
/projects/nesi00228/EQcorrscan/eqcorrscan/utils/pre_processing.py:479: UserWarning: Time headers do not match expected date: 2012-02-08T00:00:00.000000Z
/projects/nesi00228/EQcorrscan/eqcorrscan/utils/pre_processing.py:479: UserWarning: Time headers do not match expected date: 2012-02-06T23:59:59.998394Z
/projects/nesi00228/EQcorrscan/eqcorrscan/utils/pre_processing.py:479: UserWarning: Time headers do not match expected date: 2012-02-08T00:00:00.000000Z

Some traces actually start two days before the day we want; in this case, that's 2012-02-07. If there's another explanation, I'm all ears, but I'm going to try out some solutions tomorrow.

detections from multiple network stations

Hi!

I have some templates that use stations belonging to different networks (the network code is different: SL and MN). Templates consist of 9 SL traces and 3 MN traces. In the folder with daylong waveforms I have all 12 channels. When the match filter is running, everything seems OK, but then the detections are only from the 9 SL channels and nothing from MN (which actually has the strongest signal from the earthquake).

All the files follow the same naming scheme: STA.CHN.2006_doy_hh_min_ss

Any idea? Thanks!

Docs missing for subspace

The docs for subspace are missing. This might be due to an outdated SciPy on readthedocs, which doesn't have next_fast_len; but the match-filter docs, which also use next_fast_len, build fine.

Needs to be debugged and patched to master and develop.

four letter station names

I'm having some trouble with four-letter station names, and I'm not sure if it is caused by writing to MiniSEED. When the templates are cut (template_gen) the output will say "Cutting N.IWEH.UU", but performing the match_filter part shows:

I have template info for these stations:
[u'N.IWE.NN', u'N.IWE.EE', u'N.IWE.UU']
I have daylong data for these stations:
[u'N.IWEH.UU', u'N.IWEH.EE', u'N.IWEH.NN']

so as you can see, the last letter of the station name has been cut off. This breaks the checks in place to make sure that the stations correspond, so match_filter doesn't work properly. I've done a quick and dirty fix by simply appending "H" to the station names in match_filter.match_filter:
    if debug >= 2:
        print('Ensuring all template channels have matches in daylong data')
    template_stachan = []
    for template in templates:
        for tr in template:
            template_stachan += [(tr.stats.station + "H", tr.stats.channel)]
    template_stachan = list(set(template_stachan))

but I'm not sure if this is causing problems anywhere else. When it runs without compile errors, I run the template over the daylong dataset from which it was cut and I get:
********* DEBUG: N.IWEH.UU ccc MAX: 0.0
********* DEBUG: N.IWEH.UU ccc MAX: 0.0
********* DEBUG: N.IWEH.UU ccc MEAN: 0.0
********* DEBUG: N.IWEH.UU ccc MEAN: 0.0
********* DEBUG: N.IWEH.UU ccc MAX: 0.0
shape of ccc: (1, 1726401)
A single ccc is using: 3.452802MB
with zero detections, but I'm not sure if this is related to the naming issue or not.

fftw_normxcorr deadlocks in multiprocessing pool

fftw_normxcorr deadlocks when used with a multiprocessing pool; consider this example:

from multiprocessing import Pool, cpu_count

import numpy as np
import obspy

from eqcorrscan.utils.correlate import fftw_normxcorr

cores = cpu_count() if cpu_count() < 4 else 4

# load streams
trace_list = [obspy.read()[0] for _ in range(10)]

# run fftw_normxcorr in serial, this works fine
for trace in trace_list:
    fftw_normxcorr(trace_list[0].data.reshape(1, -1), trace.data, [0])

# run in parallel using a multiprocessing pool, this deadlocks
pool = Pool(processes=cores)
args = ((trace_list[0].data.reshape(1, -1), trace.data, [0])
        for trace in trace_list)
results = [pool.apply_async(fftw_normxcorr, args=arg) for arg in args]
pool.close()
# Extract the results when they are done
dist_list = [p.get() for p in results]
# Close and join all the processes back to the master process
pool.join()

I am using python 3.6.1 on Ubuntu 16.04

pre_processing.shortproc()

shortproc() only accepts traces with three-letter channel names; processing throws a 'string index out of range' error for channels which are only 2 characters long.

The issue arose when I was processing some templates I had already created using clustering.empirical_SVD(), which outputs traces with 2-character channel names (I think).

Line 94:

    tr.stats.channel = tr.stats.channel[0] + tr.stats.channel[2]
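A defensive version of that line could guard on the channel length (a sketch, not the actual fix):

    # Only shorten channels that still have three (or more) characters
    if len(tr.stats.channel) >= 3:
        tr.stats.channel = tr.stats.channel[0] + tr.stats.channel[-1]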

Brightness function - core/bright_lights.py

This function needs to be improved:

  1. In _resample_grid:
    * Remove similar moveouts based on the cumulative moveout difference between nodes
  2. When calling, only give it horizontal channels; maybe put a help message in brightness to tell people to only use horizontal channels for LFEs?
  3. Normalize the data based on the rms of the data rather than the max of the data
  4. Detect based on the cumulative network response (see Frank et al. 2014, GJI)
    * Have two vectors of the composite network response, one of the actual composite network response and another with the node for that value.
  5. Remove false detections using the network coherence (add a function?)

shortproc not working with stations with different sampling rates

Hi!

Actually, there are 2 issues.
First one: I tried to work on some data that is not daylong, using three stations with a sampling rate of 200 Hz and one at 100 Hz. If I use daylong processing on the same data (obviously on the whole-day set), this problem does not occur, but if I try to use the same stations with shortproc, I don't get past the resampling stage of shortproc...

As soon as I remove the station with the different sampling rate (100 Hz), shortproc will work, BUT only if the shortproc starttime is after the starttime of the data and the endtime is before the endtime of the data.

So, why does shortproc not work with stations with different sampling rates? And why doesn't it work if I put in the starttime and endtime of the stream, but does work with some time after and before the start/end time?

here is the script:

from eqcorrscan.core import match_filter
from eqcorrscan.utils import pre_processing
from eqcorrscan.utils.plotting import detection_multiplot
from obspy import read
import glob
import os
import time

startTime = time.time()

# Read in the templates
template_path = './templates'
template_files = glob.glob(os.path.join(template_path, '*'))

templates = []

for template_file in template_files:
    templates.append(read(template_file))
for template in templates:
    for tr in template:
        # Change channel names to two letters because EQcorrscan uses Seisan
        # style channel names, which are not standard, and should be changed!
        # Note that I might change EQcorrscan to use proper three letter
        # channel names in a future release, but I will ensure it works with
        # two letter channel names too.
        tr.stats.channel = tr.stats.channel[0] + tr.stats.channel[-1]
        # pre_processing.shortproc(st=tr, lowcut=3.0, highcut=6.0, filt_order=3, samp_rate=20)

days = os.listdir('./24h/')


for day in sorted(days):
  # Create the detections and detections/6s folders that are used for detection files
  if not os.path.isdir("detections"):
    os.makedirs("detections/6s")

  print("Days with data: ", sorted(days))
  print("Number of days with data: ", len(days))
  print("Match filter started for day: ", day)

  # Read in and process the daylong data
  # There is something wrong with the way Antelope is setting the sampling rate in the miniseeds...
  # This is why you need to manually set it to 200 Hz - the miniseeds are at 199.99 Hz.
  st = read('24h/'+day+'/*')
  print("Setting sampling rates of the traces")
  for tr in st:
    if not tr.stats.station == "VINO":
      if tr.stats.sampling_rate != 200.0:
        tr.stats.sampling_rate = 200.0
    else:
      if tr.stats.sampling_rate != 100.0:
        tr.stats.sampling_rate = 100.0

  tstarts = []
  tends = []

  for tr in st:
      tstarts.append(tr.stats.starttime)
      tends.append(tr.stats.endtime)

  tstart = max(tstarts)
  tend = min(tends)



  # st_cut = st.copy()
  # st_cut = st.trim(tstart+2, tend-2)
  # st_cut.plot()

  if len(st) == 0:
    pass
  else:


    # Use the same filtering and sampling parameters as your template!
    print("Preprocessing stream")
    st = pre_processing.shortproc(st, lowcut=3.0, highcut=6.0, filt_order=3,
                       samp_rate=20,
                       starttime=tstart+1, endtime=tend-1, debug=0)

And here is the data (all 4 stations; VINO is the one with the 100 Hz sampling rate):
https://drive.google.com/open?id=0BzMcKedS7yadQk13bWktUEJnZzA

Thank you very much!
Blaž

Prep for v 0.1.4 release.

A lot of work has been done to improve the stability of lag_calc for varying data quality; these changes should be released. Before the release, the following should be done:

  • flake8 checks of all code (include in CI);

  • Check docstrings for core modules (ordering and links, see #53):

    • core.subspace
    • core.template_gen
    • core.match_filter
    • core.bright_lights
    • core.lag_calc
  • Check docstrings for all utils modules:

    • utils.archive_read
    • utils.catalog_to_dd
    • utils.catalog_utils
    • utils.clustering
    • utils.despike
    • utils.findpeaks
    • utils.mag_calc
    • utils.parameters
    • utils.picker
    • utils.plotting
    • utils.pre_processing
    • utils.sac_util
    • utils.seismo_logs

    - [ ] utils.sfile_util (NOT DOING, migrating to obspy)

    • utils.stacking
    • utils.synth_seis
    • utils.trigger
  • Remove duplicate functions in match_filter (#51);

  • Check tutorial simple examples (#54);

  • Add additional tests for edge cases in lag-calc found in #51 (missing data, gaps, overlaps, unprocessed data, reducing cccsums);

  • Fix travis CI failing for python 3.4 (issue with matplotlib backend?);

  • Use module-level imports where possible;

  • Update 'what's new' section in docs;

  • Add doc page on how to run the tests;

  • Check docs github links.

readpicks() issue when origin and picks are on different days

I have an issue reading an S-file back in with readpicks(). The problem is that the origin time is at 23:59:54.9 and the picks are therefore on the next day, with their HHMM given as 2400; readpicks() doesn't allow the hour to be larger than 23. Here is an example of the S-file:

 2016  911 2359 54.9 L -37.345 178.756 25.0  TES  5 0.6                        1
 GAP=319        1.44     322.8   375.2547.3  0.1201E+06 -0.2039E+06 -0.1752E+06E
 ACTION:ARG 16-10-14 16:29 OP:EW-S STATUS:               ID:20160912000015     I
 DUMMY                                                                         6
 STAT SP IPHASW D HRMM SECON CODA AMPLIT PERI AZIMU VELO AIN AR TRES W  DIS CAZ7
 MXZ  HZ  P       24 0  3.33                             101   -0.7810 46.7 238 
 WMGZ EZ  P       24 0  6.73                              82    0.5810 60.7 210 
 HAZ  HZ  P       24 0 11.81                              56    0.5510 97.4 242 
 CNGZ EZ  P       24 0 15.27                              56   -0.5910  135 201 
 MWZ  HZ  P       24 0 18.47                              56    0.2410  154 224

Could not load shared library

Running Ubuntu 16.04, I followed the instructions listed here, including creating a conda virtual env and installing FFTW.

I then cloned the current repo and tried to run the test suite, but I get many failing tests with the following error:

'Could not load shared library "libutils.cpython-36m-x86_64-linux-gnu.so".\n\n /media/data/Gits/EQcorrscan/eqcorrscan/lib/libutils.cpython-36m-x86_64-linux-gnu.so: undefined symbol: GOMP_parallel_sections'

Next, I tried deactivating the conda env and installing from source (master) using:

python setup.py install 

The installation seems to have worked, but re-running the tests, the same problem arises.

Threshold testing addition

It would be really good to have a module with a series of threshold-testing functions. These would:

  1. Test the correlation threshold using random seeding of a template within a day of data or phase-randomised noise, as per Chamberlain et al. (2014);
  2. Test noise levels for a day, which could be used to calculate a rough completeness from the results of 1;
  3. Test coherence for a day of data using windows of the length of the template, for nodes distributed through the grid used by core.bright_lights.
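Point 1 could start from something as simple as this (a sketch; the SNR definition here is an assumption):

    import numpy as np

    def seed_template(noise, template, snr, rng=None):
        # Insert a scaled copy of the template at a random position;
        # snr here is peak template amplitude over noise std (assumed).
        rng = rng or np.random.default_rng()
        scaled = template * snr * np.std(noise) / np.abs(template).max()
        idx = rng.integers(0, len(noise) - len(template))
        seeded = noise.copy()
        seeded[idx:idx + len(template)] += scaled
        return seeded, idx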

The memory spike...

So I got yelled at by our friends at PAN (not really, they were very nice as usual) and my massive array job got killed.

A CPU efficiency of 10% on PAN was the result of looping over small template groups, which is a workaround for the memory-spike issue. Correct me if you don't agree with that; but if that's a fair assumption, then the memory spike needs addressing.

You've obviously put way more thought into this than I have, so there's likely a reason we can't do this, but I want to ask regardless. What if we change the preallocation of the cccs_matrix in _channel_loop from:

cccs_matrix = np.array(
    [np.array([np.array([0.0] * (len(stream[0].data) - temp_len + 1))] *
              len(templates))] * 2, dtype=np.float32)

to:

cccs_matrix = np.zeros([2, len(templates), len(stream[0].data) - temp_len + 1], dtype=np.float32)

In theory, this vastly limits the memory needed by eliminating the intermediate Python lists(?). For example, if we want to run correlations on 50 Hz day-long files for 100 one-second templates, the memory profiling outputs this:

In [47]: shape = [2, 100, 4319499]

In [48]: %memit -r 3 np.array([np.array([np.array([0.0] * shape[2])] * shape[1])] * shape[0], dtype=np.float32)
peak memory: 6609.98 MiB, increment: 6575.42 MiB

In [49]: %memit -r 3 np.zeros(shape, dtype=np.float32)
peak memory: 34.84 MiB, increment: 0.03 MiB

Which would eliminate the spike, unless I'm misunderstanding where the spike is coming from... which is likely.

Does any of that make sense @calum-chamberlain ?
