
ctlearn's People

Contributors

aribrill, bastienlacave, bryankim96, jsevillamol, juanjosemuela, lucaromanato, maxnoe, nietootein, qi-feng, rcervinoucm, sgh14, tjarkmiener, vuillaut

ctlearn's Issues

Fix bug with DataLoader metadata

The following code fails with a KeyError:

>>> from ctalearn.data_loading import HDF5DataLoader
>>> data_loader = HDF5DataLoader(['/home/shevek/datasets/sample_prototype/gamma_20deg_0deg_srun4-100___cta-prod3_desert-2150m-Paranal-HB9_cone10.h5', '/home/shevek/datasets/sample_prototype/proton_20deg_0deg_srun1-10___cta-prod3_desert-2150m-Paranal-HB9.h5'])
>>> data_loader.get_metadata()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shevek/brill/ctalearn/ctalearn/data_loading.py", line 312, in get_metadata
    metadata['total_aux_params'] += metadata['num_additional_aux_params']
KeyError: 'num_additional_aux_params'

Add removal instructions

Some users expressed concern that the recommended installation procedure takes up a considerable amount of disk space. Instructions for removing ctalearn and cleaning up all the dependencies that were installed along with it would be appreciated.

Make mapping tables configurable

At present, the mapping tables from pixel vectors to camera shapes are fixed. The following configuration options may be added:

  • Additional padding around the camera image (default: none). This could be useful for matching a shower image to the fixed size expected by a model if resizing is not preferred, or for constraining images from different cameras to have the same shape. This applies to all telescope types.
  • Hexagonal to square pixel conversion method, such as oversampling or warping, to apply (default: uncertain). Each method will require its own parameters, e.g. whether to apply smoothing when oversampling and which technique to use for it. This applies only to telescope types with hexagonal pixels.
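As a rough illustration of the padding option, the sketch below pads a 2D image with a constant border. The function name `pad_image`, the `fill` parameter, and the list-of-rows image representation are assumptions made for this example, not part of the existing mapping code.

```python
def pad_image(image, padding, fill=0.0):
    """Return a copy of a 2D image (list of rows) with `padding` extra
    pixels of value `fill` on every side. Hypothetical helper."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    width = len(image[0]) + 2 * padding
    # Rows of pure padding above the image.
    padded = [[fill] * width for _ in range(padding)]
    # Original rows, with padding to the left and right.
    for row in image:
        padded.append([fill] * padding + list(row) + [fill] * padding)
    # Rows of pure padding below the image.
    padded.extend([fill] * width for _ in range(padding))
    return padded


padded_example = pad_image([[1, 2], [3, 4]], padding=1)
```

With `padding=0` the image is returned unchanged in shape, which matches the proposed default of no padding.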

Correct telescope sorting options

Make the sort_telescopes_by_trigger option do what it says, and include the current functionality as a separate option sort_telescopes_by_size.

Reproduction of benchmark 0.2.0 results

I have reproduced the benchmark 0.2.0 results in the UCM server (unsure of the NVIDIA GPU model @nietootein ?).

| Input | Telescope Type | Train Events | Val Events | Train steps | Batch size | Train time | Val Acc | Val Gamma Acc | Val Proton Acc | Val AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| Single | LST | 161631 | 17960 | 37500 | 16 | 1h 0m 55s | 0.7034521 | 0.63914883 | 0.7676981 | 0.7905172 |
| Single | MSTF | 666288 | 74032 | 37500 | 16 | 1h 25m 39s | 0.7445564 | 0.8048835 | 0.68821025 | 0.8311684 |
| Single | MSTN | 772385 | 85821 | 37500 | 16 | 1h 29m 53s | 0.7803801 | 0.82118773 | 0.742127 | 0.8679488 |
| Single | MSTS | 541990 | 60222 | 37500 | 16 | 1h 14m 38s | 0.78469664 | 0.8445853 | 0.7228224 | 0.86962146 |
| Single | SST1 | 379611 | 42180 | 37500 | 16 | 55m 59s | 0.7793741 | 0.8133864 | 0.74446315 | 0.8592271 |
| Single | SSTA | 404866 | 44986 | 37500 | 16 | 1h 7m 45s | 0.72549236 | 0.66570693 | 0.78389066 | 0.8107811 |
| Single | SSTC | 392626 | 43626 | 37500 | 16 | 1h 5m 31s | 0.7493009 | 0.75805485 | 0.7405917 | 0.8215633 |
| Array | LST | 76860 | 8541 | 37500 | 16 | 37m 27s | 0.7280178 | 0.8020253 | 0.6643433 | 0.82055163 |
| Array | MSTF | 224831 | 24982 | 37500 | 16 | 2h 7m 11s | 0.80393887 | 0.8210466 | 0.78773 | 0.895706 |
| Array | MSTN | 242425 | 26937 | 37500 | 16 | 2h 11m 18s | 0.8277462 | 0.87108314 | 0.7881501 | 0.9178416 |
| Array | MSTS | 200745 | 22306 | 37500 | 16 | 2h 31m 13s | 0.8198691 | 0.8415895 | 0.7982652 | 0.9099872 |
| Array | SST1 | 178090 | 19788 | 37500 | 16 | 3h 13m 27s | 0.7991712 | 0.7722142 | 0.8288231 | 0.89470106 |
| Array | SSTA | 165302 | 18367 | 37500 | 16 | 2h 9m 23s | 0.76920563 | 0.7152889 | 0.82571363 | 0.86622685 |
| Array | SSTC | 171574 | 19064 | 37500 | 16 | 1h 51m 9s | 0.8208665 | 0.81669563 | 0.8252668 | 0.90955627 |

The results are comparable to those reported in config/v_0_2_0_benchmark/readme.md.

The models where the difference in accuracy was more than 1% are: SSTC single telescope, SST1 array, SSTA array, SSTC array. In all those cases the difference in accuracy was below 2%.

The difference in AUROC was below 0.01 in magnitude for all models except SSTA array, where it was 0.01062685.

The difference in train times is more significant, but this can be easily explained by the use of different machines.

Refactor data loading

Refactor load_HDF5_data.py to have a data loader class that parses settings and has methods to return numpy arrays of data, instead of a set of free-floating functions that must be called in an undocumented order. Instead of being stored in separate external dictionaries, the metadata, auxiliary data, and processed parameters should be stored internally in the class. The class should have a legible storage structure that distinguishes between parameters inherent in the dataset (metadata, auxiliary data) and those that rely on the settings specified by the user (the settings arguments and "processed parameters").

A suggested API is as follows:
class HDF5_data_loader(data_files, data_loading_settings, data_processing_settings, image_mapping_settings)
The settings arguments are dictionaries with settings for methods implemented directly in this module, in process_data.py, and in image_mapping.py, respectively.

HDF5_data_loader.load_data(filename, index)
Return the data from the specified filename and index as a numpy array. Note that because data_loader already knows from its settings whether single or multiple tel data are requested, and the metadata and auxiliary data are stored in the class, it's now only necessary to specify the filename and index. The method should automatically return the correct kind of data.

HDF5_data_loader.get_generators(training=False, validation=False, test=False)
Returns the specified generators. Allowed arguments are training=True, validation=False, test=False; training=True, validation=True, test=False; training=False, validation=False, test=True; all other combinations raise an error.
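The allowed flag combinations could be checked along these lines. This is only a sketch: `check_generator_flags` and `ALLOWED_COMBINATIONS` are hypothetical names, and the actual generators are left out.

```python
# Hypothetical sketch of validating the flag combinations described above.
ALLOWED_COMBINATIONS = {
    (True, False, False),   # training only
    (True, True, False),    # training and validation
    (False, False, True),   # test only
}


def check_generator_flags(training=False, validation=False, test=False):
    """Return the flags as a tuple, or raise ValueError if the
    combination is not one of the allowed ones."""
    flags = (bool(training), bool(validation), bool(test))
    if flags not in ALLOWED_COMBINATIONS:
        raise ValueError(
            "Unsupported combination: training={}, validation={}, "
            "test={}".format(*flags))
    return flags
```

Note that under this rule, calling the method with all defaults (no generator requested) also raises an error.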

Depends on #14.

Rewrite MobileNet implementation

The current MobileNet code has a dependency on Tensorflow slim, with the implication that the train op must be implemented in slim as well. Rewrite it using the standard layers API.

Add time channel to images

This is a request for a new feature.

So far only single-channel images, in which the channel contains the image charge, are loaded and passed to the networks. The arrival time of that charge is also available for most telescope types (except for ASTRI) and is stored in the DL1 h5 files. These arrival times have been used in the past to help "clean" the charge image: the arrival times of pixels that receive most of their charge from shower photons are correlated, whereas the arrival times of neighboring pixels illuminated only by night sky background are typically uncorrelated. It would therefore be interesting to parse our data as two-channel images, with one channel containing the charge and the other the arrival times, in the hope that the additional timing channel improves on event reconstruction performed with the charge alone.

Implementation of this new feature seems more natural after #29 is resolved.

Make train.py compatible with the data_loader and data_processor classes

Split data_processing_settings into data_loading_settings and data_processing_settings. There should be four dictionaries of data-related settings:
data_input_settings: for TensorFlow Estimator input_fn
data_loading_settings: for loading HDF5 data (methods directly implemented in load_HDF5_data.py)
Includes validation_split, min_num_tels and cut_condition, use_telescope_position, chosen_telescope_types, model_type
data_processing_settings: for processing data in process_data.py
Includes sort_telescopes_by_trigger, crop_images, log_normalize_charge, all cleaning and cropping options
image_mapping_settings: for mapping pixels to arrays in image_mapping.py, to be implemented in a separate issue #10

Rewrite the section starting on line 140 with "if data_format == 'hdf5':" to use the data_loader class methods.

Depends on #15 and #16.

Specify prediction output format

Specify the predict output format and make predict.py able to both return data in that format and write it to file. A suggested format is a list of filenames and a numpy array of file_index, event_index, predictions, and classifier values, where file_index is the index in the list of filenames and event_index is the index in the event table for event classification or the index in the telescope table for single telescope classification. Predictions and classifier values are the contents of the dictionary returned by tf.Estimator.predict(). The output format needs to handle both the case in which the true labels are available (simulations) and the case in which they are not (data). The long-term aim is for it to be easy for ctapipe to read in and translate the predictions into its native format.

ZeroDivisionError in `_apply_cuts` during `HDF5DataLoader` initialization

Configuration file of the run:
20180905_231431_config.txt

(it's actually a .yml file, but I had to change the extension to upload it)

List of example files used (they are part of the benchmark):
sample_files.txt

Traceback:

Traceback (most recent call last):
  File "/home/jsevillamol/Documentos/ctlearn_clean/ctlearn/run_model.py", line 459, in <module>
    run_model(config, mode=args.mode, debug=args.debug, log_to_file=args.log_to_file)
  File "/home/jsevillamol/Documentos/ctlearn_clean/ctlearn/run_model.py", line 130, in run_model
    **data_loading_settings)
  File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/ctlearn/data_loading.py", line 136, in __init__
    self._apply_cuts()
  File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/site-packages/ctlearn/data_loading.py", line 536, in _apply_cuts
    self.class_weights.append(num_examples/float(self.passing_num_examples_by_particle_id[particle_id]))
ZeroDivisionError: float division by zero
Closing remaining open files:/home/jsevillamol/Documentos/datasample/gamma_20deg_0deg_srun103-219___cta-prod3_desert-2150m-Paranal-HB9_cone10.h5...done/home/jsevillamol/Documentos/datasample/proton_20deg_0deg_srun1-10___cta-prod3_desert-2150m-Paranal-HB9.h5...done

You can also reproduce it by running just this:

# Imports needed to run this snippet on its own (the ImageMapper module path is assumed from the image_mapping.py module name).
from ctlearn.data_loading import HDF5DataLoader
from ctlearn.image_mapping import ImageMapper

data_files = ['/home/jsevillamol/Documentos/ctlearn/datasample/gamma_20deg_0deg_srun103-219___cta-prod3_desert-2150m-Paranal-HB9_cone10.h5', '/home/jsevillamol/Documentos/ctlearn/datasample/proton_20deg_0deg_srun1-10___cta-prod3_desert-2150m-Paranal-HB9.h5']
image_mapping_settings = {'hex_conversion_algorithm': 'oversampling', 'padding': {'LST': 2, 'MSTF': 1, 'MSTN': 2, 'MSTS': 4, 'SST1': 1, 'SSTA': 0, 'SSTC': 0, 'VTS': 1}}
data_loading_settings = {'cut_condition': '(mc_energy > 1.0) & (h_first_int < 20000)', 'example_type': 'array', 'min_num_tels': 1, 'seed': 1234, 'selected_tel_ids': [1, 2, 3, 4], 'selected_tel_type': 'LST', 'validation_split': 0.1}

data_loader = HDF5DataLoader(
    data_files,
    mode='train',
    image_mapper=ImageMapper(**image_mapping_settings),
    **data_loading_settings)

(change data_files to whichever location has the relevant files)
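The failing line divides by the number of examples of a class that passed the cuts, so it breaks whenever a class ends up with zero passing examples. One possible guard is sketched below. It is not the project's fix: `compute_class_weights` is a hypothetical helper, the counts are made up for illustration, and it assumes the weight of a class is the total number of examples divided by that class's count.

```python
def compute_class_weights(num_examples_by_class):
    """Weight each class by total / count, failing with a clear message
    when a class has no examples left after the cuts. Hypothetical helper."""
    empty = sorted(c for c, n in num_examples_by_class.items() if n == 0)
    if empty:
        raise ValueError(
            "No examples passed the cuts for class(es) {}; check "
            "cut_condition, min_num_tels and the selected telescopes."
            .format(empty))
    total = sum(num_examples_by_class.values())
    return {c: total / float(n) for c, n in num_examples_by_class.items()}


# Illustrative counts only (class id -> number of passing examples).
weights = compute_class_weights({0: 30, 1: 10})
```

Raising early with the name of the empty class makes it easier to tell whether the cuts or the telescope selection removed every example.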

Clean up repository

Ensure that the Readme is up to date with all the changes for v0.2.0. Update the version number in setup.py.

Create a directory called deprecated in the models directory and move all the unused models there. The unused models are all but cnn_rnn.py, single_tel.py, and variable_input_model.py. The deprecated models will be removed in the next release if not used by then.

Remove the unnecessary ctalearn/ctalearn/scripts directory. Put train.py and predict.py in ctalearn/ctalearn/. (These will be merged into a single module, see #22.)

Create a ctalearn/config directory and move example_config.ini there. This is also where config files for standard networks will go.

Create a ctalearn/scripts directory and move plot_classifer_value.py, plot_roc_curves.py, visualize_bounding_boxes.py, train_configurations.py, and test_metadata.py there.

Remove plot_gpu_util.py as it is not relevant for using ctalearn.

Move ctalearn/misc/images/ to ctalearn/images and delete the ctalearn/misc/ directory.

Allow array-level models to handle additional auxiliary info

In the current implementation of combine_telescopes in the variable input model, the only possible auxiliary info are the telescope positions and telescope triggers. Because cropping is now an option, the shower centroid (x, y) should be provided as additional auxiliary info so the model knows where in the camera the shower was detected.

One implementation to cleanly allow this is to rename the telescope_positions to telescope_auxiliary_info and concatenate the telescope positions with the shower positions to produce a single auxiliary info tensor. The model should be provided with some structure allowing it to parse the tensor into its components if needed. It is also critical that the total number of auxiliary inputs per telescope be passed into the network, which will require the metadata to be updated after the data processing arguments are known. Therefore this logic should be handled in add_processed_parameters() in data.py.

Replace ConfigObj config file format with YAML

The train_configurations.py script must be rewritten to use a YAML-formatted config file. The example config file needs to be rewritten. Any relevant value checking and exceptions must be added. Loading and reading the config in run_model.py must also be refactored. The variable input model must be updated to match the new configuration option names.

The standard pattern for loading configuration options is as follows: The settings used for each part of the pipeline/each class (DataLoader, DataProcessor, ImageMapper) should be collected in separate sections within the config file and read in as a complete dictionary. This dictionary is then unpacked directly into the constructor of the corresponding class to pass all desired settings (which should all be implemented as keyword arguments).
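A minimal sketch of this pattern follows. The classes `ExampleImageMapper` and `ExampleDataLoader` are simplified stand-ins written for this example, the section names are illustrative, and the `config` dictionary stands in for what a YAML parser would return for the config file.

```python
class ExampleImageMapper:
    """Simplified stand-in; every setting is a keyword argument with a default."""

    def __init__(self, hex_conversion_algorithm="oversampling", padding=None):
        self.hex_conversion_algorithm = hex_conversion_algorithm
        self.padding = padding or {}


class ExampleDataLoader:
    """Simplified stand-in; every setting is a keyword argument with a default."""

    def __init__(self, file_list, image_mapper=None,
                 validation_split=0.1, min_num_tels=1):
        self.file_list = file_list
        self.image_mapper = image_mapper
        self.validation_split = validation_split
        self.min_num_tels = min_num_tels


# Stand-in for the parsed config file: one section per class.
config = {
    "Image Mapping": {"padding": {"LST": 2}},
    "Data Loading": {"validation_split": 0.2},
}

# Each section is unpacked directly into the matching constructor.
# A missing section falls back to an empty dict, so defaults apply.
image_mapper = ExampleImageMapper(**config.get("Image Mapping", {}))
data_loader = ExampleDataLoader(
    ["example.h5"],
    image_mapper=image_mapper,
    **config.get("Data Loading", {}))
```

A side effect of unpacking is that a misspelled option name in the config file raises a TypeError in the constructor instead of being silently ignored.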

Interpreting output of training

I have run one of the benchmark configurations (specifically config/v_0_2_0_benchmarks/LST_cnn_rnn_config.yml) in training mode.

The beginning of the logfile.log produced is as follows:

INFO:Batch size: 16
INFO:Training and evaluating...
INFO:Total number of training events: 76860
INFO:Total number of validation events: 8541
INFO:Number of training steps per epoch: 4803
INFO:Number of training steps per validation: 2500
INFO:Total number of examples: 377098
INFO:Number of gamma (class 0) examples: 189633 (50.287%)
INFO:Number of proton (class 1) examples: 187465 (49.713%)

This prompts the following questions:

  1. I would have assumed that the Total number of examples was the sum of the training and validation events, but it is not. What does it represent, then? In single telescope runs it does equal the sum of the training and validation events.
  2. Minor suggestion: Number of training steps per validation should be changed to Number of training steps *between* validations. I got very confused trying to understand what it meant.

After each evaluation, a line is produced saying things like INFO:Saving dict for global step 2500: accuracy = 0.6158129, accuracy_gamma = 0.6657754, accuracy_proton = 0.5658949, auc = 0.6625105, global_step = 2500, loss = 1.3074992.

I assume that those are the metrics on the validation set (correct?). Are they just stored in the eval_validation/events.out.tfevents file?

If so, how could I produce something similar to the plots in config/v_0_2_0_benchmark/readme.md? The plotting scripts in scripts seem to expect a list of .csv files as an argument. I assume this would be the output of running a trained net in prediction mode, but how do I feed the validation set I used for training to the trained net for predictions?

Also, what is the easiest way of reading the training time off the output?

One more question: since the number of training steps per epoch is different but the number of epochs is the same in each config.yml, how come that the last checkpoint is always in step 37500?

Refactor data processing

Refactor process_data.py to have a data processor class for data manipulation and augmentation with methods that accept numpy arrays of unprocessed data and return numpy arrays of processed and/or augmented data. The implementation should be generic with no dependencies on other ctalearn modules.

A suggested API is as follows:

class data_processor(data_processing_settings)
The data_processing_settings are a dictionary that is passed from data_loader.

data_processor.process_data(data)
Argument: numpy array of data
Returns a numpy array of processed data.

data_processor.augment_data(data)
Argument: numpy array of data
Returns a numpy array of augmented data. Since no data augmentation is currently implemented, this function doesn't do anything yet, but it is where this functionality will go in the next version.

Depends on #14.

Add config files and optional graphs for important results

Add config files for any results that we use as benchmarks or have shown in presentations. This will aid in archiving and reproducing the results. We should show benchmarks for the newest version of ctalearn, so we can rerun a single telescope network when the other updates have been made and use that as the new benchmark. We should also include the config files for the CNN_RNN results that have been shown in presentations and posters. Optionally, graphs and descriptions can be added as well.

The config files should go in ctalearn/config/ and any plots in ctalearn/images/ (see #25).

Add mapping tables for all telescope types

At present the only mapping table in image.py from vectors of pixels to the image shape is for the SCT (MSTS). In order to process data for other telescope types, mapping tables for other telescope types must be added. It would make sense to start with square-pixel telescopes, as there isn't yet a standard method for converting hexagonal pixels to a square grid.

Make package compatible with ConfigObj

Example configuration files are currently being stored in two locations, ctalearn/config and ctalearn/ctalearn/config. Determine which location is better and merge the two directories.

The script train_configurations.py for automating hyperparameter searches relies heavily on the ConfigParser file format. Update it to match the new configuration file format.

As currently written, run_models.py requires an additional command line argument for the config spec. This is unnecessary because the configuration spec won't change from run to run. Instead hardcode the path to the config spec. See f4a420f and ffcfda8 for a good way to do this.

Update variable_input_model.py to use the new configuration option names.

Add a comment in example_config.conf to indicate that the allowed values and types for each option are identified in config_spec.ini.

Fix or remove test_metadata.py

The use case of the script test_metadata.py is unclear and needs to be clearly defined.

Assuming it's worth keeping, several problems need to be fixed. As written, sections are missing and it doesn't run. There is a hardcoded path on line 26 that needs to be removed. The name "test_metadata" is ambiguous and the script should be renamed to remove the word "test". It also should be updated to be compatible with the HDF5_data_loader class (see #15).

Alternatively, if the script isn't worth keeping, remove it.

Implement Single Telescope training capability

Currently the approach used for training single tel models is to attach a logits layer directly to an array-model CNN-block. It may be worthwhile to implement a way to train arbitrary single tel models without this restriction.

Deep copy nested dicts in run_multiple_configurations.py run combinations file

Dictionary config arguments in the run combinations file outputted by run_multiple_configurations.py are sometimes not deep copied, with references appearing instead of the actual configuration parameter. For example, this occurs in the layers parameter recorded in run_combinations.py created when running the v0.2.0 benchmark config (snippet below). This issue doesn't seem to affect performance, just the appearance of the parameters in the saved file.

run00:
  batch_size: 64
  example_type: single_tel
  layers: &id001
  - filters: 32
    kernel_size: 3
  - filters: 32
    kernel_size: 3
  - filters: 64
    kernel_size: 3
  - filters: 128
    kernel_size: 3
  learning_rate: 5.0e-05
  model:
    function: single_tel_model
    module: single_tel
  sorting: null
  tel_type: LST
run01:
  batch_size: 16
  example_type: array
  layers: *id001
  learning_rate: 0.0001
  model:
    function: cnn_rnn_model
    module: cnn_rnn
  sorting: size
  tel_type: LST
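YAML serializers such as PyYAML write an anchor (`&id001`) and aliases (`*id001`) when the same Python object appears more than once in the data being dumped. The sketch below shows the shared reference and one way to break it with `copy.deepcopy`. The dictionary layout is illustrative and is not taken from run_multiple_configurations.py.

```python
import copy

base_layers = [
    {"filters": 32, "kernel_size": 3},
    {"filters": 64, "kernel_size": 3},
]

# Shared reference: both runs point at the same list object, which a YAML
# dumper would typically write once and then refer to with an alias.
shared = {
    "run00": {"layers": base_layers},
    "run01": {"layers": base_layers},
}

# Deep-copying each run separately gives every run its own list.
# (A single copy.deepcopy(shared) would preserve the sharing, because
# deepcopy keeps track of objects it has already copied within one call.)
independent = {name: copy.deepcopy(settings)
               for name, settings in shared.items()}
```

After the per-run deep copy the values are still equal, but no list is shared between runs, so each run's parameters would be written out in full.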

Add option to list data files directly in configuration file

Since YAML provides a convenient way to include lists directly in the config file, allow Data:file_list to accept either a path to a file containing file paths (the current method) or a list of file paths written directly in the config file.
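Accepting both forms could look roughly like this. `resolve_file_list` is a hypothetical helper, and the sketch assumes the file-list file holds one path per line.

```python
import os
import tempfile


def resolve_file_list(file_list):
    """Return a list of data file paths. `file_list` is either a path to a
    text file with one path per line (assumed format) or a list of paths."""
    if isinstance(file_list, str):
        with open(file_list) as f:
            # Skip blank lines and strip surrounding whitespace.
            return [line.strip() for line in f if line.strip()]
    return [str(path) for path in file_list]


# Usage: both forms give the same result.
with tempfile.TemporaryDirectory() as tmp_dir:
    list_path = os.path.join(tmp_dir, "file_list.txt")
    with open(list_path, "w") as f:
        f.write("gamma.h5\nproton.h5\n")
    from_path = resolve_file_list(list_path)

from_list = resolve_file_list(["gamma.h5", "proton.h5"])
```

Checking for `str` first matters because a string is itself iterable and would otherwise be treated as a list of one-character paths.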

Telescope IDs are not unique, causing critical error

The current design of DataLoader relies on the tel_id parameter as a unique key to index the telescopes, making checks that each tel_id corresponds to a telescope of the correct type. However, this assumption is invalid. The telescope IDs in the MLProto dataset are not unique. The conflict is with SST1. IDs 1-4 are used by both SST1 and LST; 5-29 by SST1 and MSTF; and 30-33 by SST1 and MSTN.

For example, tel_id 1 is assigned to SST1, so when running a model using LST data, run_model.py crashes with the following output:

Traceback (most recent call last):
  File "ctlearn/ctlearn/run_model.py", line 456, in <module>
    run_model(config, mode=args.mode, debug=args.debug, log_to_file=args.log_to_file)
  File "ctlearn/ctlearn/run_model.py", line 130, in run_model
    **data_loading_settings)
  File "/home/shevek/software/anaconda3/envs/testing-brill/lib/python3.6/site-packages/ctlearn/data_loading.py", line 125, in __init__
    self._select_telescopes(selected_tel_type, tel_ids=selected_tel_ids)
  File "/home/shevek/software/anaconda3/envs/testing-brill/lib/python3.6/site-packages/ctlearn/data_loading.py", line 366, in _select_telescopes
    raise ValueError("Selected tel id {} is of wrong tel type {}.".format(tel_id, all_tel_ids[tel_id]))
ValueError: Selected tel id 1 is of wrong tel type LST.

The treatment of tel_ids in DataLoader needs to be rewritten to accommodate this. The best way may be to move away from integer tel_ids, so that the overall tel_id could be either a string of the telescope type concatenated with the tel_id number, perhaps with a separator, or a tuple of (tel_type, tel_id).
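The tuple variant might be sketched as follows. The function `select_telescopes` and the small telescope set here are hypothetical and only illustrate that (tel_type, tel_id) keys stay unique even when the integer IDs collide.

```python
# Hypothetical telescope table keyed by (tel_type, tel_id). The integer
# IDs 1 and 2 are used by both LST and SST1, but the tuple keys are unique.
all_telescopes = {("LST", 1), ("LST", 2), ("SST1", 1), ("SST1", 2)}


def select_telescopes(telescopes, tel_type, tel_ids=None):
    """Return the selected (tel_type, tel_id) keys for one telescope type.
    With tel_ids=None, all telescopes of that type are selected."""
    if tel_ids is None:
        return sorted(key for key in telescopes if key[0] == tel_type)
    selected = [(tel_type, tel_id) for tel_id in tel_ids]
    missing = [key for key in selected if key not in telescopes]
    if missing:
        raise ValueError(
            "Selected telescopes not found: {}".format(missing))
    return selected


lst_selection = select_telescopes(all_telescopes, "LST", tel_ids=[1, 2])
```

With composite keys there is no longer a "wrong tel type" case for a given ID; a selection either exists for the requested type or it does not.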

Append CTLearn version to the copy of the configuration file generated for each training

To ensure the reproducibility of each training run, a copy of the parsed config file is stored along with the rest of the training outputs. Ideally, such reproducibility should not depend on the version of the code used for training, although this dependency might be present during the pre-release development phase. Appending the code version to the copy of the configuration file may therefore be helpful.

Define API for accessing DataLoader class attributes

All HDF5DataLoader class attributes are currently specific to that class as opposed to DataLoader, but are accessed outside the class in run_model.py.

When defining data_loader, generator_output_dtypes, map_fn_output_dtypes, output_names, and output_is_label are all accessed. It doesn't seem these are specific to the data format, so if they are made part of the DataLoader base class, then this entire section except for data_loader = HDF5DataLoader() can be pulled out of the if clause, resulting in cleaner code.

Similarly, when getting the event indices in predict mode, example_type and examples are accessed and the event index names are manually defined. Again, it may be possible to define these so that the event indices, whatever they may be, can be accessed for any DataLoader without relying on attributes specific to each DataLoader subclass.

This issue is probably best tackled at the time we actually add another DataLoader subclass.

Add missing requirements

Add the following missing requirements to the requirements files: scipy, configobj, validate.

Improve configurability of training hyperparameters

At present, most of the training hyperparameters are required config options. Make these optional, providing reasonable defaults.

Add the ability to choose any optimizer available in TensorFlow (or at least all the commonly used ones), as well as the ability to configure their parameters. Since there are a variety of optimizers and they all take different arguments, this could be accomplished by just having an optimizer_arguments dictionary that is passed directly to the optimizer on initialization without any intermediate parsing. This would replace the current adam_epsilon configuration parameter.
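The pass-through idea can be sketched without TensorFlow. `ExampleSGD`, `ExampleAdam`, `OPTIMIZERS`, and `build_optimizer` are hypothetical stand-ins written for this example; in the real code the mapping would point at the TensorFlow optimizer classes instead.

```python
class ExampleSGD:
    """Hypothetical stand-in for an optimizer class."""

    def __init__(self, learning_rate, momentum=0.0):
        self.learning_rate = learning_rate
        self.momentum = momentum


class ExampleAdam:
    """Hypothetical stand-in for an optimizer class."""

    def __init__(self, learning_rate, epsilon=1e-8):
        self.learning_rate = learning_rate
        self.epsilon = epsilon


# Maps the optimizer name given in the config to a class.
OPTIMIZERS = {"SGD": ExampleSGD, "Adam": ExampleAdam}


def build_optimizer(name, learning_rate, optimizer_arguments=None):
    """Create the named optimizer, passing optimizer_arguments straight
    through to its constructor without any intermediate parsing."""
    try:
        optimizer_class = OPTIMIZERS[name]
    except KeyError:
        raise ValueError("Unknown optimizer: {}".format(name))
    return optimizer_class(learning_rate, **(optimizer_arguments or {}))


optimizer = build_optimizer("Adam", 5.0e-05, {"epsilon": 1.0e-08})
```

Because the dictionary is not parsed, an argument that the chosen optimizer does not accept surfaces as a TypeError from its constructor.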

Add options for additional training hyperparameters such as regularization type and strength and learning rate annealing.

Reproducibility

Hi,

I'm trying to run your models and reproduce your results, but it wasn't possible because there aren't example datasets in your repository.

Can you provide links where we can download some simtel or hdf5 files you use to train the models?

Greetings,
-- mavillan

Add Unit Tests

Although other priorities (reorganizing the code, implementing new functionality, and making the project accessible and available for outside use) have been our primary focus so far, as the codebase becomes larger and we consider the possibility of outside contributions it seems like a good idea to begin looking seriously at implementing tests for maintaining and monitoring the code quality.

The project is relatively small and there is no need for the sort of serious testing that is used in more complicated software. However, with several interacting components, it seems like a few simple tests may be appropriate and helpful. Testing portions of the code in isolation should make it easier to recognize bugs and easier to test new changes without having to run the full training pipeline and search for errors by hand. Tests may also be helpful as a way of evaluating pull requests and preventing code regression/breaking changes.

The framework for testing and continuous integration with pytest and TravisCI is already in place; all that remains is to write the tests themselves. The overall approach was based on the implementation of tests in ctapipe, which may be a helpful example.

Maybe we can discuss below what (if any) tests might be appropriate/helpful and who would be willing to write them.

Move open_file and close_file calls outside load_data functions

If Dataset.from_generator is changed to support multi-threading like Dataset.map, calls to open HDF5 files should preferably be moved outside of the load_data functions, i.e. all HDF5 files should be opened in advance and the file handles passed into the load_data functions instead of filename strings. This avoids many unnecessary file open and close calls.

On a test of loading 1000 examples from a single HDF5 file with load_data_single_tel_HDF5, moving the file open and closing outside of load_data_single_tel_HDF5 (instead of calling them for each example) reduced runtime by ~30%. Improvement is probably much larger when >>1000 examples are read per HDF5 file.
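The pattern of opening each file once and handing the open handle to the loading function can be sketched with plain text files standing in for HDF5 files. `load_example` and `load_examples` are hypothetical names, and real code would hold HDF5 file handles instead.

```python
import contextlib
import os
import tempfile


def load_example(file_handle, index):
    """Stand-in for a load_data function that takes an already open
    handle instead of a filename."""
    file_handle.seek(0)
    return file_handle.read().splitlines()[index]


def load_examples(requests):
    """Open each file named in `requests` once, then serve every
    (filename, index) pair from the open handles."""
    with contextlib.ExitStack() as stack:
        handles = {}
        for filename, _ in requests:
            if filename not in handles:
                # ExitStack closes every registered file on exit.
                handles[filename] = stack.enter_context(open(filename))
        return [load_example(handles[filename], index)
                for filename, index in requests]


# Usage with a temporary text file: one open call, two reads.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "events.txt")
    with open(path, "w") as f:
        f.write("event0\nevent1\nevent2\n")
    examples = load_examples([(path, 2), (path, 0)])
```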

Fix logging of array examples by class

When running an array-level model HDF5DataLoader.log_class_breakdown() incorrectly reports the number of examples by class. For example, from the logfiles produced when running the LST single tel and CNN-RNN v0.2.0 benchmarks, single tel has the correct behavior:

INFO:Batch size: 64
INFO:Training and evaluating...
INFO:Total number of training events: 161631
INFO:Total number of validation events: 17960
INFO:Number of training steps per epoch: 2525
INFO:Number of training steps per validation: 2500
INFO:Total number of examples: 179591
INFO:Number of gamma (class 0) examples: 89165 (49.649%)
INFO:Number of proton (class 1) examples: 90426 (50.351%)

but for the array-level model, CNN-RNN, the total number of examples and examples by class are much larger than the actual number of events used (in this case, 76860+8541=85401):

INFO:Batch size: 16
INFO:Training and evaluating...
INFO:Total number of training events: 76860
INFO:Total number of validation events: 8541
INFO:Number of training steps per epoch: 4803
INFO:Number of training steps per validation: 2500
INFO:Total number of examples: 377098
INFO:Number of gamma (class 0) examples: 189633 (50.287%)
INFO:Number of proton (class 1) examples: 187465 (49.713%)

This may be because the function is reporting the total number of examples in the dataset before the min_num_tels and possibly cut_condition cuts are applied.

Benchmark in peak_times channel mode

| input_type | tel_type | auroc |
|---|---|---|
| array | LST | 0.8286985 |
| array | MSTF | 0.89471704 |
| array | MSTN | 0.9177482 |
| array | MSTS | 0.91144246 |
| array | SST1 | 0.8937345 |
| array | SSTA | 0.86309916 |
| array | SSTC | 0.90838027 |
| single_tel | LST | 0.7877764 |
| single_tel | MSTF | 0.8263566 |
| single_tel | MSTN | 0.8663868 |
| single_tel | MSTS | 0.866212 |
| single_tel | SST1 | 0.85580283 |
| single_tel | SSTA | 0.8018345 |
| single_tel | SSTC | 0.80345446 |

Troubleshooting the "Run a Model" section of the readme

I am trying to run a sample model with the example_config.yml provided in the repo.

With a terminal opened in the root directory of the repo (/ctlearn), I ran the following two commands:


export CTLEARN_DIR=/home/jsevillamol/Documentos/ctlearn/ctlearn
python $CTLEARN_DIR/run_model.py config/example_config.yml

This produces the following error:

Traceback (most recent call last):
  File "/home/jsevillamol/Documentos/ctlearn/ctlearn/run_model.py", line 459, in <module>
    run_model(config, mode=args.mode, debug=args.debug, log_to_file=args.log_to_file)
  File "/home/jsevillamol/Documentos/ctlearn/ctlearn/run_model.py", line 65, in run_model
    model_module = importlib.import_module(config['Model']['model']['module'])
  File "/home/jsevillamol/anaconda3/envs/ctlearn/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'cnn_rnn'

But the cnn_rnn.py file is in its proper folder (ctlearn/models).

What am I doing wrong?

I am using Python 3.6 on Ubuntu 16.04, with Tensorflow-gpu 1.10
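For context, `importlib.import_module` only finds a module by its bare name if the directory containing it is on `sys.path`. A file sitting in a models folder is not enough by itself. Whether that is the cause here is not confirmed in this report. The sketch below shows the general behaviour using a throwaway directory and a made-up module name, `example_model_module`.

```python
import importlib
import os
import sys
import tempfile

# Throwaway directory holding one module, standing in for a models directory.
models_dir = tempfile.mkdtemp()
with open(os.path.join(models_dir, "example_model_module.py"), "w") as f:
    f.write("def build():\n    return 'built'\n")


def import_from_directory(module_name, directory):
    """Import module_name after making sure directory is on sys.path."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Without the directory on sys.path the import fails.
try:
    importlib.import_module("example_model_module")
    found_without_path = True
except ModuleNotFoundError:
    found_without_path = False

# With the directory on sys.path the same name resolves.
module = import_from_directory("example_model_module", models_dir)
```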

Merge predict.py into train.py

Combine the functionality of predict.py into train.py. Prediction will be toggled on as a mode, train.py --predict. The current implementation contains a very large amount of duplicated code that changes often, which is unsustainable. Using the script train.py for prediction is a slight misnomer, but ignore this for now. Add a [Predict] section to the config file including options for whether to export as a file, the filename, and whether the true label is present in the data files.

Refactor image.py to use a class

To reduce ambiguity, rename the module to image_mapping.py.

A suggested API is as follows:

class image_mapper(image_mapping_settings)
Argument: dictionary of settings. Right now there isn't anything relevant implemented. See #10 for more details on what the settings should be.

image_mapper.map_image(pixels, telescope_type)
Arguments: pixels is a numpy array of values for each pixel, in order of pixel index. The array has dimensions [N_pixels, N_channels] where N_channels is e.g. 1 when just using charges and 2 when using charges and peak arrival times. telescope_type is a string specifying the telescope type as defined in the HDF5 format, for example 'MSTS' for SCT data, which is the only currently implemented telescope type.
Returns: A numpy array of data with shape [length, width, depth] corresponding to a telescope image mapped to an array.
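
The suggested API could look roughly like the sketch below. It is written as a class named ImageMapper (mirroring the proposed image_mapper) and uses a made-up 'TOY' telescope type with a four-pixel mapping table, because the real table for 'MSTS' depends on the camera geometry and is not given here. Both the table and the settings handling are assumptions.

```python
import numpy as np

# Hypothetical toy mapping table: row i gives the (row, column) position of
# pixel i in the output image. A real table for a telescope type such as
# 'MSTS' would be derived from the camera geometry.
TOY_MAPPING_TABLES = {
    "TOY": np.array([[0, 0], [0, 1], [1, 0], [1, 1]]),
}

class ImageMapper:
    def __init__(self, image_mapping_settings=None):
        # No settings are interpreted yet; they are only stored.
        self.settings = dict(image_mapping_settings or {})
        self.mapping_tables = TOY_MAPPING_TABLES

    def map_image(self, pixels, telescope_type):
        """Map a [N_pixels, N_channels] array to a [length, width, depth] image."""
        table = self.mapping_tables[telescope_type]
        pixels = np.asarray(pixels)
        if pixels.ndim != 2 or pixels.shape[0] != table.shape[0]:
            raise ValueError("pixels must have shape [N_pixels, N_channels]")
        length = table[:, 0].max() + 1
        width = table[:, 1].max() + 1
        image = np.zeros((length, width, pixels.shape[1]), dtype=pixels.dtype)
        # Place every pixel's channel vector at its mapped position.
        image[table[:, 0], table[:, 1], :] = pixels
        return image

mapper = ImageMapper()
pixels = np.arange(8).reshape(4, 2)  # 4 pixels, 2 channels (e.g. charge and time)
image = mapper.map_image(pixels, "TOY")
print(image.shape)
```

Keeping the mapping tables in a dictionary keyed by telescope type means further telescope types can be added without changing map_image.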

Clean up input_fn in run_model.py

Use dictionary unpacking to pass in settings instead of passing the dictionary directly. This gives a convenient way to supply default arguments, making it easier to have optional data input settings. At least one option that is currently configurable should be hardcoded, namely whether to use dataset.map(), since this is already constrained by the choice of data format. Also investigate whether prefetching is worth exposing as a configurable option, or whether there is a best practice that could simply be hardcoded.
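
As an illustration of the dictionary unpacking idea, the sketch below uses a simplified stand-in for input_fn. The parameter names batch_size, shuffle and prefetch, and their default values, are assumptions for the example and are not taken from run_model.py.

```python
def input_fn(data, batch_size=64, shuffle=True, prefetch=True):
    """Hypothetical stand-in for input_fn: returns the effective settings."""
    # Keyword defaults replace manual lookups such as settings.get('batch_size', 64).
    return {
        "num_examples": len(data),
        "batch_size": batch_size,
        "shuffle": shuffle,
        "prefetch": prefetch,
    }

# Settings as they might be read from the config file; only one option is set.
data_input_settings = {"batch_size": 32}

# Unpacking the dictionary fills in the defaults for every omitted option.
result = input_fn([1, 2, 3], **data_input_settings)
print(result)

# A misspelled or unsupported option fails loudly instead of being ignored.
try:
    input_fn([1, 2, 3], **{"batchsize": 32})
    unknown_option_rejected = False
except TypeError:
    unknown_option_rejected = True
```

A side benefit of this design is that unknown keys in the settings dictionary raise a TypeError at call time, which catches typos in the config file early.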

Minimize dependencies in virtual environment

We should remove from requirements.txt every dependency that ctalearn does not actually need to run, so that the installation is lighter and faster. Some users have also requested instructions for a clean uninstall that removes not just the virtual environment but also the packages that were downloaded to set it up (without conflicting with other environments).
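
One way to find candidates for removal, sketched here as a suggestion rather than the project's procedure, is to collect the top-level modules that the source files actually import and compare them by hand against requirements.txt. The helper name imported_top_level_modules is hypothetical.

```python
import ast

def imported_top_level_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports, which refer to the package itself.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules

example_source = (
    "import numpy as np\n"
    "from tensorflow.keras import layers\n"
    "from . import image\n"
)
used = imported_top_level_modules(example_source)
print(sorted(used))
```

Note that the names in requirements.txt are distribution names, which can differ from import names (for example, the PyYAML distribution is imported as yaml), so the comparison needs a manual check.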

Split data.py into data loading and data processing modules

Currently, data.py combines data loading, which is specific to our HDF5 file format, with data processing, which is independent of the data format. Split it into two separate files, load_HDF5_data.py and process_data.py, where load_HDF5_data.py includes all functionality specific to loading numpy arrays of data from the HDF5 files and process_data.py includes all data processing (and potentially augmentation) functionality that is independent of the format.

load_HDF5_data.py should include the various HDF5 data loading helper functions, load_data_eventwise_HDF5, load_data_single_tel_HDF5, load_auxiliary_data_HDF5, load_metadata_HDF5, add_processed_parameters, load_image_HDF5, apply_cuts_HDF5, get_data_generators_HDF5 and helper functions. Functionality from the load_data functions involving processing the data (cropping, normalization) should be split out as a call to process_data.py. Since HDF5 is in the name of the module, drop "HDF5" from all the function names.

process_data.py should include crop_image and any split out functionality from the load_data functions.
