stefan-endres / dwpm-mixture-model

Phase separation calculation using the DWPM mixture rule.

Python 26.31% Jupyter Notebook 73.69%

dwpm-mixture-model's People

alchemyst jonkermotumi youzhilin

dwpm-mixture-model's Issues

Error in math in tgo.py

In tgo.py:

    k_c = numpy.floor((-(ep - 1) + numpy.sqrt((ep - 1.0)--2 + 80.0 * ep))
                      / 2.0)

There is a --2 which I am pretty sure is not supposed to be there. Is it perhaps supposed to be **2 instead?
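Python accepts `(ep - 1.0)--2` without a syntax error because it parses as `(ep - 1.0) - (-2)`, so the line silently computes sqrt(81*ep + 1) instead of sqrt((ep - 1)**2 + 80*ep). With `**2`, the expression is the floor of the positive root of k**2 + (ep - 1)*k - 20*ep = 0 from the quadratic formula, which supports reading `**2` as the intended operator. The sketch below compares the two versions; the function names are made up for illustration and are not part of tgo.py.

```python
import numpy


def k_c_as_written(ep):
    # Exactly as in tgo.py: (ep - 1.0)--2 is parsed as (ep - 1.0) - (-2) == ep + 1.0
    return numpy.floor((-(ep - 1) + numpy.sqrt((ep - 1.0)--2 + 80.0 * ep))
                       / 2.0)


def k_c_root(ep):
    # Positive root of k**2 + (ep - 1)*k - 20*ep = 0 via the quadratic formula
    return (-(ep - 1) + numpy.sqrt((ep - 1.0)**2 + 80.0 * ep)) / 2.0


def k_c_proposed(ep):
    # Suggested fix: square the (ep - 1.0) term, then take the floor as before
    return numpy.floor(k_c_root(ep))


for ep in (1.0, 10.0, 100.0):
    print(ep, k_c_as_written(ep), k_c_proposed(ep))
```

For ep = 10 the two versions already disagree (9 versus 10), and for ep = 100 the version as written returns -5, which cannot be a sensible count.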

The build's unit tests are failing because the config.cfg file is not in the public GitHub repository, and when a customized one is added, the tests cannot find the directory where the data is stored locally.

Is there any way to fix this? Maybe by adding remote data storage for the .csv files?
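One way to keep the public build passing while the data stays local is to let the data-dependent tests skip themselves, with a clear reason, when the data directory cannot be found. This sketch assumes the location is supplied through an environment variable; the name `DWPM_DATA_DIR` and the test class are hypothetical.

```python
import os
import unittest


def find_data_dir(env_var="DWPM_DATA_DIR"):
    """Return the directory named by env_var, or None if it is unset or does not exist."""
    path = os.environ.get(env_var)
    if path and os.path.isdir(path):
        return path
    return None


DATA_DIR = find_data_dir()


# On a machine (or CI build) without the data, these tests are reported as skipped, not failed
@unittest.skipUnless(DATA_DIR, "set DWPM_DATA_DIR to a local data directory to run these tests")
class DataDependentTests(unittest.TestCase):
    def test_data_dir_is_readable(self):
        self.assertTrue(os.access(DATA_DIR, os.R_OK))
```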

`tgo_tests.py` not running

Tests should run properly.
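The issue does not say why `tgo_tests.py` fails to run. One common cause with this naming scheme is that unittest's test discovery only collects files matching `test*.py` by default, so files named `*_tests.py` are skipped unless another pattern is given (for example `python -m unittest discover -p "*_tests.py"`). A sketch of the same idea from Python; the constant and function names are illustrative.

```python
import fnmatch
import unittest

DEFAULT_PATTERN = "test*.py"    # unittest's default discovery pattern
PROJECT_PATTERN = "*_tests.py"  # matches names such as tgo_tests.py and ncomp_tests.py


def load_suite(start_dir=".", pattern=PROJECT_PATTERN):
    # Collect every TestCase in files under start_dir whose names match pattern
    return unittest.defaultTestLoader.discover(start_dir, pattern=pattern)


def run_all(start_dir="."):
    result = unittest.TextTestRunner(verbosity=2).run(load_suite(start_dir))
    return result.wasSuccessful()


# Shell-style matching shows why the default pattern misses these files
print(fnmatch.fnmatch("tgo_tests.py", DEFAULT_PATTERN))  # False
print(fnmatch.fnmatch("tgo_tests.py", PROJECT_PATTERN))  # True
```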

Delete unused code

There are many commented-out pieces of code in the repository. If they are not used anymore, delete them. You can always get them back from the git history. I'm also not sure that all the files in the repository are currently doing something important (see #12). It is important that only working, used code is in the master branch. If you have some other stuff you're playing with that's not yet ready for prime time, use a new branch. If you have some old stuff which was for testing or for a different purpose, delete it.

Use branches instead of directories for old files

The beauty of having a version control system is that you don't need to keep the old versions of files around. If a file is no longer being used, just delete it. The same goes for code. Remember you can always look at the older versions or retrieve a file from an old version, so you can delete with impunity and have the current repository only contain the stuff that is needed right now.

Case conflict: nComp_tests.py vs ncomp_tests.py

You currently have two files in the repository with the same name but different casing (see above). You may not be able to see this in your local copy, but you can verify it by looking at the list of files on GitHub. If you renamed nComp_tests.py to ncomp_tests.py, you should have used "git mv nComp_tests.py ncomp_tests.py". The key point is that git is case sensitive with filenames, and the current situation is causing issues on my Mac, which has a case-insensitive filesystem. Please do "git rm whichever_file_has_wrong_contents" and then, if necessary, "git mv correct_file_contents correct_file_name".
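Before removing one of the copies, it can help to list every group of tracked files whose names differ only in case, since those are exactly the files a case-insensitive filesystem (such as the default on macOS) cannot hold side by side. A sketch, assuming `git` is on the PATH; the helper names are illustrative.

```python
import subprocess
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only in letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def tracked_files(repo_dir="."):
    # git is case sensitive, so both spellings appear in its list of tracked files
    out = subprocess.run(["git", "ls-files"], cwd=repo_dir, capture_output=True,
                         text=True, check=True).stdout
    return out.splitlines()


print(case_collisions(["nComp_tests.py", "ncomp_tests.py", "tgo.py"]))
```

Running `case_collisions(tracked_files())` inside the repository lists every pair that needs a `git rm` or `git mv`.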

Remove clone code

There are quite a few pieces of copy-paste cloning in the code base at the moment. Worse, some of the copies are not exactly the same and differ in small, difficult-to-see ways.

Run clonedigger on the code and examine the output.html file it creates. If there is redundant code that has been superseded by new code, delete the old code. If both clones are still used, combine them into a common function/method.

I recommend that you develop working tests before doing this task.

Use logging module instead of print warnings

There are a couple of places in the code where print is used to output warnings. The logging module is in the standard library and allows finer-grained control of user feedback. I recommend replacing all the places where warnings are printed with logging.warning() (logging.warn() is a deprecated alias). Most user output which is not nicely formatted (just messages) can be handled this way.
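A minimal sketch of the swap, using a module-level logger so that callers can filter messages per module. The function and message are hypothetical stand-ins for wherever the code currently prints a warning.

```python
import logging

logger = logging.getLogger(__name__)  # one logger per module, named after the module


def check_convergence(residual, tol=1e-8):
    """Hypothetical example: warn instead of printing when a solver does not converge."""
    if residual > tol:
        # Previously something like: print("Warning: solver did not converge")
        logger.warning("solver did not converge: residual %g > tolerance %g", residual, tol)
        return False
    return True


if __name__ == "__main__":
    # The entry-point script, not the library module, decides what is shown and how
    logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(name)s: %(message)s")
    check_convergence(1e-3)
```

A user who wants a quieter run can then raise the threshold for one module, for example `logging.getLogger("ncomp").setLevel(logging.ERROR)`, without touching the calculation code.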

Remove references to `math`

The routines in numpy are probably faster and have better numerical behaviour than the routines in math. Find everything you use from math and use the numpy version instead. As an addendum, also use exp(x) rather than e**x: e is only stored to finite precision, and e**x carries that rounding error into the result, whereas an exponential function can compute the value accurately without relying on a rounded value of e.
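Two concrete reasons, in a small sketch: NumPy ufuncs such as `numpy.log` and `numpy.exp` work element-wise on arrays, whereas `math` functions accept only scalars; and a relative rounding error d in the stored value of e grows to roughly x*d in e**x, which a dedicated exponential routine largely avoids. The `ideal_mixing_term` function is a hypothetical example, not code from the repository.

```python
import numpy


def ideal_mixing_term(x):
    """Hypothetical example: sum of x*ln(x) over strictly positive mole fractions."""
    x = numpy.asarray(x, dtype=float)
    return float(numpy.sum(x * numpy.log(x)))  # math.log would reject an array argument


x_values = numpy.linspace(0.0, 700.0, 8)
via_exp = numpy.exp(x_values)   # preferred
via_pow = numpy.e ** x_values   # rounding error in numpy.e is amplified by x
relative_gap = numpy.abs(via_pow - via_exp) / via_exp
print(relative_gap.max())
```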

csvDict vs pandas

Wouldn't a pandas dataframe be a better option than the csvDict class?

Dealing with 'nan' values and RuntimeWarnings during optimisation

I've started working on more polar systems which are so non-ideal that the Gibbs surface is not defined for most parameters (which, incidentally, also destroys a few proofs I was working on due to the lack of continuity). This ruins the Lagrange plane optimisation, because nan - nan behaves as a zero float when added to the objective function's summation (a scalar float output).

Essentially what I'm trying to do now is to catch any RuntimeWarning like:

    ncomp.py:523: RuntimeWarning: invalid value encountered in log
      - s.m['a'] / (p.m['R'] * s.m['T'] * V))
    G_sol_I = [ 0.32783229]
    G_sol_II = [ nan]

Which can be done with exception handling with a few simple modifications:
http://stackoverflow.com/questions/15933741/how-do-i-catch-a-numpy-warning-like-its-an-exception-not-just-for-testing
...and then add penalties to the objective function.
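A sketch of the high-level variant using NumPy's own `numpy.errstate` context manager, which turns invalid-value and divide-by-zero warnings into `FloatingPointError` exceptions inside a `with` block. The wrapper name and the penalty value of 1e10 are arbitrary assumptions.

```python
import numpy


def penalised(objective, penalty=1e10):
    """Wrap objective so that undefined evaluations return a large finite penalty."""
    def wrapped(x):
        try:
            # Inside this block, invalid operations and division by zero raise
            with numpy.errstate(invalid="raise", divide="raise"):
                value = objective(x)
        except FloatingPointError:
            return penalty
        # NaN or inf that appeared without a floating-point error (e.g. returned directly)
        if not numpy.all(numpy.isfinite(value)):
            return penalty
        return value
    return wrapped


f = penalised(lambda x: float(numpy.log(x)))
print(f(1.0), f(-1.0))
```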

The question now becomes how to deal with this in the high-level optimisation. The options as I see them are:

Catch the warning at the level of the high-level function evaluation and add an arbitrarily high penalty to the objective function. Since there is an extremely large number of cases where the function is not defined, it will be difficult to know what is actually causing the error, and the penalty will have to be arbitrary, although we might be able to find meaningful values by evaluating every single state variable.
Catch the warning at a low level inside every subfunction of the Gibbs equation. This has the advantage that we can apply more meaningful penalties (e.g. abs([invalid log value])). However, this will require a major rewrite of most functions, in which we either:

Return several outputs to be passed to higher-level functions instead of the single floats most functions currently output.
Wrap every function inside a class that has a built-in penalty tracker which resets at every high-level evaluation.

Please let me know what you think.
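A minimal sketch of the second low-level option (a class with a built-in penalty tracker that resets at every high-level evaluation). The class, the `objective` function and the penalty of abs(x) + 1 per invalid log argument are hypothetical choices for illustration.

```python
import numpy


class PenaltyTracker:
    """Hypothetical helper: low-level functions record penalties here instead of
    returning NaN, and the tracker is reset at every high-level evaluation."""

    def __init__(self):
        self.total = 0.0

    def reset(self):
        self.total = 0.0

    def log(self, x):
        """numpy.log that penalises non-positive arguments instead of returning NaN/-inf."""
        x = numpy.asarray(x, dtype=float)
        invalid = x <= 0.0
        # Penalty of abs(x) + 1 per invalid argument; the +1 (an arbitrary choice)
        # ensures that x == 0 is penalised too
        self.total += float(numpy.sum(numpy.where(invalid, numpy.abs(x) + 1.0, 0.0)))
        # Evaluate log(1) = 0 at invalid points so no warning or NaN is produced
        return numpy.log(numpy.where(invalid, 1.0, x))


def objective(x, tracker):
    """Hypothetical high-level evaluation: reset, evaluate, add the accumulated penalty."""
    tracker.reset()
    value = float(numpy.sum(tracker.log(x)))
    return value + tracker.total


tracker = PenaltyTracker()
print(objective([0.5, 2.0], tracker), objective([-2.0, 1.0], tracker))
```

Because the penalty grows with how far the argument lies outside the domain, the optimiser gets a direction to move in, which an arbitrary constant penalty does not provide.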

Extend test suite

This is a high-priority request. For me to help more in this code base, it is important to have tests which exercise most of the code and automatically check for correct outputs. I am having trouble with some of the refactoring I'd like to do because I'm not sure I will notice if I introduce an error.

Create a README file

Please create a README file which explains what the main entry points to the code base are. I notice there are files called pure.py, binary.py and nComp.py, which I assume handle progressively harder problems, but it is not clear how it all fits together. A couple of lines in a README would help anyone navigate the code a bit better.

`data.optimse` in main.py

This appears to be a typo, but worse, there doesn't seem to be anything like this in the data structure. Don't commit code that breaks without a note saying that it will break.

Don't re-raise exceptions

There are far too many try statements in the code as it stands, and they encompass too much code. Especially egregious are the parts where you catch an exception only to raise one again (see for instance data_handling.py). In that case it is really bad because you print "Data not found" even if, for instance, the read failed for another reason.
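A sketch of the narrower pattern, with hypothetical function names (not the existing data_handling.py code): let unexpected exceptions propagate unchanged, and wrap only the one call whose specific failure the caller can actually handle.

```python
import csv


def load_rows(path):
    """Hypothetical loader with no blanket try/except around the whole read.

    A missing file raises FileNotFoundError naming the path; any other failure
    (permissions, decoding errors) surfaces with its own accurate traceback
    instead of being reported as "Data not found".
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def load_rows_or_none(path):
    """Catch only the one error this caller can meaningfully handle."""
    try:
        return load_rows(path)
    except FileNotFoundError:
        return None
```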

Make testing code always runnable

There are some lines of testing code which are commented out. This is a bad pattern, as the way you run this code becomes "uncomment code, run, observe results, comment code again". It is far better to write this code as a separate file which can be run at any time. It is even better to write it as a proper test. Hiding code in comments and running it by uncommenting sections should be avoided in all cases, as it causes churn in the version control system.
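For example, a check that currently lives in a commented-out block can become a small test file that runs with the rest of the suite. The function under test is a hypothetical stand-in (the classical quadratic mixing rule for the attraction parameter of a binary mixture, not the DWPM rule), chosen only because its limits are easy to check.

```python
import unittest


def quadratic_mixing_rule(x1, a11, a22, a12):
    """Hypothetical stand-in: a_mix = x1**2*a11 + 2*x1*x2*a12 + x2**2*a22."""
    x2 = 1.0 - x1
    return x1**2 * a11 + 2.0 * x1 * x2 * a12 + x2**2 * a22


class QuadraticMixingRuleTests(unittest.TestCase):
    def test_pure_component_limits(self):
        # At x1 = 1 (or 0) the mixture reduces to pure component 1 (or 2)
        self.assertAlmostEqual(quadratic_mixing_rule(1.0, 2.0, 3.0, 2.5), 2.0)
        self.assertAlmostEqual(quadratic_mixing_rule(0.0, 2.0, 3.0, 2.5), 3.0)

    def test_equimolar_value(self):
        # 0.25*2.0 + 2*0.25*2.5 + 0.25*3.0 = 2.5
        self.assertAlmostEqual(quadratic_mixing_rule(0.5, 2.0, 3.0, 2.5), 2.5)
```

Saved as, say, `mixing_rule_tests.py`, this runs at any time with `python -m unittest mixing_rule_tests`, without editing any source file.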

Rename the repo (typos)

Go to settings and rename the project to "DWPM-Mixture-Model"

Split functionality into different files

The current workflow is 1. edit the config file, 2. run the script. This makes it hard to, for instance, run the script automatically for different sets of components.

I recommend the following idea:

Have only things which will change very infrequently in the config file. At the moment this is the path to the data.
Use program arguments to specify things like the components to be run and split out the parts which plot data and the parts which calculate data. I recommend a system of having a file for each case (components + model). This plays well with Dropbox as it doesn't touch the old files if you're working on a new system.
Write separate programs or one program with optional arguments for all the plotting parts. This also enables you to plot stuff quickly after simulation without having to rerun the program.

This gets rid of many conditionals in your simulation program, making it a lot cleaner. Your main.py is a step in the same direction, but I think you will find splitting the plotting code from the calculation code with a data file in between makes a lot of the logic easier.
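A sketch of the command-line side of this split using argparse. The option names, the `DWPM` default and the `<components>_<model>.csv` naming scheme are hypothetical; the point is that each case writes its own results file, which a separate plotting script can read later.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Run a phase separation calculation for one case (hypothetical CLI).")
    parser.add_argument("components", nargs="+", help="component names, e.g. acetone water")
    parser.add_argument("--model", default="DWPM", help="mixture rule to use")
    parser.add_argument("--output", help="results file; defaults to a name built from the case")
    return parser


def case_filename(components, model):
    """One results file per case (components + model), e.g. acetone-water_DWPM.csv."""
    return "-".join(components) + "_" + model + ".csv"


def main(argv=None):
    args = build_parser().parse_args(argv)
    output = args.output or case_filename(args.components, args.model)
    # ... run the calculation here and write its results to `output` ...
    return output
```

A shell loop or a small driver script can then call the program once per component set, and none of the calculation code needs to know about plotting.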

Improve coverage of tgo tests

Use coverage.py to measure the coverage of your tests, which is currently below 100% in tgo.py, and write tests that at least cover all the code you have written.

Add alchemyst as a collaborator

I want to change some stuff on the base repo. Specifically I want to add automated testing checks via Travis.

Code climate push hook

I've registered the project at Code Climate. It would be useful for you to register a push hook there so that the analysis is regenerated every time you push.

Avoid using exec

You appear to be using exec quite a lot, but it should rarely be needed. I would suggest searching for all the places you use exec and rewriting those sections without it.
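Since the issue does not show how exec is used, this sketch assumes the common case of building code strings to pick a function by name. `getattr` and an explicit dictionary cover that without exec; the mixing rules below are hypothetical placeholders.

```python
import numpy


def apply_named(func_name, x):
    # Instead of exec("result = numpy." + func_name + "(x)"), look the attribute up directly
    return getattr(numpy, func_name)(x)


# When the choice comes from user input or a config file, an explicit table is safer:
# only listed names can be selected, and a typo raises a clear KeyError.
MIXTURE_RULES = {
    "linear": lambda x1, a1, a2: x1 * a1 + (1.0 - x1) * a2,      # hypothetical
    "geometric": lambda x1, a1, a2: a1**x1 * a2**(1.0 - x1),     # hypothetical
}


def mix(rule_name, x1, a1, a2):
    return MIXTURE_RULES[rule_name](x1, a1, a2)
```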

Don't store data in the repo

It is by and large not a good idea to store data in the repository. It's likely to change often, leading to messy commits. It can also get large, leading to large commits. If multiple people run the code, the data can also change differently leading to messy merges.

The pattern that I have found to work well is this:

Have only code and a very small amount of static data in the repository itself (so stuff like the gas constant is not going to change, for instance).
Have a config file from which the location of the data is read (see for instance how it is handled in pulpsim).
Store the data on Dropbox if it takes long to generate or is to be used by everyone who wants to run the code, or locally if the output is quick to regenerate.
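A sketch of the config-file lookup using the standard library's configparser. The `[paths]` section, the `data` key and the example path are assumptions, not the project's (or pulpsim's) actual format.

```python
import configparser
import os

# Example config.cfg contents (hypothetical section, key and path)
EXAMPLE_CONFIG = """
[paths]
data = ~/Dropbox/dwpm-data
"""


def data_dir_from_parser(parser, section="paths", key="data"):
    """Return the data directory stored in an already-loaded config."""
    # expanduser lets the same config say ~/... on every machine
    return os.path.expanduser(parser.get(section, key))


def load_data_dir(config_path="config.cfg"):
    parser = configparser.ConfigParser()
    if not parser.read(config_path):  # read() returns the list of files it parsed
        raise FileNotFoundError("config file not found: " + config_path)
    return data_dir_from_parser(parser)
```

Only the path lives in the config file; the data itself stays outside the repository.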

Remove automatically generated files

.pyc files are generated automatically, are specific to the Python version that produced them, and should not be in the repo. You can add them to .gitignore (for example with a *.pyc entry) so that they don't show up as untracked. In general, don't just add "all files" to the repo; be judicious.

Make alchemyst a collaborator

Click here and enter alchemyst.
