
xenonpy's People

Contributors

eltociear, imgbotapp, qi-zh, stewu5, tsumina, yohrrr, zguo235

xenonpy's Issues

Error fix for proposal

When a generated SMILES string cannot be converted to a Mol object,
the current proposal class silently skips it.
Instead, we should either fill in None or fall back to the old SMILES string,
and issue a warning.
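
A minimal sketch of the fallback behaviour suggested above. The function name, its signature, and the `to_mol` callable are assumptions made for illustration; `to_mol` stands in for whatever converter the proposal class uses and is assumed to return None when conversion fails.

```python
import warnings
from typing import Callable, List, Optional


def keep_valid_smiles(
    old_smiles: List[str],
    new_smiles: List[str],
    to_mol: Callable[[str], Optional[object]],
    fallback: str = "old",
) -> List[Optional[str]]:
    """Return proposals with the same length as the input.

    Hypothetical helper: `to_mol` is assumed to return None when a SMILES
    string cannot be converted. Invalid proposals are replaced by the old
    string (fallback="old") or by None (fallback="none"), with a warning.
    """
    if len(old_smiles) != len(new_smiles):
        raise ValueError("old_smiles and new_smiles must have the same length")
    result: List[Optional[str]] = []
    for old, new in zip(old_smiles, new_smiles):
        if to_mol(new) is None:
            warnings.warn(f"cannot convert {new!r} to Mol; using fallback")
            result.append(old if fallback == "old" else None)
        else:
            result.append(new)
    return result
```

Keeping the output the same length as the input means downstream code can still pair each proposal with its source molecule.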

data elements not exist

I installed XenonPy in development mode by following this page:
https://xenonpy.readthedocs.io/en/latest/installation.html

Then I tried to check preset.elements as described on this page:
https://xenonpy.readthedocs.io/en/latest/tutorial/1-dataset.html

I got the following error message.

>>> preset.elements
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hagawa/codes/xenonpy/XenonPy/xenonpy/datatools/preset.py", line 230, in elements
    self._check('elements')
  File "/home/hagawa/codes/xenonpy/XenonPy/xenonpy/datatools/preset.py", line 192, in _check
    "data {0} not exist, please run <Preset.sync('{0}')> to download from the repository".format(data))
RuntimeError: data elements not exist, please run <Preset.sync('elements')> to download from the repository

I cannot calculate descriptors without this preset.elements data.
How can I get it?

>>> from xenonpy.datatools import preset

does not raise any error.
I'm using Ubuntu 18.04 and Anaconda.

HTTPError message is not shown when the response returns 504

Hello. Thank you for the awesome package.
I have found a minor issue with how an exception is raised.

  1. I would like to check which models are available. Could you provide me with the list of trained models for my research?
  2. In addition, I have a simple query to get all model data.
    When the server returns a 504 status, the raised error message is not the one you intended.
mdl = MDL()
models = mdl("_", save_to=False)
C:\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

On a 504 error, ret.json() fails because the response body is not JSON, so the
JSONDecodeError replaces the intended HTTPError.

    def query(self, query, variables):
        payload = json.dumps({'query': query, 'variables': variables})
        ret = requests.post(url=self._url, headers=self._headers, data=payload)
        if ret.status_code != 200:
            raise HTTPError('status_code: %s, %s' %
                            (ret.status_code, ret.json()))
        ret = ret.json()['data']
        return ret
ipdb> ret
<Response [504]>
ipdb> ret.content
b'<html>\r\n<head><title>504 Gateway Time-out</title></head>\r\n<body bgcolor="white">\r\n<center><h1>504 Gateway Time-out</h1></center>\r\n<hr><center>nginx/1.14.2</center>\r\n</body>\r\n</html>\r\n'
ipdb> ret.json()
*** json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I'm looking forward to your kind reply. Thank you very much.
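
One possible way to keep the HTTPError visible is to fall back to the raw response text when the body is not JSON. The sketch below is not the library's code: `check_response` and the local `HTTPError` class are illustrative names, and `ret` is only assumed to behave like a requests.Response.

```python
class HTTPError(Exception):
    """Stand-in for the HTTPError raised by the query method."""


def check_response(ret):
    """Raise HTTPError for non-200 responses even if the body is not JSON.

    `ret` is assumed to behave like a requests.Response: it has
    `status_code`, `text`, and a `json()` method that raises a ValueError
    subclass (such as json.JSONDecodeError) on a non-JSON body.
    """
    if ret.status_code != 200:
        try:
            detail = ret.json()
        except ValueError:
            # e.g. an HTML "504 Gateway Time-out" page returned by a proxy
            detail = ret.text
        raise HTTPError("status_code: %s, %s" % (ret.status_code, detail))
    return ret.json()["data"]
```

With this ordering, the status code always reaches the user, and the JSON parse failure only affects how the detail is rendered.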

[RFC] too complex to install

I failed to install xenonpy with Anaconda3 on Linux, and I also failed to use the xenonpy Docker image on Linux. (The Docker image appears not to work because of a proxy problem.)
I am afraid that xenonpy is now too tightly integrated to install easily. How about separating it into layers, e.g. preset features, feature calculators, MDL, and so on, depending on users' purposes?

[RFC][WIP] Add more query functions into MDL

We need to add more query functions to the MDL class. Here I want to collect suggestions on which functions should be given priority.

  • list_properties; return a list of all property names.
  • query_properties; return a list of matching properties.
  • list_mode_sets; return a list of all model set names.
  • query_mode_sets; return a list of matching model sets.
  • pull_mode_set; return the details of a model set and download the models in it.
  • list_descriptors; return a list of all descriptor names.
  • query_descriptors; return a list of matching descriptors.

get_prob error causes interruption

The modifier needs a try-except so that a single failed molecule modification (caused by a substring not being found in the N-gram table inside the get_prob function) does not stop the whole iteration.
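
A minimal sketch of the guarded loop, under stated assumptions: `modify` is a hypothetical callable standing in for the modification step that calls get_prob internally, and the exception types caught here are a guess that should be narrowed to whatever get_prob actually raises.

```python
import warnings


def modify_all(smiles_list, modify):
    """Apply `modify` to every SMILES string without aborting on one failure.

    `modify` is a hypothetical stand-in for the modifier step. On failure
    the original string is kept and a warning is issued.
    """
    results = []
    for smi in smiles_list:
        try:
            results.append(modify(smi))
        except (KeyError, IndexError) as err:  # assumed failure modes
            warnings.warn(f"modification failed for {smi!r}: {err!r}; keeping original")
            results.append(smi)
    return results
```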

[RFC] Consistency checking after removing error descriptors

In XenonPy/xenonpy/inverse/iqspr/estimator.py, there is a bug.

# remove NaN from X
desc = self._descriptor.transform(smiles)
desc = pd.DataFrame(data=desc).reset_index(drop=True)
y = y.reset_index(drop=True)
desc.dropna(inplace=True)
y = y.loc[desc.index]

If 'desc' and 'y' have different lengths, they are still combined by index without any check.
In the previous sample/iQSPR.ipynb, there was a bug related to this problem in cell In[11] (already fixed).

I think you should check that the length of the smiles list matches that of the property DataFrame.
Here is my suggestion.

if len(smiles) != y.shape[0]:
    raise RuntimeError('len(smiles) != y.shape[0]')

Thank you.
And I apologize for my poor English.

Change default file extension in Dataset

The default file extensions will be changed as follows.

before    after         description
pkl.pd_   pd(.*)        pandas.DataFrame object file
csv       csv           csv file
xlsx      xlsx or xls   excel file
pkl.z_    pkl(.*)       common pickled files

Ngram update functionality

Allow a weighted combination of multiple Ngram models that were trained separately.
(We need to be careful about how to track the amount of training data in each case.)
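
A rough sketch of what a weighted combination could look like. The data layout is an assumption for illustration only (a dict mapping a context to a Counter of next-token counts) and is not the internal format of the Ngram class; `combine_counts` is a hypothetical name.

```python
from collections import Counter


def combine_counts(tables, weights):
    """Weighted sum of n-gram count tables.

    Each table is assumed to map a context (e.g. a tuple of preceding
    tokens) to a Counter of next-token counts. Returns the combined table
    and the weighted total count contributed by each source table, so the
    share of each training set remains traceable.
    """
    if len(tables) != len(weights):
        raise ValueError("need one weight per table")
    combined = {}
    totals = []
    for table, w in zip(tables, weights):
        total = 0.0
        for context, counts in table.items():
            target = combined.setdefault(context, Counter())
            for token, n in counts.items():
                target[token] += w * n
                total += w * n
        totals.append(total)
    return combined, totals
```

Returning the per-table totals alongside the merged counts is one way to address the tracking concern raised in the note above.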

bug in set_para for BaseDescriptor

set_para does not work properly in BaseDescriptor because the assigned values are not passed to the BaseFeaturizers inside it.
E.g., when setting on_errors with .set_para in BaseDescriptor, on_errors is not actually passed to the BaseFeaturizers defined inside.
A new definition of set_para in BaseDescriptor is needed.
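
A minimal sketch of forwarding parameters from a container to its inner featurizers. `Featurizer` and `Descriptor` are hypothetical stand-ins, not the real BaseFeaturizer and BaseDescriptor classes; only the forwarding idea is the point.

```python
class Featurizer:
    """Hypothetical stand-in for a BaseFeaturizer with an on_errors option."""

    def __init__(self, on_errors="raise"):
        self.on_errors = on_errors


class Descriptor:
    """Hypothetical container that forwards parameters to its featurizers."""

    def __init__(self):
        self.featurizers = [Featurizer(), Featurizer()]

    def set_para(self, **params):
        # Forward every parameter that an inner featurizer already knows.
        for featurizer in self.featurizers:
            for name, value in params.items():
                if hasattr(featurizer, name):
                    setattr(featurizer, name, value)
        return self
```

For example, `Descriptor().set_para(on_errors="nan")` would update on_errors on both inner featurizers in this sketch.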

Add feature selection for "BaseDescriptor" class

We designed BaseDescriptor as a container of BaseFeaturizer calculators. With it, users can bundle many featurizers into a preset for pipelining. This proposal adds a feature selection function to the BaseDescriptor class.

Proposal

Assume we have some class like this:

class BaseDescriptor:
    def __init__(self, *, featurizers='all'):
        ...

class NewDescriptor(BaseDescriptor):
    def __init__(self, *, featurizers='all'):
        super().__init__(featurizers=featurizers)
        ...

        self.input = Featurizer1()
        self.input = Featurizer2()
        self.input = Featurizer3()

descriptor = NewDescriptor()

In this case, for an input that has a column named input, the descriptor calculates all features associated with self.input and then concatenates them. This is exactly what the current version does.

In this proposal, users can initialize NewDescriptor with a parameter called featurizers. This parameter contains the names of featurizers. Only the featurizers whose names appear in featurizers will be activated.

In the following example, only the specified featurizers 'Featurizer1' and 'Featurizer3' will be calculated.

descriptor = NewDescriptor(featurizers=['Featurizer1', 'Featurizer3'])
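
The selection logic could be sketched as follows. This is a self-contained toy, not the BaseDescriptor implementation: the three featurizer classes and `SelectableDescriptor` are hypothetical, and selection by class name is an assumption about how names would be matched.

```python
class Featurizer1:
    def transform(self, x):
        return x + 1


class Featurizer2:
    def transform(self, x):
        return x * 2


class Featurizer3:
    def transform(self, x):
        return x - 1


class SelectableDescriptor:
    """Toy container that activates only the featurizers named by the user."""

    def __init__(self, *, featurizers="all"):
        candidates = [Featurizer1(), Featurizer2(), Featurizer3()]
        if featurizers == "all":
            self.active = candidates
        else:
            wanted = set(featurizers)
            # Match on the class name of each candidate featurizer.
            self.active = [f for f in candidates if type(f).__name__ in wanted]

    def transform(self, x):
        return [f.transform(x) for f in self.active]
```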

data source and new data

  1. Could you add Prof. Oguchi's atomic data from his first-principles calculations? I think that you have them.
  2. Is it possible to show both the original feature values with NaN and the interpolated ones? How about switching between them with a flag?
  3. Can you show the data sources of the features? Sometimes their names are too vague to tell what the descriptor is. One cannot correct the values without knowing exactly what they are.

Cannot download pre-trained models

I found that I can download the models for "stable inorganic compounds for materials project" but not those for "QM9 Dataset" or "PolymerGenome Dataset".

When I tried mdl.pull(urls), it responded as follows:

FileNotFoundError: [Errno 2] No such file or directory: '~\S3\organic.nonpolymer.mu_debye\rcdk.fp.fingerprint\mxnet.nn.neural_network\shotgun_mu_Debye_randFP4975_corr-0.7528_mxnet_294-101-28-1_2018-06-13\model-175724d\shotgun_mu_Debye_randFP4975_corr-0.7528_mxnet_294-101-28-1_2018-06-13-045255-symbol.json'

Does anyone have any idea about this problem?

BayesianRidgeEstimator.fit dropna issue

When the rows with NaN properties and the rows with NaN descriptors do not match (neither set is a subset of the other), could the rows become mismatched and cause an error in fitting?
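
One way to avoid this kind of mismatch is to decide row by row, keeping a row only when both its descriptor values and its property value are present. The sketch below uses plain Python lists rather than pandas so it stays self-contained; `drop_missing_rows` is a hypothetical helper, not part of BayesianRidgeEstimator.

```python
import math


def _is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_rows(desc_rows, y_values):
    """Keep only rows where the descriptor row and the property are complete."""
    if len(desc_rows) != len(y_values):
        raise ValueError("descriptor rows and property values differ in length")
    kept = [
        (row, y)
        for row, y in zip(desc_rows, y_values)
        if not _is_missing(y) and not any(_is_missing(v) for v in row)
    ]
    desc_kept = [row for row, _ in kept]
    y_kept = [y for _, y in kept]
    return desc_kept, y_kept
```

Because both outputs are built from the same filtered pairs, they cannot drift out of alignment regardless of where the NaN values occur.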

Add migration function

Because of #20, users have to move their own data from ~/.xenonpy/dataset to the userdata directory specified in ~/.xenonpy/conf.yml.

We should add a migration function to help with that.

$ python -m xenonpy migrate
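
What such a command might do internally can be sketched with the standard library. Everything here is an assumption: `migrate` is a hypothetical function, both directories are passed in by the caller, and reading the target path from conf.yml is left out.

```python
import shutil
from pathlib import Path


def migrate(old_dir, new_dir):
    """Move every entry from old_dir into new_dir, skipping name clashes.

    Returns the names that were moved. Entries that already exist in
    new_dir are left untouched in old_dir.
    """
    old_dir = Path(old_dir).expanduser()
    new_dir = Path(new_dir).expanduser()
    if not old_dir.is_dir():
        return []
    new_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(old_dir.iterdir()):
        target = new_dir / entry.name
        if target.exists():
            continue  # never overwrite data that is already in place
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    return moved
```

Skipping existing targets rather than overwriting them is a conservative choice for user data; a real command would probably report the skipped names as well.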

[RFC] Remove python3.5 support

New versions of rdkit, pandas and pymatgen no longer officially support Python 3.5. It's time for us to end Python 3.5 support.

[WIP] Allow 'BaseDescriptor' class to use anonymous/renamed input

Until now, the BaseDescriptor class has used a fixed name to get input data from a pd.DataFrame object, like:

class YouDescriptor(BaseDescriptor):
    def __init__(self, n_jobs=-1):
        self.descriptor1 = Feature1(n_jobs)

Here descriptor1 must be the column name of the input. Using this class looks something like:

descriptor = YouDescriptor()
input = pd.Series(<'list of samples'>, name='descriptor1')
output = descriptor.fit_transform(input)

That's not flexible in the following 2 cases:

  1. only one input is needed.
  2. multiple descriptors use the same input.

This issue is a proposal to improve the user experience.

  • allow a list as input when there is a single type of descriptor.
  • use name mapping: inner_name: user_given_name.
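
The two bullet points above could be sketched as a small input resolver. `resolve_inputs` and its arguments are illustrative names only, not part of the BaseDescriptor API, and a plain dict stands in for a DataFrame to keep the example self-contained.

```python
def resolve_inputs(data, featurizer_names, mapping=None):
    """Pick the input for each inner featurizer name.

    `data` is either a plain list (a single, anonymous input) or a dict of
    column name -> samples. `mapping` maps inner_name -> user_given_name;
    inner names without an entry are looked up under their own name.
    """
    mapping = mapping or {}
    if isinstance(data, list):
        if len(featurizer_names) != 1:
            raise ValueError("a bare list is only allowed for a single featurizer")
        return {featurizer_names[0]: data}
    return {name: data[mapping.get(name, name)] for name in featurizer_names}
```

With a mapping such as `{"d1": "smiles", "d2": "smiles"}`, two inner descriptors can share one user-named column, which covers the second case listed above.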

How to use R model?

The last part of XenonPy/samples/mdl.ipynb announces that R tutorials will be released.
I want to use the R models, since the Python models are available only for inorganic crystals.
Please explain how to calculate the fingerprint values that are input to the R model.
Some reference code would be enough.

minor bugs found when using frozen featurizer in iQSPR

  1. Add functionality to allow automatic label detection in BaseFeaturizer when we use n_job = 0.
  2. BaseEstimator bug: passing a pd.Series (one column of a pandas DataFrame) as the property input to "fit" raises an error.

Allow to load a user determined dataset version

We specify the db version in the conf.yml file to sync users' datasets with ours, but this is not flexible.

This is a proposal to refactor the data loader so that it can load a user-specified version of the dataset. It could be used like this:

from xenonpy.datatools import Preset
with Preset(ver='0.1.1') as preset:
    data = preset.mp_structure

  • refactor class Preset
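
A toy sketch of a version-bound loader usable in a with-statement. `VersionedPreset` and its in-memory `store` argument are hypothetical; the real Preset class reads datasets from disk, and its refactored interface may look different.

```python
class VersionedPreset:
    """Hypothetical loader bound to one dataset version."""

    def __init__(self, ver, store):
        # `store` maps version -> {dataset name -> data}; it stands in
        # for versioned files on disk.
        if ver not in store:
            raise ValueError(f"unknown dataset version: {ver}")
        self._ver = ver
        self._data = store[ver]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        data = self.__dict__.get("_data", {})
        if name in data:
            return data[name]
        raise AttributeError(name)
```

Binding the version at construction time keeps each with-block independent of the version configured in conf.yml.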
