
spflow's Introduction


SPFlow: An Easy and Extensible Library for Sum-Product Networks

SPFlow is an open-source Python library providing a simple interface to inference, learning, and manipulation routines for deep and tractable probabilistic models called Sum-Product Networks (SPNs). The library allows one to quickly create SPNs both from data and through a domain-specific language (DSL). It efficiently implements several probabilistic inference routines, such as computing marginals, conditionals, and (approximate) most probable explanations (MPEs), along with sampling, as well as utilities for serializing, plotting, and computing structure statistics of an SPN.

Furthermore, SPFlow is extremely extensible and customizable, allowing users to promptly create new inference and learning routines by injecting custom code into a lightweight, functional API framework.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Installing

To install the latest released version of SPFlow using pip:

pip3 install spflow

An AUR package is available for Arch Linux. The PKGBUILD should automatically apply a patch for SPFlow to work with TensorFlow 2.

yay -S python-spflow

Examples

We start by creating an SPN. Using a Domain-Specific Language (DSL), we can quickly create an SPN of Categorical leaf nodes like this:

from spn.structure.leaves.parametric.Parametric import Categorical

spn = 0.4 * (Categorical(p=[0.2, 0.8], scope=0) *
             (0.3 * (Categorical(p=[0.3, 0.7], scope=1) *
                     Categorical(p=[0.4, 0.6], scope=2))
            + 0.7 * (Categorical(p=[0.5, 0.5], scope=1) *
                     Categorical(p=[0.6, 0.4], scope=2)))) \
    + 0.6 * (Categorical(p=[0.2, 0.8], scope=0) *
             Categorical(p=[0.3, 0.7], scope=1) *
             Categorical(p=[0.4, 0.6], scope=2))

We can create the same SPN using the object hierarchy:

from spn.structure.leaves.parametric.Parametric import Categorical
from spn.structure.Base import Sum, Product, assign_ids, rebuild_scopes_bottom_up


p0 = Product(children=[Categorical(p=[0.3, 0.7], scope=1), Categorical(p=[0.4, 0.6], scope=2)])
p1 = Product(children=[Categorical(p=[0.5, 0.5], scope=1), Categorical(p=[0.6, 0.4], scope=2)])
s1 = Sum(weights=[0.3, 0.7], children=[p0, p1])
p2 = Product(children=[Categorical(p=[0.2, 0.8], scope=0), s1])
p3 = Product(children=[Categorical(p=[0.2, 0.8], scope=0), Categorical(p=[0.3, 0.7], scope=1)])
p4 = Product(children=[p3, Categorical(p=[0.4, 0.6], scope=2)])
spn = Sum(weights=[0.4, 0.6], children=[p2, p4])

assign_ids(spn)
rebuild_scopes_bottom_up(spn)

The p parameter gives the probabilities of each category, and scope indicates the index of the variable the leaf models.

We can now visualize the SPN using:

from spn.io.Graphics import plot_spn

plot_spn(spn, 'basicspn.png')

basicspn.png

Marginalizing an SPN means summing out all the non-relevant variables. So, if we want to marginalize the above SPN and sum out everything but variables 1 and 2, we can do:

from spn.algorithms.Marginalization import marginalize

spn_marg = marginalize(spn, [1,2])

Here, we marginalize all the variables not in [1,2] and create a NEW structure that knows nothing about the previous one, nor about variable 0.

We can use this new SPN for all the operations we are interested in. That means we can also plot it!

plot_spn(spn_marg, 'marginalspn.png')

marginalspn.png

We can also dump the SPN as text:

from spn.io.Text import spn_to_str_equation
txt = spn_to_str_equation(spn_marg)
print(txt)

And the output is:

(0.6*((Categorical(V1|p=[0.3, 0.7]) * Categorical(V2|p=[0.4, 0.6]))) + 0.12000000000000002*((Categorical(V1|p=[0.3, 0.7]) * Categorical(V2|p=[0.4, 0.6]))) + 0.27999999999999997*((Categorical(V1|p=[0.5, 0.5]) * Categorical(V2|p=[0.6, 0.4]))))
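Note where the new weights come from: variable 0 has been summed out, and each remaining term carries the product of the sum weights along its original branch: 0.6 from the second root branch, and 0.4 · 0.3 = 0.12 and 0.4 · 0.7 = 0.28 from the first. Up to floating-point noise, they still sum to one.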

However, the most interesting aspect of SPNs is tractable inference. Here is an example of how to evaluate the SPNs from above. Since we have 3 variables, we create a 2D numpy array with 1 row and 3 columns:

import numpy as np
test_data = np.array([1.0, 0.0, 1.0]).reshape(-1, 3)

We then compute the log-likelihood:

from spn.algorithms.Inference import log_likelihood

ll = log_likelihood(spn, test_data)
print(ll, np.exp(ll))

And the output is:

[[-1.90730501]] [[0.14848]]
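Since the SPN is tiny, we can verify this number by expanding the network by hand: each root branch contributes its weight times the product of its leaf probabilities at (V0=1, V1=0, V2=1).

p = 0.4 * 0.8 * (0.3 * (0.3 * 0.6) + 0.7 * (0.5 * 0.4)) \
    + 0.6 * 0.8 * 0.3 * 0.6
print(p, np.log(p))  # 0.14848 -1.9073050...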

We can also compute the log-likelihood of the marginal SPN:

llm = log_likelihood(spn_marg, test_data)
print(llm, np.exp(llm))

Note that we used the same test_data input, as the SPN is still expecting a numpy array with data at columns 1 and 2, ignoring column 0. The output is:

[[-1.68416146]] [[0.1856]]
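This value can be read off the marginal equation above by plugging in V1=0 and V2=1: 0.6 · (0.3 · 0.6) + 0.12 · (0.3 · 0.6) + 0.28 · (0.5 · 0.4) = 0.108 + 0.0216 + 0.056 = 0.1856.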

Another alternative is marginal inference on the original SPN. This is done by setting the features we want to marginalize to np.nan on the fly; it does not change the structure.

test_data2 = np.array([np.nan, 0.0, 1.0]).reshape(-1, 3)
llom =  log_likelihood(spn, test_data2)
print(llom, np.exp(llom))

The output is exactly the same as the evaluation of the marginal SPN:

[[-1.68416146]] [[0.1856]]

We can use TensorFlow to do the evaluation on a GPU:

from spn.gpu.TensorFlow import eval_tf
lltf = eval_tf(spn, test_data)
print(lltf, np.exp(lltf))

The output is, as expected, equal to the one computed in Python:

[[-1.90730501]] [[0.14848]]

We can also use TensorFlow to do the parameter optimization on a GPU:

from spn.gpu.TensorFlow import optimize_tf
optimized_spn = optimize_tf(spn, test_data)
lloptimized = log_likelihood(optimized_spn, test_data)
print(lloptimized, np.exp(lloptimized))

The output is, of course, a higher likelihood:

[[-1.38152628]] [[0.25119487]]

We can generate new samples that follow the joint distribution captured by the SPN!

from numpy.random.mtrand import RandomState
from spn.algorithms.Sampling import sample_instances
print(sample_instances(spn, np.array([np.nan, np.nan, np.nan] * 5).reshape(-1, 3), RandomState(123)))

Here we created 5 new instances that follow the joint distribution:

[[0. 1. 0.]
 [1. 0. 0.]
 [1. 1. 0.]
 [1. 1. 1.]
 [1. 1. 0.]]

The np.nan values indicate the columns we want to sample.

We can also do conditional sampling, that is, if we have evidence for some of the variables we can pass that information to the SPN and sample for the rest of the variables:

from numpy.random.mtrand import RandomState
from spn.algorithms.Sampling import sample_instances
print(sample_instances(spn, np.array([np.nan, 0, 0] * 5).reshape(-1, 3), RandomState(123)))

Here we created 5 new instances given the evidence V1=0 and V2=0:

[[0. 0. 0.]
 [1. 0. 0.]
 [0. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]]

We can do classification by learning an SPN from data and then comparing the probabilities for the given classes. Imagine we have the following dataset:

(figure: training data, two clusters of points centered at (5,5) and (10,10))

It was generated by two Gaussians with means (5,5) and (10,10), and we label the cluster at (5,5) as class 0 and the cluster at (10,10) as class 1.

np.random.seed(123)
train_data = np.c_[np.r_[np.random.normal(5, 1, (500, 2)), np.random.normal(10, 1, (500, 2))],
                   np.r_[np.zeros((500, 1)), np.ones((500, 1))]]

We can learn an SPN from data:

from spn.algorithms.LearningWrappers import learn_parametric, learn_classifier
from spn.structure.leaves.parametric.Parametric import Categorical, Gaussian
from spn.structure.Base import Context
spn_classification = learn_classifier(train_data,
                       Context(parametric_types=[Gaussian, Gaussian, Categorical]).add_domains(train_data),
                       learn_parametric, 2)

Here, we model our problem as containing 3 features, two Gaussians for the coordinates and one Categorical for the label. We specify that the label is in column 2, and create the corresponding SPN.

Now, imagine we want to classify two instances, one located at (3,4) and another one at (12,18). To do that, we first create an array with two rows and 3 columns. We set the last column to np.nan to indicate that we don't know the labels, and we set the rest of the values in the 2D array accordingly.

test_classification = np.array([3.0, 4.0, np.nan, 12.0, 18.0, np.nan]).reshape(-1, 3)

The first row is the first instance; the second row is the second instance:

[[ 3.  4. nan]
 [12. 18. nan]]

We can do classification via approximate most probable explanation (MPE). Here, we expect the first instance to be labeled as 0 and the second one as 1.

from spn.algorithms.MPE import mpe
print(mpe(spn_classification, test_classification))

As we can see, both instances are classified correctly, as the predicted label is set in the last column:

[[ 3.  4.  0.]
 [12. 18.  1.]]

We can learn an MSPN and a parametric SPN from data:

import numpy as np
np.random.seed(123)

a = np.random.randint(2, size=1000).reshape(-1, 1)
b = np.random.randint(3, size=1000).reshape(-1, 1)
c = np.r_[np.random.normal(10, 5, (300, 1)), np.random.normal(20, 10, (700, 1))]
d = 5 * a + 3 * b + c
train_data = np.c_[a, b, c, d]

Here, we have a dataset containing four features: two discrete and two real-valued.

We can learn an MSPN with:

from spn.structure.Base import Context
from spn.structure.StatisticalTypes import MetaType

ds_context = Context(meta_types=[MetaType.DISCRETE, MetaType.DISCRETE, MetaType.REAL, MetaType.REAL])
ds_context.add_domains(train_data)

from spn.algorithms.LearningWrappers import learn_mspn

mspn = learn_mspn(train_data, ds_context, min_instances_slice=20)

We can learn a parametric SPN with:

from spn.structure.Base import Context
from spn.structure.leaves.parametric.Parametric import Categorical, Gaussian

ds_context = Context(parametric_types=[Categorical, Categorical, Gaussian, Gaussian]).add_domains(train_data)

from spn.algorithms.LearningWrappers import learn_parametric

spn = learn_parametric(train_data, ds_context, min_instances_slice=20)

Multivariate leaf

We can learn an SPN with multivariate leaves. For instance, an SPN with Chow-Liu trees (CLTs) as multivariate leaves can be learned with:

import numpy as np
np.random.seed(123)

train_data = np.random.binomial(1, [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.1], size=(100,10))

from spn.structure.leaves.cltree.CLTree import create_cltree_leaf
from spn.structure.Base import Context
from spn.structure.leaves.parametric.Parametric import Bernoulli
from spn.algorithms.LearningWrappers import learn_parametric
from spn.algorithms.Inference import log_likelihood

ds_context = Context(parametric_types=[Bernoulli,Bernoulli,Bernoulli,Bernoulli,
                                       Bernoulli,Bernoulli,Bernoulli,Bernoulli,
                                       Bernoulli,Bernoulli]).add_domains(train_data)

spn = learn_parametric(train_data, 
                       ds_context, 
                       min_instances_slice=20, 
                       min_features_slice=1, 
                       multivariate_leaf=True, 
                       leaves=create_cltree_leaf)

ll = log_likelihood(spn, train_data)
print(np.mean(ll))

Cutset Networks (CNets)

With SPFlow we can learn both the structure and the parameters of CNets, a particular kind of SPN with CLTs as leaves that provides exact MPE inference:

import numpy as np
np.random.seed(123)


from spn.structure.leaves.cltree.CLTree import create_cltree_leaf
from spn.structure.Base import Context
from spn.structure.leaves.parametric.Parametric import Bernoulli
from spn.algorithms.LearningWrappers import learn_parametric, learn_cnet
from spn.algorithms.Inference import log_likelihood

train_data = np.random.binomial(1, [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.1], size=(100,10))

ds_context = Context(parametric_types=[Bernoulli,Bernoulli,Bernoulli,Bernoulli,
                                       Bernoulli,Bernoulli,Bernoulli,Bernoulli,
                                       Bernoulli,Bernoulli]).add_domains(train_data)

# learning a CNet with a naive mle conditioning
cnet_naive_mle = learn_cnet(train_data, 
                            ds_context, 
                            cond="naive_mle", 
                            min_instances_slice=20, 
                            min_features_slice=1)

# learning a CNet with random conditioning
cnet_random = learn_cnet(train_data, 
                         ds_context, 
                         cond="random", 
                         min_instances_slice=20, 
                         min_features_slice=1)

ll = log_likelihood(cnet_naive_mle, train_data)
print("Naive mle conditioning", np.mean(ll))

ll = log_likelihood(cnet_random, train_data)
print("Random conditioning", np.mean(ll))

# computing exact MPE
from spn.algorithms.MPE import mpe
train_data_mpe = train_data.astype(float)
train_data_mpe[:,0] = np.nan
print(mpe(cnet_random, train_data_mpe)) 

Expectations and Moments

SPNs allow you to compute first and higher order moments of the represented probability function by directly evaluating the tree structure. There are three main functions implemented for that.

The Expectation function allows you to directly compute first-order moments given an SPN and, optionally, a list of features for which you need the expectation and an array of evidence.

from spn.algorithms.stats.Expectations import Expectation
from spn.structure.leaves.piecewise.PiecewiseLinear import PiecewiseLinear

piecewise_spn = ((0.5 * PiecewiseLinear([0, 1, 2], [0, 1, 0], [], scope=[0]) +
                  0.5 * PiecewiseLinear([-2, -1, 0], [0, 1, 0], [], scope=[0])) *
                 (0.5 * PiecewiseLinear([0, 1, 2], [0, 1, 0], [], scope=[1]) +
                  0.5 * PiecewiseLinear([-1, 0, 1], [0, 1, 0], [], scope=[1])))
Expectation(piecewise_spn) # = [[0, 0.5]]

If you pass a feature scope, only the expectation for those features will be returned:

from spn.algorithms.stats.Expectations import Expectation
from spn.structure.leaves.piecewise.PiecewiseLinear import PiecewiseLinear

piecewise_spn = ((0.5 * PiecewiseLinear([0, 1, 2], [0, 1, 0], [], scope=[0]) +
                  0.5 * PiecewiseLinear([-2, -1, 0], [0, 1, 0], [], scope=[0])) *
                 (0.5 * PiecewiseLinear([0, 1, 2], [0, 1, 0], [], scope=[1]) +
                  0.5 * PiecewiseLinear([-1, 0, 1], [0, 1, 0], [], scope=[1])))
Expectation(piecewise_spn, feature_scope=[0]) # = [[0]]
Expectation(piecewise_spn, feature_scope=[1]) # = [[0.5]]

Finally, you can also pass evidence to the network, which computes the conditional expectation:

import numpy as np
from spn.algorithms.stats.Expectations import Expectation
from spn.structure.leaves.piecewise.PiecewiseLinear import PiecewiseLinear

piecewise_spn = ((0.5 * PiecewiseLinear([0, 1, 2], [0, 1, 0], [], scope=[0]) +
                  0.5 * PiecewiseLinear([-2, -1, 0], [0, 1, 0], [], scope=[0])) *
                 (0.5 * PiecewiseLinear([0, 1, 2], [0, 1, 0], [], scope=[1]) +
                  0.5 * PiecewiseLinear([-1, 0, 1], [0, 1, 0], [], scope=[1])))
Expectation(piecewise_spn, feature_scope=[0], evidence=np.array([[np.nan, 0]])) # = [[0]]
Expectation(piecewise_spn, feature_scope=[1], evidence=np.array([[0, np.nan]])) # = [[0.5]]

Utilities

Finally, we have some basic utilities for working with SPNs:

We can make sure that the SPN we are using is valid, that is, it is complete (the children of each sum node all cover the same scope) and consistent (the children of each product node cover disjoint scopes).

from spn.algorithms.Validity import is_valid
print(is_valid(spn))

The output indicates that the SPN is valid and there are no debugging error messages:

(True, None)

To compute basic statistics on the structure of the SPN:

from spn.algorithms.Statistics import get_structure_stats
print(get_structure_stats(spn))

Layerwise SPN in PyTorch

A layerwise implementation of leaf, sum and product nodes in PyTorch is available in the spn.algorithms.layerwise module. For more information, check out the Layerwise SPN README.
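To give a flavor of what "layerwise" means, here is a minimal, self-contained sketch in plain PyTorch; it is illustrative only and does not use SPFlow's actual layer classes (see the linked README for those). Leaves emit a (batch, features, channels) tensor of log-likelihoods, a product layer sums log-likelihoods over disjoint feature groups, and a sum layer mixes channels with a weighted logsumexp:

import torch

# Layerwise SPN evaluation sketch (illustrative only, not SPFlow's API).
batch, features, channels = 8, 4, 3
x = torch.randn(batch, features)

# Leaf layer: one Gaussian per feature and channel -> (batch, features, channels).
means = torch.randn(features, channels)
log_stds = torch.zeros(features, channels)
dist = torch.distributions.Normal(means, log_stds.exp())
leaf_ll = dist.log_prob(x.unsqueeze(-1))

# Product layer: partition features into pairs and sum their log-likelihoods.
prod_ll = leaf_ll.view(batch, features // 2, 2, channels).sum(dim=2)

# Sum layer: weighted logsumexp over channels (weights normalized per node).
log_w = torch.log_softmax(torch.randn(features // 2, channels), dim=-1)
sum_ll = torch.logsumexp(prod_ll + log_w, dim=-1)

# Root product: combine the remaining disjoint scopes into one log-density per sample.
root_ll = sum_ll.sum(dim=1)
print(root_ll.shape)  # torch.Size([8])

The appeal of this formulation is that every layer is a dense tensor operation, so the whole network evaluates in a handful of batched GPU kernels instead of a node-by-node graph traversal.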

Extending the library

Using the SPN library is, as we have seen, relatively easy. However, we might need to extend it if we want to work with new distributions.

Imagine we want to create a new leaf type that models the Pareto distribution. We start by creating a new class:

from spn.structure.leaves.parametric.Parametric import Leaf
class Pareto(Leaf):
    def __init__(self, a, scope=None):
        Leaf.__init__(self, scope=scope)
        self.a = a

Now, if we want to do inference with this new node type, we just implement the corresponding likelihood function:

import numpy as np
from scipy.stats import pareto

def pareto_likelihood(node, data=None, dtype=np.float64):
    # One probability per instance, evaluated on the column in the node's scope.
    probs = np.ones((data.shape[0], 1), dtype=dtype)
    probs[:] = pareto.pdf(data[:, node.scope], node.a)
    return probs

This function receives the node, the data on which to compute the probability and the numpy dtype for the result.

Now, we just need to register this function so that it can be used seamlessly by the rest of the infrastructure:

from spn.algorithms.Inference import add_node_likelihood
add_node_likelihood(Pareto, pareto_likelihood)

Now, we can create SPNs that use the new distribution and also evaluate them.

spn = 0.3 * Pareto(2.0, scope=0) + 0.7 * Pareto(3.0, scope=0)
print(log_likelihood(spn, np.array([1.5]).reshape(-1, 1)))

This produces the output:

[[-0.52324814]]
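As a sanity check, the same value can be computed directly with scipy, since the root is just a two-component mixture evaluated at x = 1.5:

from scipy.stats import pareto
import numpy as np

mix = 0.3 * pareto.pdf(1.5, 2.0) + 0.7 * pareto.pdf(1.5, 3.0)
print(np.log(mix))  # -0.52324814...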

All other aspects of the SPN library can be extended in a similar way.

Papers SPFlow can reproduce

  • Nicola Di Mauro, Antonio Vergari, Teresa M.A. Basile, Floriana Esposito. "Fast and Accurate Density Estimation with Extremely Randomized Cutset Networks". In: ECML/PKDD, 2017.
  • Nicola Di Mauro, Antonio Vergari, and Teresa M.A. Basile. "Learning Bayesian Random Cutset Forests". In ISMIS 2015, LNAI 9384, pp. 1-11, Springer, 2015.
  • Nicola Di Mauro, Antonio Vergari, and Floriana Esposito. "Learning Accurate Cutset Networks by Exploiting Decomposability". In AI*IA. 2015, LNAI 9336, 1-12, Springer, 2015.
  • Antonio Vergari, Nicola Di Mauro, and Floriana Esposito. "Simplifying, Regularizing and Strengthening Sum-Product Network Structure Learning". In ECML/PKDD, LNCS, 343-358, Springer. 2015.

Papers implemented in SPFlow

  • Molina, Alejandro, Sriraam Natarajan, and Kristian Kersting. "Poisson Sum-Product Networks: A Deep Architecture for Tractable Multivariate Poisson Distributions." In AAAI, pp. 2357-2363. 2017.

  • Molina, Alejandro, Antonio Vergari, Nicola Di Mauro, Sriraam Natarajan, Floriana Esposito, and Kristian Kersting. "Mixed sum-product networks: A deep architecture for hybrid domains." In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2018.

Citation

If you find SPFlow useful, please cite us in your work:

@misc{Molina2019SPFlow,
  Author = {Alejandro Molina and Antonio Vergari and Karl Stelzner and Robert Peharz and Pranav Subramani and Nicola Di Mauro and Pascal Poupart and Kristian Kersting},
  Title = {SPFlow: An Easy and Extensible Library for Deep Probabilistic Learning using Sum-Product Networks},
  Year = {2019},
  Eprint = {arXiv:1901.03704},
}

Authors

  • Alejandro Molina - TU Darmstadt
  • Antonio Vergari - Max-Planck-Institute
  • Karl Stelzner - TU Darmstadt
  • Robert Peharz - University of Cambridge
  • Nicola Di Mauro - University of Bari Aldo Moro
  • Kristian Kersting - TU Darmstadt

See also the list of contributors who participated in this project.

Contributors

  • Moritz Kulessa - TU Darmstadt
  • Claas Voelcker - TU Darmstadt
  • Simon Roesler - Karlsruhe Institute of Technology
  • Steven Lang - TU Darmstadt
  • Alexander L. Hayes - Indiana University, Bloomington

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE.md file for details

Acknowledgments

  • Parts of SPFlow as well as its motivating research have been supported by the German Research Foundation (DFG) - AIPHES, GRK 1994, and CAML, KE 1686/3-1, as part of SPP 1999 - and the Federal Ministry of Education and Research (BMBF) - InDaS, 01IS17063B.

  • This project received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant Agreement No. 797223 (HYBSPN).


spflow's Issues

Getting error in plot_spn running introductory example

Description

An error is thrown when trying to run the introductory example specified in https://github.com/SPFlow/SPFlow using spflow 0.0.40

from spn.structure.leaves.parametric.Parametric import Categorical
from spn.io.Graphics import plot_spn

spn = 0.4 * (Categorical(p=[0.2, 0.8], scope=0) *
             (0.3 * (Categorical(p=[0.3, 0.7], scope=1) *
                     Categorical(p=[0.4, 0.6], scope=2))
            + 0.7 * (Categorical(p=[0.5, 0.5], scope=1) *
                     Categorical(p=[0.6, 0.4], scope=2)))) \
    + 0.6 * (Categorical(p=[0.2, 0.8], scope=0) *
             Categorical(p=[0.3, 0.7], scope=1) *
             Categorical(p=[0.4, 0.6], scope=2))

plot_spn(spn, 'basicspn.png')

Expected behavior:

The spn is plotted when executing plot_spn(spn, 'basicspn.png')

Encountered behavior:

The following error is found:

Traceback (most recent call last):
  File "/home/jesus/src/spflow-samples/samples.py", line 13, in <module>
    plot_spn(spn, 'basicspn.png')
  File "...../python3.7/site-packages/spn/io/Graphics.py", line 80, in plot_spn
    g, pos=pos, edge_labels=nx.get_edge_attributes(g, "weight"), font_size=16, clip_on=False, alpha=0.6
TypeError: draw_networkx_edge_labels() got an unexpected keyword argument 'clip_on'

Additional Information

  • Operating System: Ubuntu 20.04.1 LTS
  • Python Version: 3.7.5
  • spflow = 0.0.40
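A likely cause (an inference from the traceback, not confirmed in this report): spn.io.Graphics forwards clip_on to networkx's draw_networkx_edge_labels, and networkx 2.6 stopped accepting arbitrary keyword arguments in its drawing functions. Pinning networkx to a version below 2.6 should avoid the error.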

Documentation/basic.py fails

Running basic.py from the Documentation folder currently fails when calling classification, with the following error message:

File "Documentation/basics.py", line 208, in
classification()
File "Documentation/basics.py", line 157, in classification
learn_parametric, 2)
File "/home/iliricon/Documents/Studium/ProjectsKersting/SimpleSPN/src/spn/algorithms/LearningWrappers.py", line 25, in learn_classifier
branch = spn_learn_wrapper(data[data[:, label_idx] == label, :], ds_context, cpus=cpus, rand_gen=rand_gen)
File "/home/iliricon/Documents/Studium/ProjectsKersting/SimpleSPN/src/spn/algorithms/LearningWrappers.py", line 117, in learn_parametric
return learn(data, ds_context, cols, rows, min_instances_slice, threshold, ohe)
File "/home/iliricon/Documents/Studium/ProjectsKersting/SimpleSPN/src/spn/algorithms/LearningWrappers.py", line 112, in learn
return learn_structure(data, ds_context, split_rows, split_cols, leaves, nextop)
File "/home/iliricon/Documents/Studium/ProjectsKersting/SimpleSPN/src/spn/algorithms/StructureLearning.py", line 175, in learn_structure
data_slices = split_cols(local_data, ds_context, scope)
File "/home/iliricon/Documents/Studium/ProjectsKersting/SimpleSPN/src/spn/algorithms/splitting/RDC.py", line 433, in split_cols_RDC_py
rand_gen=rand_gen)
File "/home/iliricon/Documents/Studium/ProjectsKersting/SimpleSPN/src/spn/algorithms/splitting/RDC.py", line 393, in getIndependentRDCGroups_py
rand_gen=rand_gen)
File "/home/iliricon/Documents/Studium/ProjectsKersting/SimpleSPN/src/spn/algorithms/splitting/RDC.py", line 355, in rdc_test
for i, j in pairwise_comparisons)
File "/home/iliricon/.virtualenvs/SimpleSPN/lib/python3.6/site-packages/joblib/parallel.py", line 994, in call
self.retrieve()
File "/home/iliricon/.virtualenvs/SimpleSPN/lib/python3.6/site-packages/joblib/parallel.py", line 897, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/iliricon/.virtualenvs/SimpleSPN/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 515, in wrap_future_result
return future.result(timeout=timeout)
File "/home/iliricon/.virtualenvs/SimpleSPN/lib/python3.6/site-packages/joblib/externals/loky/_base.py", line 431, in result
return self.__get_result()
File "/home/iliricon/.virtualenvs/SimpleSPN/lib/python3.6/site-packages/joblib/externals/loky/_base.py", line 382, in __get_result
raise self._exception
IndexError: list index out of range

SPN figure is wrong?

What did you expect to happen?

The child of a product node should be a sum node and vice-versa. This is what I have read about SPNs.

What actually happened?

The figure shown has a product node as child of a product node. How is that possible?

Describe your attempts to resolve the issue

No response

Steps to reproduce

The given code

System Information

Google colab

Installed Python Packages

spnflow

Error during EM Optimization when SPN has leaf nodes with multiple parents

Description

When an SPN structure is created manually in a way that some leaf nodes have multiple parents, an error is thrown when trying to optimize the weights using the function EM_optimization.

The following code throws an error:

import numpy as np
from spn.algorithms.EM import EM_optimization
from spn.structure.Base import Product, Sum, assign_ids, rebuild_scopes_bottom_up
from spn.structure.leaves.parametric.Parametric import Categorical

p7 = Categorical(p=[0.5, 0.5], scope=0)
p8 = Categorical(p=[0.5, 0.5], scope=0)
p9 = Categorical(p=[0.5, 0.5], scope=1)
p10 = Categorical(p=[0.5, 0.5], scope=1)

p3 = Sum(weights=[0.5, 0.5], children=[p7, p8])
p4 = Sum(weights=[0.5, 0.5], children=[p9, p10])
p5 = Sum(weights=[0.5, 0.5], children=[p7, p8])
p6 = Sum(weights=[0.5, 0.5], children=[p9, p10])

p1 = Product(children=[p3, p4])
p2 = Product(children=[p5, p6])
spn = Sum(weights=[0.5, 0.5], children=[p1, p2])

assign_ids(spn)
rebuild_scopes_bottom_up(spn)

train_data = np.array([
    [0, 1],
    [0, 0],
    [1, 1],
    [1, 1],
    [0, 0]], dtype=float)

EM_optimization(spn, train_data, iterations=100)

However, if the above SPN is modified so that no leaf node has multiple parents, then no error happens.
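A minimal workaround sketch along those lines (it sidesteps the bug rather than fixing the gradient bookkeeping) replaces the shared leaves with fresh copies, so that the network is a tree:

# Workaround sketch: each Sum gets its own leaf objects, so no node has two parents.
p3 = Sum(weights=[0.5, 0.5], children=[Categorical(p=[0.5, 0.5], scope=0),
                                       Categorical(p=[0.5, 0.5], scope=0)])
p4 = Sum(weights=[0.5, 0.5], children=[Categorical(p=[0.5, 0.5], scope=1),
                                       Categorical(p=[0.5, 0.5], scope=1)])
p5 = Sum(weights=[0.5, 0.5], children=[Categorical(p=[0.5, 0.5], scope=0),
                                       Categorical(p=[0.5, 0.5], scope=0)])
p6 = Sum(weights=[0.5, 0.5], children=[Categorical(p=[0.5, 0.5], scope=1),
                                       Categorical(p=[0.5, 0.5], scope=1)])
# The rest of the construction (p1, p2, spn, assign_ids, rebuild_scopes_bottom_up)
# stays unchanged; EM_optimization then runs without the broadcast error.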

Expected behavior:

No error happens

Encountered behavior:

The following error is thrown:

  File "/home/user/src/machine-learning-samples/SPNs/error.py", line 30, in <module>
    EM_optimization(spn, train_data, iterations=100)
  File "/home/user/.local/share/virtualenvs/SPNs-1WNQAoe_/lib/python3.7/site-packages/spn/algorithms/EM.py", line 61, in EM_optimization
    gradients = gradient_backward(spn, lls_per_node)
  File "/home/user/.local/share/virtualenvs/SPNs-1WNQAoe_/lib/python3.7/site-packages/spn/algorithms/Gradient.py", line 86, in gradient_backward
    lls_per_node=lls_per_node,
  File "/home/user/.local/share/virtualenvs/SPNs-1WNQAoe_/lib/python3.7/site-packages/spn/structure/Base.py", line 445, in eval_spn_top_down
    result = func(n, param, **args)
  File "/home/user/.local/share/virtualenvs/SPNs-1WNQAoe_/lib/python3.7/site-packages/spn/algorithms/Gradient.py", line 18, in leaf_gradient_backward
    gradient_result[:, node.id] = gradients
ValueError: could not broadcast input array from shape (10) into shape (5)

Additional Information

  • Operating System: Ubuntu 20.04.1 LTS
  • Python Version: 3.7.5

plot_spn not working

What did you expect to happen?

it should have plotted the basic SPN as created in the README

What actually happened?


FileNotFoundError Traceback (most recent call last)
File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/site-packages/pydot.py:1923, in Dot.create(self, prog, format, encoding)
1922 try:
-> 1923 stdout_data, stderr_data, process = call_graphviz(
1924 program=prog,
1925 arguments=arguments,
1926 working_dir=tmp_dir,
1927 )
1928 except OSError as e:

File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/site-packages/pydot.py:132, in call_graphviz(program, arguments, working_dir, **kwargs)
130 program_with_args = [program, ] + arguments
--> 132 process = subprocess.Popen(
133 program_with_args,
134 env=env,
135 cwd=working_dir,
136 shell=False,
137 stderr=subprocess.PIPE,
138 stdout=subprocess.PIPE,
139 **kwargs
140 )
141 stdout_data, stderr_data = process.communicate()

File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/subprocess.py:858, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
855 self.stderr = io.TextIOWrapper(self.stderr,
856 encoding=encoding, errors=errors)
--> 858 self._execute_child(args, executable, preexec_fn, close_fds,
859 pass_fds, cwd, env,
860 startupinfo, creationflags, shell,
861 p2cread, p2cwrite,
862 c2pread, c2pwrite,
863 errread, errwrite,
864 restore_signals, start_new_session)
865 except:
866 # Cleanup if the child failed starting.

File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/subprocess.py:1704, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
1703 err_msg = os.strerror(errno_num)
-> 1704 raise child_exception_type(errno_num, err_msg, err_filename)
1705 raise child_exception_type(err_msg)

FileNotFoundError: [Errno 2] No such file or directory: 'dot'

During handling of the above exception, another exception occurred:

FileNotFoundError Traceback (most recent call last)
Input In [6], in
----> 1 plot_spn(spn,'basicspn.png')

File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/site-packages/spn/io/Graphics.py:91, in plot_spn(spn, fname)
90 def plot_spn(spn, fname="plot.pdf"):
---> 91 plt = draw_spn(spn)
92 plt.savefig(fname, bbox_inches="tight", pad_inches=0)

File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/site-packages/spn/io/Graphics.py:59, in draw_spn(spn)
55 plt.clf()
57 g, labels = _get_networkx_obj(spn)
---> 59 pos = graphviz_layout(g, prog="dot")
60 ax = plt.gca()
62 nx.draw(
63 g,
64 pos,
(...)
72 font_size=16,
73 )

File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/site-packages/networkx/drawing/nx_pydot.py:263, in graphviz_layout(G, prog, root)
233 def graphviz_layout(G, prog="neato", root=None):
234 """Create node positions using Pydot and Graphviz.
235
236 Returns a dictionary of positions keyed by node.
(...)
261 This is a wrapper for pydot_layout.
262 """
--> 263 return pydot_layout(G=G, prog=prog, root=root)

File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/site-packages/networkx/drawing/nx_pydot.py:312, in pydot_layout(G, prog, root)
308 P.set("root", str(root))
310 # List of low-level bytes comprising a string in the dot language converted
311 # from the passed graph with the passed external GraphViz command.
--> 312 D_bytes = P.create_dot(prog=prog)
314 # Unique string decoded from these bytes with the preferred locale encoding
315 D = str(D_bytes, encoding=getpreferredencoding())

File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/site-packages/pydot.py:1733, in Dot.__init__.<locals>.new_method(f, prog, encoding)
1729 def new_method(
1730 f=frmt, prog=self.prog,
1731 encoding=None):
1732 """Refer to docstring of method create."""
-> 1733 return self.create(
1734 format=f, prog=prog, encoding=encoding)

File ~/opt/anaconda3/envs/SPFlow/lib/python3.8/site-packages/pydot.py:1933, in Dot.create(self, prog, format, encoding)
1930 args = list(e.args)
1931 args[1] = '"{prog}" not found in path.'.format(
1932 prog=prog)
-> 1933 raise OSError(*args)
1934 else:
1935 raise

FileNotFoundError: [Errno 2] "dot" not found in path.

Describe your attempts to resolve the issue

No response

Steps to reproduce

from spn.structure.leaves.parametric.Parametric import Categorical
from spn.structure.Base import Sum, Product
from spn.structure.Base import assign_ids, rebuild_scopes_bottom_up
p0 = Product(children=[Categorical(p=[0.3, 0.7], scope=1), Categorical(p=[0.4, 0.6], scope=2)])
p1 = Product(children=[Categorical(p=[0.5, 0.5], scope=1), Categorical(p=[0.6, 0.4], scope=2)])
s1 = Sum(weights=[0.3, 0.7], children=[p0, p1])
p2 = Product(children=[Categorical(p=[0.2, 0.8], scope=0), s1])
p3 = Product(children=[Categorical(p=[0.2, 0.8], scope=0), Categorical(p=[0.3, 0.7], scope=1)])
p4 = Product(children=[p3, Categorical(p=[0.4, 0.6], scope=2)])
spn = Sum(weights=[0.4, 0.6], children=[p2, p4])
assign_ids(spn)
rebuild_scopes_bottom_up(spn)
from spn.io.Graphics import plot_spn
plot_spn(spn,'basicspn.png')

System Information

Python version: 3.8.12
SPFLOW version: 0.0.41
Operating System: MacOS
Darwin Ramyanees-MacBook-Pro.local 21.2.0 Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:54 PST 2021; root:xnu-8019.61.5~1/RELEASE_X86_64 x86_64

Installed Python Packages

appnope @ file:///Users/runner/miniforge3/conda-bld/appnope_1635819647402/work arff==0.9 asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1618968359944/work attrs==21.4.0 backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1618230623929/work black @ file:///home/conda/feedstock_root/build_artifacts/black-recipe_1639405010350/work certifi==2021.10.8 click @ file:///Users/runner/miniforge3/conda-bld/click_1635822681337/work cycler==0.11.0 dataclasses @ file:///home/conda/feedstock_root/build_artifacts/dataclasses_1628958434797/work debugpy @ file:///Users/runner/miniforge3/conda-bld/debugpy_1636043372408/work decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1605121927639/work/dist/entrypoints-0.3-py2.py3-none-any.whl ete3==3.1.2 executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1633213722787/work fonttools==4.28.5 iniconfig==1.1.1 ipykernel @ file:///Users/runner/miniforge3/conda-bld/ipykernel_1642098150029/work/dist/ipykernel-6.7.0-py3-none-any.whl ipython @ file:///Users/runner/miniforge3/conda-bld/ipython_1642613824459/work jedi @ file:///Users/runner/miniforge3/conda-bld/jedi_1637175422032/work joblib==1.1.0 jupyter-client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1642858610849/work jupyter-core @ file:///Users/runner/miniforge3/conda-bld/jupyter_core_1636814374382/work kiwisolver==1.3.2 lark-parser==0.12.0 matplotlib==3.5.1 matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1631080358261/work mpmath==1.2.1 mypy-extensions @ file:///Users/runner/miniforge3/conda-bld/mypy_extensions_1635839836513/work nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1638419302549/work networkx==2.6.3 numpy==1.22.1 packaging==21.3 pandas==1.4.0 parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work pathspec @ file:///home/conda/feedstock_root/build_artifacts/pathspec_1626613672358/work patsy==0.5.2 pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1602535608087/work pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work Pillow==9.0.0 platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1630400214373/work pluggy==1.0.0 prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1639065841292/work ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work py==1.11.0 pydot==1.4.2 Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1641580240686/work pyparsing==3.0.7 PyQt5==5.15.6 PyQt5-Qt5==5.15.2 PyQt5-sip==12.9.0 pytest==6.2.5 python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work pytz==2021.3 pyzmq @ file:///Users/runner/miniforge3/conda-bld/pyzmq_1635877502229/work scikit-learn==1.0.2 scipy==1.7.3 six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work sklearn==0.0 spflow==0.0.41 stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1642255706390/work statsmodels==0.13.1 sympy==1.9 threadpoolctl==3.0.0 
toml==0.10.2 tomli @ file:///home/conda/feedstock_root/build_artifacts/tomli_1635181214134/work tornado @ file:///Users/runner/miniforge3/conda-bld/tornado_1635819838703/work tqdm==4.62.3 traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1635260543454/work typed-ast @ file:///Users/runner/miniforge3/conda-bld/typed-ast_1638670816302/work typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1638334978229/work wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1600965781394/work
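A likely fix (not stated in this report): the final FileNotFoundError means the Graphviz dot executable is not on the PATH; pydot is only a wrapper around it. Installing Graphviz system-wide, e.g. brew install graphviz on macOS or conda install graphviz in the Anaconda environment above, should make graphviz_layout find dot.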

Piecewise cannot represent density functions due to bug in inference

Description

The following code does not work as expected:

import numpy as np
from spn.structure.leaves.piecewise.PiecewiseLinear import PiecewiseLinear
from spn.algorithms.Inference import likelihood

piecewise_spn = PiecewiseLinear([0, 0.5, 1], [0, 2, 0], [], scope=[0])

print(likelihood(piecewise_spn, np.array([[0.5]])))

It raises an error, since the node returns 2, which is a perfectly valid density, just not a valid probability. 

Expected behavior:

I expect 2, since this is the density described by the piecewise linear node.

Encountered behavior:

An assertion error is raised, due to an assertion in the piecewise inference code:

assert prob > 0 and prob <= 1

Additional Information

Can be fixed by removing the assertion. The assertion seems to assume probabilities, not densities, but is this the expected usage of PiecewiseLinear?

Add global float precision for all operations and data objects

Description

Similar to PyTorch, we should have a global float precision that defines the float precision of new data objects for all backends (numpy, torch, tensorflow, etc.) so that we can ensure consistent precision. Furthermore, public API functions (likelihood, etc.) should offer a "dtype" argument that sets the global precision for all computations involved during the function evaluation. This could be partially supported by a contextmanager.
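A minimal sketch of what the proposed contextmanager could look like; _global_dtype, get_default_dtype, and default_dtype are hypothetical names, not part of SPFlow's current API:

import numpy as np
from contextlib import contextmanager

_global_dtype = np.float64  # hypothetical module-level default precision

def get_default_dtype():
    # Backends would consult this when allocating new data objects.
    return _global_dtype

@contextmanager
def default_dtype(dtype):
    # Temporarily override the global float precision.
    global _global_dtype
    previous = _global_dtype
    _global_dtype = dtype
    try:
        yield
    finally:
        _global_dtype = previous

# Usage sketch:
# with default_dtype(np.float32):
#     ll = likelihood(spn, data)  # internal arrays would be float32 here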

Sampling MSPN: 'Histogram' has no attribute '_eval_func'

Description

Sampling does not work on an MSPN as it does on a regular SPN (like the example for sample_instances in README.md).

import numpy as np
from numpy.random.mtrand import RandomState
from spn.algorithms.Sampling import sample_instances
from spn.structure.Base import Context
from spn.structure.StatisticalTypes import MetaType
from spn.algorithms.LearningWrappers import learn_mspn

np.random.seed(123)
a = np.random.randint(2, size=1000).reshape(-1, 1)
b = np.random.randint(3, size=1000).reshape(-1, 1)
c = np.r_[np.random.normal(10, 5, (300, 1)), np.random.normal(20, 10, (700, 1))]
d = 5 * a + 3 * b + c
train_data = np.c_[a, b, c, d]

ds_context = Context(meta_types=[MetaType.DISCRETE, MetaType.DISCRETE, MetaType.REAL, MetaType.REAL]).add_domains(train_data)
mspn = learn_mspn(train_data, ds_context, min_instances_slice=20)

print(sample_instances(mspn, np.array([np.nan, np.nan, np.nan, np.nan] * 5).reshape(-1, 4), RandomState(123)))

Expected behavior:

A sample similar to sampling an SPN (as in the example in the README):

[[ 0.          1.         23.64695818 28.274582  ]
 [ 1.          0.         -8.22234143 -6.12003724]
 [ 0.          1.         -0.60335736  3.04162869]
 [ 0.          0.         13.4827991  13.99828954]
 [ 1.          1.          0.44601695  3.36859188]]

Encountered behavior:

An error occurred:

  File "<ipython-input-3-585fe05aac76>", line 4, in <module>
    print(sample_instances(mspn, np.array([np.nan, np.nan, np.nan, np.nan]).reshape(-1, 4), RandomState(123)))
  File "/home/me/apps/anaconda3/envs/modelbase/lib/python3.7/site-packages/spn/algorithms/Sampling.py", line 120, in sample_instances
    node, node_sampling, parent_result=instance_ids, data=data, lls_per_node=lls_per_node, rand_gen=rand_gen
  File "/home/me/apps/anaconda3/envs/modelbase/lib/python3.7/site-packages/spn/structure/Base.py", line 442, in eval_spn_top_down
    func = n.__class__._eval_func[-1]
AttributeError: type object 'Histogram' has no attribute '_eval_func'

Additional Information

  • Operating System:
    Archlinux
  • Python Version:
    Python 3.7.3

Add Sphinx based documentation

The documentation shall be provided using Sphinx. The content should at least contain the following:

  • Welcome/Introduction page with a short description of what this library does, listing most of its features, and showing a simple example on how to construct a 1) simple SPN and a 2) complex SPN (like EiNet or something)
  • Example pages with multiple examples on how to do
    • inference (+ conditional)
    • marginalization (also structurally)
    • sampling (+ conditional)
    • learning (structure/parameters)
    • visualization methods / IO
    • multiple different network types
    • multiple different backends (base/torch/tf/jax)
    • conversions between backends
  • API Reference according to our modules (extract from the python docstrings via Sphinx autodoc)
  • FAQ section
  • Help page on how to reach out to us (github issues, gitter chat(?), etc.)


Histogram leaf learning cannot deal with singular values

Description

Minimum working example: Learning a histogram or piecewise distribution over just a single value is impossible, because the code contains an assertion which checks max and min.

Expected behavior:

I would expect the following code (context omitted for clarity):

import numpy as np
from spn.algorithms import LearningWrappers

data = np.array([[0.]] * 100)  # 100 instances of a single, constant-valued feature
LearningWrappers.learn_mspn(data)

to create a distribution over 0. with a likelihood of 100 % (certain event). This is a problem especially if a split of the dataset accidentally results in a single-value column, which can happen pretty fast (e.g. a categorical value separates another column perfectly into independent splits).

Encountered behavior:

The code contains an assertion that no single value column exists.

Importing learn_parametric crashes on Windows

Description

Just importing learn_parametric makes the whole script crash on Windows. To be sure it was that import causing the error, I created a script with just the import statement and the result was the same.

Environment

  • Operating System: Windows 10
  • Python Version: 3.7.2 / 3.6.8

Stacktrace

(Multiple spawned worker processes print interleaved copies of the same traceback; one de-interleaved copy follows.)

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\dedo2\OneDrive - Università degli Studi di Bari\Tesi\SPN\1_Layer_Bernoulli\main.py", line 2, in <module>
    from StackSPN import StackSPN
  File "C:\Users\dedo2\OneDrive - Università degli Studi di Bari\Tesi\SPN\1_Layer_Bernoulli\StackSPN.py", line 1, in <module>
    from spn.algorithms.LearningWrappers import learn_parametric
  File "C:\Users\dedo2\ONEDRI~1\Tesi\venv\lib\site-packages\spn\algorithms\LearningWrappers.py", line 9, in <module>
    from spn.algorithms.StructureLearning import get_next_operation, learn_structure
  File "C:\Users\dedo2\ONEDRI~1\Tesi\venv\lib\site-packages\spn\algorithms\StructureLearning.py", line 33, in <module>
    pool = multiprocessing.Pool(processes=cpus)
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
from StackSPN import StackSPN
File "C:\Users\dedo2\OneDrive - Università degli Studi di Bari\Tesi\SPN\1_Layer_Bernoulli\StackSPN.py", line 1, in
from spn.algorithms.LearningWrappers import learn_parametric
File "C:\Users\dedo2\ONEDRI1\Tesi\venv\lib\site-packages\spn\algorithms\LearningWrappers.py", line 9, in
from spn.algorithms.StructureLearning import get_next_operation, learn_structure
File "C:\Users\dedo2\ONEDRI
1\Tesi\venv\lib\site-packages\spn\algorithms\StructureLearning.py", line 33, in
pool = multiprocessing.Pool(processes=cpus)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\context.py", line 119, in Pool
context=self.get_context())
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\pool.py", line 176, in init
self._repopulate_pool()
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\pool.py", line 241, in _repopulate_pool
w.start()
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\popen_spawn_win32.py", line 33, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
_check_not_importing_main()

File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "main.py", line 4, in
train = np.array(np.genfromtxt('train.csv', delimiter=',')[:, :3],
File "C:\Users\dedo2\ONEDRI1\Tesi\venv\lib\site-packages\numpy\lib\npyio.py", line 1761, in genfromtxt
first_line = _decode_line(next(fhd), encoding)
OSError: [Errno 22] Invalid argument
Traceback (most recent call last):
File "", line 1, in
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="mp_main")
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\dedo2\OneDrive - Università degli Studi di Bari\Tesi\SPN\1_Layer_Bernoulli\main.py", line 2, in
from StackSPN import StackSPN
File "C:\Users\dedo2\OneDrive - Università degli Studi di Bari\Tesi\SPN\1_Layer_Bernoulli\StackSPN.py", line 1, in
from spn.algorithms.LearningWrappers import learn_parametric
File "C:\Users\dedo2\ONEDRI
1\Tesi\venv\lib\site-packages\spn\algorithms\LearningWrappers.py", line 9, in
from spn.algorithms.StructureLearning import get_next_operation, learn_structure
File "C:\Users\dedo2\ONEDRI~1\Tesi\venv\lib\site-packages\spn\algorithms\StructureLearning.py", line 33, in
pool = multiprocessing.Pool(processes=cpus)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\context.py", line 119, in Pool
context=self.get_context())
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\pool.py", line 176, in init
self._repopulate_pool()
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\pool.py", line 241, in _repopulate_pool
w.start()
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\popen_spawn_win32.py", line 33, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "c:\users\dedo2\appdata\local\programs\python\python37\Lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
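The usual fix, as the error message itself suggests, is to guard the script's entry point so that Windows' spawn start method can re-import main.py without re-running the training code. A minimal sketch (the data and parameters here are illustrative, not taken from the report):

import numpy as np

from spn.algorithms.LearningWrappers import learn_parametric
from spn.structure.Base import Context
from spn.structure.leaves.parametric.Parametric import Bernoulli

def main():
    # Anything that eventually calls multiprocessing.Pool (such as
    # learn_parametric with cpus != 1) must only run inside the guard below.
    train = np.random.randint(0, 2, size=(100, 3)).astype(np.float64)
    ds_context = Context(parametric_types=[Bernoulli] * 3).add_domains(train)
    return learn_parametric(train, ds_context)

if __name__ == '__main__':
    # On Windows, multiprocessing spawns workers by re-importing the main
    # module; unguarded top-level code re-executes in every worker and tries
    # to spawn again, producing the RuntimeError above.
    main()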

Multivariate Gaussians

Description

Implement Multivariate leaves for SPNs

Expected behavior:

Multivariate Gaussians work as expected and can be used as leaves in SPNs.

Migrate from TF1 to TF2

Description

SPFlow uses the old 1.14 version of Tensorflow, but since Tensorflow 2 is already out, SPFlow should make use of the new features and syntax of TF2.

We could alternatively reach a compromise and have support for both versions.

A patch for the Arch Linux package already exists to convert a portion of SPFlow to TF2 runnable code, but since the test coverage is hardly comprehensive and some of the existing tests require datasets that are not present in the repository itself, it is not guaranteed to provide a reliable refactoring.

Expected behavior:

SPFlow should support TF2.

Encountered behavior:

SPFlow does not support TF2.

Additional Information

  • Operating System: Linux
  • Python Version: 3.8

Tensorflow Variables: Initializer and dtype do not match

The generation of a Tensorflow variable from a Gaussian node goes wrong if the initializer dtype does not match the dtype that has been passed as an option here (and probably also in all other usages of tf.get_variable(...)).

Exemplary code

from spn.structure.leaves.parametric.Parametric import Gaussian
from spn.structure.leaves.parametric.Tensorflow import gaussian_to_tf_graph
import numpy as np

var = gaussian_to_tf_graph(
    node=Gaussian(mean=0.0, stdev=1.0), dtype=np.float64
)

Results in:

Traceback (most recent call last):
  File "SPFlow/main.py", line 192, in <module>
    test_tf_opt()
  File "SPFlow/main.py", line 165, in test_tf_opt
    node=Gaussian(mean=float(0.0), stdev=float(1.0)), dtype=np.float64
  File "SPFlow/src/spn/structure/leaves/parametric/Tensorflow.py", line 17, in gaussian_to_tf_graph
    mean = tf.get_variable("mean", initializer=node.mean, dtype=dtype)
  File "SPFlow/env/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1487, in get_variable
    aggregation=aggregation)
  File "SPFlow/env/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1214, in get_variable
    "don't match." % (init_dtype, dtype))
ValueError: Initializer type '<dtype: 'float32'>' and explicit dtype '<class 'numpy.float64'>' don't match.

A case where this happens is when you first learn an SPN from data and want to optimize the weights in Tensorflow afterwards:

from spn.algorithms.LearningWrappers import learn_classifier, learn_parametric
from spn.structure.leaves.parametric.Parametric import Categorical, Gaussian
from spn.structure.Base import Context
from spn.gpu.TensorFlow import optimize_tf
from spn.algorithms.Inference import log_likelihood
import numpy as np

# Sample data
train_data = np.array([1.0, 0.0, 1.0], dtype=np.float64).reshape(-1, 3)

# Learn SPN
spn = learn_classifier(
    train_data,
    ds_context=Context(
        parametric_types=[Gaussian, Gaussian, Categorical]
    ).add_domains(train_data),
    spn_learn_wrapper=learn_parametric,
    label_idx=2,
    cpus=-1,
)

# Run weight optimization 
optimized_spn = optimize_tf(spn, train_data) # <-- tf variable generation fails here
lloptimized = log_likelihood(optimized_spn, train_data)
print(lloptimized, np.exp(lloptimized))

A possible fix would be to replace:

tf.get_variable(
    "mean", 
    initializer=node.mean, 
    dtype=dtype
)

with

dtype = np.dtype(dtype).type
tf.get_variable(
    "mean", 
    initializer=dtype(node.mean), 
    dtype=dtype
)

MPE broken for histogram

Calling MPE on a histogram node leads to the following error:

TypeError: histogram_top_down() got an unexpected keyword argument 'lls_per_node'

Warnings in learn_parametric and bad results

Hey!
I am currently trying to learn an SPN on a dataset with one categorical label and 20 continuous variables, but I have some problems getting the SPN to learn.

I often get warnings like

X scores are null at iteration 0
  warnings.warn('X scores are null at iteration %s' % k)

and

invalid value encountered in true_divide
  c /= stddev[:, None]

when calling learn_parametric / learn_classifier.
After updating from 25988df to HEAD (a1ce6d3) I get even more warnings regarding the "X scores".

The code used is:

ds_context = Context(parametric_types=[Categorical] + [Gaussian] * 20).add_domains(train_data)
pspn = learn_classifier(train_data, ds_context, learn_parametric, 0, min_instances_slice=20)

Additionally, the classification results obtained from such an SPN are not very good, but I don't know whether this is related to the warnings or whether it is another problem (e.g. 72% acc. compared to 92% with a logistic regression).

What might be the problem when encountering such issues? Is it dataset-related? Am I using the wrong parameters? Or do those messages have no meaning, so I can safely ignore them?

I would very much appreciate any help!

EM on non-tree structures returns a broadcast error

Description

Running EM_optimization on a non-tree structure returns a broadcast error. Below is a minimal example for reproducibility:

import spn.algorithms.EM  # import the submodule explicitly so spn.algorithms.EM resolves below
from spn.structure.leaves.parametric.Parametric import Bernoulli
import spn.structure.Base as Base
import spn.algorithms.Sampling as Sampling
import numpy as np

SEED = 101

def rand_weights(m = 2, n = 5):
  r = np.random.rand(n, m)
  return (r/r.sum(axis=1)[:,np.newaxis]).tolist()

def gen_spn(true_spn = False):
  if true_spn:
    W = [[0.3, 0.7], [0.6, 0.4], [0.5, 0.5], [0.2, 0.8], [0.45, 0.55]]
    W.reverse()
  else:
    W = rand_weights()

  X_11 = Bernoulli(p=0.2, scope=[0])
  X_12 = Bernoulli(p=0.4, scope=[0])
  X_21 = Bernoulli(p=0.6, scope=[1])
  X_22 = Bernoulli(p=0.8, scope=[1])

  S_0 = Base.Sum(weights=W.pop(), children=[X_11, X_12])
  S_1 = Base.Sum(weights=W.pop(), children=[X_21, X_22])
  S_2 = Base.Sum(weights=W.pop(), children=[X_11, X_12])
  S_3 = Base.Sum(weights=W.pop(), children=[X_21, X_22])

  P_0 = Base.Product(children=[S_0, S_1])
  P_1 = Base.Product(children=[S_2, S_3])

  R = Base.Sum(weights=W.pop(), children=[P_0, P_1])

  Base.assign_ids(R)
  Base.rebuild_scopes_bottom_up(R)

  M = {'X_11': X_11, 'X_12': X_12, 'X_21': X_21, 'X_22': X_22,
       'S_0': S_0, 'S_1': S_1, 'S_2': S_2, 'S_3': S_3,
       'P_0': P_0, 'P_1': P_1,
       'R': R}

  return R, M

def generate_data(S, n=1000):
  return Sampling.sample_instances(S, np.array([np.nan, np.nan] * n).reshape(-1, len(S.scope)),
                                   np.random.mtrand.RandomState(SEED))

TRUE_SPN, TRUE_SPN_MAP = gen_spn(true_spn = True)
D = generate_data(TRUE_SPN)
SPN, SPN_MAP = gen_spn()

def print_spn(S_MAP):
  for k in S_MAP:
    print(k, S_MAP[k].parameters)

def main():
  spn.algorithms.EM.EM_optimization(SPN, D, iterations=10)
  print('True values:')
  print_spn(TRUE_SPN_MAP)
  print('====')
  print_spn(SPN_MAP)

if __name__ == '__main__':
  main()

Through some debugging with pdb, I found that whenever a node has two or more parents, the parent_results array has length greater than one, causing merge_gradients to return an array of size double the expected length. This is what apparently causes the error.

Expected behavior:

Return an SPN with EM optimized weights.

Encountered behavior:

Numpy broadcast error:

Traceback (most recent call last):
  File "minimal.py", line 65, in <module>
    main()
  File "minimal.py", line 58, in main
    spn.algorithms.EM.EM_optimization(SPN, D, iterations=10)
  File "/usr/lib/python3.8/site-packages/spn/algorithms/EM.py", line 61, in EM_optimization
    gradients = gradient_backward(spn, lls_per_node)
  File "/usr/lib/python3.8/site-packages/spn/algorithms/Gradient.py", line 81, in gradient_backward
    eval_spn_top_down(
  File "/usr/lib/python3.8/site-packages/spn/structure/Base.py", line 445, in eval_spn_top_down
    result = func(n, param, **args)
  File "/usr/lib/python3.8/site-packages/spn/algorithms/Gradient.py", line 18, in leaf_gradient_backward
    gradient_result[:, node.id] = gradients
ValueError: could not broadcast input array from shape (2000) into shape (1000)

Additional Information

  • Operating System: Arch Linux 5.5.4 x86_64
  • Python Version: 3.8.1

Structure learning and optimization algs

Hi, I am using the default parameters for learn_parametric with univariate Gaussian leaves. Would someone be able to tell me which structure learning and optimization algorithms are used, and, if possible, the papers they are from? Thank you in advance if you are able to help.


ds_context = Context(parametric_types=[Gaussian] * M).add_domains(train_batch)

def learn_parametric(
    data,
    ds_context,
    cols="rdc",
    rows="kmeans",
    min_instances_slice=200,
    min_features_slice=1,
    multivariate_leaf=False,
    threshold=0.3,
    ohe=False,
    leaves=None,
    memory=None,
    rand_gen=None,
    cpus=-1,
)

Incompatibility with Tensorflow 2.x.x

Description

Hi, I noticed an incompatibility of SPFlow with the tensorflow 2.x.x library.
Should I create a PR to fix it?

Steps to reproduce

$ pip3 install spflow
$ pip3 install tensorflow
from spn.gpu.TensorFlow import eval_tf

Expected Behavior

No warnings and exception.

Encountered Behavior

  • tensorflow==2.2.0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-acf9474828af> in <module>
----> 1 from spn.gpu.TensorFlow import eval_tf

~/anaconda3/lib/python3.7/site-packages/spn/gpu/__init__.py in <module>
----> 1 from spn.structure.leaves.parametric.Tensorflow import add_parametric_tensorflow_support
      2 
      3 add_parametric_tensorflow_support()

~/anaconda3/lib/python3.7/site-packages/spn/structure/leaves/parametric/Tensorflow.py in <module>
      7 import tensorflow as tf
      8 
----> 9 from spn.gpu.TensorFlow import add_node_to_tf_graph, add_tf_graph_to_node
     10 from spn.structure.leaves.parametric.Parametric import (
     11     Gaussian,

~/anaconda3/lib/python3.7/site-packages/spn/gpu/TensorFlow.py in <module>
    113     batch_size: int = None,
    114     optimizer: tf.train.Optimizer = None,
--> 115     return_loss=False,
    116 ) -> Union[Tuple[Node, List[float]], Node]:
    117     """

AttributeError: module 'tensorflow._api.v2.train' has no attribute 'Optimizer'

For version 1.14.0 the code runs but throws a warning:

  • tensorflow 1.14.0
WARNING:tensorflow:From /home/queensgambit/anaconda3/lib/python3.7/site-packages/spn/gpu/TensorFlow.py:151: The name tf.train.GradientDescentOptimizer is deprecated. Please use tf.compat.v1.train.GradientDescentOptimizer instead.

Additional Information

  • tensorflow==2.2.0
  • spflow==0.0.40
  • Operating System: Ubuntu 18.04.4 LTS
  • Python Version: Python 3.7.6

MAP inference

Hi, is there any MAP inference available in your implementation?

Learn_parametric with gaussian gives wrong parameters

When I try to learn an SPN using learn_parametric (4 Categorical, 4 Gaussian leaves), I often run into the problem that I get some Gaussian leaves with mean: 0, stdev: 1e-8, which causes the SPN to completely break (giving values like "21561682839640.922").

Is this a bug or is this normal? How could I avoid this from happening?

Thanks in advance!

Save circuit to file

Hi,

Is there a recommended way to save SPNs or CNets to disk in SPFlow? Is pickling the whole circuit safe?

Thanks
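For reference, the naive approach in question is a plain pickle round-trip; a minimal sketch (whether this is safe and recommended is exactly what this issue asks, and the spn.io.Text helpers spn_to_str_equation / str_to_spn used in a later issue are a text-based alternative):

import pickle

from spn.structure.leaves.parametric.Parametric import Categorical

# Build a small SPN via the DSL and round-trip it through pickle.
spn = 0.5 * Categorical(p=[0.2, 0.8], scope=0) + 0.5 * Categorical(p=[0.4, 0.6], scope=0)

with open("spn.pkl", "wb") as f:
    pickle.dump(spn, f)

with open("spn.pkl", "rb") as f:
    loaded_spn = pickle.load(f)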

Make leaf node parameter constraints flexible

Our current approach to parameter constraints is to apply a fixed set of mappings (e.g. lower-bound is realized with exp(x) + lb, upper bound with -exp(x) + ub, and so on). Although this achieves the goal, it may influence the optimization in an unintended way since the optimization landscape is mapped with the pre-defined functions (exp, sigmoid).

A user may want to specify her/his own way on how to satisfy a constraint. So, instead of having hard-coded mappings, a flexible solution would be to introduce a constraint architecture where one can define constraints, implemented by specific mappings/functions provided by the user. These constraints could then be "registered" to specific parameters or sets thereof.
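A hypothetical sketch of what such a constraint architecture could look like; all names and signatures here are illustrative, not part of the current API:

import numpy as np

class Constraint:
    """Wraps a user-provided mapping from unconstrained to constrained space."""
    def __init__(self, forward):
        self.forward = forward

    def __call__(self, unconstrained):
        return self.forward(unconstrained)

def lower_bound(lb):
    # Default realization of a lower bound, matching the current fixed mapping.
    return Constraint(lambda x: np.exp(x) + lb)

def upper_bound(ub):
    # Default realization of an upper bound.
    return Constraint(lambda x: -np.exp(x) + ub)

# Registry from parameter name to constraint; users can override entries.
CONSTRAINTS = {"Gaussian.stdev": lower_bound(0.0)}

def register_constraint(param_name, constraint):
    CONSTRAINTS[param_name] = constraint

# A user swapping in softplus for a smoother optimization landscape:
register_constraint("Gaussian.stdev", Constraint(lambda x: np.log1p(np.exp(x))))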

Provide default implementation in base package

We agreed upon providing the python(-base) implementation as the default. Right now, to access the python(-base) implementation we need to use spflow.python.foobar. The goal is to make everything in spflow.python directly importable from spflow itself, so as not to confuse end-users who take a first look at the library and wonder why there is a sub-module called "python" (also, I think we should rename this base implementation, since everything in spflow is Python, or we resolve this by moving everything from spflow.python into spflow, see #99).

Before:

from spflow.python.structure import ...

After:

from spflow.structure import ...

This should be solved with proper imports in spflow.__init__.py (as simple as from spflow.python import *?).

Probability of each node

Hi,
First of all, I want to say thank you to the developers of this library.
I am new to SPN models. What I read in the theory is that each node has a probability in the SPN graph.
May I ask how we are representing the probability of each node in the sample example given to create the SPN graph?
Also what does "Categorical(p=[0.2,0.8])" represent exactly?

Please help

Thank you
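As a pointer, a minimal sketch of what Categorical(p=[0.2, 0.8], scope=0) encodes, evaluated with the likelihood routine from spn.algorithms.Inference (the values are those of the question):

import numpy as np

from spn.algorithms.Inference import likelihood
from spn.structure.leaves.parametric.Parametric import Categorical

# Categorical(p=[0.2, 0.8], scope=0) models variable 0 with
# P(X0 = 0) = 0.2 and P(X0 = 1) = 0.8.
node = Categorical(p=[0.2, 0.8], scope=0)
print(likelihood(node, np.array([[0.0], [1.0]])))  # -> [[0.2], [0.8]]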

add_domain in Base.Context does not work with missing values

Description

The add_domains method in the Context class calculates the domains for input data. In the presence of missing data (in this case represented by nan values, as expected by the learn_mspn_with_missing function), the function breaks down due to the behavior of np.min and np.max.

Expected behavior:

The domain calculation ignores missing values.

Encountered behavior:

np.min and np.max return np.nan for an array which contains nan values

Additional Information

This behavior can be changed by using np.nanmin and np.nanmax in the function. This should not introduce a breaking change (to the best of my knowledge).
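A minimal illustration of the difference:

import numpy as np

data = np.array([1.0, np.nan, 3.0])
print(np.min(data), np.max(data))        # nan nan  -> breaks the domain
print(np.nanmin(data), np.nanmax(data))  # 1.0 3.0  -> missing values ignored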


Error in learn parametric with cltree and naive_mle

Description

Using the CLTree method with naive_mle, I am getting nodes with -9999 in the depth-first order.

Expected behavior:

If I am interpreting this correctly, the question is which rule to use when the node is not in the tree; I believe there either needs to be error catching, or should it be handled the same way as the ==-1 case?

Encountered behavior:

learn_parametric with cltree and naive_mle throws an error when the depth-first order returns -9999.

Additional Information

  • Operating System: Linux, Windows, replicated on both
  • Python Version: 3.7

Unable to Reproduce Example Visualization

Description

Following the beginning of the README documentation and running

from spn.structure.leaves.parametric.Parametric import Categorical

spn = 0.4 * (Categorical(p=[0.2, 0.8], scope=0) *
             (0.3 * (Categorical(p=[0.3, 0.7], scope=1) *
                     Categorical(p=[0.4, 0.6], scope=2))
            + 0.7 * (Categorical(p=[0.5, 0.5], scope=1) *
                     Categorical(p=[0.6, 0.4], scope=2)))) \
    + 0.6 * (Categorical(p=[0.2, 0.8], scope=0) *
             Categorical(p=[0.3, 0.7], scope=1) *
             Categorical(p=[0.4, 0.6], scope=2))

followed by

from spn.io.Graphics import plot_spn

plot_spn(spn, 'basicspn.png')

results in the following errors.

Expected behavior:

Should create the visualization file basicspn.png.

Encountered behavior:

1st error is: FileNotFoundError: [Errno 2] "dot" not found in path.
and can be resolved by running brew install graphviz

2nd error is: TypeError: draw_networkx_edge_labels() got an unexpected keyword argument 'clip_on'
unresolved

Additional Information

  • Operating System: MacOS Big Sur 11.0 Beta
  • Python Version: 3.7

Fix "Optional" Typings

Description

Some Optional typings are unnecessary when the variable is immediately assigned a value that is not None, such as in the following line:

all_results: Optional[Dict[INode, ndarray]] = {}

Here, Optional can be safely removed. Please check for all instances of Optional and adapt accordingly.

Add a regression example in readme.txt?

Describe your request

The readme contains a classification example, but after reading it I don't know how to learn and predict the real value of a continuous variable. Could you please add a regression example?

Briefly explain its use-case

For beginners to understand the abilities of SPNs and get started quickly.
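Until such an example lands in the readme, a minimal sketch of one way to do regression with SPFlow, predicting a continuous target via MPE (the data, column layout, and parameters here are illustrative):

import numpy as np

from spn.algorithms.LearningWrappers import learn_parametric
from spn.algorithms.MPE import mpe
from spn.structure.Base import Context
from spn.structure.leaves.parametric.Parametric import Gaussian

# Illustrative data: two continuous features, last column is the target.
train_data = np.random.randn(500, 3)
train_data[:, 2] = 2.0 * train_data[:, 0] - train_data[:, 1]

ds_context = Context(parametric_types=[Gaussian] * 3).add_domains(train_data)
spn = learn_parametric(train_data, ds_context, min_instances_slice=50)

# To predict, set the target column to nan and run MPE.
test_data = np.array([[0.5, -1.0, np.nan]])
prediction = mpe(spn, test_data)
print(prediction[:, 2])  # MPE estimate of the continuous target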

Rename base package name to spflow

Complying with the typical python packaging guidelines, we should rename the base package from spn to spflow to make the user experience more consistent:

Before:

pip install spflow
from spn import ...

After:

pip install spflow
from spflow import ...

PIP Requirements broken in Python 3.8.1

Description

When trying to install spflow in a virtual environment, PyQt5 seems to break some dependencies.

Encountered behavior:

Steps to reproduce:

cd /tmp                    

/tmp
❯ virtualenv venv --python /usr/bin/python
Already using interpreter /usr/bin/python
Using base prefix '/usr'
New python executable in /tmp/venv/bin/python
Installing setuptools, pip, wheel...
done.

/tmp
❯ source ./venv/bin/activate

/tmp
❯ pip install spflow
Collecting spflow
  Using cached https://files.pythonhosted.org/packages/3e/12/5339c16a1dba799fb23eff7f9b01b33b51e7d59f151a518610938810530f/spflow-0.0.40-py3-none-any.whl
Requirement already satisfied: matplotlib in /home/tak/.local/lib/python3.8/site-packages (from spflow) (3.1.2)
Collecting scipy==1.2
  Using cached https://files.pythonhosted.org/packages/ea/c8/c296904f2c852c5c129962e6ca4ba467116b08cd5b54b7180b2e77fe06b2/scipy-1.2.0.tar.gz
Collecting ete3>=3.1.1
  Using cached https://files.pythonhosted.org/packages/21/17/3c49b7fafe10ed63bb7904ebf9764b98db726aa5fd482fb006818854bc04/ete3-3.1.1.tar.gz
Collecting arff
  Using cached https://files.pythonhosted.org/packages/50/de/62d4446c5a6e459052c2f2d9490c370ddb6abc0766547b4cef585913598d/arff-0.9.tar.gz
Requirement already satisfied: joblib in /home/tak/.local/lib/python3.8/site-packages (from spflow) (0.14.1)
Requirement already satisfied: sklearn in /home/tak/.local/lib/python3.8/site-packages (from spflow) (0.0)
Collecting sympy
  Using cached https://files.pythonhosted.org/packages/ce/5b/acc12e3c0d0be685601fc2b2d20ed18dc0bf461380e763afc9d0a548deb0/sympy-1.5.1-py2.py3-none-any.whl
Requirement already satisfied: numpy in /home/tak/.local/lib/python3.8/site-packages (from spflow) (1.18.1)
Collecting tqdm
  Using cached https://files.pythonhosted.org/packages/47/55/fd9170ba08a1a64a18a7f8a18f088037316f2a41be04d2fe6ece5a653e8f/tqdm-4.43.0-py2.py3-none-any.whl
Collecting PyQt5==5.9.2
  Using cached https://files.pythonhosted.org/packages/3a/c6/26270f5550f00920045c2f0b222a7d03d7a64382825c68bf0bb1a51d854c/PyQt5-5.9.2-5.9.3-cp35.cp36.cp37-abi3-manylinux1_x86_64.whl
Requirement already satisfied: pytest in /usr/lib/python3.8/site-packages (from spflow) (5.3.5)
Collecting statsmodels
  Using cached https://files.pythonhosted.org/packages/84/91/daae8f782758ebef9d701eb56cb42abdbe89b6245b6002fdaed60b9534aa/statsmodels-0.11.1-cp38-cp38-manylinux1_x86_64.whl
Requirement already satisfied: networkx in /home/tak/.local/lib/python3.8/site-packages (from spflow) (2.4)
Collecting pydot
  Using cached https://files.pythonhosted.org/packages/33/d1/b1479a770f66d962f545c2101630ce1d5592d90cb4f083d38862e93d16d2/pydot-1.4.1-py2.py3-none-any.whl
Collecting lark-parser
  Using cached https://files.pythonhosted.org/packages/f9/78/c2b1381f878ccf85582a37ee54b5c2da7b7ffa855063823d8d1dfd4021d2/lark-parser-0.8.1.tar.gz
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/lib/python3.8/site-packages (from matplotlib->spflow) (2.4.6)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/tak/.local/lib/python3.8/site-packages (from matplotlib->spflow) (1.1.0)
Requirement already satisfied: python-dateutil>=2.1 in /home/tak/.local/lib/python3.8/site-packages (from matplotlib->spflow) (2.8.1)
Requirement already satisfied: cycler>=0.10 in /home/tak/.local/lib/python3.8/site-packages (from matplotlib->spflow) (0.10.0)
Requirement already satisfied: scikit-learn in /home/tak/.local/lib/python3.8/site-packages (from sklearn->spflow) (0.22.1)
Collecting mpmath>=0.19
  Using cached https://files.pythonhosted.org/packages/ca/63/3384ebb3b51af9610086b23ea976e6d27d6d97bf140a76a365bd77a3eb32/mpmath-1.1.0.tar.gz
ERROR: Could not find a version that satisfies the requirement sip<4.20,>=4.19.4 (from PyQt5==5.9.2->spflow) (from versions: 5.0.0, 5.0.1, 5.1.0, 5.1.1)
ERROR: No matching distribution found for sip<4.20,>=4.19.4 (from PyQt5==5.9.2->spflow)

Additional Information

  • Operating System: Linux xps 5.5.7-arch1-1 #1 SMP PREEMPT Sat, 29 Feb 2020 19:06:02 +0000 x86_64 GNU/Linux
  • Python Version: 3.8.1

tf.distributions deprecated: move to tfp.distributions

Description

tf.distributions has been moved to tfp.distributions, as this warning shows:

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/spn/experiments/RandomSPNs/RAT_SPN.py:120: Normal.__init__ (from tensorflow.python.ops.distributions.normal) is deprecated and will be removed after 2019-01-01.

Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use "tfp.distributions" instead of "tf.distributions".

I think you would only need to update the imports.
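A sketch of the suggested change (assuming tensorflow_probability is installed; mean and stdev are placeholders for the node parameters used in RAT_SPN.py):

import tensorflow_probability as tfp

mean, stdev = 0.0, 1.0

# Before (deprecated): tf.distributions.Normal(loc=mean, scale=stdev)
dist = tfp.distributions.Normal(loc=mean, scale=stdev)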

Implement inference as recursive graph evaluation with memoization

Description

The current inference pass (eval_spn_bottom_up) first sorts the DAG in a topological order to obtain a list and then evaluates nodes in a linear fashion. This is unnecessarily complicated, especially when we have modules of modules and more complex structures. Instead, we should evaluate modules by traversing the DAG from the root node/module using some memoization cache that is passed along the evaluation function.
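A minimal sketch of the proposed recursive evaluation with memoization; node_eval maps node types to evaluation functions, and the names here are illustrative rather than the current API:

def eval_recursive(node, data, node_eval, cache=None):
    """Evaluate the SPN DAG from the root, memoizing per-node results."""
    if cache is None:
        cache = {}
    if node.id in cache:  # node shared by several parents: reuse its result
        return cache[node.id]
    child_results = [
        eval_recursive(c, data, node_eval, cache)
        for c in getattr(node, "children", [])
    ]
    result = node_eval[type(node)](node, child_results, data)
    cache[node.id] = result
    return result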

Get SPN size

Description

I want to compare SPNs with other methods such as machine learning models. For those models I can calculate the size as number of parameters x parameter size (4 bytes, for example). Is there any convenient way to get an SPN's size properly?
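A rough sketch of one way to do this by hand, counting sum weights and leaf parameters; it assumes parametric leaves expose a parameters mapping (as used in the EM issue above), and the 4-byte figure is illustrative:

from spn.structure.Base import Leaf, Sum, get_nodes_by_type

def spn_size_bytes(spn, bytes_per_param=4):
    """Rough size estimate: number of parameters x bytes per parameter."""
    n_params = 0
    for s in get_nodes_by_type(spn, Sum):
        n_params += len(s.weights)  # one weight per child
    for leaf in get_nodes_by_type(spn, Leaf):
        # Assumption: parametric leaves expose a `parameters` mapping.
        n_params += len(getattr(leaf, "parameters", {}))
    return n_params * bytes_per_param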


Bug in algorithms/Condition.py

Description

Hi :), there is a bug in "algorithms/Condition.py". Conditioning does not work if the complete scope of a product node needs to be pruned. The following code fails:


import numpy as np
from spn.structure.leaves.parametric.Parametric import Categorical
from spn.structure.Base import assign_ids, rebuild_scopes_bottom_up
from spn.algorithms import Condition

sub_spn1 = Categorical(p=[0.5, 0.5], scope=0) * Categorical(p=[0.5, 0.5], scope=1)
sub_spn2 = Categorical(p=[0.5, 0.5], scope=0) * Categorical(p=[0.5, 0.5], scope=1)
sub_spn3 = 0.5 * sub_spn1 + 0.5 * sub_spn2
sub_spn4 = Categorical(p=[0.5, 0.5], scope=2) * sub_spn3
sub_spn5 = Categorical(p=[0.5, 0.5], scope=0) * Categorical(p=[0.5, 0.5], scope=1) * Categorical(p=[0.5, 0.5], scope=2)
spn = 0.5 * sub_spn4 + 0.5 * sub_spn5

assign_ids(spn)
rebuild_scopes_bottom_up(spn)

evidence = np.array([[0, 0, np.nan]])
c_spn = Condition.condition(spn, evidence)

from spn.io.Text import spn_to_str_ref_graph
print(spn_to_str_ref_graph(c_spn))

Expected behavior:


SumNode_0 SumNode(0.5*CategoricalNode_1, 0.5*CategoricalNode_2){
	CategoricalNode_1 Categorical(V2|p=[0.5, 0.5])
	CategoricalNode_2 Categorical(V2|p=[0.5, 0.5])
	}

Encountered behavior:


C:\Users\Moritz\Miniconda3\envs\spn\lib\site-packages\spn\algorithms\Condition.py:46: RuntimeWarning: divide by zero encountered in log
  return None, np.log(sum(probs))
Traceback (most recent call last):
  File "C:/pyCharm/NSS/src/_other/test_condition.py", line 19, in 
    c_spn = Condition.condition(spn, evidence)
  File "C:\Users\Moritz\Miniconda3\envs\spn\lib\site-packages\spn\algorithms\Condition.py", line 65, in condition
    return Prune(new_root)
  File "C:\Users\Moritz\Miniconda3\envs\spn\lib\site-packages\spn\algorithms\TransformStructure.py", line 39, in Prune
    assert v, err
AssertionError: node 4 has no scope

Possible Solution

From my point of view, the problem can be solved by updating the code for "prod_condition(...)" to the following:


def prod_condition(node, children, input_vals=None, scope=None):
    if not scope.intersection(node.scope):
        return Copy(node), 0
    new_node = Product()
    new_node.scope = list(set(node.scope) - scope)
    probability = 0

    for c in children:
        if c[0]:
            new_node.children.append(c[0])
        probability += float(c[1])

    if len(new_node.children) == 0:
        return None, probability

    return new_node, probability

Additional Information

Moreover, it might be handy to have a version of "Condition" which does not use the logarithm to be able to also work with 0 probability.

  • Operating System: Windows
  • Python Version: 3.7

Error in plot_spn

Description

When I tried to plot the network with the function plot_spn using the following piece of code,

from spn.structure.leaves.parametric.Parametric import Categorical
from spn.io.Graphics import plot_spn

spn = 0.4 * (Categorical(p=[0.2, 0.8], scope=0) *
             (0.3 * (Categorical(p=[0.3, 0.7], scope=1) *
                     Categorical(p=[0.4, 0.6], scope=2))
            + 0.7 * (Categorical(p=[0.5, 0.5], scope=1) *
                     Categorical(p=[0.6, 0.4], scope=2)))) \
    + 0.6 * (Categorical(p=[0.2, 0.8], scope=0) *
             Categorical(p=[0.3, 0.7], scope=1) *
             Categorical(p=[0.4, 0.6], scope=2))
plot_spn(spn, 'test.png')

I got the following error

Traceback (most recent call last):
  File "/spflow/lib/python3.7/site-packages/pydot.py", line 1915, in create
    working_dir=tmp_dir,
  File "/spflow/lib/python3.7/site-packages/pydot.py", line 136, in call_graphviz
    **kwargs
  File "/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'dot': 'dot'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "spn_codes.py", line 37, in <module>
    plot_spn(spn)
  File "/spflow/lib/python3.7/site-packages/spn/io/Graphics.py", line 60, in plot_spn
    pos = graphviz_layout(g, prog="dot")
  File "/spflow/lib/python3.7/site-packages/networkx/drawing/nx_pydot.py", line 268, in graphviz_layout
    return pydot_layout(G=G, prog=prog, root=root)
  File "/spflow/lib/python3.7/site-packages/networkx/drawing/nx_pydot.py", line 316, in pydot_layout
    D_bytes = P.create_dot(prog=prog)
  File "/spflow/lib/python3.7/site-packages/pydot.py", line 1723, in new_method
    format=f, prog=prog, encoding=encoding)
  File "/spflow/lib/python3.7/site-packages/pydot.py", line 1922, in create
    raise OSError(*args)
FileNotFoundError: [Errno 2] "dot" not found in path.

Additional Information

  • Operating System: Ubuntu 18.04.4
  • Python Version: Python 3.7.3

Thanks for your consideration.

MPE bottom-up mixing log-likelihoods with likelihoods

I probably found an issue regarding the MPE code.
It looks like log-likelihoods (e.g. at sum and product nodes) are mixed with "normal" likelihoods in the bottom-up pass (at least for categorical and Gaussian leaves).
This leads to undefined behavior when using MPE, e.g. to classify data.

I also prepared a fix that should correct the problem at least for categorical and gaussian leaves.
Any thoughts on this?

Parameter Learning given an SPN structure

Hello,

I am interested in using the parameter learning functionality of SPFlow. I would like to be able to learn the optimum set of parameters given an SPN structure created manually.

For instance, if I create the following SPN using SPFlow:

p0 = Product(children=[Categorical(p=[1,0], scope=1), Categorical(p=[1,0], scope=0)])
p1 = Product(children=[Categorical(p=[1,0], scope=1), Categorical(p=[1,0], scope=0)])
spn = Sum(weights=[0.1, 0.9], children=[p0, p1])

assign_ids(spn)
rebuild_scopes_bottom_up(spn)

I noticed that I can update the parameters of the parametric leaves using MLE for a given dataset in the following way:

from spn.structure.Base import Leaf, get_nodes_by_type
from spn.structure.leaves.parametric.MLE import update_parametric_parameters_mle

leaf_nodes = get_nodes_by_type(spn, Leaf)
for leaf in leaf_nodes:
    column_index = leaf.scope[0]
    d = data[:, column_index].reshape(-1, 1)
    update_parametric_parameters_mle(leaf, d)

However, I am not sure how I can find the optimal weights for every sum node (either using EM or Gradient Descent). Is this supported by SPFlow? If so, how?
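SPFlow does ship an EM routine (spn.algorithms.EM, also exercised in the EM issue above); a minimal, hedged sketch of applying it to a hand-built structure like the one above:

import numpy as np

from spn.algorithms.EM import EM_optimization

# data: (n_samples, n_features) array over the SPN's scope {0, 1}.
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
EM_optimization(spn, data, iterations=100)  # updates weights/parameters in place
print(spn.weights)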

Thank you

  • Operating System: Ubuntu 20.04.1 LTS
  • Python Version:3.7.5

basics.py classification fails with IndexError: index 1 is out of bounds for axis 1 with size 1

While running the basics.py classification example I get the following error:

Traceback (most recent call last):
  File "basics.py", line 226, in <module>
    classification()
  File "basics.py", line 143, in classification
    learn_parametric, 2)
  File "spflow/src/spn/algorithms/LearningWrappers.py", line 25, in learn_classifier
    branch = spn_learn_wrapper(data[data[:, label_idx] == label, :], ds_context, cpus=cpus, rand_gen=rand_gen)
  File "spflow/src/spn/algorithms/LearningWrappers.py", line 117, in learn_parametric
    return learn_param(data, ds_context, cols, rows, min_instances_slice, threshold, ohe)
  File "spflow/src/spn/algorithms/LearningWrappers.py", line 112, in learn_param
    return learn_structure(data, ds_context, split_rows, split_cols, leaves, nextop)
  File "spflow/src/spn/algorithms/StructureLearning.py", line 208, in learn_structure
    child_data_slice = data_slicer(data_slice, scope_slice, num_conditional_cols)
  File "spflow/src/spn/algorithms/StructureLearning.py", line 86, in default_slicer
    return data[:, cols[0]].reshape((-1, 1))
IndexError: index 1 is out of bounds for axis 1 with size 1

Is the example outdated or is this a fault in the learning code?

Thanks in advance!

str_to_spn breaks when the spn contains CLTREE

Description

I am able to turn an SPN into a string (for storage in my mongo db), but when I try to turn it back into an SPN the code breaks.

Expected behavior:

I would expect it to recreate the original SPN

Encountered behavior:

It crashes with the error: "No terminal defined for 'C' at line 1 col 112" when it reaches the C in "481481481481484*(((0.4195804195804196*((CLTREE(V0,V1,V2,V4,V5,V6,V7,V8,V9,V11,V1...."

Additional Information

Specific functions being used:
from spn.io.Text import spn_to_str_equation, str_to_spn

I can get the str_to_spn() to work on a simple spn.

I need to store the SPN somehow. If there is a better, alternative solution to the way I am trying to store it, I'm willing to try it.

  • Operating System: Windows 10
  • Python Version: 3.7

learn_mspn_with_missing throws error with no leaves parameter specified

Description

While trying to train an MSPN from data with missing values, the error occurred in the first call to the local subfunction l_mspn_missing. Even when specifying the parameter leaves for the outer function, scoping does not seem to work for the inner function.

Encountered behavior:

The scoping of leaves seems to be wrong:

.../site-packages/numpy/core/fromnumeric.py:83: RuntimeWarning: invalid value encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
Traceback (most recent call last):
  File "main.py", line 22, in <module>
    ds_context)
  File ".../site-packages/spn/algorithms/LearningWrappers.py", line 109, in learn_mspn_with_missing
    return l_mspn_missing(data, ds_context, cols, rows, min_instances_slice, threshold, linear, ohe)
  File ".../site-packages/spn/algorithms/LearningWrappers.py", line 99, in l_mspn_missing
    if leaves is None:
UnboundLocalError: local variable 'leaves' referenced before assignment

Additional Information

  • Operating System: linux arch
  • Python Version: 3.7

Rename python submodule to something more appropriate

I think the name python of the spflow.python submodule is not a good choice. Everything in spflow is python based, i.e. we don't have any c/c++/other extension languages. We rather wanted to indicate that spflow.python contains the "naive"/"base" implementation that doesn't make any use of optimized libraries such as tensorflow, pytorch, or jax.

One option would be to rename this submodule to something more appropriate. I'm open to suggestions, some ideas could be:

  • base
  • naive
  • plain

Another option could be to simply move everything in spflow.python into spflow itself. What do you think? @pdeibert @bewit @HuyDC.

`requirements.txt` not complete

What did you expect to happen?

The requirements.txt should contain all the requirements needed for SPFlow. After installing all dependencies listed, there were still some requirements missing.

What actually happened?

I couldn't run the code due to missing requirements.

Describe your attempts to resolve the issue

The requirements I found missing are:

  • rpy2
  • MulticoreTSNE

Adding them to the requirements.txt should do the job. Or is there a specific reason why these aren't included?

Steps to reproduce

pip install -r requirements.txt

System Information

Python version: 3.9
SPFlow version: 0.0.41
Operating system: Darwin MacBook Pro 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64

Installed Python Packages

See requirements.txt
