photrek / nonlinear-statistical-coupling
License: Apache License 2.0
The Coupled Power Function is a high priority as it is needed for the Weighted Generalized Mean function.
See equation 3.15 of the Reduced Perplexity book chapter for the specification of the coupled power. I will also give some consideration as to whether the coupled power can be defined in terms of the coupled logarithm and coupled exponential functions.
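One candidate composition in terms of the coupled logarithm and exponential is sketched below; the helper forms follow the NSC conventions as assumptions, and this is not necessarily equation 3.15, so it should be verified against the chapter before adoption.
import numpy as np

def coupled_logarithm(x, kappa=0.0, dim=1):
    # Assumed NSC-style form; reduces to np.log(x) as kappa -> 0
    if kappa == 0:
        return np.log(x)
    return (1.0 / kappa) * (x ** (kappa / (1.0 + dim * kappa)) - 1.0)

def coupled_exponential(y, kappa=0.0, dim=1):
    # Assumed inverse of coupled_logarithm; reduces to np.exp(y) as kappa -> 0
    if kappa == 0:
        return np.exp(y)
    return (1.0 + kappa * y) ** ((1.0 + dim * kappa) / kappa)

def coupled_power(x, a, kappa=0.0, dim=1):
    # Candidate composition exp_k(a * log_k(x)); recovers x**a as kappa -> 0
    return coupled_exponential(a * coupled_logarithm(x, kappa, dim), kappa, dim)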
There are special cases of the coupled entropy function which reduce by either:
These special cases should be identified and solved analytically within the coupled entropy function rather than as separate functions. It's acceptable for the special functions which have already been developed to be subfunctions of the coupled entropy.
To enable processing of CIFAR-10 with the Coupled VAE, a variety of enhancements are required. One critical enhancement is to incorporate supervised learning with labels for the 10 classes of the CIFAR images. The labeling will allow each class to be trained with its own Coupled VAE latent layer. This is one step toward reducing the complexity of the dataset. Other steps will be required, but let's complete this step first.
Review carefully the Boenninghoff paper on Student t Mixture Model VAE and related papers particularly the following reference: J. Domke and D. Sheldon, “Importance Weighting and Variational Inference,” in 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 2018.
Determine and specify what changes are needed in both the architecture of the VAE and the cost functions of the VAE.
Collaborate with John Clements on a plan to implement this change.
When I use tf.math to calculate the KL divergence:
logpz = self.log_normal_pdf(z_sample, 0., 1.)           # log-density of z under the prior
logqz_x = self.log_normal_pdf(z_sample, mean, logvar)   # log-density of z under the approximate posterior q(z|x)
kl_div = logqz_x - logpz                                 # per-sample Monte Carlo estimate of the KL divergence
I get the following numbers, averaging to 0.26684278.
ipdb> kl_div
<tf.Tensor: shape=(128,), dtype=float32, numpy=
array([ 0.86921954, 0.45371056, 0.88609505, -0.01174855, 0.46242428,
0.88558793, 0.9410727 , -0.3700683 , 0.84306717, -0.06638861,
0.37066197, 0.45233607, 0.8508425 , 0.52803755, 0.9505639 ,
-0.3642335 , 0.38080645, 0.9970349 , -0.27189922, 0.97796655,
0.95682573, -0.32304716, -0.813817 , -0.30250955, 0.6030083 ,
0.75281763, -1.2111926 , -0.7194972 , 0.2248416 , -0.5922799 ,
-0.16742158, 0.05214858, -0.53073287, 0.548347 , 0.6337991 ,
-0.40753698, 0.864239 , -1.0780277 , 0.7774732 , 0.6771748 ,
0.80476236, -0.46709728, -1.0554905 , 0.37865567, 0.7497237 ,
0.33856797, 0.81753445, 0.8892932 , 0.3270316 , -1.6759243 ,
0.4765191 , 0.64577174, 0.25702858, 0.26793242, 0.8592057 ,
0.7047727 , 0.9932246 , -1.3861675 , 0.10657287, 0.52103424,
0.56670666, 0.63626647, 0.5903802 , -1.5752082 , 0.23447895,
-2.8028917 , 0.61361504, 0.32030725, 0.77301764, -0.25954676,
-0.19354391, 0.91773224, 0.4544549 , 0.6440444 , 0.9674704 ,
-0.13501692, -0.4141333 , -0.14588952, 0.07112408, 0.96379423,
0.96018887, -0.28566027, 0.45304155, 0.64666224, 0.5147927 ,
0.460351 , -0.42211604, 0.88477373, 0.41102314, 0.663666 ,
0.86534095, 0.9025917 , 0.46783733, 0.47456598, 0.71588826,
0.99136996, 0.08168316, 0.95838964, 0.9762056 , -0.6931119 ,
0.39269042, 0.7297759 , 0.70975566, 0.9976181 , 0.1633246 ,
0.8174796 , 0.9928291 , 0.6770606 , 0.64429426, -0.60490847,
0.63297606, -0.11669183, 0.78825736, 0.90766 , -0.7307825 ,
0.9038205 , 0.05003333, 0.89798975, 0.58409166, -0.593915 ,
0.1855967 , -0.3870778 , 0.96894336, 0.33893538, -0.7414057 ,
0.6335244 , -2.4104114 , 0.20711422], dtype=float32)>
ipdb> tf.math.reduce_mean(kl_div)
<tf.Tensor: shape=(), dtype=float32, numpy=0.26684278>
However, when I use coupled_kl_divergence_norm in the following manner:
x_recons_logits, z_sample, mean, logvar = self.model(x_true)
q_zx = MultivariateCoupledNormal(loc=mean.numpy(), scale=tf.exp(logvar/2).numpy())
p_z = MultivariateCoupledNormal(loc=np.zeros(mean.shape), scale=np.ones(logvar.shape))
kl_div = coupled_kl_divergence_norm(q_zx, p_z, root=False)
kl_div = tf.convert_to_tensor(kl_div, dtype=tf.float32)
I get the following numbers, averaging to just 0.0007430053.
...
...
[[ 1.5007547e-03]],
[[ 1.4787790e-04]],
[[ 4.7172891e-04]],
[[ 1.7600998e-03]],
[[-3.5504821e-05]],
[[ 2.9015096e-04]],
[[ 1.0139990e-03]],
[[ 2.1769719e-03]],
[[ 3.9246224e-04]],
[[ 3.0478922e-04]],
[[ 2.3205217e-03]],
[[ 1.1474675e-03]],
[[ 2.5465735e-04]],
[[ 1.3815672e-03]],
[[ 1.0416418e-03]]], dtype=float32)>
ipdb> tf.math.reduce_mean(kl_div)
<tf.Tensor: shape=(), dtype=float32, numpy=0.0007430053>
Please advise.
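For reference, a closed-form cross-check of the Gaussian case (a minimal sketch using the standard formula for the KL between a diagonal Gaussian and the standard normal prior; not part of the nsc library):
import tensorflow as tf

def gaussian_kl_closed_form(mean, logvar):
    # KL( N(mean, diag(exp(logvar))) || N(0, I) ), summed over the latent dimensions
    return 0.5 * tf.reduce_sum(tf.exp(logvar) + tf.square(mean) - 1.0 - logvar, axis=-1)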
Implement the entropy function for MultivariateCoupledNormal. Use the equivalent CoupledNormal function as a foundation.
See our latest nsc code here.
Instantiating the distribution classes should not require passing in concrete loc and scale 'tensors' right away, allowing for delayed execution. This allows for speedier runtime during execution of the model. For example, @tf.function does not have to be commented out when integrating nsc into a TF-based model.
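A minimal sketch of the idea follows, with a hypothetical class name and constructor (not the actual nsc API); the point is simply that loc and scale stay as tensors with no eager .numpy() conversion, so the object can be constructed inside a @tf.function-compiled step.
import tensorflow as tf

class LazyCoupledNormal:
    # Illustrative only: store loc/scale as tensors so construction works under tf.function
    def __init__(self, loc, scale, kappa=0.0):
        self.loc = loc        # kept as a tensor; no eager .numpy() call
        self.scale = scale
        self.kappa = kappa

@tf.function
def train_step(mean, logvar):
    # Construction succeeds in graph mode because nothing eager-only is called
    q_zx = LazyCoupledNormal(loc=mean, scale=tf.exp(logvar / 2.), kappa=0.1)
    p_z = LazyCoupledNormal(loc=tf.zeros_like(mean), scale=tf.ones_like(logvar), kappa=0.1)
    return q_zx.scale - p_z.scale  # placeholder computation so the sketch runs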
Each function should have a docstring summarizing the function, with a reference to the paper and equation on which the function is modeled.
The variable which in papers and in the documentation is called "Risk Bias" currently has the variable name kMult. Throughout the NSC and Coupled VAE library, this variable needs to be renamed riskBias.
Implement the KL-Divergence function for two MultivariateCoupledNormal distributions. Use the equivalent CoupledNormal function as a foundation.
Create a CoupledNormal distribution class that takes in coupling value kappa and includes the following functions:
def log_prob(x, df, loc, scale):
"""Compute log probability of Student T distribution.
Note that scale can be negative.
Args:
x: Floating-point `Tensor`. Where to compute the log probabilities.
df: Floating-point `Tensor`. The degrees of freedom of the
distribution(s). `df` must contain only positive values.
loc: Floating-point `Tensor`; the location(s) of the distribution(s).
scale: Floating-point `Tensor`; the scale(s) of the distribution(s).
Returns:
A `Tensor` with shape broadcast according to the arguments.
"""
# Writing `y` this way reduces XLA mem copies.
y = (x - loc) * (tf.math.rsqrt(df) / scale)
log_unnormalized_prob = -0.5 * (df + 1.) * log1psquare(y)
log_normalization = (
tf.math.log(tf.abs(scale)) + 0.5 * tf.math.log(df) +
0.5 * np.log(np.pi) + tfp_math.log_gamma_difference(0.5, 0.5 * df))
return log_unnormalized_prob - log_normalization
def entropy(df, scale, batch_shape, dtype):
"""Compute entropy of the StudentT distribution.
Args:
df: Floating-point `Tensor`. The degrees of freedom of the
distribution(s). `df` must contain only positive values.
scale: Floating-point `Tensor`; the scale(s) of the distribution(s). Must
contain only positive values.
batch_shape: Floating-point `Tensor` of the batch shape
dtype: Return dtype.
Returns:
A `Tensor` of the entropy for a Student's T with these parameters.
"""
v = tf.ones(batch_shape, dtype=dtype)
u = v * df
return (tf.math.log(tf.abs(scale)) + 0.5 * tf.math.log(df) +
tfp_math.lbeta(u / 2., v / 2.) + 0.5 * (df + 1.) *
(tf.math.digamma(0.5 * (df + 1.)) - tf.math.digamma(0.5 * df)))
Create a MultivariateCoupledNormal distribution class that takes in coupling value kappa and number of dimensions dim, and includes the following functions:
- log_prob. See our latest nsc code here.
- sample_n. See our latest nsc code here.
- entropy. See our latest nsc code here.
- kl_multcouplednormal_multcouplednormal, i.e. the KL-Divergence function for two MultivariateCoupledNormal distributions.
Use the respective CoupledNormal functions as a foundation to build the MultivariateCoupledNormal ones.
from scipy.stats import multivariate_t
- produces probability density function, random deviates, etc.
- allows non-diagonal sigma
https://colab.research.google.com/drive/1TcIfpMnx95QwUi0ZO3aVns7atYTepTyu?usp=sharing
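A short usage sketch of the scipy reference implementation (parameter values are arbitrary):
from scipy.stats import multivariate_t

rv = multivariate_t(loc=[0.0, 0.0], shape=[[2.0, 0.3], [0.3, 0.5]], df=4)  # non-diagonal sigma allowed
samples = rv.rvs(size=5, random_state=0)   # random deviates, shape (5, 2)
densities = rv.pdf(samples)                # probability density at each sample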
Implement the log PDF for MultivariateCoupledNormal. Use the equivalent CoupledNormal function as a foundation.
See our latest nsc code here.
Complete the development of Coupled Cross-Entropy, Coupled Entropy, and Coupled Divergence in Python
Use the Mathematica functions in Coupled_Functions.nb as guides for the development
Daniel Svoboda has prototyped the code, see his workspace folder and the file functions.py
The important points are:
Implement the log PDF for CoupledNormal. Use the commented-out TFP StudentT log_prob code as a foundation.
See our latest nsc code here.
See original StudentT code here:
def log_prob(x, df, loc, scale):
"""Compute log probability of Student T distribution.
Note that scale can be negative.
Args:
x: Floating-point `Tensor`. Where to compute the log probabilities.
df: Floating-point `Tensor`. The degrees of freedom of the
distribution(s). `df` must contain only positive values.
loc: Floating-point `Tensor`; the location(s) of the distribution(s).
scale: Floating-point `Tensor`; the scale(s) of the distribution(s).
Returns:
A `Tensor` with shape broadcast according to the arguments.
"""
# Writing `y` this way reduces XLA mem copies.
y = (x - loc) * (tf.math.rsqrt(df) / scale)
log_unnormalized_prob = -0.5 * (df + 1.) * log1psquare(y)
log_normalization = (
tf.math.log(tf.abs(scale)) + 0.5 * tf.math.log(df) +
0.5 * np.log(np.pi) + tfp_math.log_gamma_difference(0.5, 0.5 * df))
return log_unnormalized_prob - log_normalization
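As a possible numerical reference while implementing this, a sketch that evaluates the coupled-normal log PDF through the Student-t equivalence df = 1/kappa noted elsewhere in these issues (assumed to hold for kappa > 0 with matching loc and scale):
from scipy import stats

def coupled_normal_log_prob_reference(x, kappa, loc=0.0, scale=1.0):
    # Reference values only: for kappa > 0 use Student-t with df = 1/kappa; kappa = 0 is the normal limit
    if kappa == 0:
        return stats.norm.logpdf(x, loc=loc, scale=scale)
    return stats.t.logpdf(x, df=1.0 / kappa, loc=loc, scale=scale)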
Support team in the implementation of effective numerical methods for the core Coupled Functions, particularly the current issue regarding integration for the entropy functions.
Provide guidance for Daniel Svoboda in completing a well-written review of applications of Coupled VAEs
Develop a plan for applying the Coupled VAE to a variety of processes (signal, image, and natural language).
Reporting some of the results I talked about with John, Bill, and Kevin with regard to negative kappa values.
For -1 < kappa < 0, there are domain restrictions on the input values. When kappa < -1, the shape of the exponential inverts and stops at the x-axis. At kappa = -1, it is a straight line which extends from -inf to 1.
Code to reproduce:
import numpy as np
import matplotlib.pyplot as plt
# assumes the nsc coupled-math module is already imported as `nsc`, as in the original context

n_sample = 100  # not defined in the original snippet; value assumed for illustration
X = np.linspace(-10, 5, n_sample*10)
y = {}
fig, ax = plt.subplots(figsize=(8, 12))
ax.axvline(c='black', lw=1)
ax.axhline(c='black', lw=1)
cm = plt.get_cmap('PiYG')
kappa_values = [round(value, 1) for value in np.arange(-2, -0.4, 0.2)]
n = len(kappa_values)
plt.xlim(-10, 10)
plt.ylim(-10, 10)
for kappa in kappa_values:
    y[kappa] = nsc.exp(X, kappa)
    plt.plot(X, y[kappa], label=kappa)
plt.legend()
plt.show()
I will comment more results in this issue as I continue to find them.
I was looking through the CoupledNormal class and was curious about the sampling method. I did some testing to compare it with the Student's t distribution, and it looks like they are not consistent.
From my understanding, a coupled normal has the same distribution as a Student's t with df = 1/kappa, provided loc and scale are the same. Empirically, when I sample from the CoupledNormal and from scipy's Student's t, the results are inconsistent, and in some cases wildly different.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t
# CoupledNormal is imported from the nsc library; the exact import path is omitted here

sample_size = 1000
kappa = 0.4
loc = 0.
scale = 1.
np.random.seed(0)
cn = CoupledNormal(loc=loc, scale=scale, kappa=kappa, alpha=2)
cn_samples = cn.sample_n(n=sample_size)
t_samples = t.rvs(df=1/kappa, loc=loc, scale=scale, size=sample_size)
fig, ax = plt.subplots(figsize=(8, 5))
plt.hist(cn_samples, label='coupled normal', bins=30, alpha=0.5)
plt.hist(t_samples, label='students-t', bins=30, alpha=0.5)
plt.legend()
It seems to be particularly bad when scale > 1 and kappa is small:
sample_size = 1000
kappa = 0.1
loc = 0.
scale = 10.
np.random.seed(0)
cn = CoupledNormal(loc=loc, scale=scale, kappa=kappa, alpha=2)
cn_samples = cn.sample_n(n=sample_size)
t_samples = t.rvs(df=1/kappa, loc=loc, scale=scale, size=sample_size)
fig, ax = plt.subplots(figsize=(8, 5))
plt.hist(cn_samples, label='coupled normal', bins=30, alpha=0.5)
plt.hist(t_samples, label='students-t', bins=30, alpha=0.5)
plt.legend()
Here is the list of desired deliverables for the next release of v0.0.4:
- prob and sample_n functions for both CoupledNormal and MultivariateCoupledNormal classes.
- MultivariateCoupledNormal class in the reparameterization, as well as coupled kl_divergence and coupled_entropy in the loss function.
See the current production version here and the test version here.
The NSC library needs to include a Coupled Product Function. This is a high priority as it forms the foundation for the Generalized Mean function and will be used in other ways regarding the performance evaluation.
The specification is detailed in the mathematical code. The basic structure is exp_k(Total(log_k(x_i))), where the inputs x_i are typically probabilities (a sketch is given below). The first priority is to implement a version similar to the mathematical code in which all the inputs have the same dimension. A lower priority would be to allow each input to have a different dimension.
See K. P. Nelson, “A definition of the coupled-product for multivariate coupled-exponentials,” Phys. A Stat. Mech. its Appl., vol. 422, pp. 187–192, Mar. 2015.
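A minimal numpy sketch of the same-dimension case described above. The coupled logarithm and exponential forms are assumptions matching the coupled-power sketch earlier in this document and should be checked against the library and the referenced paper.
import numpy as np

# Assumed forms, as in the coupled-power sketch earlier in this document
coupled_log = lambda x, k=0.0, d=1: np.log(x) if k == 0 else (x ** (k / (1 + d * k)) - 1) / k
coupled_exp = lambda y, k=0.0, d=1: np.exp(y) if k == 0 else (1 + k * y) ** ((1 + d * k) / k)

def coupled_product(values, kappa=0.0, dim=1):
    # exp_k(Total(log_k(x_i))), all inputs sharing the same dimension (first-priority case)
    values = np.asarray(values, dtype=float)
    return coupled_exp(coupled_log(values, kappa, dim).sum(), kappa, dim)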
The computations work when there is a batch dimension, but we might want to display the loc and scale without a batch dimension for readability.
Initially, I tried to use the following lambda function in MNC prob:
_normalized_X = lambda x: np.matmul(
    np.matmul(np.expand_dims(x - self._loc, axis=-2), _sigma_inv),
    np.expand_dims(x - self._loc, axis=-1))
However, when doing so, I get the following error:
X_norm = np.apply_along_axis(_normalized_X, 1, X)
*** ValueError: operands could not be broadcast together with shapes (64,) (64,2)
Therefore, as a workaround, I have to use the following for-loop to populate X_norm. Nevertheless, I still believe this can be done through vectorization, for example using an alternative to np.apply_along_axis.
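As a possible vectorized alternative (a sketch assuming X has shape (batch, dim), loc has shape (dim,), and sigma_inv has shape (dim, dim); adjust for the batch dimensions used in MNC):
import numpy as np

def batched_mahalanobis_sq(X, loc, sigma_inv):
    # (x - loc)^T Sigma^{-1} (x - loc) for every row of X, without np.apply_along_axis
    diff = X - loc                                          # (batch, dim)
    return np.einsum('bi,ij,bj->b', diff, sigma_inv, diff)  # (batch,)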
Draft a template copyright statement for each file to include "Copyright Photrek [Date]", with a second line stating Contributing Programmers and a third line stating Reviewers and Approvers.
We need a clear plan for how we will demonstrate the capabilities of the Coupled VAE. This issue can address both short-term demonstrations, which we complete for the current paper, and longer-term interests in applications that potential sponsors would find compelling.
For the short term:
For the longer term:
The nsc lib is currently compatible with scalars and numpy arrays. We would now also like to make it compatible with TensorFlow tensors in order to use it to perform experiments for the VAE.
In nsc's tensor branch, the sample_n function is the highest priority, as we are very likely to use the tensor version of sample_n rather than the numpy version.
The Coupled Sum Function is a lower priority as it is not likely to be needed for the Coupled VAE development. Nevertheless, for completeness, this would be nice to have in the library.
There are two ways the function could be developed.
The Coupled Sum arises from the product of coupled exponential functions, which in turn results in the coupled sum of the exponents of the resultant: exp_k(x) * exp_k(y) = exp_k(x +_k y). The coupled sum is defined as x +_k y = x + y + k*x*y. However, a more complete version that accounts for the parameters alpha and dimension, as the coupled product function does, is preferred. Thus a better implementation would be as follows.
Coupled_Sum(X) = log_k(Product_Total(exp_k(x_i))), where X is an array and x_i are its elements. I'm not showing the alpha and dimension terms, but these would be included to complete the expression.
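A minimal numpy sketch of the basic case (alpha and dimension terms omitted, as noted above). For the basic forms, folding the pairwise sum over an array equals log_k(Product_Total(exp_k(x_i))) = (prod(1 + k*x_i) - 1)/k:
from functools import reduce
import numpy as np

def coupled_sum_pair(x, y, kappa=0.0):
    # Basic two-argument coupled sum: x (+_k) y = x + y + kappa*x*y
    return x + y + kappa * x * y

def coupled_sum(values, kappa=0.0):
    # Fold the pairwise coupled sum over the array; equals (prod(1 + kappa*x_i) - 1)/kappa for kappa != 0
    values = np.asarray(values, dtype=float)
    return reduce(lambda a, b: coupled_sum_pair(a, b, kappa), values)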
In the current vae code, x_true is the input images while x_recons_logits is the output generated images. Both are Tensors. In the tf lib, there is a cross-entropy function that computes this from the two:
raw_cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
labels=x_true, logits=x_recons_logits)
However, in our nsc lib, our coupled_cross_entropy function takes density functions p and q as inputs. Would it also work if we passed in x_true and x_recons_logits? Would it be something like:
coupled_cross_entropy(x, x_gen, sample_n)
And how do we get sample_n?
Implement the entropy function for CoupledNormal. Use the commented-out TFP StudentT entropy code as a foundation.
See our latest nsc code here.
See original StudentT code here:
def entropy(df, scale, batch_shape, dtype):
"""Compute entropy of the StudentT distribution.
Args:
df: Floating-point `Tensor`. The degrees of freedom of the
distribution(s). `df` must contain only positive values.
scale: Floating-point `Tensor`; the scale(s) of the distribution(s). Must
contain only positive values.
batch_shape: Floating-point `Tensor` of the batch shape
dtype: Return dtype.
Returns:
A `Tensor` of the entropy for a Student's T with these parameters.
"""
v = tf.ones(batch_shape, dtype=dtype)
u = v * df
return (tf.math.log(tf.abs(scale)) + 0.5 * tf.math.log(df) +
tfp_math.lbeta(u / 2., v / 2.) + 0.5 * (df + 1.) *
(tf.math.digamma(0.5 * (df + 1.)) - tf.math.digamma(0.5 * df)))
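As a possible numerical cross-check while porting this, scipy's Student-t entropy should agree with the formula above for matching df and scale (a sketch; parameter values are arbitrary):
from scipy.stats import t

# Differential entropy of a Student-t with df = 5, scale = 2.0;
# should match the TFP expression above for the same parameters
h = t(df=5, loc=0.0, scale=2.0).entropy()
print(h)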
Here is the list of desired deliverables for the next release of v0.0.3:
- MultivariateCoupledNormal class, or enable CoupledNormal to compute for n dimensions.
- sample_n function in the MultivariateCoupledNormal distribution class(es). Use inverse transform sampling or RNG?
- MultivariateCoupledNormal with the coupled entropy function.
- MultivariateCoupledNormal class in the reparameterization, as well as coupled kl_divergence and coupled_entropy in the loss function.
See the current production version here and the test version here.
Implement the sampling function for CoupledNormal. Use the commented-out TFP StudentT sample_n code as a foundation.
See our latest nsc code here.
See original StudentT code here:
def sample_n(n, df, loc, scale, batch_shape, dtype, seed):
"""Draw n samples from a Student T distribution.
Note that `scale` can be negative or zero.
The sampling method comes from the fact that if:
X ~ Normal(0, 1)
Z ~ Chi2(df)
Y = X / sqrt(Z / df)
then:
Y ~ StudentT(df)
Args:
n: int, number of samples
df: Floating-point `Tensor`. The degrees of freedom of the
distribution(s). `df` must contain only positive values.
loc: Floating-point `Tensor`; the location(s) of the distribution(s).
scale: Floating-point `Tensor`; the scale(s) of the distribution(s). Must
contain only positive values.
batch_shape: Callable to compute batch shape
dtype: Return dtype.
seed: Optional seed for random draw.
Returns:
samples: a `Tensor` with prepended dimensions `n`.
"""
normal_seed, gamma_seed = samplers.split_seed(seed, salt='student_t')
shape = ps.concat([[n], batch_shape], 0)
normal_sample = samplers.normal(shape, dtype=dtype, seed=normal_seed)
df = df * tf.ones(batch_shape, dtype=dtype)
gamma_sample = gamma_lib.random_gamma(
[n], concentration=0.5 * df, rate=0.5, seed=gamma_seed)
samples = normal_sample * tf.math.rsqrt(gamma_sample / df)
return samples * scale + loc
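A numpy sketch of the same recipe applied to a 1-D coupled normal, assuming the Student-t equivalence df = 1/kappa noted elsewhere in these issues (kappa = 0 falls back to the normal):
import numpy as np

def coupled_normal_sample(n, kappa, loc=0.0, scale=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    if kappa == 0:
        return rng.normal(loc, scale, size=n)
    df = 1.0 / kappa
    x = rng.standard_normal(n)                # X ~ Normal(0, 1)
    z = rng.chisquare(df, size=n)             # Z ~ Chi2(df)
    return loc + scale * x / np.sqrt(z / df)  # Y = X / sqrt(Z / df) ~ StudentT(df)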
Create a KL-Divergence function that accepts two tfp StudentT distributions. See the _kl_normal_normal function from tfp Normal as an example.
@kullback_leibler.RegisterKL(Normal, Normal)
def _kl_normal_normal(a, b, name=None):
"""Calculate the batched KL divergence KL(a || b) with a and b Normal.
Args:
a: instance of a Normal distribution object.
b: instance of a Normal distribution object.
name: Name to use for created operations.
Default value: `None` (i.e., `'kl_normal_normal'`).
Returns:
kl_div: Batchwise KL(a || b)
"""
with tf.name_scope(name or 'kl_normal_normal'):
b_scale = tf.convert_to_tensor(b.scale) # We'll read it thrice.
diff_log_scale = tf.math.log(a.scale) - tf.math.log(b_scale)
return (
0.5 * tf.math.squared_difference(a.loc / b_scale, b.loc / b_scale) +
0.5 * tf.math.expm1(2. * diff_log_scale) -
diff_log_scale)
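Since a simple closed form for the KL between two StudentT distributions is not generally available, one option (a sketch, not the required implementation) is a Monte Carlo estimate of KL(a || b) = E_{x~a}[log a(x) - log b(x)]:
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def kl_student_t_monte_carlo(a, b, n_samples=10000, seed=None):
    # Sample from a and average the log-density ratio
    x = a.sample(n_samples, seed=seed)
    return tf.reduce_mean(a.log_prob(x) - b.log_prob(x), axis=0)

# Usage sketch:
# a = tfd.StudentT(df=5., loc=0., scale=1.)
# b = tfd.StudentT(df=7., loc=1., scale=2.)
# kl_est = kl_student_t_monte_carlo(a, b)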
This specifies an alternative method for coding the generalized mean. It is lower priority since the straightforward mathematics is already coded.
While the formula is well known and was implemented in this manner for the Mathematica code, the preference would be to utilize the coupled product and coupled power functions. This would provide a foundation for the function within the coupled algebra context and incorporate the subtleties which arise regarding the variables kappa, alpha, and dimension. It is probably best for the inputs to include an option to specify either the risk_bias = -alpha * kappa / (1 + dim * kappa) or alpha, kappa, and dim directly.
The functional specification is given in equation 3.15 of the Reduced Perplexity book chapter.
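A direct numpy sketch of the well-known weighted power-mean formula referenced above, with the exponent playing the role of the risk bias (r = -alpha*kappa / (1 + dim*kappa)); the coupled-product and coupled-power formulation described above would replace this internally:
import numpy as np

def weighted_generalized_mean(values, weights=None, r=1.0):
    # (sum_i w_i * x_i**r)**(1/r); the r -> 0 limit is the weighted geometric mean
    values = np.asarray(values, dtype=float)
    if weights is None:
        weights = np.full(values.shape, 1.0 / values.size)
    else:
        weights = np.asarray(weights, dtype=float)
    if np.isclose(r, 0.0):
        return float(np.exp(np.sum(weights * np.log(values))))
    return float(np.sum(weights * values ** r) ** (1.0 / r))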
Given the current issue regarding the limited range of convergence for the Coupled VAE, investigate two potential resolutions:
Does taking the root of the entropy function give sufficient stability to the quantification of generalized entropy such that the training of machine learning algorithms can converge over a broader range of coupling values? If so, plan to prioritize use of the root in defining the Coupled Entropy and draft a paper that introduces the importance of this definition for generalized entropy.
Develop a better understanding of how the incorporation of the generalized cost functions into machine learning affects the gradient descent algorithms used in the training process.
Given the literature on the Wasserstein metric for machine learning, determine exactly what the distinction, if any, is from the generalized mean. If these are closely related, or possibly equal, then clarify how the generalized mean could be used as a cost function rather than the generalized entropy functions.
This function provides numerical efficiency in computing the coupled cross-entropy for the coupled Gaussian.
Change the name to coupled_cross_entropy_coupled_gaussian.
Change the name of coupled_cross_entropy to coupled_cross_entropy_general.
Create a wrapper function which calls the subfunctions coupled_cross_entropy_coupled_gaussian and coupled_cross_entropy_general.
The documentation at the head of the function should refer to the appropriate equations.
The same change needs to be completed for coupled_entropy_norm and coupled_divergence_norm.
Implement the sampling function for MultivariateCoupledNormal. Use the equivalent CoupledNormal function as a foundation.
See our latest nsc code here.