Git Product home page Git Product logo

smarttensors / nmfk.jl Goto Github PK

View Code? Open in Web Editor NEW
11.0 3.0 2.0 86.36 MB

Nonnegative Matrix Factorization + k-means clustering and physics constraints for Unsupervised and Physics-Informed Machine Learning

Home Page: https://smarttensors.github.io

License: GNU General Public License v3.0

Julia 2.01% CSS 0.06% JavaScript 1.09% HTML 58.96% Jupyter Notebook 37.89%
machine-learning feature-extraction blind-source-separation unsupervised-machine-learning source-identification physics-informed-learning julia scientific-machine-learning scientific-computing

nmfk.jl's Introduction

NMFk: Nonnegative Matrix Factorization + k-means clustering and physics constraints

nmfk

NMFk is a module of the SmartTensors ML framework (smarttensors.com).

SmartTensors

NMFk is a novel unsupervised machine learning methodology that allows for the automatic identification of the optimal number of features (signals/signatures) present in the data.

Classical NMF approaches do not allow for automatic estimation of the number of features.

NMFk estimates the number of features k through k-means clustering coupled with regularization constraints (sparsity, physical, mathematical, etc.).

SmartTensors can be applied to perform:

  • Feature extraction (FE)
  • Blind source separation (BSS)
  • Detection of disruptions/anomalies
  • Data gap discovery
  • Data gap filling and reconstruction
  • Image recognition
  • Text mining
  • Data classification
  • Separation (deconstruction) of co-occurring (physics) processes
  • Discovery of unknown dependencies and phenomena
  • Development of reduced-order/surrogate models
  • Identification of dependencies between model inputs and outputs
  • Guiding the development of physics models representing the ML-analyzed data
  • Blind predictions
  • Optimization of data acquisition (optimal experimental design)
  • Labeling of datasets for supervised ML analyses

NMFk provides high-performance computing capabilities to solve problems in parallel using Shared and Distributed Arrays. The parallelization allows for the utilization of multi-core / multi-processor environments. GPU and TPU accelerations are available through existing Julia packages.

NMFk provides advanced tools for data visualization, pre- and post-processing. These tools substantially facilitate the utilization of the package in various real-world applications.

NMFk methodology and applications are discussed in the research papers and presentations listed below.

NMFk is demonstrated with a series of examples and test problems provided here.

Awards

SmartTensors and NMFk were recently awarded:

R&D100

Installation

After starting Julia, execute:

import Pkg
Pkg.add("NMFk")

to access the latest released version.

To utilize the latest code updates (commits), use:

import Pkg
Pkg.add(Pkg.PackageSpec(name="NMFk", rev="master"))

Docker

docker run --interactive --tty montyvesselinov/tensors

The docker image provides access to all SmartTensors packages (smarttensors.github.io).

Testing

import Pkg
Pkg.test("NMFk")

Examples

A simple problem demonstrating NMFk can be executed as follows. First, generate 3 random signals in a matrix W:

a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]

Then, mix the signals to produce a data matrix X of 5 sensors observing the mixed signals as follows:

X = [a+c*3 a*10+b b b*5+c a+b*2+c*5]

This is equivalent to generating a mixing matrix H and obtaining X by multiplying W and H

H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = W * H

After that, execute NMFk to estimate the number of unknown mixed signals based only on the information in X.

import NMFk
We, He, fitquality, robustness, aic, kopt = NMFk.execute(X, 2:5; save=false, method=:simple);

The execution will produce output like this:

[ Info: Results
Signals:  2 Fit:       15.489 Silhouette:    0.9980145 AIC:    -38.30184
Signals:  3 Fit: 3.452203e-07 Silhouette:    0.8540085 AIC:    -1319.743
Signals:  4 Fit: 8.503988e-07 Silhouette:   -0.5775127 AIC:    -1212.129
Signals:  5 Fit: 2.598571e-05 Silhouette:   -0.6757581 AIC:    -915.6589
[ Info: Optimal solution: 3 signals

The code returns the estimated optimal number of signals kopt, which in this case, as expected, is equal to 3.

The code returns the fitquality and robustness; they can applied to represent how the solutions change with the increase of k:

NMFk.plot_signal_selecton(2:5, fitquality, robustness)
signal_selection

The code also returns estimates of matrices W and H.

It can be easily verified that estimated We[kopt] and He[kopt] are scaled versions of the original W and H matrices.

Note that the order of columns ('signals') in W and We[kopt] are not expected to match. The order of rows ('sensors') in H and He[kopt] are also not expected to match. The estimated orders will be different every time the code is executed.

The matrices can be visualized using:

import Pkg; Pkg.add("Mads")
import Mads
Mads.plotseries([a b c])
Mads.plotseries(We[kopt] ./ maximum(We[kopt]))
signals_original
signals_reconstructed
NMFk.plotmatrix(H)
NMFk.plotmatrix(He[kopt] ./ maximum(He[kopt]))
signals_original
signals_reconstructed

More examples can be found in the test, demo, examples, and notebooks directories of the NMFk repository.

Applications:

NMFk has been applied in a wide range of real-world applications. The analyzed datasets include model outputs, experimental laboratory data, and field tests:

  • Climate data and simulations
  • Watershed data and simulations
  • Aquifer simulations
  • Surface-water and Groundwater analyses
  • Material characterization
  • Reactive mixing
  • Molecular dynamics
  • Contaminant transport
  • Induced seismicity
  • Phase separation of co-polymers
  • Oil / Gas extraction from unconventional reservoirs
  • Geothermal exploration and production
  • Geologic carbon storage
  • Wildfires

Videos:

  • Progress of nonnegative matrix factorization process:
nmfk-example

More videos are available at YouTube

Notebooks:

A series of Jupyter notebooks demonstrating NMFk have been developed:

The notebooks can also be accessed using:

NMFk.notebooks()

Other Examples:

Patent:

Alexandrov, B.S., Vesselinov, V.V., Alexandrov, L.B., Stanev, V., Iliev, F.L., Source identification by non-negative matrix factorization combined with semi-supervised clustering, US20180060758A1

Publications:

  • Vesselinov, V.V., Mudunuru, M., Karra, S., O'Malley, D., Alexandrov, B.S., Unsupervised Machine Learning Based on Non-Negative Tensor Factorization for Analyzing Reactive-Mixing, 10.1016/j.jcp.2019.05.039, Journal of Computational Physics, 2019. PDF
  • Vesselinov, V.V., Alexandrov, B.S., O'Malley, D., Nonnegative Tensor Factorization for Contaminant Source Identification, Journal of Contaminant Hydrology, 10.1016/j.jconhyd.2018.11.010, 2018. PDF
  • O'Malley, D., Vesselinov, V.V., Alexandrov, B.S., Alexandrov, L.B., Nonnegative/binary matrix factorization with a D-Wave quantum annealer, PlosOne, 10.1371/journal.pone.0206653, 2018. PDF
  • Stanev, V., Vesselinov, V.V., Kusne, A.G., Antoszewski, G., Takeuchi, I., Alexandrov, B.A., Unsupervised Phase Mapping of X-ray Diffraction Data by Nonnegative Matrix Factorization Integrated with Custom Clustering, Nature Computational Materials, 10.1038/s41524-018-0099-2, 2018. PDF
  • Iliev, F.L., Stanev, V.G., Vesselinov, V.V., Alexandrov, B.S., Nonnegative Matrix Factorization for identification of unknown number of sources emitting delayed signals PLoS ONE, 10.1371/journal.pone.0193974. 2018. PDF
  • Stanev, V.G., Iliev, F.L., Hansen, S.K., Vesselinov, V.V., Alexandrov, B.S., Identification of the release sources in advection-diffusion system by machine learning combined with Green function inverse method, Applied Mathematical Modelling, 10.1016/j.apm.2018.03.006, 2018. PDF
  • Vesselinov, V.V., O'Malley, D., Alexandrov, B.S., Contaminant source identification using semi-supervised machine learning, Journal of Contaminant Hydrology, 10.1016/j.jconhyd.2017.11.002, 2017. PDF
  • Alexandrov, B., Vesselinov, V.V., Blind source separation for groundwater level analysis based on non-negative matrix factorization, Water Resources Research, 10.1002/2013WR015037, 2014. PDF

Research papers are also available at Google Scholar, ResearchGate and Academia.edu

Presentations:

  • Vesselinov, V.V., Physics-Informed Machine Learning Methods for Data Analytics and Model Diagnostics, M3 NASA DRIVE Workshop, Los Alamos, 2019. PDF
  • Vesselinov, V.V., Unsupervised Machine Learning Methods for Feature Extraction, New Mexico Big Data & Analytics Summit, Albuquerque, 2019. PDF
  • Vesselinov, V.V., Novel Unsupervised Machine Learning Methods for Data Analytics and Model Diagnostics, Machine Learning in Solid Earth Geoscience, Santa Fe, 2019. PDF
  • Vesselinov, V.V., Novel Machine Learning Methods for Extraction of Features Characterizing Datasets and Models, AGU Fall meeting, Washington D.C., 2018. PDF
  • Vesselinov, V.V., Novel Machine Learning Methods for Extraction of Features Characterizing Complex Datasets and Models, Recent Advances in Machine Learning and Computational Methods for Geoscience, Institute for Mathematics and its Applications, University of Minnesota, 10.13140/RG.2.2.16024.03848, 2018. PDF
  • Vesselinov, V.V., Mudunuru. M., Karra, S., O'Malley, D., Alexandrov, Unsupervised Machine Learning Based on Non-negative Tensor Factorization for Analysis of Filed Data and Simulation Outputs, Computational Methods in Water Resources (CMWR), Saint-Malo, France, 10.13140/RG.2.2.27777.92005, 2018. PDF
  • O'Malley, D., Vesselinov, V.V., Alexandrov, B.S., Alexandrov, L.B., Nonnegative/binary matrix factorization with a D-Wave quantum annealer PDF
  • Vesselinov, V.V., Alexandrov, B.A, Model-free Source Identification, AGU Fall Meeting, San Francisco, CA, 2014. PDF

Presentations are also available at slideshare.net, ResearchGate and Academia.edu

Extra information

For more information, visit monty.gitlab.io, http://smarttensors.com [smarttensors.github.io],(https://smarttensors.github.io), and tensors.lanl.gov.

nmfk.jl's People

Contributors

boianalexandrov avatar jsybrandt avatar montyvesselinov avatar omalled avatar sevarvv avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

nmfk.jl's Issues

Mads dependency

Mads is only used for plotting in examples and notebooks. It brings with it a huge set of dependencies (>400!), the vast majority of which aren't needed for just computing the NMF model (which is all I need).

Julia doesn't have Rust's notion of package "features", which would have been ideal here :-(

A workaround would be to split the computation code into a separate ComputeNMFk package and make it a dependency of NMFk (and re-export everything from it, for 100% backward compatibility with the current implementation).

This would allow importing just the ComputeNMFk package if the notebook/display functionality isn't needed.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.