
TopoBank

A surface metrology cloud database.

Settings

Moved to settings.

User Accounts

The application uses ORCID for user authentication, so you need an ORCID account to use it.

If you need a superuser or staff user during development, e.g. for accessing the admin page, connect to the database and set the is_superuser or is_staff flags manually.

Running tests with pytest

You need a PostgreSQL database to run tests.

$ pytest

Or use run configurations in your IDE, e.g. in PyCharm.

Linting with pre-commit hooks

Code quality is checked in the test pipeline; if your code does not conform to flake8, the pipeline will fail. To prevent you from committing non-conforming code, you can install pre-commit, which runs checks on your code before each commit. Just install pre-commit with pip or your package manager. Then run:

pre-commit install

That's all you really need to do!

To run the pre-commit hooks by hand you can run:

pre-commit run

If you want to skip a pre-commit stage, e.g. flake8, run:

SKIP=flake8 pre-commit run

Docker

The full application can be run in Docker containers, both for development and production. This also includes the database, message broker, Celery workers and more. It is currently the easiest way to run the full stack.

See the Sphinx documentation for how to install Docker and how to start the application using it, either for deployment (see chapter "Deploy") or for local development (see "Installation on development machine / Starting Topobank in Docker").

Celery

This app comes with Celery.

To run a celery worker:

cd topobank
celery -A topobank.taskapp worker -l info

Please note: For Celery's import magic to work, it matters where the celery command is run. If you are in the same folder as manage.py, you should be fine.

There is a bash script start-celery.sh which also sets some environment variables needed to connect to the message broker and the result backend.

Funding

Development of this project is funded by the European Research Council within Starting Grant 757343.

topobank's People

Contributors

mcrot, pastewka, ioannisnezis, adityapadmanair, siddhantsaka, renovate[bot], yaelgail, sannant, sicksmile1, 21121995, jotelha, standartkai, dependabot[bot]


topobank's Issues

Use breadcrumbs for navigation

User Story

As a user I always want to have breadcrumbs visible on the page in order to navigate more easily and to understand the logic of the site.

Acceptance Criteria

Each template of the site has a breadcrumb line.

Provide initial surface for new users

User Story

As a new user, I want to be able to test and play with the capabilities of the system. For this it would be useful to have a single surface initially available in my database.

Acceptance Criteria

Provide a single surface containing the three topographies of Fig. 5 of:

Jacobs, Tevis D. B., Till Junge, and Lars Pastewka. 2017. “Quantitative Characterization of Surface Topography Using Spectral Analysis.” Surface Topography: Metrology and Properties 5 (1): 013001. https://doi.org/10.1088/2051-672X/aa51f8.

The files are attached here. The respective sizes are 100µm x 100µm, 10µm x 10µm and 1µm x 1µm. The free-form description fields of surface and topography should be pre-populated with some information. The surface description should contain the reference above.

Data_of_Fig5.zip

After logging in for the first time, a user finds an already generated surface with uploaded topography data and metadata (see above); the analysis calculations for all automatic functions have also already been triggered.

Switch to JSONField for results dictionary

This is a suggestion and up for discussion: the present pickle representation is not searchable and not accessible from third-party code. However, I don't know what JSONField will do with numpy arrays stored within a dictionary.
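If we do switch, numpy content would have to be flattened into JSON-serializable types before storage. A minimal sketch of such a converter (the to_jsonable helper is hypothetical, not existing TopoBank code; numpy arrays are recognized here purely by their .tolist() method, which lets the sketch run without numpy):

```python
import json

def to_jsonable(obj):
    """Recursively convert a results dictionary into plain JSON types.

    Numpy arrays and numpy scalars expose .tolist(), which turns them
    into nested Python lists/numbers; everything else passes through.
    """
    if hasattr(obj, "tolist"):        # numpy arrays and numpy scalars
        return obj.tolist()
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(value) for value in obj]
    return obj

# Stand-in for a numpy array, so the sketch runs without numpy:
class FakeArray:
    def tolist(self):
        return [1.0, 2.0, 3.0]

result = to_jsonable({"name": "psd", "data": FakeArray()})
json.dumps(result)  # now serializable for a JSONField
```

Searchability would come for free, since JSONField contents can be queried; the open question of precision loss for float arrays would remain.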

Analyses results do not show latest results for topography+function+args

Currently a test (topobank.analysis.tests.test_show_only_last_analysis), which checks that only the latest result for a combination of function+topography+args is shown (not old results), fails.

We have this user story:

As a researcher I only want to see the latest result for the same calculation (i.e. same topography, function, and function arguments) because the latest should be the most correct one.

Control elements in analysis result view don't function when used alternately

In the analyses view, the control elements for switching data series and source topographies on and off don't work as expected.

Example 1

Two topographies and two data series are shown in one plot (4 results in one plot).

  1. Switch first data series off.
  2. Switch first source topography off.
  3. Switch first data series on again.

Now the first series for the first source is also activated in the plot.

Example 2

Two topographies and two data series are shown in one plot (4 results in one plot).

  1. Switch first data series off.
  2. Switch first source topography off.
  3. Switch first source topography on again.

Again the first series for the first source is also activated in the plot.


Login to Web Application using ORCID ID

User Stories

As a site member I want to login and authenticate in order to organize my personal data and to perform calculations on it.

As a service provider we want to have the ORCID identifier for each user so that we can use it when publishing data in a later version.

As a developer I want to have a login without external dependencies in order to run tests quickly.

Acceptance Criteria

  • the user needs an ORCID identifier
  • the user can log in using their ORCID identifier
  • the user can log out
  • for testing, there is the possibility to authenticate only against the local test database

Attach ORCID to surfaces and topographies

For every piece of data uploaded or entered, we should store the source in terms of the researcher who created the entry. Ideally, this is the researcher's ORCID.

Results differing only in their arguments are indistinguishable in the legend in the result card footer

If one analysis function is called for the same topography with different arguments, there are separate analysis results, e.g. if the height distribution is first calculated with bins=None, then with bins=100 and again with bins=10. In the legend of the result plot, the same topography name is then listed three times:

In general, the user cannot see which result belongs to which arguments.

Idea

We could change the meaning of the slider from just "Source topography" to "Arguments", which comprises the topography and the other arguments. The other arguments could be listed in parentheses
and the header could be "Source topography (+arguments)".

Another idea would be to simply add tooltips with the arguments, so that they appear
when the user moves the mouse over the topography name.

Show task list in UI

User Story

As a user I want to have an overview of the analysis tasks' progress in order to estimate when results are available.

Acceptance Criteria

  • using the symbol with the stacked progress bars in the top navigation bar, the user sees a dropdown list of the 10 latest running tasks with their progress
  • at the end of the dropdown there is a link to a full task list, including finished results

Download of a surface container format

User Story

As a service provider we want to define an extensible surface container format in order to always keep topography data and metadata together.

As a user I want to be able to open the container with common OS tools available on my platform.

Acceptance Criteria

All uploaded topographies for a surface can be downloaded in the container format, which looks like this:

  • a ZIP archive
  • included in this archive are all the data files given by the user
  • metadata is stored in a YAML file called "surface.yml" that presently contains a key "datafiles",
    which is a list of the data files given by the user
  • this has to be defined more precisely, e.g. there is also metadata for the topographies themselves
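A minimal sketch of building such a container with the standard library (the function name and the exact surface.yml layout are illustrative, not a fixed specification; the YAML is written by hand here to keep the sketch dependency-free, real code would use a YAML library):

```python
import io
import zipfile

def write_surface_container(datafiles):
    """Build a ZIP archive holding the user's data files plus a
    surface.yml that lists them under the key "datafiles".

    `datafiles` maps file names to their byte content.
    """
    # Hand-written YAML: one list entry per data file.
    yml = "datafiles:\n" + "".join(f"- {name}\n" for name in datafiles)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, content in datafiles.items():
            zf.writestr(name, content)
        zf.writestr("surface.yml", yml)
    return buffer.getvalue()

blob = write_surface_container({"example.x3p": b"\x00\x01"})
```

Since the result is a plain ZIP archive, it satisfies the "open with common OS tools" story out of the box.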

Download of analysis data

We should think about download as:

  • Plain text
  • Excel sheet
  • PNG
  • SVG

Maybe:

  • JSON (simple) - this is how the data arrives at the frontend anyway
  • HDF5

Questions:

  • Should this be handled on the backend or frontend side? Backend is probably easier.
  • If on the backend, this would require a rendering mechanism separate from the front-end, likely matplotlib.

Download of analysis results not yet implemented

We should implement download as:

  • Plain text
  • Excel sheet

Maybe in the future:

  • SVG
  • PNG
  • JSON (simple) - this is how the data arrives at the frontend anyway
  • HDF5

Questions:

  • Should this be handled on the backend or frontend side? Backend is probably easier.
  • If on the backend, this would require a rendering mechanism separate from the front-end, likely matplotlib.

Tagging of surfaces and topographies

User story

As a user, I want to be able to filter the relevant topographies or surfaces from a large set of data in the database.

Acceptance criteria

To be defined, e.g. tags could appear as entries in the search box but as labels in different colors?!

Linking of user profile with ORCID

We should add the capability to link ORCIDs to user profiles. This would allow us to uniquely identify the researcher who has opened the respective account.

Visually distinguish search bar from search results

Presently, search bar and search results are displayed as Bootstrap "cards" of the same color. We should make clear that those are different entities of the user interface.

My suggestion is to not show the search bar as a card but as a band of homogeneous background color, like the navbar but in a different color.

Don't allow invalid topography files

User Story

As a service provider I don't want to allow malicious files on the server, for security reasons.
As a user I want to be informed when I upload an unsupported file so I can correct it.

Acceptance Criteria

  • the user tries to upload a file which is not a topography file or cannot be read
  • the user gets an error message, cannot upload the file, and is informed about the detected file format

Invalid character in sheet name when downloading XLSX

When downloading an XLSX file for the current result, I get an exception "Invalid character : found in sheet title" in my browser from Django.

It seems that the colon in "RMS height: 19.6 nm" is not accepted.

Suggestion:

We remove all characters that are problematic for Excel and LibreOffice when building the sheet name. LibreOffice, for example, rejects a colon when you try to insert one into a sheet name manually. We could implement exactly these restrictions. Is it the same for Excel?

Another idea is to generally catch all errors that could happen during data export, so that the user gets a proper error message.
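A sanitizing helper could strip the offending characters before the sheet is created. Sketch below; the character set \ / * ? : [ ] and the 31-character limit follow Excel's sheet-naming rules, and the helper name is made up:

```python
import re

# Characters that Excel (and openpyxl) refuse in sheet titles.
INVALID_SHEET_CHARS = re.compile(r"[\\/*?:\[\]]")

def safe_sheet_title(title, max_length=31):
    """Strip forbidden characters and enforce Excel's length limit."""
    return INVALID_SHEET_CHARS.sub("", title)[:max_length]

safe_sheet_title("RMS height: 19.6 nm")  # -> "RMS height 19.6 nm"
```

This fixes the symptom; wrapping the export in a try/except for a proper error message would still be worthwhile as a safety net.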

Storing default arguments for analysis functions in database

Currently, when running analyses, the default arguments are not saved in the database.
As an example, this is the signature of an analysis function with a default argument:

@analysis_function(automatic=True)
def power_spectrum(topography, window='hann'):
    ...

In the database, kwargs={} is stored. If the default argument later changes, you can no longer tell
which value was actually used (okay, later we also want versioning for the functions). For the user interface
(e.g. analysis statuses) it would be easier to always store the default arguments in the database.
The user could then see that a 'hann' window was used.

Does it make sense to always store the actually used arguments, here kwargs={'window': 'hann'}
instead of kwargs={}? The same applies to positional arguments.
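One way to implement this: when an analysis is triggered, resolve the full keyword arguments from the function signature before saving. A sketch using inspect (the resolve_kwargs helper is illustrative, not existing TopoBank code):

```python
import inspect

def resolve_kwargs(func, kwargs, skip=("topography",)):
    """Fill in the function's declared defaults so that the kwargs
    actually used are stored, not just the ones given explicitly."""
    resolved = {}
    for name, param in inspect.signature(func).parameters.items():
        if name in skip:  # positional subject argument, not stored
            continue
        if name in kwargs:
            resolved[name] = kwargs[name]
        elif param.default is not inspect.Parameter.empty:
            resolved[name] = param.default
    return resolved

def power_spectrum(topography, window='hann'):
    ...

resolve_kwargs(power_spectrum, {})  # -> {'window': 'hann'}
```

With this, a later change of the default in the code no longer makes old database entries ambiguous, since each entry records the value in effect at run time.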

Zooming by wheel can be irritating

On the analyses view, after selecting topographies and functions, many cards with results are generally
shown. When scrolling down with the mouse wheel, you have to be careful not to move the mouse pointer onto a plot, because there the wheel is interpreted as a change of the zoom scale.
So instead of scrolling, the zoom factor suddenly changes and scrolling stops. This is probably
not what new users expect.

Agree on an interpreter name for topobank in PyCharm IDE

Currently, in every commit I have to deselect three files from the .idea directory,
because they contain configuration options which are specific to my environment:

.idea/runConfigurations/runserver.xml
.idea/misc.xml
.idea/topobank.iml

I don't want to check them in now because they currently contain some system-specific settings.
On the other hand, checking them in is recommended in order to give new developers a quick start.
I would like to make these settings system-independent, if possible.

One problem (misc.xml) is that the interpreter defined in PyCharm under

File -> Settings -> Project: topobank -> Project Interpreter 
       -> (wheel symbol) -> Show All.. ->  (pencil symbol)

is named differently across our machines. On my machine it's named Python 3.7 (topobank), on others it's Python 3.7 or something else. I would like to have "topobank" somewhere in the name in order to distinguish it from other interpreters.

I suggest simply using topobank as the name for the interpreter; then misc.xml won't change because of it. On the next pull, all developers have to rename their configured interpreter for the project. Okay?

Visualization of surface summary

User Story

As a researcher I want to see at first glance that the data for a surface is comprised of multiple measurements.

Acceptance Criteria

  • in the surface selection, for every selected surface a graph with the bandwidths of all
    related topographies should be shown
  • each bandwidth bar should link to the topography details
  • the unit of the bandwidth axis should be chosen automatically in a reasonable way
  • in the detail view of a surface, this graph should also be shown, together with thumbnails
    of the topographies

Notes

Hint from Lars

The bandwidth would simply be pixel size to lateral size. I would use the following (t is our Topography instance from PyCo):

  • Lower bound on length: np.mean(t.size/t.resolution)
  • Upper bound on length: np.mean(t.size)

This is the bandwidth of a measurement. This of course excludes limits to the bandwidth due to instrumental artifacts but I think including those is too much at present.
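In code, the hint translates to the following (pure-Python sketch to keep it self-contained; with PyCo one would apply np.mean to t.size/t.resolution and t.size directly):

```python
from statistics import mean

def bandwidth(size, resolution):
    """Lower/upper length bounds of a measurement, per the hint above.

    size:       physical size per dimension, e.g. (100.0, 100.0) in µm
    resolution: number of pixels per dimension, e.g. (1024, 1024)
    """
    lower = mean(s / n for s, n in zip(size, resolution))  # ~pixel size
    upper = mean(size)                                     # lateral size
    return lower, upper

bandwidth((100.0, 100.0), (1000, 1000))  # -> (0.1, 100.0)
```

The resulting (lower, upper) pair per topography is exactly what the bandwidth bars in the summary graph would visualize.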

Limit max number of search results

User story

As a researcher, in order to easily compare results, I only want to see a small subset of all available results, and while narrowing down the results the response should appear quickly.

Acceptance Criteria

  • the maximum number of shown results can be configured
  • no more results than this maximum are shown on the result page
  • the total number of results is shown, together with a hint that the list has been truncated
  • there is a "load more" button which loads further results (the same number as the limit), so the list can be completed step by step
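On the backend, these criteria boil down to slicing plus a truncation flag. A sketch (function and key names are hypothetical; a real implementation would slice a Django queryset rather than a list):

```python
def page_of_results(results, limit, offset=0):
    """Return one page of search results plus the information the UI
    needs: total count, a truncation hint, and the offset for the
    "load more" button."""
    page = results[offset:offset + limit]
    total = len(results)
    return {
        "results": page,
        "total": total,
        "truncated": offset + len(page) < total,
        "next_offset": offset + len(page),
    }

page_of_results(list(range(25)), limit=10)
```

Each click on "load more" would re-request with offset=next_offset, so the list is completed step by step as required.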

Change celery backend

Currently, when starting a Celery worker, I get the message:
The AMQP result backend is scheduled for deprecation in version 4.0 and removal in version v5.0. Please use RPC backend or a persistent backend.
We need to change the result backend.

Single upload for multiple topographies

User Story

As a user I want to be able to create many topographies from one data file without uploading it many times.

Notes

Sometimes one file contains a kind of "movie" with many frames. There should be only one upload. We then need a clever way to efficiently specify metadata for each topography.

Acceptance Criteria

  • on creation of new topographies for a surface, the user first selects a data file
  • then they can select one or more data sources
  • they are asked for further metadata for each of the selected data sources
  • then all topographies are created one after the other

Individual data series should be shown with different dash and symbol

User Story

As a user, I want to be able to identify the data series from the visual presentation of the data.

Acceptance criteria

Topographies are currently distinguished by color; data series should be distinguished by the dash pattern of the line and by a symbol. Dash and symbol should be shown in the control element below the plot.

  • data series differ by dash pattern and symbol
  • dash and symbol are also displayed next to the control element for the respective data series

Empty topography thumbnail for example file

When uploading the topography file example2.x3p, which is included in the test data of the PyCo package, there is a division by zero in the PyCo package and the thumbnail image is "empty".


The runtime warnings shown in topobank are:

/home/michael/miniconda3/envs/topobank/lib/python3.7/site-packages/PyCo/Topography/Detrending.py:92: RuntimeWarning: divide by zero encountered in true_divide
  np.array(list(arr.shape)+[1.])/np.array(list(size)+[1.])
/home/michael/miniconda3/envs/topobank/lib/python3.7/site-packages/PyCo/Topography/TopographyDescription.py:494: RuntimeWarning: invalid value encountered in multiply
  return self.parent_topography.array() + h0 + m * x + n * y
/home/michael/miniconda3/envs/topobank/lib/python3.7/site-packages/matplotlib/colors.py:916: UserWarning: Warning: converting a masked element to nan.
  dtype = np.min_scalar_type(value)
/home/michael/miniconda3/envs/topobank/lib/python3.7/site-packages/numpy/ma/core.py:715: UserWarning: Warning: converting a masked element to nan.
  data = np.array(a, copy=False, subok=subok)

Store topographies in flexible backend

User Story

As provider of the web service we want to store all the topography data in order to be able to do machine learning on it or publish the data someday.

As researcher I want to collect my topography data at one place in order to (re)perform calculations on different topographies easily.

As provider we want to be able to switch the backend later in order not to be dependent on one provider (e.g. SciServer).

Acceptance Criteria

  • a user can upload their topographies, log out, log in again later and see details on already uploaded data
  • we as provider have the possibility to use all topography data uploaded by the users
  • the backend is used via a common interface so that it could in principle be exchanged

The first supported backend is the local filesystem. It should be possible to extend the application later to save the data in the SciServer infrastructure (volumes).

Should the user be able to switch backend systems? How would the files be moved then?
Some more specification work is needed here.

Cannot add topographies once the surface detail page has been left after creating the surface

After implementing #29, the user cannot add topographies later after closing the surface detail view,
at least not in an obvious way. Empty surfaces can be selected, but since no card is shown any more,
there is also no "Add Topography" button. So the current navigation is incomplete.

Maybe we should separate the selection of surfaces/topographies from the presentation of a list with all surfaces.

Ideas:

What do you think? Do you have another idea?
