
TopoBank

A surface metrology cloud database.

Settings

Moved to settings.

User Accounts

The application uses ORCID for user authentication, so you need an ORCID account to use it.

If you need a superuser or staff user during development, e.g. for accessing the admin page, connect to the database and set the is_superuser or is_staff flags manually.

Running tests with pytest

You need a PostgreSQL database to run tests.

$ pytest

Or use run configurations in your IDE, e.g. in PyCharm.

Linting with pre-commit hooks

Code quality is checked in the test pipeline; if your code does not conform to flake8, the pipeline will fail. To prevent you from committing non-conforming code, you can install pre-commit, which runs checks on your code before each commit. Just install pre-commit with pip or your package manager. Then run:

pre-commit install

That's all you really need to do!

To run the pre-commit hooks by hand you can run:

pre-commit run

If you want to skip a pre-commit stage, e.g. flake8, run:

SKIP=flake8 pre-commit run

Docker

The full application can be run in Docker containers, both for development and production. This also includes the database, message broker, Celery workers and more. It is currently the easiest way to run the full stack.

See the Sphinx documentation for how to install Docker and how to start the application using it, either for deployment (see chapter "Deploy") or for local development (see "Installation on development machine / Starting Topobank in Docker").

Celery

This app comes with Celery.

To run a celery worker:

cd topobank
celery -A topobank.taskapp worker -l info

Please note: For Celery's import magic to work, it matters where the celery command is run. If you are in the same folder as manage.py, you should be fine.

There is a bash script start-celery.sh which also sets some environment variables needed to connect to the message broker and the result backend.

Funding

Development of this project is funded by the European Research Council within Starting Grant 757343.

topobank's People

Contributors

mcrot, pastewka, ioannisnezis, adityapadmanair, siddhantsaka, renovate[bot], yaelgail, sannant, sicksmile1, 21121995, jotelha, standartkai, dependabot[bot]


topobank's Issues

Use breadcrumbs for navigation

User Story

As a user I always want to have breadcrumbs visible on the page in order to navigate more easily and to understand the logic of the site.

Acceptance Criteria

Each template of the site has a breadcrumb line.

Provide initial surface for new users

User Story

As a new user, I want to be able to test and play with the capabilities of the system. For this it would be useful to have a single surface initially available in my database.

Acceptance Criteria

Provide a single surface containing the three topographies of Fig. 5 of:

Jacobs, Tevis D. B., Till Junge, and Lars Pastewka. 2017. “Quantitative Characterization of Surface Topography Using Spectral Analysis.” Surface Topography: Metrology and Properties 5 (1): 013001. https://doi.org/10.1088/2051-672X/aa51f8.

The files are attached here. The respective sizes are 100µm x 100µm, 10µm x 10µm and 1µm x 1µm. The free-form description fields of surface and topography should be pre-populated with some information. The surface description should contain the reference above.

Data_of_Fig5.zip

After logging in for the first time, a user finds an already generated surface with uploaded topography data and metadata (see above); the analysis calculations for all automatic functions have also already been triggered.

Switch to JSONField for results dictionary

This is a suggestion and up for discussion: the present pickle representation is not searchable and not accessible from third-party code. However, I don't know what JSONField will do with numpy arrays stored within a dictionary.
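If we do switch, numpy content would have to be flattened into JSON-serializable types before storage. A minimal sketch of such a converter (the to_jsonable helper is hypothetical, not existing TopoBank code; numpy arrays are recognized here purely by their .tolist() method, which lets the sketch run without numpy):

```python
import json

def to_jsonable(obj):
    """Recursively convert a results dictionary into plain JSON types.

    Numpy arrays and numpy scalars expose .tolist(), which turns them
    into nested Python lists/numbers; everything else passes through.
    """
    if hasattr(obj, "tolist"):        # numpy arrays and numpy scalars
        return obj.tolist()
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(value) for value in obj]
    return obj

# Stand-in for a numpy array, so the sketch runs without numpy:
class FakeArray:
    def tolist(self):
        return [1.0, 2.0, 3.0]

result = to_jsonable({"name": "psd", "data": FakeArray()})
json.dumps(result)  # now serializable for a JSONField
```

Searchability would come for free, since JSONField contents can be queried; the open question of precision loss for float arrays would remain.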

Analyses results do not show latest results for topography+function+args

Currently a test (topobank.analysis.tests.test_show_only_last_analysis), which checks that only the latest result for a combination of function+topography+args is shown (not old results), fails.

We have this user story:

As a researcher I only want to see the latest result for the same calculation (i.e. same topography, function, and function arguments) because the latest should be the most correct one.

Control elements in analysis result view don't function when used alternately

In the analyses view, the control elements for switching data series and source topographies on and off don't work as expected.

Example 1

Two topographies and two data series are shown in one plot (4 results in one plot).

  1. Switch first data series off.
  2. Switch first source topography off.
  3. Switch first data series on again.

Now the first series for the first source is also activated in the plot.

Example 2

Two topographies and two data series are shown in one plot (4 results in one plot).

  1. Switch first data series off.
  2. Switch first source topography off.
  3. Switch first source topography on again.

Again the first series for the first source is also activated in the plot.


Login to Web Application using ORCID ID

User Stories

As a site member I want to login and authenticate in order to organize my personal data and to perform calculations on it.

As a service provider we want to have the ORCID identifier for each user so that we can use it when publishing data in a later version.

As a developer I want to have a login without external dependencies in order to run tests quickly.

Acceptance Criteria

  • the user needs an ORCID identifier
  • the user can log in using their ORCID identifier
  • the user can log out
  • for testing, there is the possibility to authenticate only against the local test database

Attach ORCID to surfaces and topographies

For every piece of data uploaded or entered, we should store the source in terms of the researcher who created the entry. Ideally, this is the researcher's ORCID.

Results differing only in their arguments are indistinguishable in the legend in the result card footer

If one analysis function is called for the same topography with different arguments, there are separate analysis results, e.g. if the height distribution is first calculated with bins=None, then with bins=100 and again with bins=10. In the legend of the result plot, the same topography name is then listed three times:

In general, the user cannot see which result belongs to which arguments.

Idea

We could change the meaning of the slider from just "Source topography" to "Arguments", which comprises the topography and the other arguments. The other arguments could be listed in parentheses
and the header could be "Source topography (+arguments)".

Another idea would be to simply add tooltips with the arguments, so that they appear
when the user moves the mouse over the topography name.

Show task list in UI

User Story

As a user I want to have an overview of the analysis tasks' progress in order to estimate when results are available.

Acceptance Criteria

  • using the symbol with the stacked progress bars in the top navigation bar, the user sees a dropdown list of the 10 latest running tasks with their progress
  • at the end of the dropdown there is a link to a full task list, including finished results

Download of a surface container format

User Story

As a service provider we want to define an extensible surface container format in order to always keep topography data and metadata together.

As a user I want to be able to open the container with common OS tools available on my platform.

Acceptance Criteria

All uploaded topographies for a surface can be downloaded in the container format, which looks like this:

  • a ZIP archive
  • included in this archive are all the data files given by the user
  • metadata is stored in a YAML file called "surface.yml" that presently contains a key "datafiles",
    which is a list of the data files given by the user
  • this has to be defined more precisely, e.g. there is also metadata for the topographies themselves
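A minimal sketch of building such a container with the standard library (the function name and the exact surface.yml layout are illustrative, not a fixed specification; the YAML is written by hand here to keep the sketch dependency-free, real code would use a YAML library):

```python
import io
import zipfile

def write_surface_container(datafiles):
    """Build a ZIP archive holding the user's data files plus a
    surface.yml that lists them under the key "datafiles".

    `datafiles` maps file names to their byte content.
    """
    # Hand-written YAML: one list entry per data file.
    yml = "datafiles:\n" + "".join(f"- {name}\n" for name in datafiles)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, content in datafiles.items():
            zf.writestr(name, content)
        zf.writestr("surface.yml", yml)
    return buffer.getvalue()

blob = write_surface_container({"example.x3p": b"\x00\x01"})
```

Since the result is a plain ZIP archive, it satisfies the "open with common OS tools" story out of the box.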

Download of analysis data

We should think about download as:

  • Plain text
  • Excel sheet
  • PNG
  • SVG

Maybe:

  • JSON (simple) - this is how the data arrives at the frontend anyway
  • HDF5

Questions:

  • Should this be handled on the backend or frontend side? Backend is probably easier.
  • If on the backend, this would require a rendering mechanism separate from the front-end, likely matplotlib.

Download of analysis results not yet implemented

We should implement download as:

  • Plain text
  • Excel sheet

Maybe in the future:

  • SVG
  • PNG
  • JSON (simple) - this is how the data arrives at the frontend anyway
  • HDF5

Questions:

  • Should this be handled on the backend or frontend side? Backend is probably easier.
  • If on the backend, this would require a rendering mechanism separate from the front-end, likely matplotlib.

Tagging of surfaces and topographies

User story

As a user, I want to be able to filter the relevant topographies or surfaces from a large set of data in the database.

Acceptance criteria

To be defined, e.g. tags could appear as entries in the search box but as labels in different colors?!

Linking of user profile with ORCID

We should add the capability to link ORCIDs to user profiles. This would allow us to uniquely identify the researcher who has opened the respective account.

Visually distinguish search bar from search results

Presently, search bar and search results are displayed as Bootstrap "cards" of the same color. We should make clear that those are different entities of the user interface.

My suggestion is to not show the search bar as a card but as a band of homogeneous background color, like the navbar but in a different color.

Don't allow invalid topography files

User Story

As a service provider I don't want to allow malicious files on the server, for security reasons.
As a user I want to be informed when I upload an unsupported file so I can correct it.

Acceptance Criteria

  • the user tries to upload a file which is not a topography file or cannot be read
  • the user gets an error message, cannot upload the file, and is informed about the detected file format

Invalid character in sheet name when downloading XLSX

When downloading an XLSX file for the current result, I get an exception "Invalid character : found in sheet title" in my browser from Django.

It seems that the colon in "RMS height: 19.6 nm" is not accepted.

Suggestion:

We remove all characters that are problematic for Excel and LibreOffice when building the sheet name. LibreOffice, for example, rejects a colon when you try to insert one into a sheet name manually. We could implement exactly these restrictions. Is it the same for Excel?

Another idea is to generally catch all errors that could happen during data export, so that the user gets a proper error message.
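A sanitizing helper could strip the offending characters before the sheet is created. Sketch below; the character set \ / * ? : [ ] and the 31-character limit follow Excel's sheet-naming rules, and the helper name is made up:

```python
import re

# Characters that Excel (and openpyxl) refuse in sheet titles.
INVALID_SHEET_CHARS = re.compile(r"[\\/*?:\[\]]")

def safe_sheet_title(title, max_length=31):
    """Strip forbidden characters and enforce Excel's length limit."""
    return INVALID_SHEET_CHARS.sub("", title)[:max_length]

safe_sheet_title("RMS height: 19.6 nm")  # -> "RMS height 19.6 nm"
```

This fixes the symptom; wrapping the export in a try/except for a proper error message would still be worthwhile as a safety net.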

Storing default arguments for analysis functions in database

Currently, when running analyses, the default arguments are not saved in the database.
As an example, this is the signature of an analysis function with a default argument:

@analysis_function(automatic=True)
def power_spectrum(topography, window='hann'):
    ...

In the database, kwargs={} is stored. If the default argument later changes, you can no longer tell
which value was actually used (okay, later we also want versioning for the functions). For the user interface
(e.g. analysis statuses) it would be easier to always store the default arguments in the database.
The user could then see that a 'hann' window was used.

Does it make sense to always store the actually used arguments, here kwargs={'window': 'hann'}
instead of kwargs={}? The same applies to positional arguments.
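One way to implement this: when an analysis is triggered, resolve the full keyword arguments from the function signature before saving. A sketch using inspect (the resolve_kwargs helper is illustrative, not existing TopoBank code):

```python
import inspect

def resolve_kwargs(func, kwargs, skip=("topography",)):
    """Fill in the function's declared defaults so that the kwargs
    actually used are stored, not just the ones given explicitly."""
    resolved = {}
    for name, param in inspect.signature(func).parameters.items():
        if name in skip:  # positional subject argument, not stored
            continue
        if name in kwargs:
            resolved[name] = kwargs[name]
        elif param.default is not inspect.Parameter.empty:
            resolved[name] = param.default
    return resolved

def power_spectrum(topography, window='hann'):
    ...

resolve_kwargs(power_spectrum, {})  # -> {'window': 'hann'}
```

With this, a later change of the default in the code no longer makes old database entries ambiguous, since each entry records the value in effect at run time.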

Zooming by wheel can be irritating

On the analyses view, after selecting topographies and functions, many cards with results are generally
shown. When scrolling down with the mouse wheel, you have to be careful not to move the mouse pointer onto a plot, because there the wheel is interpreted as a change of the zoom scale.
So instead of scrolling, the zoom factor suddenly changes and scrolling stops. This is probably
not what new users expect.

Agree on an interpreter name for topobank in PyCharm IDE

Currently, in every commit I have to deselect three files from the .idea directory,
because they contain configuration options which are specific to my environment:

.idea/runConfigurations/runserver.xml
.idea/misc.xml
.idea/topobank.iml

I don't want to check them in now because they currently contain some system-specific settings.
On the other hand, checking them in is recommended in order to give new developers a quick start.
I would like to make these settings system-independent, if possible.

One problem (misc.xml) is that the interpreter defined in PyCharm under

File -> Settings -> Project: topobank -> Project Interpreter 
       -> (wheel symbol) -> Show All.. ->  (pencil symbol)

is named differently across our machines. On my machine it's named Python 3.7 (topobank), on others it's Python 3.7 or something else. I would like to have "topobank" somewhere in the name in order to distinguish it from other interpreters.

I suggest simply using topobank as the name for the interpreter; then misc.xml won't change because of it. On the next pull, all developers have to rename their configured interpreter for the project. Okay?

Visualization of surface summary

User Story

As a researcher I want to see at first glance that the data for a surface is comprised of multiple measurements.

Acceptance Criteria

  • in the surface selection, for every selected surface a graph with the bandwidths of all
    related topographies should be shown
  • each bandwidth bar should link to the topography details
  • the unit of the bandwidth axis should be chosen automatically in a reasonable way
  • in the detail view of a surface, this graph should also be shown, together with thumbnails
    of the topographies

Notes

Hint from Lars

The bandwidth would simply be pixel size to lateral size. I would use the following (t is our Topography instance from PyCo):

  • Lower bound on length: np.mean(t.size/t.resolution)
  • Upper bound on length: np.mean(t.size)

This is the bandwidth of a measurement. This of course excludes limits to the bandwidth due to instrumental artifacts but I think including those is too much at present.
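In code, the hint translates to the following (pure-Python sketch to keep it self-contained; with PyCo one would apply np.mean to t.size/t.resolution and t.size directly):

```python
from statistics import mean

def bandwidth(size, resolution):
    """Lower/upper length bounds of a measurement, per the hint above.

    size:       physical size per dimension, e.g. (100.0, 100.0) in µm
    resolution: number of pixels per dimension, e.g. (1024, 1024)
    """
    lower = mean(s / n for s, n in zip(size, resolution))  # ~pixel size
    upper = mean(size)                                     # lateral size
    return lower, upper

bandwidth((100.0, 100.0), (1000, 1000))  # -> (0.1, 100.0)
```

The resulting (lower, upper) pair per topography is exactly what the bandwidth bars in the summary graph would visualize.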

Limit max number of search results

User story

As a researcher, in order to easily compare results, I only want to see a small subset of all available results, and while narrowing down the results the response should appear quickly.

Acceptance Criteria

  • the maximum number of shown results can be configured
  • no more results than this maximum are shown on the result page
  • the total number of results is shown, together with a hint that the list has been truncated
  • there is a "load more" button which loads further results (the same number as the limit), so the list can be completed step by step
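On the backend, these criteria boil down to slicing plus a truncation flag. A sketch (function and key names are hypothetical; a real implementation would slice a Django queryset rather than a list):

```python
def page_of_results(results, limit, offset=0):
    """Return one page of search results plus the information the UI
    needs: total count, a truncation hint, and the offset for the
    "load more" button."""
    page = results[offset:offset + limit]
    total = len(results)
    return {
        "results": page,
        "total": total,
        "truncated": offset + len(page) < total,
        "next_offset": offset + len(page),
    }

page_of_results(list(range(25)), limit=10)
```

Each click on "load more" would re-request with offset=next_offset, so the list is completed step by step as required.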

Change celery backend

Currently, when starting a Celery worker, I get the message:
The AMQP result backend is scheduled for deprecation in version 4.0 and removal in version v5.0. Please use RPC backend or a persistent backend.
We need to change the result backend.

Single upload for multiple topographies

User Story

As a user I want to be able to create many topographies from one data file without uploading it many times.

Notes

Sometimes one file contains a kind of "movie" with many frames. There should be only one upload. We then need a clever way to efficiently specify metadata for each topography.

Acceptance Criteria

  • on creation of new topographies for a surface, the user first selects a data file
  • then they can select one or more data sources
  • they are asked for further metadata for each of the selected data sources
  • then all topographies are created one after the other

Individual data series should be shown with different dash and symbol

User Story

As a user, I want to be able to identify the data series from the visual presentation of the data.

Acceptance criteria

Topographies are currently distinguished by color; data series should be distinguished by the dash pattern of the line and by a symbol. Dash and symbol should be shown in the control element below the plot.

  • data series differ by dash pattern and symbol
  • dash and symbol are also displayed next to the control element for the respective data series

Empty topography thumbnail for example file

When uploading the topography file example2.x3p, which is included in the test data of the PyCo package, there is a division by zero in the PyCo package and the thumbnail image is "empty".


The runtime warnings shown in topobank are:

/home/michael/miniconda3/envs/topobank/lib/python3.7/site-packages/PyCo/Topography/Detrending.py:92: RuntimeWarning: divide by zero encountered in true_divide
  np.array(list(arr.shape)+[1.])/np.array(list(size)+[1.])
/home/michael/miniconda3/envs/topobank/lib/python3.7/site-packages/PyCo/Topography/TopographyDescription.py:494: RuntimeWarning: invalid value encountered in multiply
  return self.parent_topography.array() + h0 + m * x + n * y
/home/michael/miniconda3/envs/topobank/lib/python3.7/site-packages/matplotlib/colors.py:916: UserWarning: Warning: converting a masked element to nan.
  dtype = np.min_scalar_type(value)
/home/michael/miniconda3/envs/topobank/lib/python3.7/site-packages/numpy/ma/core.py:715: UserWarning: Warning: converting a masked element to nan.
  data = np.array(a, copy=False, subok=subok)

Store topographies in flexible backend

User Story

As provider of the web service we want to store all the topography data in order to be able to do machine learning on it or publish the data someday.

As researcher I want to collect my topography data at one place in order to (re)perform calculations on different topographies easily.

As provider we want to be able to switch the backend later in order not to be dependent on one provider (e.g. SciServer).

Acceptance Criteria

  • a user can upload their topographies, log out, log in again later and see details on already uploaded data
  • we as provider have the possibility to use all topography data uploaded by the users
  • the backend is used via a common interface so that it could in principle be exchanged

The first supported backend is the local filesystem. It should be possible to extend the application later to save the data in the SciServer infrastructure (volumes).

Should the user be able to switch backend systems? How would the files be moved then?
Some more specification work is needed here.

Cannot add topographies once the surface detail page has been left after creating the surface

After implementing #29, the user cannot add topographies later after closing the surface detail view,
at least not in an obvious way. Empty surfaces can be selected, but since no card is shown any more,
there is also no "Add Topography" button. So the current navigation is incomplete.

Maybe we should separate the selection of surfaces/topographies from the presentation of a list with all surfaces.

Ideas:

What do you think? Do you have another idea?
