mathesar-foundation / mathesar Goto Github PK

Web application providing an intuitive user experience to databases.

License: GNU General Public License v3.0

Dockerfile 0.08% Python 46.67% HTML 0.59% Svelte 23.38% JavaScript 0.22% SCSS 1.44% TypeScript 19.75% Shell 0.15% PLpgSQL 7.58% MDX 0.14% CSS 0.01%

python django sqlalchemy javascript typescript svelte sveltejs django-rest-framework postgresql postgres

mathesar's People

Contributors

Stargazers

Watchers

Forkers

dhruvkb eito-fis mr-gabe49 patelaryan7751 damain andyharley-cci obulat aaditkamat arvind-1 linzhou-zhong seancolsen dmos62 nvamsikrishna05 vijaydasmp liangguo bohemia420 silentninja rajeshkumarrs davidylee gagan2005 fkarg henryrocha vfermat andruwwwka yankevn carlosbogo eld120 twuntee ishqelliot sahilsaini1107 manishshah120 anish9901 sariyahsintayehu battuashita nk183 spectre-ak muhsinkamil priyang12 vorakunal shreyas-sas kunalprakash1309 nitin5901 aviwad aditmehta9 arthrowabstract xianhe-zhang 7saikat7 shrey27tri01 naveenvarma600 rohit-kalavapudi kid-116 jinxiao0302 dante381 premsagarkushwaha henilp105 sakshampathak1508 jyuart drashtipatel2503 manasa-sm yashpatwa60 lekhit edwardbetts anv3sh ritikbhandari ktun95 abhishak3 aadityasinha-dotcom yashrajsingh-4799 sristis0202 sebasttiandaza nitheezkant kaushikaryan04 surajrane42 chidam333 lakshya-dhariwal josephmarks kaimonmoy mohit-aasirwal aleklefebvre creative1800 steve-lgtm arijitghosal03 erohrer5 azunna ayao451 yash7raut kwliang1 withgaurav apolo151 rohitverma1222 davidkratochvila mistryiam vaibhavjak gaurav-mahato sumeetharyani harshvmahawar praneeth-rdy hritvik-mohan hatilar420 segrr

mathesar's Issues

Populate Mathesar wiki with contents of old notes and roadmap

We have a new wiki! https://wiki.mathesar.org/

I'm going to populate it with the contents of our previous notes and delete our old notes. This issue is to track that work.

Test all code brought in from the prototype

Problem

The code brought over from the prototype in PRs #57 , #60, #75, #79, #85 is not yet fully tested

Proposed solution

Write tests covering all code from those PRs which is not yet covered.

CSV upload functionality lacks tests

Describe the bug
We need tests for the CSV upload functionality so that we can ensure we don't break it while working on other things.

Move columns between tables

Problem

While data modeling, it may be that a user would want to be able to move a set of columns from Table A to Table B, where these tables are connected by a foreign key relationship.

Proposed solution

We should provide this functionality, with the following restrictions (for now):

The tables involved must be connected by a foreign key.
The columns being moved cannot be refered to by a foreign key, or be a foreign key column
The columns being moved cannot contain the primary key of a table.

Document pre-commit linter hook

We should make a document to help contributors optionally set up a pre-commit hook with the same linting that we're using in the repo. This will let them have a quick check on each commit rather than waiting till they submit a PR to see if it's going to pass the linter without problems.

Deployment tooling Checklist 2022-12

In preparation for the first release, we need tooling and documentation to help users and admins deploy and manage Mathesar in different contexts.

Tasks

Set up separate docs.mathesar.org site for documentation.

The deployment-specific tasks are organized by different deployment types, and
the intent is to do lower-numbered deployment types before higher-numbered ones.

Deployment Type 1

This deployment has the service and all DBs running in Docker containers managed by docker-compose as in our current dev setup. We would want to help the user configure things for a production environment.

Deployment Type 2

This deployment has the web service and its Django model tables DB running in Docker containers managed by docker-compose, but the user DB is assumed to be running separately either on the same underlying machine, or a different server altogether. Example: running the mathesar_service and a DB container for Django model tables on a minimal EC2 instance, and having user tables on an RDS PostgreSQL instance. In addition to the pieces of type 1 above, we should need:

Ensure installation script works in this context, modify if needed.
Smooth configuration in this context. In addition to the config for type 1,
- Ports (help avoid conflicts)
- privileged mathesar user
- privileged already-extant user for use in creating above mathesar user.
Provide a method for creating and using a separate privileged mathesar user on the User DB, starting from given privileged already-extant user on the target user DB.
Create separate docker-compose service (maybe separate file?) to launch this setup.
Document use of this setup on docs.mathesar.org

Deployment Type 3

This is the same as Type 2, except the Django tables are stored on a database not managed by Docker. This is not recommended for performance reasons (i.e., if the database is running on a different server from the web service container).

Document this setup, and pros-cons on docs.mathesar.org

Deployment Type 4

This is a 'bare-metal' (more likely, VPS) installation with the database and Django web service installed on the same machine.

Determine suitability of our ansible playbooks for this (it necessary? Could it be replaced for most users by a set of instructions?)

Further steps are dependent on outcome of above.

Deployment Type 5

This is a 'bare-metal' installation (as above), but with a DB in some remote location. Very little should change from Type 4.

Other deployment targets

Investigate one-click deployment solutions from PAASes
- Choose one that seems appropriate.
- Implement it.
Provide targeted instructions for setting up reasonable architectures on AWS, GCP, and Azure.

License the code

We do not currently have a license for the code in this repository.

We're investigating options and will add a license soon; I just wanted to track the work.

Type Inference 4: Set best types for columns of table

Problem

Different types in Mathesar will enable different operations; for example, strings could be aggregated by concatenating, but numeric types could be aggregated by summing or multiplying. So far, while we can reflect different types, we have no way to determine the type most appropriate for a table.

Proposed solution

Using the functions from #91 , #92 , and #93 , we should make a function that takes a schema, original_table_name, and new_table_name, creates a new table with typed columns of the same names as the old table, and inserts all data there. The old table can be optionally dropped.

API for Schemas

This issue covers a read-only API for list and detail views of schemas.

Previously, this issue was for a CRUID API, but I'm reducing the scope because:

Updating involves renaming schemas, which involves tables, foreignkeys etc. and is complicated/deserves its own issue.
Creation and deletion behavior should be driven by UI needs, which are yet to be defined.

Create common UI components needed for layout and table view

We will need a list of UI components while implementing the readonly table view. This ticket is to begin work on them while the design is in progress, so that development for table view can start without any blockers after design is finalized.

Initial set of UI components:

Text boxes (text, email, number, password)
Buttons - with loading indicator
Spinner
Dropdown

If possible (might be spun off into a separate ticket):

Custom select box

Reorganize to clearly separate webapp and database concerns

Django shouldn't really know more about a given user-table database than its connection info, and functions that don't need to know about Django should just take an engine as an argument. Thus, we'll split functionality that manipulates user-defined tables out of the webapp to its own library which the webapp will import. This will be easier to maintain in the long run.

Access existing PostgreSQL databases via Mathesar

Mathesar should, given appropriate credential info from a user, be able to connect to an existing database, and reflect the tables there. The reflection should include the columns of the tables, and their types (though we won't use the type info for much initially).

CSV to table import: Define UI and interactions

Design Problem

We need to define the interface components and interactions required to implement the CSV to Table Import Functionality.

Context

Importing data from a file is one of the table creation methods listed in Mathesar's roadmap. In the case of CSV to Table, the input source is a CSV file, that once processed by Mathesar, is converted into a table in a new or existing schema.

Justification

Import from files is a baseline functionality of products in the same category. We want to make Mathesar comparable to other products in the same category.
CSV is one of the most commonly used file formats for tabular data, and it's easily saved from applications like Excel or Google Sheets, making it accessible to users.

Impact

Users expect the same baseline functionality from products in the same category.

Scenarios

The input is a valid CSV file.
The input is a single CSV file.

Setup an Icon library with credits to creators

Currently the required icons for Mathesar is taken from the Noun project as per my understanding. @ghislaineguerin let me know if this is right. The licence for each of those icons is unknown. We need to check them and ensure that we can use it in our code, and include appropriate credits if applicable.

We will need to setup a publicly shared icon library in Figma, and have an icon component system in place.

UX Principles Draft

We need a set of principles that we can apply consistently when making design decisions as a team for the Mathesar project.
The resulting document should also serve as the blueprint for building a community around Mathesar design.

Rename Collections/Applications to Postgres terminology

We are going to standardize on Postgres terminology (databases/schemas/tables) rather than renaming them to more "non-technical" concepts like applications and collections to simplify the interface. This issue is to rename those concepts in the codebase.

Type Inference 2: Determine a type for a column

Problem

Proposed solution

Using the function implemented for #91 , we should then test the column against types in an algorithmic way to determine the best type prediction we can for the column

Type Inference 3: Type inference for a table

Problem

Proposed solution

Using the functions of #91 and #92, we should create a function that, given a schema and table_name, returns a list of best-match types for the columns of that table.

[Wiki] Set up linkrot detection for Mathesar wiki

Problem

The wiki is central to Mathesar design, development, and community building. We need to be able to identify broken links so that we can fix them and keep the wiki in good repair.

Proposed solution

Set up a GitHub Action on the (private) wiki repository to identify linkrot and run it once a day.

Update roadmap and product definition on wiki

We need to revise the current roadmap based on the outcomes of the wireframing exercise.

Move code over from prototype and test it.

We need to move the code over from the prototype repo into this one, and start cleaning it up and testing it. This is a meta-ticket, and smaller tickets will be created per feature as needed.

Sub-issues:

#44 , #56 , #67 , #77 , #81

Wireframes for Data Explorer

The data explorer is the group of UI components and interactions through which users view and modify their data across one or multiple tables. We want to ensure that the initial roadmap contains all the necessary features for an optimal data explorer experience.
By wireframing potential user scenarios centered around data explorer usage, we can validate our roadmap assumptions and uncover opportunities to improve them.

Readonly table view based on design handoff

We need to implement the table view as a Svelte component.

Configure flake8 & GitHub Action correctly

Our flake8 setup has a couple of issues:

Failures on the GitHub Action don't actually block merge.
We need to set up our style guide for flake8.

Handle routing takeover at client

Currently, routing is handled by Django. When the user visits the application initially, the routing should be handled by Django, and after the application loads, the routing should be taken over by client.

Accommodate frontend code in repo

Related to #53

A separate directory for frontend code and workflow actions needs to be created.

We would need:

A separate service in docker-compose to build the client files for development - logistics need to be figured out.
For production builds, we will have to build client and place it in static web server folder before creating a bundle/container.
Separate workflow for lint, test and build steps for client.

Allow users to import TSV files

Problem
Currently, the user can only import csv files into Mathesar. tsv files are pretty similar and the user should be able to import them too.

Describe the solution you'd like
The user should be able to upload a tsv file to the api/v0/data_files/ API endpoint.

Use Mathesar Bot account for GitHub workflow automation

Currently, my account is used for GitHub workflow automation. I'm going to switch it to a bot account so it's clearer what actions I'm taking.

Set up a PostgreSQL server automatically if none exists

We need to give Mathesar the ability to set up a fresh database to hold user-defined tables if no such database exists. This should (for now) not require any action on the user's part.

The new server should also have an initial user, and include the mathesar-specific types and functions in the appropriate schema.

Extract columns from table, creating new table

Problem

Database schema normalization is difficult, and not something many are used to. However, a properly normalized set of database tables enables reduction of repetition, helps enforce consistency, and more. So, we should try to help users normalize the set of tables they're working with (I'm avoiding saying schema since it's a reserved word in this project).

Proposed Solution

The db.tables module should be able to extract a set of columns from a table, and connect the resulting tables with a foreign key relationship. This would be part of a flow to help a user get their tables into 2NF. Note that we're being a bit sloppy here, since the entries in cells may yet be non-atomic.

Something like:

            TABLE 1                                TABLE 1'                 TABLE 2
|=============================|           |=======================|   |=================|
| ID  |  A  |  B  |  C  |  D  |  EXTRACT  | ID  |  A  |  B  |T2ID |   | ID  |  C  |  D  |
|=====|=====|=====|=====|=====|    C,D    |=====|=====|=====|=====|   |=====|=====|=====|
|  1  | ... | ... | ... | ... |  ------>  |  1  | ... | ... | ... |   |  1  | ... | ... |
|  2  | ... | ... | ... | ... |           |  2  | ... | ... | ... |   |  2  | ... | ... |
|  3  | ... | ... | ... | ... |           |  3  | ... | ... | ... |   |  3  | ... | ... |
|  4  | ... | ... | ... | ... |           |  4  | ... | ... | ... |   |  4  | ... | ... |

Where T2ID is the ID key of the new TABLE 2 as a foreign key.

Assumptions

In the extraction process, each table should be reduced to the unique set of tuples from the columns in the table.
Any columns which are themselves foreign key columns before the extraction should remain so.

API for Tables

This issue covers a read-only API for list and detail views of tables.

Previously, this issue was for a CRUID API, but I'm reducing the scope because:

Updating involves renaming tables, which involves foreignkeys etc. and is complicated/deserves its own issue.
Creation and deletion behavior should be driven by UI needs, which are yet to be defined.

Controlled Casting

Problem

In the course of implementing #92 , I've realized that the default casting in PostgreSQL doesn't quite serve our needs. For example, many different strings cast to booleans, but this loses information. Another example: Numeric 1s and 0s can't cast to boolean. When casting NUMERIC to INTEGER values are silently rounded. And so on.

Proposed solution

We need to define functions for all supported casting to be used in altering column types and column inference.

Additional context

This will probably involve a number of PRs and sub-issues.

Schema getter should return public, but not mathesar_types

Describe the bug

If a user wants to create a table the public schema, they can't currently, because the logic in the db.schemas.get_all_schemas function ignores it. This means when they try, an error is thrown. This is especially a problem when they've imported a DB, since most tables are in the public schema in most installations of PostgreSQL in the wild.

Expected behavior

The public schema should be available for holding mathesar tables.

To Reproduce
Please try to provide a Minimal, Complete, and Verifiable example.

Start the webapp using the README. Try to upload a CSV to the public schema. See the error.

Have a nice day!

Controlled casting for dates, times, and timestamps

Problem

Related to #126 . We want to support dates, times, and timestamps (i.e., datetimes) with our controlled casting (and eventually inference) logic.

Proposed solution

We should add functions to db.types.alteration that will handle casting appropriate types to each of DATE, TIME, or TIMESTAMP.

Additional context

See this discussion for some thoughts about Timezones: #119 .

CRUD API for Records

Records in tables should be able to be created, read, updated, and deleted via the REST API.

Setup bot for auto dependency updates for npm packages

Problem
npm dependencies need to be regularly monitored and updated to the most recent minor version.

Proposed solution
Having a gh bot to do this for us and raise a PR with the changes, would make it a lot more easier for us to maintain this in the long run.

Table Creation fails if called more than once

Describe the bug

If mathesar.database.tables.create_table is called more than once, it causes an error.

Expected behavior

It should be possible to create more than one table.

To Reproduce
Please try to provide a Minimal, Complete, and Verifiable example.
Follow the instructions to setup the web app. Try to upload CSVs to two different tables.

Additional context

Traceback:

Environment:


Request Method: POST
Request URL: http://localhost:8000/

Django Version: 3.1.7
Python Version: 3.9.2
Installed Applications:
['django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'mathesar']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware']



Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/code/mathesar/views.py", line 16, in index
    collection = create_collection_from_csv(
  File "/code/mathesar/imports/csv.py", line 25, in create_collection_from_csv
    table = create_table_from_csv(name, schema, csv_reader, engine)
  File "/code/mathesar/imports/csv.py", line 18, in create_table_from_csv
    table = create_table(name, schema, csv_reader.fieldnames, engine)
  File "/code/mathesar/database/tables.py", line 19, in create_table
    table = Table(
  File "<string>", line 2, in __new__
    <source code not available>
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/deprecations.py", line 298, in warned
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 597, in __new__
    metadata._remove_table(name, schema)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 198, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 592, in __new__
    table._init(name, metadata, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 678, in _init
    self._init_items(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 134, in _init_items
    spwd(self, **kw)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/base.py", line 1014, in _set_parent_with_dispatch
    self._set_parent(parent, **kw)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 1753, in _set_parent
    raise exc.ArgumentError(

Exception Type: ArgumentError at /
Exception Value: Column object 'mathesar_id' already assigned to Table 'Domains'

The error is caused because the create_table function tries to reuse the same default column, rather than creating copies of it. This will be fixed while bringing code over from the prototype repo.

Have a nice day!

Setup API documentation and Postman collection

Having a dedicated API documentation would make things easier for quick reference of our APIs as well as for anyone developing clients for Mathesar. Personal preference: Redoc
Postman collections make it easier to store, query, test and share APIs. Having one would make development and debugging easier. (Optional - just an idea)

Related Discussion

#875

Setup GH action to raise an issue when npm audit fails in workflow, instead of failing the workflow.

Problem
When a vulnerability is detected in any of our npm dependencies, the current GH workflow would fail at npm audit step and fail the entire workflow.

Expected
We would ideally not want the pipeline to fail due to audit failure. We would still need an indication that the npm audit step failed.

Proposed solution
We could set up a GH action that performs npm audit, and raises an issue if that fails.

Type Inference 1: Check column against a type

Problem

Proposed solution

Given a schema, table_name, column_name, and type, we need to be able to return a boolean giving whether the column can be cast to that type.

Additional context

We may need to take an optional sample size parameter to do this for large data. Performance testing will be necessary.

Add database read API, include table names in Schema API

Problem

@pavish is working on setting up our existing frontend using Svelte + the API. We need to list all tables in a given schema as well as display supported database keys in the frontend. Currently, the schema API does not show table names, and there's no way to get database keys via API at all.

Proposed solution

Create a database keys API (read-only)
Update the schema of the Schema API to include table names.

Set up GitHub Actions to run tests on push

We should be running tests every time code is pushed so that we can ensure that new code does not break tests.

Performance testing data generation

Problem

Some of the functionality of Mathesar has performance implications w.r.t. the speed with which responses can be expected from the API. For example, splitting and merging tables can be slow at large scale.

In order to understand these implications, we need to have sample data of different sizes, some very large.

Proposed solution

We should create that data via a script to avoid having to sync it in the repo or store it somewhere. This could also be included in the example notebook mentioned in #82 so that developers can see the performance of different operations.

Wireframes for Inventory Use Case

We need wireframes to validate the roadmap against a simple inventory use case.
The inventory use case defines a user's steps to set up a simple database to hold details and information about a specific collection of items.

Work with tables and views: Define UI and interactions

Design Problem

We need to define the interface components and interactions required to implement the Tables and Views Functionality.

Context

Displaying data in tables and views is part of Mathesar's roadmap. Tables and views are the main ways the users will interact with their data within Mathesar.

Justification

Tables and views are a baseline functionality of products in the same category. We want to make Mathesar comparable to other products in the same category.

Impact

Users expect the same baseline functionality from products in the same category.

Scenarios

The user has a large number of rows (+10,000)
The user has data across multiple tables
The user has set at least one relationship between tables

APIs for CSV import functionality

Problem

Our current CSV import functionality was meant to be temporary and has a few issues:

It's synchronous and blocks response until the CSV file is imported.
It does not save the CSV file.
It's built using Django Forms and cannot be used by the Svelte-based frontend.

Proposed solution

CSV imports should be possible using the API. There should be separate endpoints for the following tasks:

Uploading a CSV file and saving it to disk.
Showing the user a preview of the CSV file (with types inferred although that's not in scope for this issue and will need to wait until #93 is done)
Showing the status of the CSV upload and type inference.
Actually creating a table (eventually with types confirmed by the user, once #93 is done)

Merge tables which are connected by a Foreign Key relationship.

Problem

Perhaps a user has split a table using the functionality proposed in #67 , but they've done something wrong, or they've thought of a better set of columns to extract from the original table. They may want to put the tables back together (not just be able to view them together, but actually merge them under the hood).

Proposed solution

We should give them the ability to merge the tables back together to recover the previous state of their data.

Web service should wait for db to start accepting connections

Describe the bug
Web service starts right after db service is created and Postgres starts, but this does not ensure that Postgres is accepting connections yet. When web service starts during that interval, it is not able to establish connection and does not retry further connection attempts. This happens intermittently when freshly running docker-compose up.

Expected behavior

Upon start, web service should wait for db to start accepting connections.
If unable to establish connection/connection fails later, application should retry and re-establish connections.

Add examples to let users (and developers) easily tinker

Problem

Some users (and developers) may want to play around with Mathesar, or experiment with features contained in the db package, but not yet available through the UI or API.

Proposed solution

We should add an example Jupyter notebook showing what the db package is able to do.

The notebook should not require Django to be loaded
The notebook should be able to run outside of Docker.

Ensure database objects are synced with web application

Is your feature request related to a problem? Please describe.
Currently, if someone uses a pre-existing database in Mathesar, or updates a database outside of Mathesar, these changes are not synced with the tables and schemas stored in the web application.

Describe the solution you'd like
We should ensure that database objects are synced, both during initial database setup and regularly.

mathesar-foundation / mathesar Goto Github PK

mathesar's People

Contributors

Stargazers

Watchers

Forkers

mathesar's Issues

Tasks

Deployment Type 1

Deployment Type 2

Deployment Type 3

Deployment Type 4

Deployment Type 5

Other deployment targets

Design Problem

Context

Justification

Impact

Scenarios

Problem

Proposed Solution

Assumptions

Related Discussion

Design Problem

Context

Justification

Impact

Scenarios

Recommend Projects

Recommend Topics

Recommend Org