mathesar-foundation / mathesar
Web application providing an intuitive user experience to databases.
Home Page: https://mathesar.org/
License: GNU General Public License v3.0
We have a new wiki! https://wiki.mathesar.org/
I'm going to populate it with the contents of our previous notes and delete our old notes. This issue is to track that work.
Describe the bug
We need tests for the CSV upload functionality so that we can ensure we don't break it while working on other things.
Problem
While data modeling, it may be that a user would want to be able to move a set of columns from Table A to Table B, where these tables are connected by a foreign key relationship.
Proposed solution
We should provide this functionality, with the following restrictions (for now):
We should make a document to help contributors optionally set up a pre-commit hook with the same linting that we're using in the repo. This will let them have a quick check on each commit rather than waiting till they submit a PR to see if it's going to pass the linter without problems.
In preparation for the first release, we need tooling and documentation to help users and admins deploy and manage Mathesar in different contexts.
`docs.mathesar.org` site for documentation.
The deployment-specific tasks are organized by different deployment types, and the intent is to do lower-numbered deployment types before higher-numbered ones.
This deployment has the service and all DBs running in Docker containers managed by `docker-compose`, as in our current dev setup. We would want to help the user configure things for a production environment.
`dev` flags.
`COPY` rather than mounting code
`docker-compose.yml` to use that image rather than building.
`docs.mathesar.org`.
This deployment has the web service and its Django model tables DB running in Docker containers managed by `docker-compose`, but the user DB is assumed to be running separately, either on the same underlying machine or on a different server altogether. Example: running the `mathesar_service` and a DB container for Django model tables on a minimal EC2 instance, and having user tables on an RDS PostgreSQL instance. In addition to the pieces of type 1 above, we would need:
`mathesar` user
`mathesar` user.
`mathesar` user on the User DB, starting from given privileged already-extant user on the target user DB.
`docker-compose` service (maybe separate file?) to launch this setup.
`docs.mathesar.org`
This is the same as Type 2, except the Django tables are stored on a database not managed by Docker. This is not recommended for performance reasons (i.e., if the database is running on a different server from the web service container).
docs.mathesar.org
This is a 'bare-metal' (more likely, VPS) installation with the database and Django web service installed on the same machine.
Further steps are dependent on outcome of above.
This is a 'bare-metal' installation (as above), but with a DB in some remote location. Very little should change from Type 4.
We do not currently have a license for the code in this repository.
We're investigating options and will add a license soon; I just wanted to track the work.
Problem
Different types in Mathesar will enable different operations; for example, strings could be aggregated by concatenating, but numeric types could be aggregated by summing or multiplying. So far, while we can reflect different types, we have no way to determine the type most appropriate for a table.
Proposed solution
Using the functions from #91, #92, and #93, we should make a function that takes a `schema`, `original_table_name`, and `new_table_name`, creates a new table with typed columns of the same names as the old table, and inserts all data there. The old table can be optionally dropped.
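A minimal sketch of the proposed function, assuming the inferred types are already available as a mapping from column name to type name. The names `quote_ident`, `build_typed_copy_sql`, `column_types`, and `drop_original` are hypothetical and not taken from the Mathesar codebase; the sketch only builds SQL text and does not execute it.

```python
def quote_ident(name):
    """Quote an identifier PostgreSQL-style, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_typed_copy_sql(schema, original_table_name, new_table_name,
                         column_types, drop_original=False):
    """Return SQL statements that copy a table into typed columns.

    column_types (hypothetical parameter) maps each column name to the
    target type name, in column order. Type names are interpolated as-is,
    so they must come from a trusted allow-list, never from user input.
    """
    source = f"{quote_ident(schema)}.{quote_ident(original_table_name)}"
    target = f"{quote_ident(schema)}.{quote_ident(new_table_name)}"
    select_list = ", ".join(
        f"CAST({quote_ident(col)} AS {sql_type}) AS {quote_ident(col)}"
        for col, sql_type in column_types.items()
    )
    statements = [
        f"CREATE TABLE {target} AS SELECT {select_list} FROM {source}"
    ]
    if drop_original:
        # Dropping the old table is optional, as described in the issue.
        statements.append(f"DROP TABLE {source}")
    return statements
```

Returning the statements rather than running them keeps the sketch independent of any particular database driver; a real implementation would likely run both statements in one transaction.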
This issue covers a read-only API for list and detail views of schemas.
Previously, this issue was for a CRUD API, but I'm reducing the scope because:
We will need a list of UI components while implementing the readonly table view. This ticket is to begin work on them while the design is in progress, so that development for table view can start without any blockers after design is finalized.
Initial set of UI components:
If possible (might be spun off into a separate ticket):
Django shouldn't really know more about a given user-table database than its connection info, and functions that don't need to know about Django should just take an engine as an argument. Thus, we'll split functionality that manipulates user-defined tables out of the webapp to its own library which the webapp will import. This will be easier to maintain in the long run.
Mathesar should, given appropriate credential info from a user, be able to connect to an existing database, and reflect the tables there. The reflection should include the columns of the tables, and their types (though we won't use the type info for much initially).
We need to define the interface components and interactions required to implement the CSV to Table Import Functionality.
Importing data from a file is one of the table creation methods listed in Mathesar's roadmap. In the case of CSV to Table, the input source is a CSV file that, once processed by Mathesar, is converted into a table in a new or existing schema.
Import from files is a baseline functionality of products in the same category. We want to make Mathesar comparable to other products in the same category.
CSV is one of the most commonly used file formats for tabular data, and it's easily saved from applications like Excel or Google Sheets, making it accessible to users.
Users expect the same baseline functionality from products in the same category.
As I understand it, the icons Mathesar requires are currently taken from the Noun Project. @ghislaineguerin, let me know if this is right. The licence for each of those icons is unknown. We need to check them, ensure that we can use them in our code, and include appropriate credits if applicable.
We will need to setup a publicly shared icon library in Figma, and have an icon component system in place.
We need a set of principles that we can apply consistently when making design decisions as a team for the Mathesar project.
The resulting document should also serve as the blueprint for building a community around Mathesar design.
We are going to standardize on Postgres terminology (databases/schemas/tables) rather than renaming them to more "non-technical" concepts like applications and collections to simplify the interface. This issue is to rename those concepts in the codebase.
Problem
Different types in Mathesar will enable different operations; for example, strings could be aggregated by concatenating, but numeric types could be aggregated by summing or multiplying. So far, while we can reflect different types, we have no way to determine the type most appropriate for a column.
Proposed solution
Using the function implemented for #91 , we should then test the column against types in an algorithmic way to determine the best type prediction we can for the column
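One way to picture the "algorithmic" test is to try candidate types from most to least specific and keep the first one that every non-empty value satisfies. The sketch below is an illustration only: it works on in-memory strings rather than a database column, and the candidate list, patterns, and `infer_column_type` name are assumptions, not the function from #91.

```python
import re

# Candidate types, ordered from most to least specific (illustrative only).
CANDIDATE_TYPES = [
    ("BOOLEAN", re.compile(r"(?i)true|false")),
    ("INTEGER", re.compile(r"[+-]?[0-9]+")),
    ("NUMERIC", re.compile(r"[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)")),
]


def infer_column_type(values, fallback="TEXT"):
    """Return the first candidate type that every non-empty value matches."""
    non_null = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_null:
        # Nothing to test against, so keep the most permissive type.
        return fallback
    for type_name, pattern in CANDIDATE_TYPES:
        if all(pattern.fullmatch(v) for v in non_null):
            return type_name
    return fallback
```

Ordering matters here: because `INTEGER` is tried before `NUMERIC`, a column of whole numbers is not widened unnecessarily.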
Problem
Different types in Mathesar will enable different operations; for example, strings could be aggregated by concatenating, but numeric types could be aggregated by summing or multiplying. So far, while we can reflect different types, we have no way to determine the type most appropriate for the columns of a table.
Proposed solution
Using the functions of #91 and #92, we should create a function that, given a `schema` and `table_name`, returns a list of best-match types for the columns of that table.
Problem
The wiki is central to Mathesar design, development, and community building. We need to be able to identify broken links so that we can fix them and keep the wiki in good repair.
Proposed solution
Set up a GitHub Action on the (private) wiki repository to identify linkrot and run it once a day.
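The checking step such an action might run can be sketched in plain Python. This is an assumption-laden illustration: the function names are hypothetical, only inline Markdown links with `http(s)` URLs are considered, and the daily scheduling in the GitHub Action itself is outside the sketch.

```python
import re
import urllib.request

# Matches inline Markdown links such as [text](https://example.org/page).
LINK_PATTERN = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def extract_links(markdown_text):
    """Return the http(s) URLs of all inline Markdown links, in order."""
    return LINK_PATTERN.findall(markdown_text)


def is_reachable(url, timeout=10):
    """Return True if a HEAD request to the URL succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False


def find_broken_links(markdown_text, check=is_reachable):
    """Return the URLs for which the check fails."""
    return [url for url in extract_links(markdown_text) if not check(url)]
```

Passing `check` as a parameter lets the extraction logic be tested without network access. Some servers reject `HEAD` requests, so a real checker may need to fall back to `GET`.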
We need to revise the current roadmap based on the outcomes of the wireframing exercise.
The data explorer is the group of UI components and interactions through which users view and modify their data across one or multiple tables. We want to ensure that the initial roadmap contains all the necessary features for an optimal data explorer experience.
By wireframing potential user scenarios centered around data explorer usage, we can validate our roadmap assumptions and uncover opportunities to improve them.
We need to implement the table view as a Svelte component.
Our flake8 setup has a couple of issues:
Currently, routing is handled by Django. When the user first visits the application, routing should be handled by Django; after the application loads, routing should be taken over by the client.
Related to #53
A separate directory for frontend code and workflow actions needs to be created.
We would need:
Problem
Currently, the user can only import `csv` files into Mathesar. `tsv` files are pretty similar and the user should be able to import them too.
Describe the solution you'd like
The user should be able to upload a `tsv` file to the `api/v0/data_files/` API endpoint.
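Since the two formats differ mainly in their delimiter, the parsing side could look roughly like the sketch below. It uses only the standard `csv` module; the `read_data_file` name and the choice to pick the delimiter from the file extension are assumptions for illustration, not how the endpoint is implemented.

```python
import csv
import io
from pathlib import Path

# Hypothetical mapping from file extension to delimiter.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def read_data_file(filename, text):
    """Parse delimited text, choosing the delimiter from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    reader = csv.DictReader(
        io.StringIO(text, newline=""), delimiter=DELIMITERS[suffix]
    )
    rows = list(reader)
    return reader.fieldnames, rows
```

For example, `read_data_file("people.tsv", "id\tname\n1\tAda\n")` yields the header names and one row dictionary, and commas inside a TSV field are left untouched.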
Currently, my account is used for GitHub workflow automation. I'm going to switch it to a bot account so it's clearer what actions I'm taking.
We need to give Mathesar the ability to set up a fresh database to hold user-defined tables if no such database exists. This should (for now) not require any action on the user's part.
The new server should also have an initial user, and include the mathesar-specific types and functions in the appropriate schema.
Database schema normalization is difficult, and not something many are used to. However, a properly normalized set of database tables enables reduction of repetition, helps enforce consistency, and more. So, we should try to help users normalize the set of tables they're working with (I'm avoiding saying schema since it's a reserved word in this project).
The `db.tables` module should be able to extract a set of columns from a table, and connect the resulting tables with a foreign key relationship. This would be part of a flow to help a user get their tables into 2NF. Note that we're being a bit sloppy here, since the entries in cells may yet be non-atomic.
Something like:
TABLE 1                                   TABLE 1'                    TABLE 2
|=============================|           |=======================|   |=================|
| ID  |  A  |  B  |  C  |  D  |  EXTRACT  | ID  |  A  |  B  |T2ID |   | ID  |  C  |  D  |
|=====|=====|=====|=====|=====|    C,D    |=====|=====|=====|=====|   |=====|=====|=====|
|  1  | ... | ... | ... | ... |  ------>  |  1  | ... | ... | ... |   |  1  | ... | ... |
|  2  | ... | ... | ... | ... |           |  2  | ... | ... | ... |   |  2  | ... | ... |
|  3  | ... | ... | ... | ... |           |  3  | ... | ... | ... |   |  3  | ... | ... |
|  4  | ... | ... | ... | ... |           |  4  | ... | ... | ... |   |  4  | ... | ... |
Where `T2ID` is the `ID` key of the new `TABLE 2` as a foreign key.
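The extraction can be illustrated on in-memory rows. This sketch is not the `db.tables` implementation: it works on lists of dictionaries instead of database tables, the `extract_columns` name is hypothetical, and it assumes that identical combinations of extracted values should share one row in the new table (the diagram does not say either way).

```python
def extract_columns(rows, columns, fk_name="T2ID"):
    """Split rows into a remainder table and an extracted table.

    Returns (remainder_rows, extracted_rows). Each remainder row gets a
    foreign key column pointing at the "ID" of its extracted row.
    """
    remainder_rows = []
    extracted_rows = []
    ids_by_values = {}
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in ids_by_values:
            # Assumption: duplicate value combinations are stored once.
            new_id = len(extracted_rows) + 1
            ids_by_values[key] = new_id
            extracted_rows.append({"ID": new_id, **dict(zip(columns, key))})
        remainder = {k: v for k, v in row.items() if k not in columns}
        remainder[fk_name] = ids_by_values[key]
        remainder_rows.append(remainder)
    return remainder_rows, extracted_rows
```

Deduplicating the extracted rows is what reduces repetition; without it, the new table would simply mirror the old one row for row.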
This issue covers a read-only API for list and detail views of tables.
Previously, this issue was for a CRUD API, but I'm reducing the scope because:
Problem
In the course of implementing #92, I've realized that the default casting in PostgreSQL doesn't quite serve our needs. For example, many different strings cast to booleans, but this loses information. Another example: numeric 1s and 0s can't be cast to boolean. When casting `NUMERIC` to `INTEGER`, values are silently rounded. And so on.
Proposed solution
We need to define functions for all supported casting to be used in altering column types and column inference.
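To make the intent concrete, here is a sketch of what stricter casts could look like, written in Python rather than as database functions. The function names and the exact set of accepted boolean strings are assumptions chosen for illustration; the issue does not specify them.

```python
from decimal import Decimal

# Illustrative policy: only these spellings count as booleans.
TRUE_STRINGS = {"true", "t"}
FALSE_STRINGS = {"false", "f"}


def cast_to_boolean(value):
    """Cast to bool, accepting numeric 0/1 and a narrow set of strings."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, Decimal)) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in TRUE_STRINGS:
            return True
        if lowered in FALSE_STRINGS:
            return False
    raise ValueError(f"Cannot cast {value!r} to boolean")


def cast_numeric_to_integer(value):
    """Cast to int, refusing values that would need rounding."""
    number = Decimal(value)
    if not number.is_finite() or number != number.to_integral_value():
        raise ValueError(f"Cannot cast {value!r} to integer without rounding")
    return int(number)
```

Raising an error instead of rounding or guessing means a failed cast is visible to the caller, which is also what column inference needs in order to rule a type out.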
Additional context
This will probably involve a number of PRs and sub-issues.
Describe the bug
If a user wants to create a table in the `public` schema, they currently can't, because the logic in the `db.schemas.get_all_schemas` function ignores it. This means that when they try, an error is thrown. This is especially a problem when they've imported a DB, since most tables are in the `public` schema in most installations of PostgreSQL in the wild.
Expected behavior
The public schema should be available for holding mathesar tables.
To Reproduce
Please try to provide a Minimal, Complete, and Verifiable example.
Start the webapp using the README. Try to upload a CSV to the `public` schema. See the error.
Have a nice day!
Problem
Related to #126 . We want to support dates, times, and timestamps (i.e., datetimes) with our controlled casting (and eventually inference) logic.
Proposed solution
We should add functions to `db.types.alteration` that will handle casting appropriate types to each of `DATE`, `TIME`, or `TIMESTAMP`.
Additional context
See this discussion for some thoughts about Timezones: #119 .
Records in tables should be able to be created, read, updated, and deleted via the REST API.
Problem
npm dependencies need to be regularly monitored and updated to the most recent minor version.
Proposed solution
Having a GitHub bot do this for us and raise a PR with the changes would make this much easier to maintain in the long run.
Describe the bug
If `mathesar.database.tables.create_table` is called more than once, it causes an error.
Expected behavior
It should be possible to create more than one table.
To Reproduce
Please try to provide a Minimal, Complete, and Verifiable example.
Follow the instructions to setup the web app. Try to upload CSVs to two different tables.
Additional context
Traceback:
Environment:
Request Method: POST
Request URL: http://localhost:8000/
Django Version: 3.1.7
Python Version: 3.9.2
Installed Applications:
['django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'mathesar']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware']
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/code/mathesar/views.py", line 16, in index
    collection = create_collection_from_csv(
  File "/code/mathesar/imports/csv.py", line 25, in create_collection_from_csv
    table = create_table_from_csv(name, schema, csv_reader, engine)
  File "/code/mathesar/imports/csv.py", line 18, in create_table_from_csv
    table = create_table(name, schema, csv_reader.fieldnames, engine)
  File "/code/mathesar/database/tables.py", line 19, in create_table
    table = Table(
  File "<string>", line 2, in __new__
    <source code not available>
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/deprecations.py", line 298, in warned
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 597, in __new__
    metadata._remove_table(name, schema)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 198, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 592, in __new__
    table._init(name, metadata, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 678, in _init
    self._init_items(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 134, in _init_items
    spwd(self, **kw)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/base.py", line 1014, in _set_parent_with_dispatch
    self._set_parent(parent, **kw)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/schema.py", line 1753, in _set_parent
    raise exc.ArgumentError(
Exception Type: ArgumentError at /
Exception Value: Column object 'mathesar_id' already assigned to Table 'Domains'
The error occurs because the `create_table` function tries to reuse the same default column object for every table, rather than creating a copy of it for each one. This will be fixed while bringing code over from the prototype repo.
Have a nice day!
Problem
When a vulnerability is detected in any of our npm dependencies, the current GH workflow fails at the `npm audit` step, which fails the entire workflow.
Expected
Ideally, the pipeline should not fail because of an audit failure. We would still need an indication that the `npm audit` step failed.
Proposed solution
We could set up a GH action that performs npm audit, and raises an issue if that fails.
Problem
Different types in Mathesar will enable different operations; for example, strings could be aggregated by concatenating, but numeric types could be aggregated by summing or multiplying. So far, while we can reflect different types, we have no way to determine the type most appropriate for a column.
Proposed solution
Given a `schema`, `table_name`, `column_name`, and `type`, we need to be able to return a boolean giving whether the column can be cast to that type.
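The shape of such a check, including an optional sample size, can be sketched on in-memory values. Everything here is an assumption for illustration: the `can_cast` name, the `PARSERS` table, and the use of Python parsers as stand-ins for database casts, which accept slightly different inputs than PostgreSQL would.

```python
import random
from datetime import date
from decimal import Decimal, InvalidOperation

# Python parsers standing in for database casts (illustrative only).
PARSERS = {
    "INTEGER": int,
    "NUMERIC": Decimal,
    "DATE": date.fromisoformat,
}


def can_cast(values, type_name, sample_size=None, seed=0):
    """Return True if every checked non-null value parses as type_name.

    When sample_size is given, only a random sample is checked, so the
    result is a prediction rather than a guarantee.
    """
    parser = PARSERS[type_name]
    candidates = [v for v in values if v is not None]
    if sample_size is not None and sample_size < len(candidates):
        candidates = random.Random(seed).sample(candidates, sample_size)
    for value in candidates:
        try:
            parser(value)
        except (ValueError, InvalidOperation):
            return False
    return True
```

Sampling trades accuracy for speed: a sampled check can miss a bad value that the sample did not include, so a later full cast may still fail.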
Additional context
We may need to take an optional sample size parameter to do this for large data. Performance testing will be necessary.
Problem
@pavish is working on setting up our existing frontend using Svelte + the API. We need to list all tables in a given schema as well as display supported database keys in the frontend. Currently, the schema API does not show table names, and there's no way to get database keys via API at all.
Proposed solution
We should be running tests every time code is pushed so that we can ensure that new code does not break tests.
Problem
Some of the functionality of Mathesar has performance implications w.r.t. the speed with which responses can be expected from the API. For example, splitting and merging tables can be slow at large scale.
In order to understand these implications, we need to have sample data of different sizes, some very large.
Proposed solution
We should create that data via a script to avoid having to sync it in the repo or store it somewhere. This could also be included in the example notebook mentioned in #82 so that developers can see the performance of different operations.
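A generator script along these lines could be as small as the following sketch. The column names, value ranges, and `write_sample_csv` function are made up for illustration; the issue does not describe what the sample data should contain.

```python
import csv
import random

# Hypothetical categories for a made-up sample table.
CATEGORIES = ["books", "games", "music", "tools"]


def write_sample_csv(stream, row_count, seed=0):
    """Write row_count rows of reproducible pseudo-random data as CSV."""
    rng = random.Random(seed)
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["id", "category", "amount"])
    for i in range(1, row_count + 1):
        writer.writerow(
            [i, rng.choice(CATEGORIES), f"{rng.uniform(0, 1000):.2f}"]
        )
```

Fixing the seed makes runs repeatable, so two developers generating "the one-million-row file" get the same data without storing it anywhere. Rows are written one at a time, so memory use stays flat even for very large files.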
We need wireframes to validate the roadmap against a simple inventory use case.
The inventory use case defines a user's steps to set up a simple database to hold details and information about a specific collection of items.
We need to define the interface components and interactions required to implement the Tables and Views Functionality.
Displaying data in tables and views is part of Mathesar's roadmap. Tables and views are the main ways the users will interact with their data within Mathesar.
Tables and views are a baseline functionality of products in the same category. We want to make Mathesar comparable to other products in the same category.
Users expect the same baseline functionality from products in the same category.
Problem
Our current CSV import functionality was meant to be temporary and has a few issues:
Proposed solution
CSV imports should be possible using the API. There should be separate endpoints for the following tasks:
Problem
Perhaps a user has split a table using the functionality proposed in #67 , but they've done something wrong, or they've thought of a better set of columns to extract from the original table. They may want to put the tables back together (not just be able to view them together, but actually merge them under the hood).
Proposed solution
We should give them the ability to merge the tables back together to recover the previous state of their data.
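On in-memory rows, the merge is a lookup of each row's foreign key followed by folding the referenced columns back in. The sketch below is illustrative only: `merge_tables` is a hypothetical name, the `T2ID` and `ID` defaults follow the naming used in the diagram for the extraction issue, and real tables would be merged in the database rather than in Python.

```python
def merge_tables(remainder_rows, extracted_rows, fk_name="T2ID", pk_name="ID"):
    """Fold the extracted table's columns back into the remainder rows."""
    extracted_by_id = {row[pk_name]: row for row in extracted_rows}
    merged = []
    for row in remainder_rows:
        target = extracted_by_id[row[fk_name]]
        # Drop the foreign key and restore the extracted columns.
        combined = {k: v for k, v in row.items() if k != fk_name}
        combined.update({k: v for k, v in target.items() if k != pk_name})
        merged.append(combined)
    return merged
```

This recovers the original rows only if nothing else refers to the extracted table and every foreign key still points at an existing row; a missing target raises `KeyError` here rather than silently dropping data.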
Describe the bug
The web service starts right after the db service is created and Postgres starts, but this does not ensure that Postgres is accepting connections yet. When the web service starts during that interval, it cannot establish a connection and does not retry. This happens intermittently when running `docker-compose up` on a fresh setup.
Expected behavior
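One common way to get this behavior is to retry the connection a bounded number of times before giving up. The sketch below shows that pattern in isolation; `wait_for_database` and its parameters are hypothetical, and `connect` stands for whatever callable the web service uses to open a database connection.

```python
import time


def wait_for_database(connect, attempts=10, delay=1.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    retry_on lists the exception types treated as "not ready yet"; a real
    setup would add the database driver's own connection error type.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                sleep(delay)
    raise ConnectionError(
        f"Database not reachable after {attempts} attempts"
    ) from last_error
```

The `sleep` parameter exists so the loop can be tested without real waiting. An alternative is to handle this at the container level, for example with a health check, so the web service is only started once the database accepts connections.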
Problem
Some users (and developers) may want to play around with Mathesar, or experiment with features contained in the `db` package, but not yet available through the UI or API.
Proposed solution
We should add an example Jupyter notebook showing what the `db` package is able to do.
Is your feature request related to a problem? Please describe.
Currently, if someone uses a pre-existing database in Mathesar, or updates a database outside of Mathesar, these changes are not synced with the tables and schemas stored in the web application.
Describe the solution you'd like
We should ensure that database objects are synced, both during initial database setup and regularly.
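At its core, a sync step compares what the database currently contains with what the web application has stored. The sketch below shows only that comparison, on plain names; `plan_sync` and the `(schema, table)` tuple representation are assumptions, and it says nothing about how Mathesar actually stores or reflects these objects.

```python
def plan_sync(reflected, stored):
    """Compare reflected database objects with those stored in the app.

    Both arguments are iterables of (schema, table) name tuples. Returns
    the objects the app should start tracking and those it should forget.
    """
    reflected = set(reflected)
    stored = set(stored)
    return {
        "create": sorted(reflected - stored),
        "delete": sorted(stored - reflected),
    }
```

A comparison by name treats a renamed table as one deletion plus one creation; tracking objects by a stable database identifier would avoid that, at the cost of a more involved lookup.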