pyodk's Introduction

opendatakit

The developer wiki (including release notes) and issue tracker are located here.

This site is primarily for developers. If you are not a developer, please visit https://getodk.org/

ODK is a free and open-source set of tools that help organizations author, field, and manage mobile data collection solutions. ODK provides an out-of-the-box solution for users to:

  • Build a data collection form or survey;
  • Collect the data on a mobile device and send it to a server; and
  • Aggregate the collected data on a server and extract it in useful formats.

In addition to socio-economic and health surveys with GPS locations and images, ODK is being used to create decision support for clinicians and for building multimedia-rich nature mapping tools. See featured deployments and list of tools for more examples of what the ODK community is doing. We welcome and encourage participation from the user community.

Downloads of the tools are available at Downloads.

pyodk's People

Contributors

ayushanand18, lindsay-stevens, lognaturel, tobiasmcnulty, yanokwa


pyodk's Issues

Add client.submissions.edit to edit an existing submission

Use cases

Building block for a data cleaning pipeline that treats Central as the source of truth. A user realizes that a systematic error was made, scripts identifying and confirming all instances of that error, then builds updated submissions and submits them using this method. They also specify a comment that looks something like "Scripted fix for fields foo, bar, baz".

Some kinds of issues I've fixed using similar building blocks:

  • bad character encoding in CSV attachment (so you have F๏ฟฝ๏ฟฝ in your submissions)
  • systematic place name misspellings (open field for administrative region, you want to normalize "Banaadir", "Bandir" to "Banadir")
  • error in select underlying value that was fixed in later form version (so submissions from early form version have ff for female instead of f).

Proposed inputs

instance_id: str
form_id: str
comment: str
xml: str - a string representation of the new XML to submit. Must have the correct deprecatedID and a new instanceID

project_id: Optional[int] = None

Proposed output

Exception in case of any error state.

None in case of success. It could be an object representing the submission metadata after the update but I expect this would generally be ignored.
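To make the xml input more concrete, here is a minimal sketch of how a script might derive the edited XML from an existing submission using only the standard library. It assumes un-namespaced meta and instanceID elements (real forms may namespace them), and build_edited_xml is a hypothetical helper, not part of pyODK.

```python
import uuid
import xml.etree.ElementTree as ET


def build_edited_xml(original_xml: str, changes: dict[str, str]) -> tuple[str, str]:
    """Return (new_instance_id, new_xml) for an edited copy of a submission.

    Hypothetical helper: assumes un-namespaced <meta>/<instanceID> elements
    and that each key in `changes` is the tag of a top-level field.
    """
    root = ET.fromstring(original_xml)
    meta = root.find("meta")
    instance_id = meta.find("instanceID")

    # The current instanceID becomes the deprecatedID of the new version.
    deprecated = meta.find("deprecatedID")
    if deprecated is None:
        deprecated = ET.SubElement(meta, "deprecatedID")
    deprecated.text = instance_id.text

    # Every edit needs a fresh instanceID.
    new_id = f"uuid:{uuid.uuid4()}"
    instance_id.text = new_id

    # Apply the scripted field fixes.
    for tag, value in changes.items():
        field = root.find(tag)
        if field is None:
            raise KeyError(f"No top-level field named {tag!r}")
        field.text = value

    return new_id, ET.tostring(root, encoding="unicode")


original = (
    '<data id="my_form"><meta><instanceID>uuid:old</instanceID></meta>'
    "<sex>ff</sex></data>"
)
new_id, new_xml = build_edited_xml(original, {"sex": "f"})
```

The resulting new_xml could then be passed as the xml argument together with the original instance ID and a comment.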

Encode URL syntax characters in form IDs

Software and hardware versions

pyodk v0.3.0, Python v3.11.3

Problem description

When using a form ID in a URL, pyODK doesn't encode all the characters I'd expect it to. For example, when I run the following:

client.forms.update(project_id=1, form_id='x/y', definition='my_form.xml')

I see that a request is sent to /v1/projects/1/forms/x/y/draft, not /v1/projects/1/forms/x%2Fy/draft.

It seems to me like pyODK encodes many characters, but doesn't encode the following URL syntax characters: ; / ? : @ & = + $ , #

I didn't look at how pyODK uses submission instance IDs in URLs, but those also need to be encoded.
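For illustration, the expected encoding can be reproduced with the standard library. This is only a sketch of the behaviour described above: form_draft_path is a hypothetical helper, not pyODK's code. The key detail is safe="", because urllib.parse.quote leaves "/" unencoded by default.

```python
from urllib.parse import quote


def form_draft_path(project_id: int, form_id: str) -> str:
    """Build a draft URL path with the form ID fully percent-encoded.

    Hypothetical helper for illustration; safe="" makes quote() encode "/"
    as well, which it would otherwise leave alone.
    """
    return f"/v1/projects/{project_id}/forms/{quote(form_id, safe='')}/draft"


print(form_draft_path(1, "x/y"))  # prints /v1/projects/1/forms/x%2Fy/draft
```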

Add entity update

https://docs.getodk.org/central-api-entity-management/#updating-an-entity

Proposed parameters:

  • dataset name (required)
  • entity uuid (required)
  • one of boolean force or integer base_version (one required)
  • label (string, optional)
  • data (dictionary, optional)

I thought about defaulting force to True but that feels dangerous to me. I think it's best for a user to have to make an explicit decision around that.

I also thought about combining force and base_version but can't come up with anything that feels natural.
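One way to keep force and base_version separate while still forcing an explicit choice is a small validation step. The sketch below is an assumption about how that could look; conflict_params is a hypothetical name, and the query parameter names are assumed to mirror the Central API (force and baseVersion).

```python
from typing import Optional


def conflict_params(force: Optional[bool] = None, base_version: Optional[int] = None) -> dict:
    """Return query parameters for an entity update.

    Hypothetical sketch: exactly one of `force=True` or `base_version`
    must be given, so the caller always makes an explicit decision.
    """
    if bool(force) == (base_version is not None):
        raise ValueError("Specify exactly one of force=True or base_version.")
    if force:
        return {"force": True}
    return {"baseVersion": base_version}
```

With this shape, omitting both arguments or passing both raises immediately, before any request is sent.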

Add example of filtering a repeat table

Recommendation from a user:

import pandas as pd
from pyodk.client import Client

client = Client()

repeats = client.submissions.get_table(form_id='simple_repeat', table_name='Submissions.observation', filter="$root/Submissions/__system/reviewState ne 'rejected'")

obs_df = pd.json_normalize(data=repeats['value'], sep='/')

obs_df.head(3)

Publish 1.0 release

We have been using sub-1.0 releases as we experimented with the structure of the library and got user feedback. We haven't received any negative feedback on the approach we're taking, only requests to add more functionality. I think we should release as 1.0 to signal that we're committed to maintaining what exists so far.

I don't necessarily see a clear release trigger. Maybe we go ahead and do it once #72 is merged?

One thing we may want to address first is #77.

Add client.app_users.create to create a new App User and optionally assign forms to it

Use cases

It's common to want to bulk-configure App Users and currently the Central frontend does not make that easy. We also know some folks are getting user lists from external systems so it will always be something that is worth automating.

Sample script: https://gist.github.com/lognaturel/d538a40901aad8e5057bf0aeb8081ea6

Proposed inputs

display_names: List[str] - list of display names for App Users to create

forms: Optional[List[str]] = None - list of form ids. If provided, the App Users are assigned each of the forms.
project_id: Optional[int] = None

Proposed output

A list of objects representing the new App Users or at least the App User tokens. A common next step will be using the tokens to build QR codes. See below for a question about what to do when display name already exists.

Questions

Where should this go?

Should we perhaps introduce client.users or client.access and put everything access-related in one place? We have to consider:

  • App Users, project subresource
  • Web Users, toplevel
  • Public Access Links, form subresource

client.access.create_app_users(["Foo"], ["bar", "baz"]) feels pretty good. But would users know to look for access?

Another option would be to put everything related to App Users in the project service since App Users are nested.

client.projects.create_app_users seems like it could be more intuitive.

Should it be responsible for verifying whether there's a displayName match?

Currently displayName is not required by Central to be unique. In practice, it's probably always desirable that it is because they're being treated as usernames. My current thinking is that the most useful thing to do is to make this method a "get or create". First fetch the full list of app users, see if there are matches, and only create App Users with new display names. This is what the sample script above does. If we do it that way, I think we should include the existing App Users in the array that's returned.
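The "get or create" behaviour described above could be sketched roughly as follows. This is not the sample script's code: the function name, the existing list (standing in for the App Users fetched from Central), and the create callable (standing in for the request that creates one App User) are all placeholders.

```python
from typing import Callable


def get_or_create_app_users(
    display_names: list[str],
    existing: list[dict],
    create: Callable[[str], dict],
) -> list[dict]:
    """Return one App User per display name, creating only the missing ones.

    Hypothetical sketch: `existing` stands in for the list fetched from
    Central and `create` for the call that creates a single App User.
    """
    by_name = {user["displayName"]: user for user in existing}
    result = []
    for name in display_names:
        if name not in by_name:
            by_name[name] = create(name)
        # Existing App Users are included in the returned list too.
        result.append(by_name[name])
    return result


existing_users = [{"displayName": "Foo", "token": "abc"}]
users = get_or_create_app_users(
    ["Foo", "Bar"],
    existing_users,
    create=lambda name: {"displayName": name, "token": "new"},
)
```

Here only "Bar" triggers a create call, and the returned list contains both the existing and the new App User.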

New example script for client.forms.update

A recent ODK forum post (links to this blog, and this gist) shows how to use a Google Apps Script to create an interactive button that exports the current spreadsheet XLSForm to XLSX and publishes it in Central.

It would be out of scope to incorporate a library or code to work with Google Sheets directly in pyODK. However, the above would be a good motivating example that pyODK can help with.

While there's no public "create new form" endpoint, the recently added client.forms.update could be used to push incremental updates to Central. For the initial form creation, the client.post method could be used.

Support entities

Should we also create an entity_list module by analogy to the form/submission separation?

Broken links in docs

I'm noticing a couple of broken links in the docs:

  • Clicking "HTTP verb methods" from the body of the overview page doesn't change the page.
  • On the page on "HTTP verb methods", clicking the example link on the bottom of the page results in a 404.

Add forms.update parameter to set version

In #6, we had explored including a version_updater: Optional[Callable[[str], str]] parameter and then excluded it at implementation time.

I have received a request for it:

In pyODK, the attachment update is defaulted to timestamp if definition is None. Can this be make over by optionally specify a custom string which derived from a function (where previous version 'V00' is changed to 'V01' when definition is None).

As far as I know, the user had not seen our initial plans so it's interesting that the request came right away.

The method should throw an exception if both a definition and version_updater are specified.

If version_updater is specified, the method should create the draft, get the version from the draft metadata, and then call version_updater on it.
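As a rough sketch of these rules, assuming nothing about pyODK's internals: bump_version is one possible version_updater that covers the 'V00' to 'V01' request, and apply_version_updater is a hypothetical helper showing the validation and the call on the draft's version.

```python
import re
from typing import Callable, Optional


def bump_version(previous: str) -> str:
    """Increment the trailing number of a version string, keeping its width."""
    match = re.fullmatch(r"(.*?)(\d+)", previous)
    if match is None:
        raise ValueError(f"No trailing number in version {previous!r}")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1:0{len(number)}d}"


def apply_version_updater(
    draft_version: str,
    definition: Optional[str] = None,
    version_updater: Optional[Callable[[str], str]] = None,
) -> str:
    """Hypothetical sketch of the rules above, not pyODK's implementation."""
    if definition is not None and version_updater is not None:
        raise ValueError("Specify either definition or version_updater, not both.")
    if version_updater is not None:
        # The updater receives the version read from the draft metadata.
        return version_updater(draft_version)
    # No updater: this sketch leaves the draft's version untouched.
    return draft_version


print(apply_version_updater("V00", version_updater=bump_version))  # prints V01
```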

Add support for multiple servers in config

I work with lots of different servers and it's a pain to constantly edit .pyodk_config.toml. I'd love to be able to switch profiles on the fly.

Given this .pyodk_config.toml, I'd like to be able to call Client(profile="demo") to use the demo url/user/pass and Client(profile="test") to use the test profile.

[demo]
base_url = "https://demo.getodk.cloud"
username = "alice"
password = "bob"

[default]
base_url = "https://www.example.com"
username = "my_user"
password = "my_password"
default_project_id = 123

[test]
base_url = "https://test.getodk.cloud"
username = "foo"
password = "bar"

Note that I have [default] for the default profile. Our docs for the single server config currently use [central], so maybe if there is only one profile, that is the default regardless of what the header is. If there are two or more, then the default needs to be called [default]. Then we can update the docs to use [default].
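The selection rule proposed here could look something like the sketch below. It operates on the dict a TOML parser would produce (for example the standard library's tomllib on Python 3.11+); select_profile is a hypothetical name and not an existing pyODK function.

```python
from typing import Optional


def select_profile(config: dict, profile: Optional[str] = None) -> dict:
    """Pick one profile table from a parsed .pyodk_config.toml.

    Hypothetical sketch: `config` is the dict produced by a TOML parser,
    with one table per profile.
    """
    if profile is not None:
        if profile not in config:
            raise KeyError(f"No profile named {profile!r} in config.")
        return config[profile]
    # A single profile is the default regardless of its header.
    if len(config) == 1:
        return next(iter(config.values()))
    # With two or more profiles, the default must be called [default].
    if "default" not in config:
        raise KeyError("Multiple profiles found but none is named 'default'.")
    return config["default"]


config = {
    "demo": {"base_url": "https://demo.getodk.cloud", "username": "alice"},
    "default": {"base_url": "https://www.example.com", "username": "my_user"},
}
print(select_profile(config, "demo")["base_url"])  # prints https://demo.getodk.cloud
```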

Should we do something more thoughtful when a new Central API method is called against an older server?

A user can call a pyodk method that uses an endpoint that doesn't exist on their Central server. Currently, they'll receive a 404 when that happens. There can be more subtle cases like using query parameters that don't exist or relying on behavior that has evolved a bit.

We could have a concept of a required Central version and check that against the server to provide better errors.
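A minimal sketch of such a check, assuming the server version is available as a dotted string such as "v2023.2.1" (how pyODK would obtain it is left open, and both function names are hypothetical):

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a version string such as 'v2023.2.1' into (2023, 2, 1)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def check_required_version(server_version: str, required: str, feature: str) -> None:
    """Raise a descriptive error if the server is older than `required`."""
    if parse_version(server_version) < parse_version(required):
        raise RuntimeError(
            f"{feature} requires Central {required} or later, "
            f"but the server reports {server_version}."
        )
```

A method could call check_required_version before sending its request, turning an opaque 404 into a message that names the feature and the minimum version.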

I currently feel that it's not necessary, but I do think about it every time we add new functionality. Maybe we could capture our current philosophy on this in the README? What do you think @lindsay-stevens?

Encode Unicode in X-XlsForm-FormId-Fallback

Software and hardware versions

pyodk 0.3.0, Python v3.11.3

Problem description

I'm noticing that pyODK doesn't encode the X-XlsForm-FormId-Fallback header. Central expects the header to be ASCII. Unicode is expected to be URL-encoded. (pyxform-http is the one to decode it.) This came up in Central in getodk/central#196.

That said, I'm not sure to what extent this is a real problem. I tried using client.forms.update() to send an XLSForm with Unicode in its filename, and pyODK seemed happy to send a Unicode header. If the Central API and pyxform-http are happy to receive a Unicode header, then the only issue would be filenames that contain % (filenames for which the filename and the URL-decoded filename are not the same).
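To illustrate the two cases, here is a small standard-library sketch. fallback_header is a hypothetical helper, not pyODK code: it percent-encodes the value so the header is ASCII-only and so a literal % survives URL-decoding on the receiving side.

```python
from urllib.parse import quote, unquote


def fallback_header(filename_stem: str) -> dict[str, str]:
    """Build an ASCII-only X-XlsForm-FormId-Fallback header.

    Hypothetical helper: percent-encodes the value so that non-ASCII
    characters and literal "%" signs survive a URL-decode on the server.
    """
    return {"X-XlsForm-FormId-Fallback": quote(filename_stem, safe="")}


print(fallback_header("été"))  # prints {'X-XlsForm-FormId-Fallback': '%C3%A9t%C3%A9'}

# Without encoding, a name containing "%" changes when it is URL-decoded.
print(unquote("form%41"))  # prints formA
```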

Design and provide convenience wrapper for getting all submission data

Currently client.submissions.get_table has an interface that closely matches the backend API and it returns a dictionary that matches the structure of the raw JSON.

Some high-level things to consider for the parameters:

  • make it easier to specify a date range to filter submissions by
  • make it easier to specify a list of review states to include
  • unify json and csv downloads -- they're not very different from a user perspective
  • pull media if a path to store it is specified
  • add the Submissions. prefix for repeats (see repeats example)
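As an example of what "easier" could mean for the first two points, a wrapper might build the OData filter string from friendlier arguments. The sketch below is an assumption, not a proposed API: build_filter is a hypothetical name, and it assumes the server accepts parenthesized or-clauses on the __system metadata fields.

```python
from datetime import date
from typing import Optional, Sequence


def build_filter(
    start: Optional[date] = None,
    end: Optional[date] = None,
    review_states: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Build an OData $filter string from a date range and review states.

    Hypothetical helper; `start` is inclusive and `end` is exclusive.
    """
    clauses = []
    if start is not None:
        clauses.append(f"__system/submissionDate ge {start.isoformat()}")
    if end is not None:
        clauses.append(f"__system/submissionDate lt {end.isoformat()}")
    if review_states:
        states = " or ".join(
            f"__system/reviewState eq '{state}'" for state in review_states
        )
        clauses.append(f"({states})")
    # No arguments means no filter at all.
    return " and ".join(clauses) or None


january = build_filter(date(2023, 1, 1), date(2023, 2, 1), ["approved", "hasIssues"])
```

The resulting string could be passed as the filter argument of client.submissions.get_table.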

Ideally the result could use type information in some way. Some ideas we've discussed:

  • provide a companion endpoint to get types and document how to get that into pandas
  • return json['value']: that would be the naturally-expected json structure
  • return a normalized and typed pandas dataframe (downside: lib depends on pandas)
  • deserialize into dynamically generated classes (downside: complex, and is it really what people want?)

Add client.forms.update to publish form updates (form def + optional attachments and attachments only)

Use cases

Some folks make periodic updates to their form definitions on disk and/or in a shared network resource (e.g. Google Drive) and would like their published forms on Central to always reflect those changes.

Some folks have entities in CSV/XML/GeoJSON attachments that they need to periodically update based on form submissions, updates to an external system, etc. In that case they want to update their entity lists but keep the same form definition.

Sample script: https://gist.github.com/lognaturel/2b251e54a3f3fe1e233435019b15ee17

Proposed inputs

Variant 1 to update the form definition and optionally some attachments:

form_id: str
xlsform: str - path to XLSForm

project_id: Optional[int] = None
attachments: Optional[List[str]] = None - list of paths to files to attach to the form

Variant 2 to update attachments and keep existing form definition:

form_id: str
attachments: List[str] - list of paths to files to attach to the form

version_updater: Optional[Callable[[str], str]] - if specified, the currently published form version is fetched and passed into the function, and the result is used as the version to publish. If not specified, datetime.now().isoformat() is used as the new version.
project_id: Optional[int] = None

Proposed outputs

Exception in case anything goes wrong at any point -- XLSForm validation error, draft publishing error, etc.

I don't think any return value is necessary for the first variant. For the second variant, at least the version should be returned. In both cases it would be ok to return some kind of object representing form metadata though I think it would generally be ignored (except maybe for the version).

Add form creation

I'd love to be able to call a method where I pass in a projectID and a path to a file (XLS or XML) along with ignoreWarnings and publish parameters to programmatically create a form.

I'd expect both of these parameters to default to true because that's my expectation of a form publishing integration, but I'm open to having my mind changed.

I expect the method to set the correct Content-Type based on the file at the path. As we do with pyxform-http, we should set an X-XlsForm-FormId-Fallback to a uuid.
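The header handling described here could be sketched as below. form_create_headers is a hypothetical helper; the extension-to-MIME mapping and the choice to send the fallback header only for spreadsheet files are assumptions for illustration.

```python
import uuid
from pathlib import Path

# Assumed mapping from file extension to Content-Type.
CONTENT_TYPES = {
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xls": "application/vnd.ms-excel",
    ".xml": "application/xml",
}


def form_create_headers(path: str) -> dict[str, str]:
    """Choose request headers for creating a form from the file at `path`."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"Unsupported form definition type: {suffix!r}")
    headers = {"Content-Type": CONTENT_TYPES[suffix]}
    if suffix != ".xml":
        # XLSForms without a form_id setting fall back to this value.
        headers["X-XlsForm-FormId-Fallback"] = str(uuid.uuid4())
    return headers
```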

Allow building a client without resource management

Discussion at #4

The goal is to be able to document client = Client() as the default way to build a client. Like ckanapi, we will document using with for those who are comfortable with that syntax and who are working in a context in which it makes sense (i.e. an analysis-oriented Jupyter notebook).

Since people have started using the library and have used client = Client().open() it would be ideal to either continue to support that as well or throw a helpful error.
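One shape that would support all three styles (plain construction, with, and the existing .open() call) is sketched below. SketchClient is a hypothetical stand-in, not pyODK's Client; the is_open flag merely represents whatever resources a real client would manage.

```python
class SketchClient:
    """Hypothetical stand-in for a client, showing the three usage styles."""

    def __init__(self) -> None:
        # Acquire resources eagerly so `client = SketchClient()` is ready to use.
        self.is_open = True

    def open(self) -> "SketchClient":
        # Kept so that existing `SketchClient().open()` calls keep working.
        self.is_open = True
        return self

    def close(self) -> None:
        self.is_open = False

    def __enter__(self) -> "SketchClient":
        return self.open()

    def __exit__(self, *exc_info) -> None:
        self.close()


client = SketchClient()          # default, no resource management
legacy = SketchClient().open()   # earlier style, still supported
with SketchClient() as managed:  # context-manager style
    pass
```

After the with block exits, managed has been closed, while client and legacy stay open until close() is called explicitly.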
