
CSV to Firestore

The CSV to Firestore solution takes a CSV file from a Cloud Storage bucket, parses it, and sends it to Firestore. The solution is triggered automatically when a new file is uploaded to the Cloud Storage bucket. To serve a variety of applications, the solution allows you to (1) select which Cloud Storage bucket to use, (2) specify which collection to send the data to, and (3) optionally use a specific column as the document id.

Parsing Example

CSV file with id as document id:

id,product_name,price_usd
1,television,399
2,water bottle,15
3,glass mug,5

Data in Firestore:

Screenshot of data in Firestore
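The mapping above can be sketched in plain Python. This is a minimal illustration of how rows become documents keyed by the id column, not the Cloud Function's actual code:

```python
import csv
import io

# The example CSV content from above.
raw = """id,product_name,price_usd
1,television,399
2,water bottle,15
3,glass mug,5
"""

# Parse the rows and key each resulting document by the `id` column,
# mirroring what [key=id] in the filename would do.
documents = {}
for row in csv.DictReader(io.StringIO(raw)):
    documents[row["id"]] = row

print(documents["2"]["product_name"])  # water bottle
```

Note that the csv module reads every field as a string; Firestore stores whatever types the function writes, so numeric conversion (if any) happens before upload.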

Deployment

The Cloud Function requires the collection id to be specified in the filename as follows: "filename[collection=YOUR_COLLECTION_ID].csv". Optionally, you can also add [key=YOUR_COLUMN_FOR_DOCUMENT_ID] to the filename to specify which column to use for the document id. If no column is specified, Firestore will create a random id.
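This filename convention can be parsed with a small regular expression. The sketch below is illustrative only; the pattern and the helper name are assumptions, not the function's actual implementation:

```python
import re

def parse_filename(filename):
    """Extract the collection id and optional key column from a filename
    such as "products[collection=products][key=id].csv" (sketch only)."""
    collection = re.search(r"\[collection=([^\]]+)\]", filename)
    key = re.search(r"\[key=([^\]]+)\]", filename)
    if collection is None:
        raise ValueError("filename must contain [collection=...]")
    return collection.group(1), key.group(1) if key else None

print(parse_filename("products[collection=products][key=id].csv"))
# ('products', 'id')
```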

Retrieve the repository by running the following command:

git clone https://github.com/Google/csv-to-firestore

Complete and run the following command to deploy the Cloud Function:

gcloud functions deploy csv_to_firestore \
  --runtime python39 \
  --trigger-resource YOUR_TRIGGER_BUCKET_NAME \
  --trigger-event google.storage.object.finalize \
  --entry-point csv_to_firestore_trigger \
  --source PATH_TO_SOURCE_CODE \
  --memory=1024MB \
  --set-env-vars=UPLOAD_HISTORY=TRUE/FALSE,EXCLUDE_DOCUMENT_ID_VALUE=TRUE/FALSE \
  --timeout=540

Complete the following parameters in the command:

  1. YOUR_TRIGGER_BUCKET_NAME: The name of the Cloud Storage bucket that triggers the Cloud Function.
  2. PATH_TO_SOURCE_CODE: The path to the folder that contains main.py and requirements.txt (use . for the current directory).
  3. UPLOAD_HISTORY: TRUE or FALSE, depending on whether you want to create a separate collection that keeps file upload history.
  4. EXCLUDE_DOCUMENT_ID_VALUE: TRUE or FALSE. When a key column is specified in the filename, the solution stores its value, such as "id", both as the document id and as a field in the document data. If this is not desired, set EXCLUDE_DOCUMENT_ID_VALUE to TRUE so that the value is stored only as the document id.
  5. Optionally, you can specify the region or other parameters; see the documentation here: https://cloud.google.com/sdk/gcloud/reference/functions/deploy
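The effect of EXCLUDE_DOCUMENT_ID_VALUE can be sketched as follows. This is an illustration only; the helper name is an assumption, not the function's actual code:

```python
def prepare_document(row, key_column, exclude_id_value):
    """Return (document_id, data) for one CSV row (sketch).

    With exclude_id_value=True the key column is removed from the stored
    data, so its value survives only as the document id.
    """
    data = dict(row)
    doc_id = data.pop(key_column) if exclude_id_value else data[key_column]
    return doc_id, data

row = {"id": "1", "product_name": "television", "price_usd": "399"}
print(prepare_document(row, "id", True))
# ('1', {'product_name': 'television', 'price_usd': '399'})
```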

Note: After deploying the Cloud Function, the logs might display an "OpenBLAS WARNING". This comes from some of the packages used and does not affect the functionality of the Cloud Function.

Deploying BQ Export to Firestore

If your data is in BigQuery and you want to set up an automated workflow to export a table to Firestore, follow these instructions.

  1. Install Terraform.
  2. Set up the variables for Terraform in the example.tfvars file; see variables.tf for a description of each variable.
  3. Run terraform init from the terraform directory.
  4. Run terraform plan -var-file="example.tfvars" to see the planned changes.
  5. Run terraform apply -var-file="example.tfvars" to deploy the Cloud Function and set up the workflow.
  6. (Optional) If you want to maintain your Terraform state on GCP instead of locally, navigate to the backend.tf file and uncomment the resource. Fill in the bucket name in "resource.google_storage_bucket.backend", then run terraform init to store state in GCS instead of locally.

Guided tutorial for deployment

Open in Cloud Shell

Disclaimer

This is not an officially supported Google product. Please be aware that bugs may lurk, and that we reserve the right to make small backwards-incompatible changes. Feel free to open bugs or feature requests, or contribute directly (see CONTRIBUTING.md for details).

csv-to-firestore's People

Contributors

halleyinteractive, psnelg, tdsymonds


csv-to-firestore's Issues

Incorrect reference to Terraform Variable for bq-to-gcs - Code breaker

Hey,

In bq_to_csv_tf, the wrong variable is referenced: gcs_export_bucket is used where it should be csv_key_column.

Code line 41:
file_name = "bq_export[collection=${var.fs_collection}][key=${var.gcs_export_bucket}].csv"

Should be:
file_name = "bq_export[collection=${var.fs_collection}][key=${var.csv_key_column}].csv"

Many thanks!

Deploying Issue for CSV to Firestore

We are deploying using the example command in the documentation.

And we see

ERROR: (gcloud.functions.deploy) wrong collection: expected [storage.objects], got [storage.buckets], for path

Is this our issue or a version issue or a source issue?

We'd like to be able to deploy the function.

The full debug trace is shown below:

I'm not sure if it matters but the local python installed is 3.9.18

python -V
Python 3.9.18

$ gcloud functions deploy csv_to_firestore --runtime python39 --trigger-resource gs://citytri-marketing.appspot.com/config --trigger-event google.storage.object.finalize --entry-point csv_to_firestore_trigger --source . --memory=1024MB --set-env-vars=UPLOAD_HISTORY=FALSE,EXCLUDE_DOCUMENT_ID_VALUE=FALSE --timeout=540 --verbosity debug
DEBUG: Running [gcloud.functions.deploy] with arguments: [--entry-point: "csv_to_firestore_trigger", --memory: "1024MB", --runtime: "python39", --set-env-vars: "OrderedDict([('UPLOAD_HISTORY', 'FALSE'), ('EXCLUDE_DOCUMENT_ID_VALUE', 'FALSE')])", --source: ".", --timeout: "540", --trigger-event: "google.storage.object.finalize", --trigger-resource: "gs://citytri-marketing.appspot.com/config", --verbosity: "debug", NAME: "csv_to_firestore"]
DEBUG: Starting new HTTPS connection (1): cloudfunctions.googleapis.com:443
DEBUG: https://cloudfunctions.googleapis.com:443 "GET /v2/projects/web3-marketing/locations/us-central1/functions/csv_to_firestore?alt=json HTTP/1.1" 404 None
In a future Cloud SDK release, new functions will be deployed as 2nd gen functions by default. This is equivalent to currently deploying new functions with the --gen2 flag. Existing 1st gen functions will not be impacted and will continue to deploy as 1st gen functions.
You can preview this behavior in beta. Alternatively, you can disable this behavior by explicitly specifying the --no-gen2 flag or by setting the functions/gen2 config property to 'off'.
To learn more about the differences between 1st gen and 2nd gen functions, visit:
https://cloud.google.com/functions/docs/concepts/version-comparison
DEBUG: (gcloud.functions.deploy) wrong collection: expected [storage.objects], got [storage.buckets], for path [gs://citytri-marketing.appspot.com/config]
Traceback (most recent call last):
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 998, in Execute
    resources = calliope_command.Run(cli=self, args=args)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 815, in Run
    resources = command_instance.Run(args)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/v1/util.py", line 387, in CatchHTTPErrorRaiseHTTPExceptionFn
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/command_lib/functions/util.py", line 88, in Run
    return self._RunV1(args)
           ^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/surface/functions/deploy.py", line 145, in _RunV1
    return command_v1.Run(args, track=self.ReleaseTrack())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/command_lib/functions/v1/deploy/command.py", line 460, in Run
    trigger_params = trigger_util.GetTriggerEventParams(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/command_lib/functions/v1/deploy/trigger_util.py", line 216, in GetTriggerEventParams
    return _GetEventTriggerEventParams(trigger_event, trigger_resource)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/command_lib/functions/v1/deploy/trigger_util.py", line 167, in _GetEventTriggerEventParams
    trigger_resource = storage_util.BucketReference.FromUrl(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/api_lib/storage/storage_util.py", line 166, in FromUrl
    return cls(resources.REGISTRY.Parse(url, collection='storage.buckets')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/core/resources.py", line 1201, in Parse
    return self.ParseStorageURL(line, collection=collection)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ralph/Downloads/google-cloud-sdk/lib/googlecloudsdk/core/resources.py", line 1103, in ParseStorageURL
    raise WrongResourceCollectionException('storage.objects', collection,
googlecloudsdk.core.resources.WrongResourceCollectionException: wrong collection: expected [storage.objects], got [storage.buckets], for path [gs://citytri-marketing.appspot.com/config]
ERROR: (gcloud.functions.deploy) wrong collection: expected [storage.objects], got [storage.buckets], for path [gs://citytri-marketing.appspot.com/config]
