
ecommerce-workshop's Introduction

eCommerce Observability in Practice

This repo demonstrates applying observability principles to an eCommerce app.

In this hypothetical scenario, we've got a Spree website that a team has started adding microservices to. In its current state, the application is broken.

storedog

We'll take that broken application, instrument it with Datadog, and then deploy a fix. Afterwards, we'll look in Datadog to confirm the deploy worked and that our systems are actually functioning properly.

Structure of the repository

This repository is used to build the Docker images to run the application in the different states. The folders that build each of the images are the following:

  • ads-service - The advertisement microservice with a couple of injected sleeps.
  • ads-service-fixed - The advertisement microservice with the sleeps removed.
  • ads-service-errors - The advertisement microservice that returns 500 errors on the /ads endpoint.
  • ads-service-versions - The advertisement microservice in several versions, showcasing the deployment comparison functionality.
  • discounts-service - The discounts microservice with an N+1 query and a couple of sleeps.
  • discounts-service-fixed - The discounts microservice with the N+1 query fixed and the sleeps removed.
  • store-frontend-broken-no-instrumentation - The Spree application in a broken state and with no instrumentation. This is the initial scenario.
  • store-frontend-broken-instrumented - The Spree application in a broken state but instrumented with Datadog APM. This is the second scenario.
  • store-frontend-instrumented-fixed - The Spree application instrumented with Datadog APM and fixed. This is the final scenario.
  • traffic-replay - Looping replay of live traffic to send requests to the frontend (see Creating Example Traffic for details).

Feel free to follow along with the scenario, or to run the application locally.

Building the docker images

Follow the specific guide for building the images.

Deploying the application

The deploy folder contains the different tested ways in which this application can be deployed.

Enabling Real User Monitoring (RUM)

Real User Monitoring is enabled for the docker-compose-fixed-instrumented.yml Docker Compose file and the Kubernetes frontend.yaml deployment.

To enable it, you'll need to log into Datadog, navigate to RUM Applications, and create a new application.

Once created, you'll get a piece of Javascript with an applicationId and a clientToken.
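That snippet initializes the browser RUM SDK. A minimal sketch of what it looks like is below; the exact fields vary by SDK version, and the site, service, and sampling values here are assumptions, not the repo's actual configuration:

```javascript
import { datadogRum } from '@datadog/browser-rum'

datadogRum.init({
  applicationId: '<APPLICATION_ID>', // from the RUM application you created
  clientToken: '<CLIENT_TOKEN>',
  site: 'datadoghq.com',             // assumption: 'datadoghq.eu' for EU accounts
  service: 'store-frontend',         // assumed service name
  sessionSampleRate: 100,            // field name differs in older SDK versions
})
```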

Pass these environment variables to docker-compose:

$ DD_API_KEY=<YOUR_API_KEY> DD_CLIENT_TOKEN=<CLIENT_TOKEN> DD_APPLICATION_ID=<APPLICATION_ID> POSTGRES_USER=<POSTGRES_USER> POSTGRES_PASSWORD=<POSTGRES_PASSWORD> POSTGRES_HOST="db" docker-compose -f docker-compose-fixed-instrumented.yml up

Or, if in Kubernetes, uncomment the following lines in frontend.yaml, adding your applicationId and your clientToken:

# Enable RUM
# - name: DD_CLIENT_TOKEN
#   value: <your_client_token>
# - name: DD_APPLICATION_ID
#   value: <your_application_id>

After the site comes up, you should be able to navigate around, and then see your Real User Monitoring traffic show up.

Creating Example Traffic To Your Site

The scenario uses GoReplay to send replayed live traffic to our own (dysfunctional) stores, so we can diagnose and fix them.

This way, we don't have to manually click around the site to see all the places where our site is broken.

Containerized replay

Building and running manually

Example traffic can be sent perpetually via the traffic-replay container. To build and run it via Docker and connect it to your running docker-compose cluster (by default, the docker-compose_default network is created):

cd traffic-replay
docker build -t traffic-replay .
docker run -i -t --net=docker-compose_default --rm traffic-replay

By default this container sends traffic to the host http://frontend:3000, but this can be customized via environment variables on the command line or, as in the example below, via Docker Compose. This facilitates the use of load balancers or breaking the application apart.

environment:
  - FRONTEND_HOST=loadbalancer.example.com
  - FRONTEND_PORT=80

Running via Docker Compose

We automatically build new traffic-replay containers on every release, and you can spin up the traffic-replay container with your Docker Compose cluster by adding the config as an override, as in the example below.

POSTGRES_USER=postgres POSTGRES_PASSWORD=postgres docker-compose -f deploy/docker-compose/docker-compose-broken-instrumented.yml -f deploy/docker-compose/docker-compose-traffic-replay.yml up

Any of the other docker compose configurations can work with this traffic container just by adding another -f deploy/docker-compose/docker-compose-traffic-replay.yml to the compose command.

Viewing Our Broken Services in Datadog

Once we've spun up our site and shipped traffic with gor, we can view the health of the systems we've deployed. But first, let's look at the structure of our services by visiting the Service Map.

Datadog Service Map

Here, we can see we've got a store-frontend service, that appears to connect to SQLite as its datastore. Downstream, we've got a discounts-service, along with an advertisements-service, both of which connect to the same PostgreSQL server.

With this architecture in mind, we can then head over to the Services page, and see where the errors might be coming from. Is it one of the new microservices?

Datadog Services List

Looking at the services list, we can sort by either latency or errors. Across all our services, it appears only the store-frontend has errors, with an error rate of ~5%.

By clicking on the store-frontend, we can then drill further and see which endpoints are causing trouble. It seems like it's more than one:

View Trace

We could click one level down and view one of the endpoints that's generating a trace. We can also head over to the Traces page, and sort by error traces:

Trace Errors

With this, we've got a line number that appears to be generating our errors. Checking across multiple traces, we can see the same behavior. It looks like the new advertisement call was pushed to the wrong file.

Finding a Bottleneck

Once we've applied the fix for the wrong file, we still see slow behavior. We can drill down into a service and see where the bottleneck is by sorting via latency on the Services Page:

Bottleneck

There are a couple of sleeps in both the discounts service and the advertisements service.

Removing the lines that contain:

time.sleep(2.5)

will fix the performance issue.

The code with the leftover sleeps lives in discounts-service and ads-service, and the fixed versions live in discounts-service-fixed and ads-service-fixed.

Finding an N+1 Query

In the discounts-service, there is an N+1 query:

N+1 Query

The problem is a lazy lookup on a relational database.

By changing the line:

discounts = Discount.query.all()

To the following:

discounts = Discount.query.options(joinedload('*')).all()

We eager load the discount_type relation on the discount and can grab all the information without multiple trips to the database:

N+1 Solved

The N+1 query example lives in discounts-service/, and the fixed version lives in discounts-service-fixed/.
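The effect of the fix can be sketched with plain SQL using the standard library's sqlite3 module. The table and column names below are hypothetical, modeled on the discounts service; this is not the actual Flask-SQLAlchemy code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE discount_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE discount (id INTEGER PRIMARY KEY, code TEXT,
                           discount_type_id INTEGER REFERENCES discount_type(id));
    INSERT INTO discount_type VALUES (1, 'percent'), (2, 'flat');
    INSERT INTO discount VALUES (1, 'SPRING', 1), (2, 'SUMMER', 2), (3, 'FALL', 1);
""")

def fetch_n_plus_one(conn):
    # N+1: one query for the discounts, then one extra query per row for its
    # discount_type -- this is what the lazy lookup does under the hood.
    rows = conn.execute(
        "SELECT id, code, discount_type_id FROM discount ORDER BY id").fetchall()
    return [
        (code, conn.execute("SELECT name FROM discount_type WHERE id = ?",
                            (type_id,)).fetchone()[0])
        for _id, code, type_id in rows
    ]

def fetch_joined(conn):
    # Eager load: a single JOIN brings everything back in one round trip,
    # which is what joinedload('*') asks SQLAlchemy to generate.
    return conn.execute("""
        SELECT d.code, t.name FROM discount d
        JOIN discount_type t ON t.id = d.discount_type_id
        ORDER BY d.id
    """).fetchall()

assert fetch_n_plus_one(conn) == fetch_joined(conn)
```

Both functions return the same data; the joined version just does it in one query instead of N+1.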

Using different versions of the advertisement service to showcase the deploy comparison feature

It might be useful for certain workshops and demos (e.g. those related to canary deployments, feature flags, or general APM) to showcase different versions of the advertisements service, and to compare the versions using Datadog.

To enable this scenario using Kubernetes:

  1. Deploy the ecommerce application in your cluster:
kubectl apply -f deploy/generic-k8s/ecommerce-app
  2. Delete the original advertisements deployment:
kubectl delete -f deploy/generic-k8s/ecommerce-app/advertisements.yaml
  3. Deploy version 1.0 of the advertisements service:
kubectl apply -f deploy/generic-k8s/ecommerce-ads-versions/advertisements_v1.yaml

This deploys version 1.0 of the advertisements service, which always shows a blue ad banner with an advertisement number 1.x.

  4. Deploy version 2.0 of the advertisement service:
kubectl apply -f deploy/generic-k8s/ecommerce-ads-versions/advertisements_v2.yaml

This deploys version 2.0 of the advertisements service, which always shows a red ad banner with an advertisement number 2.x. It also adds 0.5 seconds of latency.

This keeps both deployments (v1 and v2) in the cluster, making a simple 50/50 canary deployment. Because the service selectors are the same, roughly half of the requests will serve an advertisement from version 1.0 and the other half a banner from version 2.0.
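The 50/50 split works because a single Service selects pods from both Deployments. An illustrative sketch of that selector setup (not the actual repo manifests; names, labels, and the image tag are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: advertisements
spec:
  selector:
    app: advertisements        # matches pods from BOTH version Deployments
  ports:
    - port: 5002
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: advertisements-v2      # a sibling advertisements-v1 Deployment looks the same
spec:
  replicas: 1
  selector:
    matchLabels:
      app: advertisements
      version: "2.0"
  template:
    metadata:
      labels:
        app: advertisements    # shared label the Service selects on
        version: "2.0"         # version-specific label, ignored by the Service
    spec:
      containers:
        - name: advertisements
          image: ddtraining/advertisements:latest   # image tag is illustrative
```

With equal replica counts, the Service balances across both pods, producing the rough 50/50 split.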

2 versions of the ads service

  5. Compare versions during the canary

Show in Datadog that even though the new version deploys the correct feature, it adds extra latency that we didn't see on previous versions.

To compare both versions, you can use the "Deployments" section of the advertisements service:

Version 2.0 extra latency

  6. Deploy a patch release to fix the issue
kubectl apply -f deploy/generic-k8s/ecommerce-ads-versions/advertisements_v2_1.yaml

This version has the same functionality as 2.0 but without the increased latency. It replaces version 2.0.

  7. Compare versions in Datadog

You can now compare versions 1.0 and 2.1 and see that the increased latency is gone and things seem to be working fine.

Version 2.1 solves extra latency

How to run synthetics locally

  1. Install @datadog/datadog-ci via NPM or Yarn globally on your local machine:
npm install -g @datadog/datadog-ci
yarn global add @datadog/datadog-ci
  2. Obtain the API and APP keys from the DD corpsite account.
  3. From the project root, run the following: DD_API_KEY="<API_KEY>" DD_APP_KEY="<APP_KEY>" make synthetics-start

To add a new test:

  1. Generate a synthetics test via the DD app
  2. Grab the public ID of the test (found in the top left of the synthetic page or in the URL) and add it to storedog.synthetics.json.

ecommerce-workshop's People

Contributors

arapulido, bbonislawski, bdq, benmorganio, burningion, calas, cmar, damianlegawiec, danabrit, davidnorth, geekoncoffee, huoxito, jdutil, jhawthorn, jordan-brough, krzysiek1507, lbrapid, mafi88, martinisoft, mauazua, nishant-cyro, parndt, paulcc, peterberkenbosch, priyank-gupta, radar, reinaris, romul, schof, tanmay3011


ecommerce-workshop's Issues

Problems building store-frontend-broken-no-instrumentation

On attempting to build the store-frontend-broken-no-instrumentation image, it looks like there's an error with a pulled dependency:

docker build .
[+] Building 13.6s (11/13)
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 694B                                       0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for docker.io/library/ruby:2.5.1              0.7s
 => [1/9] FROM docker.io/library/ruby:2.5.1@sha256:ac6661b87cf49af14b1930  0.0s
 => [internal] load build context                                          0.9s
 => => transferring context: 28.66MB                                       0.9s
 => CACHED [2/9] RUN apt-get update -qq &&   apt-get install -y build-ess  0.0s
 => CACHED [3/9] COPY . /spree                                             0.0s
 => CACHED [4/9] WORKDIR /spree                                            0.0s
 => CACHED [5/9] RUN bundle install                                        0.0s
 => CACHED [6/9] RUN bundle update sassc && bundle exec rake store-fronte  0.0s
 => ERROR [7/9] RUN cd store-frontend && bundle update sassc              11.8s
------
 > [7/9] RUN cd store-frontend && bundle update sassc:
#9 0.530 fatal: Not a git repository (or any of the parent directories): .git
#9 0.537 fatal: Not a git repository (or any of the parent directories): .git
#9 0.543 fatal: Not a git repository (or any of the parent directories): .git
#9 0.546 fatal: Not a git repository (or any of the parent directories): .git
#9 0.550 fatal: Not a git repository (or any of the parent directories): .git
#9 0.554 fatal: Not a git repository (or any of the parent directories): .git
#9 0.559 fatal: Not a git repository (or any of the parent directories): .git
#9 0.749 The dependency tzinfo-data (>= 0) will be unused by any of the platforms Bundler is installing for. Bundler is installing for ruby but the dependency is only for x86-mingw32, x86-mswin32, x64-mingw32, java. To add those platforms to the bundle, run `bundle lock --add-platform x86-mingw32 x86-mswin32 x64-mingw32 java`.
#9 0.751 Fetching https://github.com/spree/spree_gateway.git
#9 1.666 Fetching https://github.com/spree/spree_auth_devise.git
#9 3.011 Fetching gem metadata from https://rubygems.org/...........
#9 6.438 Fetching gem metadata from https://rubygems.org/.............
#9 8.264 Fetching gem metadata from https://rubygems.org/.............
#9 10.85 Resolving dependencies....
#9 11.35 Your bundle is locked to mimemagic (0.3.3), but that version could not be found
#9 11.35 in any of the sources listed in your Gemfile. If you haven't changed sources,
#9 11.35 that means the author of mimemagic (0.3.3) has removed it. You'll need to update
#9 11.35 your bundle to a version other than mimemagic (0.3.3) that hasn't been removed
#9 11.35 in order to install.
------
executor failed running [/bin/sh -c cd store-frontend && bundle update sassc]: exit code: 7
make: *** [build] Error 1

Add to cart does not work in embedded synthetic browser test recorder

Clicking on the Add to Cart button for a product in the embedded synthetic browser test recorder results in "Your cart is empty."

I see no relevant errors in the browser console, logs, or traces. I suspect it might be a cookie problem related to being in an iframe. I dug around in the code to find where I could set SameSite=None on the token and guest_token cookies, but failed.

Release ecommerce-workshop 1.0

We are going to start versioning our containers and repository so the training material that depends on this project can stay stable and predictable. The following checklist will become part of the RELEASING.md doc which will eventually be fully automated. For now it'll be a very manual procedure because what is life without pain before automation to take that away? 🤔

Checklist:

  • Verify you are logged in to Docker Hub as ddtraining (look in the Docker Desktop menu; you should see a ddtraining item just above the "Quit Docker Desktop" option). There is no reliable way to check this programmatically right now 😢.
  • Change the CHANGELOG.md entry by changing [unreleased] to [1.0.0] YYYY-MM-DD
  • Commit the changelog with a message: git add CHANGELOG.md && git commit --message="Release version 1.0.0"
  • Create a new git tag git tag --annotate 1.0.0 --message="Release 1.0.0"
  • Push the tag and commit up git push && git push --tags
  • Go to the tags page and click the three dots to the right to "Create release". In the prompt, copy and paste the CHANGELOG.md notes for the 1.0.0 release in the body and put "1.0.0" for the title.
  • Pull down the latest advertisements image docker pull ddtraining/advertisements:latest
  • Tag the advertisements latest image with the new version number docker tag ddtraining/advertisements:latest ddtraining/advertisements:1.0.0
  • Push the advertisements image up docker push ddtraining/advertisements:1.0.0
  • Pull down the latest advertisements-fixed image docker pull ddtraining/advertisements-fixed:latest
  • Tag the advertisements latest image with the new version number docker tag ddtraining/advertisements-fixed:latest ddtraining/advertisements-fixed:1.0.0
  • Push the advertisements-fixed image up docker push ddtraining/advertisements-fixed:1.0.0
  • Pull down the latest discounts image docker pull ddtraining/discounts:latest
  • Tag the discounts latest image with the new version number docker tag ddtraining/discounts:latest ddtraining/discounts:1.0.0
  • Push the discounts image up docker push ddtraining/discounts:1.0.0
  • Pull down the latest discounts-fixed image docker pull ddtraining/discounts-fixed:latest
  • Tag the discounts-fixed latest image with the new version number docker tag ddtraining/discounts-fixed:latest ddtraining/discounts-fixed:1.0.0
  • Push the discounts-fixed image up docker push ddtraining/discounts-fixed:1.0.0
  • Pull down the latest storefront image docker pull ddtraining/storefront:latest
  • Tag the storefront latest image with the new version number docker tag ddtraining/storefront:latest ddtraining/storefront:1.0.0
  • Push the storefront image up docker push ddtraining/storefront:1.0.0
  • Pull down the latest storefront-fixed image docker pull ddtraining/storefront-fixed:latest
  • Tag the storefront-fixed latest image with the new version number docker tag ddtraining/storefront-fixed:latest ddtraining/storefront-fixed:1.0.0
  • Push the storefront-fixed image up docker push ddtraining/storefront-fixed:1.0.0

Add attack container

As a course developer in the Datadog training platform, I can access a Kali Linux or similar container with attack tools. The container should spin up with the rest of the environment behind a feature flag and provide the ability to run specific automation.

  • Brute force attack
  • Dirbuster

The container bootstrap should be such that we can extend the number of attacks and containers impacted.

Compliance Bugz

Identify and implement one or more compliance bugs that can be placed into the ecommerce app. These could be unauthenticated APIs, TLS Misconfiguration, SSH Misconfiguration, etc. Feature flag these on for specific scenarios.

These will be used as part of the security foundations training scenario.

Discount codes should work

Currently, applying a discount code to a shopping cart does not work; the UI displays a "code not found" error. It would be awesome if storedog actually used the discounts service to reduce checkout totals.

Use case: a synthetic browser test demo that would scrape the code and value from the home page and apply the code to the cart. It would then do the math to confirm that the new cart total is correct.

Convert generic k8s manifests to HELM chart

To make things a bit more agnostic with Kubernetes deployments, it will be helpful to package the ecommerce deployment story into a (potentially publishable?) HELM chart.

Update Image Building Process

Right now we're pointing to images built in personal accounts. Let's move this to a shared account for all learning labs, and put it in 1Password.

Update/correct logging for store-frontend

Extra log lines are ingested and displayed in the DD Logs UI for the store-frontend. Chatted with Aaron and he says that logging needs to be updated for the service.

Example of logs for fixed store-frontend:
basev1-fixed-frontend

Example of logs for broken store-frontend:
basev1-broken-frontend

Add N+1 Query Example

Replace the sleep errors in the app with N+1 query per this suggestion:

Maybe we can make it an N+1 query? Because there would be a hint on the flame graph as well so that could be worked into the steps to look there, notice it's repeatedly hitting the db, and then go look? More realistic than a sleep in production, and also showcases how datadog APM gets you to a root cause

Expose Microservice Routes as Environment Variables

For now, the Docker image build process assumes we're using docker-compose's network names as shown here:

https://github.com/DataDog/ecommerce-workshop/blob/master/store-frontend-broken-instrumented/store-frontend/app/controllers/discounts_controller.rb#L3

and here:

@ads = JSON.parse(Net::HTTP.get_response(URI('http://advertisements:5002/ads')).body)

Expose these (now assumed) network routes as environment variables so we don't have to use docker-compose or build custom images with specific routes for workshops.
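A minimal sketch of what such a change could look like in the controller; the variable names (ADS_ROUTE, DISCOUNTS_ROUTE) and the default values are illustrative, not part of the repo:

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical env vars, falling back to the current docker-compose network names
ADS_ROUTE       = ENV.fetch("ADS_ROUTE", "http://advertisements:5002/ads")
DISCOUNTS_ROUTE = ENV.fetch("DISCOUNTS_ROUTE", "http://discounts:5001")

# Fetches the ads payload from whichever route is configured
def fetch_ads
  JSON.parse(Net::HTTP.get_response(URI(ADS_ROUTE)).body)
end
```

This keeps the docker-compose defaults working while letting workshops point at any host or port without rebuilding images.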

RUM not working

Hello
We have storedog deployed on GKE, using the manifests in ecommerce-workshop/deploy/gcp/gke/ except datadog-agent.yaml. The frontend.yaml is configured like this.

image

But we can't see the RUM data in Datadog. We are on the datadoghq.eu site.

Would it be possible to add session replay too?

Many thanks

Expose and increase Shopping Cart functionality

For an upcoming course about custom span tags, custom method instrumentation, and generating metrics from spans, we need access to some interesting functionality to tell the story.

Currently the narrative we are looking to hit is that of a problem with the shopping cart. The shopping cart should sometimes fail, sometimes be successful through checkout, and also allow for custom tagging to expose cart totals.

The current cart implementation doesn't lend to demonstrating this functionality, nor being able to tag spans or methods associated.

Be explicit with env tags

A few people have noted different env settings depending on the service. The env tag is explicitly set in the Ruby service, but not the downstream Python services.

Let's make sure every service has an explicitly set environment, so we have consistent services.

Errors in discounts/ads not logging properly

Noticed that the exceptions in the Python services are not being logged as a single line. APM captures the exception properly, but the log gets split across multiple lines.

Screen Shot 2021-02-02 at 10 20 35 AM

Migrate repo to main default branch from master

For better inclusivity in the modern tech world, we should be ditching old terminology like master/slave for main/leader/follower everywhere we can. This is already the new default for GitHub, and migrating will have bigger implications for downstream consumers of this repo since it wasn't set up with main by default.

Logging fails

The application should be deployable with logs in an unstructured state. The user should be able to modify the logging library or formats in order to provide a normalized log structure.

Unable to deploy Store Frontend Application

Attempting to deploy the stack locally using Docker Compose with the help of a Make command:
make local-start
I'm getting the errors below:

2023-10-29 17:33:50 /usr/local/bundle/gems/puma-3.12.6/lib/puma/dsl.rb:43:in `read': No such file or directory @ rb_sysopen - config/puma.rb (Errno::ENOENT)
2023-10-29 17:33:50     from /usr/local/bundle/gems/puma-3.12.6/lib/puma/dsl.rb:43:in `_load_from'
2023-10-29 17:33:50     from /usr/local/bundle/gems/puma-3.12.6/lib/puma/configuration.rb:194:in `block in load'
2023-10-29 17:33:50     from /usr/local/bundle/gems/puma-3.12.6/lib/puma/configuration.rb:194:in `each'
2023-10-29 17:33:50     from /usr/local/bundle/gems/puma-3.12.6/lib/puma/configuration.rb:194:in `load'
2023-10-29 17:33:50     from /usr/local/bundle/gems/puma-3.12.6/lib/puma/launcher.rb:61:in `initialize'
2023-10-29 17:33:50     from /usr/local/bundle/gems/puma-3.12.6/lib/puma/cli.rb:71:in `new'
2023-10-29 17:33:50     from /usr/local/bundle/gems/puma-3.12.6/lib/puma/cli.rb:71:in `initialize'
2023-10-29 17:33:50     from /usr/local/bundle/gems/puma-3.12.6/bin/puma:8:in `new'
2023-10-29 17:33:50     from /usr/local/bundle/gems/puma-3.12.6/bin/puma:8:in `<top (required)>'
2023-10-29 17:33:50     from /usr/local/bundle/bin/puma:23:in `load'
2023-10-29 17:33:50     from /usr/local/bundle/bin/puma:23:in `<main>

Screenshot 2023-10-29 at 17 33 32
Screenshot 2023-10-29 at 17 34 00

Add a "production" mode to the Rails app

Right now, the app is running in development mode.

This allows for live reloading within workshop environments, and makes it possible to edit code and instrument without needing to rebuild containers.

As the Kubernetes environment already uses prebuilt containers, we should add images that are running in the production Rails environment.

ARM build support

The Datadog agent supports ARM64 v8, so it would be handy to have ARM64 Docker builds alongside the x64 builds we ship now.

RUM config `env` value should come from environment var

The RUM configuration for Storedog is hardcoded to production. Proposal for template (there are a few):

sed -i 's/production/<%= ENV["DD_ENV"] %>/g' ./store-frontend-instrumented-fixed/store-frontend/app/views/spree/layouts/spree_application.html.erb

Auto-generate RUM data

Script a headless browser to interact with the Storedog UI. Perhaps part of the traffic container? Selenium, PhantomJS, etc.

Tracer contexts are not working in logs

A new bug introduced with #80 is making the dd-trace-rb library not emit any tracing data. During startup there is this log message, which might point to a regression in the auto-instrumentation library.

W, [2021-03-30T21:25:44.904436 #8]  WARN -- ddtrace: [ddtrace] Unabe to enable Datadog Trace context, Logger #<SemanticLogger::Logger:0x0000556c61b81ac8> is not supported

I will file a bug with the tracer and work with them to get this resolved. For now, it's blocking the 1.0 release.

Discounts and Ads addresses and ports should be configurable

Right now the storefront app has the discounts address hardcoded to discounts:5001 and the ads address to advertisements:5002. Ideally, both the addresses and the ports should be configurable, as these are very specific to the current docker-compose & k8s deployment configuration.

Getting NoMethodError after deployment using generic-k8s method

Full disclosure: I haven't used this project in quite some time, and my fork of the code became outdated with deprecated Ruby dependency packages. While trying to get a more recent snapshot of the code, I deployed the project using the latest code from the main branch. The default page eventually loads after a couple of refreshes, and the DB is populated with records and schema. However, I get this error message when clicking the Cart button or trying to navigate to the /admin URL.

image

I have tried the source code from all three ads-service folders (broken, fixed, errors) with the same result. I also tried alternate code for the other services (discounts and store-frontend) with no success.

Something to note: I'm not running Datadog on my cluster at present, but I would imagine that should be fine. I do see errors reported in the logs.

[advertisements] 2021-08-06 06:55:40,688 ERROR [ddtrace.internal.writer] [writer.py:202] [dd.service=advertisements-service dd.env=development dd.version=1.0.1 dd.trace_id=0 dd.span_id=0] - failed to send traces to Datadog Agent at http://172.18.0.5:8126
[advertisements] Traceback (most recent call last):
[advertisements]   File "/usr/local/lib/python3.9/site-packages/ddtrace/internal/writer.py", line 200, in _send_payload
[advertisements]     response = self._put(payload, headers)
[advertisements]   File "/usr/local/lib/python3.9/site-packages/ddtrace/internal/writer.py", line 165, in _put
[advertisements]     conn.request("PUT", self._endpoint, data, headers)
[advertisements]   File "/usr/local/lib/python3.9/http/client.py", line 1257, in request
[advertisements]     self._send_request(method, url, body, headers, encode_chunked)
[advertisements]   File "/usr/local/lib/python3.9/http/client.py", line 1303, in _send_request
[advertisements]     self.endheaders(body, encode_chunked=encode_chunked)
[advertisements]   File "/usr/local/lib/python3.9/http/client.py", line 1252, in endheaders
[advertisements]     self._send_output(message_body, encode_chunked=encode_chunked)
[advertisements]   File "/usr/local/lib/python3.9/http/client.py", line 1012, in _send_output
[advertisements]     self.send(msg)
[advertisements]   File "/usr/local/lib/python3.9/http/client.py", line 952, in send
[advertisements]     self.connect()
[advertisements]   File "/usr/local/lib/python3.9/http/client.py", line 923, in connect
[advertisements]     self.sock = self._create_connection(
[advertisements]   File "/usr/local/lib/python3.9/socket.py", line 843, in create_connection
[advertisements]     raise err
[advertisements]   File "/usr/local/lib/python3.9/socket.py", line 831, in create_connection
[advertisements]     sock.connect(sa)
[advertisements] ConnectionRefusedError: [Errno 111] Connection refused
[advertisements] 2021-08-06 06:55:43,022 INFO [werkzeug] [_internal.py:113] [dd.service=advertisements-service dd.env=development dd.version=1.0.1 dd.trace_id=0 dd.span_id=0] - 10.244.1.5 - - [06/Aug/2021 06:55:43] "GET /ads HTTP/1.1" 200 -
[advertisements] 2021-08-06 06:55:43,025 INFO [bootstrap] [ads.py:25] [dd.service=advertisements-service dd.env=development dd.version=1.0.1 dd.trace_id=1143986986509166125 dd.span_id=3336431127800903886] - attempting to grab banner at 2.jpg
[advertisements] 2021-08-06 06:55:43,027 INFO [werkzeug] [_internal.py:113] [dd.service=advertisements-service dd.env=development dd.version=1.0.1 dd.trace_id=0 dd.span_id=0] - 10.244.1.5 - - [06/Aug/2021 06:55:43] "GET /banners/2.jpg HTTP/1.1" 200 -

Any help would be greatly appreciated. Happy to provide more details if necessary.

Thanks!
-Ash

File integrity monitoring

Depends on the attack box story #71 . The attack box should ssh in to one or more containers and modify a file on disk using vim or nano. Probably .ssh/authorized_keys ...

That container should be running the Datadog agent with file integrity monitoring enabled to empower the student to observe and analyze the finding.

Cross-publish images to Google Container Registry

To help make things quicker to spin up across platforms, we should cross-publish all containers to Google Container Registry (GCR) under the datadog-community project. This might be an easy lift in updating our workflows to publish to both, but I don't know the complexity of it yet.

Update ddtrace gem in storefront to latest version

The current version is 0.54.2, but we use 0.4.1 and 0.50.0:

src/broken-instrumented.patch:+gem 'ddtrace', '>= 0.4.1'
src/broken-instrumented.patch:+    ddtrace (0.50.0)
src/broken-instrumented.patch:+  ddtrace (>= 0.4.1)
src/instrumented-fixed.patch:+gem 'ddtrace', '>= 0.4.1'
src/instrumented-fixed.patch:+    ddtrace (0.50.0)
src/instrumented-fixed.patch:+  ddtrace (>= 0.4.1)

Unable to export profile: ddtrace.profiling.exporter.ExportError: HTTP Error 403

Any idea what could be the issue with this container? I am using the latest version of the repo.
[root@ip-172-31-12-118 traffic-replay]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
54e11f2afb60 traffic-replay "/bin/sh -c 'wait-fo…" 8 minutes ago Up 8 minutes distracted_liskov
7912f14cb332 ddtraining/storefront:2.1.0 "sh docker-entrypoin…" 12 minutes ago Up 12 minutes 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp docker-compose-frontend-1
013da90c3c49 ddtraining/discounts:2.1.0 "/usr/bin/dumb-init …" 12 minutes ago Exited (2) 12 minutes ago docker-compose-discounts-1
6dd6bc41963a ddtraining/advertisements:2.1.0 "ddtrace-run flask r…" 12 minutes ago Up 12 minutes 0.0.0.0:5002->5002/tcp, :::5002->5002/tcp docker-compose-advertisements-1
8698bafba68a datadog/agent:7.29.0 "/init" 12 minutes ago Up 12 minutes (unhealthy) 8125/udp, 0.0.0.0:8126->8126/tcp, :::8126->8126/tcp docker-compose-agent-1
c8718bf8adfa postgres:11-alpine "docker-entrypoint.s…" 12 minutes ago Up 12 minutes 5432/tcp docker-compose-db-1
[root@ip-172-31-12-118 traffic-replay]# docker logs 013da90c3c49
Starting OpenBSD Secure Shell server: sshd.
rsyslogd: imklog: cannot open kernel log (/proc/kmsg): Operation not permitted.
rsyslogd: activation of module imklog failed [v8.1901.0 try https://www.rsyslog.com/e/2145 ]
Starting enhanced syslogd: rsyslogd.
Usage: flask run [OPTIONS]
Try 'flask run --help' for help.

Error: Invalid value for '--port' / '-p': is not a valid integer
2022-06-17 03:07:25,128 ERROR [ddtrace.profiling.scheduler] [scheduler.py:52] [dd.service=discounts-service dd.env=development dd.version=1.0 dd.trace_id=0 dd.span_id=0] - Unable to export profile: ddtrace.profiling.exporter.ExportError: HTTP Error 403
. Ignoring.
Failed to start my_second_process: 2
[root@ip-172-31-12-118 traffic-replay]#

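From the log above, two separate things appear to be going wrong: `flask run` received an empty `--port` value (an unset environment variable), and the profiler's export was rejected with HTTP 403 (typically a missing or invalid API key). A hedged docker-compose fragment addressing both might look like this; the exact variable names the entrypoint reads are assumptions.

```yaml
# Hypothetical compose fragment; variable names are assumptions, not the repo's actual ones.
services:
  discounts:
    environment:
      - FLASK_RUN_PORT=5001          # an empty port variable causes the flask --port error
      - DD_API_KEY=${DD_API_KEY}     # an invalid key can surface as HTTP 403 on profile export
      - DD_PROFILING_ENABLED=false   # or disable profiling if it is not needed
```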

Re-write the timeline of the repo?

Another optimization we can make to this project is to correct the timeline of the git history. This repo was originally created as a copy of a clone of https://github.com/spree/spree, which brought along 20,000+ commits of history that are in no way related to the work of our project. The history of this repo starts at 4f09169, and I propose we remove all the history up to and including f8e7b4e.

This will benefit the project by drastically cutting down the history you need to synchronize any time you clone the project to contribute code. It will also reduce all the Katacoda scenario image sizes, since there are fewer changes to synchronize.
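
One way to do the truncation is the classic graft-then-rewrite recipe sketched below (git-filter-repo is the modern alternative). This is a sketch, not a tested migration plan for this repo; run it on a fresh clone, since it rewrites history for every contributor.

```shell
# Sketch: graft a commit as the new root, then rewrite history to make it permanent.
truncate_history() {
  new_root="$1"
  git replace --graft "$new_root"              # pretend the commit has no parents
  FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --force -- --all         # bake the graft into real commits
  git for-each-ref --format='%(refname)' refs/replace refs/original |
    xargs -r -n1 git update-ref -d             # clean up replace/backup refs
}
# For this repo, the proposed new root from the issue:
# truncate_history 4f09169
```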

Hardcoded agent on localhost in advertisements and discounts

We see this error a lot from discounts and advertisements in a docker-compose environment. These services do send traces correctly, but something tries to connect to the agent on localhost instead of datadog, the agent hostname. This fills the logs with distracting errors.

ERROR:ddtrace.internal.writer:failed to send traces to Datadog Agent at http://localhost:8126, 2 additional messages skipped
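
The Python tracer reads DD_AGENT_HOST to override its localhost default, so a likely fix is setting it on both services. The fragment below is a hedged sketch; the service names are assumptions based on the container names above.

```yaml
# Hypothetical compose fragment; service names are assumptions.
services:
  advertisements:
    environment:
      - DD_AGENT_HOST=datadog      # tracer connects here instead of localhost
      - DD_TRACE_AGENT_PORT=8126
  discounts:
    environment:
      - DD_AGENT_HOST=datadog
      - DD_TRACE_AGENT_PORT=8126
```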

Support Azure deployments with Terraform

To start unifying the deployment story, we should convert the various deployment methods into Terraform modules so we can target different platforms and have a more out-of-the-box experience. This also includes ample documentation to make it easier for anyone to deploy this project wherever they need to.
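
The module shape could look something like the sketch below, with one module per platform behind a shared set of inputs. Every name and variable here is an assumption, not the repo's actual layout.

```hcl
# Hypothetical root-module usage; names and variables are assumptions.
module "storedog" {
  source     = "./modules/azure"   # sibling modules could target aws, gcp, etc.
  location   = "eastus"
  dd_api_key = var.dd_api_key
}
```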
