Comments (33)
Our microservice architecture has contributed to small changes often taking much longer than they would otherwise.
A classic example is when we need to make an API change in a service like Exchange. You need to update Exchange + deploy, Metaphysics + deploy, Reaction + release, and Force + deploy. God forbid you have a schema mismatch or you're doing a deprecation, which then takes 2x as many PRs.
If you want to test the changes locally end-to-end first, you need to go through all the other pain that was mentioned above re: linking/setting up projects! So it tends to compound.
Harder to quantify, but maybe there's something about the amount of time it takes to feel up to date on Slack?
Regarding meetings, it's less about quantity for me and more about batching. The days when I have meetings distributed throughout the day are hard from a context-switching perspective, and I feel like I can only focus on shallow work. The days when I have my meetings clustered together, I have less context-switching and I feel like I can get deeper work done. I much prefer the days when they are clustered, so maybe we could say "prefer scheduling meetings before 1pm", or something.
This almost might just be personal preference, and I'm willing to accept that.
I'd definitely say Test Suites and CI. Specifically I think of Volt and Gravity. I can imagine my own productivity would have been much higher when working on Volt and Gravity if I didn't have to wait ~40 min to see my change get into staging.
This in general impacts how we treat these services: Volt probably gets less frequent deploys just because the PR -> production time is too long and makes any production deploy slightly riskier. You know that a hotfix is at least 40 minutes away.
I think it would be valuable to consider the intention and purpose of the various meetings people have to (or are encouraged to) attend each week. Could some of this communication and information dissemination be asynchronous (Slack, email, Notion, etc.)? Thinking of teams that hold daily standups, especially ones that consistently last longer than 10-15 minutes: is that the most efficient use of time? Could more time be spent grooming/preparing for sprint planning and sprint retros, and less time spent in the actual meetings?
Test suite duration. It's not immediately clear whether, and for how long, people sit around waiting for the suite to finish, but perhaps measuring it leads to insights after all.
My personal pet peeve is when work that's essentially done doesn't get promptly released. If it was worth doing, it should be worth releasing promptly. When things stall in staging, they also require further context switches to properly QA, migrate, and verify. All those can be handled more efficiently closer to the work.
I spend a non-trivial amount of time waiting for my local Force to load its first page.
PR creation to deploy time. This is a large over-arching one that we could probably dig into further as we start gaining insights, but an example of that could be how often people have to fix merge conflicts before a PR ends up getting merged.
I'll reiterate something that came out of the Local Discovery retro, which was that we were sometimes spending hours waiting for Emission CI builds to even start. We wasted far more engineering time than the few hundred dollars we could have paid CircleCI for iOS build concurrency. Our decision was, at the first sign of this happening again, to upgrade our plan.
I really like this framework of thinking of ways to reduce wasted engineering time, and the iOS Circle CI upgrade perfectly encapsulates how we can spend slightly more money to unlock a lot of engineering productivity. Sounds like there are some other instances (Gravity and Volt have already been mentioned).
I find that I've only upvoted the social ones (Slack, meetings) and not really any of the technical ones, just because context-switching is far more disruptive to my feeling of productivity than any incremental time spent waiting on systems (those pauses for me end up in any case being times to check Slack and email, or just take a breather).
`yarn install` time, and especially `rm -rf node_modules && yarn install`. The same applies to `bundle install` and `pod install`.
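If we wanted to quantify this, a minimal sketch could wrap the install commands and record how long they take (the wrapper and log format here are hypothetical, not an existing tool):

```python
import subprocess
import time

def timed_install(cmd):
    """Run an install command and return (exit_code, seconds_elapsed)."""
    start = time.monotonic()
    result = subprocess.run(cmd)
    return result.returncode, time.monotonic() - start

# Wrap the commands people wait on most, e.g.:
# code, secs = timed_install(["yarn", "install"])
# code, secs = timed_install(["bundle", "install"])
```

Aggregating those numbers across the team would tell us whether install time is actually worth optimising.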
Time people spend getting `yarn link` etc. to set up correctly. Perhaps this gives us clear insight into how much we need a well-functioning `yarn workspaces` setup?
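For reference, the core of a Yarn (v1) workspaces setup is just a root `package.json` like the one below; the `packages/*` path is a hypothetical layout, not our actual repo structure:

```json
{
  "private": true,
  "workspaces": ["packages/*"]
}
```

With that in place, `yarn install` at the root symlinks the local packages together automatically, which is the part `yarn link` makes us do by hand today.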
Dev environment setup and configuration, for example in linking Eigen/Emission or Reaction/Force.
Dev environment setup and configuration, for example in linking Eigen/Emission or Reaction/Force.
@ashleyjelks Perhaps you’re touching on the same thing as I did above? #185 (comment)
Dev environment setup and configuration, for example in linking Eigen/Emission or Reaction/Force.
@alloy @ashleyjelks This is something we noticed in the LD/City Guides project: we routinely hit a problem where CocoaPods would fail to `pod install` (complaining, usually, about `tipsi-stripe`). A fix for this has been PR'd and is pending review: CocoaPods/CocoaPods#8676. This is a great case for how contributing to our tools, on Artsy time, can make our engineering team overall more productive 👍
I would +1 @erikdstock's comment
I haven't seen any mention of it yet but I would say that for me personally pair programming is a net productivity gain.
Sometimes the tooling for Relay misleads you with errors. I've not yet learned when I can ignore a red squiggly line from Relay and when I can't, so I've gone on a few wild goose chases trying to make them go away... only to find out from someone else that I don't really need to fix them.
@ashfurrow pointed out this example: https://artsyproduct.atlassian.net/browse/PLATFORM-1365
And right now, I'm dealing with some errors on this PR. Things seem to work fine in the browser and when running `yarn relay`, but there are red squiggly lines in VS Code.
@ashfurrow it is Volt. I don't have much in-depth knowledge about what could be improved. @starsirius and others spend a lot of effort improving it, but the old tests in Volt from 4-5 years ago are just very expensive to run.
This morning I literally removed a single comment line from a PR that was green, and after 5 hours I still haven't been able to merge it 👇. This is more of an exception, caused by the release of a new version of Chrome, which @starsirius quickly fixed 🙏 🙏
I want to separate out the issues we have now with long CI wait time, and we can address them individually:
- We don't have sufficient containers on CircleCI to build projects in parallel during peak hours.
- Tests in some projects take significantly longer, e.g. running entire test suite takes ~15 mins for Volt, and ~25 mins for Gravity.
We can probably get some insights from CircleCI about both, i.e. time spent while a job was simply queued and time spent executing. Also want to mention that the CircleCI workflow might also contribute to the total wait time, given our limited containers.
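As a sketch of the split we'd want to see (the timestamp field names mirror what CircleCI's API reports for a job, but treat them as assumptions):

```python
from datetime import datetime

def queue_and_run_seconds(queued_at, started_at, stopped_at):
    """Split a CI job's wall time into queue wait vs execution time.

    Timestamps are ISO-8601 strings; the field names follow the shape
    of CircleCI's job data, but are assumptions here.
    """
    q = datetime.fromisoformat(queued_at)
    s = datetime.fromisoformat(started_at)
    e = datetime.fromisoformat(stopped_at)
    return (s - q).total_seconds(), (e - s).total_seconds()

queued, executed = queue_and_run_seconds(
    "2019-04-01T10:00:00", "2019-04-01T10:12:30", "2019-04-01T10:27:30"
)
# queued == 750.0 (12.5 min waiting), executed == 900.0 (15 min running)
```

If queue time dominates, more containers help; if execution time dominates, we need faster test suites.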
So here’s a list of just some of the things I think we could consider:
I like how you all are creating single comments and then we can upvote those, so going to split up my list too. I’ll start with:
Meetings. Not sure yet how to think about that one, but I’m sure we could optimise something there–and I think it’s everyone’s favourite to have less of.
Recently, after rejoining the gallery team: non-negligible time spent addressing subscription corrections that have to be done through the console. The issue there is mainly trying to understand what happened such that the adjustments are needed. Granted, those are usually non-standard cases, so they're tricky to automate.
Git commit hook duration. Probably everybody waits for that to finish instead of doing something else in the meantime.
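Measuring this is cheap; a hypothetical wrapper (the log path and lint command below are made up for illustration) could time whatever the hook actually runs:

```python
# Hypothetical timing wrapper for a Git pre-commit hook: run the real
# checks, append the elapsed seconds to a log so the wait is measurable.
import subprocess
import time

def run_hook(check_cmd, log_path="hook-times.log"):
    """Run the hook's checks, log elapsed seconds, return the exit code."""
    start = time.monotonic()
    result = subprocess.run(check_cmd)
    elapsed = time.monotonic() - start
    with open(log_path, "a") as log:
        log.write(f"pre-commit\t{elapsed:.2f}s\n")
    return result.returncode

# In .git/hooks/pre-commit you'd call something like:
# raise SystemExit(run_hook(["yarn", "lint-staged"]))
```

A week of those logs would show whether hook duration is a rounding error or a real daily cost.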
have to be done through console
@oxaudo We certainly could consider starting to record the time from opening a (production) console to closing it, and then see if we can/need to dig into that further?
Test & deploy time don't often affect the services I mainly work in so I feel less inclined to comment on those, but I can remember them being pain points in the past.
I'd echo the comments of @ashleyjelks, @pepopowitz and @alloy above about meetings. In my ideal world they would be scheduled in a continuous block and would have a clear purpose/product/outcome. Additionally, some meetings (L&L, S&T) are definitely valuable but hard to weigh against immediate demands and deadlines; in that moment they feel like extracurricular activities and contribute to my feeling of meeting bloat (the result is that I only attend those I'm definitely interested in, but I feel guilty about it).
I also feel pretty strongly that standups should be very brief `yesterday -> today -> blockers` updates and should not last longer than 1 minute per person (this seems like something every team at every company has to balance though). Follow-ups can happen immediately after.
I haven't seen any mention of it yet but I would say that for me personally pair programming is a net productivity gain.
Dev environment setup and configuration, for example in linking Eigen/Emission or Reaction/Force.
@ashleyjelks Perhaps you’re touching on the same thing as I did above? #185 (comment)
Yep, it's a duplicate issue. I'm happy to delete this one!
Just to back up my concern with time spent waiting for CI: this is a build for a one-liner PR that has been waiting to get to staging for 2 hours now.
In this specific example, follow-up work for updating Fulcrum, MP, and Reaction is blocked by this.
It's hard to predict CI's traffic pattern, and these cases don't happen too often, but they happen enough times to add up to a considerable amount of waiting time in total.
I agree with almost every single point above, but for me the most annoying thing is test run time.
I think slow tests sometimes have more implications than just wait time. For example, a lot of the time when reviewing a PR I am hesitant to ask for a minor change, because correcting a minor problem would block the entire org waiting for CircleCI time.
This part of this blog post might sound a little extreme, but I think we all agree that 'mean time to recovery' is very important and we might be too focused on increasing 'mean time between failures':
Do we ever release bugs to production? Of course. But mean time to recovery is usually more important than mean time between failures. If we deploy something that’s broken, we can often roll back within minutes. And since we ship very incremental changes, the average bug is often limited in impact. Bugs in production are often related to code that was written in the last few days, so it’s fresh in mind and can be fixed quickly.
@sepans which projects have you worked with that have the longest test times? Does this mostly cause a problem locally or on CI? (Or both?)
That's a really interesting blog post – I think it's very apropos.
I'll put together a list of these sorted by votes and review in what ways we could track the time spent on these issues.
Ok, Peril is going to keep adding back the RFC label. I'll just close this as accepted and will follow-up with resolution steps.
Turns out `[RFC]` was added to the title, which is why Peril kept adding that. Re-opening this now that I've removed it.
Closing this issue as stale