Comments (4)
You may not have enough resources to run a crawl with the default settings. (We can look at lowering these defaults.)
The defaults are:
# base memory for 1 browser
crawler_memory_base: 1024Mi
# number of browser workers per crawler instance
crawler_browser_instances: 2
You can lower these, for example by setting crawler_browser_instances to 1 in your local.yaml, and try lowering crawler_memory_base as well if memory is the issue.
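To judge whether the node has enough memory, it can help to convert the quantities Kubernetes reports into MiB. The following is a minimal sketch, not part of Browsertrix: the helper names `quantity_to_bytes` and `to_mib` are hypothetical, and it only handles the common Kubernetes quantity suffixes (binary `Ki`/`Mi`/`Gi`/`Ti`, decimal `k`/`M`/`G`/`T`, and `m` for milli).

```python
from decimal import Decimal

# Suffixes used by Kubernetes resource quantities (a common subset).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal(1) / Decimal(1000),  # milli-units, e.g. 1500m CPU
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def quantity_to_bytes(quantity: str) -> Decimal:
    """Convert a Kubernetes memory quantity such as '1024Mi' to bytes."""
    # Try two-letter suffixes first so 'Mi' is not mistaken for a plain number.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    # No suffix: the value is already in bytes.
    return Decimal(quantity)


def to_mib(quantity: str) -> float:
    """Express a Kubernetes memory quantity in MiB."""
    return float(quantity_to_bytes(quantity) / Decimal(2) ** 20)
```

Applied to the crawler pod described later in this thread, `to_mib("1879048192")` gives a memory request of 1792 MiB, and `to_mib("2254857830400m")` gives a limit of about 2150 MiB. Comparing these numbers with the memory available to the node (for example the Docker Desktop memory setting) shows whether the defaults fit.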
To find out exactly what is going on, run kubectl describe pods -n crawlers and look at the Events section at the bottom of the output, which may explain what is happening in k8s.
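If the describe output is long, the warnings can also be pulled out programmatically. This is a rough sketch, assuming `kubectl` is on the PATH and can reach the cluster; the function names `warning_events` and `fetch_events` are hypothetical helpers, and the sketch relies on the standard Event fields (`type`, `reason`, `message`, `involvedObject`) in the JSON that `kubectl get events -o json` returns.

```python
import json
import subprocess


def warning_events(events_json: str) -> list[str]:
    """Return one summary line per Warning event in `kubectl get events -o json` output."""
    items = json.loads(events_json).get("items", [])
    lines = []
    for event in items:
        # Skip Normal events such as Scheduled or Pulling.
        if event.get("type") != "Warning":
            continue
        obj = event.get("involvedObject", {})
        lines.append(
            f'{obj.get("kind", "?")}/{obj.get("name", "?")}: '
            f'{event.get("reason", "?")}: {event.get("message", "")}'
        )
    return lines


def fetch_events(namespace: str = "crawlers") -> str:
    """Run kubectl and return the raw JSON for all events in the namespace."""
    result = subprocess.run(
        ["kubectl", "get", "events", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `print("\n".join(warning_events(fetch_events())))` would list only the warnings in the crawlers namespace. Parsing is kept separate from the kubectl call so the filter can be tried on saved output as well.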
from browsertrix-cloud.
@ikreymer this is the output of kubectl describe pods -n crawlers.
I'm not quite sure what this means: Warning FailedScheduling 7m17s default-scheduler 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
I used the default config.
Full output:
`Name: crawl-manual-20240701124018-a670e2cd-591-0
Namespace: crawlers
Priority: 0
Priority Class Name: crawl-pri-0
Service Account: default
Node: docker-desktop/192.168.65.3
Start Time: Mon, 01 Jul 2024 20:40:21 +0800
Labels: crawl=manual-20240701124018-a670e2cd-591
role=crawler
Annotations: metacontroller.k8s.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"labels":{"crawl":"manual-20240701124018-a670e2cd-591","role":"crawler"},"name":"crawl-manual-...
Status: Pending
IP:
IPs:
Controlled By: CrawlJob/crawljob-manual-20240701124018-a670e2cd-591
Containers:
crawler:
Container ID:
Image: docker.io/webrecorder/browsertrix-crawler:latest
Image ID:
Port:
Host Port:
Command:
crawl
--config
/tmp/crawl-config.json
--workers
2
--redisStoreUrl
redis://redis-manual-20240701124018-a670e2cd-591.redis.crawlers.svc.cluster.local/0
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 2254857830400m
Requests:
cpu: 1500m
memory: 1879048192
Liveness: http-get http://:6065/healthz delay=15s timeout=1s period=120s #success=1 #failure=3
Environment Variables from:
shared-crawler-config ConfigMap Optional: false
storage-default Secret Optional: false
Environment:
HOME: /crawls/home
CRAWL_ID: manual-20240701124018-a670e2cd-591
WEBHOOK_URL: redis://redis-manual-20240701124018-a670e2cd-591.redis.crawlers.svc.cluster.local/0/crawls-done
STORE_PATH: c3843ef0-e5e1-492c-b3f5-8d460db5b7bf/
STORE_FILENAME: @[email protected]
STORE_USER: bc8406e0-4eb1-4a07-b9d3-b18a0eddbf72
WARC_PREFIX: my-organization-www-bing-com
Mounts:
/crawls from crawl-data (rw)
/tmp/crawl-config.json from crawl-config (ro,path="crawl-config.json")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-njdfq (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
crawl-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: crawl-config-a670e2cd-5919-44a5-b772-0700bbc8d248
Optional: false
crawl-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: crawl-manual-20240701124018-a670e2cd-591-0
ReadOnly: false
kube-api-access-njdfq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nodeType=crawling:NoSchedule
Events:
Type Reason Age From Message
Warning FailedScheduling 7m17s default-scheduler 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Normal Scheduled 7m15s default-scheduler Successfully assigned crawlers/crawl-manual-20240701124018-a670e2cd-591-0 to docker-desktop
Normal Pulling 7m14s kubelet Pulling image "docker.io/webrecorder/browsertrix-crawler:latest"
Name: redis-manual-20240701124018-a670e2cd-591
Namespace: crawlers
Priority: 0
Service Account: default
Node: docker-desktop/192.168.65.3
Start Time: Mon, 01 Jul 2024 20:40:21 +0800
Labels: crawl=manual-20240701124018-a670e2cd-591
role=redis
Annotations: metacontroller.k8s.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"labels":{"crawl":"manual-20240701124018-a670e2cd-591","role":"redis"},"name":"redis-manual-20...
Status: Pending
IP:
IPs:
Controlled By: CrawlJob/crawljob-manual-20240701124018-a670e2cd-591
Containers:
redis:
Container ID:
Image: redis
Image ID:
Port:
Host Port:
Args:
/redis-conf/redis.conf
--appendonly
yes
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 200Mi
Requests:
cpu: 10m
memory: 200Mi
Liveness: exec [bash -c res=$(redis-cli ping); [[ $res = 'PONG' ]]] delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: exec [bash -c res=$(redis-cli ping); [[ $res = 'PONG' ]]] delay=10s timeout=5s period=10s #success=1 #failure=3
Environment:
Mounts:
/data from redis-data (rw)
/redis-conf from shared-redis-conf (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sqx8p (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
shared-redis-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: shared-redis-conf
Optional: false
redis-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: redis-manual-20240701124018-a670e2cd-591
ReadOnly: false
kube-api-access-sqx8p:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nodeType=crawling:NoSchedule
Events:
Type Reason Age From Message
Normal Scheduled 7m14s default-scheduler Successfully assigned crawlers/redis-manual-20240701124018-a670e2cd-591 to docker-desktop
Normal Pulling 7m14s kubelet Pulling image "redis"`
Were you able to solve the issue? It seems the storage space couldn't be allocated. Were you using the default settings?
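One way to check the storage side is to look at the phase of each PersistentVolumeClaim in the crawlers namespace. Below is a minimal sketch, assuming the JSON comes from `kubectl get pvc -n crawlers -o json`; `unbound_claims` is a hypothetical helper name, and `status.phase` is the standard PVC field, with values such as `Pending`, `Bound`, or `Lost`.

```python
import json


def unbound_claims(pvc_json: str) -> list[tuple[str, str]]:
    """Return (name, phase) for every PersistentVolumeClaim that is not Bound."""
    claims = json.loads(pvc_json).get("items", [])
    result = []
    for claim in claims:
        phase = claim.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            result.append((claim["metadata"]["name"], phase))
    return result
```

An empty result means every claim has a volume. In the Events shown above, the crawler pod was scheduled two seconds after the FailedScheduling warning, so a claim that is only briefly unbound is not necessarily the cause of a stuck crawl.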
@ikreymer
Yeah, it's working now.
It turns out k8s doesn't work well with a VPN.