Comments (4)

ikreymer commented on August 16, 2024

It's possible that you don't have enough resources to run a crawl with the default settings. (We can look at setting these lower by default).

The defaults are:

# base memory for 1 browser
crawler_memory_base: 1024Mi

# number of browser workers per crawler instance
crawler_browser_instances: 2

You can lower these, e.g. set crawler_browser_instances to 1 in your local.yaml, and try lowering the memory base if that is the issue.
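For example, a local.yaml override with both values lowered might look like this (a sketch with assumed lower values; tune them to what your machine can actually spare):

# halve the base memory per browser (assumed value, adjust as needed)
crawler_memory_base: 512Mi

# run a single browser worker per crawler instance
crawler_browser_instances: 1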

To find out exactly, you can run kubectl describe pods -n crawlers and look for any info in the Events section at the bottom; that may explain what is happening in k8s.
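If the describe output is long, the namespace events can also be listed directly with standard kubectl (not Browsertrix-specific):

# show recent events in the crawler namespace, oldest first
kubectl get events -n crawlers --sort-by=.metadata.creationTimestamp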


Brbrbr1995 commented on August 16, 2024

@ikreymer this is the output of kubectl describe pods -n crawlers

I'm not quite sure what this means: Warning FailedScheduling 7m17s default-scheduler 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
I used the default config.

full output:
Name: crawl-manual-20240701124018-a670e2cd-591-0
Namespace: crawlers
Priority: 0
Priority Class Name: crawl-pri-0
Service Account: default
Node: docker-desktop/192.168.65.3
Start Time: Mon, 01 Jul 2024 20:40:21 +0800
Labels: crawl=manual-20240701124018-a670e2cd-591
role=crawler
Annotations: metacontroller.k8s.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"labels":{"crawl":"manual-20240701124018-a670e2cd-591","role":"crawler"},"name":"crawl-manual-...
Status: Pending
IP:
IPs:
Controlled By: CrawlJob/crawljob-manual-20240701124018-a670e2cd-591
Containers:
crawler:
Container ID:
Image: docker.io/webrecorder/browsertrix-crawler:latest
Image ID:
Port:
Host Port:
Command:
crawl
--config
/tmp/crawl-config.json
--workers
2
--redisStoreUrl
redis://redis-manual-20240701124018-a670e2cd-591.redis.crawlers.svc.cluster.local/0
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 2254857830400m
Requests:
cpu: 1500m
memory: 1879048192
Liveness: http-get http://:6065/healthz delay=15s timeout=1s period=120s #success=1 #failure=3
Environment Variables from:
shared-crawler-config ConfigMap Optional: false
storage-default Secret Optional: false
Environment:
HOME: /crawls/home
CRAWL_ID: manual-20240701124018-a670e2cd-591
WEBHOOK_URL: redis://redis-manual-20240701124018-a670e2cd-591.redis.crawlers.svc.cluster.local/0/crawls-done
STORE_PATH: c3843ef0-e5e1-492c-b3f5-8d460db5b7bf/
STORE_FILENAME: @[email protected]
STORE_USER: bc8406e0-4eb1-4a07-b9d3-b18a0eddbf72
WARC_PREFIX: my-organization-www-bing-com
Mounts:
/crawls from crawl-data (rw)
/tmp/crawl-config.json from crawl-config (ro,path="crawl-config.json")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-njdfq (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
crawl-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: crawl-config-a670e2cd-5919-44a5-b772-0700bbc8d248
Optional: false
crawl-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: crawl-manual-20240701124018-a670e2cd-591-0
ReadOnly: false
kube-api-access-njdfq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nodeType=crawling:NoSchedule
Events:
Type Reason Age From Message


Warning FailedScheduling 7m17s default-scheduler 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Normal Scheduled 7m15s default-scheduler Successfully assigned crawlers/crawl-manual-20240701124018-a670e2cd-591-0 to docker-desktop
Normal Pulling 7m14s kubelet Pulling image "docker.io/webrecorder/browsertrix-crawler:latest"

Name: redis-manual-20240701124018-a670e2cd-591
Namespace: crawlers
Priority: 0
Service Account: default
Node: docker-desktop/192.168.65.3
Start Time: Mon, 01 Jul 2024 20:40:21 +0800
Labels: crawl=manual-20240701124018-a670e2cd-591
role=redis
Annotations: metacontroller.k8s.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Pod","metadata":{"labels":{"crawl":"manual-20240701124018-a670e2cd-591","role":"redis"},"name":"redis-manual-20...
Status: Pending
IP:
IPs:
Controlled By: CrawlJob/crawljob-manual-20240701124018-a670e2cd-591
Containers:
redis:
Container ID:
Image: redis
Image ID:
Port:
Host Port:
Args:
/redis-conf/redis.conf
--appendonly
yes
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 200Mi
Requests:
cpu: 10m
memory: 200Mi
Liveness: exec [bash -c res=$(redis-cli ping); [[ $res = 'PONG' ]]] delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: exec [bash -c res=$(redis-cli ping); [[ $res = 'PONG' ]]] delay=10s timeout=5s period=10s #success=1 #failure=3
Environment:
Mounts:
/data from redis-data (rw)
/redis-conf from shared-redis-conf (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sqx8p (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
shared-redis-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: shared-redis-conf
Optional: false
redis-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: redis-manual-20240701124018-a670e2cd-591
ReadOnly: false
kube-api-access-sqx8p:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
nodeType=crawling:NoSchedule
Events:
Type Reason Age From Message


Normal Scheduled 7m14s default-scheduler Successfully assigned crawlers/redis-manual-20240701124018-a670e2cd-591 to docker-desktop
Normal Pulling 7m14s kubelet Pulling image "redis"


ikreymer commented on August 16, 2024

Were you able to solve the issue? It seems like it wasn't able to allocate the storage space. Were you using the default settings?
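If it happens again, you can check whether the claim actually bound and whether the cluster has a default StorageClass to provision it (generic kubectl commands, sketched against the claim name from your output):

# list the crawl PVCs and their status (Pending means still unbound)
kubectl get pvc -n crawlers

# inspect the claim's events for provisioning errors
kubectl describe pvc -n crawlers crawl-manual-20240701124018-a670e2cd-591-0

# confirm a default StorageClass exists
kubectl get storageclass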


Brbrbr1995 commented on August 16, 2024

@ikreymer
Yeah, it's working now.
It turns out k8s doesn't work well with a VPN.

