
ytsync's Introduction

YTSync Tool


This tool is used by LBRY to parse YouTube channels that want their content mirrored on LBRY.

The tool downloads the entire set of public videos from a given channel, publishes them to LBRY, and populates our private database to keep track of what has been published. With the help of that database, the tool can also keep all channels up to date.
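As a rough illustration of how the tracking database lets the tool keep channels updated, the Go sketch below compares a channel's current public video IDs with the IDs already recorded as published and returns only the ones still to sync. The pendingVideos helper and the in-memory map standing in for the database are hypothetical; the real schema and code are not described here.

```go
package main

// pendingVideos returns the IDs in channelVideos that are not yet recorded as
// published, keeping the channel's order and skipping repeated IDs.
// Hypothetical helper: the published map stands in for the tracking database.
func pendingVideos(channelVideos []string, published map[string]bool) []string {
	var pending []string
	seen := make(map[string]bool)
	for _, id := range channelVideos {
		if published[id] || seen[id] {
			continue // already synced, or listed twice
		}
		seen[id] = true
		pending = append(pending, id)
	}
	return pending
}
```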

Requirements

  • lbrynet SDK https://github.com/lbryio/lbry-sdk/releases (We strive to keep the latest release of ytsync compatible with the latest major release of the SDK)
  • an lbrycrd node running (on localhost or on a remote machine) with credits in it
  • internal-apis (you cannot run this one yourself)
  • python3-pip
  • yt-dlp (pip3 install -U yt-dlp)
  • ffmpeg (latest)

Setup

  • make sure the daemon is stopped and can be controlled through systemctl (see the example below)
  • extract the ytsync binary anywhere
  • create and fill config.json using this example

systemd script example

/etc/systemd/system/lbrynet.service

[Unit]
Description="LBRYnet daemon"
After=network.target

[Service]
Environment="HOME=/home/lbry"
ExecStart=/opt/lbry/lbrynet start
User=lbry
Group=lbry
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target

Instructions

Publish youtube channels into LBRY network automatically.

Usage:
  ytsync [flags]

Flags:
      --after int                   Specify from when to pull jobs [Unix time](Default: 0)
      --before int                  Specify until when to pull jobs [Unix time](Default: current Unix time) (default 1669311891)
      --channelID string            If specified, only this channel will be synced.
      --concurrent-jobs int         how many jobs to process concurrently (default 1)
  -h, --help                        help for ytsync
      --limit int                   limit the amount of channels to sync
      --max-length int              Maximum video length to process (in hours) (default 2)
      --max-size int                Maximum video size to process (in MB) (default 2048)
      --max-tries int               Number of times to try a publish that fails (default 3)
      --no-transfers                Skips the transferring process of videos, channels and supports
      --quick                       Look up only the last 50 videos from youtube
      --remove-db-unpublished       Remove videos from the database that are marked as published but aren't really published
      --run-once                    Whether the process should be stopped after one cycle or not
      --skip-space-check            Do not perform free space check on startup
      --status string               Specify which queue to pull from. Overrides --update
      --status2 string              Specify which secondary queue to pull from.
      --takeover-existing-channel   If channel exists and we don't own it, take over the channel
      --update                      Update previously synced channels instead of syncing new ones
      --upgrade-metadata            Upgrade videos if they're on the old metadata version
      --videos-limit int            how many videos to process per channel (leave 0 for automatic detection)

Running from Source

Clone the repository and run make.

License

This project is MIT licensed. For the full license, see LICENSE.

Contributing

Contributions to this project are welcome, encouraged, and compensated. For more details, see CONTRIBUTING.

Security

We take security seriously. Please contact [email protected] regarding any security issues. Our PGP key is here if you need it.

Contact

The primary contact for this project is Niko Storni ([email protected]).

Additional Info and Links

ytsync's People

Contributors

e4drcf, lyoshenka, nikooo777, strikerrus, tiger5226, tzarebczan, ykris45


ytsync's Issues

Retry failed channel sync

YT sync channels go into failed mode on API hiccups or for other temporary reasons. We should retry these failed channels automatically.

Address old review comments

Premiered / Live stream videos fail

If a video is premiered or live streamed, it ends up in a failed status with "download error: no compatible format available for this video". I'm guessing these videos come in through the API but aren't actually available yet. We need to reprocess them automatically or skip them until they are available (I think the API should be able to tell us).

Some examples:
https://www.youtube.com/watch?v=drmXDZ7Jad4 (premiered)
https://www.youtube.com/watch?v=7fd_LLYIauM (live)
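The YouTube Data API v3 videos resource exposes snippet.liveBroadcastContent, whose values are "upcoming", "live" and "none". A minimal sketch of deferring such videos instead of marking them failed, assuming that field is fetched along with the rest of the video metadata (deferReason is a hypothetical helper):

```go
package main

// deferReason returns a non-empty reason when a video should be skipped for
// now and retried on a later pass, based on snippet.liveBroadcastContent.
// An empty string means the video can be downloaded normally.
func deferReason(liveBroadcastContent string) string {
	switch liveBroadcastContent {
	case "upcoming":
		return "scheduled premiere or stream has not started"
	case "live":
		return "live stream still in progress"
	default:
		return ""
	}
}
```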

Support transfer process

Support transfer of claims and channels for YouTubers who have requested their content to be sent to their local wallets.
Main features:

  1. batch send claims/channels to supplied address (lbryio/lbry-sdk#1821)
  2. process tips: either abandon all tips or claim_send them to the new address. We could also tip their channel as another option, so they see the total amount of tips received (not sure if this is good or bad).
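For step 1, a batch transfer needs the claim IDs split into groups. The sketch below only does the grouping; the batch size is an assumption, and the actual transfer call depends on whatever lbryio/lbry-sdk#1821 provides (chunk is a hypothetical helper):

```go
package main

// chunk splits claim IDs into consecutive batches of at most size elements,
// so each batch can be sent in a single transfer call. A size below 1 is
// treated as 1.
func chunk(ids []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}
```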

Allow sync after sending youtube wallet to users

Currently, the duplicate checker may modify claims incorrectly if the YouTuber were to edit/publish/abandon some claims locally. We should be able to send them their wallets and continue publishing from YT.

Sync Khan Academy content from YouTube

We believe we should be able to sync this legally by attributing the correct copyright.

https://www.youtube.com/user/khanacademy/videos

https://www.khanacademy.org/about/tos

"7.5 Crediting Khan Academy. If You distribute, publicly perform or display, transmit, publish, or otherwise make available any Licensed Educational Content or any derivative works thereof, You must also provide the following notice prominently along with such Licensed Educational Content or derivative work thereof: “All Khan Academy content is available for free at www.khanacademy.org”."

Support existing LBRY publisher on /youtube signup flow

@tzarebczan commented on Tue Jul 24 2018

In order to support existing LBRY creators who migrate to YouTube Sync (non-public/internal api issue - https://github.com/lbryio/internal-apis/issues/471), we would need to capture additional data like their channel signing certificate (needs to be transferred/stored securely) and wallet address to start publishing their content automatically.

@alyssaoc we may want to create an epic (or turn this issue into an epic) if and when we are ready to spec it out.

Address old review comments

Consider republishing malformed content

For some time period over the last two weeks, videos from ytsync were malformed, reducing their streamability.

  1. Determine quantity of videos affected and date range.
  2. Determine if there is a clear way to identify which videos are and are not affected
  3. If feasible/practical, update the stream hashes for this content

queueing and reprocessing failures

A video may fail to sync for many different reasons and may not retry until a new video is published or we have the creator complain to us. We should put all the "legit" failures into a queue and have at least 1 server reprocess those.
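A minimal sketch of such a reprocessing queue, assuming failures can be retried by video ID and that a video is given up on after a fixed number of attempts (similar in spirit to the --max-tries flag). failedVideo and requeue are hypothetical names:

```go
package main

// failedVideo is a hypothetical queue entry for a video whose sync failed.
type failedVideo struct {
	VideoID  string
	Attempts int
}

// requeue runs one pass over the queue. Entries that process handles
// successfully are dropped; entries that fail again get Attempts incremented
// and are kept, unless they have now reached maxTries, in which case their
// IDs are returned in givenUp.
func requeue(queue []failedVideo, maxTries int, process func(videoID string) bool) (next []failedVideo, givenUp []string) {
	for _, item := range queue {
		if process(item.VideoID) {
			continue // reprocessed successfully
		}
		item.Attempts++
		if item.Attempts >= maxTries {
			givenUp = append(givenUp, item.VideoID)
			continue
		}
		next = append(next, item)
	}
	return next, givenUp
}
```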

Ensure ytsync continuous processing / better alerts

We ran into an issue recently where a bunch of existing channels were not being synced because they were in failed status and also because of some server issues. We should be able to re-try sync in some of these scenarios and provide better errors/notifications when things stop processing server side.

  • Alert us on slack when ytsync dies
  • Create dashboard to overview failed channels
  • Alert us when the youtube sync wallet runs out of credits
  • Address remaining failed videos (details below)

Main error to address: "Initial wallet setup failed! Manual Intervention is required.: channel_claim_id: cannot be blank."

Delay syncing video if newly posted / not post processed

@tzarebczan commented on Tue Sep 11 2018

I think we may sometimes run into a scenario where we sync a video before YouTube has post-processed it in 720p+, so we end up syncing a lower-quality version. We should check when the video was posted and, if it's very recent (not sure what the timing for post-processing looks like), delay download/publish until the next iteration.

Consider how to set copyright / ask publisher

@kauffj commented on Tue May 15 2018

Some publishers do not wish to retain copyright or wish to publish under creative commons or similar.

If this information is available via YouTube, we should match it.

If not, we should ask but provide a sane default (e.g. it could be on the lbry.io/youtube/token status page).

Steps:

  • When publishing, fetch status.license from the YouTube API (https://developers.google.com/youtube/v3/docs/videos#status.license). If the license is creativeCommon, publish under the Creative Commons license (check which one YouTube uses and use the same). If it's youtube, do whatever we do now.
  • add a note on the youtube/token page explaining how we'll be setting the license on published content, and tell the user to email us if they want a different treatment for their content. use our support email
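A sketch of the mapping in the first step. The status.license values "youtube" and "creativeCommon" are the ones the linked API page documents; the exact Creative Commons name and the default string below are assumptions to verify, standing in for "whatever we do now":

```go
package main

// defaultLicense is an assumed placeholder for the license ytsync uses today.
const defaultLicense = "Copyrighted (contact publisher)"

// licenseFor maps the YouTube Data API v3 field status.license to a license
// string for the LBRY claim. The Creative Commons name is an assumption to
// check against YouTube's documentation.
func licenseFor(youtubeLicense string) string {
	if youtubeLicense == "creativeCommon" {
		return "Creative Commons Attribution 3.0"
	}
	return defaultLicense
}
```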

Acceptance Criteria

Definition of Done

  • Tested against acceptance criteria
  • Tested against the assumptions of the user story
  • The project builds without errors
  • Unit tests are written and passing
  • Tests on devices/browsers listed in the issue have passed
  • QA performed & issues resolved
  • Refactoring completed
  • Any configuration or build changes documented
  • Documentation updated
  • Peer Code Review performed

Update all YT claims with new metadata

This will serve as the ytsync issue to implement the items below and as an epic to track the required types/SDK changes. When the SDK tickets are filed, I'll update it.

  1. Setup a process to perform claim updates, preferably in batch mode
    (lbryio/lbry-sdk#1821). If easy update mode is available (lbryio/lbry-sdk#1423), use it, if not, read from existing claim data. This also includes grabbing the sdblob for each claim in order to populate some of the new file related fields.

  2. From the sdblob, grab the filename and file size to populate name/size in lbryio/types#9.

  3. Grab video length from the YouTube api to populate length in lbryio/types#9

  4. Grab published date from YouTube api to populate releaseTime in lbryio/types#13

  5. Grab category/tags from YouTube api to populate in lbryio/types#15 or lbryio/types#16

  6. Grab language from YouTube api and populate language on claim metadata

  7. Research other relevant youtube API data that may require new types/sdk entries.

speed up publishing of new videos

Right now we have a pull-type service where the channel needs to be processed to determine if there are newly published videos. It can take anywhere from 1 minute to 24 hours.

add support for priced content

@nikooo777 commented on Wed Jul 04 2018

Some content publishers want to have their videos published for a price.
For this reason ytsync must fetch the following parameters from the API: fee_amount, fee_currency, fee_address and then supply them when calling the publish method.
This issue depends on the implementation on the API side first.
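A sketch of how the three parameters could be added to the publish request once the API supplies them, assuming the request is built as a parameter map and that free videos simply omit the fee fields. feeSettings and addFee are hypothetical, and encoding fee_amount as a decimal string is an assumption to check against the SDK's publish documentation:

```go
package main

import "strconv"

// feeSettings mirrors the three parameters named in the issue.
type feeSettings struct {
	Amount   float64 // fee_amount, e.g. 1 for 1 LBC
	Currency string  // fee_currency, e.g. "LBC"
	Address  string  // fee_address; optional
}

// addFee copies a positive fee into the publish parameters and leaves free
// content (nil fee or zero amount) untouched.
func addFee(params map[string]interface{}, fee *feeSettings) {
	if fee == nil || fee.Amount <= 0 {
		return
	}
	params["fee_amount"] = strconv.FormatFloat(fee.Amount, 'f', -1, 64)
	params["fee_currency"] = fee.Currency
	if fee.Address != "" {
		params["fee_address"] = fee.Address
	}
}
```

With Amount: 1 and Currency: "LBC", this would cover the 1 LBC case mentioned below.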


@tzarebczan commented on Tue Jul 31 2018

We recently synced some content without a price for a creator who was promised we'd sync at 1 LBC.


@nikooo777 commented on Thu Aug 02 2018

and I agree that's a problem. I will try to get this in my next sprint

For ytsync, when the daemon errors, include the error type in the message. For example, InsufficientFundsError has an empty message.
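A sketch of the requested behavior, assuming the daemon's error can be decoded into a type name and a message (daemonError is a hypothetical type):

```go
package main

import "fmt"

// daemonError is a hypothetical decoded SDK error that keeps the error type
// name separate from its (possibly empty) message.
type daemonError struct {
	Name    string // e.g. "InsufficientFundsError"
	Message string
}

// Error always includes the type name, so an empty message still says what
// went wrong.
func (e *daemonError) Error() string {
	if e.Message == "" {
		return fmt.Sprintf("daemon error: %s", e.Name)
	}
	return fmt.Sprintf("daemon error: %s: %s", e.Name, e.Message)
}
```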


Replace YouTube links with LBRY links

This would involve:

  • Parsing all the video files and descriptions up front before performing any publishes.
  • Ordering and performing the publishes in an order so that URLs can be replaced.

If there are circular references in video descriptions, I do not see a way to do this without performing an additional edit on some publishes.
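Ordering the publishes is a topological sort: a video must be published after every video its description links to, so the link can be rewritten to an LBRY URL. The sketch below uses Kahn's algorithm and returns the leftover videos (those in a cycle, or linking into one) separately, since they need the extra edit mentioned above. The regular expression only recognizes youtube.com/watch?v= and youtu.be/ links and is an assumption:

```go
package main

import (
	"regexp"
	"sort"
)

// ytLink captures 11-character video IDs from two common YouTube URL forms.
var ytLink = regexp.MustCompile(`(?:youtube\.com/watch\?v=|youtu\.be/)([A-Za-z0-9_-]{11})`)

// publishOrder takes video ID -> description and returns a publish order in
// which every linked video comes first, plus the videos that cannot be
// ordered because of circular references.
func publishOrder(descriptions map[string]string) (order, needsEdit []string) {
	deps := make(map[string]map[string]bool) // video -> videos it links to
	dependents := make(map[string][]string)  // video -> videos linking to it
	for id, desc := range descriptions {
		deps[id] = make(map[string]bool)
		for _, m := range ytLink.FindAllStringSubmatch(desc, -1) {
			target := m[1]
			if _, known := descriptions[target]; known && target != id && !deps[id][target] {
				deps[id][target] = true
				dependents[target] = append(dependents[target], id)
			}
		}
	}
	var ready []string
	for id := range descriptions {
		if len(deps[id]) == 0 {
			ready = append(ready, id)
		}
	}
	sort.Strings(ready) // stable starting order
	for len(ready) > 0 {
		id := ready[0]
		ready = ready[1:]
		order = append(order, id)
		for _, d := range dependents[id] {
			delete(deps[d], id)
			if len(deps[d]) == 0 {
				ready = append(ready, d)
			}
		}
	}
	for id := range descriptions {
		if len(deps[id]) > 0 {
			needsEdit = append(needsEdit, id)
		}
	}
	sort.Strings(needsEdit)
	return order, needsEdit
}
```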

Support publishing with new metadata

Requires similar issues to be completed on types/lbry-sdk as #18.
Not sure if we want to start publishing with new metadata as soon as each field is available or only once the full set is ready (if we publish immediately, we'll need to update the claims later with the remaining metadata).

  1. From the download file, grab the filename and file size to populate name/size in lbryio/types#9.

  2. Grab video length from the YouTube api (or from lbry-sdk process if that's easier) to populate length in lbryio/types#9

  3. Grab published date from YouTube api to populate releaseTime in lbryio/types#13

  4. Grab category/tags from YouTube api to populate in lbryio/types#15 or lbryio/types#16

  5. Grab language from YouTube api and populate language on claim metadata

  6. Research other relevant youtube API data that may require new types/sdk entries.

  7. Store any of these fields in the synced videos table as needed
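For step 3, the YouTube Data API v3 field snippet.publishedAt is an RFC 3339 timestamp. A sketch converting it to Unix seconds, assuming that is the form releaseTime takes in the claim metadata (the releaseTime helper is hypothetical, named after that field):

```go
package main

import "time"

// releaseTime converts an RFC 3339 timestamp such as "2018-07-04T12:00:00Z"
// into Unix seconds.
func releaseTime(publishedAt string) (int64, error) {
	t, err := time.Parse(time.RFC3339, publishedAt)
	if err != nil {
		return 0, err
	}
	return t.Unix(), nil
}
```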

Derivable URLs for Creator Partnerships

As part of creator partnerships @robvsmith is forming, creators are going to be embedding LBRY URLs directly in YouTube descriptions.

Preferably, creators would be able to know the LBRY URL of that video at the time of publishing to YouTube (and thus before picked up by sync).

Potential solutions to this should be discussed and reviewed before implementation.

Switch thumbnail hosting away from berk.ninja

@kauffj commented on Tue May 22 2018

This is not a good choice long-term. Ideally it would be spee.ch.


@nikooo777 commented on Tue Oct 02 2018

I'd like to have a discussion on this.
I disagree that we should use spee.ch for thumbnails, but I agree that we should use a better domain in place of berk.ninja

I can discuss this at the office with Jeremy but my TL;dr would be that if we're not doing it in a completely decentralized way then we should just stick to a centralized solution that we know works fine and will work fine.
using spee.ch would only cause a burden because for each single claim we need a second claim just for the thumbnail, we'd be polluting the blockchain with content that is strictly necessary (possibly considered metadata) of another claim, on top of that, we'd depend on spee.ch being online and working forever (both the domain and the infrastructure) rather than just worrying about the domain and the data being on S3 (easily moved/replaced).

I don't see this solution scaling well and could become a huge problem in the future, so either we bundle the thumbnail with the claim (so that the thumbnail itself is part of the blobs associated) or we use the centralized solution.

edit: this ended up not being a tl;dr....


@lyoshenka commented on Wed Oct 03 2018

I agree with niko - putting this on spee.ch means we're making two claims for each upload. It also means if spee.ch goes down, all of the thumbnails will break. Are we committed to spee.ch being up with some SLA? I thought we still use it for testing things, running advanced SDK builds, etc.

transfer kinks (support user with duplicate videos)

  • videos published before transfer are not marked as transferred - this causes the video to fail in transferring. This happens if there are new videos after we got an address/pubkey

  • support list needs to use the default account id - otherwise it will try to abandon supports for transferred videos and fail to sync.

  • if the process breaks / needs restart, supports need to be resent manually

  • 571b35609b90cffceafee38242a6c48f9499c276 had duplicate synced_videos, so the counts were wrong

  • when we try to delete some content after transfer, things fail - https://lbryians.slack.com/archives/D5W0D8ZJP/p1569617590246000
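For the duplicate-row problem, counts could be taken over distinct video IDs rather than rows. A sketch with a hypothetical row type (the real synced_videos columns may differ):

```go
package main

// syncedRow is a hypothetical row from the synced_videos table.
type syncedRow struct {
	VideoID   string
	Published bool
}

// countPublished counts distinct published video IDs, so duplicate rows for
// the same video do not inflate the total.
func countPublished(rows []syncedRow) int {
	seen := make(map[string]bool)
	for _, r := range rows {
		if r.Published {
			seen[r.VideoID] = true
		}
	}
	return len(seen)
}
```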

claim too large

This could be happening due to a large number of tags combined with a long description (I know we truncate).

https://lbryians.slack.com/archives/CACSTN9SL/p1564663088289500

'the transaction was rejected by network rules.\n\n16: bad-txns-claimscriptsize-toolarge\n[0100000001cdcc791725317c775b77e37c7f2879d90d916176697c3b0fd266d9c802b0edb9… (raw transaction hex truncated)]'

Reprocess premieres and live streams automatically

Right now the channel won't get reprocessed, and live streams/premieres won't be uploaded if they failed because they were not yet live or because the live stream was still in progress. The only time they are reprocessed is when a new video is published.

This could be done as part of #25

for ytsync, detect when lbrycrd is not running and error appropriately

@lyoshenka commented on Fri Mar 16 2018


automatically reprocess videos with certain errors

There are a bunch of videos in failed state with "publish error: Error in daemon: Not enough funds to cover this transaction." / "download error: unexpected EOF" (not sure why this one is happening)
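A sketch of selecting such failures for automatic reprocessing by matching the stored failure message. Matching on message text is brittle and the pattern list is just the two errors quoted above; retrying the funds error only makes sense once the wallet has been topped up:

```go
package main

import "strings"

// retryableMessages lists the failure substrings quoted in the issue.
var retryableMessages = []string{
	"Not enough funds to cover this transaction",
	"unexpected EOF",
}

// shouldAutoRetry reports whether a stored failure reason contains one of the
// retryable patterns.
func shouldAutoRetry(failureReason string) bool {
	for _, m := range retryableMessages {
		if strings.Contains(failureReason, m) {
			return true
		}
	}
	return false
}
```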

Fix "abandoned" status syncs

Looks like a bunch of youtube syncs were set to abandoned status prematurely because they were not rewards approved. Not sure how to best handle this yet - but we may have some YT sync users that get set to rewards disabled initially, and then set to approved. Some of these were synced, and then set to abandoned.
