lbryio / ytsync
An overly complex tool to mirror youtube content to LBRY
Home Page: https://lbry.com/youtube
License: MIT License
New error as of a few weeks ago, affecting 30-50 videos per day:
```
download error: ERROR: unable to download video data: <urlopen error [Errno 110] Connection timed out>
```
We want to support publishing to channels that are transferred to their owners so that they don't have the burden of doing it themselves yet still own the content.
Right now the channel won't get reprocessed, and live streams/premieres won't be uploaded if they failed because the stream was not yet live or was still in progress. The only time they are reprocessed is when a new video is published.
This could be done as part of #25
@kauffj commented on Tue May 15 2018
Some publishers do not wish to retain copyright or wish to publish under creative commons or similar.
If this information is available via YouTube, we should match it.
If not, we should ask but provide a sane default (e.g. it could be on the lbry.io/youtube/token status page).
Steps:
Read status.license from the YouTube API (https://developers.google.com/youtube/v3/docs/videos#status.license).
If the license is creativeCommon, publish under the Creative Commons license (check which one YouTube uses and use the same).
If it is youtube, do whatever we do now.

This still happens for some reason, and other times it's blocked: https://lbry.tv/@Crypto-Tips:b/will-alts-outperform-btc-gen-z-rebellion:1
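A sketch of that license mapping (the claim license strings here are assumptions, not what ytsync actually writes):

```go
package main

import "fmt"

// mapLicense translates the YouTube API's status.license value into the
// license string attached to the LBRY claim. YouTube only distinguishes
// "youtube" (standard license) and "creativeCommon". The returned claim
// strings are illustrative placeholders.
func mapLicense(ytLicense string) string {
	switch ytLicense {
	case "creativeCommon":
		// YouTube's Creative Commons upload option is Attribution (CC BY).
		return "Creative Commons Attribution (CC BY)"
	default:
		// "youtube" or anything unexpected: keep current behavior.
		return "Copyrighted (contact publisher)"
	}
}

func main() {
	fmt.Println(mapLicense("creativeCommon"))
	fmt.Println(mapLicense("youtube"))
}
```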
This would involve:
If there are circular references in video descriptions, I do not see a way to do this without performing an additional edit on some publishes.
If a video is premiered or live streamed, it ends up in a failed status with "download error: no compatible format available for this video". I'm guessing they come in through the API but aren't actually available yet. We need to reprocess these automatically, or skip them until they are available (I think the API should be able to tell us).
Some examples:
https://www.youtube.com/watch?v=drmXDZ7Jad4 (premiered)
https://www.youtube.com/watch?v=7fd_LLYIauM (live)
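A minimal pre-check along those lines, assuming we key off the snippet.liveBroadcastContent field the YouTube Data API returns ("none", "live", or "upcoming"):

```go
package main

import "fmt"

// shouldDefer reports whether a video should be skipped for now and
// reprocessed later, based on snippet.liveBroadcastContent from the
// YouTube Data API. Only "none" means the VOD is actually downloadable;
// "live" and "upcoming" cover in-progress streams and premieres.
func shouldDefer(liveBroadcastContent string) bool {
	switch liveBroadcastContent {
	case "live", "upcoming":
		return true // stream in progress or premiere not yet available
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldDefer("upcoming")) // premiere: defer
	fmt.Println(shouldDefer("none"))     // normal VOD: process now
}
```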
See this page as an example.
```
mogrify -trim -bordercolor black -fuzz x%
```
This ImageMagick command may get you most of the way there, but it's worth running it over a decent sample set to ensure it doesn't fail on images that are already pretty black.
There are a bunch of videos in failed state with "publish error: Error in daemon: Not enough funds to cover this transaction." / "download error: unexpected EOF" (not sure why this one is happening)
At least for popular channels (either by YT status, IAPI subs, or both)
For some time period over the last two weeks, videos from ytsync
were malformed, reducing their streamability.
Requires similar issues to be completed on types/lbrysdk as #18
Not sure if we want to start publishing with new metadata as soon as it's available or only once the full set is ready (if we publish immediately, we'll need to update the claims later with the remaining metadata).
From the download file, grab the filename and file size to populate name/size in lbryio/types#9.
Grab video length from the YouTube api (or from lbry-sdk process if that's easier) to populate length in lbryio/types#9
Grab published date from YouTube api to populate releaseTime in lbryio/types#13
Grab category/tags from YouTube api to populate in lbryio/types#15 or lbryio/types#16
Grab language from YouTube api and populate language on claim metadata
Research other relevant youtube API data that may require new types/sdk entries.
Store any of these fields in the synced videos table as needed
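The steps above could be collected into something like this before writing to the synced videos table (field names are illustrative, not the real schema):

```go
package main

import (
	"fmt"
	"time"
)

// videoMetadata gathers the fields the steps above pull from the download
// file and the YouTube API before publishing. This is a sketch; the real
// shapes live in the lbryio/types protos and the synced videos table.
type videoMetadata struct {
	FileName    string        // from the downloaded file
	FileSize    int64         // bytes, from the downloaded file
	Duration    time.Duration // from the YouTube API or lbry-sdk
	ReleaseTime time.Time     // YouTube publishedAt -> releaseTime
	Tags        []string      // YouTube category/tags
	Language    string        // YouTube default language
}

func main() {
	m := videoMetadata{
		FileName: "some-video.mp4",
		FileSize: 1 << 20,
		Language: "en",
	}
	fmt.Printf("%s (%d bytes, lang=%s)\n", m.FileName, m.FileSize, m.Language)
}
```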
YT sync channels go into failed mode on API hiccups or for other temporary reasons. We should retry these failed channels automatically.
videos published before transfer are not marked as transferred, which causes those videos to fail during transfer. This happens if there are new videos after we got an address/pubkey
the support list needs to use the default account ID, otherwise it will try to abandon supports for transferred videos and fail to sync
if the process breaks / needs a restart, supports need to be resent manually
571b35609b90cffceafee38242a6c48f9499c276 had duplicate synced_videos, so the counts were wrong
when we try to delete some content after transfer, things fail - https://lbryians.slack.com/archives/D5W0D8ZJP/p1569617590246000
As part of creator partnerships @robvsmith is forming, creators are going to be embedding LBRY URLs directly in YouTube descriptions.
Preferably, creators would be able to know the LBRY URL of that video at the time of publishing to YouTube (and thus before picked up by sync).
Potential solutions to this should be discussed and reviewed before implementation.
Support transfer of claims and channels for YouTubers who have requested their content to be sent to their local wallets.
Main features:
@lyoshenka commented on Fri Mar 16 2018
I believe he has about 2 or 3 channels from old syncs - we need to fix this up.
See https://github.com/lbryio/lbry-redux/blob/362d764c4c0de23032b6871b4d54207dc548028c/src/lbryURI.js#L7 for what the app / sdk are using
@nikooo777 how feasible would it be to allow @robvsmith and other growth team members to edit YouTube publishes and channels? Ideally via the same UI in app.
Right now we have a pull-type service where the channel needs to be processed to determine if there are newly published videos. It can take anywhere from 1 minute to 24 hours.
@tzarebczan commented on Tue Jul 24 2018
In order to support existing LBRY creators who migrate to YouTube Sync (non-public/internal api issue - https://github.com/lbryio/internal-apis/issues/471), we would need to capture additional data like their channel signing certificate (needs to be transferred/stored securely) and wallet address to start publishing their content automatically.
@alyssaoc we may want to create an epic (or turn this issue into an epic) if and when we are ready to spec it out.
Looks like a bunch of youtube syncs were set to abandoned status prematurely because they were not rewards approved. Not sure how to best handle this yet - but we may have some YT sync users that get set to rewards disabled initially, and then set to approved. Some of these were synced, and then set to abandoned.
@nikooo777 commented on Wed Oct 03 2018
We want to start using the new daemon.
To do that, we need to figure out which API calls changed and fix them.
https://www.youtube.com/watch?v=qJ7YuscmPsk
lbry://why-i-m-quitting-online-coaching#559a41f0602288a8d130324c6541d26c2f689d3c
Maybe we are trying to download the video while it's still processing? Does the api tell us?
There are about 26 videos (one for a channel I was recently looking into), with this error: thumbnail error: error creating thumbnail: Failed retrieving thumbnail with status code: 404
Here are 2 samples: https://www.youtube.com/watch?v=JXxhafcHF4Y
https://www.youtube.com/watch?v=kKXqxsXv5sw
This will serve as the ytsync issue to implement the below and an epic to track the required types/sdk changes. When the sdk tickets are filed, I'll update it.
Setup a process to perform claim updates, preferably in batch mode
(lbryio/lbry-sdk#1821). If easy update mode is available (lbryio/lbry-sdk#1423), use it, if not, read from existing claim data. This also includes grabbing the sdblob for each claim in order to populate some of the new file related fields.
From the sdblob, grab the filename and file size to populate name/size in lbryio/types#9.
Grab video length from the YouTube api to populate length in lbryio/types#9
Grab published date from YouTube api to populate releaseTime in lbryio/types#13
Grab category/tags from YouTube api to populate in lbryio/types#15 or lbryio/types#16
Grab language from YouTube api and populate language on claim metadata
Research other relevant youtube API data that may require new types/sdk entries.
@nikooo777 commented on Wed Jul 04 2018
Some content publishers want to have their videos published for a price.
For this reason ytsync must fetch the following parameters from the API: fee_amount, fee_currency, and fee_address, and then supply them when calling the publish method.
This issue depends on the implementation on the API side first.
@tzarebczan commented on Tue Jul 31 2018
We recently synced some content without a price for a creator who was promised we'd sync at 1 LBC.
@nikooo777 commented on Thu Aug 02 2018
and I agree that's a problem. I will try to get this in my next sprint
@kauffj commented on Tue May 22 2018
This is not a good choice long-term. Ideally it would be spee.ch.
@nikooo777 commented on Tue Oct 02 2018
I'd like to have a discussion on this.
I disagree that we should use spee.ch for thumbnails, but I agree that we should use a better domain in place of berk.ninja
I can discuss this at the office with Jeremy, but my TL;DR would be: if we're not doing it in a completely decentralized way, then we should just stick to a centralized solution that we know works fine and will keep working fine.
Using spee.ch would only add a burden: for each single claim we'd need a second claim just for the thumbnail, polluting the blockchain with content that is really just metadata of another claim. On top of that, we'd depend on spee.ch being online and working forever (both the domain and the infrastructure), rather than just worrying about the domain and the data being on S3 (easily moved/replaced).
I don't see this solution scaling well, and it could become a huge problem in the future, so either we bundle the thumbnail with the claim (so that the thumbnail itself is part of the associated blobs) or we use the centralized solution.
edit: this ended up not being a tl;dr....
@lyoshenka commented on Wed Oct 03 2018
I agree with niko - putting this on spee.ch means we're making two claims for each upload. It also means if spee.ch goes down, all of the thumbnails will break. Are we committed to spee.ch being up with some SLA? I thought we still use it for testing things, running advanced SDK builds, etc.
Currently, the duplicate checker may modify claims incorrectly if the YouTuber were to edit/publish/abandon some claims locally. We should be able to send them their wallets and continue publishing from YT.
The bitrate on the 720p exports is quite low (~600 kbps), so we should consider moving to 1080p if it's better and the file size is not that much larger.
So that we can identify where creators come from and have the right people talk to them
We had discussed the possibility of ensuring that YouTube channel lbry URLs (at least for channels with X amount of subs) resolve at their vanity names. This recently came up as a source of confusion on Twitter: https://twitter.com/eevblog/status/1080609595289042945
This shouldn't be too hard to set up as part of the YT sync process.
A video may fail to sync for many different reasons and may not be retried until a new video is published or the creator complains to us. We should put all the "legit" failures into a queue and have at least one server reprocess them.
This would also let us confirm at a glance that a channel or video is in the queue.
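A sketch of how failures might be classified as retryable for that queue, using error strings seen elsewhere in this tracker (the actual list would be an assumption to refine in config):

```go
package main

import (
	"fmt"
	"strings"
)

// isRetryable decides whether a failure reason is a "legit" transient
// failure that belongs in the retry queue. The matched substrings are
// examples pulled from errors reported in this tracker.
func isRetryable(reason string) bool {
	transient := []string{
		"Connection timed out",
		"unexpected EOF",
		"Not enough funds",
		"no compatible format available",
	}
	for _, s := range transient {
		if strings.Contains(reason, s) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isRetryable("download error: unexpected EOF"))
	fmt.Println(isRetryable("video removed by uploader"))
}
```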
See https://github.com/lbryio/lbry-sdk/releases/tag/v0.44.0
breaking API change for all list commands, which now always paginate - for a simple upgrade in client code without implementing pagination logic, use page: 1, page_size: 1000 (or similar) and just extract the list from items
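Under that scheme, a client-side shim can request one large page and unwrap items. The envelope field names follow the release notes; the RPC transport itself is mocked here:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pagedResult is the paginated envelope all *_list commands return since
// v0.44.0 (field names per the release notes).
type pagedResult struct {
	Items      []json.RawMessage `json:"items"`
	Page       int               `json:"page"`
	TotalPages int               `json:"total_pages"`
}

// unwrapItems extracts the item list from a paginated response, for
// callers that request one big page (page: 1, page_size: 1000) instead of
// implementing real pagination.
func unwrapItems(raw []byte) ([]json.RawMessage, error) {
	var r pagedResult
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	return r.Items, nil
}

func main() {
	// Mocked response; real code would send {"page": 1, "page_size": 1000}.
	resp := []byte(`{"items":[{"claim_id":"abc"}],"page":1,"total_pages":1}`)
	items, err := unwrapItems(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(items))
}
```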
@tzarebczan commented on Tue Sep 11 2018
I think we may sometimes run into a scenario where we sync a video before it is post-processed in 720P+ on YouTube so we end up syncing a lower quality version. We should check when the video was posted, and if it's very recent (not sure what the timing for post-processing looks like), delay download/publish until the next iteration.
This could be happening due to a large # of tags with a combination of a long description (I know we truncate)
https://lbryians.slack.com/archives/CACSTN9SL/p1564663088289500
```
the transaction was rejected by network rules.

16: bad-txns-claimscriptsize-toolarge
```
[raw hex of the rejected transaction, containing the oversized claim script, omitted]
@nikooo777 commented on Thu Aug 02 2018
A few dirty things were merged in master to not leave the PR hanging any longer. They should be addressed:
@alyssaoc commented on Thu Nov 01 2018
Issue moved to lbryio/ytsync #4 via ZenHub
This way we won't sync something we didn't intend to.
don't fail channels when we run out of quota; instead wait until 00:00 PT (when the quota resets) and retry
The SDK part of this is defined here: lbryio/lbry-sdk#2879
The changes in ytsync would be: call stream_create/stream_update with --file_path_high_def passing the 1080p file and --file_path_low_def passing the 480p file.
https://lbryians.slack.com/archives/CACSTN9SL/p1564664667305700
Should this say 0? From other things I read, it should have a value of 10.
Believe we should be able to sync this legally by attributing the correct copyright.
https://www.youtube.com/user/khanacademy/videos
https://www.khanacademy.org/about/tos
"7.5 Crediting Khan Academy. If You distribute, publicly perform or display, transmit, publish, or otherwise make available any Licensed Educational Content or any derivative works thereof, You must also provide the following notice prominently along with such Licensed Educational Content or derivative work thereof: “All Khan Academy content is available for free at www.khanacademy.org”.
We ran into an issue recently where a bunch of existing channels were not being synced because they were in failed status and also because of some server issues. We should be able to re-try sync in some of these scenarios and provide better errors/notifications when things stop processing server side.
Main error to address:
```
Initial wallet setup failed! Manual Intervention is required.: channel_claim_id: cannot be blank.
```
Today, a channel in "deleted on youtube" status cannot be transferred. There are a couple of people in this boat requesting their channels.