
Nature Poster

Developed by Logan Maupin

Nature Poster is an automatic Facebook posting bot I developed with Python as a side project.

If you have any questions or suggestions on how to improve the code, feel free to reach out to me!

Link to the Facebook page Nature Poster posts to

🌱 🌲 🌿 🌳


What is Nature Poster?

Facebook page for Beautiful Worlds

Nature Poster is an automatic bot scripted to periodically make nature-themed photography and video posts to the Beautiful Worlds Facebook account.

Example posts:

  • Photograph of a snowy tree branch

Closeup photo of a tree branch with snow

  • Video of a fish and a turtle near coral

Fish and turtle under the sea

The posts include a short, simple description of the image or video and a URL link to the original content source. All images and videos are sourced exclusively from Pexels, a royalty-free site for high-resolution stock media.

How does it work?

Pexels API GET Request

Nature Poster searches for and retrieves photos/videos via GET requests to the Pexels API.

  • Example endpoint with arbitrarily selected query:
GET https://api.pexels.com/v1/search?query=sunset&per_page=15

The script uses the pexels_api Python client library to make requests.

Request parameters are configured so that each API response returns a single page of 15 photos or videos. If no result from the initial batch converts to a successful Facebook post, the script can request further pages.
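As a sketch, the search URL above can be assembled with the standard library. The endpoint and query parameters come from the Pexels API; the helper name is hypothetical, and authentication (an Authorization header with your API key) is omitted:

```python
from urllib.parse import urlencode

PEXELS_SEARCH_URL = "https://api.pexels.com/v1/search"

def build_search_url(query, page=1, per_page=15):
    """Build the Pexels photo-search URL. The actual request also needs an
    Authorization header carrying the API key (not shown here)."""
    params = urlencode({"query": query, "per_page": per_page, "page": page})
    return f"{PEXELS_SEARCH_URL}?{params}"

# build_search_url("sunset") ->
# "https://api.pexels.com/v1/search?query=sunset&per_page=15&page=1"
```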

  • Example response for single photo:
{
  "id": 2014422,
  "width": 3024,
  "height": 3024,
  "url": "https://www.pexels.com/photo/brown-rocks-during-golden-hour-2014422/",
  "photographer": "Joey Farina",
  "photographer_url": "https://www.pexels.com/@joey",
  "photographer_id": 680589,
  "avg_color": "#978E82",
  "src": {
    "original": "https://images.pexels.com/photos/2014422/pexels-photo-2014422.jpeg",
    "large2x": "https://images.pexels.com/photos/2014422/pexels-photo-2014422.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=650&w=940",
    "large": "https://images.pexels.com/photos/2014422/pexels-photo-2014422.jpeg?auto=compress&cs=tinysrgb&h=650&w=940",
    "medium": "https://images.pexels.com/photos/2014422/pexels-photo-2014422.jpeg?auto=compress&cs=tinysrgb&h=350",
    "small": "https://images.pexels.com/photos/2014422/pexels-photo-2014422.jpeg?auto=compress&cs=tinysrgb&h=130",
    "portrait": "https://images.pexels.com/photos/2014422/pexels-photo-2014422.jpeg?auto=compress&cs=tinysrgb&fit=crop&h=1200&w=800",
    "landscape": "https://images.pexels.com/photos/2014422/pexels-photo-2014422.jpeg?auto=compress&cs=tinysrgb&fit=crop&h=627&w=1200",
    "tiny": "https://images.pexels.com/photos/2014422/pexels-photo-2014422.jpeg?auto=compress&cs=tinysrgb&dpr=1&fit=crop&h=200&w=280"
  },
  "liked": false,
  "alt": "Brown Rocks During Golden Hour"
}

A query to the search endpoint returns an object containing a list of 15 photo objects like the one above.

Selecting a search term

The bot randomly selects a search term from a list of nature-related terms in a SQLite3 database table named Photo_Search_Terms.

SQLite3 table GUI first ten rows in Photo_Search_Terms
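The random pick can be delegated to SQLite itself. In this sketch the table name follows the README, while the column name `term` is an assumption for illustration:

```python
import sqlite3

# In-memory stand-in for the bot's database; the table name follows the
# README, but the "term" column name is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Photo_Search_Terms (term TEXT)")
conn.executemany(
    "INSERT INTO Photo_Search_Terms (term) VALUES (?)",
    [("forest",), ("waterfall",), ("mountain lake",), ("sunset",)],
)

# Let SQLite do the random selection instead of loading every row into Python.
search_term = conn.execute(
    "SELECT term FROM Photo_Search_Terms ORDER BY RANDOM() LIMIT 1"
).fetchone()[0]
```

Using `ORDER BY RANDOM() LIMIT 1` keeps the selection inside the database, which stays cheap for a table of this size.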

Processing results and criteria

Once the list of images or videos has been fetched from the API, the bot loops through it. For each item, it cycles through the criteria in sequence to determine whether the image or video meets the requirements to be posted.

Photo processing is accomplished via a method belonging to the Pexels_Photo_Processing class in the Nature_Poster_Photos module. Video processing is accomplished via a method belonging to the Pexels_Video_Posting class in the Nature_Poster_Videos module.

At the first failed criterion, the bot discards the image or video under consideration, and the loop shifts to the next item.

These criteria include:

  • Photo or video description parsed from URL cannot contain a prohibited word from the "bad words" list stored in our database. A bad word for the bot is not only an inappropriate or curse word, but also a word like "man", because that would indicate the photo or video has a focus other than something nature-related.
  • Photo itself cannot contain a prohibited word either as a caption or displayed on any object, structure, or item of apparel, for the same two reasons mentioned in the above criterion.
  • The file must have an acceptable extension (e.g., jpg or png for photos).
  • Photo or video cannot be a duplicate post; it must not already be in either the database table Nature_Bot_Logged_FB_Posts or Nature_Bot_Logged_FB_Posts_Videos.
  • Hash string of downloaded image must not already be in the database. This criterion is necessary because the same photo can be reposted on Pexels with a different ID.
  • Video file size must be smaller than 1 GB, and photo file size must be smaller than 4 MB.
  • Video duration must be shorter than 20 minutes, to comply with Facebook post limitations.
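The criteria above amount to a loop with early rejection. The predicate names, fields, and sample checks below are simplified stand-ins for the project's real methods:

```python
# Hypothetical stand-ins for the bot's real checks; each returns True
# when the candidate fails that criterion.
def has_bad_word(description):
    return "man" in description.split()

def is_duplicate(media_id, seen_ids):
    return media_id in seen_ids

def too_large(size_bytes, limit):
    return size_bytes >= limit

PHOTO_SIZE_LIMIT = 4 * 1024 * 1024  # 4 MB, per the criteria above

def first_postable(candidates, seen_ids):
    """Return the first candidate that survives every criterion, else None."""
    for photo in candidates:
        if has_bad_word(photo["alt"]):
            continue  # failed: prohibited word in description
        if is_duplicate(photo["id"], seen_ids):
            continue  # failed: already posted
        if too_large(photo["size"], PHOTO_SIZE_LIMIT):
            continue  # failed: file too big
        return photo
    return None
```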

An example of an image with text that would be picked up by OCR: Storefront image with neon lettering

Optical Character Recognition

OCR is a process carried out to detect and convert text in an image to an easily usable data type like a string.

The bot integrates the Pytesseract OCR package and the OpenCV (cv2) computer vision package, which work in conjunction to detect characters in our images.

cv2 functions are used to load an image from a file, then convert that image to a different color space. This second step is key.

Lighting conditions are an extremely important factor in computer vision and are usually the difference between OCR algorithms failing or succeeding to detect characters.

Our OCR image-processing result is a string whose words are then checked against the prohibited-word list, e.g., "man", "woman", "car", etc.

image_text = Image_Processing.ocr_text("image.jpg")

if Text_Processing.there_are_badwords(image_text, bad_words_list):
    continue
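As a hedged sketch of what such a check might look like (the project's actual `Text_Processing.there_are_badwords` may differ), matching whole words avoids false positives such as rejecting "mangrove" because it contains "man":

```python
import re

def there_are_badwords(text, bad_words):
    """Return True if any whole word in text appears in the bad-word list.

    Hypothetical re-implementation for illustration: tokenizing into whole
    words avoids substring false positives like "man" inside "mangrove".
    """
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in bad_words for word in words)
```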

Image hashing

Image hashing is the process of creating a hash value (a fixed length output code string mapped from some input) based on the visual data of an image.

Images that look nearly identical will have the same (or a very similar) hash.

The bot uses an image hashing library to create a hash for every successful and discarded image. Hashes are stored alongside other post data for a successful post in the same database table row. They are used for comparison with hashes of images under review as one of two measures to prevent duplicates.

Example of an image hash being generated:

return str(imagehash.dhash(Image.open(filename)))
"9f172786e71f1e00"
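For illustration, a difference hash (dHash) like the one above is computed by shrinking the image to a 9x8 grayscale grid and comparing each pixel with its right-hand neighbour. A minimal pure-Python sketch of that core step, assuming the resize and grayscale conversion have already happened:

```python
def dhash_from_grid(pixels):
    """Difference hash of a 9x8 grayscale grid (rows of 9 ints, 0-255).

    Each bit records whether a pixel is brighter than its right neighbour,
    mirroring what imagehash.dhash computes after resizing the image to 9x8.
    Yields 8 rows x 8 comparisons = 64 bits = 16 hex characters.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return f"{bits:016x}"
```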

Posting to Facebook

If an image or video candidate survives all the foregoing criteria, then the bot attempts to post it to Facebook with a network request to the Facebook API.

If the attempt fails, the bot moves on to the next image or video and processes it through the same sequence of relevant filtering criteria.

The script allows five attempts before stopping the loop through the media list. This limit is a crucial fail-safe: it ensures the script does not become trapped in an endless retry cycle if Facebook's servers are unresponsive due to an ongoing issue.
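A minimal sketch of that fail-safe, with hypothetical names:

```python
MAX_ATTEMPTS = 5  # stop looping if Facebook keeps rejecting posts

def post_with_failsafe(candidates, try_post):
    """Attempt to post candidates in order; give up after MAX_ATTEMPTS failures.

    try_post is a callable returning True on a successful Facebook post.
    Hypothetical sketch of the fail-safe described above.
    """
    attempts = 0
    for media in candidates:
        if attempts >= MAX_ATTEMPTS:
            break  # fail-safe: don't loop forever if Facebook is down
        if try_post(media):
            return media
        attempts += 1
    return None
```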

Creating the caption

After a successful media post to Facebook, the bot still needs to programmatically construct the post caption, including a description, a Pexels URL, and a P.S. section with a link to the bot's GitHub repository.

The photo and caption on a Facebook post cannot be created simultaneously with a single POST request. For this reason, the bot must first create the image post, then edit and append the caption.

# Edit the caption of the existing Facebook post we just made
fb.put_object(
    parent_object=f'{fb_page_id}_{post_id}',
    connection_name='',
    message=f'Description: {photo_description}\n\n'
            f'Pexels image link: {photo_permalink}\n\n'
            f'P.S. This Facebook post was created by a bot. To learn more about how it works, '
            f'check out the GitHub page here: {GitHub_Link}'
)
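The caption string itself can be assembled separately; a small helper mirroring the message above (the function name is hypothetical):

```python
def build_caption(photo_description, photo_permalink, github_link):
    """Assemble the three-part caption: description, Pexels link, P.S. section."""
    return (
        f"Description: {photo_description}\n\n"
        f"Pexels image link: {photo_permalink}\n\n"
        f"P.S. This Facebook post was created by a bot. To learn more about "
        f"how it works, check out the GitHub page here: {github_link}"
    )
```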

Another example caption: Example caption for successful photo post

Inserting data into sqlite3 db

With the post officially complete, the bot then inserts metadata about it (media description, media URL, hash string, file size, Facebook post ID, etc.) into the Nature_Bot_Logged_FB_Posts or Nature_Bot_Logged_FB_Posts_Videos table.

Post data recorded in these tables will be checked against in subsequent post attempts to prevent duplicates.
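A minimal sketch of the logging and duplicate check, using an in-memory database and a reduced schema (the real tables store more columns):

```python
import sqlite3

# Reduced stand-in for the bot's logging table; the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Nature_Bot_Logged_FB_Posts "
    "(pexels_id INTEGER PRIMARY KEY, image_hash TEXT, fb_post_id TEXT)"
)

def log_post(pexels_id, image_hash, fb_post_id):
    conn.execute(
        "INSERT INTO Nature_Bot_Logged_FB_Posts VALUES (?, ?, ?)",
        (pexels_id, image_hash, fb_post_id),
    )

def already_posted(pexels_id, image_hash):
    """Duplicate check: match on either the Pexels ID or the image hash,
    since the same photo can be reuploaded to Pexels under a new ID."""
    row = conn.execute(
        "SELECT 1 FROM Nature_Bot_Logged_FB_Posts "
        "WHERE pexels_id = ? OR image_hash = ?",
        (pexels_id, image_hash),
    ).fetchone()
    return row is not None

log_post(2014422, "9f172786e71f1e00", "12345_67890")
```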


Issues and areas for improvement

  • The script does not keep a record of checked and disqualified photos or videos. Therefore these same media will be checked again in the future if the exact search term is selected once more. This is obviously inefficient. A solution might be to store the IDs of encountered and rejected media somewhere in the database, though this might become too storage intensive.

  • If the bot attempts to post a qualified photo or video and Facebook is down for some reason, then the photo or video will be discarded. The bot will not save it for a resumed attempt once Facebook server issues are resolved.

  • There is presently no way of ascertaining whether a photo or video post has been manually deleted from Facebook. Manual deletion would happen when the bot made a post whose image or video did not fit the nature theme. This information would be important for training a machine learning model, since it would provide data on which media were deleted and allow decisions based on that dataset. The only potential solution we can think of at the moment would be too impractical.


nature-poster's Issues

Stop Duplicate Videos from posting

So, we have a solution for this with images by use of image hashing. However, this is not as viable for videos, because a video is just a collection of a crap ton of images, and it may not be feasible to hash every single frame and store that list in a database. Despite this, there are two possible solutions I've already seen for this problem, both of which have obvious flaws.

The first solution is the video hash library. It doesn't do well with shorter videos (which most of the posted videos are), and if the video is time-shifted at all, it also won't do as well at finding duplicates.

The second solution is video fingerprinting, which is basically the same thing with a slightly different approach, but again mostly the same problem with time-shifting.

Image hashing doesn't just look for perfect duplicates; it can find similar images as well. So if the video is shifted by only a few frames, it might still work to take a hash every second and compare it against the other video's per-second hashes. Still, this may not be a viable strategy on its own, so we can't fully rely on it.

Here's the issue: it's too costly and inefficient to hash every single frame of a video, store that in the database, and then compare every hashed frame of the target video against the thousands of other hash strings in the database.

Ideally you'd only hash a few frames of the video, but this doesn't account for the time-shifting problem: if the uploader adds, say, a 5.23-second intro, all the frames are offset and no longer line up when hashed, defeating the duplicate check.

If we could just hash every single frame, the problem would be solved. But even if you scale each frame down to a tiny resolution and hash all of them, it will still take a metric crap ton of time to hash one video, let alone hundreds. Not to mention, storing all those hash strings would increase the database size by something like 1000x, which is a very costly change for such a minor gain.

If anyone has a better solution to this problem, please let me know as I continue to research it. I'm even looking into some advanced mathematical algorithms and even machine learning for a solution to this problem.

The solution to this problem could not only work for this project but could be applied to any other video hosting / analysis service out there. So it seems highly sought after.
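The per-second hashing idea discussed above could be sketched as comparing two hash sequences with a Hamming-distance tolerance. Everything here is hypothetical, and note that a time shift would still break the frame alignment this relies on:

```python
def hamming(h1, h2):
    """Bitwise Hamming distance between two equal-length hex hash strings."""
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")

def videos_match(hashes_a, hashes_b, bit_tolerance=6, min_fraction=0.8):
    """Treat two videos as duplicates if most per-second hashes are near-identical.

    Hypothetical sketch: hashes_a and hashes_b hold one perceptual hash per
    second of video. A tolerance on the bit distance lets re-encoded copies
    still match, but a time shift misaligns the pairs, which is the weakness
    discussed above.
    """
    pairs = list(zip(hashes_a, hashes_b))
    if not pairs:
        return False
    close = sum(1 for a, b in pairs if hamming(a, b) <= bit_tolerance)
    return close / len(pairs) >= min_fraction
```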

Using AI to parse captions to determine whether posts are on-topic or not

It would be cool to use a library like GPT4All or G4F to run the image captions through a language model to determine, solely from the caption, whether a post is on topic. I can't get this to work fully, but at the very least I know it's relatively possible. I've already written most of the code for it but can't quite get it working. If any of you find a solution sooner, that would be cool!
