
lemmy-explorer's Introduction


Data Dumps: https://data.lemmyverse.net/

This project provides a simple way to explore Lemmy Instances and Communities.

List of Communities

The project consists of four modules:

  1. Crawler (NodeJS, Redis) /crawler
  2. Frontend (ReactJS, MUI Joy, TanStack) /frontend
  3. Deploy (Amazon CDK v2) /cdk
  4. Data Site (GitHub Pages) /pages

FAQ

Q: How can I set a link to automatically set the home instance?

You can append home_url and (optionally) home_type to the URL to set the home instance and type.

?home_url=lemmy.example.com
?home_url=kbin.example.com&home_type=kbin

home_type supports "lemmy" and "kbin" (default is "lemmy")
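As a rough sketch, a link with these parameters could be built as follows. The helper name `buildHomeLink` and the use of https://lemmyverse.net/ as the base are assumptions for illustration; only the `home_url` and `home_type` parameters come from the description above.

```javascript
// Hypothetical helper: build a link that preselects a home instance.
function buildHomeLink(homeUrl, homeType = "lemmy", base = "https://lemmyverse.net/") {
  if (homeType !== "lemmy" && homeType !== "kbin") {
    throw new Error(`unsupported home_type: ${homeType}`);
  }
  const url = new URL(base);
  url.searchParams.set("home_url", homeUrl);
  // home_type is optional and defaults to "lemmy", so only add it for kbin.
  if (homeType !== "lemmy") {
    url.searchParams.set("home_type", homeType);
  }
  return url.toString();
}
```

For example, `buildHomeLink("kbin.example.com", "kbin")` yields a URL ending in `?home_url=kbin.example.com&home_type=kbin`.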

Q: How does discovery work?

It uses a seed list of communities and scans the equivalent of the /instances federation lists, and then creates jobs to scan each of those servers.

Additionally, instance tags and trust data are fetched from Fediseer.
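The discovery process can be pictured as a breadth-first walk over federation lists. The sketch below is not the crawler's actual code: `discoverInstances` and the `getLinked` callback are hypothetical, and the federation data is an in-memory stand-in for the lists a real crawler would fetch over the network.

```javascript
// Breadth-first discovery over federation lists.
// `getLinked(host)` is a hypothetical callback returning the hosts that `host` federates with.
function discoverInstances(seeds, getLinked) {
  const seen = new Set(seeds);
  const queue = [...seeds];
  while (queue.length > 0) {
    const host = queue.shift();
    for (const linked of getLinked(host)) {
      if (!seen.has(linked)) {
        seen.add(linked);
        queue.push(linked); // stands in for creating a scan job for the new server
      }
    }
  }
  return [...seen];
}

// Made-up federation data for illustration only.
const federation = {
  "a.example": ["b.example", "c.example"],
  "b.example": ["a.example", "d.example"],
};
const found = discoverInstances(["a.example"], (host) => federation[host] ?? []);
```

Here `d.example` is never in the seed list but is still found, because `b.example` lists it.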

Q: How does the NSFW filter work?

The NSFW filter is a client-side filter that hides NSFW communities and instances from results by default. The "NSFW Toggle" checkbox has three states that you can toggle through:

State      | Filter       | Value
Default    | Hide NSFW    | false
One Click  | Include NSFW | null
Two Clicks | NSFW Only    | true

When you try to switch to a non-SFW state, a popup will appear to confirm your choice. You can save your response in your browser's cache and it will be remembered.
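A minimal sketch of how such a tri-state filter might behave is shown below. The values `false`, `null` and `true` come from the table above; the `nsfw` field on each item, the function names, and the assumption that a third click returns to the default state are illustrative only.

```javascript
// Apply the tri-state NSFW filter value to a list of items.
function applyNsfwFilter(items, filterValue) {
  if (filterValue === false) return items.filter((item) => !item.nsfw); // Hide NSFW
  if (filterValue === true) return items.filter((item) => item.nsfw); // NSFW Only
  return items; // null: Include NSFW
}

// Cycle Default -> One Click -> Two Clicks -> Default (the wrap-around is an assumption).
function nextNsfwState(filterValue) {
  if (filterValue === false) return null;
  if (filterValue === null) return true;
  return false;
}
```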

Q: How long till my instance shows up?

How long it takes to discover a new instance varies. It depends on whether content you post is picked up by one of the servers that are already being crawled.

Since the crawler looks at lists of federated instances, we can't discover instances that aren't on those lists.

Additionally, the lists are cached for 24 hours, so it can take up to 24 hours for an instance to show up after it has been discovered.

Q: Can I use your data in my app/website/project?

I do not own any of the data retrieved by the crawler; it is available from public endpoints on the source instances.

You are free to pull data from the GitHub pages site:

Lemmyverse Data Site

Please don't hotlink the files on the public website https://lemmyverse.net/

Q: How often is the data updated?

Currently, I upload a Redis dump generated by the crawler to S3 each night, and GitHub Actions builds the JSON dump from that.

Data is also available from the artifacts of this action. You can also download Latest ZIP (using nightly.link)

dist-json-bundle.zip file contains the data in JSON format:

  • communities.full.json - list of all communities
  • instances.full.json - list of all instances
  • overview.json - metadata and counts

Crawler

Crawler README

Frontend

Frontend README

Data Site

Data Site README

Deploy

The deploy is an Amazon CDK v2 project that deploys the crawler and frontend to AWS.

config.example.json has the configuration for the deploy.

Then run cdk deploy --all to deploy the frontend to AWS.

Similar Sites

Lemmy Stats Pages

Thanks / Related Lemmy Tools

Credits

Logo made by Andy Cuccaro (@andycuccaro) under the CC-BY-SA 4.0 license.


lemmy-explorer's Issues

Can we flag https://lemmit.online/ as a suspicious instance?

Can we flag https://lemmit.online/ as a suspicious instance? It's an instance run by a bot that mass-reposts content from Reddit, which is basically spam. I don't want to see communities from this instance while searching for sublemmies, because sublemmies on this instance are VERY LOW quality and you won't get any engagement when you interact with those communities by commenting.

Or could we at least get an option to block chosen instances ourselves, so users can decide what they don't want to see?

Communities vanishing from the browser

Hi!

There seems to be an issue with the community browser where some communities get removed although they still exist on the instance.

I am not sure if this problem is with the specific Lemmy instance, the specific community or the lemmyverse browser and I am not sure how to troubleshoot it.

The community is feddit.de/c/cozygames.
It definitely did appear on the browser before.

Other communities from the feddit instance are still able to be found with your browser. On feddit I can find the community when I look for "local" or "all" communities.

Thank you so much and I hope this is posted to the correct place. :)

banner caching

they are pretty slow to load from all these federated servers...

should process and cache the banner

  • download banner, process to correct dimensions
  • minify to jpeg etc
  • store ???

[Feature Request] Suggest communities for user

This project has a great opportunity to help with one of the most requested features for Lemmy.

Many users are saying that there should be a way to somehow combine same-name communities across instances.

  • Add new section to website that has a field for @[email protected] or https://instance.tld/u/username
  • get / scrape communities that user is subscribed to
  • return list of communities with the same name on other known instances.
  • (option to hide/show already subscribed-to communities)

Something like this would be an awesome tool that would greatly help with discoverability.
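The matching step in the list above could look roughly like this. It is only a sketch: the function name, the `{ name, instance }` record shape, and the case-insensitive name comparison are all assumptions, and fetching a user's subscriptions is left out entirely.

```javascript
// Given a user's subscribed communities and all known communities,
// return known communities that share a name with a subscribed one.
function suggestSameNameCommunities(subscribed, known, { hideSubscribed = true } = {}) {
  const key = (c) => `${c.name.toLowerCase()}@${c.instance.toLowerCase()}`;
  const subscribedKeys = new Set(subscribed.map(key));
  const subscribedNames = new Set(subscribed.map((c) => c.name.toLowerCase()));
  return known.filter((c) => {
    if (!subscribedNames.has(c.name.toLowerCase())) return false;
    // Optionally hide communities the user already subscribes to.
    return !(hideSubscribed && subscribedKeys.has(key(c)));
  });
}
```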

Instance Lists

Create Lists

  • Blocked/Banned Servers (servers that appear in the blocked/banned federation lists of other servers)

  • Allowed (servers that appear in the allowed servers)

  • Default Servers (servers that appear to be trusted/big/good moderation/no spam)

Lemmy instance not showing on list

Hello! I stood up a Lemmy instance on June 25th. I’ve been checking for a few days and it hasn’t shown up here yet. Is there anything I need to do on the server for the crawler to discover it? Instance name is techy.news. Thank you!

Trending communities

It would be cool if there was something like a trending section. Trending communities could be the communities that have increased activity in the last x days. Or something like that.

Lemmy JS SDK Integration

Allow Configuring Instance/Username/Password to determine user subscriptions and allow users to one-click subscribe on their instance.

Export JSON or CSV

Is there a way to export data (instances or communities lists) to a JSON or other format file?

Sustainable Data Distribution

I want to publish:

  • Redis data dump
  • data/**.json exports

Somewhere other than Git, as they don't need long-term versioning.

I also don't really want people pulling data from CloudFront.

Currently thinking an S3 bucket for job storage, and publishing the dumps to GitHub Artifacts or Releases?

Show "Last Updated"

  • For overall site (when was the json updated) ("Last Crawled")
  • For the Individual Instances/Communities ("Data Age")

Error when filtering instances or communities

I had to open the site in incognito mode because I couldn't access it anymore because of this bug.

As soon as I start typing in the filter input field the website turns completely white (blank) and I get this error in the console:

Uncaught ReferenceError: retinstancesurn is not defined
    e https://lemmyverse.net/bundle.js:5
    T https://lemmyverse.net/bundle.js:5
    T https://lemmyverse.net/bundle.js:5
    Ki https://lemmyverse.net/bundle.js:2
    useMemo https://lemmyverse.net/bundle.js:2
    Sb https://lemmyverse.net/bundle.js:5
    xi https://lemmyverse.net/bundle.js:2
    Ls https://lemmyverse.net/bundle.js:2
    Ml https://lemmyverse.net/bundle.js:2
    yu https://lemmyverse.net/bundle.js:2
    gu https://lemmyverse.net/bundle.js:2
    vu https://lemmyverse.net/bundle.js:2
    au https://lemmyverse.net/bundle.js:2
    ou https://lemmyverse.net/bundle.js:2
    S https://lemmyverse.net/bundle.js:2
    Y https://lemmyverse.net/bundle.js:2

As soon as I got this error I was unable to revert to the site without filtering. Deleting cookies didn't help. Only opening it in an incognito tab.


Additional Overview Page

  • Show Totals (Instances/Communities)
  • Show All Fediverse service counts
  • Show Lemmy Version Breakdown
  • Show Total Comments/Posts across all instances

Crawler to Re-Try Failures

Failed instance polls should result in a re-try 1 hour later.

If they fail again they should be added to a "failed connect" table and excluded from jobs.
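A sketch of the proposed behaviour, using in-memory maps in place of the crawler's real storage. All names here (`createCrawlState`, `handlePollFailure`, `shouldPoll`) are hypothetical; only the one-hour delay and the "failed connect" exclusion come from the description above.

```javascript
const RETRY_DELAY_MS = 60 * 60 * 1000; // re-try 1 hour later

function createCrawlState() {
  return { failureCounts: new Map(), retryAt: new Map(), failedConnect: new Set() };
}

// Record a failed poll: the first failure schedules a retry, the second excludes the host.
function handlePollFailure(state, host, now) {
  const failures = (state.failureCounts.get(host) ?? 0) + 1;
  state.failureCounts.set(host, failures);
  if (failures === 1) {
    state.retryAt.set(host, now + RETRY_DELAY_MS);
    return "retry-scheduled";
  }
  state.retryAt.delete(host);
  state.failedConnect.add(host); // the "failed connect" table
  return "excluded";
}

// Hosts in the "failed connect" table are skipped when creating jobs.
function shouldPoll(state, host) {
  return !state.failedConnect.has(host);
}
```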

[Feature Request] Allow selecting of a "home" instance and show appropriate URLs to other communities

One of the biggest issues with Lemmy right now is that it's not easy to browse communities from other instances and subscribe to them in one-click. More often than not you can only browse an instance's communities directly from that instance, in which case you have to copy the [email protected] identifier and search for it on your home instance. It's quite a chore.

This site is wonderful for searching and browsing other instances and communities, but it suffers the same problem - you have to do the copy->paste->search dance to join the communities displayed.

Ideally, you should be able to pick whatever your home instance is (Or even just type in the base URL) and all of the various urls will be displayed as the https://my.home.instance/c/[email protected] format instead of just the [email protected] format.

It'd be a simple change but make finding and joining new communities 2 clicks instead of several.
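The URL rewriting requested here is small enough to sketch. The function name is hypothetical, and the `/c/name@instance` path is an assumption based on how Lemmy addresses remote communities; local communities are assumed to need no `@instance` suffix.

```javascript
// Build a link that opens a community on the user's home instance.
function communityUrlOnHome(homeInstance, communityName, sourceInstance) {
  if (homeInstance === sourceInstance) {
    // Local community: no @instance suffix needed.
    return `https://${homeInstance}/c/${communityName}`;
  }
  return `https://${homeInstance}/c/${communityName}@${sourceInstance}`;
}
```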

Lemmy Server Location

The ability to filter communities by country/server location would be ideal.

Data Ideas:

  • Perform IP Lookup on Server (CF/CDN can get classified as "Global")
  • Perform ping from crawler - Would only give "distance from Australia"
  • Use data from Feddit Explorer - They have basic ping response times available.
  • Use a service/lambda to ping servers from multiple locations

Negative search terms in community search

e.g. a search like "cats -dogs" will show communities that have "cats" in their searchable fields, filtering out communities that have "dogs" in their searchable fields.
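One possible way to implement this is to split the query into positive and negative terms and test each community's searchable text against both. The function names and the single-string `text` argument are assumptions made for this sketch.

```javascript
// Split a query such as "cats -dogs" into terms to include and terms to exclude.
function parseQuery(query) {
  const include = [];
  const exclude = [];
  for (const token of query.toLowerCase().split(/\s+/).filter(Boolean)) {
    if (token.startsWith("-") && token.length > 1) {
      exclude.push(token.slice(1));
    } else {
      include.push(token);
    }
  }
  return { include, exclude };
}

// True when the text contains every positive term and none of the negative terms.
function matchesQuery(text, query) {
  const haystack = text.toLowerCase();
  const { include, exclude } = parseQuery(query);
  return (
    include.every((term) => haystack.includes(term)) &&
    !exclude.some((term) => haystack.includes(term))
  );
}
```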

Public API

Hello! 👋 great work on Lemmy Explorer! This is really more of a question/request than an issue, but I was curious if the API is able/allowed to be used within other apps (ie as the search functionality in a lemmy client)?

Bug: Crawler Data Normalization

  • Base URLs should be transformed to lower case, with spaces removed.

  • Base URLs should contain no @; strip the @ and anything before it.

  • Before scanning a URL, ensure it has no spaces, has no https:// prefix, is not *, and is not blank.
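The normalization rules above might be combined into one helper along these lines. This is a sketch, not the crawler's code: the function name is hypothetical, and returning `null` for values that should not be scanned is an assumption.

```javascript
// Normalize a raw base URL, or return null if it should not be scanned.
function normalizeBaseUrl(raw) {
  if (typeof raw !== "string") return null;
  // Lower-case and remove all whitespace.
  let value = raw.toLowerCase().replace(/\s+/g, "");
  // Drop a leading protocol such as https://
  value = value.replace(/^https?:\/\//, "");
  // Strip the @ and anything before it.
  const at = value.lastIndexOf("@");
  if (at !== -1) value = value.slice(at + 1);
  // Reject blank values and the wildcard entry.
  if (value === "" || value === "*") return null;
  return value;
}
```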

Feature request: Setting for Lemmy instance to open

Currently, Lemmy communities open on their respective instances when clicked. It would be great if users could provide their instance of choice to open all communities in, so that clicking on [email protected] would open the URL users-lemmy-instance.tld/c/[email protected] rather than lemmy.server/c/community.

Happy to create a PR for that today or tomorrow.

page for "choose best server"

allow people to quickly navigate to an appropriate server they could join

use methods to determine if it's "good"

  • ranking in federation (not appearing on any known blocked instances)
  • ranking by active users (not top 3 by users?)
  • randomize suggested server for each visitor

[Feature Request] "bot-infested" instance detection

There is currently a huge amount of bots signing up on smaller instances, especially instances with no captcha + no e-mail verification.

It would be quite useful to be able to detect and filter for such instances on lemmyverse.net, perhaps by checking for discrepancies between user counts and post counts? Or maybe by checking for instances with massive user growths but without a similar growth in post count?
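The first heuristic suggested above (a large user count with very few posts) could be sketched as follows. The thresholds `minUsers` and `maxPostsPerUser` are made-up placeholder values, not tuned numbers, and the `{ users, posts }` shape is an assumption.

```javascript
// Flag instances whose post count is implausibly low for their user count.
function looksBotInfested(counts, { minUsers = 1000, maxPostsPerUser = 0.01 } = {}) {
  const { users, posts } = counts;
  // Small instances are not flagged; the ratio is too noisy for them.
  if (users < minUsers) return false;
  return posts / users < maxPostsPerUser;
}
```

With these placeholder thresholds, an instance with 50,000 users and 20 posts would be flagged, while one with 2,000 users and 5,000 posts would not.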

Sorting/Scoring System For Instances

Discussed in https://github.com/tgxn/lemmy-explorer/discussions/23

Originally posted by tgxn June 14, 2023
Because we need to determine if an instance is "good" there needs to be a way to score each instance based on data we have about it.

Currently, my thinking/implementation looks at the lists of federated sites, and scores each instance based on the amount of other instances that refer to it (in the linked, allowed and blocked lists).

Scoring is applied by the following rules:

Instances

      let score = 0;
      if (linkedFederation[siteBaseUrl]) {
        score += linkedFederation[siteBaseUrl];
      }
      if (allowedFederation[siteBaseUrl]) {
        score += allowedFederation[siteBaseUrl] * 2;
      }
      if (blockedFederation[siteBaseUrl]) {
        score -= blockedFederation[siteBaseUrl] * 10;
      }

Communities

Uses the same base score as instances, and then adjusts based on a posts per subscriber metric.

      let score = 0;
      if (linkedFederation[siteBaseUrl]) {
        score += linkedFederation[siteBaseUrl];
      }
      if (allowedFederation[siteBaseUrl]) {
        score += allowedFederation[siteBaseUrl] * 2;
      }
      if (blockedFederation[siteBaseUrl]) {
        score -= blockedFederation[siteBaseUrl] * 10;
      }

      // also scale the score by the subscriber count
      score = score * community.counts.subscribers;

These rules are obviously not ideal, as I'd need to run some more analysis to determine if they are tuned correctly.

I'm also thinking that it might be worthwhile to log an "uptime" or "first seen" score also to determine if it's been around/up for a while.
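To make the rules easier to experiment with, the two fragments above can be wrapped into self-contained functions. The weights (1, 2, -10) and the multiplication by subscribers are taken from the snippets; the function names, the `federation` argument shape, and the sample counts are assumptions for illustration.

```javascript
// Score an instance from the number of other instances that link, allow or block it.
function scoreInstance(siteBaseUrl, { linked = {}, allowed = {}, blocked = {} }) {
  let score = 0;
  score += linked[siteBaseUrl] ?? 0;
  score += (allowed[siteBaseUrl] ?? 0) * 2;
  score -= (blocked[siteBaseUrl] ?? 0) * 10;
  return score;
}

// Community score: the instance score scaled by the community's subscriber count.
function scoreCommunity(siteBaseUrl, community, federation) {
  return scoreInstance(siteBaseUrl, federation) * community.counts.subscribers;
}

// Made-up sample data: linked by 40 instances, allowed by 3, blocked by 2.
const sampleFederation = {
  linked: { "lemmy.example.com": 40 },
  allowed: { "lemmy.example.com": 3 },
  blocked: { "lemmy.example.com": 2 },
};
const sampleScore = scoreInstance("lemmy.example.com", sampleFederation); // 40 + 6 - 20 = 26
```

One consequence worth noting: because the community score is a product, a community on an instance with a negative score is pushed further down the more subscribers it has.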

[Feature Request] Add Universal Link to open remote communities in a user's home instance

Add a Universal Link option in the format of /c/[email protected] so that communities will automatically open in the user's home instance.

Currently there are 2 options on a community listing:

  1. A url to the community's home instance
  2. A quick copy function for ![email protected]

The first is great for locating a community's source, but bad if the community is located on an instance other than your home instance.

The second is a remedy for the first's issue, allowing a user to copy the community string so they can place it in their home instance's search field

My recommended 3rd option would skip the copy/paste step mandated by option 2 and allow the user to immediately open the remote community in their home instance. After that, the user can just press subscribe.

Bug: Search results don't reset to 1st page

If you are not on page 1 (e.g. you have browsed to page 2 of the community results) and then enter a term into the search box that produces fewer than 100 results, the screen just shows a blank result set. You can navigate back to page 1 to see the results, but it's not immediately obvious to the user what the problem is.
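One way to avoid the blank page is to clamp the current page to the last page that still has results whenever the result count changes. This is a sketch of that idea, not the project's actual fix; the page size of 100 is inferred from the report above and the function name is hypothetical.

```javascript
const PAGE_SIZE = 100; // inferred from the bug report

// Return a valid page number for the given result count.
function clampPage(page, totalResults, pageSize = PAGE_SIZE) {
  const lastPage = Math.max(1, Math.ceil(totalResults / pageSize));
  return Math.min(Math.max(1, page), lastPage);
}
```

With 37 results, a user on page 2 would be moved back to page 1; with 250 results they would stay on page 2. Simply resetting to page 1 on every new search term would also work and may be less surprising.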

