LinkCrawler

A simple C# console application that crawls a given webpage for broken image tags and hyperlinks. The results are written to one or more outputs. Currently supported outputs: console, CSV, Slack.

[Figure: example run with console output]

Why?

Because it is useful to know when a webpage you are responsible for shows broken links to its users. I have this running continuously, but you don't have to. For instance, after upgrading your CMS, changing the database schema, or migrating content, it can be relevant to know whether the change introduced broken links. Run this tool once and you will know exactly how many links are broken, where they link to, and where they are located.

Build

Clone repo 👉 open solution in Visual Studio 👉 build 👊

AppVeyor is used as CI, so when code is pushed to this repo the solution will get built and all tests will be run.

| Branch | Build status |
| --- | --- |
| develop | [build status badge] |
| master | [build status badge] |

AppSettings

| Key | Usage |
| --- | --- |
| BaseUrl | Base URL of the site to crawl |
| SuccessHttpStatusCodes | HTTP status codes that are considered "successful". Example: "1xx,2xx,302,303" |
| CheckImages | If true, `<img src="..">` tags will be checked |
| ValidUrlRegex | Regex to match valid URLs |
| Slack.WebHook.Url | URL of the Slack webhook. If empty, no message is sent to Slack |
| Slack.WebHook.Bot.Name | Custom name for the Slack bot |
| Slack.WebHook.Bot.IconEmoji | Custom emoji for the Slack bot |
| OnlyReportBrokenLinksToOutput | If true, only broken links will be reported to output |
| Slack.WebHook.Bot.MessageFormat | String format of the message that will be sent to Slack |
| Csv.FilePath | File path for the CSV file |
| Csv.Overwrite | Whether to overwrite or append (if the file exists) |
| Csv.Delimiter | Delimiter between columns in the CSV file (like ',' or ';') |
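
As a rough illustration, the keys above might be set like this. It is a minimal sketch that assumes the settings live in the standard .NET `<appSettings>` section of the application's config file. Every value is a placeholder rather than a project default, except the `SuccessHttpStatusCodes` value, which is the example given above.

```xml
<configuration>
  <appSettings>
    <!-- Placeholder values for illustration only; not the project's defaults -->
    <add key="BaseUrl" value="https://example.com" />
    <add key="SuccessHttpStatusCodes" value="1xx,2xx,302,303" />
    <add key="CheckImages" value="true" />
    <add key="OnlyReportBrokenLinksToOutput" value="true" />
    <!-- Left empty here, so no message would be sent to Slack -->
    <add key="Slack.WebHook.Url" value="" />
    <!-- Hypothetical file path -->
    <add key="Csv.FilePath" value="C:\tmp\output.csv" />
    <add key="Csv.Overwrite" value="true" />
    <add key="Csv.Delimiter" value=";" />
  </appSettings>
</configuration>
```

Keys not listed in this sketch (`ValidUrlRegex` and the Slack bot settings) would be added the same way, one `<add>` element per key.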

There is also an `<outputProviders>` section that controls which outputs are used.

Output to file

`LinkCrawler.exe >> crawl.log` will save the output to a file, appending to crawl.log if it already exists.

Output to slack

If configured correctly, the defined Slack webhook will be notified about broken links.

How I use it

I have it running as a WebJob in Azure, scheduled every 4 days. It notifies the Slack channel where the editors of the website dwell.

Creating a WebJob is simple. Just put your compiled project files (/bin/) inside a .zip, and upload it.

Schedule it.

The output of a WebJob is available because Azure saves it in log files.

Read more about Azure Webjobs: https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/

Read more about Slack incoming webhooks: https://api.slack.com/incoming-webhooks

Contributors

hmol, mgroves, niklashansen
