opengraph-net's People

Contributors

feodorfitsner, ghorsey, ghorsey-opt, perosb

opengraph-net's Issues

ParseUrlAsync() creates new HttpClient per call

public static async Task<OpenGraph> ParseUrlAsync(Uri url, string userAgent = "", bool validateSpecification = false, int timeout = 90000, CancellationToken cancellationToken = default)

Every time this method is called, a new HttpDownloader is instantiated and its GetPageAsync() is called.

public async Task<string> GetPageAsync(CancellationToken cancellationToken = default)

This method creates a new HttpClient on every call. According to the Microsoft docs, this is not best practice.

I think a new signature for ParseUrlAsync() is needed, one that takes an HttpClient or IHttpClientFactory as a parameter.
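
Until such an overload exists, one possible workaround is to download the page yourself with a long-lived HttpClient and hand the HTML to the library afterwards. The sketch below is only an illustration: SharedClientDownloader and its GetPageAsync are hypothetical names, not part of OpenGraph-Net, and it assumes the caller owns a single HttpClient (or gets one from IHttpClientFactory).

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of OpenGraph-Net: downloads a page with a
// caller-supplied HttpClient so connections are pooled and reused across calls.
public static class SharedClientDownloader
{
    public static async Task<string> GetPageAsync(
        HttpClient client,
        Uri url,
        CancellationToken cancellationToken = default)
    {
        if (client is null) throw new ArgumentNullException(nameof(client));
        if (url is null) throw new ArgumentNullException(nameof(url));

        // The client is NOT disposed here; its lifetime belongs to the caller.
        using var response = await client.GetAsync(url, cancellationToken).ConfigureAwait(false);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
    }
}
```

The returned string could then be passed to whatever HTML-parsing entry point your version of the library offers; whether one is available is something to check against the version you use.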

Security Checking of ParseUrl

Edge case issue:
Maybe this already exists but I didn't see it in the code.

Are you doing any security checking of the fields being parsed? Or should developers be doing their own security checks on the parsed data?

For example, if a "title", "url", or any other field is pulled from a third-party website with an XSS payload in it, and that field is then injected into HTML, will this cause an XSS exploit?

Thanks for the library. It's very useful.
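
Whatever the library does internally, it seems prudent to treat every parsed value as untrusted input. A minimal sketch of what a caller could do, assuming the values end up in HTML element content or quoted attributes: encode text fields and accept only http/https URLs. OpenGraphSanitizer and its methods are hypothetical names, not part of OpenGraph-Net.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of OpenGraph-Net.
public static class OpenGraphSanitizer
{
    // HTML-encode a parsed value before writing it into element content
    // or a quoted attribute. Null is treated as an empty string.
    public static string EncodeForHtml(string value) =>
        WebUtility.HtmlEncode(value ?? string.Empty);

    // Accept only absolute http/https URLs; schemes such as javascript:
    // or data: are rejected, as are relative and malformed values.
    public static bool IsSafeHttpUrl(string value) =>
        Uri.TryCreate(value, UriKind.Absolute, out var uri)
        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}
```

Encoding like this covers HTML body and attribute contexts only; values placed inside script blocks or CSS would need context-specific handling.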

Retrieving base tags does not work for some URLs

Hello!
I'm trying to get the title, description, and image using OpenGraph-Net, but it fails for some reason I don't understand. Could you help me?

For example, I tried to get data for https://google.com. You can see the code below.
image

Below you can see what the graph object looks like after the invocation.
image
As you can see, all the data I need is missing. As I understand it, I should get something.

If I try to get data for https://youtube.com instead, I only get the image, whereas the title and description are still empty.
image

For https://yandex.com, however, it is different: I get all the information I need.
image

I am using OpenGraph-Net v3.2.4 in an ASP.NET Core Web API application (.NET Core 2.1). Locally I run the application on Windows 10. The problem persists in the staging environment, which runs Windows Server 2019.

Any help is much appreciated! Thanks in advance!

Is there a way to specify a timeout so I can continue with other work more quickly if the target site times out?

Is your feature request related to a problem? Please describe.
I have a problem where I'm retrieving a list of open-graph responses in an API. If one site is down it breaks my whole API; I'd like to be able to provide a timeout so that if it takes more than a few seconds to provide a response I can move on to the next one in the list.

Describe the solution you'd like
An overload, e.g. OpenGraph.ParseUrlAsync(url, timeout), might be nice.
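
In the meantime, a caller-side timeout can be sketched with a CancellationTokenSource that cancels itself after a delay. TimeoutHelper and RunWithTimeoutAsync are hypothetical names, and the approach only helps if the wrapped operation actually observes the token it is given.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of OpenGraph-Net.
public static class TimeoutHelper
{
    // Runs `work` with a token that is cancelled after `timeout`.
    // Returns `fallback` when the work is cancelled by that timeout;
    // any other exception propagates to the caller.
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> work,
        TimeSpan timeout,
        T fallback)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            return await work(cts.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            return fallback;
        }
    }
}
```

With a ParseUrlAsync signature that accepts a cancellationToken, like the one quoted in the HttpClient issue on this page, each URL in the list could be wrapped as RunWithTimeoutAsync(ct => OpenGraph.ParseUrlAsync(uri, cancellationToken: ct), TimeSpan.FromSeconds(3), null), skipping the entries that come back null.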

YouTube video returns no meta

When I try to get the metadata from a YouTube video, it is empty. It worked before but suddenly stopped working. Other URLs work as expected.
image

If I render the html from "OriginalHtml" it returns this:
image

As I said, YouTube used to work, but now it doesn't, neither on localhost nor on the live server. Have both of my IPs been blocked? Have you ever experienced that, and do you have any suggestions on how to fix it?

Thanks!

Support for .NET Core

Is there a roadmap date for adding .NET Core support in the near future?

Deployment issue

Describe the bug
Not technically a bug, but I'm using OpenGraph 3.2.3 in a .NET Core 2.2 project. It works well on the dev PCs (Win2012, VisualStudio 20147). However, when published to our web server, it fails to find the DLL.
2020-06-17 17:08:43.904 +02:00 [ERR] b8d479ce-f635-4fd9-a4e3-87892fdaf998 Something went wrong: System.IO.FileNotFoundException: Could not load file or assembly 'OpenGraphNet, Version=3.0.0.0, Culture=neutral, PublicKeyToken=null'. The system cannot find the file specified.
File name: 'OpenGraphNet, Version=3.0.0.0, Culture=neutral, PublicKeyToken=null'
at FlairAPI.Controllers.v1.BlackListController.GetDataFromUrl(String base64Url)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at FlairAPI.Controllers.v1.BlackListController.GetDataFromUrl(String base64Url)
at lambda_method(Closure , Object , Object[] )
at Microsoft.Extensions.Internal.ObjectMethodExecutor.Execute(Object target, Object[] parameters)
at Microsoft.AspNetCore.Mvc.Internal.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
at System.Threading.Tasks.ValueTask`1.get_Result()
at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeActionMethodAsync()
at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeNextActionFilterAsync()
at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Rethrow(ActionExecutedContext context)
at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeInnerFilterAsync()
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeNextResourceFilter()
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Rethrow(ResourceExecutedContext context)
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeFilterPipelineAsync()
at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeAsync()
at Microsoft.AspNetCore.Builder.RouterMiddleware.Invoke(HttpContext httpContext)
at Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware.Invoke(HttpContext context)
at Swashbuckle.AspNetCore.SwaggerUI.SwaggerUIMiddleware.Invoke(HttpContext httpContext)
at Swashbuckle.AspNetCore.Swagger.SwaggerMiddleware.Invoke(HttpContext httpContext, ISwaggerProvider swaggerProvider)
at Microsoft.AspNetCore.Cors.Infrastructure.CorsMiddleware.InvokeCore(HttpContext context)
at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
at Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware.Invoke(HttpContext context)
at FlairAPI.Middlewares.GlobalExceptionMiddleware.InvokeAsync(HttpContext httpContext)
I checked the deployment folder and the DLL is there, as is HtmlAgilityPack.dll.
Any ideas? If this is the wrong place for this kind of question, feel free to close this.

unable to get graph even with user agent specified

Hello,

We are using your package (v3.2.6) to get Open Graph data from user-entered URLs, and it generally works.
With the following URL, however, it doesn't work.

https://www.nasdaq.com/boardvantage/board-portal?utm_medium=ppc@utm_source=google&utm_term=boardroom%20software&gclid=Cj0KCQjwh_eFBhDZARIsALHjlKf3nzPwT2d6QCIYjHbN5UVrpwJvD0gixyDMuUI66RdSpWjfCzEqoT8aAkXfEALw_wcB

With curl, I can make it work by adding a Chrome user agent.

curl --user-agent 'Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.88 Mobile Safari/537.36' 'https://www.nasdaq.com/boardvantage/board-portal?utm_medium=ppc@utm_source=google&utm_term=boardroom%20software&gclid=Cj0KCQjwh_eFBhDZARIsALHjlKf3nzPwT2d6QCIYjHbN5UVrpwJvD0gixyDMuUI66RdSpWjfCzEqoT8aAkXfEALw_wcB' | head -20

Even if I add a user agent in the call to
graph = await OpenGraph.ParseUrlAsync(url, userAgent);
it still doesn't work.

The user agent is the one from the calling browser or
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
by default.

Can you help me?

Regards,

Michael

graph Description property missing?

Describe the bug

var graph = await OpenGraph.ParseUrlAsync(myUrl);
var description = graph.Description;

the graph object does not have a property for Description...

OpenGraph-Net v3.2.4 from Nuget

Follow HTTP redirects

Is your feature request related to a problem? Please describe.
When a request returns an HTTP redirect, such as a 301 status code, ParseUrlAsync throws an exception.

Describe the solution you'd like
Set AllowAutoRedirect to true on the HttpWebRequest.

The remote server returned an error: (301) Moved Permanently.

Describe the bug
OpenGraph fetching throws when the target URL returns a 301 status code.

To Reproduce
Steps to reproduce the behavior:

await OpenGraph.ParseUrlAsync("https://news.google.com/__i/rss/rd/articles/CBMigwFodHRwczovL3d3dy5zdWRpbmZvLmJlL2lkMzEzNzM4L2FydGljbGUvMjAyMS0wMS0yMi9sZS1jb21pdGUtZGUtY29uY2VydGF0aW9uLWRlYnV0ZS1kZS1ub3V2ZWxsZXMtbWVzdXJlcy1wbHVzLXJlc3RyaWN0aXZlcy1xdWlkLWRlc9IBAA?oc=5")

If this URL stops working in the future, URLs of this kind come from: https://news.google.com/news/rss/?ned=fr_be&gl=BE&hl=fr

Expected behavior
I would have expected it to fetch the OpenGraph data located at:

https://www.sudinfo.be/id313738/article/2021-01-22/comite-de-concertation-de-nouvelles-mesures-pour-lutter-contre-le-covid-une

Screenshots
image

Desktop (please complete the following information):

  • OS: Linux
  • .Net 5.0
  • Version : <PackageReference Include="OpenGraph-Net" Version="3.2.4" />

NullReferenceException thrown in GetPageAsync

Describe the bug
The call to GetPageAsync in HttpDownloader throws an exception when I try to retrieve https://marketplace.visualstudio.com/items?itemName=sdras.vue-vscode-extensionpack. It throws a NullReferenceException on line 146 when executing if (response.ContentEncoding.ToLower().Contains("gzip")).

To Reproduce
HttpDownloader downloader = new HttpDownloader("https://marketplace.visualstudio.com/items?itemName=sdras.vue-vscode-extensionpack", "test", "test");
string html = await downloader.GetPageAsync();

Expected behavior
Not to throw.

Desktop (please complete the following information):

  • OS: MacOS
  • Browser Chrome
  • Version 3.1.0
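
The failing condition suggests that ContentEncoding is null when the server sends no Content-Encoding header. A null-tolerant version of that check could look like the sketch below; ContentEncodingHelper and IsGzip are hypothetical names, and this is not the library's actual fix.

```csharp
using System;

// Hypothetical helper, not part of OpenGraph-Net.
public static class ContentEncodingHelper
{
    // True when the Content-Encoding value mentions gzip (case-insensitive).
    // A missing header (null) is treated as "not gzip" instead of throwing.
    public static bool IsGzip(string contentEncoding) =>
        contentEncoding != null
        && contentEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0;
}
```

The original condition would then read if (ContentEncodingHelper.IsGzip(response.ContentEncoding)).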

Accessing the raw HTML

Would you mind adding a property that exposes the HTML response string or the HtmlDocument object? There is other metadata I'd like to pull from the response, and it would be nice not to have to make another request to get it. It would also be convenient to be able to inspect the original string.

Check scheme

Would you be open to adding a scheme check at the beginning of your ParseUrl methods that accept strings? This is more of a convenience/usability enhancement so we don't have to keep checking for schemes in our code.

if (!Regex.IsMatch(url, @"^https?:\/\/", RegexOptions.IgnoreCase))
    url = "http://" + url;

OpenGraph.ParseUrl throws NullReferenceException

Hello!
I've encountered a problem with the ParseUrl method. When I provide a URL that points to a nonexistent server (one for which a web browser would show "This site can't be reached"), ParseUrl throws a NullReferenceException with the message "Object reference not set to an instance of an object."

Expected behavior
In my opinion, the expected behavior would be to throw something like an HttpRequestException with the message "Response status code does not indicate success: 400 (Bad Request).", because a NullReferenceException gives no information about what went wrong while parsing the URL.

Desktop (please complete the following information)
OS: macOS
.Net 5.0
Version : <PackageReference Include="OpenGraph-Net" Version="3.2.6" />

Some websites have an HTML-encoded og:image

I ran into an issue with some websites whose og:image value is HTML-encoded.

For instance:
https://www.periscope.tv/w/1DXxyZZZVykKM
The parsed image URL does not display.

I worked around the issue with this change in OpenGraph.cs:
var theVal = (property ?? "").Equals("image", StringComparison.InvariantCultureIgnoreCase)
    ? System.Web.HttpUtility.HtmlDecode(value ?? "")
    : value;
result.openGraphData.Add(property, theVal);

Let me know whether this is a good approach and whether you think it could help in other cases (other websites doing the same thing).
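
For targets where System.Web is not available, the same idea can be sketched with System.Net.WebUtility. OgValueDecoder and DecodeIfImage are hypothetical names; unlike the snippet above, this version returns an empty string rather than null for a missing value.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of OpenGraph-Net.
public static class OgValueDecoder
{
    // HTML-decodes the value only when the property is "image"
    // (case-insensitive); other properties are returned unchanged.
    public static string DecodeIfImage(string property, string value)
    {
        var safeValue = value ?? string.Empty;
        return string.Equals(property, "image", StringComparison.OrdinalIgnoreCase)
            ? WebUtility.HtmlDecode(safeValue)
            : safeValue;
    }
}
```

Decoding only the image property keeps the change narrow; whether other URL-valued properties need the same treatment depends on the sites being parsed.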

Build error [Visual Studio]

When I try to build the project, I get this error: Users/alex/Downloads/voat-Core/src/external/Source/ghorsey/OpenGraph-Net/CSC: Error CS8102: Public signing was specified and requires a public key, but no public key was specified. (CS8102) (OpenGraph-Net)

How can I fix this problem?

Thank you.

encoding issues

Using NuGet package 1.2.0.1.

For instance:
var ogTags = OpenGraph.ParseUrl("https://vk.com/wall-41600377_66756");
Console.WriteLine(ogTags["description"]);

The encoding of the output is not correct.

It seems the source code on GitHub is correct. Is the package on nuget.org out of date? Could you please push a newer version to NuGet?

Thanks.

Using OpenGraph-NET on Xamarin.Forms > string.Replace issue?

Describe the bug
I was trying to add OpenGraph-NET to my Xamarin.Forms app to do some parsing of OG data. I added the latest NuGet to my cross platform project, which is a .NET Standard 2.0 project. When calling it like so:

var graph = OpenGraph.ParseUrl(item.ReferenceUrl);

I get the following error:

image

I can trace this to the conditional compilation part here: https://github.com/ghorsey/OpenGraph-Net/blob/develop/src/OpenGraphNet/OpenGraph.cs#L361 I have tried to upgrade my project to .NET Standard 2.1, but this did not change the behavior. I also verified that the compiler directive is there. Feels like I'm missing something here, but I'm not quite sure what and was hoping you might know more.

To Reproduce
Steps to reproduce the behavior:

  1. Create a blank Xamarin.Forms project.
  2. Add the latest NuGet.
  3. Try to parse a URL on the first page.
  4. See error

Expected behavior
A successfully parsed OpenGraph object.
