Comments (10)

saucecontrol commented on May 29, 2024

This looks good. We're already taking the performance hit for loading up the metadata to read the Orientation tag, so it makes sense to expose the rest.

Implementation-wise, it's easy to pass the metadata queries through to WIC for now. I'll put some thought into what that might look like with a managed metadata reader.

from photosauce.

iamcarbon commented on May 29, 2024

Made a few changes when considering pluggable metadata readers.

  • Killed the enumerator
  • Made main API surface platform agnostic and required an explicit cast to use the WIC functions

saucecontrol commented on May 29, 2024

Well, I've finally had a chance to dig into this, and I've got a few observations:

  1. Metadata is even more complicated than I thought...
  2. The Exif enumerator you proposed (and tag list you linked) would require a flattened representation of data that WIC exposes as a hierarchy of metadata blocks. For example, the GPS tags (ushort 0-30) in a JPEG image are exposed as /app1/ifd/gps/{ushort=XX} from WIC. My initial read of the docs led me to believe that those values could be queried independent of the hierarchy, as in the /{ushort=40961} example. After some testing and a re-read of the docs, I see that those are listed as relative paths, and they require drilling into the associated metadata block before they'll return any data. The full path required for that one is /app1/ifd/exif/{ushort=40961} for JPEG or /ifd/exif/{ushort=40961} for TIFF. I haven't noticed any overlap in the tag identifiers in those separate blocks, but since the Exif spec defines them as separate blocks also, I suppose there's no guarantee it would always be safe to flatten them.
  3. WIC parses each of the metadata blocks lazily, so there is actually a performance hit for parsing a block you don't need. I had noticed when I added metadata support for reading the Orientation tag to the .NET Core version of MagicScaler that the times in bleroy's benchmarks went up by a full 10% (on a complete load->resize->save operation), just for reading that tag. What I hadn't noticed until some testing this week is that there's a significant difference between querying the usual location of that tag (/app1/ifd/{ushort=274}) and the alternate location (/xmp/tiff:Orientation). I've been using the Windows Metadata Policy for that query, and it checks the XMP location if it doesn't find it in the IFD block. The XMP parsing is what accounts for the full performance difference.
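
The lookup order described in item 3 can be sketched roughly as follows. This is a minimal illustration, not the Windows Metadata Policy implementation: `OrientationLookup` and the `query` delegate are hypothetical names, and the delegate stands in for whatever actually resolves a query path (for example a wrapper around IWICMetadataQueryReader.GetMetadataByName). The two paths are the JPEG ones quoted above.

```csharp
using System;

// Hypothetical helper, not part of WIC or MagicScaler.
public static class OrientationLookup
{
    // Usual location of the Orientation tag in a JPEG (IFD block).
    public const string IfdPath = "/app1/ifd/{ushort=274}";

    // Alternate location; resolving it forces the XMP block to be parsed.
    public const string XmpPath = "/xmp/tiff:Orientation";

    // 'query' is an assumed callback that returns null when the path has no value.
    public static object Find(Func<string, object> query, bool allowXmpFallback)
    {
        object value = query(IfdPath);

        // Only touch the XMP block when the caller accepts the extra parsing cost.
        if (value == null && allowXmpFallback)
            value = query(XmpPath);

        return value;
    }
}
```

Passing `allowXmpFallback: false` keeps the lookup to the IFD block only, which is the cheaper of the two cases described above.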

So it seems like keeping the metadata block hierarchy would be the safest and most performant way to do this, but that requires fairly detailed knowledge of what you're looking for in order to construct the correct query path. Have you seen any other libraries that do this well? And given all of that, would a passthrough to IWICMetadataQueryReader.GetMetadataByName cover your needs for the time being?
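
To make the path construction concrete, here is a small sketch that builds the full query paths mentioned above from a container type and a tag number. `WicQueryPaths` and its method names are made up for illustration and are not part of WIC or MagicScaler. It only produces the strings and assumes the JPEG and TIFF layouts described in this comment.

```csharp
using System;

// Hypothetical helper: builds full WIC query paths for tags in nested blocks.
public static class WicQueryPaths
{
    // JPEG wraps the IFD block in the APP1 segment; TIFF exposes it at the root.
    public static string IfdRoot(bool isJpeg)
    {
        return isJpeg ? "/app1/ifd" : "/ifd";
    }

    // Tag in the main IFD block, e.g. Orientation (274).
    public static string Ifd(bool isJpeg, ushort tag)
    {
        return IfdRoot(isJpeg) + "/{ushort=" + tag + "}";
    }

    // Tag in the Exif sub-block, e.g. 40961.
    public static string Exif(bool isJpeg, ushort tag)
    {
        return IfdRoot(isJpeg) + "/exif/{ushort=" + tag + "}";
    }

    // Tag in the GPS sub-block.
    public static string Gps(bool isJpeg, ushort tag)
    {
        return IfdRoot(isJpeg) + "/gps/{ushort=" + tag + "}";
    }
}
```

For example, `WicQueryPaths.Exif(true, 40961)` yields `/app1/ifd/exif/{ushort=40961}` and `WicQueryPaths.Exif(false, 40961)` yields `/ifd/exif/{ushort=40961}`. The caller still has to know which block a tag lives in, which is the knowledge requirement noted above.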

iamcarbon commented on May 29, 2024

saucecontrol commented on May 29, 2024

Excellent, thanks.

It occurred to me after I commented that we could lazily enumerate the blocks (start with /ifd, then do /ifd/exif, /ifd/gps, /ifd/exif/interop, etc.), but then you'd still parse blocks you don't need on the way to the ones you do. The beauty of the WIC query language is that you can tell it exactly which things you care about along the way. And since it caches everything it parses, there's no penalty if you query the same blocks repeatedly.

I think the ideal managed solution probably ends up looking a lot like WIC's implementation anyway, but I'd definitely be interested if you run across anything else that has a good model.
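
As a rough idea of what a managed reader with the same "parse on demand, cache the result" behavior might look like, here is a minimal sketch. It is not how WIC is implemented internally. `LazyBlockCache` and the `parseBlock` delegate are hypothetical, and the delegate stands in for a real block parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: parses a metadata block the first time it is queried
// and serves later queries for the same block from the cache.
public sealed class LazyBlockCache
{
    private readonly Func<string, IReadOnlyDictionary<ushort, object>> parseBlock;
    private readonly Dictionary<string, IReadOnlyDictionary<ushort, object>> parsed =
        new Dictionary<string, IReadOnlyDictionary<ushort, object>>();

    // Number of times the (assumed expensive) parser has actually run.
    public int ParseCount { get; private set; }

    public LazyBlockCache(Func<string, IReadOnlyDictionary<ushort, object>> parseBlock)
    {
        this.parseBlock = parseBlock;
    }

    // blockPath is a block location such as "/app1/ifd/exif"; tag is the tag id within it.
    public bool TryGet(string blockPath, ushort tag, out object value)
    {
        IReadOnlyDictionary<ushort, object> block;
        if (!parsed.TryGetValue(blockPath, out block))
        {
            // First query against this block: parse it once and remember the result.
            block = parseBlock(blockPath);
            parsed[blockPath] = block;
            ParseCount++;
        }
        return block.TryGetValue(tag, out value);
    }
}
```

Blocks that are never queried are never parsed, and repeated queries against the same block cost only a dictionary lookup.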

saucecontrol commented on May 29, 2024

@iamcarbon Have you seen MetadataExtractor? I'm liking what I've seen with that so far.

In some very preliminary testing, it appears to be about 2-5x faster than WIC at metadata-only reads. And it eagerly loads, exposing everything as LINQ-friendly ICollection and IDictionary interfaces. AND that eager loading can be controlled by passing in an explicit list of the metadata block readers you're interested in, like so:

var metadata = JpegMetadataReader.ReadMetadata(inFileInfo.FullName, new[] { new ExifReader() });
metadata.OfType<GpsDirectory>().First().TryGetRational(GpsDirectory.TagAltitude, out var altitude);

In thinking about exposing either a lazy-enumerated or interactive queryable interface from WIC, it occurs to me we'd need to keep the decoder open, so we'd have to make everything IDisposable. It kind of becomes a mess in a hurry, and I don't see ImageFileInfo ever being able to do everything MetadataExtractor can do already.

Anyway, I'm going to play more with it and see how it holds up, but I'm impressed so far...

iamcarbon commented on May 29, 2024

@saucecontrol Not sure how I missed MetadataExtractor. I just successfully ran our test cases against it and can confirm that it also beats our legacy WIC implementation in performance. I'm going to run it against a larger sample set later today and will let you know what I uncover -- but I think this may work well for us!

If this isn't blocking anyone else using the library, and MetadataExtractor solves this need, we may want to close this issue out altogether.

saucecontrol commented on May 29, 2024

That's great! I'm not sure how I missed that library the last few times I looked either. I'm eager to hear how it does on your larger image set.

I think a good level of metadata support will always be important for my libraries as well, but it's great to be able to integrate with a purpose-built library for that. I had been thinking about how that might work when getting into plug-in codecs since things like libjpeg have minimal metadata support. But since MetadataExtractor can handle parsing individual blocks pulled out of the container, it seems like the architecture is well-suited for what I'd need.

saucecontrol commented on May 29, 2024

@iamcarbon Were you able to meet your immediate needs with MetadataExtractor? If so, I'll go ahead and close this issue and revisit my own metadata strategy later.

iamcarbon commented on May 29, 2024

Yep! Working well!
