Comments (45)

MarcinZiabek commented on May 12, 2024 30

This comment is to let other developers know that I am now seriously considering adding font subsetting support to the library. It looks like subsetting will not land in the next SkiaSharp release. I will try to investigate this topic during December 😁

from questpdf.

MarcinZiabek commented on May 12, 2024 15

In the QuestPDF 2022.8 release, I am planning to change the default font from Calibri to Lato. Lato is an open-source font, free for commercial use, created by the Polish author Łukasz Dziedzic. The font will be distributed with the library as an embedded resource, as part of the DLL file / NuGet package. This way, as long as you use the default font, it is available in all environments. The font is also around 20x smaller, which should reduce PDF file size (1.57 MB -> 74 KB).

Of course, there is a caveat: this font does not contain more advanced glyphs, e.g. for Arabic/Chinese/Japanese, or for advanced Unicode formatting. For such cases, you still need to use a font with proper support.

This solves two issues:

  1. It makes sure that your code works in all environments (in terms of font availability). You no longer get exceptions along the lines of "Calibri is not available on Linux by default".
  2. The average PDF output size will decrease drastically for most projects. In many cases, this will reduce the need for font subsetting (at least in the average case).

Pietervdw commented on May 12, 2024 10

We ran into this problem: the file size depended heavily on the font we chose. In this case, the client needed the Calibri font, which resulted in files of over 2 MB. Using the following technique, we were able to bring the file size down to below 100 KB.

  1. Go to https://everythingfonts.com/subsetter (There is a cost involved if you want to convert large fonts)
  2. Upload the file & select the following options:
    a) Basic Latin
    b) Uppercase
    c) Lowercase
    d) Numerals
    e) Basic Punctuation
    f) Currency Symbols
  3. Generate and download the created file
  4. Use FontForge. File > Open & open the created subset font file
  5. Select Element Menu > Font Info
  6. Create a new name for the font in the following fields, e.g. CalibriSmall:
    a) Fontname
    b) Family Name
    c) Name for Humans
  7. Click Ok. When prompted, overwrite the GUID for the font.
  8. File Menu > Generate Fonts
    a) Set Type to TrueType
    b) Click Generate

Install the newly created font on your system, and in your code use the font name CalibriSmall.
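If you would rather script the character list than click through the subsetter's options, the characters behind two of those selections can be generated from Unicode ranges. A minimal C# sketch, assuming "Basic Latin" means the printable range U+0020..U+007E and "Currency Symbols" means the Unicode block U+20A0..U+20CF; whether EverythingFonts uses exactly these ranges is an assumption, and `SubsetCharset` is a hypothetical helper name:

```csharp
using System.Text;

// Hypothetical helper: expands Unicode ranges into a string of characters
// that can be handed to a subsetting tool.
public static class SubsetCharset
{
    // Printable Basic Latin: space (U+0020) through tilde (U+007E).
    // Covers uppercase, lowercase, digits and ASCII punctuation, including '$'.
    public static readonly (int First, int Last) BasicLatinPrintable = (0x0020, 0x007E);

    // Unicode "Currency Symbols" block, which contains e.g. '€' (U+20AC).
    public static readonly (int First, int Last) CurrencySymbols = (0x20A0, 0x20CF);

    public static string Build(params (int First, int Last)[] ranges)
    {
        var sb = new StringBuilder();
        foreach (var (first, last) in ranges)
        {
            for (int cp = first; cp <= last; cp++)
            {
                // Both blocks lie in the Basic Multilingual Plane, so each code point fits in one char.
                sb.Append((char)cp);
            }
        }
        return sb.ToString();
    }
}
```

Note that £ (U+00A3) and ¥ (U+00A5) live in the Latin-1 Supplement block, not in either of these ranges, so they would need a third range (U+00A0..U+00FF) if your documents use them.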

Hope this helps!

MarcinZiabek commented on May 12, 2024 9

Hello :) Indeed, this may be problematic in some cases. As you have noticed, this is a result of the default behaviour of font embedding in SkiaSharp. In the official Skia documentation, there is an article regarding design decisions behind it.

We can’t assume that an arbitrary font will be available at PDF view time, so we embed all fonts in accordance with modern PDF guidelines.

This article also mentions that Chrome uses font subsetting. Currently, I have no idea how to implement such functionality myself, nor have I investigated it; I am open to any help in this regard. The SFNTLY library seems to be outdated.

Many fonts, especially fonts with CJK support are fairly large, so it is desirable to subset them. Chrome uses the SFNTLY package to provide subsetting support to Skia for TrueType fonts.

In this thread, the maintainer of SkiaSharp is responding regarding this particular issue. As far as I see, he is aware and wants to make it right. Something is moving forward.

It is also mentioned that font subsetting may be possible with HarfBuzz. I am not sure whether HarfBuzzSharp supports this at the moment, though.

As a single person working on the library in my spare time, I am not able to go beyond what SkiaSharp supports right now; any Skia limitations are also QuestPDF limitations. It is sad, but I have already spent a couple hundred hours on this project without any benefit (and I don't expect any), so I want to stay within the current scope and provide the best experience I can. If a flag to disable font embedding becomes available in SkiaSharp, or there is a known alternative solution, I can add support for it in a release :)

Also, please be cautious when using iTextSharp. It is offered as AGPL software, meaning you cannot use it for free in commercial products.

MarcinZiabek commented on May 12, 2024 7

I'm thrilled to announce that we're on track to solve this significant issue in the upcoming release of QuestPDF. The problem of excessively large PDF output files has been a major concern since our initial release. Addressing this issue is now our top priority.

This enhancement is a key part of a larger effort, and you can follow our progress in detail here: QuestPDF Improvement Discussion.

Your feedback and support in this journey are invaluable to us. Stay tuned for more updates!

jcl86 commented on May 12, 2024 6

When I started experimenting with the library, I was also concerned about the size of the output files, and I think it could be useful to share the workaround I used. It allowed me to reduce the PDF file size from 1 MB to less than 100 KB.

My approach was to compress the output PDF file that QuestPDF generates. First, I looked at Ghostscript.NET, a .NET wrapper around the Ghostscript library. Ghostscript itself is a CLI tool that, among other PDF-related functions, can significantly compress PDF files (AGPL or commercial license, like iTextSharp).

Unfortunately, this .NET wrapper is not compatible with .NET Standard, so I chose to embed the Ghostscript binaries directly and call them, like this example. You provide an input file path, and a compressed PDF file is generated at the output file path.

So, you can generate your PDF document with QuestPDF and then compress it with Ghostscript:

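  string hash = Guid.NewGuid().ToString("N");
  var temporalPath = Path.Combine(Path.GetTempPath(), $"{hash}.pdf");
  var outputPath = Path.Combine(Path.GetTempPath(), $"{hash}output.pdf");
  byte[] result;

  try
  {
      document.GeneratePdf(temporalPath);

      // Calls the embedded Ghostscript binaries; compressionOption is the compression setting passed to them
      CompressionExtensions.CompressPDF(temporalPath, outputPath, compressionOption);

      result = File.ReadAllBytes(outputPath);
  }
  finally
  {
      // Remove both temporary files even if generation or compression throws
      File.Delete(temporalPath);
      File.Delete(outputPath);
  }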

To get the Ghostscript binaries for Windows, you may have to install it and then copy the binaries from the Program Files folder into your solution.

I have only tested this in a Windows environment, but I suppose it could work on Linux too, because Ghostscript can also be installed there.
I don't know how this would perform with a large volume of documents, but for my purpose of rendering one or two PDFs at a time, it works nicely.
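The same pipeline can be written without a wrapper library by launching the Ghostscript command line directly. A sketch, assuming a Ghostscript executable is installed and reachable (`gswin64c` on 64-bit Windows, `gs` on most Linux distributions) and a .NET version that has `ProcessStartInfo.ArgumentList` (.NET Core 2.1 or later); `GhostscriptCompressor` is a hypothetical name, and `/ebook` is just one of Ghostscript's `-dPDFSETTINGS` presets. Ghostscript itself remains AGPL or commercially licensed:

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: compresses a PDF by running the Ghostscript CLI with the pdfwrite device.
public static class GhostscriptCompressor
{
    public static ProcessStartInfo BuildStartInfo(string gsExecutable, string inputPath, string outputPath, string pdfSettings = "/ebook")
    {
        var psi = new ProcessStartInfo(gsExecutable)
        {
            UseShellExecute = false,
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        // pdfwrite rewrites the PDF; -dPDFSETTINGS selects a preset (/screen, /ebook, /printer, /prepress).
        psi.ArgumentList.Add("-sDEVICE=pdfwrite");
        psi.ArgumentList.Add("-dCompatibilityLevel=1.4");
        psi.ArgumentList.Add("-dPDFSETTINGS=" + pdfSettings);
        psi.ArgumentList.Add("-dNOPAUSE");
        psi.ArgumentList.Add("-dBATCH");
        psi.ArgumentList.Add("-dQUIET");
        psi.ArgumentList.Add("-sOutputFile=" + outputPath);
        psi.ArgumentList.Add(inputPath);
        return psi;
    }

    public static byte[] Compress(byte[] pdf, string gsExecutable)
    {
        // Unique names in the temp directory, not the working directory.
        string inputPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".pdf");
        string outputPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + "-compressed.pdf");
        try
        {
            File.WriteAllBytes(inputPath, pdf);
            using var process = Process.Start(BuildStartInfo(gsExecutable, inputPath, outputPath))
                ?? throw new InvalidOperationException("Could not start Ghostscript.");
            string stderr = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException($"Ghostscript exited with {process.ExitCode}: {stderr}");
            return File.ReadAllBytes(outputPath);
        }
        finally
        {
            // File.Delete does not throw when the file is missing, so this is safe after a failure.
            File.Delete(inputPath);
            File.Delete(outputPath);
        }
    }
}
```

Usage would be along the lines of `var small = GhostscriptCompressor.Compress(document.GeneratePdf(), "gs");`. How much a given preset shrinks a QuestPDF file depends on the fonts and images involved; this sketch has not been measured against the numbers above.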

staterium commented on May 12, 2024 6

@Pietervdw thanks for pointing me in the right direction to get my PDFs small as well. I wish to add some lessons I learned during this process, for those who encounter the same problems:

  • The CalibriSmall subset that Pieter has in his linked repository works for plain text, but I also needed bold and italic glyphs, so I paid for EverythingFonts and created CalibriSmallBold and CalibriSmallItalic subsets as well. With @MarcinZiabek's permission I can post those subsets here for anyone that needs them.

  • Using 3 separate font files for regular, bold and italic text means you can't use QuestPDF's built-in .Bold() or .Italic() methods. Instead, you need to register and use the 3 fonts individually. For example:

  // import bold font file
  FontManager.RegisterFont(File.OpenRead("Resources/CalibriSmallBold.ttf"));

  // bold TextStyle
  public TextStyle FontBold => TextStyle.Default.FontSize(8).FontFamily("CalibriSmallBold");

  // usage
  row.RelativeItem().Text("some text").Style(FontBold);

  • Finally, if you are planning to deploy this to Azure Functions, you can't just copy the TTF files to your output directory on build and read them with File.OpenRead. Rather, you need to set the build action to Embedded Resource and register the fonts with:

  var assembly = Assembly.GetExecutingAssembly();
  using var streamBold = assembly.GetManifestResourceStream("{PROJECT_NAMESPACE}.{FOLDER}.CalibriSmallBold.ttf");
  FontManager.RegisterFont(streamBold);
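Getting the `{PROJECT_NAMESPACE}.{FOLDER}` prefix wrong makes `GetManifestResourceStream` return null, which then fails later with a less obvious error. A small sketch (a hypothetical helper, not part of QuestPDF) that looks the resource up by file name and, if it is not found, fails with the list of names actually embedded:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;

// Hypothetical helper for locating embedded font files.
public static class EmbeddedFonts
{
    // Manifest names look like "<default namespace>.<folder path with dots>.<file name>",
    // so matching on the ".<file name>" suffix avoids hard-coding the prefix.
    public static string FindResourceName(string[] resourceNames, string fileName)
    {
        var matches = resourceNames
            .Where(n => n.EndsWith("." + fileName, StringComparison.OrdinalIgnoreCase))
            .ToArray();

        if (matches.Length != 1)
            throw new InvalidOperationException(
                $"Expected exactly one embedded resource ending in '{fileName}', found {matches.Length}. " +
                $"Available: {string.Join(", ", resourceNames)}");

        return matches[0];
    }

    public static Stream Open(Assembly assembly, string fileName)
    {
        string name = FindResourceName(assembly.GetManifestResourceNames(), fileName);
        // GetManifestResourceStream returns null for unknown names; fail loudly instead.
        return assembly.GetManifestResourceStream(name)
            ?? throw new InvalidOperationException($"Resource '{name}' could not be opened.");
    }
}
```

For example: `using var streamBold = EmbeddedFonts.Open(Assembly.GetExecutingAssembly(), "CalibriSmallBold.ttf");` followed by `FontManager.RegisterFont(streamBold);` as in the snippet above.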

ignoreswing commented on May 12, 2024 6

@MarcinZiabek I want to express my gratitude for your excellent project, especially for its commercial usability. I'm really impressed with the Fluent interface offered by the package.

In my case, I needed to convert a DataTable into a PDF table for users to download. The DataTable contains dynamic data, including both Chinese and English text. Everything went smoothly during development until I hit a frustrating problem: a massive 22 MB file (just 2 pages). I quickly found that most of the size came from embedding the Microsoft JhengHei and Microsoft JhengHei Bold fonts in the PDF output.

To address this, I decided to drop bold formatting and switch to the Noto Sans Traditional Chinese font, which reduced the file size to 4 MB. However, the font files still accounted for 99% of the size.

(Screenshot 2023-10-04 154844)

@garryxiao provided a font subsetting solution (.NET 6 or later), but unfortunately it wasn't compatible with my project's environment: .NET Framework 4.7.2 + Web API 2 + QuestPDF 2022.12.6.

After some hesitation on whether to revert to using iTextSharp 4.1.6, I finally found a potential solution, which I'll share below.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Web.Hosting;
using System.Windows.Media; // GlyphTypeface (PresentationCore.dll, Windows-only)

/// <summary>
/// Get Subset from a font file
/// </summary>
public class FontSubset
{
    private static readonly Uri FontFileUri = new Uri(HostingEnvironment.MapPath("~/Resource/NotoSansTC-Light.ttf"));
    private readonly string charset;
    private readonly string tempFilePath;

    /// <summary>
    /// Constructor
    /// </summary>
    /// <param name="charRequired">Additional characters that must be included</param>
    public FontSubset(string charRequired)
    {
        // Always include ASCII letters, digits and common punctuation, plus the requested characters
        HashSet<char> uniqueChars = new HashSet<char>("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 -_:、.()/'[]\\\"" + charRequired);
        this.charset = new string(uniqueChars.ToArray());
        this.tempFilePath = Path.Combine(HostingEnvironment.MapPath(@"~/Resource/"), "tmpSubfont_" + Path.GetRandomFileName() + ".ttf");
    }

    /// <summary>
    /// Generate Font subset
    /// </summary>
    /// <returns>tmp sub font path</returns>
    public string CreateSubSet()
    {
        GlyphTypeface glyphTypeface = new GlyphTypeface(FontFileUri);
        List<ushort> glyphIndices = new List<ushort>();

        foreach (char c in charset)
        {
            // Map each character to its glyph index; skip characters the font does not contain
            // (the CharacterToGlyphMap indexer would throw KeyNotFoundException for them)
            if (glyphTypeface.CharacterToGlyphMap.TryGetValue(c, out ushort glyphIndex))
            {
                glyphIndices.Add(glyphIndex);
            }
        }

        // ComputeSubset returns the bytes of a font that contains only the requested glyphs
        byte[] fileBytes = glyphTypeface.ComputeSubset(glyphIndices);
        File.WriteAllBytes(tempFilePath, fileBytes);
        return tempFilePath;
    }

    /// <summary>
    /// Delete tmp file
    /// </summary>
    public void Clean()
    {
        if (File.Exists(tempFilePath))
        {
            try
            {
                File.Delete(tempFilePath);
            }
            catch (Exception ex)
            {
                // Log is the project's own logger
                Log.Error($"Delete {tempFilePath} file failed.", ex);
            }
        }
    }
}
  • How to use it:
public byte[] DownloadTableInPdf(DataTable dt)
{
    // Gather the characters you need (in my case, all Chinese characters in dt).
    // Example text: "these are the characters, other than English letters and digits, that must be covered by the font file for display."
    var fontSubset = new FontSubset("這些是除了英文數字等需要被涵蓋進字型檔讓檔案做顯示的文字。");
    var tempFilePath = fontSubset.CreateSubSet();
    var stream = File.OpenRead(tempFilePath);

    try
    {
        FontManager.RegisterFontWithCustomName("TempSubSetFont", stream);
        QuestPDF.Settings.CheckIfAllTextGlyphsAreAvailable = false; // Maybe not needed

        var document = Document.Create(container =>
        {
            container.Page(page =>
            {
                page.DefaultTextStyle(x => x.FontFamily("TempSubSetFont"));
                // many other settings
            });
        });

        return document.GeneratePdf();
    }
    finally
    {
        // Clean up resources after generating the PDF (also when generation fails)
        stream.Dispose();
        fontSubset.Clean();
    }
}
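The comment in `DownloadTableInPdf` leaves the character gathering to the reader. One way to do it, sketched under the assumption that the PDF shows cell values using their default string conversion (custom number or date formats may introduce characters this misses); `TableCharacters` is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: collects every distinct character in a DataTable's
// column captions and cell values.
public static class TableCharacters
{
    public static string Collect(DataTable table)
    {
        var chars = new HashSet<char>();

        // Captions default to the column name when no caption is set.
        foreach (DataColumn column in table.Columns)
            chars.UnionWith(column.Caption);

        foreach (DataRow row in table.Rows)
        {
            foreach (object value in row.ItemArray)
            {
                // Convert.ToString yields "" for DBNull and null, so empty cells add nothing.
                chars.UnionWith(Convert.ToString(value) ?? string.Empty);
            }
        }

        // Sort so the same table always produces the same string (useful for caching).
        return new string(chars.OrderBy(c => c).ToArray());
    }
}
```

`new FontSubset(TableCharacters.Collect(dt))` would then cover every character in the table. Like the `FontSubset` class, this works per UTF-16 `char`, so characters outside the Basic Multilingual Plane (surrogate pairs) are not handled.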

The file size dropped below 400 KB (depending on how much text needs to be embedded; my page size is B0, for a very large table).
The approach doesn't require any commercial licensed packages. However, there are still several risks that I cannot confirm or clarify:

  1. Initially, I intended to write data to MemoryStream (generate subset font data on the fly) without using a FileStream to physically write to a file. However, when I did this, it passed at compile time but couldn't find the TempSubSetFont fontname at runtime, and I don't know how to resolve this.
  2. I'm unsure about the impact on performance or any other risk.
  3. I repeatedly use RegisterFontWithCustomName to register TempSubSetFont, and I'm uncertain about any potential negative effects.
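Regarding point 1, one common pitfall when switching from a file to a `MemoryStream` is the stream position. This is a guess, not a confirmed diagnosis of the case above: after `Write`, a `MemoryStream` is positioned at its end, so a consumer that reads from the current position sees no data. A sketch contrasting the two (hypothetical helper names):

```csharp
using System.IO;

// Hypothetical helpers illustrating MemoryStream positioning.
public static class SubsetStreams
{
    public static MemoryStream FromBytesWrittenIncorrectly(byte[] fontBytes)
    {
        var stream = new MemoryStream();
        stream.Write(fontBytes, 0, fontBytes.Length);
        return stream; // Position == Length here, so reading returns nothing
    }

    public static MemoryStream FromBytes(byte[] fontBytes)
    {
        // The byte[] constructor leaves Position at 0, ready to be read from the start.
        // (Equivalently: write, then set stream.Position = 0 before handing it over.)
        return new MemoryStream(fontBytes, writable: false);
    }
}
```

If that is the cause, passing `new MemoryStream(glyphTypeface.ComputeSubset(index))` to `FontManager.RegisterFontWithCustomName` (or resetting `Position = 0` after writing) would avoid the temporary file; this has not been verified against QuestPDF 2022.12.6.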

I hope the approach above can serve as a starting point for an option in QuestPDF to embed either the whole font file or only a subset. Please feel free to make further improvements. Finally, thanks again to the author for the selfless dedication of time and effort. I would greatly appreciate it if this issue could be solved.

Pietervdw commented on May 12, 2024 5

Hey @andycnguyen

It can be a bit of a hit and miss :)
I'm attaching a sample project I used to test a variety of fonts and their sizes. The zip contains smaller fonts generated using the technique mentioned. Here is the output of the console app, showing the resulting PDF sizes:
(Image: console output listing the resulting PDF sizes)

Here is the sample project. Hope it helps!
QuestPdf_SmallSizeExample.zip

MarcinZiabek commented on May 12, 2024 3

A subset that always included the useful ascii, a-z A-Z 0-9 and normal punctuation plus whatever was needed outside that range (and maybe was then named with the md5 of itself?) I think would be an ideal balance.

This is exactly what I think. Creating a font subset is most likely a computationally intensive operation (parsing -> modifying -> writing binary data) that should be avoided if possible. In most languages, we use only a limited set of characters/glyphs, and this set is easily predictable. This is why the solution above works: a font subset is generated manually with the predicted glyphs and then reused across documents. However, there are languages like Chinese with thousands of usable glyphs; this is a good example where preparing a smaller font beforehand does not work.

Unfortunately, this introduces even more complexity. We not only need to understand different font formats and operate on them; we also need to keep a cache and do some statistical analysis, so that after generating a couple of PDF documents, the library can estimate which font subset is optimal. And then there are glyphs, different variants (e.g. bold, italic), shapes (e.g. in Arabic) and ligatures.

THIS QUICKLY BECOMES REALLY COMPLEX 😥 And it is the reason why I am postponing this enormous effort, hoping that the solutions above meet most requirements.

@Pietervdw Thank you for preparing this instruction. I haven't tested it myself but it seems to be useful for others. That's great!

mxjones commented on May 12, 2024 3

@Pietervdw's solution works well for this. As a further addition, you can embed the font directly in your project, then load it and register it with QuestPDF (FontManager.RegisterFontType), so that you don't need to install the custom font on each machine that uses it.

Pietervdw commented on May 12, 2024 2

Ok, so in the interest of completeness. I've created a new sample project. It generates a 30 page document with random text and a table containing about 500 rows. It's roughly 5 pages of text and 23 pages of table rows.

The size comes in at around 83 KB using the font subsets. I haven't tested all the elements QuestPDF offers, so your results may vary.
questpdftest_CalibriSmall.pdf

The complete test project is available at https://github.com/Pietervdw/questpdf-sizetest

Hope this helps!

andycnguyen commented on May 12, 2024 2

@MarcinZiabek
@Pietervdw

When I followed Pietervdw's example more closely, I got comparable results (file sizes under 50 KB). So something about my original attempt was off, though I'm not sure what exactly. At any rate, I'm really pleased with this result. Thanks to both of you for your prompt replies.

AvabAlexander commented on May 12, 2024 2

(Quoting @Pietervdw's sample project comment: https://github.com/Pietervdw/questpdf-sizetest)

This helped me a lot! Thanks! The files generated went from 1.5MB to 48KB (about 97% reduction) by using your NotoSansSmall font.

pablopioli commented on May 12, 2024 1

I used QuestPDF in WebAssembly. As this is a constrained environment I couldn't embed the system fonts.

So I subsetted a font (check the license before doing this). Then I embedded the font in my app and used it to build the PDF.

The resulting file was below 200 KB (I kept only letters, numbers, and punctuation marks). This can be an alternative solution.

In Windows you can subset a font based on the text of the document, see
https://stackoverflow.com/questions/3249551/how-to-create-subset-fonts-in-net

In Linux I would use Python fonttools
https://publishing-project.rivendellweb.net/more-on-font-subsetting

Any of these alternatives could be used until SkiaSharp implements a better solution.
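For the fonttools route, the `pyftsubset` command can be driven from .NET. A sketch, assuming fonttools is installed (`pip install fonttools`) so that `pyftsubset` is on the PATH; `--text-file` and `--output-file` are pyftsubset options, while `FontToolsSubsetter` is a hypothetical wrapper name:

```csharp
using System.Diagnostics;

// Hypothetical wrapper: builds a pyftsubset invocation that keeps only the glyphs
// needed for the characters listed in a UTF-8 text file.
public static class FontToolsSubsetter
{
    public static ProcessStartInfo BuildStartInfo(string fontPath, string textFilePath, string outputPath)
    {
        var psi = new ProcessStartInfo("pyftsubset") { UseShellExecute = false };
        psi.ArgumentList.Add(fontPath);
        // Characters to keep are read from this file.
        psi.ArgumentList.Add("--text-file=" + textFilePath);
        // Where the subset font is written (a plain font file unless a WOFF flavor is requested).
        psi.ArgumentList.Add("--output-file=" + outputPath);
        return psi;
    }
}
```

Writing the document's text (or a collected character set) to the text file, running the process, and then registering the output font with QuestPDF gives a per-document subset along the lines described above.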

pablopioli commented on May 12, 2024 1

@odesyatnyk If you want to go down this route, I can assure you it's like entering a black hole.

To start, I would take a font and use a tool to subset it, then just use the edited font. Obviously, you need to know in advance at least which language you will be using. But it's the simplest option.

Use something like
https://www.fontsquirrel.com/tools/webfont-generator

And if you really want to know what it takes to build your own subsetter read
https://markoskon.com/creating-font-subsets/

MarcinZiabek commented on May 12, 2024 1

That's great!

I am asking because drawing the Border element is not optimal. Technically, instead of drawing a single rectangle with a stroke, the library draws four rectangles, one for each edge. This simplifies the implementation for edges with different thicknesses. It uses around three times more instructions than optimal, but that is still very little. You would need to draw tens of thousands of cells to even notice... I was wondering whether it plays any role, but apparently not 😂

garryxiao commented on May 12, 2024 1

The issue is understandable, but the resulting file size is not acceptable. I worked out a solution that creates a subset font with the NuGet package "com.etsoo.EasyPdf":

  var fontFolder = Environment.GetFolderPath(Environment.SpecialFolder.Fonts);
  // msyh.ttc is the Microsoft YaHei font collection in the Windows fonts folder
  await using var fontStream = File.OpenRead(Path.Combine(fontFolder, "msyh.ttc"));
  await using var subset = await EasyFont.CreateSubsetAsync(fontStream, "青岛亿速思维网络科技有限公司");
  FontManager.RegisterFontWithCustomName("msyh", subset);

  // When creating a page
  page.DefaultTextStyle(x => x.FontSize(12).Fallback(f => f.FontFamily("msyh")));

With the "Microsoft YaHei" font subset, the file size went down from 11.8 MB to 125 KB. The only tricky part is collecting the characters to include.

Hi @MarcinZiabek, during the debug, I found font registered with "RegisterFontWithCustomName" is not prioritized. For example, a custom font with name "Microsoft YaHei" cannot override default system font "Microsoft YaHei". Do you think it is reasonable to do that? Thanks.

ascott18 commented on May 12, 2024 1

If anyone wants a naive way to just strip all embedded fonts, here's an implementation that uses PdfSharp to do it. I make no promises that this won't make your document unusable.

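using System.Linq;
using PdfSharp.Pdf;

public static class PdfDocumentExtensions
{
    public static void StripEmbeddedFonts(this PdfDocument document)
    {
        // /FontFile2 holds an embedded TrueType font program inside a font descriptor.
        // Fonts embedded as /FontFile (Type 1) or /FontFile3 (CFF/OpenType) are left untouched.
        foreach (PdfDictionary obj in document.Internals.GetAllObjects().OfType<PdfDictionary>())
        {
            if (obj.Elements.ContainsKey("/FontFile2"))
            {
                // Remove the object that contains the actual font file
                document.Internals.RemoveObject(obj.Elements.GetDictionary("/FontFile2"));
                // Remove the reference to that object.
                obj.Elements.Remove("/FontFile2");
            }
        }
    }
}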

girlpunk commented on May 12, 2024 1

@mercurycat If QuestPDF is that unsuitable, you're welcome to submit a PR to fix the problem.

odesyatnyk commented on May 12, 2024

@pablopioli I'd be really grateful if you could post some code snippets showing how you achieved this 😄

krupalimakadiya commented on May 12, 2024

Hey @Pietervdw
I'm facing the same issue.
I'm working in Node.js (html-pdf package)
and I'm trying to add the Calibri font.

Did you find any solution?

Pietervdw commented on May 12, 2024

Hi @krupalimakadiya

Not sure what you mean. QuestPDF is a .NET library, I don't think it has anything to do with html-pdf...

andycnguyen commented on May 12, 2024

Hi @Pietervdw

I'm trying your approach and have managed to reduce my desired font to an 18 KB WOFF2 subset. Using this subset does reduce the size of my PDFs from about 1.5 MB to 0.5 MB. But this is still bigger than is practical for my application, and much bigger than your reported 100 KB (moreover, the PDFs contain just a simple table). I was wondering if there were other steps you took to reduce your file size, or if there might be something I'm missing here.

MarcinZiabek commented on May 12, 2024

(Quoting @andycnguyen's question above about the 0.5 MB PDFs.)

Can you please provide an example PDF file you are generating? What does it contain? Images? Complex tables?

ken-sands commented on May 12, 2024

I'm looking into font subsets myself. One of the reasons is that all current tools seem to embed either a subset or the full font, but for my use case I really want something in between.
We often generate very similar PDFs throughout the day and then run a process at the end to join them all. We don't want the full fonts in each of those because of the size (they are transferred over the internet, so there is a big difference between 10k PDFs at 200 KB each and 10k PDFs at 3 MB each).
Normal subsets are a pain when joining, as you then get 10k fonts in the result unless you run a secondary font-replacement pass.

A subset that always included the useful ASCII characters (a-z, A-Z, 0-9 and normal punctuation) plus whatever was needed outside that range (and was maybe then named with the MD5 of itself?) would, I think, be an ideal balance.
The individual PDFs would be a little bigger than they could be, but still tiny compared to including full Unicode fonts.
99% of the time, the PDFs would join and share that single font.
You could reasonably add basic text to the PDFs as needed without adding yet another font (things like a job reference and PDF number in the joined final version).

I realise I'm being quite specific above but if you're looking into it anyway maybe it's worth considering?
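The MD5 naming idea can be sketched as hashing the subset's bytes, so that byte-identical subsets always get the same family name. This assumes MD5 is used only as a fingerprint (not for security) and that whatever joins the PDFs can recognize and share fonts with equal names; `SubsetNaming` is a hypothetical helper:

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: derives a deterministic family name from the subset font's bytes.
public static class SubsetNaming
{
    public static string NameFor(string baseFamily, byte[] subsetBytes)
    {
        using var md5 = MD5.Create();
        byte[] hash = md5.ComputeHash(subsetBytes);
        // Keep the first 8 bytes (16 hex characters) so the name stays short.
        string hex = BitConverter.ToString(hash, 0, 8).Replace("-", "");
        return $"{baseFamily}-{hex}";
    }
}
```

Since the name depends only on the bytes, regenerating the same subset on another day yields the same name, which is what would let the joined PDF share a single font.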

ken-sands commented on May 12, 2024

Good points, I did say I was being quite specific for our use case, I'm UK based and there is little chance of Chinese characters in the documents we deal with daily (and whenever there is it's usually a graphic rather than text).
I suppose from my perspective if characters outside of my range turn up I'd fall back to include the full font.
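This fallback rule (use the shared subset, and include the full font only when a character falls outside its range) reduces to a coverage check. A sketch with hypothetical family names; it decides per piece of text, which is coarser than a per-glyph font fallback:

```csharp
using System.Collections.Generic;

// Hypothetical helper: picks the shared subset family when it covers every character,
// otherwise the full font family.
public static class FontChooser
{
    public static string Choose(string text, ISet<char> subsetCoverage, string subsetFamily, string fullFamily)
    {
        foreach (char c in text)
        {
            if (!subsetCoverage.Contains(c))
                return fullFamily; // at least one character has no glyph in the subset
        }
        return subsetFamily;
    }
}
```

Calling `Choose` once per document keeps the common case on the shared subset and only pays for the full font in the rare documents that need it.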

MarcinZiabek commented on May 12, 2024

With @MarcinZiabek's permission I can post those subsets here for anyone that needs them.

Of course, please share your files! Just make sure that we do not break the font license by sharing the modified version 😁

staterium commented on May 12, 2024

Of course, please share your files! Just make sure that the we do not break the font license by sharing the

It turns out Calibri is copyrighted by Microsoft, but Google has an open-source, metrically compatible equivalent, Carlito. I attach the subsets for that below. Fair warning: I haven't used or tested these subsets.

carlito_small.zip

spoofermark21 commented on May 12, 2024

(Quoting @Pietervdw's EverythingFonts + FontForge subsetting steps above.)

Thanks! 👍 I used https://www.fontsquirrel.com/tools/webfont-generator as it's free without the limit. :)

MarcinZiabek commented on May 12, 2024

(Quoting @garryxiao's question above about RegisterFontWithCustomName not overriding the system "Microsoft YaHei" font.)

@garryxiao This case is interesting. As far as I see, the fonts with custom names are prioritized. Fonts are chosen based on the FontFamily name. https://github.com/QuestPDF/QuestPDF/blob/main/QuestPDF/Drawing/FontManager.cs (line 143). I am not sure why it does not work for you. I am open to discuss it further, as I agree with you regarding the desired behaviour.

I have noticed that your implementation (https://github.com/ETSOO/com.etsoo.EasyPdf) performs actual font subsetting. And let me tell you, I'm impressed! It would be fantastic to integrate this logic into the library somehow, so the problem is solved once and for all. Of course, it may be slightly more complex if we want to keep performance as high as possible. I can easily imagine various strategies and caches.

Your implementation could be a great starting point for such an effort. Would it be OK if I used your work in the near future? I know your code is under the MIT license, but I would still like to get your approval.

My goal is to dedicate more time to the library this year and solve the biggest pains like this one. I am considering various options, but hopefully everything will go well 😁

garryxiao commented on May 12, 2024

Hi @MarcinZiabek, first off, feel free to use whatever is there. I agree it's a short-term solution. In the long term, it should be integrated into the library and collect the characters automatically. Amazing work you have done. Good luck. :)

As for the font priority, I guess it's caused by the fallback. I found that when the fallback was applied, the font size setting for a specific line also failed. If I remove the fallback and set it as the default font, then it works.

garryxiao commented on May 12, 2024

@MarcinZiabek I will keep improving the idea (I just added a feature to remove unnecessary fonts from the TTC file before embedding), but currently there is an issue that may need your support. It seems to be the same issue as #162. If I load the subset stream with FontManager.RegisterFont, the bold / italic styles have no effect. If I use the original static font from the system fonts, it works again. It's a tricky issue, and the only one blocking me from putting the feature into production. Thanks.

YePiaoRan17 commented on May 12, 2024

(Quoting @ascott18's PdfSharp StripEmbeddedFonts snippet above.)

Thank you, solved my problem.

garryxiao commented on May 12, 2024

Hi @MarcinZiabek, while improving the subset-font idea, I found something we may want to correct in the code (QuestPDF.Drawing.FontManager):

private static void RegisterFontType(SKData fontData, string? customName = null)
{
    // A .ttc collection can hold several faces; try indexes until SkiaSharp returns null
    foreach (var index in Enumerable.Range(0, 256))
    {
        var typeface = SKTypeface.FromData(fontData, index);

        if (typeface == null)
            break;

        var typefaceName = customName ?? typeface.FamilyName;

        var fontStyleSet = StyleSets.GetOrAdd(typefaceName, _ => new FontStyleSet());
        fontStyleSet.Add(typeface);
    }
}

StyleSets.GetOrAdd should be changed to StyleSets.AddOrUpdate, which makes more sense and reflects the intention. Then you could update the font stream registered under the same name; otherwise, the first one added is always returned.
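The difference between the two `ConcurrentDictionary` methods can be shown in isolation. A sketch in which plain strings stand in for the `FontStyleSet` values (this is not QuestPDF's code):

```csharp
using System.Collections.Concurrent;

// Illustration only: strings stand in for font style sets registered under one family name.
public static class RegistrationDemo
{
    public static (string ViaGetOrAdd, string ViaAddOrUpdate) Register(string first, string second)
    {
        var getOrAdd = new ConcurrentDictionary<string, string>();
        getOrAdd.TryAdd("msyh", first);
        // Key already exists: the factory is ignored and the first value is returned.
        string kept = getOrAdd.GetOrAdd("msyh", _ => second);

        var addOrUpdate = new ConcurrentDictionary<string, string>();
        addOrUpdate.TryAdd("msyh", first);
        // Key already exists: the update factory runs and the new value replaces the old one.
        string replaced = addOrUpdate.AddOrUpdate("msyh", _ => second, (key, existing) => second);

        return (kept, replaced);
    }
}
```

One trade-off worth checking before switching: in the snippet above, a family name maps to a `FontStyleSet` that collects several typefaces (regular, bold, ...). Replacing the whole set on every registration would also drop previously registered weights of the same family, so an update may need to replace only the matching style rather than the entire set.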

mercurycat commented on May 12, 2024

> If anyone wants a naive way to just strip all embedded fonts, here's an implementation that uses PdfSharp to do it. I make no promises that this won't make your document unusable.

This approach is not suitable for bare-metal or web document publishing. QuestPDF is not suitable for web applications at the moment: the generated documents are too large because there is no font subsetting.

garryxiao commented on May 12, 2024

@mercurycat You're right that QuestPDF currently has no font subsetting. But it's not true that small files can't be achieved on the web: with my library "com.etsoo.EasyPdf", I can generate a one-page Chinese PDF of only 50 KB in Etsoo's online service.

mercurycat commented on May 12, 2024

Every Chinese font is several MB in size. If 10,000 documents are generated on the Internet every day, that adds up to about 70 GB of traffic (roughly 7 MB per document). That much waste is serious, so QuestPDF in its current form is not suitable for Internet use. Compressing the output with Ghostscript generally shrinks it to only about 10% of the original size, which barely meets the requirements.
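For anyone wanting to try the Ghostscript post-processing step from .NET, here is a minimal sketch. It assumes Ghostscript is installed and on PATH (the executable is usually gs on Linux/macOS and gswin64c on Windows); the class and method names are hypothetical, and /ebook is just one of Ghostscript's standard -dPDFSETTINGS presets, so the right preset for a given workload is an assumption to tune. The pdfwrite device subsets embedded fonts by default, which is likely where most of the savings come from with large CJK fonts.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class GhostscriptCompressor
{
    // Builds the argument list for rewriting a PDF with Ghostscript's pdfwrite device.
    public static List<string> BuildArguments(string inputPath, string outputPath, string preset = "/ebook")
    {
        return new List<string>
        {
            "-sDEVICE=pdfwrite",
            "-dCompatibilityLevel=1.4",
            $"-dPDFSETTINGS={preset}",
            "-dNOPAUSE",
            "-dQUIET",
            "-dBATCH",
            $"-sOutputFile={outputPath}",
            inputPath,
        };
    }

    // Runs Ghostscript; assumes the executable is reachable on PATH.
    public static void Compress(string inputPath, string outputPath, string executable = "gs")
    {
        var startInfo = new ProcessStartInfo(executable)
        {
            UseShellExecute = false,
            RedirectStandardError = true,
        };
        foreach (var argument in BuildArguments(inputPath, outputPath))
            startInfo.ArgumentList.Add(argument);

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException($"Could not start {executable}.");
        string errors = process.StandardError.ReadToEnd();
        process.WaitForExit();

        if (process.ExitCode != 0)
            throw new InvalidOperationException($"Ghostscript failed with exit code {process.ExitCode}: {errors}");
    }
}
```

Usage would be something like GhostscriptCompressor.Compress("report.pdf", "report.small.pdf") after QuestPDF has written report.pdf. The arguments are passed through ArgumentList rather than a single string so paths containing spaces need no manual quoting.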

AvabAlexander commented on May 12, 2024

@mercurycat Have you tried creating your own font tailored to your needs, to see whether it reduces the size? #31 (comment)

mercurycat commented on May 12, 2024

For local apps, whether or not fonts are subset has limited impact, but on the Internet the impact is much larger. The author is currently developing font subsetting, and I'm looking forward to the feature going live. In the meantime, the only option is to compress the PDF with a third-party tool. Chinese fonts are very large.

mercurycat commented on May 12, 2024

My previous iText 5 code simply switched to a non-embedded font, so if the user's machine did not have these fonts installed, the downloaded PDF could not be printed or its text appeared garbled. Embedding font subsets would let QuestPDF be used in a wider range of applications.

mercurycat commented on May 12, 2024

QuestPDF is the best PDF layout component I've seen so far. It's exactly what I've been looking for.

hrsh commented on May 12, 2024

As a temporary workaround, I use GenerateImages to render the whole document to images instead of a PDF, then create a new PDF and add the generated images to it.
When I generate the PDF directly with Chinese text, the file is 11.7 MB, but with the image approach the final size is 350 KB.

Nepcix commented on May 12, 2024

@ignoreswing Thank you, an awesome method!
The key to using it is finding all the characters used in the PDF. I think I will write an extension method that wraps Text() and uses a set to store them.
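A minimal sketch of the character-collecting idea, assuming the collected set is later handed to a separate subsetting tool. UsedCharacterCollector and its members are hypothetical names, not part of QuestPDF; Track returns its input unchanged so it can sit inside an existing Text(...) call, and it records Unicode code points (via System.Text.Rune) rather than UTF-16 chars so characters outside the Basic Multilingual Plane are kept whole.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: records every character passed through it.
public sealed class UsedCharacterCollector
{
    private readonly HashSet<int> _codePoints = new HashSet<int>();

    // Records the text's code points and returns the text unchanged.
    public string Track(string text)
    {
        foreach (Rune rune in text.EnumerateRunes())
            _codePoints.Add(rune.Value);
        return text;
    }

    // Number of distinct characters seen so far.
    public int Count => _codePoints.Count;

    // All recorded characters in code-point order, e.g. as input for a subsetting tool.
    public string ToCharacterString()
    {
        var builder = new StringBuilder();
        foreach (int codePoint in _codePoints.OrderBy(cp => cp))
            builder.Append(char.ConvertFromUtf32(codePoint));
        return builder.ToString();
    }
}
```

In a document this might look like column.Item().Text(collector.Track(customerName)); afterwards, collector.ToCharacterString() could be passed to a subsetter such as fontTools' pyftsubset through its --text option. The catch is that the collector only sees text that actually goes through Track, so it has to be complete before the subset font is built and registered.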
