mfenniak / pypdf
Pure-Python PDF Library; this repository is no longer maintained, please see https://github.com/knowah/PyPDF2/ instead.
Home Page: https://github.com/knowah/PyPDF2/
License: Other
Example:
```python
from pyPdf import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input1 = PdfFileReader(open("document1.pdf", "rb"))
# add page 1 from input1 to output document, unchanged
output.addPage(input1.getPage(0))
# add page 2 from input1, but rotated clockwise 90 degrees
output.addPage(input1.getPage(1).rotateClockwise(90))
# add page 3 from input1, rotated the other way:
output.addPage(input1.getPage(2).rotateCounterClockwise(90))
# alt: output.addPage(input1.getPage(2).rotateClockwise(270))
# add page 4 from input1, but first add a watermark from another pdf:
page4 = input1.getPage(3)
watermark = PdfFileReader(open("watermark.pdf", "rb"))
page4.mergePage(watermark.getPage(0))
output.addPage(page4)
# add page 5 from input1, but crop it to half size:
page5 = input1.getPage(4)
page5.mediaBox.upperRight = (
    page5.mediaBox.getUpperRight_x() / 2,
    page5.mediaBox.getUpperRight_y() / 2
)
output.addPage(page5)
# print how many pages input1 has:
print("document1.pdf has %s pages." % input1.getNumPages())
# finally, write "output" to document-output.pdf
outputStream = open("document-output.pdf", "wb")
output.write(outputStream)
outputStream.close()
```
Use pyPdf to read and then write the attached file.
Adobe Reader will then report error 110.
Adding "" to delimiterCharacters fixes the issue (end of stream).
I found it when reading a portfolio.
Seems like this fellow has added the ability to insert Javascript snippets to pyPdf:
http://blog.rsmoorthy.net/2012/01/add-javascript-to-existing-pdf-files.html
Suspect he used this code here:
http://blog.didierstevens.com/programs/pdf-tools/#make-pdf
Seems like a useful addition to pyPdf!
A new release is listed on the home page, but setup.py still refers to 1.12.
I have been looking at the documentation and code for pyPdf and I cannot figure out how to go from the outline to the page it links to. Is there a way to iterate over the outline and get the page number it references so that it can be passed to the getPage() method? I am trying to split a large pdf file into smaller ones based on the outline.
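Assuming the 0-based page index of each top-level outline entry has already been resolved (the PyPDF2 fork offers `getOutlines()` and `getDestinationPageNumber()` for this; an equivalent in pyPdf 1.13 is not assumed here), turning those start pages into split ranges is a small, library-independent step. A minimal sketch, with the hypothetical helper name `outline_ranges`:

```python
def outline_ranges(entries, num_pages):
    """Turn (title, start_page) pairs into (title, start, stop) split ranges.

    `start_page` is a 0-based page index; `stop` is exclusive, so an entry
    covers pages start .. stop - 1 and ends where the next entry begins.
    Nested (child) outline entries are not handled here.
    """
    ordered = sorted(entries, key=lambda entry: entry[1])
    ranges = []
    for n, (title, start) in enumerate(ordered):
        stop = ordered[n + 1][1] if n + 1 < len(ordered) else num_pages
        ranges.append((title, start, stop))
    return ranges


print(outline_ranges([("Intro", 0), ("Part 2", 5), ("Part 1", 2)], 9))
# [('Intro', 0, 2), ('Part 1', 2, 5), ('Part 2', 5, 9)]
```

Each range can then be copied into its own PdfFileWriter with getPage(i) for i in range(start, stop).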
In generic.py, line 727 currently reads:
```python
def getHeight(self):
    return self.getUpperRight_y() - self.getLowerLeft_x()
```
It should read as follows (the final "x" changes to "y"):
```python
def getHeight(self):
    return self.getUpperRight_y() - self.getLowerLeft_y()
```
Also, the output of both getWidth() and getHeight() should be wrapped in abs(), like this:
```python
def getHeight(self):
    return abs(self.getUpperRight_y() - self.getLowerLeft_y())
```
We could also add properties to make the class more pleasant to use:
```python
@property
def width(self):
    return self.getWidth()

@property
def height(self):
    return self.getHeight()
```
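To check the corrected arithmetic in isolation, here is a minimal stand-in for the rectangle class. The name `Rectangle` and its constructor are hypothetical, not pyPdf's `RectangleObject`. The PDF specification allows a rectangle to be given by any two diagonally opposite corners, which is why `abs()` is worth having.

```python
class Rectangle:
    """Hypothetical stand-in for pyPdf's RectangleObject, for illustration only."""

    def __init__(self, llx, lly, urx, ury):
        self.llx, self.lly, self.urx, self.ury = llx, lly, urx, ury

    def getWidth(self):
        # abs(): the corners may be stored in either order
        return abs(self.urx - self.llx)

    def getHeight(self):
        # y minus y (the reported bug subtracted the lower-left x)
        return abs(self.ury - self.lly)

    @property
    def width(self):
        return self.getWidth()

    @property
    def height(self):
        return self.getHeight()


r = Rectangle(10, 50, 612, 792)
print(r.width, r.height)  # 602 742
```

With the buggy formula the height above would come out as 792 - 10 = 782 instead of 742.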
Create an empty StringIO and pass it to PdfFileReader. It loops forever in the readNextEndLine calls that come before the %%EOF check in read().
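pyPdf's read() locates %%EOF by reading lines backwards from the end of the stream. A minimal sketch of a backwards line reader that raises instead of looping when it reaches the start of the stream; `read_line_backwards` and the local `PdfReadError` are hypothetical stand-ins, not pyPdf's `readNextEndLine`:

```python
import io


class PdfReadError(Exception):
    """Local stand-in for pyPdf.utils.PdfReadError."""


def read_line_backwards(stream):
    """Return the line that ends at the stream's current position.

    Trailing CR/LF characters are skipped first, and the stream is left at the
    start of the returned line, so repeated calls walk towards the beginning.
    Raises PdfReadError at the start of the stream instead of looping.
    """
    pos = stream.tell()
    if pos == 0:
        raise PdfReadError("reached the start of the stream")
    # Skip the end-of-line characters that terminate this line.
    while pos > 0:
        stream.seek(pos - 1)
        if stream.read(1) not in (b"\r", b"\n"):
            break
        pos -= 1
    end = pos
    # Walk back to the previous end-of-line (or the start of the stream).
    while pos > 0:
        stream.seek(pos - 1)
        if stream.read(1) in (b"\r", b"\n"):
            break
        pos -= 1
    stream.seek(pos)
    line = stream.read(end - pos)
    stream.seek(pos)
    return line


data = io.BytesIO(b"startxref\n123\n%%EOF\n")
data.seek(0, io.SEEK_END)
print(read_line_backwards(data))  # b'%%EOF'
```

Called on an empty stream, the function raises immediately because the position is already 0.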
Even if there have not been that many commits since 1.12, could you please release a new version? That would make it much easier to get the updates into Fedora. Otherwise I have to work from a git checkout, which I try to avoid.
At least with the pdf I'm looking at, the TD operator is used to move from the end of one line to the start of another. This is ignored by extractText(), so if one line ends with the last letter of a word, and the next line begins with the first letter of a word, then these two characters are also immediately adjacent in the resulting text, producing a new "word" that is not present in the document.
A specific case I'm seeing is a line ending with "phase" is followed by a line beginning with "insufficiency", so what is included at that point in the resulting string is "phaseinsufficiency", a non-word that does not, in fact, occur in the document. I'm using the result in full text search, so this is problematic, in that a search for "phase" or for "insufficiency", or, in fact, for "phase insufficiency", will fail.
I have a patch (if needed) which adds "TD" to the operators extractText() processes, which checks to see if the y operand (operands[1]) is non-zero, whether text is non-empty, and whether text ends with a non-whitespace character. If all this is true, a newline gets appended to text. This works, and is sufficient to my needs.
Since this is a change in behavior, I have also added an argument to extractText() called split_on_y_change with a default value of False, making the default behavior the old behavior. One could do something similar for x changes and vertical languages, but I don't know enough about such languages to propose the details.
Let me know if you want my patch, and whether you can accept a unified diff somewhere, or whether you need a pull request.
Bill
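Bill's patch itself isn't included here, so the following is only a sketch of the behaviour he describes. It works on a simplified list of `(operands, operator)` pairs rather than pyPdf's real `ContentStream`, and the function name is hypothetical.

```python
def extract_text(operations, split_on_y_change=False):
    """Concatenate text from (operands, operator) pairs of a content stream.

    Only Tj, T* and TD are handled. With split_on_y_change=True, a TD whose
    y offset (operands[1]) is non-zero inserts a newline, unless the text so
    far is empty or already ends in whitespace.
    """
    text = ""
    for operands, operator in operations:
        if operator == "Tj":
            text += operands[0]
        elif operator == "T*":
            text += "\n"
        elif operator == "TD" and split_on_y_change:
            if operands[1] != 0 and text and not text[-1].isspace():
                text += "\n"
    return text


ops = [(["phase"], "Tj"), ([0, -12], "TD"), (["insufficiency"], "Tj")]
print(repr(extract_text(ops)))                          # 'phaseinsufficiency'
print(repr(extract_text(ops, split_on_y_change=True)))  # 'phase\ninsufficiency'
```

Defaulting `split_on_y_change` to False keeps the old output unchanged, as the report suggests.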
Hi,
Thanks for the great function additions in 1.13, especially the "merge page" functions. However, I'm having difficulty using them with cropped files.
Suppose I want to rid myself of unwanted text on one side of a page, and so apply a crop:
```python
page.mediaBox.upperRight = (page.mediaBox.upperRight[0] / 2,
                            page.mediaBox.upperRight[1])
```
Now, let us merge this page onto a newly created blank page using:
```python
newblank = PageObject.createBlankPage(None, 612, 792)
newblank.mergePage(page)
```
Unfortunately, in the merged page "newblank", the whole cropped-away region of "page" is displayed again, i.e. mergePage does not honor the mediaBox/cropBox of the page being merged. Can this be fixed?
Thanks!
Soum
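One possible workaround, not something pyPdf 1.13 is known to do, is to clip the merged page's drawing to its cropped box by wrapping its content stream in a saved graphics state with a clipping path: `q x y w h re W n ... Q`. The sketch below only builds the wrapped bytes; how to read and replace a page's raw content in pyPdf is left out, and `clip_to_box` is a hypothetical name.

```python
def clip_to_box(content, box):
    """Wrap raw content-stream bytes so that all drawing is clipped to `box`.

    box is (llx, lly, urx, ury) in the page's user space, e.g. a mediaBox.
    q/Q save and restore the graphics state, `re` appends a rectangle
    (x y width height), `W` makes it the clipping path and `n` ends the
    path without painting it.
    """
    llx, lly, urx, ury = box
    prefix = b"q %g %g %g %g re W n\n" % (llx, lly, urx - llx, ury - lly)
    return prefix + content + b"\nQ\n"


# Left half of a US Letter page, as in the crop above.
print(clip_to_box(b"BT /F1 12 Tf (Hello) Tj ET", (0, 0, 306, 792)))
```

Because the clip is restored by `Q`, anything drawn after the wrapped content (such as the blank page's own content) is unaffected.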
Hi! I'm Biszak Előd, a Hungarian developer. I've been using pyPdf and noticed a bug in the decode function of the ASCII85Decode class: when c == 'z', the variable x is not incremented, so the function gets stuck in an infinite loop.
```python
elif c == 'z':
    assert len(group) == 0
    retval += '\x00\x00\x00\x00'
    continue
```
should be:
```python
elif c == 'z':
    assert len(group) == 0
    retval += '\x00\x00\x00\x00'
    x += 1
    continue
```
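For comparison, a self-contained Python 3 sketch of an ASCII85 decoder (hypothetical function name, not pyPdf's code) that advances its index before any `continue`, so the `z` shortcut cannot stall the loop:

```python
def ascii85_decode(data):
    """Decode ASCII85 bytes as used by the PDF ASCII85Decode filter."""
    out = bytearray()
    group = []
    i = 0
    while i < len(data):
        c = data[i]
        i += 1  # advance first, so no branch below can loop forever
        if c in b" \t\r\n\x0c\x00":
            continue  # white-space is ignored
        if c == ord("~"):
            break  # start of the '~>' end-of-data marker
        if c == ord("z"):
            if group:
                raise ValueError("'z' inside a group")
            out += b"\x00\x00\x00\x00"
            continue
        if not ord("!") <= c <= ord("u"):
            raise ValueError("invalid ASCII85 character %r" % chr(c))
        group.append(c - 33)
        if len(group) == 5:
            value = 0
            for digit in group:
                value = value * 85 + digit
            out += value.to_bytes(4, "big")
            group = []
    if group:
        # A final group of n characters (2 <= n <= 4) encodes n - 1 bytes:
        # pad it with 'u' (digit 84) and keep only the first n - 1 bytes.
        if len(group) == 1:
            raise ValueError("a final group of one character is invalid")
        n = len(group)
        value = 0
        for digit in group + [84] * (5 - n):
            value = value * 85 + digit
        out += value.to_bytes(4, "big")[:n - 1]
    return bytes(out)


print(ascii85_decode(b"9jqo^z9jqo~>"))  # b'Man \x00\x00\x00\x00Man'
```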
I don't know if anyone else has run into this, but extractText() seems to loop infinitely on certain files, and even then only on certain pages of those files. Even left over a 3-day weekend, it remains stuck. I've attached a short sample script illustrating a workaround for whoever comes after me in search of a solution. It extracts each page in a multiprocessing Process and terminates the process if it exceeds a timeout.
```python
# Workaround for an infinite loop in pyPdf's extractText(): extract each page
# in a child process and give up on pages that take too long.
from multiprocessing import Process, Queue
try:
    from queue import Empty  # Python 3
except ImportError:
    from Queue import Empty  # Python 2

from pyPdf import PdfFileReader

TIMEOUT = 10  # seconds allowed per page


def get_highest_page_number(pdf_path):
    with open(pdf_path, "rb") as pdf_handle:
        pdf_file = PdfFileReader(pdf_handle)
        if pdf_file.getIsEncrypted():
            pdf_file.decrypt("")
        return pdf_file.getNumPages()


def get_page_text(pdf_path, page, que):
    with open(pdf_path, "rb") as pdf_handle:
        pdf_file = PdfFileReader(pdf_handle)
        if pdf_file.getIsEncrypted():
            pdf_file.decrypt("")
        que.put(pdf_file.getPage(page).extractText())


def read_pdf(pdf_path):
    texts = []
    for page in range(get_highest_page_number(pdf_path)):
        page_text_que = Queue()
        page_text_process = Process(target=get_page_text,
                                    args=(pdf_path, page, page_text_que))
        page_text_process.start()
        try:
            # Read from the queue *before* joining: a child that has put a
            # large item on a Queue cannot exit until the item is consumed.
            texts.append(page_text_que.get(timeout=TIMEOUT))
        except Empty:
            page_text_process.terminate()
            raise RuntimeError("extractText() timed out on page %d" % page)
        finally:
            page_text_process.join()
    return texts


def main():
    pdf_path = "file.pdf"
    read_pdf(pdf_path)


if __name__ == "__main__":
    main()
```
I don't like having to re-open the handle for every page, but I really don't see another option at present.
I have a collection of PDFs that contain a line of NUL and space characters on the line after the %%EOF marker. The current technique for identifying the %%EOF fails on these PDFs because the 'while not line' check on line 704 of pdf.py (the start of the read() method on PdfFileReader) isn't sufficient to identify this line of NUL and spaces as something worth ignoring.
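The PDF specification counts NUL (0x00) as a white-space character alongside tab, line feed, form feed, carriage return and space, so such a line can reasonably be skipped like a blank one. A minimal sketch, assuming the last kilobyte or so of the file has already been read into `tail` (the helper name is hypothetical, not part of pyPdf):

```python
# The six white-space characters defined by the PDF specification:
# NUL, horizontal tab, line feed, form feed, carriage return and space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def last_significant_line(tail):
    """Return the last line of `tail` that is not made up only of PDF white-space."""
    for line in reversed(tail.splitlines()):
        stripped = line.strip(PDF_WHITESPACE)
        if stripped:
            return stripped
    raise ValueError("no non-blank line found")


tail = b"startxref\n116\n%%EOF\n\x00\x00 \x00 \n"
print(last_significant_line(tail))  # b'%%EOF'
```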
What about a PEP 8-compliant API (under_scores instead of camelCase, etc.)?
readStringFromStream() fails to create a string object when given a text object like the one below:
```
BT 1 0 0 1 0 1.9 Tm /F3+0 8.6 Tf 10.5 TL (\376\377 ) Tj T* ET
```
readStringFromStream() decodes (\376\377 ) to the string '\xfe\xff\x20'.
createStringObject() checks the first 2 bytes of the string and attempts to decode it as UTF-16.
An exception is then raised, because the single remaining byte '\x20' is not valid UTF-16.
Evidently, "\376\377" here should not be treated as a BOM.
The BOM check implements the "Text Strings" rules described in the PDF Reference,
but it should be applied only to items of the "text string" type specified there.
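The report's proposal is to apply the BOM check only where the PDF Reference expects a text string. A complementary, defensive option (a different technique, sketched here with a hypothetical function name) is to treat FE FF as a BOM only when the rest decodes cleanly as UTF-16BE:

```python
import codecs


def decode_pdf_string(raw):
    """Decode the bytes of a PDF string, using a leading FE FF as a UTF-16BE
    byte order mark only if the remainder really is valid UTF-16BE.

    Anything else is returned unchanged as bytes, so a content-stream string
    such as (\\376\\377 ) no longer raises.
    """
    if raw.startswith(codecs.BOM_UTF16_BE):
        try:
            return raw[2:].decode("utf-16-be")
        except UnicodeDecodeError:
            pass  # not actually UTF-16 text; keep the raw bytes
    return raw


print(decode_pdf_string(b"\xfe\xff\x20"))        # b'\xfe\xff '
print(decode_pdf_string(b"\xfe\xff\x00H\x00i"))  # Hi
```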
I'm getting an error when using a PDF with layers:
Traceback:
File "G:\python-education\pdfinfo.py", line 16, in <module>
print name, inFile.getNumPages()
File "build\bdist.win-amd64\egg\pyPdf\pdf.py", line 431, in getNumPages
File "build\bdist.win-amd64\egg\pyPdf\pdf.py", line 607, in _flatten
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 165, in getObject
File "build\bdist.win-amd64\egg\pyPdf\pdf.py", line 649, in getObject
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 67, in readObject
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 531, in readFromStream
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 58, in readObject
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 153, in readFromStream
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 67, in readObject
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 531, in readFromStream
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 67, in readObject
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 531, in readFromStream
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 58, in readObject
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 153, in readFromStream
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 67, in readObject
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 531, in readFromStream
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 52, in readObject
File "build\bdist.win-amd64\egg\pyPdf\generic.py", line 339, in readStringFromStream
pyPdf.utils.PdfReadError: Unexpected escaped string
If I merge the layers in that PDF, everything works.
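The PDF specification says that a backslash followed by a character that is not a recognised escape should simply be ignored, so a reader can be lenient where pyPdf raises "Unexpected escaped string". Whether that is what goes wrong in this layered file is an assumption; the sketch below (hypothetical `read_literal_string`, not pyPdf's `readStringFromStream`) only shows a lenient literal-string parser:

```python
# Escapes defined for PDF literal strings.
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
           b"(": b"(", b")": b")", b"\\": b"\\"}
OCTAL = b"01234567"


def read_literal_string(data, pos=0):
    """Parse the literal string starting at data[pos] == b"(".

    Returns (decoded bytes, index just after the closing parenthesis).
    """
    if data[pos:pos + 1] != b"(":
        raise ValueError("literal string must start with '('")
    out = bytearray()
    depth = 1
    i = pos + 1
    while i < len(data):
        c = data[i:i + 1]
        i += 1
        if c == b"\\":
            nxt = data[i:i + 1]
            i += 1
            if not nxt:
                break  # backslash at end of data: unterminated
            if nxt in ESCAPES:
                out += ESCAPES[nxt]
            elif nxt in OCTAL:
                digits = nxt
                while len(digits) < 3 and data[i:i + 1] and data[i:i + 1] in OCTAL:
                    digits += data[i:i + 1]
                    i += 1
                out.append(int(digits, 8) & 0xFF)  # high-order overflow ignored
            elif nxt == b"\r":
                if data[i:i + 1] == b"\n":
                    i += 1  # backslash + CRLF: line continuation
            elif nxt == b"\n":
                pass  # backslash + LF: line continuation
            else:
                out += nxt  # unknown escape: drop the backslash, keep the char
        elif c == b"(":
            depth += 1
            out += c
        elif c == b")":
            depth -= 1
            if depth == 0:
                return bytes(out), i
            out += c
        else:
            out += c
    raise ValueError("unterminated literal string")


print(read_literal_string(b"(a\\qb (nested))"))  # (b'aqb (nested)', 15)
```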
AFAIK there is no interface in PdfFileWriter to set the document metadata (like the PDF title). This is a problem, for example, with the example on your home page (http://pybrary.net/pyPdf/), where the final PDF no longer has a title.
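For reference, the document information that such an interface would need to write is an ordinary dictionary referenced from the trailer's /Info entry (the PyPDF2 fork later added `PdfFileWriter.addMetadata()` for this). A minimal sketch that serializes one, assuming plain ASCII values; `info_dictionary` and `pdf_literal` are hypothetical helpers, not pyPdf API:

```python
def pdf_literal(text):
    """Write `text` as a PDF literal string, escaping backslashes and parentheses.

    Assumes plain ASCII text; other characters would need PDFDocEncoding or
    UTF-16BE with a byte order mark.
    """
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def info_dictionary(entries):
    """Serialize document-information entries such as Title and Author."""
    body = " ".join("/%s %s" % (key, pdf_literal(value)) for key, value in entries.items())
    return "<< " + body + " >>"


print(info_dictionary({"Title": "Report (draft)", "Author": "Jane Doe"}))
# << /Title (Report \(draft\)) /Author (Jane Doe) >>
```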
I have a PDF document that seems to get stuck in an infinite loop in the "while True" clause of generic.NameObject.
I added the empty string "" to the tuple of NameObject.delimiterCharacters to fix this issue. Don't know if it's the right solution, but it seems to break the infinite loop perfectly.
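At end of stream, `stream.read(1)` returns an empty string without moving the position, and an empty string is not white-space, so the `while True` loop never exits. Adding "" to the *tuple* works because tuple membership compares whole elements (with a plain string, `"" in "()<>"` would always be true). A minimal Python 3 sketch of a name reader that stops at end of stream explicitly; `read_name` is a hypothetical helper, not pyPdf's `NameObject.readFromStream`:

```python
import io

PDF_WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"


def read_name(stream):
    """Read a PDF name such as /DeviceGray, stopping before white-space or a
    delimiter, or at the end of the stream."""
    name = stream.read(1)
    if name != b"/":
        raise ValueError("a name must start with '/'")
    while True:
        tok = stream.read(1)
        if not tok:
            break  # end of stream: nothing was read, so nothing to put back
        if tok in PDF_WHITESPACE or tok in DELIMITERS:
            stream.seek(-1, io.SEEK_CUR)  # leave the terminator for the caller
            break
        name += tok
    return name


print(read_name(io.BytesIO(b"/DeviceGray")))  # b'/DeviceGray'
```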
I'm in way over my head here...kind of feel like the blind pig that found an acorn. Anyway, I'm trying to process a PDF that contains the following items:
10 0 obj
/DeviceGray
endobj
The problem is that when the line "/DeviceGray" is read, tok = stream.read(1) does not seem to advance the file pointer. (I checked by looking at the value of stream.tell() before and after the stream.read())
I don't know why the pointer does not get advanced, but making the code look like this fixes the problem, and things seem to move along just fine.
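```python
while True:
    pre_read = stream.tell()  # new
    tok = stream.read(1)
    if stream.tell() == pre_read:
        # new: the position did not move (end of stream), so there is
        # no character to put back; just stop
        break
    if tok.isspace() or tok in NameObject.delimiterCharacters:
        stream.seek(-1, 1)
        break
    name += tok
return NameObject(name)
```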
I can provide a copy of the PDF to someone if they want an example. (Note to self: this is 98421_SupLegal 2008-02 Stmt_p83_r8.pdf)
Is there any way to render a single PDF page to an image using PIL with pyPdf? Thanks
Hi,
Today I installed pyPdf 1.13 for PyPy 1.6 using easy_install.
It doesn't work, but the bug fix is incredibly simple: just change line 200 of pyPdf/generic.py.
Original:
```python
int.__init__(value)
```
Bug fix:
```python
super(int, self).__init__(value)
```
Sorry for not contributing a patch directly, but I'm new to GitHub.
BTW, the error that I got was:
Traceback (most recent call last):
File "app_main.py", line 53, in run_toplevel
File "crack_passwd.py", line 11, in <module>
reader = PdfFileReader(file('ZAJECIA5-PRZYROWNANIE_SEKWENCJI.pdf', 'rb'))
File "/Users/tomek/pypy-1.6/site-packages/pyPdf/pdf.py", line 374, in __init__
self.read(stream)
File "/Users/tomek/pypy-1.6/site-packages/pyPdf/pdf.py", line 732, in read
num = readObject(stream, self)
File "/Users/tomek/pypy-1.6/site-packages/pyPdf/generic.py", line 87, in readObject
return NumberObject.readFromStream(stream)
File "/Users/tomek/pypy-1.6/site-packages/pyPdf/generic.py", line 236, in readFromStream
return NumberObject(name)
File "/Users/tomek/pypy-1.6/site-packages/pyPdf/generic.py", line 220, in __init__
int.__init__(value)
Now it's fixed!!!
Cheers,
paparazzo
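Since `int` is immutable, its value is fixed in `__new__`, and an `int` subclass does not need an `__init__` at all. A minimal sketch of that alternative; the class name `PdfNumber` is hypothetical and this is not pyPdf's `NumberObject`:

```python
class PdfNumber(int):
    """An int subclass for PDF integers; the value is set in __new__ only."""

    def __new__(cls, value):
        # int() accepts str and bytes such as "42" or b"-7".
        # No __init__ override is defined, so nothing calls int.__init__(value).
        return super().__new__(cls, value)

    def __repr__(self):
        return "PdfNumber(%d)" % self


print(PdfNumber("42") + 1)  # 43
```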
Steps to reproduce:
```python
from pyPdf import PdfFileReader, PdfFileWriter


def encrypt(in_stream, out_stream, user_password, owner_password=None):
    """
    Encrypt an existing PDF file (stream)

    `in_stream`       stream with pdf data, e.g. open(filename, 'rb')
    `out_stream`      stream where output will be written, e.g. open(filename, 'wb')
    `user_password`   the password used for limited access
    `owner_password`  the password used for full access (defaults to user_password)

    I copied this from /sm/script/encryptPdf.py
    """
    reader = PdfFileReader(in_stream)
    writer = PdfFileWriter()
    for i in range(reader.getNumPages()):
        writer.addPage(reader.getPage(i))
    writer.encrypt(user_password, owner_password)
    writer.write(out_stream)
```
hey folks :)
On some files generated by Microsoft Reporting Services, I get one of the following errors using this script:
```python
from pyPdf import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input1 = PdfFileReader(open("infile.pdf", "rb"))
output.addPage(input1.getPage(0))
outputStream = open("outfile.pdf", "wb")
output.write(outputStream)
outputStream.close()
```
Traceback (most recent call last):
File "/backup/print/municipality stara zagora/110228/Aitos_1/test.py", line 20, in <module>
output.write(outputStream)
.....
File "/usr/local/lib/python2.6/site-packages/pyPdf/generic.py", line 232, in readFromStream
return NumberObject(name)
ValueError: invalid literal for int() with base 10: ''
or, using another approach (loading the pages into a list and then saving them):
Traceback (most recent call last):
File "/backup/print/municipality stara zagora/110228/municipality stara zagora pdf combine 110228 start.py", line 60, in <module>
outpdf.write(outfile)
.....
File "/usr/local/lib/python2.6/site-packages/pyPdf/pdf.py", line 545, in getObject
self.stream.seek(start, 0)
ValueError: I/O operation on closed file
even though the file is (of course) not closed.
I work around it by re-saving the file with pdftk, like this:
```python
import shlex
import subprocess

from pyPdf import PdfFileWriter, PdfFileReader

# Re-save the input with pdftk first; check_call raises if pdftk fails.
pdftkcommand = 'pdftk infile.pdf cat output fixed_infile.pdf'
subprocess.check_call(shlex.split(pdftkcommand))

output = PdfFileWriter()
input1 = PdfFileReader(open("fixed_infile.pdf", "rb"))
output.addPage(input1.getPage(0))
outputStream = open("outfile.pdf", "wb")
output.write(outputStream)
outputStream.close()
```
but this only works with the latest pdftk version (1.44; 1.41 produces a blank PDF). I guess this is what the pdftk developers fixed:
1.43 - September 30, 2010
Fixed a stream parsing bug that was causing page content to disappear after merge of PDFs generated by Microsoft Reporting Services PDF Rendering Extension 10.0.0.0.
Unfortunately I can't provide the broken file, as its contents are confidential.
hope this helps :)
georgi
I tried the example from the README in Python 3.1. There are two issues:
Here is a diff to solve the second issue:
```diff
--- ../../old_pdf.py/pdf.py	2009-10-15 10:56:54.000000000 +0200
+++ pdf.py	2010-05-12 18:19:45.000000000 +0200
@@ -47,7 +47,7 @@
 from .generic import (readObject, DictionaryObject, DecodedStreamObject,
         NameObject, NumberObject, ArrayObject, IndirectObject,
         ByteStringObject, StreamObject, NullObject, TextStringObject,
-        createStringObject, BooleanObject)
+        createStringObject, BooleanObject, RectangleObject)
```
Some pyPdf users noticed problems with whitespacing. As an example http://bugs.debian.org/563443 . I (the pyPdf maintainer in Debian) am including the patch proposed in that bug. But clearly deeper recoding is needed.
I'm not an expert on the PDF file format, but I think PDF files contain a "/Page" instruction for each page, and this is visible even if the file is protected.
Also, the "/Type /Pages" instruction gives a "/Count" of the number of pages in the document, which is visible even on a protected file.
So why is the getNumPages method so complicated? What am I missing?
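The root /Pages node's /Count does give the total quickly for a well-formed file, but getPage(n) also needs each page dictionary together with the attributes it inherits from its ancestors (/Resources, /MediaBox, /CropBox and /Rotate are inheritable), and a plain text search for /Page can miss pages stored in compressed object streams. A minimal sketch of the flattening idea over plain dicts; the helper is hypothetical, and real page-tree nodes are indirect objects rather than nested dicts:

```python
# Page attributes that a /Page node inherits from its /Pages ancestors.
INHERITABLE = ("Resources", "MediaBox", "CropBox", "Rotate")


def flatten_pages(node, inherited=None):
    """Yield every leaf page of a page tree, with inherited attributes filled in."""
    inherited = dict(inherited or {})
    for key in INHERITABLE:
        if key in node:
            inherited[key] = node[key]
    if node.get("Type") == "Pages":
        for kid in node["Kids"]:
            yield from flatten_pages(kid, inherited)
    else:
        page = dict(inherited)
        page.update(node)
        yield page


tree = {"Type": "Pages", "Count": 3, "MediaBox": [0, 0, 612, 792], "Kids": [
    {"Type": "Page"},
    {"Type": "Pages", "Count": 2, "Rotate": 90, "Kids": [
        {"Type": "Page"},
        {"Type": "Page", "MediaBox": [0, 0, 100, 100]},
    ]},
]}
pages = list(flatten_pages(tree))
print(len(pages), tree["Count"])  # 3 3
```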
I think there's a small problem in the _sweepIndirectReferences method of the PdfFileWriter class. There's a list called self.stack where the indirect references we've already seen are stored. I suppose it is there so that we don't sweep the same indirect reference over and over again. However, the function removes a reference from self.stack once it has been swept, and I don't see the point of that. If many objects reference the same object (for example, if we also copy the logical structure of the PDF, many objects reference the same page object, which is quite expensive to sweep), keeping it in self.stack could mean a significant improvement in time.
```python
if data.pdf == self:
    if data.idnum in self.stack:
        return data
    else:
        self.stack.append(data.idnum)
        realdata = self.getObject(data)
        self._sweepIndirectReferences(externMap, realdata)
        self.stack.pop()
        return data
```
I think it should be:
```python
if data.pdf == self:
    if data.idnum in self.stack:
        return data
    else:
        self.stack.append(data.idnum)
        realdata = self.getObject(data)
        self._sweepIndirectReferences(externMap, realdata)
        return data
```
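Not popping turns self.stack into a permanent record of visited objects. A minimal, library-independent sketch of that idea using a set (O(1) membership instead of a list scan); `sweep` and the dict-based object graph are hypothetical, and pyPdf's real method also rewrites references as it goes, which is not modelled here:

```python
def sweep(graph, root):
    """Visit every object reachable from `root` exactly once.

    graph maps an object number to the object numbers it references.
    Returns the object numbers in the order they were first visited.
    """
    visited = set()  # never shrinks, unlike a recursion stack that is popped
    order = []
    pending = [root]
    while pending:
        idnum = pending.pop()
        if idnum in visited:
            continue  # shared (or cyclic) reference: already swept
        visited.add(idnum)
        order.append(idnum)
        # reversed() so that references are visited in their listed order
        pending.extend(reversed(graph.get(idnum, [])))
    return order


# Objects 2 and 3 both reference page object 4, which points back to 1.
print(sweep({1: [2, 3], 2: [4], 3: [4], 4: [1]}, 1))  # [1, 2, 4, 3]
```

A visited set still guards against cycles, which is what the original stack also protected against.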
Needed to convert pdf to all caps in __init__.py
I have some code:
```python
import pyPdf


def getPDFContent():
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(open(pathToPdf, 'rb'))
    # Iterate pages
    print(pdf.documentInfo)
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + " \n"
    # Collapse whitespace
    content = u" ".join(content.replace(u"\xa0", u" ").strip().split())
    return content


f = open(pathToTxt, 'w+')
f.write(getPDFContent())
f.close()
```
where pathToPdf and pathToTxt are absolute paths to the files.
But I got this error:
Traceback (most recent call last):
File "C:/Users/will/Desktop/coding/mytest.py", line 21, in <module>
print pdf.getPage(14)
File "C:\Python\lib\site-packages\pyPdf\pdf.py", line 450, in getPage
self._flatten()
File "C:\Python\lib\site-packages\pyPdf\pdf.py", line 607, in _flatten
self._flatten(page.getObject(), inherit, **addt)
File "C:\Python\lib\site-packages\pyPdf\generic.py", line 165, in getObject
return self.pdf.getObject(self).getObject()
File "C:\Python\lib\site-packages\pyPdf\pdf.py", line 649, in getObject
retval = readObject(self.stream, self)
File "C:\Python\lib\site-packages\pyPdf\generic.py", line 67, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "C:\Python\lib\site-packages\pyPdf\generic.py", line 531, in readFromStream
value = readObject(stream, pdf)
File "C:\Python\lib\site-packages\pyPdf\generic.py", line 67, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "C:\Python\lib\site-packages\pyPdf\generic.py", line 531, in readFromStream
value = readObject(stream, pdf)
File "C:\Python\lib\site-packages\pyPdf\generic.py", line 67, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "C:\Python\lib\site-packages\pyPdf\generic.py", line 534, in readFromStream
raise utils.PdfReadError, "multiple definitions in dictionary"
pyPdf.utils.PdfReadError: multiple definitions in dictionary
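One option for files like this is a non-strict mode that tolerates repeated keys instead of raising (the PyPDF2 fork's `PdfFileReader` has a `strict` argument in this spirit). A minimal sketch over already-parsed `(key, value)` pairs; the function name is hypothetical, and keeping the first definition is an arbitrary choice made here:

```python
import warnings


def build_dictionary(pairs, strict=True):
    """Build a dict from parsed (key, value) pairs of a PDF dictionary.

    A repeated key raises ValueError when strict; otherwise it triggers a
    warning and the first definition is kept.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            if strict:
                raise ValueError("multiple definitions in dictionary: %s" % key)
            warnings.warn("duplicate key %s ignored" % key)
            continue
        result[key] = value
    return result


print(build_dictionary([("/Type", "/Page"), ("/Rotate", 0), ("/Rotate", 90)], strict=False))
# {'/Type': '/Page', '/Rotate': 0}
```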
mergePage function is slow. Needing more speed, I have written a modified version mergePage3 which is much faster when you merge pages from the same file (up to 200x faster) and faster also when you merge pages from different files. I can share the code if you are interested.
The basic idea: mergePage uses ContentStream to get the content of a page. But this class always runs the parseContentStream function, even when this is not needed, and that function is time-consuming.
mergePage3 parses the content only when really needed. Result is :
On a 55-page test file, putting two pages on each sheet (booklet), mergePage takes 34 seconds and mergePage3 takes 0.4 seconds. (I count here only the time needed for mergePage, not the generation of the output file.)
If you are interested, I can share the code.
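mergePage3 itself is not shown, so this is only a generic sketch of the lazy-parsing idea described: keep the raw bytes, parse them only on first access to the operations, and return the original bytes untouched if nobody asked for them. The class name and the `parse`/`serialize` callables are hypothetical stand-ins for pyPdf's ContentStream internals.

```python
class LazyContentStream:
    """Content stream that defers parsing until its operations are needed."""

    def __init__(self, data, parse, serialize):
        self._data = data            # raw content-stream bytes
        self._parse = parse          # bytes -> list of operations
        self._serialize = serialize  # list of operations -> bytes
        self._operations = None
        self.parse_count = 0

    @property
    def operations(self):
        if self._operations is None:
            self._operations = self._parse(self._data)
            self.parse_count += 1
        return self._operations

    def get_data(self):
        if self._operations is None:
            return self._data  # never parsed: hand back the original bytes
        return self._serialize(self._operations)


# Toy parser/serializer: whitespace-separated tokens stand in for operations.
cs = LazyContentStream(b"q 1 0 0 1 0 0 cm Q", bytes.split, b" ".join)
print(cs.get_data(), cs.parse_count)  # b'q 1 0 0 1 0 0 cm Q' 0
```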
I use pyPdf with Python 3.2 on my Windows machine and just got some errors I could resolve:
The first one was opening a PDF file with PdfFileReader. I used this code:
```python
from pyPdf import PdfFileReader

file = open("PATH_WITH_FILE_AND_EXTENSION", "rb")
doc = PdfFileReader(file)
```
The second thing I discovered was that pdf.py is missing the RectangleObject import from generic.py, so just add it.
I got the error below while calling extractText() on each page of a PDF document created with Acrobat Distiller.
Traceback:
Original Traceback (most recent call last):
File "/Users/ulo/.virtualenvs/zrbackend/src/django/django/template/debug.py", line 71, in render_node
result = node.render(context)
File "/Users/ulo/.virtualenvs/zrbackend/src/django/django/template/defaulttags.py", line 155, in render
nodelist.append(node.render(context))
File "/Users/ulo/.virtualenvs/zrbackend/src/django/django/template/debug.py", line 87, in render
output = force_unicode(self.filter_expression.resolve(context))
File "/Users/ulo/.virtualenvs/zrbackend/src/django/django/template/__init__.py", line 546, in resolve
obj = self.var.resolve(context)
File "/Users/ulo/.virtualenvs/zrbackend/src/django/django/template/__init__.py", line 687, in resolve
value = self._resolve_lookup(context)
File "/Users/ulo/.virtualenvs/zrbackend/src/django/django/template/__init__.py", line 722, in _resolve_lookup
current = current()
File "/Users/ulo/.virtualenvs/zrbackend/lib/python2.5/site-packages/pyPdf/pdf.py", line 1035, in extractText
content = ContentStream(content, self.pdf)
File "/Users/ulo/.virtualenvs/zrbackend/lib/python2.5/site-packages/pyPdf/pdf.py", line 1117, in __init__
stream = StringIO(stream.getData())
File "/Users/ulo/.virtualenvs/zrbackend/lib/python2.5/site-packages/pyPdf/generic.py", line 636, in getData
decoded._data = filters.decodeStreamData(self)
File "/Users/ulo/.virtualenvs/zrbackend/lib/python2.5/site-packages/pyPdf/filters.py", line 237, in decodeStreamData
raise NotImplementedError("unsupported filter %s" % filterType)
NotImplementedError: unsupported filter /LZWDecode
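LZWDecode is a standard PDF filter, so one route is to implement it. A minimal Python 3 sketch of the decoder (hypothetical function name; assumes the default EarlyChange of 1 and no /DecodeParms predictor), checked against the sample `80 0B 60 50 22 0C 0C 85 01`, which encodes `-----A---B`:

```python
CLEAR_TABLE, EOD = 256, 257


def lzw_decode(data):
    """Decode PDF LZWDecode data (EarlyChange = 1, no predictor)."""
    table = [bytes([i]) for i in range(256)] + [b"", b""]  # 256, 257 reserved
    code_width = 9
    bit_buffer = 0
    bit_count = 0
    pos = 0
    prev = None
    out = bytearray()
    while True:
        # Pull in bytes until a whole code is available.
        while bit_count < code_width:
            if pos >= len(data):
                return bytes(out)  # no EOD marker: return what was decoded
            bit_buffer = (bit_buffer << 8) | data[pos]
            pos += 1
            bit_count += 8
        bit_count -= code_width
        code = bit_buffer >> bit_count
        bit_buffer &= (1 << bit_count) - 1
        if code == EOD:
            break
        if code == CLEAR_TABLE:
            del table[258:]
            code_width = 9
            prev = None
            continue
        if prev is None:
            entry = table[code]
        elif code < len(table):
            entry = table[code]
            table.append(prev + entry[:1])
        elif code == len(table):
            entry = prev + prev[:1]  # code being defined right now
            table.append(entry)
        else:
            raise ValueError("invalid LZW code %d" % code)
        out += entry
        prev = entry
        # EarlyChange = 1: widen codes one entry before the table is full.
        if len(table) >= (1 << code_width) - 1 and code_width < 12:
            code_width += 1
    return bytes(out)


print(lzw_decode(bytes.fromhex("800B6050220C0C8501")))  # b'-----A---B'
```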
Example code:
```python
from pyPdf import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input1 = PdfFileReader(open("1.pdf", "rb"))
for i in range(input1.getNumPages()):
    output.addPage(input1.getPage(1))
outputStream = open("document-output.pdf", "wb")
output.write(outputStream)
outputStream.close()
```
Example pdf file can be found here (it's a paper named Sequential hashing: A flexible approach for unveiling significant patterns in high speed networks).
I have attached an invalid PDF file.
When I try to open it, reading takes a very long time (more than 1 hour).
It seems there is a bug in PdfFileReader.
This is my test to reproduce the bug on pypdf2==1.26.0 and Python 3.6:
```python
from PyPDF2 import PdfFileReader

f = open('file1.pdf', 'rb')
p = PdfFileReader(f)  # this line takes a very long time
```
[x] Bug (Typo)
signifigance, however expect to see significance.
preceeding, however expect to see preceding.
optionnal, however expect to see optional.
keywrods, however expect to see keywords.
matricies, however expect to see matrices.
heigth, however expect to see height.
enviroment, however expect to see environment.
dimentions, however expect to see dimensions.
dictionnary, however expect to see dictionary.
Semi-automated issue generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md
To avoid wasting CI processing resources a branch with the fix has been
prepared but a pull request has not yet been created. A pull request fixing
the issue can be prepared from the link below, feel free to create it or
request @timgates42 create the PR. Alternatively if the fix is undesired please
close the issue with a small comment about the reasoning.
https://github.com/timgates42/pyPdf/pull/new/bugfix_typos
Thanks.