Comments (11)
It seems that readFile uses the CP866 encoding on Windows (with Russian as the default system language).
from pandoc-include.
Can confirm with Windows 10 and pandoc 1.16.0.2 (pandoc and pandoc-include built on the machine using stack).
As another data point for hunting down this bug:
Take a file containing the umlauts of a, o, and u (LATIN SMALL LETTER A WITH DIAERESIS etc.)
bug.markdown
ä ö ü
Run it in cmd as
chcp 65001
pandoc bug.markdown
and you get the expected:
ä ö ü
Even if you run it as
pandoc --filter pandoc-include bug.markdown
you get the expected output.
However, if you run it as
pandoc --filter pandoc-include incl.markdown
where incl.markdown only includes bug.markdown, you get messed-up characters.
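The symptom above is consistent with how GHC's text I/O works: unless told otherwise, a text-mode handle (and therefore readFile) decodes with the locale encoding, not necessarily UTF-8. The following is only a minimal sketch of that effect, not the code used by pandoc-include. The helper name readWith and the file name encoding-demo.txt are made up for illustration, and latin1 stands in for whatever non-UTF-8 code page the system might use.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: read a whole file, decoding it with the given encoding.
readWith :: TextEncoding -> FilePath -> IO String
readWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  contents <- hGetContents h
  -- Force the lazy string before withFile closes the handle.
  _ <- evaluate (length contents)
  return contents

main :: IO ()
main = do
  let path = "encoding-demo.txt"   -- assumed scratch file name
  -- Write a single a-umlaut (U+00E4) as UTF-8; on disk this is the bytes C3 A4.
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h "\228"
  -- The encoding that plain readFile would use on this machine (varies).
  print localeEncoding
  -- Decoding the same bytes as UTF-8 recovers the character.
  readWith utf8 path >>= print     -- prints "\228"
  -- Decoding them with a single-byte encoding yields two wrong characters.
  readWith latin1 path >>= print   -- prints "\195\164"
```

If the locale encoding printed on an affected machine is not UTF-8, that would explain why included files come out garbled while pandoc's own input does not.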
Hi @russtone, @LarsEKrueger, thank you for your patience and the detailed issue. On a new branch I've made it possible to set the encoding to UTF-8. Will this help in your case?
To use it, simply add utf8 after the include class, like so:
```include utf8
a.md
```
Can't test the branch right now. Might take a few days until I find the time.
However, I don't understand why you want to make a difference in the encoding of the file that does the include and the one that is included.
Pandoc is UTF-8 on input and there's no way around it. One would expect that include files are UTF-8 too, without requesting them to be. If there's a relevant use case that I don't see right now, you definitely need to document that.
@LarsEKrueger This makes it even easier, thank you!
Tried commit 53b0d1 on Windows 10 (with Creator's Update). Pandoc and pandoc-include compiled using stack and ghc 8.
Issue is still there.
Thank you for checking. Unfortunately I have no other idea what could be wrong.
Oh, I see. That commit is old and does not contain the fixes. Please try the latest commit on the fixing branch: 913ca87. Thanks!
Still doesn't work.
Could it be that the hSetEncoding isn't evaluated due to laziness, and that you didn't notice during testing (e.g. because your default encoding is already UTF-8)?
I use the following code in my filter and it does work on windows.
justReadFile :: String -> IO (Maybe [Block])
justReadFile fn = bracket (openFile fn ReadMode) hClose $ \handle -> do
  hSetEncoding handle utf8
  cont <- hGetContents handle
  case readMarkdown def cont of
    Left _ -> return Nothing
    Right (Pandoc _ blocks) -> return $ Just blocks
If I use your fmap pattern, it ceases to work correctly. The code is:
justReadFile :: String -> IO (Maybe [Block])
justReadFile fn = bracket (openFile fn ReadMode) hClose $ \handle -> do
  fmap (`hSetEncoding` utf8) $ return handle
  cont <- hGetContents handle
  case readMarkdown def cont of
    Left _ -> return Nothing
    Right (Pandoc _ blocks) -> return $ Just blocks
Your variable handle in fileContentAsString is actually of type IO Handle, not Handle. Thus the fmap typechecks correctly, but the hSetEncoding is either never run or run after the hGetContents. It's the same reason I needed to add the return in the code above: fmap wants an IO Handle.
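To illustrate the point about fmap in isolation: mapping an IO-returning function over an IO value produces a nested action of type IO (IO ()), and the inner action is only a value that nobody executes. The sketch below uses an IORef counter instead of a file handle so the effect is observable. The names demo and bump are hypothetical and not taken from pandoc-include.

```haskell
import Data.IORef

-- Returns the counter value after the fmap attempt and after the bind attempt.
demo :: IO (Int, Int)
demo = do
  counter <- newIORef (0 :: Int)
  let bump :: Int -> IO ()
      bump n = modifyIORef counter (+ n)

  -- fmap only builds an IO (IO ()); the inner `bump 1` is returned, not run.
  _ <- fmap bump (return 1)
  afterFmap <- readIORef counter

  -- Bind feeds the result to bump and actually runs the resulting action.
  return 1 >>= bump
  afterBind <- readIORef counter

  return (afterFmap, afterBind)

main :: IO ()
main = demo >>= print   -- prints (0,1)
```

The same reasoning applies to hSetEncoding: with fmap the encoding change is constructed but never performed, whereas calling it directly on a Handle (or using >>= on an IO Handle) performs it before the read.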
Thank you, you are awesome. I've updated the branch, and really hope this will solve the bug.
After removing the utf8 class from the include code block, I ran the following command:
(cd test/encoding/ ; pandoc -f markdown -t html -s -o test.html --filter ../../dist/dist-sandbox-1d3e9dda/build/pandoc-include/pandoc-include test.md)
pandoc-include: include.md: hGetContents: illegal operation (delayed read on closed handle)
Error running filter ../../dist/dist-sandbox-1d3e9dda/build/pandoc-include/pandoc-include:
Filter returned error status 1
I encountered this problem too when writing justReadFile (see previous comment). I fixed it by moving the readMarkdown inside the function.
Same error happens on Windows.
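The "delayed read on closed handle" error is what lazy hGetContents produces when the string is first consumed after the handle has been closed. One possible way around it, sketched here under the assumption that the whole file fits in memory, is to force the contents before leaving the scope that owns the handle. The names readLazyBroken, readUtf8Strict, and strict-demo.md are invented for this example. The fix described above (parsing inside the function) achieves the same thing by consuming the string while the handle is still open.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.IO

-- Problematic variant: the lazy string escapes withFile, which closes the handle.
readLazyBroken :: FilePath -> IO String
readLazyBroken path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  hGetContents h

-- Safer variant: fully evaluate the string while the handle is still open.
readUtf8Strict :: FilePath -> IO String
readUtf8Strict path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  contents <- hGetContents h
  _ <- evaluate (length contents)
  return contents

main :: IO ()
main = do
  let path = "strict-demo.md"   -- assumed scratch file name
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h "\228 \246 \252"
  -- Consuming the broken result is expected to raise an IOException
  -- similar to the one reported above.
  broken <- readLazyBroken path
  result <- try (evaluate (length broken)) :: IO (Either IOException Int)
  print result
  -- The strict variant returns the decoded text.
  readUtf8Strict path >>= print
```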
Related Issues (11)
- Doesn’t work with pandoc-citeproc HOT 1
- Problem with package csquotes HOT 2
- Nested includes don't work HOT 2
- Include only certain line numbers HOT 4
- How to install on windows? HOT 1
- Pandoc 2 Support Issues
- Couldn't match type ‘[Char]’ with ‘Data.Text.Internal.Text’ HOT 7
- Does this still work?
- 'cabal update' followed by 'cabal install pandoc-include' fails to build HOT 7
- Will you update this package to support pandoc 2.10.x? HOT 3