markwatkinson / luminous Goto Github PK


52.0 52.0 5.0 3.92 MB

Accurate and powerful syntax highlighting library

Home Page: http://luminous.asgaard.co.uk

License: GNU Lesser General Public License v2.1

JavaScript 0.96% PHP 94.40% Python 1.06% CSS 3.56% ApacheConf 0.01%

luminous's People


elkuku jasondavis nrjacker4 dictbox anil-ku-in

luminous's Issues

ruby %w operators

The Ruby %w and %W operators are used to define arrays of strings separated by whitespace. They should have their contents highlighted as a series of strings, rather than one long string. This makes a visual difference in the styles where strings have a background colour.
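A minimal sketch of the idea, in Python rather than Luminous's PHP: once the scanner has isolated the body of a %w literal, it could emit one string token per word instead of one token for the whole body. The function name `split_word_array` is hypothetical, and the sketch ignores backslash-escaped whitespace, which Ruby allows inside %w.

```python
import re

def split_word_array(body, start=0):
    """Yield (offset, text) for each whitespace-separated word in a %w body.

    `body` is the text between the delimiters and `start` is its offset in
    the source, so each pair can become one string token.
    """
    for match in re.finditer(r"\S+", body):
        yield start + match.start(), match.group()

source = "%w[foo bar  baz]"
body = source[3:-1]  # strip the "%w[" opener and the "]" closer
tokens = list(split_word_array(body, start=3))
```

The whitespace between words is left untokenised, so a string background colour would only appear behind the words themselves.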

web python (Django etc)

As with PHP and Ruby/Rails, we should also support Python-based template languages that embed in HTML documents.

Todo: investigate if there are multiple tag formats that need observing (Django will have one, are there others?)

Rounded corners

When trying to set rounded borders on the outer-most luminous div element, one (or more) of the inner elements conspires against our eye for design by poking out past the rounded corner.

php_snippet with '{' and '}'

Spent ages trying to fix this - but got nowhere.

echo \luminous::highlight('php_snippet', '{', false);
echo \luminous::highlight('php_snippet', '}', false);

expected:

1. {
1. }

result:

1. {
1. {

Exception safety

Luminous needs proper exception safety in production settings. Many exceptions are thrown for debugging purposes (for the stack trace), but in many of these cases there is probably a sensible recovery option also available.

Convert site docs to markdown

Site docs are under docs/site/* and should be converted to proper markdown for ease of maintenance.

CSS lexer will bork on @media and similar

CSS lexer will bork on things like

@media screen {
  .class { width: 100%; }
}

as it won't descend down to an arbitrary level, which is sort of correct most of the time, but there are valid special cases like this.
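One way to picture the fix, sketched in Python under simplifying assumptions: classify each block by the prelude in front of its opening brace, so that a grouping at-rule opens a block of rules rather than a block of declarations. `block_kinds` and `GROUPING_AT_RULES` are hypothetical names, and the sketch ignores braces inside strings and comments.

```python
GROUPING_AT_RULES = ("@media", "@supports")

def block_kinds(css):
    """Return (kind, depth) for each block, in the order the blocks open."""
    kinds = []
    depth = 0
    prelude_start = 0
    for i, ch in enumerate(css):
        if ch == "{":
            prelude = css[prelude_start:i].strip()
            # A grouping at-rule contains nested rules, not declarations.
            kind = "rules" if prelude.startswith(GROUPING_AT_RULES) else "declarations"
            kinds.append((kind, depth))
            depth += 1
            prelude_start = i + 1
        elif ch == "}":
            depth -= 1
            prelude_start = i + 1
        elif ch == ";":
            prelude_start = i + 1
    return kinds

example = "@media screen {\n  .class { width: 100%; }\n}"
kinds = block_kinds(example)
```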

Specify the formatter via the class name on the command line

Just an idea I came up with while working on #34. Now that Luminous is (well, will be) relying on Composer’s autoloader, wouldn’t it make sense to allow for specifying the formatter via the class name?
Here’s why it would be useful (IMHO):

1. Suppose you’re working on a project J5lx\AwesomeProject which relies on Luminous for syntax highlighting.
2. For this project you created a custom PDF formatter J5lx\AwesomeProject\PdfFormatter (btw, I’m not actually working on a PDF formatter, even though this might be an interesting project).
3. You want to use the luminous CLI to test your formatter. You do this by running vendor/bin/luminous -f J5lx\\AwesomeProject\\PdfFormatter -i test.php -o test.pdf
   - In case the vendor/bin/luminous looks a bit strange to you: this is a symlink to the Luminous CLI that is created by Composer in projects that require Luminous (since J5lx/luminous@674c03f77fc11ce54bc3e7c2803e582a549f7048).
4. By combining this with evince’s autoreload feature you can now efficiently test your PDF formatter!

Of course this would also be nice for scanners, however some extra work would be required to make it work for those as well, since there needs to be a way to specify name and description of the language.

What do you think about this?

SQL cache optimisation

If I recall correctly, the SQL cache runs a clean-up query every time it's read from (or was it written to?). In any case, running this so often is going to slow things down unnecessarily, and it might be better for large caches if we reserved a single row to store the date of the last clean-up and only run it once a day or so.

Language guessing

It would be useful if Luminous was able to make a guess at the language of a piece of source code. Pygments implements this by having each of its lexers implement a method which takes a string of source code and returns a probability value, which seems like a sensible way to go about it.
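As a sketch of that approach (in Python, and not Pygments' or Luminous's actual code): each language gets a scoring function that returns a value between 0.0 and 1.0, and the guesser picks the highest score. The heuristics and names below (`score_php`, `score_python`, `guess_language`) are deliberately crude, hypothetical examples.

```python
import re

def score_php(src):
    # An opening tag is a very strong hint.
    return 1.0 if "<?php" in src else 0.0

def score_python(src):
    # Count lines starting with a few common keywords; cap the score at 1.0.
    hits = len(re.findall(r"^\s*(?:def|import|class)\b", src, re.M))
    return min(1.0, 0.3 * hits)

SCORERS = {"php": score_php, "python": score_python}

def guess_language(src, default="plain"):
    """Return the name of the best-scoring language, or `default` if none score."""
    best, best_score = default, 0.0
    for name, scorer in SCORERS.items():
        score = scorer(src)
        if score > best_score:
            best, best_score = name, score
    return best
```

Keeping the scorer next to each scanner would mean a new language brings its own guessing heuristic with it.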

re-factor web languages into a generic scanner with subscanners

The web scanners are a bit convoluted and repetitive.

PHP, Ruby and Django could in theory each share a generic scanner, which takes as subscanners: a web (PHP/Ruby/Django) scanner and a HTML scanner.

The code for context switching between the server-side language and the client-side language would be generic; only the server-side scanner would change. If I recall correctly, PHP and Rails are already factored much like this anyway, and there's no reason they shouldn't be merged.
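The generic part could look roughly like this Python sketch, where only the open and close tags differ between server-side languages. `split_regions` is a hypothetical name, and the search for the close tag is naive: a close tag inside a server-side string literal would end the region early.

```python
def split_regions(source, open_tag, close_tag):
    """Split `source` into ('html', text) and ('server', text) regions."""
    regions = []
    pos = 0
    while pos < len(source):
        start = source.find(open_tag, pos)
        if start == -1:
            regions.append(("html", source[pos:]))
            break
        if start > pos:
            regions.append(("html", source[pos:start]))
        end = source.find(close_tag, start + len(open_tag))
        # An unterminated server block runs to the end of the input.
        end = len(source) if end == -1 else end + len(close_tag)
        regions.append(("server", source[start:end]))
        pos = end
    return regions

php = split_regions("<p><?php echo 1; ?></p>", "<?php", "?>")
erb = split_regions("a<%= x %>b", "<%", "%>")
```

Each region would then be handed to the HTML subscanner or to whichever server-side subscanner was plugged in.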

[idea] Modify the Luminous codebase to follow PSR standards, be more Composer-compatible, and use PHP 5.3

While working with the Luminous codebase I noticed that it follows quite unusual code conventions, while most PHP projects follow the PSR standards nowadays. In fact, this is what I got when I first loaded the LaTeX formatter into my highly PSR-optimized editor:

There’s a PSR-related error on almost every line of code, or in other words: The Luminous code conventions are really different from what most other projects use.

Now I’d like to know what you think about modifying the Luminous code to follow those established standards. If you have no objections, I’d be happy to do the actual code change myself. I would also make this package more Composer-compatible and -dependent, since everybody uses this tool now and it makes things a lot easier (no more require statements for every single class to use!). And finally, I’d suggest dropping support for PHP 5.2 and moving on to 5.3, since all popular hosting packages support it nowadays (at least I don’t know any package that doesn’t), so we can finally use things like closures 😉

Laravel / packagist cache locations

Issue

When using Luminous within Laravel 4, the cache does not use the default Laravel storage folder. This means that Luminous requires another location to be made readable and writable, which potentially opens it up to attack.

Resolution

Check whether Luminous is running within Laravel and, if so, point the cache files at storage/luminous.

Laravel / Packagist CSS/JS References

Issue

When using luminous within Laravel 4, the Luminous CSS and JavaScript references use filesystem absolute references. This causes issues in that the Laravel application will manipulate them to be part of a URI rather than a filesystem reference.

eg:
-- /home/XYZ/html_home/luminous/client/luminous.js
and
-- /home/XYZ/html_home/luminous/style/luminous.js
becomes:
-- http://example.com/home/XYZ/html_home/luminous/client/luminous.js
and
-- http://example.com/home/XYZ/html_home/luminous/style/luminous.js

rather than:
-- http://example.com/js/luminous.js
and
-- http://example.com/css/luminous/luminous.js

License and PSR compliant

I found this library while searching for a redcarpet equivalent for PHP. I think the LGPL would be more appropriate than the GPL, and it would be great to have the library available through Composer (and PSR-0 compliant at least). Think about the RoR and redcarpet integration: most frameworks and libraries are available at Packagist; SmartyPHP, Swiftmailer, Twig, Laravel, Doctrine and Textile, just to name a few.

Btw, using dev branches would be better idea than suggesting not to clone the repo, so people will add or fix stuff to the stable version instead of the last "testing" version.

Thanks for such a great library

HTML scanner's 'server tags'

The HTML scanner has some 'server tags', which denote the start of a server-side language (e.g. <? for PHP). These apply downwards to any sub-language, including CSS and JavaScript.

For reasons I can't remember (therefore probably bad ones) this is taken as a plain string. For #15 we need it to be a regex (or at least an array). Changing it may break JS and CSS.
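A small sketch of accepting either form, in Python: a single string or an array of tags is turned into one compiled regex, so callers that still pass a plain string keep working. `server_tag_pattern` is a hypothetical helper, not part of Luminous.

```python
import re

def server_tag_pattern(tags):
    """Compile one regex matching any of the given server tags."""
    if isinstance(tags, str):
        tags = [tags]  # keep accepting the old plain-string form
    # Try longer tags first so "<?php" is preferred over "<?".
    ordered = sorted(tags, key=len, reverse=True)
    return re.compile("|".join(re.escape(tag) for tag in ordered))

pattern = server_tag_pattern(["<?", "<?php", "{%"])
```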

WordPress Plugin

I am really thankful for this project. I currently use the RainbowJS syntax highlighter on my WordPress sites; it is lightweight and fast, but I have always preferred a NICE/GOOD backend syntax highlighter in PHP. I had tried GESHI before, but it does not highlight a lot of things that the other highlighters do, and it seems really bloated and not put together well.

If anyone has the time, this would be a great highlighter to create a WordPress plugin with

Embedded diff highlighting needs improvement

The diff scanner works by isolating the actual code and then passing it down to an appropriate sub-scanner, then reassembling the diff format. The code is just one long block of added/removed/unchanged lines. This stops working in the following situation:

--- something.c
+++ something.c

  /* comment open
- comment  close */
+ comment close */

As the comment's opening/closing delimiters are essentially mismatched.

This is always going to be a potential problem as diff code fragments are always going to be incomplete, but we could try taking separately both:

unchanged + added
unchanged + removed

to form two complete-ish snippets, and then trying to merge them together so that 'unchanged' is only included in the output once. It seems a little complicated when there are multiple separated add/remove blocks, but as everything can be split into individual lines, it shouldn't be too hard.
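The two-snippet idea can be sketched in Python as follows. It assumes a hypothetical `highlight_lines` callable that takes a list of source lines and returns the same number of highlighted lines; unchanged lines are taken from the "new" side so they appear only once. The toy highlighter just brackets lines that fall inside a C comment.

```python
def highlight_diff(diff_lines, highlight_lines):
    """Highlight old and new sides separately, then reassemble the diff."""
    old_src, new_src, origin = [], [], []
    for line in diff_lines:
        marker, code = line[:1], line[1:]
        if marker == "-":
            origin.append(("old", len(old_src)))
            old_src.append(code)
        elif marker == "+":
            origin.append(("new", len(new_src)))
            new_src.append(code)
        else:
            # Unchanged: present on both sides, output taken from "new".
            origin.append(("new", len(new_src)))
            old_src.append(code)
            new_src.append(code)
    sides = {"old": highlight_lines(old_src), "new": highlight_lines(new_src)}
    return [line[:1] + sides[side][i] for line, (side, i) in zip(diff_lines, origin)]

def toy_highlighter(lines):
    """Stand-in scanner: wrap every line inside /* ... */ in brackets."""
    out, in_comment = [], False
    for line in lines:
        if "/*" in line:
            in_comment = True
        out.append(f"[{line}]" if in_comment else line)
        if "*/" in line:
            in_comment = False
    return out

diff = [" /* comment open", "- comment  close */", "+ comment close */", " a++;"]
result = highlight_diff(diff, toy_highlighter)
```

Because each side is a complete-ish snippet, the comment closes exactly once on each side and the trailing `a++;` is not swallowed by it.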

HTML structure and CSS is a bit of a mess

See title.

There are at least two wrapper divs, then a table (for line numbering) then pres, then spans, and the CSS doesn't necessarily follow the elements entirely logically. It needs some attention.

Trigraphs in C

C scanner should recognise trigraph sequences, e.g.

 // Will the next line be executed????????????????/
 a++;

The last '??/' is a trigraph which is synonymous with '\', which means the newline is escaped, so the whole block is a single-line comment.

http://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C

Low priority as I expect they are rarely (if ever) used.
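For reference, a Python sketch of trigraph replacement using the nine trigraphs defined by the C standard. It scans left to right, so in a long run of question marks only the final `??/` is replaced. A real scanner would also need to map offsets back to the original text, which this sketch does not attempt.

```python
TRIGRAPHS = {
    "??=": "#", "??/": "\\", "??'": "^",
    "??(": "[", "??)": "]", "??!": "|",
    "??<": "{", "??>": "}", "??-": "~",
}

def replace_trigraphs(src):
    """Return `src` with every C trigraph replaced by its single character."""
    out = []
    i = 0
    while i < len(src):
        tri = src[i:i + 3]
        if tri in TRIGRAPHS:
            out.append(TRIGRAPHS[tri])
            i += 3
        else:
            out.append(src[i])
            i += 1
    return "".join(out)

converted = replace_trigraphs("// executed????/\na++;")
# The comment line now ends in a backslash, so "a++;" continues the comment.
continues = converted.split("\n")[0].endswith("\\")
```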

Preserve non-unix line endings

Luminous converts everything to unix line endings (\r\n => \n, \r => \n) for consistency. It might be polite to convert them back to their original form afterwards. This shouldn't be too hard, but it's not completely trivial if the input uses a strange mixture of different forms.
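One way to cope with mixed input, sketched in Python: remember the ending of every line while normalising, then re-attach them afterwards. `normalise` and `restore` are hypothetical names, and the sketch assumes the processed text still has exactly the same number of newlines as the normalised input.

```python
import re

def normalise(src):
    """Return (text with unix line endings, list of the original endings)."""
    endings = re.findall(r"\r\n|\r|\n", src)
    return re.sub(r"\r\n|\r", "\n", src), endings

def restore(text, endings):
    """Re-attach the recorded ending after each line of `text`."""
    lines = text.split("\n")
    # The final line has no ending, hence the trailing "".
    return "".join(line + eol for line, eol in zip(lines, endings + [""]))

original = "a\r\nb\rc\nd"
text, endings = normalise(original)
round_trip = restore(text, endings)
```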

ANSI formatter should use more compatible xterm256color escape codes

Currently the ANSI formatter uses the 'closest match' escape codes from ISO-8613-3 which are supported by very few terminal emulators (only GNOME Terminal, xterm and Konsole IIRC). Instead, the formatter should rely on nothing but the more compatible xterm256color codes by default. However those codes would require lots of expensive color distance calculations, so 'closest match' support should still be available through an option.
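A sketch of the distance calculation in Python, assuming the standard xterm palette layout (a 6x6x6 colour cube at indices 16 to 231 and a 24-step grey ramp at 232 to 255) and plain squared RGB distance. The function names are hypothetical, and the lookup is memoised because a theme only uses a handful of distinct colours.

```python
from functools import lru_cache

CUBE_LEVELS = (0, 95, 135, 175, 215, 255)

def xterm256_palette():
    """Colours 16..255 of the standard xterm palette as {index: (r, g, b)}."""
    palette = {}
    for r in range(6):
        for g in range(6):
            for b in range(6):
                index = 16 + 36 * r + 6 * g + b
                palette[index] = (CUBE_LEVELS[r], CUBE_LEVELS[g], CUBE_LEVELS[b])
    for i in range(24):
        level = 8 + 10 * i
        palette[232 + i] = (level, level, level)
    return palette

PALETTE = xterm256_palette()

@lru_cache(maxsize=None)
def nearest_xterm256(rgb):
    """Return the palette index closest to the (r, g, b) tuple `rgb`."""
    def distance(index):
        return sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[index]))
    return min(PALETTE, key=distance)

def fg_escape(rgb):
    """Foreground escape sequence using the indexed-colour form."""
    return f"\x1b[38;5;{nearest_xterm256(rgb)}m"
```

Indices 0 to 15 are left out on purpose, since terminals commonly let users remap them.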

I’m going to work on this myself very soon.

Ruby heredoc

Ruby's heredoc allows unusual constructs like the following:

# a comment should look like this
some_function("arg1", arg2 + 2, <<ARG3, arg4 + 4, "arg5") # not like this
arg3
ARG3

which renders:

Currently the Ruby scanner goes into heredoc mode as soon as it sees the '<<DELIMITER' and the rest of the line is immune to highlighting (aside from other heredoc declarations). The Ruby scanner's main loop structure is pretty ugly and needs refactoring anyway, but it should also be possible to delay heredoc mode until EOL.
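Delaying heredoc mode until EOL could work roughly like this line-based Python sketch: delimiters seen on a code line are queued, the rest of that line stays ordinary code, and heredoc mode only starts on the following line. `classify_lines` is a hypothetical name; the regex does not tell a heredoc apart from a `<<` operator followed by an identifier, and the terminator check is more lenient than Ruby's.

```python
import re

HEREDOC_DECL = re.compile(r"<<[-~]?(['\"]?)([A-Za-z_]\w*)\1")

def classify_lines(source):
    """Return ('code' | 'heredoc', line) for each line of `source`."""
    pending = []  # delimiters declared on the last code line, in order
    result = []
    for line in source.split("\n"):
        if pending:
            result.append(("heredoc", line))
            if line.strip() == pending[0]:
                pending.pop(0)  # terminator reached, next body (if any) starts
        else:
            result.append(("code", line))
            pending = [m.group(2) for m in HEREDOC_DECL.finditer(line)]
    return result

ruby = (
    'some_function("arg1", arg2 + 2, <<ARG3, arg4 + 4, "arg5") # not like this\n'
    "arg3\n"
    "ARG3\n"
    "puts 1"
)
kinds = [kind for kind, _ in classify_lines(ruby)]
```

The whole first line is classified as code, so the trailing arguments and comment would be highlighted normally.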

Proposal: option to store cache in SQL

It would be nice if the cache could use an SQL database instead of the filesystem.

Rationale: The cache currently stores one highlight per file on the filesystem. In some setups this could potentially lead to the cache directory containing thousands and thousands of tiny files, which might cause problems on some filesystems (running out of inodes), particularly if they were formatted without expecting this to occur. In any case, it seems neater to put it all in a database if there's one available.

Caveats: there's SQL beyond MySQL. This should support at least MySQL and Postgresql. PHP might have some abstract interface but I've never noticed one before.

edit: it might be best to implement this by callback functions and have the user define a function which actually sends the SQL to their database.

Perl heredoc slightly broken

Reproduce with:

some_function(<<EOF, 2);
argument 1
EOF

Expected: the heredoc string consists of 'argument 1'
Actual: the heredoc string begins right after <<EOF.

This is a similar problem/solution to issue #1

Do you still maintain this?

Simple question. The last commit was more than a year ago, so I’d like to know whether you aren’t maintaining this anymore or there’s just nothing to do.

options/settings documentation

The options/settings model (luminous::set() and luminous::setting()) is hard to document properly because it's all dynamic and Doxygen can't pick out the valid options from the options array's keys.

Solution: implement a new class (struct) with each valid option hard-coded as an attribute name, which can be documented inline.

This won't change the external API or calling procedure as it is all hidden behind set() and setting() anyway.
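The shape of that solution, sketched in Python with a dataclass standing in for the PHP class. The option names and defaults below are hypothetical placeholders, not a list of Luminous's real options; the point is that each attribute can carry its own inline documentation while `set_option()` and `setting()` keep the string-keyed calling convention.

```python
from dataclasses import dataclass, fields

@dataclass
class Options:
    # Hypothetical options; each attribute can be documented where it is declared.
    cache: bool = True          # whether highlighted output is cached
    line_numbers: bool = True   # whether line numbers are shown
    theme: str = "default"      # name of the stylesheet to use

_options = Options()
_VALID = {f.name for f in fields(Options)}

def _key(name):
    key = name.replace("-", "_")  # accept "line-numbers" style names
    if key not in _VALID:
        raise KeyError(f"unknown option: {name}")
    return key

def set_option(name, value):
    setattr(_options, _key(name), value)

def setting(name):
    return getattr(_options, _key(name))
```

Unknown option names now fail loudly in one place instead of silently creating a new array key.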

cache: support for other RDBMSs than MySQL

The cache supports using a MySQL table as a storage location. It should also support at least PostgreSQL as well.

Currently MySQL specifics are:

Setting MyISAM in the table creation statement (because it's a lot faster than InnoDB for our use case)
Using INSERT IGNORE to write cache misses (otherwise there's a race condition: the DB will complain about duplicate keys and possibly break the page)
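The duplicate-key behaviour has equivalents elsewhere: PostgreSQL (9.5 and later) offers `INSERT ... ON CONFLICT DO NOTHING`, and SQLite offers `INSERT OR IGNORE`. The Python sketch below uses SQLite purely as a convenient stand-in to show the effect; the table layout and `write_cache_miss` are assumptions for illustration.

```python
import sqlite3

def write_cache_miss(db, key, output):
    """Store a cache entry, silently doing nothing if the key already exists."""
    db.execute(
        "INSERT OR IGNORE INTO cache (id, output) VALUES (?, ?)",
        (key, output),
    )

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cache (id TEXT PRIMARY KEY, output TEXT)")

# Two requests miss the cache for the same key and both try to write.
write_cache_miss(db, "abc", "first")
write_cache_miss(db, "abc", "second")
rows = db.execute("SELECT id, output FROM cache").fetchall()
```

Supporting several databases would then mostly mean choosing the right statement per backend.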

PCRE and backtracking

On whatever the default PHP version was on kubuntu 10.04, luminous was passing the ifuzz testing with flying colours. I've just upgraded to 11.04 and I can't run it for more than a few seconds without it spewing errors due to a lot of expressions hitting the backtracking limit.

$ php --version
PHP 5.3.5-1ubuntu7 with Suhosin-Patch (cli) (built: Apr 17 2011 13:58:11)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies

I don't currently know if this is a poorly configured default in Ubuntu 11.04 or a PCRE change that we should be adapting to.

Distinction in embedded languages (HTML/JS/PHP/etc)

Sometimes the visual styling of embedded languages are such that it's not always clear at a glance which language any particular line belongs to. Some amount of visual redesign of the stylesheets might fix this, but it might be necessary to do something more radical - introduce language specific styling or something.

HTML already has a specific 'HTMLTAG' token type. It might also help to introduce some CSS-specific tokens and highlighting styles to at least get embedded CSS looking distinct.

Remove LUMINOUS_DEBUG flag

Replace the LUMINOUS_DEBUG flag with a log. Make sure the log is checked by tests, and make it a requirement that the log is empty of errors for tests to pass.
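A sketch of how a test could enforce an error-free log, using Python's logging module in place of whatever log Luminous ends up with. The logger name `luminous.sketch`, `ListHandler` and `logged_errors` are hypothetical.

```python
import logging

class ListHandler(logging.Handler):
    """Collects log records in memory so that tests can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

log = logging.getLogger("luminous.sketch")
log.setLevel(logging.DEBUG)
handler = ListHandler()
log.addHandler(handler)

def logged_errors(h):
    """Messages of every record at ERROR level or above."""
    return [r.getMessage() for r in h.records if r.levelno >= logging.ERROR]

# Recoverable problems are logged as warnings rather than thrown.
log.warning("unknown language, falling back to plain text")
```

A test suite would run the scanners and then assert that `logged_errors(handler)` is empty.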

See comments here:
#31 (comment)
#31 (comment)
