In some projects (mainly those that specifically target the Linux platform) a file without extens

Detect language of file based on shebang about coala-quickstart HOT 11 CLOSED

coala commented on May 25, 2024 3

Detect language of file based on shebang

from coala-quickstart.

Comments (11)

adtac commented on May 25, 2024 1

@hemangsk sure, but please describe how you'd do this before you write the code in case any of us have suggestions/modifications - it's much easier for both sides! :)

Assigning you 👍

jayvdb commented on May 25, 2024 1

The return value of get_language_from_hashbang can be memoized.
But performance is not a consideration, as this is run once per project lifetime typically.
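
A minimal sketch of that memoization, assuming the function receives the hashbang line as a plain (hashable) string. It uses functools.lru_cache from the standard library; the function body is a simplified stand-in for illustration, not the project's implementation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_language_from_hashbang(hashbang):
    # Simplified stand-in: take the last path component of the first word.
    words = hashbang.strip()[2:].split()
    return words[0].rsplit('/', 1)[-1] if words else None


get_language_from_hashbang('#!/bin/bash\n')
get_language_from_hashbang('#!/bin/bash\n')  # second call is served from the cache
print(get_language_from_hashbang.cache_info())
```

Since many scripts in a project tend to share the same hashbang line, the cache would stay small.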

adtac commented on May 25, 2024 1

Looks neat 👍

And unless my eyes fail me, the data.close() is outside the with open(...) as data ;) I know, this is just a prototype. Just saying :P
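
To illustrate the point about with: the context manager closes the file when the block ends, so an explicit close() is unnecessary. This sketch writes a throwaway temporary file so that it is self-contained; the file contents are made up.

```python
import os
import tempfile

# Create a throwaway script so the example is self-contained.
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.write('#!/bin/bash\necho hi\n')

with open(tmp.name, 'r') as data:
    hashbang = data.readline()

# The file is already closed here; no data.close() is needed.
print(data.closed)
os.remove(tmp.name)
```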

hemangsk commented on May 25, 2024

Hey! can I take this up?

hemangsk commented on May 25, 2024

@adtac Thanks!
I figured out this solution: in coala-quickstart > generation > Utilities.py, the functions get_extension() and split_by_language() have a similar task, separating the given files by language and extension. So inside the loop that iterates through the list of project_files, we can add calls to new utility functions get_language_from_hashbang() and get_extension_from_hashbang().
These will read the first line of an extension-less file and check whether it starts with '#!', confirming it's a hashbang. From that we can obtain the language used in the file, and hence the extension, from the exts dictionary or via the pygments approach [https://github.com/coala/coala/pull/3162].

For example, if the first line is a hashbang:

first_line = '#!/bin/bash'
lang = first_line[7:]  # 'bash'
ext = exts[lang]

Would this be the right approach, and can I work on it?
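
A rough sketch of the get_extension_from_hashbang() idea that does not rely on a fixed slice offset. INTERPRETER_EXTS is a hypothetical mapping invented for this example; the project's exts dictionary may be keyed differently.

```python
import os

# Hypothetical mapping for illustration only.
INTERPRETER_EXTS = {'bash': '.sh', 'sh': '.sh', 'python': '.py'}


def get_extension_from_hashbang(first_line):
    """Return an extension for a hashbang line, or None if unknown."""
    if not first_line.startswith('#!'):
        return None
    words = first_line[2:].split()
    if not words:
        return None
    # '/bin/bash' -> 'bash'
    interpreter = os.path.basename(words[0])
    return INTERPRETER_EXTS.get(interpreter)


print(get_extension_from_hashbang('#!/bin/bash'))
```

Taking the basename of the interpreter path means '#!/bin/bash' and '#!/usr/local/bin/bash' are treated alike.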

jayvdb commented on May 25, 2024

Sounds good. get_language_from_hashbang will be the interesting/challenging part. Would be good if you can describe how you will do that.

adtac commented on May 25, 2024

One more thing to look into is #!/usr/bin/env python - that should have the same effect as a #!/usr/bin/python shebang ;)
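
One possible way to treat both forms alike, sketched with a single regular expression that skips an optional env launcher. The names HASHBANG_RE and interpreter_from_hashbang are hypothetical, and flags passed to env (such as -S) are not handled here.

```python
import re

# Optional directory prefix, optional "env" launcher, then the interpreter name.
HASHBANG_RE = re.compile(
    r'^#!\s*(?:\S*/)?(?:env\s+)?(?P<interpreter>[^\s/]+)')


def interpreter_from_hashbang(line):
    """Return the interpreter named on a hashbang line, or None."""
    match = HASHBANG_RE.match(line)
    return match.group('interpreter') if match else None


print(interpreter_from_hashbang('#!/usr/bin/env python'))
print(interpreter_from_hashbang('#!/usr/bin/python'))
```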

hemangsk commented on May 25, 2024

Sorry for the delay! Here's the approach I've come up with for get_language_from_hashbang(),
in coala-quickstart > generation > Utilities.py:

import os
import re
from collections import defaultdict

# exts (extension -> list of languages) is assumed to be available in Utilities.py


def split_by_language(project_files):
    lang_files = defaultdict(set)
    for file in project_files:
        name, ext = os.path.splitext(file)
        if ext in exts:
            for lang in exts[ext]:
                lang_files[lang.lower()].add(file)
                lang_files["all"].add(file)
        elif name and not ext:
            # No extension: check the first line for a hashbang.
            # The with block closes the file, so no data.close() is needed.
            with open(file, 'r') as data:
                hashbang = data.readline().strip()
            if not hashbang.startswith('#!'):
                continue
            language = get_language_from_hashbang(hashbang)
            if not language:
                continue
            for ext in exts:
                for lang in exts[ext]:
                    # Compare case-insensitively: a hashbang names the
                    # interpreter in lower case.
                    if language.lower() == lang.lower():
                        lang_files[lang.lower()].add(file)
                        lang_files["all"].add(file)
    return lang_files

And get_language_from_hashbang(hashbang)

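def get_language_from_hashbang(hashbang):
    # '#!/usr/bin/env python': the interpreter is the word after env
    match = re.match(r'^#!\s*\S*/env\s+(\S+)', hashbang)
    if not match:
        # '#!/usr/bin/python': the interpreter is the last path component
        match = re.match(r'^#!\s*(?:\S*/)?([^\s/]+)', hashbang)
    # None when the line is not a hashbang
    return match.group(1) if match else None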

Shortcomings of this approach that I've found so far and am working on:

The regex can be improved (using backtracking?)
The nested for loop over exts is not time-efficient
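
One way to avoid the nested loop, sketched under the assumption that exts maps each extension to a list of language names (as the loop above suggests): build a reverse index once, so each hashbang lookup becomes a single dictionary access. build_language_index and the sample data are hypothetical.

```python
from collections import defaultdict

# Sample data shaped like the exts mapping (extension -> languages);
# the entries themselves are made up for illustration.
exts = {'.py': ['Python'], '.sh': ['Shell'], '.h': ['C', 'CPP']}


def build_language_index(exts):
    """Map each lower-cased language name to the extensions that use it."""
    index = defaultdict(set)
    for ext, langs in exts.items():
        for lang in langs:
            index[lang.lower()].add(ext)
    return index


language_index = build_language_index(exts)
print('python' in language_index)
```

The index only needs to be built once per run, before the loop over project_files.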

hemangsk commented on May 25, 2024

Thanks for the feedback @jayvdb @adtac :) I'm on it

sils commented on May 25, 2024

@hemangsk any news?

hemangsk commented on May 25, 2024

I'll do the second iteration today asap :)

…

On Feb 2, 2017 3:56 AM, "Lasse Schuirmann" wrote: @hemangsk any news?
