
Comments (11)

adtac commented on May 25, 2024

@hemangsk sure, but please describe how you'd do this before you write the code in case any of us have suggestions/modifications - it's much easier for both sides! :)

Assigning you 👍

from coala-quickstart.

jayvdb commented on May 25, 2024

The return value of get_language_from_hashbang can be memoized.
But performance is not a consideration, as this typically runs once per project lifetime.
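
A minimal sketch of what memoizing could look like, using functools.lru_cache from the standard library. The names parse_hashbang and get_language_cached are hypothetical, and the parser body is a deliberately naive stand-in for whatever get_language_from_hashbang ends up doing.

```python
from functools import lru_cache


def parse_hashbang(hashbang):
    # Hypothetical, naive stand-in for the real parsing logic:
    # take the last word after the last "/" in the line.
    return hashbang.split("/")[-1].split()[-1]


# Wrap the parser so repeated hashbang lines are only parsed once.
get_language_cached = lru_cache(maxsize=None)(parse_hashbang)
```

Calling get_language_cached twice with the same line parses it once; get_language_cached.cache_info() reports the hits and misses.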


adtac commented on May 25, 2024

Looks neat 👍

And unless my eyes fail me, the data.close() is outside the with open(...) as data ;) I know, this is just a prototype. Just saying :P
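
For reference, a small sketch of why the explicit close() is not needed at all: the file object is closed as soon as the with block exits. The helper name read_first_line is made up for illustration.

```python
def read_first_line(path):
    # The with statement closes the file when the block exits,
    # even if readline() raises.
    with open(path, "r") as data:
        first_line = data.readline()
    # At this point the file is already closed; report that to the caller.
    return first_line.rstrip("\n"), data.closed
```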


hemangsk commented on May 25, 2024

Hey! Can I take this up?


hemangsk commented on May 25, 2024

@adtac Thanks!
I figured out this solution: in coala-quickstart > generation > Utilities.py, the functions get_extension() and split_by_language() have a similar task, which is to separate the given files by language and extension. So inside the loop that iterates through the list of project_files, we can add calls to two new utility functions, get_language_from_hashbang() and get_extension_from_hashbang().
These will read the first line of an extension-less file and check whether it starts with '#!', confirming it's a hashbang. From that we can obtain the language used in the file, and hence the extension, either from the exts dictionary or via the pygments approach [https://github.com/coala/coala/pull/3162].

For example, if the first line is a hashbang:

first_line = '#!/bin/bash'
lang = first_line.split('/')[-1]  # 'bash'
ext = exts[lang]

Would this be the right approach, and can I start working on it?


jayvdb commented on May 25, 2024

Sounds good. get_language_from_hashbang will be the interesting/challenging part. Would be good if you can describe how you will do that.


adtac commented on May 25, 2024

One more thing to look into is #!/usr/bin/env python - that should have the same effect as a #!/usr/bin/python shebang ;)


hemangsk commented on May 25, 2024

Sorry for the delay! Here's the approach I've come up with for get_language_from_hashbang(),
in coala-quickstart > generation > Utilities.py:

import os
from collections import defaultdict


def split_by_language(project_files):
    lang_files = defaultdict(set)
    for file in project_files:
        name, ext = os.path.splitext(file)
        if ext in exts:
            for lang in exts[ext]:
                lang_files[lang.lower()].add(file)
                lang_files["all"].add(file)

        # No extension: fall back to the hashbang on the first line.
        elif name and not ext:
            # The with block closes the file, so no data.close() is needed.
            with open(file, 'r') as data:
                hashbang = data.readline().strip()
            if not hashbang.startswith('#!'):
                continue
            language = get_language_from_hashbang(hashbang)
            # Compare case-insensitively: the hashbang yields e.g. "python".
            for known_ext in exts:
                for lang in exts[known_ext]:
                    if language == lang.lower():
                        lang_files[lang.lower()].add(file)
                        lang_files["all"].add(file)
    return lang_files

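And get_language_from_hashbang(hashbang):

def get_language_from_hashbang(hashbang):
    # Returns the interpreter name, or None if this is not a hashbang.
    parts = hashbang[2:].split() if hashbang.startswith('#!') else []
    if not parts:
        return None
    interpreter = parts[0].split('/')[-1]
    # "#!/usr/bin/env python" names the interpreter as env's first argument.
    if interpreter == 'env' and len(parts) > 1:
        return parts[1]
    return interpreter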

The shortcomings of this approach that I've found so far, and am working on, are:

  • The regex can be improved (using backtracking?)
  • A nested for loop is used for the language lookup, which is not time efficient
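
One possible direction for both points, as a rough sketch rather than a finished design: a single regex with a named group covers both the direct and the env form, and a reverse index built once from exts replaces the nested loop with a dictionary lookup. The exts contents below are a made-up stand-in for the real mapping in Utilities.py, and HASHBANG_RE, interpreter_from_hashbang and build_language_index are hypothetical names. It does not handle env flags such as "env -S", and an interpreter name like "bash" will not necessarily match a language name in exts.

```python
import re

# Hypothetical stand-in for the real `exts` mapping in Utilities.py.
exts = {".py": ["Python"], ".sh": ["Shell"], ".rb": ["Ruby"]}

# One pattern for both "#!/path/interp" and "#!/path/env interp".
HASHBANG_RE = re.compile(r"^#!\s*(?:\S*/)?(?:env\s+)?(?P<interp>[\w.+-]+)")


def interpreter_from_hashbang(line):
    """Return the interpreter named by a hashbang line, or None."""
    match = HASHBANG_RE.match(line)
    return match.group("interp") if match else None


def build_language_index(exts):
    """Map each lowercase language name to the set of its extensions."""
    index = {}
    for ext, langs in exts.items():
        for lang in langs:
            index.setdefault(lang.lower(), set()).add(ext)
    return index
```

With the index built once up front, checking a file becomes something like build_language_index(exts).get(interpreter_from_hashbang(first_line)) instead of looping over every extension and language.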


hemangsk commented on May 25, 2024

Thanks for the feedback @jayvdb @adtac :) I'm on it


sils commented on May 25, 2024

@hemangsk any news?


hemangsk commented on May 25, 2024

