resume-ranker's Introduction


Functionality

A multiplatform Kotlin application that scans a specified directory (intended to contain CVs/resumes) for any number of keywords. It reads the files with .doc or .pdf extensions, lists the keywords each file contains along with the strength of each (given by the keyword count), and identifies the resume with the highest score. I built this early on to help recruiters rank resumes from a large candidate pool. Keyword matching is not the ideal approach, but it is generally one of the steps automated systems use to filter resumes.


I/O

The program takes three inputs via a Compose-based graphical user interface:

  • A string of whitespace-separated keywords.
  • The directory in which the resumes are stored locally for the user.
  • The location where a separate file containing the results for the session will be saved.

Given these, it displays the total weightage of each resume and the best resume for the keyword-based search in a text box within the GUI. It also writes the entire trace of the run to the user-specified file for future reference, so that file acts as a log.
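The flow from keyword string to per-resume weightage could look roughly like the sketch below. The function names are hypothetical and not taken from the project's source, and it assumes case-insensitive substring matching on already extracted text, which may differ from what the application actually does.

```kotlin
// Hypothetical helper names; a sketch, not the project's actual code.

/** Splits the whitespace-separated keyword string into distinct lowercase keywords. */
fun parseKeywords(input: String): List<String> =
    input.trim()
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .map { it.lowercase() }
        .distinct()

/** Counts non-overlapping, case-insensitive occurrences of each keyword in the text. */
fun keywordCounts(text: String, keywords: List<String>): Map<String, Int> {
    val haystack = text.lowercase()
    return keywords.associateWith { keyword ->
        Regex(Regex.escape(keyword.lowercase())).findAll(haystack).count()
    }
}

/** Total weightage of a resume: the sum of its keyword counts. */
fun totalScore(counts: Map<String, Int>): Int = counts.values.sum()

/** Name of the highest-scoring resume, or null when there are no resumes. */
fun bestResume(scores: Map<String, Int>): String? =
    scores.maxByOrNull { it.value }?.key
```

For example, with the keywords "kotlin sql" and the text "Kotlin kotlin sql", keywordCounts yields kotlin=2 and sql=1, for a total weightage of 3.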

Here's a short video demonstrating a run with a bunch of authentic resumes:


Notes

Only the typical resume formats are handled here (i.e., .doc and .pdf, including PDFs generated from LaTeX), but other formats can be supported. First add the new extension(s) to the file filter code segment (the easy part), then write a text extractor block for each new format, preferably using a well-established library (unless you want to do the parsing yourself). For instance, I used the Apache POI and PDFBox libraries to handle the .doc and .pdf formats respectively.
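One way to picture the two steps (extend the file filter, then add an extractor) is a registry keyed by extension. This is a minimal sketch under assumptions: TextExtractor, extractors, isSupported and extractAll are hypothetical names, and the .doc/.pdf entries that would delegate to Apache POI and PDFBox are left out so the example runs without external dependencies. A plain .txt extractor stands in for a newly added format.

```kotlin
import java.io.File

// Hypothetical type: turns a file into its plain text. Not the project's actual API.
typealias TextExtractor = (File) -> String

// Registry keyed by lowercase extension. The real ".doc" and ".pdf" entries
// (Apache POI and PDFBox) are omitted here; "txt" shows how a new format is added.
val extractors: MutableMap<String, TextExtractor> = mutableMapOf(
    "txt" to { file: File -> file.readText() }
)

/** File filter: accepts only regular files whose extension has a registered extractor. */
fun isSupported(file: File): Boolean =
    file.isFile && file.extension.lowercase() in extractors

/** Extracts text from every supported file in the directory, keyed by file name. */
fun extractAll(directory: File): Map<String, String> =
    directory.listFiles().orEmpty()
        .filter(::isSupported)
        .associate { it.name to extractors.getValue(it.extension.lowercase())(it) }
```

With this shape, supporting another format means adding one entry to the map; the filter picks up the new extension automatically.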


License

Going with Apache for this one.

resume-ranker's People

Contributors

anirban166

resume-ranker's Issues

Refactor codebase

Granted, I built this solo in a 24-hour hackathon and it works, but darn, does this code need refactoring.

I'm caught up in a bunch of work at the moment and not sure I'll have time for this anytime soon (this being one item on the future to-do list from a brain dump), but here is some stuff to look into and do:

  • Avoid redundancy. File filter looks fine, but the main code has unnecessary repetition for the .pdf and .doc case handler blocks.
  • Use a better GUI for I/O.
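As a rough idea of how the repetition could be removed, the shared steps can live in one function while the format-specific extraction is passed in as a lambda. The name scoreResume and the report format are hypothetical; the lambda stands in for the existing .pdf or .doc extraction code.

```kotlin
// Hypothetical sketch: one shared handler instead of near-identical .pdf and .doc blocks.

/** Scores one resume; the only format-specific step is the `extractText` lambda. */
fun scoreResume(
    fileName: String,
    keywords: List<String>,
    extractText: () -> String
): String {
    val text = extractText().lowercase()
    // Keep the keywords that appear at least once (case-insensitive substring match).
    val found = keywords.filter { it.lowercase() in text }
    return "$fileName: ${found.size} keyword(s) found $found"
}
```

Each case handler then reduces to a single call such as scoreResume(name, keywords) { /* format-specific extraction */ }.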

Heck, I might as well redo it in Kotlin if I make a go of it. It could be a good way to learn the language and see how it differs from this verbose, big-decimal-centered language.

Security issues with Log4j (SQL Injection, Deserialization of Untrusted Data, etc.) as detected with the Maven build

Four advisories were flagged, as follows:

  • By design, the JDBCAppender in Log4j 1.2.x accepts an SQL statement as a configuration parameter where the values to be inserted are converters from PatternLayout. The message converter, %m, is likely to always be included. This allows attackers to manipulate the SQL by entering crafted strings into input fields or headers of an application that are logged allowing unintended SQL queries to be executed. Note this issue only affects Log4j 1.x when specifically configured to use the JDBCAppender, which is not the default. Beginning in version 2.0-beta8, the JDBCAppender was re-introduced with proper support for parameterized SQL queries and further customization over the columns written to in logs. (Apache Log4j 1.2 reached end of life in August 2015. Users should upgrade to Log4j 2 as it addresses numerous other issues from the previous versions)
  • JMSSink in all versions of Log4j 1.x is vulnerable to deserialization of untrusted data when the attacker has write access to the Log4j configuration or if the configuration references an LDAP service the attacker has access to. The attacker can provide a TopicConnectionFactoryBindingName configuration causing JMSSink to perform JNDI requests that result in remote code execution in a similar fashion to GHSA-fp5r-v3w9-4333. Note this issue only affects Log4j 1.x when specifically configured to use JMSSink, which is not the default. (Apache Log4j 1.2 reached end of life in August 2015. Users should upgrade to Log4j 2 as it addresses numerous other issues from the previous versions)
  • JMSAppender in Log4j 1.2 is vulnerable to deserialization of untrusted data when the attacker has write access to the Log4j configuration. The attacker can provide TopicBindingName and TopicConnectionFactoryBindingName configurations causing JMSAppender to perform JNDI requests that result in remote code execution in a similar fashion to GHSA-jfh8-c2jp-5v3q. Note this issue only affects Log4j 1.2 when specifically configured to use JMSAppender, which is not the default. (Apache Log4j 1.2 reached end of life in August 2015. Users should upgrade to Log4j 2 as it addresses numerous other issues from the previous versions)
  • Included in Log4j 1.2 is a SocketServer class that is vulnerable to deserialization of untrusted data. When it listens to untrusted network traffic for log data, this can be exploited, in combination with a deserialization gadget, to remotely execute arbitrary code. This affects Log4j versions 1.2 up to 1.2.17. (Users are advised to migrate to org.apache.logging.log4j:log4j-core)

Avoid counting multiples for a keyword

I initially missed this: the score should reflect only how many of the given keywords appear in a resume, ignoring any occurrences past the first for each keyword. Some candidates repeat the same keyword in different spots within the file while others mention it just once, so counting duplicates makes the highest-ranked resume inaccurate. Duplicate counts should be invalidated.

The highest possible score should therefore equal the number of keywords (i.e., the best-case scenario, in which a candidate has all of the specified skills).
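A minimal sketch of the proposed scoring, assuming case-insensitive substring matching on already extracted text (distinctKeywordScore is a hypothetical name, not a function in the codebase):

```kotlin
/**
 * Score = number of distinct keywords that appear at least once in the text.
 * Repeated mentions of a keyword do not raise the score, so the result can
 * never exceed the number of distinct keywords supplied.
 */
fun distinctKeywordScore(text: String, keywords: Collection<String>): Int {
    val haystack = text.lowercase()
    return keywords
        .map { it.lowercase() }
        .toSet()                        // identical keywords are counted once
        .count { it in haystack }       // presence check, not a frequency count
}
```

For the text "Java java JAVA sql" and the keywords java, sql and docker, the score is 2. Note that plain substring matching also treats "java" as present in "javascript"; matching on word boundaries would be a stricter alternative.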
