anirban166 / resume-ranker

Multiplatform application for keyword-based resume ranking

Java 25.79% Kotlin 74.21%

java file-io resume-parser keyword-detection kotlin hackathon-project

resume-ranker's Introduction

Functionality

A multiplatform Kotlin application that scans the files in a specified directory (which, as intended, should contain CVs/resumes) for any number of specific keywords. It reads the files with .doc/.pdf extensions, lists the keywords each file contains along with their strength (given by the keyword count), and identifies the resume with the highest score. I built this early on to help recruiters rank resumes from a large candidate pool. It is not an ideal approach, but keyword matching is generally one of the steps automated systems use to filter out resumes.
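The scoring step can be sketched roughly as follows. This is a minimal sketch, not the project's code: it assumes the text has already been extracted from each .doc/.pdf file (the project uses Apache POI and PDFBox for that), and the class and method names (`KeywordScorer`, `countOccurrences`, `bestResume`) are hypothetical. Case-insensitive, whole-token matching is also an assumption; the original may match differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating keyword-count scoring; not the project's actual code.
final class KeywordScorer {
    private KeywordScorer() {}

    // Counts case-insensitive occurrences of one keyword. The lookarounds reject
    // matches glued to a letter, digit or underscore, so "java" does not match
    // inside "javascript", while keywords like "c++" still work.
    static int countOccurrences(String text, String keyword) {
        Pattern p = Pattern.compile(
                "(?<!\\w)" + Pattern.quote(keyword) + "(?!\\w)",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Keywords found in the text mapped to their counts (the keyword "strength").
    static Map<String, Integer> keywordCounts(String text, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            int c = countOccurrences(text, keyword);
            if (c > 0) {
                counts.put(keyword, c);
            }
        }
        return counts;
    }

    // Total weight of a resume: the sum of all keyword counts.
    static int totalScore(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    // Name of the highest-scoring resume (first one wins on ties), or null if none.
    static String bestResume(Map<String, String> textByResume, List<String> keywords) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, String> e : textByResume.entrySet()) {
            int score = totalScore(keywordCounts(e.getValue(), keywords));
            if (score > bestScore) {
                best = e.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

Passing a `LinkedHashMap` keeps the resumes in directory-listing order, which makes the tie-breaking rule (first resume wins) predictable.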

I/O

The program takes three inputs via a Compose-based graphical user interface:

A string of whitespace-separated keywords.
The local directory in which the user's resumes are stored.
The location where a separate file containing the results of the session will be saved.
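Turning the first input into a keyword list might look like the sketch below. The class name `KeywordInput` is hypothetical, and lower-casing plus dropping duplicate keywords are assumptions made here (duplicates would otherwise inflate scores), not documented behavior of the app.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical parser for the keyword text field; not the project's actual code.
final class KeywordInput {
    private KeywordInput() {}

    // Splits on any run of whitespace (spaces, tabs, newlines), lower-cases each
    // token, and drops repeats while keeping the order in which they were typed.
    static List<String> parseKeywords(String raw) {
        if (raw == null) {
            return new ArrayList<>();
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // "".split(...) would yield [""], so bail out early
        }
        Set<String> unique = new LinkedHashSet<>();
        for (String token : trimmed.split("\\s+")) {
            unique.add(token.toLowerCase(Locale.ROOT));
        }
        return new ArrayList<>(unique);
    }
}
```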

Given these inputs, it displays the total score (weightage) of each resume and the best resume with respect to the keyword-based search in a text box within the flexible GUI. The same output is also written to the user-specified file, which contains the entire trace of the run for future reference (effectively acting as a log).
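A minimal sketch of producing that report and saving it to the chosen file, assuming the per-resume scores have already been computed. `ResultsLog`, `formatReport` and `writeReport` are hypothetical names, and the report layout is invented for illustration; only `java.nio.file.Files.write` is a real library call here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical report writer; the line format is an assumption, not the app's output.
final class ResultsLog {
    private ResultsLog() {}

    // One "name: score" line per resume (in map order), then the best resume.
    // The first resume wins on ties because the comparison is strict.
    static List<String> formatReport(Map<String, Integer> scoreByResume) {
        List<String> lines = new ArrayList<>();
        String best = null;
        int bestScore = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> e : scoreByResume.entrySet()) {
            lines.add(e.getKey() + ": " + e.getValue());
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        lines.add(best == null ? "No resumes scored." : "Best resume: " + best + " (" + bestScore + ")");
        return lines;
    }

    // Writes the same lines shown in the GUI text box to the user-chosen file.
    // With no OpenOptions, Files.write creates the file or truncates an existing one.
    static void writeReport(Path target, List<String> lines) throws IOException {
        Files.write(target, lines, StandardCharsets.UTF_8);
    }
}
```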

Here's a short video demonstrating a run with a bunch of authentic resumes:

Notes

Resumes in formats other than the typical ones (i.e., .doc and .pdf files, including PDFs compiled from LaTeX) are not supported here, but support can be added: first add the other extension(s) to the file filter code segment (the easy part), then write a corresponding format-specific text extractor block for each, using some well-established library (unless you want to write the parsing yourself). For instance, I used the Apache POI and PDFBox libraries to handle the .doc and .pdf formats respectively.
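The "easy part" (the extension-based file filter) could look roughly like this. It is a sketch under assumptions: `ResumeFileFilter` and `SUPPORTED` are hypothetical names, matching is case-insensitive, and the directory is scanned non-recursively. Supporting a new format would mean adding its extension to `SUPPORTED`, plus writing the matching extractor.

```java
import java.io.File;
import java.util.Locale;
import java.util.Set;

// Hypothetical extension filter; not the project's actual code.
final class ResumeFileFilter {
    // Extensions the ranker reads. A new format needs an entry here
    // and its own text extractor.
    static final Set<String> SUPPORTED = Set.of("doc", "pdf");

    private ResumeFileFilter() {}

    // True if the name ends in a supported extension (case-insensitive).
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension, or a trailing dot
        }
        return SUPPORTED.contains(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Regular files directly inside dir with a supported extension (not recursive).
    // File.listFiles returns null for a missing or unreadable directory.
    static File[] listResumes(File dir) {
        File[] files = dir.listFiles((d, name) -> isSupported(name) && new File(d, name).isFile());
        return files == null ? new File[0] : files;
    }
}
```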

License

Going with the Apache License for this one.

resume-ranker's People

Contributors

Stargazers

Watchers

Forkers

niharikacii rajmq jakarta4ew lisaborkakoti jayantaborkakoti quesinp seimens jeffreii

resume-ranker's Issues

Refactor codebase

I understand that I built this solo in a 24-hour hackathon and that it works, but darn, does this code need refactoring.

I'm caught up with a bunch of work at the moment and not sure I'll have time for this anytime soon (this is one item on a future to-do list from a brain dump), but here is some stuff to look into and do:

Avoid redundancy. The file filter looks fine, but the main code repeats itself unnecessarily in the .pdf and .doc handler blocks.
Use a better GUI for I/O.
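One way to remove the repeated .pdf/.doc handler blocks is to look up a per-format extractor by file extension, so everything after text extraction runs through a single shared path. This is a sketch of that refactoring idea, not the project's design: `ExtractorRegistry` and `TextExtractor` are hypothetical, and the POI/PDFBox extractors are not shown (only a plain-text extractor is registered in the usage below, to keep the example dependency-free).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical extension-to-extractor registry; not the project's actual code.
final class ExtractorRegistry {
    // Each format supplies only this method; scoring and reporting stay shared.
    @FunctionalInterface
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    // Registers (or replaces) the extractor for an extension such as "pdf".
    ExtractorRegistry register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
        return this;
    }

    // Chooses the extractor from the file's extension, with no per-format branches.
    String extractText(Path file) throws IOException {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = byExtension.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + name);
        }
        return extractor.extract(file);
    }
}
```

For example, `new ExtractorRegistry().register("txt", java.nio.file.Files::readString)` handles plain-text files; .doc and .pdf entries would wrap the respective library calls in the same way.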

Heck, might as well do it in Kotlin if I go for it. It could be a good way to learn the language and see how it differs from this verbose, BigDecimal-centered lang.

Security issues with Log4j (SQL Injection, Deserialization of Untrusted Data, etc.) as detected with the Maven build

Four vulnerabilities were flagged, as follows:

By design, the JDBCAppender in Log4j 1.2.x accepts an SQL statement as a configuration parameter where the values to be inserted are converters from PatternLayout. The message converter, %m, is likely to always be included. This allows attackers to manipulate the SQL by entering crafted strings into input fields or headers of an application that are logged allowing unintended SQL queries to be executed. Note this issue only affects Log4j 1.x when specifically configured to use the JDBCAppender, which is not the default. Beginning in version 2.0-beta8, the JDBCAppender was re-introduced with proper support for parameterized SQL queries and further customization over the columns written to in logs. (Apache Log4j 1.2 reached end of life in August 2015. Users should upgrade to Log4j 2 as it addresses numerous other issues from the previous versions)
JMSSink in all versions of Log4j 1.x is vulnerable to deserialization of untrusted data when the attacker has write access to the Log4j configuration or if the configuration references an LDAP service the attacker has access to. The attacker can provide a TopicConnectionFactoryBindingName configuration causing JMSSink to perform JNDI requests that result in remote code execution in a similar fashion to GHSA-fp5r-v3w9-4333. Note this issue only affects Log4j 1.x when specifically configured to use JMSSink, which is not the default. (Apache Log4j 1.2 reached end of life in August 2015. Users should upgrade to Log4j 2 as it addresses numerous other issues from the previous versions)
JMSAppender in Log4j 1.2 is vulnerable to deserialization of untrusted data when the attacker has write access to the Log4j configuration. The attacker can provide TopicBindingName and TopicConnectionFactoryBindingName configurations causing JMSAppender to perform JNDI requests that result in remote code execution in a similar fashion to GHSA-jfh8-c2jp-5v3q. Note this issue only affects Log4j 1.2 when specifically configured to use JMSAppender, which is not the default. (Apache Log4j 1.2 reached end of life in August 2015. Users should upgrade to Log4j 2 as it addresses numerous other issues from the previous versions)
Included in Log4j 1.2 is a SocketServer class that is vulnerable to deserialization of untrusted data, which can be exploited to remotely execute arbitrary code when combined with a deserialization gadget while listening to untrusted network traffic for log data. This affects Log4j versions 1.2 up to 1.2.17. (Users are advised to migrate to org.apache.logging.log4j:log4j-core)

Avoid counting multiples for a keyword

Initially missed out on this: I should only be counting how many of the given keywords are found, ignoring any occurrences past the first for each keyword in a resume. Candidates can repeat the same keyword in different spots within the file while others mention it just once, so the highest-ranked resume won't be accurate because of the dupe counts (which should be invalidated).

The highest possible score should then be the number of keywords themselves (i.e., the best-case scenario, in which a candidate has all of the specified skills).
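The scoring rule proposed in this issue can be sketched as counting each distinct keyword at most once. `DistinctKeywordScore` is a hypothetical name, and case-insensitive whole-token matching is an assumption; the point is only that repeats no longer add to the score, which caps it at the number of distinct keywords.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical presence-based scorer; a sketch of the issue's proposal.
final class DistinctKeywordScore {
    private DistinctKeywordScore() {}

    // Score = number of distinct keywords that appear at least once, so a keyword
    // repeated ten times counts the same as one mentioned once. The maximum
    // possible score is the number of distinct keywords.
    static int score(String text, List<String> keywords) {
        return (int) keywords.stream()
                .map(k -> k.toLowerCase(Locale.ROOT))
                .distinct()
                .filter(k -> Pattern.compile(
                                "(?<!\\w)" + Pattern.quote(k) + "(?!\\w)",
                                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                        .matcher(text)
                        .find())
                .count();
    }
}
```

With the keywords `java kotlin`, a resume saying "Java Java Java developer" now scores 1, while one mentioning both Kotlin and Java once scores 2.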
