Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH - Add support for custom skills #47

Closed
wants to merge 11 commits into from

Conversation

yonglin-wang
Copy link

Fixes #46

Please see custom_skills_updates.md for a comprehensive walkthrough, questions, and future directions.

Feel free to search for "?" in the documentation to jump to actionable items/questions.

Thanks for the wait and looking forward to hearing your thoughts 😸

@AnasAito AnasAito requested a review from Badr-MOUFAD May 12, 2022 08:24
@Badr-MOUFAD Badr-MOUFAD self-assigned this May 12, 2022
@Badr-MOUFAD Badr-MOUFAD changed the title Support for Custom Skill Lists ENH - Add support for custom skills May 12, 2022
@Badr-MOUFAD Badr-MOUFAD added the enhancement New feature or request label May 12, 2022
@Badr-MOUFAD
Copy link
Collaborator

Thanks for your interest @yonglin-wang. I highly appreciate your suggestions.

Before, let me clarify some important things:

Suppose that we approved the EPL Skill database and merged it to the codebase. And later on, another contributor suggested another one, and we merged it, and so on and so forth... we will end up in the end with a very huge codebase (json, classes, and their docs) that is chaotic and difficult to maintain in the future. We better keep our codebase as small as possible.

Hence, EMSI skill database will be the only database supported by default in skillner as it contains many skills that were gathered and approved by experts, and plus, we enriched some of them manually.

Using a custom skill database is something user-specific. So what I suggest, is to define a user-friendly pipeline that creates the skills bundles so that to use them with SkillExtractor, which is, somehow, what you have suggested.

With that being said, instead of implementing classes, abstract classes, or abstract methods, ... (let's keep things as simple as possible), I suggest adopting a sequential point view

 flowchart LR;

o(x) -->| list, path, url| A(Load data) --> B(create dict with id, name)
B --> C{other processing...}
C -->D(save bundles)
Loading

Let's us start first by implementing these pieces and then combine them to form a complete pipeline.

Note on work organization:

  • make sure to push frequently so that we can follow with you. Indeed, I was a bit overwhelmed with 11 commits, 14 file changes, ... This will help us keep up with you and intervene whenever necessary
  • Don't reinvent the wheel. For instance, we have already implemented a cleaner class, so you can use it instead of creating your own. This saves as from retesting things over and over again.
  • Try to have a code as readable as possible, Indeed, I struggled a bit to get what you are trying to do. Also, make sure to follow a certain convention in docstrings, in our case we use the numpydoc, instead of putting sentences as functions' descriptions. It help us understand your class/function without having to read its implementation.

import json
from posixpath import dirname
# installed packs
#
# my packs
from skillNer.network.remote_db import RemoteBucket

# TODO needs to be updated to accommodate custom skill lists with different skill types
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you!

We will change that in this PR to enable custom support of skill databases

# installed packs
import pandas as pd
from pandas import json_normalize

# TODO this script is not used
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script was used to fetch the EMSI skill database.
It will be used later to update the database as it is updated frequently

return os.path.join(save_dir, filename)


def _clean_skill(self, skill_raw_title: str) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Cleaner instead

}


if __name__ == "__main__":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to put the example in a notebook as it is done in the sandbox

@AnasAito AnasAito closed this Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for Extracting Skills from Custom Skill Lists
3 participants