
How to have all supported languages available at runtime? #25

Open
NachoBrito opened this issue May 1, 2022 · 3 comments

Comments

@NachoBrito

I'm trying to use this library in a multilingual environment. I have a function that receives the raw text and a language name as parameters, loads the right language package, and returns the sentences.

Since only English is loaded by default, my test fails for all other languages, as I expected. But when I tried to run "make spanish" in the project folder, I got two different errors:

  • First, a permission error, since data/spanish.json is read-only (installed with go get ...)
  • Then I ran it with sudo, which worked, but now my test fails with this error:

gopkg.in/neurosnap/sentences.v1/data

/Users/***/go/pkg/mod/gopkg.in/neurosnap/sentences.v1@<version>/data/spanish.go:18:6: bindataRead redeclared in this block

Could you give me some pointers on how to compile all the supported language packages so they can be chosen at runtime?

Thanks!

@neurosnap
Owner

Greetings! It's been a while since I've worked on this project, and to be quite honest, almost all usage has been with English. I'll spend a little time this week figuring out how to get other languages loaded properly and fixing any issues I notice. I also welcome PRs if you find something that needs to be fixed to get this working.

Thanks!

@neurosnap
Owner

neurosnap commented Jun 19, 2022

Greetings! Sorry it took me so long to respond. I think I need to update the README to better reflect how to use the library. For the CLI, I embed the JSON data directly into the binary so it's a single executable, but that doesn't really help library developers.

Could you try this and see if it works for you?

package main

import (
        "fmt"
        "log"
        "os"

        "github.com/neurosnap/sentences"
)

func main() {
        text := "..." // test text

        // download this file from the data/ folder of this repo
        b, err := os.ReadFile("path/to/spanish.json")
        if err != nil {
                log.Fatal(err)
        }

        // load the training data
        training, err := sentences.LoadTraining(b)
        if err != nil {
                log.Fatal(err)
        }

        // create the default sentence tokenizer
        tokenizer := sentences.NewSentenceTokenizer(training)

        // named sents so it doesn't shadow the sentences package
        sents := tokenizer.Tokenize(text)

        for _, s := range sents {
                fmt.Println(s.Text)
        }
}

If this works, you should be able to load all the languages: download the JSON files, save them somewhere your Go code can reference them, and call LoadTraining once for each language you want to support.

@robinbraemer

sentences could expose the training files via Go embed, so library developers could simply use a function or a global variable such as data.German.
