enable importing of CSV data files #201

Open
visakha opened this issue Jan 1, 2024 · 4 comments
Comments


visakha commented Jan 1, 2024

Ingesting structured data is also a big requirement. Companies already have a good understanding of their structured data; they want that data to work for them, and Aryn could be a channel to enable that.

Solution
PDF ingestion should work like a plugin. I have not read the code yet, but if it does, the same plugin architecture could be adopted for other file formats, in this case CSV.
If not, I will be more than happy to implement it under guidance.

Alternatives
No alternatives

Additional context
Say a client has customer data in a CRM system backed by an RDBMS, and we want to add conversational intelligence to that space. How do we do that? We would first export the tables into CSV files 1:1, then use Sycamore to ingest them and build relationships between the CSV files to leverage existing knowledge.
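
To make the "relationships between the CSV files" idea concrete, here is a minimal sketch using only the Python standard library. The table names, column names, and the `linked` structure are made up for illustration, and nothing here is Sycamore API; it only shows how a foreign-key column could tie two 1:1 table exports back together.

```python
import csv
import io

# Hypothetical 1:1 exports of two RDBMS tables (inlined here so the sketch runs as-is).
customers_csv = io.StringIO("customer_id,name\n1,Acme\n2,Globex\n")
orders_csv = io.StringIO(
    "order_id,customer_id,item\n10,1,Widget\n11,2,Gadget\n12,1,Gizmo\n"
)

# Index customers by their primary key.
customers = {row["customer_id"]: row for row in csv.DictReader(customers_csv)}

# Group orders by the foreign key that points back at a customer.
orders_by_customer = {}
for order in csv.DictReader(orders_csv):
    orders_by_customer.setdefault(order["customer_id"], []).append(order)

# One linked record per customer, carrying its related orders.
linked = [
    {"customer": customer, "orders": orders_by_customer.get(customer_id, [])}
    for customer_id, customer in customers.items()
]
```

Each entry in `linked` could then be turned into a single document, so the relationship survives the CSV round trip.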

Contributor

bsowell commented Jan 2, 2024

Hi @visakha. Thanks for the feedback! We definitely agree that structured data is super important in this space and we welcome suggestions on the best way to incorporate it.

We do have a JSON reader (class JsonScan(FileScan):) that might give some flavor of how this could work. I could see a CSV reader working similarly: you specify which field to use as the "main content" (text_representation in Sycamore terms), and then read the rest as properties. Does this seem like it would work for your use case?
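
As a rough illustration of that idea, a CSV reader could map each row to a record whose text comes from one chosen column and whose properties come from the remaining columns. This is a sketch under assumptions, not Sycamore's actual API: the helper name `csv_rows_to_records` and the dict layout are hypothetical, and only the terms text_representation and properties come from the discussion.

```python
import csv
import io
from typing import Iterator, TextIO


def csv_rows_to_records(source: TextIO, content_field: str) -> Iterator[dict]:
    """Yield one record per CSV row (hypothetical helper, not a Sycamore API)."""
    reader = csv.DictReader(source)
    # Accessing fieldnames reads the header row; it is None for an empty file.
    if reader.fieldnames is None or content_field not in reader.fieldnames:
        raise ValueError(f"column {content_field!r} not found in CSV header")
    for row in reader:
        # The chosen column becomes the main content ...
        text = row.pop(content_field)
        # ... and every other column is kept as a property.
        yield {"text_representation": text, "properties": dict(row)}


sample = io.StringIO(
    "id,name,notes\n1,Acme,Renewal due in March\n2,Globex,Asked about pricing\n"
)
records = list(csv_rows_to_records(sample, content_field="notes"))
```

All values stay as strings here; type inference for the properties would be a separate design decision.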

Contributor

bsowell commented Jan 2, 2024

Since you mentioned that the data originates in an RDBMS, another question is whether it would be useful to have connectors directly to the database rather than doing an intermediate CSV export.
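
For comparison, a direct connector could skip the export step and read rows straight from the database. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a real RDBMS. The function name `table_to_records`, the table schema, and the record layout are all assumptions for illustration, not an existing Sycamore connector.

```python
import sqlite3
from typing import Iterator


def table_to_records(
    conn: sqlite3.Connection, table: str, content_column: str
) -> Iterator[dict]:
    """Yield one record per table row (hypothetical helper, not a Sycamore API)."""
    cursor = conn.cursor()
    cursor.row_factory = sqlite3.Row  # lets each row be converted to a dict by column name
    # Table names cannot be bound as SQL parameters, so quote the identifier instead.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor.execute(f"SELECT * FROM {quoted}")
    for row in cursor:
        properties = dict(row)
        if content_column not in properties:
            raise ValueError(f"column {content_column!r} not found in table {table!r}")
        text = properties.pop(content_column)
        yield {"text_representation": text, "properties": properties}


# In-memory database standing in for the CRM's backing store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO customers (name, notes) VALUES (?, ?)",
    [("Acme", "Renewal due in March"), ("Globex", "Asked about pricing")],
)
records = list(table_to_records(conn, "customers", "notes"))
```

Unlike the CSV route, column types such as the integer id are preserved, which is one argument for a direct connector.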

Contributor

alexaryn commented Jan 8, 2024

If we end up doing CSV, we should add TSV at the same time. It's trivial to add and a lot easier to work with.
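
A small sketch of why TSV comes almost for free, assuming the reader is built on Python's standard csv module (the `read_delimited` helper is hypothetical): the same code handles both formats by changing one delimiter argument.

```python
import csv
import io
from typing import TextIO


def read_delimited(source: TextIO, delimiter: str = ",") -> list:
    """Read delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(source, delimiter=delimiter))


csv_rows = read_delimited(io.StringIO("id,name\n1,Acme\n"))
# With tabs as the delimiter, a comma inside a field needs no quoting.
tsv_rows = read_delimited(io.StringIO("id\tname\n1\tAcme, Inc.\n"), delimiter="\t")
```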

Author

visakha commented Jan 9, 2024 via email
