enable importing of CSV data files #201

visakha · 2024-01-01T16:42:02Z

Ingesting structured data is also a big requirement. Companies already have a good understanding of their structured data; they want that data to work for them, and Aryn could be a channel to enable that.

Solution
I'd expect PDF ingestion to work like a plugin. I have not read the code yet, but if it does, the same plugin architecture could be adopted for other file formats, in this case CSV.
If not, I would be more than happy to implement it under guidance.
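The plugin idea could look roughly like the sketch below: a registry that maps file extensions to reader functions, so adding CSV means registering one more reader. This is a hypothetical illustration; `READERS`, `register_reader`, and `read_file` are made-up names, not Sycamore's actual API, and the readers return plain dicts.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical registry: file extension -> reader that turns raw text into records.
Reader = Callable[[str], List[dict]]
READERS: Dict[str, Reader] = {}


def register_reader(extension: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a reader for one file extension (illustrative only)."""
    def decorator(func: Reader) -> Reader:
        READERS[extension.lower()] = func
        return func
    return decorator


@register_reader(".csv")
def read_csv_text(text: str) -> List[dict]:
    # Each CSV row becomes one dict keyed by the header row.
    return list(csv.DictReader(io.StringIO(text)))


def read_file(name: str, text: str) -> List[dict]:
    """Dispatch to whichever reader is registered for the file's extension."""
    for ext, reader in READERS.items():
        if name.lower().endswith(ext):
            return reader(text)
    raise ValueError(f"no reader registered for {name!r}")
```

For example, `read_file("customers.csv", "id,name\n1,Ada\n")` returns `[{'id': '1', 'name': 'Ada'}]`. New formats plug in by decorating another function, without touching the dispatch code.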

Alternatives
No alternatives

Additional context
Say a client has customer data in a CRM system backed by an RDBMS, and we want to add some conversational intelligence on top of it. How do we do that? We would first export the tables 1:1 into CSV files, then use Sycamore to ingest them and build relationships between the CSV files, so that knowledge the company already has about its data can be put to work faster.
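To make the "build relationships between the CSV files" step concrete, here is a minimal standard-library sketch. It assumes two hypothetical table exports, customers and orders, where `orders.customer_id` refers to `customers.id`; the data, column names, and the `link` helper are illustrative, and inline strings stand in for the exported files.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical 1:1 table exports from a CRM database.
CUSTOMERS_CSV = "id,name\n1,Ada\n2,Grace\n"
ORDERS_CSV = "order_id,customer_id,total\n10,1,25.00\n11,1,40.00\n12,2,15.50\n"


def load(text: str) -> List[dict]:
    """Parse CSV text into one dict per row, keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def link(parents: List[dict], children: List[dict],
         parent_key: str, foreign_key: str, field: str) -> List[dict]:
    """Attach each child row to its parent under `field`, following a foreign key."""
    by_parent: Dict[str, List[dict]] = defaultdict(list)
    for child in children:
        by_parent[child[foreign_key]].append(child)
    # .get() avoids inserting empty lists for parents with no children.
    return [{**parent, field: by_parent.get(parent[parent_key], [])} for parent in parents]


customers = link(load(CUSTOMERS_CSV), load(ORDERS_CSV), "id", "customer_id", "orders")
```

Each customer record now carries its orders, which is the kind of cross-file relationship the CSV exports lose unless the foreign-key columns are exported alongside the data.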

bsowell · 2024-01-02T03:35:25Z

Hi @visakha. Thanks for the feedback! We definitely agree that structured data is super important in this space and we welcome suggestions on the best way to incorporate it.

We do have a JSON reader (class JsonScan(FileScan), at line 161 of sycamore/sycamore/scans/file_scan.py in commit 33e3245) that might give some flavor for how this could work. I could see a CSV reader working similarly: you specify which field to use as the "main content" (text_representation in Sycamore terms), and then read the rest as properties. Does this seem like it would work for your use case?
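A rough sketch of the CSV reader described above, assuming a hypothetical `read_csv_documents` function that returns plain dicts rather than real Sycamore `Document` objects: the caller names one column as the main content (`text_representation`), and every other column becomes a property.

```python
import csv
import io
from typing import List


def read_csv_documents(text: str, text_column: str) -> List[dict]:
    """Turn each CSV row into a document-like dict (hypothetical, not Sycamore's API).

    `text_column` names the column used as the main content; all remaining
    columns are stored under "properties".
    """
    reader = csv.DictReader(io.StringIO(text))
    # Accessing fieldnames reads the header row; it is None for empty input.
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise KeyError(f"column {text_column!r} not found in CSV header")
    documents = []
    for row in reader:
        properties = {k: v for k, v in row.items() if k != text_column}
        documents.append({"text_representation": row[text_column], "properties": properties})
    return documents


sample = "ticket_id,customer,description\n1,Ada,Cannot log in after password reset\n"
docs = read_csv_documents(sample, text_column="description")
```

Which column counts as content is left to the caller because it depends on the table; the remaining columns stay available as structured metadata next to the text.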

bsowell · 2024-01-02T03:40:33Z

Since you mentioned that that data originates in an RDBMS, another question is whether it would be useful to have connectors directly to the database rather than doing an intermediate CSV export.
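For comparison, a direct connector could skip the export and build the same row-per-document records straight from a query. This sketch uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a real RDBMS; the table, columns, and the `read_query_documents` helper are made up for illustration.

```python
import sqlite3
from typing import List


def read_query_documents(conn: sqlite3.Connection, query: str, text_column: str) -> List[dict]:
    """Run `query` and map each result row to a document-like dict (hypothetical helper)."""
    cursor = conn.execute(query)
    # cursor.description holds one 7-tuple per result column; index 0 is the name.
    columns = [col[0] for col in cursor.description]
    if text_column not in columns:
        raise KeyError(f"column {text_column!r} not in query result")
    documents = []
    for values in cursor:
        row = dict(zip(columns, values))
        text = row.pop(text_column)
        documents.append({"text_representation": text, "properties": row})
    return documents


# In-memory stand-in for a CRM database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (ticket_id INTEGER, customer TEXT, description TEXT)")
conn.execute("INSERT INTO tickets VALUES (1, 'Ada', 'Cannot log in after password reset')")
docs = read_query_documents(conn, "SELECT * FROM tickets", text_column="description")
```

One practical difference: values keep their database types here (`ticket_id` comes back as the integer `1`), whereas rows parsed from a CSV file arrive as strings.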

alexaryn · 2024-01-08T21:44:12Z

If we end up doing CSV, we should add TSV at the same time. It's trivial and a lot easier to work with.
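As a sketch of why TSV comes nearly for free: with Python's `csv` module the only difference is the delimiter, so one reader can take it as a parameter. The `read_delimited` and `delimiter_for` helpers below are hypothetical, and choosing the delimiter from the file extension is an assumed convention.

```python
import csv
import io
from typing import List


def delimiter_for(filename: str) -> str:
    """Pick a delimiter from the extension (assumed convention: .tsv -> tab, else comma)."""
    return "\t" if filename.lower().endswith(".tsv") else ","


def read_delimited(filename: str, text: str) -> List[dict]:
    """Parse CSV or TSV text into one dict per row, keyed by the header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter_for(filename)))


csv_rows = read_delimited("notes.csv", 'id,note\n1,"a, b"\n')
tsv_rows = read_delimited("notes.tsv", "id\tnote\n1\ta, b\n")
```

Both calls yield the same rows. The example also shows part of why TSV is easier to work with: a free-text field containing commas must be quoted in CSV but not in TSV, as long as it contains no tabs or newlines.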

visakha · 2024-01-09T14:58:16Z

The reason I suggest CSV/TSV (rather than a direct DB connection) is the clean boundary: the integration work has a clear starting point.

