Tuning data enrichment models for knowledge graph extraction
Hi everyone, curious if anyone has managed to train data enrichment models?
I am working on tuning legal-domain BERT and RoBERTa models to read judgements and extract citations, persons, quotes, organisations, etc., so I can build a knowledge document with interconnecting references.
Has anyone had success tuning models like this?
Are there any base models you'd recommend?
I see most people on this forum are tuning LLMs - but they're pretty overpowered for data extraction/enrichment.
My setup: I have an NVIDIA GeForce RTX 4060 Ti 8GB.
Qwen takes around 5 seconds per doc, while a tuned BERT or RoBERTa takes around 50 ms per doc to extract entities. But accuracy is still a challenge.
My dataset is pre-extracted fields from legal documents that I scraped with Scrapy.
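To make the data format concrete, here is a rough sketch of how pre-extracted fields like these could be turned into token-level BIO labels for training a token classifier. It is not my exact pipeline: the helper names (`span_of`, `bio_tags`), the label names, and the example sentence are all made up for illustration, and it splits on whitespace rather than using the model's own tokenizer.

```python
import re

def span_of(text, phrase, label):
    """Hypothetical helper: locate a pre-extracted field in the text
    and return it as a (start, end, label) character span."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)

def bio_tags(text, spans):
    """Whitespace-tokenize text and give each token a BIO tag based on
    which character span (if any) it overlaps."""
    tokens, tags = [], []
    for m in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            # A token belongs to a span if the two character ranges overlap.
            if m.start() < end and m.end() > start:
                # The token covering the span's first character gets B-.
                prefix = "B-" if m.start() <= start else "I-"
                tag = prefix + label
                break
        tokens.append(m.group())
        tags.append(tag)
    return tokens, tags

# Made-up example sentence and fields, for illustration only.
text = "In Smith v Jones [2020] ABC 12, Judge Doe agreed."
spans = [
    span_of(text, "Smith v Jones [2020] ABC 12", "CITATION"),
    span_of(text, "Judge Doe", "PERSON"),
]
tokens, tags = bio_tags(text, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

With a real BERT or RoBERTa tokenizer the same labels would still need to be aligned to subword pieces, which this sketch does not attempt.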