Extracting table data from a blurry image inside a PDF
I have a use case where a PDF contains an image on one of its pages, and I want to extract data from that image. In our system, the user uploads the PDF, and we then go through it to locate that specific image. The image is blurry and contains a table-like layout.

Currently the backend uses gpt-4.1-mini to extract data from the image, but it returns a lot of wrong values in the affected rows. The UI has to display the extracted data in rows and columns, and the goal is to reduce manual effort. We also want to show a confidence score from the LLM, but even for wrong rows it reports 87-90% confidence. Is there any way I can improve this?

I tried changing the flow: first extracting the text with PaddleOCR, OpenCV, and similar tools, then passing that text to the LLM. This improved extraction to some extent, but now there are hallucination problems, where the LLM returns data that is not present in the image at all.

Is Azure Document Intelligence helpful here? I would like some guidance on its use case.
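Since the UI needs rows and columns, one deterministic step that could sit between OCR and the LLM is rebuilding the grid from the OCR bounding boxes, so the model is not asked to infer layout from flat text. The sketch below assumes a hypothetical `OcrToken` record holding each box's text, centre coordinates, and confidence; this is roughly what can be derived from PaddleOCR's per-box output but is not its exact format. The `row_tolerance` value is an assumption to tune per document, and this is a simple clustering heuristic, not a full table-structure recognizer.

```python
from dataclasses import dataclass


@dataclass
class OcrToken:
    """One recognized text box; x/y are the box centre in pixels (hypothetical format)."""
    text: str
    x: float
    y: float
    confidence: float


def group_into_rows(tokens: list[OcrToken], row_tolerance: float = 10.0) -> list[list[OcrToken]]:
    """Cluster tokens into rows by vertical position, then sort each row left to right.

    A token joins the current row if its centre y is within `row_tolerance`
    pixels of the first token in that row; otherwise it starts a new row.
    """
    rows: list[list[OcrToken]] = []
    for token in sorted(tokens, key=lambda t: t.y):
        if rows and abs(token.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(token)
        else:
            rows.append([token])
    # Within each row, order cells by horizontal position (column order).
    return [sorted(row, key=lambda t: t.x) for row in rows]


if __name__ == "__main__":
    sample = [
        OcrToken("Qty", 200, 12, 0.98),
        OcrToken("Item", 50, 10, 0.97),
        OcrToken("Bolt", 52, 41, 0.91),
        OcrToken("12", 198, 44, 0.62),
    ]
    for row in group_into_rows(sample):
        print([t.text for t in row])  # ['Item', 'Qty'] then ['Bolt', '12']
```

Feeding the LLM already-structured rows (or skipping the LLM for clean rows) narrows what it can get wrong.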
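For the hallucination problem, one common safeguard is to check every cell the LLM returns against the OCR text and flag any value that cannot be matched back to it. A minimal sketch using `difflib.SequenceMatcher` from the standard library; the function names, the dict-per-row shape of the LLM output, and the 0.85 similarity threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(value.lower().split())


def best_match_ratio(value: str, ocr_strings: list[str]) -> float:
    """Return the highest similarity (0..1) between `value` and any OCR string."""
    target = normalize(value)
    if not target:
        return 1.0  # treat empty cells as trivially grounded
    return max(
        (SequenceMatcher(None, target, normalize(s)).ratio() for s in ocr_strings),
        default=0.0,
    )


def flag_ungrounded_cells(rows: list[dict], ocr_strings: list[str], threshold: float = 0.85):
    """Return (row_index, column_name, value) for cells with no close OCR match."""
    flagged = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            if best_match_ratio(str(value), ocr_strings) < threshold:
                flagged.append((i, column, value))
    return flagged


if __name__ == "__main__":
    ocr = ["Item", "Qty", "Bolt M8", "12", "Washer", "40"]
    llm_rows = [{"item": "Bolt M8", "qty": "12"}, {"item": "Nut M8", "qty": "30"}]
    print(flag_ungrounded_cells(llm_rows, ocr))
```

Flagged cells can be highlighted in the UI for manual review instead of being shown as extracted values. Note that short numeric strings match strictly, which is usually what you want for quantities.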
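The 87-90% figure is the model's own self-reported estimate, which is not calibrated against actual accuracy. A different signal, swapped in here in place of the LLM score, is the OCR engine's per-box recognition confidence (PaddleOCR returns a score with each recognized text box). A minimal sketch, assuming you already have those scores per cell: the row score is the weakest cell, so one unreadable cell pulls the whole row down. The function names and the 0.8 `review_threshold` are hypothetical and should be tuned against manually checked samples.

```python
def row_confidence(cell_confidences: list[float]) -> float:
    """Row score = weakest cell; a row with no cells is treated as zero confidence."""
    return min(cell_confidences) if cell_confidences else 0.0


def rows_needing_review(rows: list[list[float]], review_threshold: float = 0.8) -> list[int]:
    """Return indices of rows whose weakest OCR confidence is below the threshold."""
    return [i for i, confs in enumerate(rows) if row_confidence(confs) < review_threshold]


if __name__ == "__main__":
    per_row_scores = [[0.97, 0.98], [0.91, 0.62], []]
    print(rows_needing_review(per_row_scores))  # [1, 2]
```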