50 GB worth of Excel files: how do I load them?
Hi,
I have a task where I receive hundreds of Excel files, each 700-800 MB in size. I can't influence what I get, so I'm stuck with these files.
What I've tried so far, starting with 6 files (4.5 GB in total):
- Notebook (Python): one file takes 30 min, so with all 6 files it times out.
- Copy job: I get a message that the file is too big for it :(
- Dataflow: all 6 files take 24 min, so to avoid timeouts I'll probably need to build a few of them and then orchestrate them in a pipeline.
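On the notebook side, one thing that may be worth trying: libraries that build a full in-memory workbook model can be slow and memory-hungry on 700-800 MB files. An .xlsx file is a ZIP archive of XML parts, so rows can instead be streamed straight out of the sheet XML with only the Python standard library. The sketch below is a minimal, unbenchmarked illustration under stated assumptions: the data sits in the worksheet part `xl/worksheets/sheet1.xml` (the first sheet is not always stored under that name, hence the `sheet_part` parameter), every value comes back as raw text (numbers as stored, dates as Excel serial numbers, formula cells as their cached result), and styles are ignored. The function names are hypothetical. It writes CSV because Spark or a copy activity can load that without extra libraries; writing Parquet instead would need something like pyarrow.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional

# Main SpreadsheetML namespace used by the XML parts inside an .xlsx archive
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def col_index(cell_ref: str) -> int:
    """Turn a cell reference such as 'C7' into a zero-based column index (2)."""
    idx = 0
    for ch in cell_ref:
        if not ch.isalpha():
            break
        idx = idx * 26 + (ord(ch.upper()) - ord("A") + 1)
    return idx - 1


def load_shared_strings(zf: zipfile.ZipFile) -> List[str]:
    """Read the shared-string table that text cells point into by index."""
    if "xl/sharedStrings.xml" not in zf.namelist():
        return []
    strings: List[str] = []
    with zf.open("xl/sharedStrings.xml") as f:
        for _, elem in ET.iterparse(f):
            if elem.tag == NS + "si":
                # Plain text is a single <t>; rich text is split into <r><t> runs
                parts = elem.findall(NS + "t") + elem.findall(NS + "r/" + NS + "t")
                strings.append("".join(t.text or "" for t in parts))
                elem.clear()
    return strings


def iter_xlsx_rows(path: str, sheet_part: str = "xl/worksheets/sheet1.xml"
                   ) -> Iterator[List[Optional[str]]]:
    """Yield one list of cell values (text, or None for empty) per sheet row."""
    with zipfile.ZipFile(path) as zf:
        shared = load_shared_strings(zf)
        with zf.open(sheet_part) as f:  # decompressed incrementally, not all at once
            sheet_data = None
            for event, elem in ET.iterparse(f, events=("start", "end")):
                if event == "start":
                    if elem.tag == NS + "sheetData":
                        sheet_data = elem  # parent of every <row> element
                    continue
                if elem.tag != NS + "row":
                    continue
                values: List[Optional[str]] = []
                for c in elem.findall(NS + "c"):
                    ref = c.get("r")
                    col = col_index(ref) if ref else len(values)
                    while len(values) < col:
                        values.append(None)  # pad cells Excel skipped as empty
                    ctype = c.get("t")
                    if ctype == "inlineStr":
                        t = c.find(NS + "is/" + NS + "t")
                        val = t.text if t is not None else None
                    else:
                        v = c.find(NS + "v")
                        val = v.text if v is not None else None
                        if ctype == "s" and val is not None:
                            val = shared[int(val)]  # index into shared strings
                    values.append(val)
                yield values
                if sheet_data is not None:
                    sheet_data.clear()  # drop finished rows so memory stays flat


def xlsx_to_csv(src: str, dst: str,
                sheet_part: str = "xl/worksheets/sheet1.xml") -> int:
    """Stream one workbook into a CSV file; returns the number of rows written."""
    rows = 0
    with open(dst, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for row in iter_xlsx_rows(src, sheet_part):
            writer.writerow(["" if v is None else v for v in row])
            rows += 1
    return rows
```

Because rows are yielded one at a time and then discarded, memory use should depend on the shared-string table and the widest row rather than on the total file size; whether that beats the 30 min per file seen above would have to be measured.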
Any suggestions on how to deal with this monster? Is there anything I'm missing here? For now, I'm trying to put them all into one table in a lakehouse for further dataflow processing.
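For the timeout problem, a common pattern is to stop trying to finish everything in one run: each run loads a bounded number of files and records which ones it finished, so a re-run (or a pipeline loop) resumes instead of starting over. The sketch below assumes a JSON checkpoint file on storage the notebook can read and write; `run_batch`, `load_one` and the checkpoint layout are hypothetical names, and `load_one` stands in for whatever converts one workbook and appends it to the lakehouse table.

```python
import json
import os
from typing import Callable, Iterable, List, Set


def load_done(checkpoint: str) -> Set[str]:
    """Return the set of files already loaded by earlier runs."""
    if not os.path.exists(checkpoint):
        return set()
    with open(checkpoint, encoding="utf-8") as f:
        return set(json.load(f))


def save_done(done: Set[str], checkpoint: str) -> None:
    """Write progress to a temp file, then swap it in with os.replace."""
    tmp = checkpoint + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, checkpoint)  # atomic on a local/POSIX filesystem


def run_batch(files: Iterable[str], load_one: Callable[[str], None],
              checkpoint: str, max_files: int) -> List[str]:
    """Load at most max_files files that are not yet in the checkpoint.

    Meant to be called once per run; a file is only recorded after load_one
    returns, so a run that times out or fails retries that file next time.
    """
    done = load_done(checkpoint)
    loaded: List[str] = []
    for path in sorted(files):
        if path in done:
            continue
        if len(loaded) >= max_files:
            break
        load_one(path)  # hypothetical: convert one workbook and append it
        done.add(path)
        save_done(done, checkpoint)  # record after each file, not at the end
        loaded.append(path)
    return loaded
```

With `max_files` chosen so one run stays under the time limit, a pipeline could, for example, keep calling the same notebook in a loop until a run returns an empty list.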
u/seacess — 14 hours ago