I’ve worked professionally with data for 20 years. Today I am working with literally the worst data set I have ever come across. It’s published every week by CQC. Every other week there are formatting errors or typos in either the filename or the headers. Sometimes the order of the headers changes, and sometimes the set of headers that are included changes. It is virtually impossible to automate. For example: https://www.diffchecker.com/3Lhlr6SF/
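
To make this kind of drift visible, here is a minimal sketch in Python. It compares the header rows of two weekly files and reports columns that were added, removed, probably renamed (for example through a typo), or reordered. The column names, the sample rows, and the helper names `read_header` and `header_drift` are all made up for illustration; none of them come from the actual CQC files.

```python
import csv
import difflib
import io


def read_header(csv_text):
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def header_drift(old, new, cutoff=0.8):
    """Describe how the header row changed between two releases."""
    removed = [h for h in old if h not in new]
    added = [h for h in new if h not in old]

    # Pair each removed name with a near-identical added name,
    # which usually signals a typo or a small rename.
    renamed = {}
    for name in removed:
        match = difflib.get_close_matches(name, added, n=1, cutoff=cutoff)
        if match:
            renamed[name] = match[0]

    # Columns present in both files, compared in their relative order.
    shared_old = [h for h in old if h in new]
    shared_new = [h for h in new if h in old]

    return {
        "added": [h for h in added if h not in renamed.values()],
        "removed": [h for h in removed if h not in renamed],
        "probably_renamed": renamed,
        "reordered": shared_old != shared_new,
    }


# Hypothetical sample data, not the real CQC headers.
last_week = "Location ID,Location Name,Postcode,Rating\n1-1,Example,AB1 2CD,Good\n"
this_week = "Postcode,Location ID,Loaction Name,Rating,Region\nAB1 2CD,1-1,Example,Good,North\n"

report = header_drift(read_header(last_week), read_header(this_week))
print(report)
```

A check like this does not fix the files, but it at least flags what changed before a pipeline silently loads the wrong columns. The `cutoff` of 0.8 is an arbitrary starting point and would need tuning against real header names.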