Experienced Developer Needed for PDF Scraping and Upload to Azure
Upwork

Remote
9 hours ago
About
We are seeking an experienced developer to scrape approximately 377,000 PDF ombudsman decisions from the UK Financial Ombudsman Service and upload them to Azure Blob Storage. The task involves crawling the result pages, downloading the PDFs, and extracting each PDF's existing metadata to produce a detailed Excel report of the export. The metadata already exists in the PDF document properties and should be read from there, not inferred.

The Excel report should contain:
1) File name
2) Source URL
3) Blob path
4) File size
5) SHA256 hash
6) Download status
7) Download time
8) Embedded metadata properties:
   a. Identifier
   b. Title
   c. Author
   d. Creator
   e. Producer
   f. CreationDate
   g. ModDate
   h. Subject
   i. Description
   j. Business
   k. IndustrySector
   l. ProductType
   m. ComplaintIssue
   n. Outcome
   o. DecisionDate
   p. ComplainantType

It is important that the scraping is done ethically and appropriately, using a design that does not trigger blocking or overwhelm or degrade the service.
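As a rough illustration of the per-file bookkeeping and the throttling requirement, the Python sketch below shows a minimum-interval rate limiter and a function that assembles one report row with the fields listed above. It is a sketch under stated assumptions, not a specification: the helper names (`PoliteRateLimiter`, `build_report_row`), the blob-path layout, and the assumption that the embedded properties arrive as a dict (for example from pypdf's `PdfReader(...).metadata`, whose keys carry a leading `/`, and assuming the custom fields sit in the PDF Info dictionary) are all hypothetical. Properties missing from a file are left blank rather than guessed, in line with the requirement that metadata is read, not inferred.

```python
import hashlib
import posixpath
import time
from datetime import datetime, timezone
from typing import Dict, Mapping, Optional
from urllib.parse import urlparse

# Embedded PDF properties listed in the job description (items 8a-8p).
METADATA_FIELDS = [
    "Identifier", "Title", "Author", "Creator", "Producer",
    "CreationDate", "ModDate", "Subject", "Description", "Business",
    "IndustrySector", "ProductType", "ComplaintIssue", "Outcome",
    "DecisionDate", "ComplainantType",
]


class PoliteRateLimiter:
    """Hypothetical helper: enforces a minimum gap between successive requests."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep until at least min_interval_s has passed since the previous call."""
        if self._last is not None:
            remaining = self._last + self.min_interval_s - time.monotonic()
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def build_report_row(
    pdf_bytes: bytes,
    source_url: str,
    blob_prefix: str,
    embedded: Mapping[str, object],
    status: str = "OK",
    downloaded_at: Optional[datetime] = None,
) -> Dict[str, object]:
    """Hypothetical helper: assemble one report row for a downloaded PDF."""
    # Assumes the URL path ends in the PDF file name (query string ignored).
    file_name = posixpath.basename(urlparse(source_url).path)
    when = downloaded_at or datetime.now(timezone.utc)
    row: Dict[str, object] = {
        "File name": file_name,
        "Source URL": source_url,
        "Blob path": f"{blob_prefix}/{file_name}",  # assumed container layout
        "File size": len(pdf_bytes),                 # in bytes
        "SHA256 hash": hashlib.sha256(pdf_bytes).hexdigest(),
        "Download status": status,
        "Download time": when.isoformat(),
    }
    for name in METADATA_FIELDS:
        # Copy each property verbatim, accepting keys with or without the
        # leading "/" used in PDF Info dictionaries; missing ones stay blank.
        value = embedded.get(name, embedded.get("/" + name, ""))
        row[name] = "" if value is None else str(value)
    return row
```

In a fuller pipeline, the crawler would call `limiter.wait()` before every request (for example with `PoliteRateLimiter(2.0)`, an arbitrary placeholder interval), upload each file with the Azure SDK (`BlobClient.upload_blob(data, overwrite=True)` in `azure-storage-blob`), and write the collected rows to Excel with a library such as openpyxl or pandas `DataFrame.to_excel`. A single-threaded crawler with a fixed minimum interval is the simplest way to keep the load on the service predictable.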




