Anonymous ID: 10f409 Feb. 13, 2019, 12:25 p.m. No.5158844   >>8866 >>8947 >>9427

>>5158195 (lb)

>>5158120 (lb)

 

Regarding the DWS link not resolving… looped over all entries that matched "wasserman" and issued HEAD requests. Here are the results.

 

Any pattern? Honest question, I'm more concerned with getting everything into a flat CSV atm.

 

http://clerk.house.gov/public_disc/financial-pdfs/2008/8138631.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2009/8143095.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2010/8147726.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2011/8205095.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2011/8202785.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2012/8210420.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2012/8205716.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2013/9103053.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2013/10001512.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2014/10006472.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2014/30001021.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2015/10010309.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2015/20000883.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2015/20003402.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2015/20003667.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2015/20003704.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2015/30001821.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2016/10015737.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2016/20004858.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2016/20005420.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2016/20006076.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2016/20006428.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2016/30002876.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2017/10023759.pdf =HTTP/1.1 200 OK

http://clerk.house.gov/public_disc/financial-pdfs/2017/20006521.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20006593.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20006753.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20006950.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20006986.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20006999.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20007425.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20007584.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20007716.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20008273.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20008277.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20008298.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/20008457.pdf =HTTP/1.1 404 Not Found

http://clerk.house.gov/public_disc/financial-pdfs/2017/9113171.pdf =HTTP/1.1 200 OK
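
For anyone wanting to reproduce the check, here is a minimal sketch of the HEAD-request loop using only the Python standard library. The function names `head_status` and `check_links` are made up for illustration, and the "HTTP/1.1" in the output is hardcoded just to mimic the format of the list above; the original tooling is not shown in the thread and may well have been different (curl, for instance).

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return (status_code, reason) for a HEAD request, so no PDF body is downloaded."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.reason
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the exception still carries the status.
        return err.code, err.reason


def check_links(urls, fetch=head_status):
    """Yield one 'URL =HTTP/1.1 <code> <reason>' line per URL.

    The protocol version is hardcoded (assumption) to match the list format;
    `fetch` can be swapped for a stub when testing offline.
    """
    for url in urls:
        code, reason = fetch(url)
        yield f"{url} =HTTP/1.1 {code} {reason}"
```

Note that `urlopen` follows redirects, so a redirected link would report the status of the final target rather than the 3xx itself.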

Anonymous ID: 10f409 Feb. 13, 2019, 12:28 p.m. No.5158866   >>8947 >>9427

>>5158844

 

Over 17,000 entries were processed the same exact way to generate these direct links to the US Rep financial PDFs (from the raw .txt in the .ZIPs).
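
A rough sketch of that processing step, for anons who want to follow along. It assumes the .txt inside each yearly .ZIP is tab-delimited with a header row containing `Last`, `Year` and `DocID` columns, and that it decodes as UTF-8 (possibly with a BOM); none of that is confirmed in this thread, so adjust to whatever the real file looks like. The helper names are hypothetical.

```python
import csv
import io
import zipfile

# Path prefix taken from the links posted in this thread.
BASE = "http://clerk.house.gov/public_disc/financial-pdfs"


def read_index(zip_path):
    """Yield one dict per data row from every .txt file inside the ZIP."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".txt"):
                continue
            with zf.open(name) as raw:
                # Assumption: tab-delimited text with a header line.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                yield from csv.DictReader(text, delimiter="\t")


def direct_link(row):
    """Build the direct PDF link from the assumed Year and DocID columns."""
    return f"{BASE}/{row['Year']}/{row['DocID']}.pdf"


def links_for(rows, needle):
    """Return links for rows whose last name contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [direct_link(r) for r in rows if needle in (r.get("Last") or "").lower()]
```

Every row gets the same `financial-pdfs` path here, which mirrors "processed the same exact way".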

 

So I find it odd that some links from the same year (2017) return HTTP 200 and others return HTTP 404.
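
One cheap way to hunt for a pattern is to tally the status codes by the leading digit(s) of the DocID. The sketch below parses lines in the "URL =HTTP/1.1 code reason" format used in the results post; grouping by DocID prefix is only a guess at a useful key, and the function name is made up.

```python
import re
from collections import Counter

# Matches e.g. ".../2015/20000883.pdf =HTTP/1.1 404 Not Found"
LINE_RE = re.compile(
    r"^\S+/(?P<year>\d{4})/(?P<doc_id>\d+)\.pdf\s+=HTTP/\d\.\d\s+(?P<code>\d{3})"
)


def tally_by_prefix(lines, prefix_len=1):
    """Count (DocID prefix, status code) pairs; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        counts[(m["doc_id"][:prefix_len], int(m["code"]))] += 1
    return counts
```

Eyeballing the posted results the same way: every link that returned 404 has a DocID starting with 2, and none of the links that returned 200 do.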

Anonymous ID: 10f409 Feb. 13, 2019, 12:38 p.m. No.5158947   >>9427

>>5158844

>>5158866

 

Spotted a pattern: the URL changes based on the type of filing.

 

Bear with me anons… need to figure out the mapping from the "FilingType" column to the URL slug.

 

General URL form seems to be:

 

http://clerk.house.gov/public_disc/<map_FilingType_to_slug>/<Year>/<DocID>.pdf
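
A sketch of how that general form could be filled in once the mapping is known. The mapping below is a placeholder: only the `financial-pdfs` slug appears in this thread, and the key `"EXAMPLE_TYPE"` is hypothetical, not a real FilingType value. Unknown filing types return `None` rather than guessing a slug.

```python
BASE = "http://clerk.house.gov/public_disc"

# Placeholder mapping (assumption). The real FilingType values and any slugs
# other than "financial-pdfs" still have to be worked out.
SLUG_BY_FILING_TYPE = {
    "EXAMPLE_TYPE": "financial-pdfs",
}


def build_url(filing_type, year, doc_id, slug_map=SLUG_BY_FILING_TYPE):
    """Return <BASE>/<slug>/<Year>/<DocID>.pdf, or None if the slug is not known yet."""
    slug = slug_map.get(filing_type)
    if slug is None:
        return None
    return f"{BASE}/{slug}/{year}/{doc_id}.pdf"
```

Returning `None` makes it easy to list which FilingType values are still unmapped before re-running the link check.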

Anonymous ID: 10f409 Feb. 13, 2019, 1:20 p.m. No.5159427   >>9445 >>9450 >>9462

>>5158844

>>5158866

>>5158947

 

>>5157325 A Call to Codefags/Shovels

 

Any other anons looking at this stuff?

 

Could use a second set of eyes, been awake a long time at this point… is there something to this or do I just desperately need sleep, kek.

 

Seeing discrepancies between the US House Reps Financial Disclosures search page (http://clerk.house.gov/public_disc/financial-search.aspx) and the full-year .ZIPs they provide.

 

 

Take Raul Ruiz for example (randomly sampled):

 

  • the search returns 2 records for 2016 (ruiz-a.png), yet the .ZIP contents say there should be 3 (ruiz-b.png)

 

  • the search returns 18 records for last name Ruiz in California without a year specified (ruiz-b.png), yet the .ZIP contents say there should be 21 (ruiz-d.png)
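
To make this kind of comparison repeatable for other names, something like the sketch below could count rows on the .ZIP side. It assumes each index row is a dict with `Last`, `StateDst` and `Year` keys, that `StateDst` starts with the two-letter state code, and that an exact last-name match is the right comparison; the search page may match names differently, so treat any mismatch as a lead, not a conclusion. Function names are made up.

```python
def count_filings(rows, last, state=None, year=None):
    """Count index rows for a last name, optionally narrowed by state and year."""
    wanted = last.strip().lower()
    total = 0
    for row in rows:
        if (row.get("Last") or "").strip().lower() != wanted:
            continue
        # Assumption: StateDst begins with the state abbreviation.
        if state and not (row.get("StateDst") or "").upper().startswith(state.upper()):
            continue
        if year is not None and str(row.get("Year", "")).strip() != str(year):
            continue
        total += 1
    return total


def shortfall(zip_count, search_count):
    """Number of records in the ZIP index that the search page did not return."""
    return zip_count - search_count
```

With the counts quoted above, `shortfall(3, 2)` gives 1 for 2016 and `shortfall(21, 18)` gives 3 for the search with no year.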