@CommonCrawl Regarding twitter.com/tomayac/status…, I found this: groups.google.com/d/msg/common-c…. Could you update the examples in github.com/commoncrawl/co…?
Some @CommonCrawl files in s3n://aws-publicdatasets/common-crawl/parse-output/segment/*/*.arc.gz return 403 Forbidden. Just me? CC: @hfmuehleisen