The Open Syllabus collection contains WARC files from a mid-2021 crawl of about 50 million unique seed URLs, extracted from version 2.6 of the Open Syllabus dataset, together with their page requisites. Most of the seed URLs are in the ".com", ".org", ".edu", and ".uk" TLDs.
Crawl Summary
- Crawl start: 2021-04-12
- Crawl end: 2021-09-05
- Seed URLs: 49,735,419
- Archived URLs: 338,690,414
- Collection Size: 25 TB
- Crawler: Heritrix/3.3.0-hq1-SNAPSHOT-2015-03-16T18:09:23Z
- Crawl depth: maxHops=0
Seed Summary
- Unique URLs: 49,735,419
- Unique Canonical URLs: 48,956,395
- Unique Hosts: 984,223
- IPv4 Addresses: 3,328
- Unique TLDs: 21,761
- Unique IANA Valid TLDs: 739
- Wayback Machine URLs*: 6,568,213
* NOTE: More than 13% of the seed URLs in the dataset (6,568,213 of 49,735,419) point to the Wayback Machine!
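The percentage in the note can be rechecked directly from the Seed Summary counts. The sketch below is a minimal Python illustration that assumes the listed figures are exact; the constant names and the `share` helper are hypothetical and not part of any Open Syllabus or Internet Archive tooling.

```python
# Seed counts copied from the Seed Summary (assumed exact).
UNIQUE_URLS = 49_735_419
UNIQUE_CANONICAL_URLS = 48_956_395
WAYBACK_URLS = 6_568_213


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Fraction of seeds that point at the Wayback Machine.
wayback_pct = share(WAYBACK_URLS, UNIQUE_URLS)

# Seeds that become duplicates once URLs are canonicalized.
canonical_dupes = UNIQUE_URLS - UNIQUE_CANONICAL_URLS

print(f"Wayback Machine seeds: {wayback_pct:.2f}%")  # Wayback Machine seeds: 13.21%
print(f"Seeds collapsed by canonicalization: {canonical_dupes:,}")  # 779,024
```

The result, about 13.2%, is consistent with the "more than 13%" figure; the same subtraction shows that canonicalization collapses roughly 1.6% of the unique seed URLs.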
The Wayback Machine - https://webcf.waybackmachine.org/web/20210413042724/http://opendatahandbook.org/
Open Data Handbook
Guides, case studies and resources for government & civil society on the "what, why & how" of open data.
Open Data Guide
This guide discusses the legal, social and technical aspects of open data. It can be used by anyone but is especially designed for those seeking to open up data. It discusses why to go open, what open is, and how to 'open' data.
Value Stories
Use cases, stories and case studies highlighting the social and economic value, the impact and the varied applications of open data from cities and countries across the globe.
Resource Library
A curated collection of open data resources, including articles, longer publications, how-to guides, presentations and videos, produced by the global open data community.