IRC Quotes
Revision as of 02:54, 5 December 2017
What's this, then?
Auguste, BlueMax and Dr-Spangle are currently scraping IRC quote databases (e.g. Bash.org). If you can help out or suggest other quote databases to scrape, please join them in #bashup.
Project Hosting
Auguste is currently hosting scrapes at http://www.deaddyingdamned.com/archives/. Everybody is encouraged to help mirror.
QdbScraper
QdbScraper (https://github.com/dHeinemann/QdbScraper) is a Perl script for scraping QdbS-powered quote databases (e.g. Auguste's ArchiveTeam QDB). At the moment, it can only scrape QdbS installations that use the default template or only minimally customize it. Support for other common templates is coming soon.
Helping Out
Scraping doesn't take a lot of work; the QDBs are all more or less the same. You only need to write one script, then make a few changes to adapt it to any other QDB you want to scrape. The actual scraping process should easily take under 10 minutes.
If you do want to help with the scraping, please follow the existing scrape format:
- Each quote has its own file
- Each file is named 'n.txt', where 'n' is the quote's ID number
- All quotes should be compressed into an archive
- The archive name should identify the original location (URL) and date of scraping (e.g. 'QuoteIRC.com Quote Collection 2011-04-04.7z', or 'DOMAIN.TLD Quote Collection YYYY-MM-DD.EXT').
  - If the original location (URL) has subdirectories (e.g. 'Foobar.com/baz'), replace forward slashes with hyphens: 'Foobar.com-baz'.
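The format rules above can be sketched in Python roughly as follows. This is only an illustration, not the script anyone on the project actually used: the function names `archive_name` and `write_quotes` are made up for this example, and stripping the URL scheme and a leading 'www.' is an assumption based on the 'QuoteIRC.com' example name. Compressing the resulting directory (e.g. with 7-Zip) is left out.

```python
import re
from datetime import date
from pathlib import Path


def archive_name(url: str, scraped_on: date, ext: str = "7z") -> str:
    """Build 'DOMAIN.TLD Quote Collection YYYY-MM-DD.EXT' from a site URL."""
    # Assumption: drop the scheme and a leading 'www.', then trim slashes.
    location = re.sub(r"^[a-z]+://", "", url.strip(), flags=re.IGNORECASE)
    location = re.sub(r"^www\.", "", location, flags=re.IGNORECASE).strip("/")
    # Subdirectories: replace forward slashes with hyphens.
    location = location.replace("/", "-")
    return f"{location} Quote Collection {scraped_on.isoformat()}.{ext}"


def write_quotes(quotes: dict[int, str], out_dir: Path) -> list[Path]:
    """Write each quote to its own 'n.txt' file, where n is the quote ID."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for quote_id, text in sorted(quotes.items()):
        path = out_dir / f"{quote_id}.txt"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

For example, `archive_name("http://Foobar.com/baz", date(2011, 4, 4))` gives 'Foobar.com-baz Quote Collection 2011-04-04.7z'. The capitalisation of the domain is kept exactly as passed in.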
Tips
- Scrape from the browse page (e.g. http://bash.org/?browse). This way you can scrape 10-50 quotes per page request, rather than cycling through thousands of individual quote pages.
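As a rough sketch of why the browse page helps, the Python below pulls every quote out of one already-downloaded browse page in a single pass. The HTML structure it expects (a `<p class="quote">` header containing `<b>#ID</b>`, followed by a `<p class="qt">` body) is an assumption made for illustration, and `parse_browse_page` is a hypothetical name; each QDB's real markup has to be checked and the pattern adapted. Fetching the pages is not shown.

```python
import html
import re

# Hypothetical markup: each quote is assumed to look like
#   <p class="quote">...<b>#ID</b>...</p><p class="qt">TEXT</p>
QUOTE_RE = re.compile(
    r'<p class="quote">.*?<b>#(\d+)</b>.*?</p>\s*<p class="qt">(.*?)</p>',
    re.DOTALL,
)


def parse_browse_page(page_html: str) -> dict[int, str]:
    """Return {quote ID: quote text} for every quote found on one page."""
    quotes = {}
    for quote_id, body in QUOTE_RE.findall(page_html):
        # Turn <br> tags (and the newline that usually follows) into newlines.
        text = re.sub(r"<br\s*/?>\r?\n?", "\n", body)
        # Decode entities such as &lt;nick&gt; back into plain text.
        quotes[int(quote_id)] = html.unescape(text).strip()
    return quotes
```

The resulting dictionary maps quote IDs to text, which is the shape needed to write one 'n.txt' file per quote.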
Project Status
Database | Has been scraped | Scraper | Notes |
---|---|---|---|
Bash.org | Yes | Dr-Spangle | The quote database that pretty much created all others. |
BombLol.net/ornot | Yes | Auguste | Generic QdbS QDB |
DeadDyingDamned.com/qdb/ | No | | The unofficial ArchiveTeam QDB. I'll have the server automatically save these somewhere. --Auguste 13:36, 7 April 2011 (UTC) |
DeanyDerkheiser.net/qdb | Yes | Auguste | Generic QdbS QDB |
FreqBase.com/qdb | Yes | Auguste | Generic QdbS QDB |
Frostfall-Guild.com/ff/qdb | Yes | Auguste | Generic QdbS QDB |
german-bash.org | Yes (http://cornelia.regengedanken.de/~mdrueing/files/german-bash.org%20Quote%20Collection%202011-04-08.7z) | Darkstar | German version of bash.org |
I-Rox.com | Yes | Auguste | |
ibash.de | Yes (http://cornelia.regengedanken.de/~mdrueing/files/ibash.de%20Quote%20Collection%202011-04-09.7z) | Darkstar | Another German quotes DB |
JDL.Host.HK-DIY.net/quote | Yes | Auguste | Generic QdbS QDB |
LinuxCult.org/quotes | Yes | Auguste | Generic QdbS QDB |
LolImBanned.com | Yes | Auguste | Generic QdbS QDB |
Mandaliet.com/furcqdb/ | Yes | Auguste | The Furcadia quote database |
MoarPupr.com/quotes | Yes | Auguste | Generic QdbS QDB |
NotSafeForSanity.com/quotes | Yes | Auguste | Generic QdbS QDB |
Pilkipedia.co.uk/qdb | Yes | Auguste | Generic QdbS QDB |
QDB.Honk-Honk.org | Yes | Auguste | Generic QdbS QDB |
QDB.MIT.edu | Yes | Auguste | The MIT quote database |
QDB.PesterChum.net | Yes | Auguste | Generic QdbS QDB |
QDB.us | Yes | Auguste | |
QDBS.ChanOps.org | Yes | Auguste | Generic QdbS QDB |
QuoteIRC.com | Yes | Auguste | |
Quotes.BurntElectrons.org | Yes | Auguste | The IRC.Mozilla.org quote database |
WarpDrive.se | Yes | Auguste | Quotes are in Swedish |
WQDB.org | Yes | Auguste | The Worms quote database |
xkcdb.com | Yes | Auguste | The xkcd quote database |