{{Infobox project
| title = 500px
| image = 500pxdotcom screenshot.png
| description = High-quality photo sharing & selling site
| URL = {{url|1=http://www.500px.com}}
| project_status = {{endangered}}
| archiving_status = {{notsavedyet}}
| irc = 500pieces
}}
'''500px''' is a photo sharing site that caters to high-quality photography. It provides ways for photographers to sell their images, as well as a large collection of images to view. On June 30th, 2018, they are removing all Creative Commons images from the site (see https://support.500px.com/hc/en-us/articles/360005097533).
{{Navigation box}}

== Archival ==
My method of getting the API info: using Burp Suite Pro, I set up a MITM proxy between a VM (with a custom SSL CA certificate installed) and the server. After intercepting a request to api.500px.com, I cloned the request and sent it to the "Intruder" tool, where I set the page number in the GET request as a payload and had it auto-increment while processing the requests and saving the responses. I set the limit to 1000, although I ended up stopping at around page 900 because the responses were coming back empty (and there's a total-pages number in the API info). I 7-zipped all of the responses and threw them up on the IA for anyone who wants to have a go at them, because after writing this I'm heading to bed.
All API info for Attribution License 3.0 photos: https://archive.org/details/AttributionLicense3APISeverResponses.7z
Example of one of the responses: https://pastebin.com/TygNSTSu
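
For anyone who wants to repeat the page walk without Burp Suite, below is a rough Python sketch of the same idea: increment the page number, save each JSON response to disk, and stop once pages come back empty or the reported total-pages count is reached. The endpoint path, query parameters, and any required consumer key shown here are assumptions for illustration, not details from the capture; copy them from an intercepted request instead.

<syntaxhighlight lang="python">
import json
import time
import requests

# Hypothetical endpoint and parameters. The real URL, query string, and any
# consumer key were taken from an intercepted request in the original run
# and are not reproduced here; adjust these to match what you capture.
API_URL = "https://api.500px.com/v1/photos/search"
PARAMS = {
    "license_type": "4",   # assumption: filter for Attribution License 3.0
    "rpp": 100,            # assumption: results per page
}
MAX_PAGES = 1000           # same upper bound used in the Intruder run


def dump_api_pages(out_prefix="response"):
    """Request successive pages and save each JSON response to disk,
    stopping once a page comes back empty (as happened around page 900)."""
    for page in range(1, MAX_PAGES + 1):
        resp = requests.get(API_URL, params={**PARAMS, "page": page}, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        if not data.get("photos"):          # empty page: we are past the end
            print(f"Page {page} is empty; stopping.")
            break
        with open(f"{out_prefix}_{page:04d}.json", "w") as f:
            json.dump(data, f)
        # The API info reports a total-pages number, so we can also stop early.
        if page >= data.get("total_pages", MAX_PAGES):
            break
        time.sleep(1)                       # be gentle with the server


if __name__ == "__main__":
    dump_api_pages()
</syntaxhighlight>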
 
I also had a go at writing a Python script that, given a list of URLs, parses and downloads all of the metadata and photos from those URLs: https://github.com/adinbied/500pxBU
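
The repository above is the actual script; the snippet below is only a minimal sketch of that workflow (read a URL list, save each photo's metadata, then fetch the image file). The JSON response shape and the <code>image_url</code> field are assumptions made for illustration, not confirmed details of the script or of 500px.

<syntaxhighlight lang="python">
import json
import os
import sys
import requests


def archive_photo(url, out_dir="photos"):
    """Fetch one photo's metadata and image file.
    This is not the 500pxBU script, just an illustration of the same flow.
    It assumes each input URL returns JSON containing an 'image_url' field,
    which may not match the real site or API structure."""
    os.makedirs(out_dir, exist_ok=True)
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    meta = resp.json()
    photo_id = meta.get("id", os.path.basename(url))
    # Save the metadata alongside the image so nothing is lost.
    with open(os.path.join(out_dir, f"{photo_id}.json"), "w") as f:
        json.dump(meta, f)
    image_url = meta.get("image_url")
    if image_url:
        img = requests.get(image_url, timeout=60)
        img.raise_for_status()
        with open(os.path.join(out_dir, f"{photo_id}.jpg"), "wb") as f:
            f.write(img.content)


if __name__ == "__main__":
    # Usage: python archive_photos.py urls.txt  (one URL per line)
    with open(sys.argv[1]) as url_list:
        for line in url_list:
            if line.strip():
                archive_photo(line.strip())
</syntaxhighlight>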
 
Hopefully someone can pick up where I left off using what I've posted - I should be back around 3PM UTC on 6/29/18.
 
~adinbied
