bandcamp scraping info
note: as of writing, this is very much unfinished. it’s meant to be a personal note more than anything.
bandcamp provides a lot of data on the basic html pages, outside of its official api, and it can usually get you a long way without needing the api. this data is used in free-bandcamp-downloader, a python script that I helped work on.
artist/label pages
release data
ol#music-grid contains the individual releases on the page. depending on the number of releases, they will actually be split between direct <li> children as html and html-escaped json in the data-client-items attribute on the <ol>.
each <li> child is structured as follows:
// release_type = "album" | "track"
<li data-item-id="{release_type}-{release_id}" data-band-id="{artist_id}">
<a href="{release_link}">
<div class="art">
<img src="{cover_url}" alt=""/>
</div>
<p class="title">
{release_title}
<br/>
<span class="artist-override">
{artist_name}
</span>
</p>
</a>
</li>
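as a rough sketch, the <li> structure above can be read with only python’s standard library. the class and function names here are made up for illustration, and it assumes every release <li> carries both data-item-id and data-band-id with numeric ids:

```python
from html.parser import HTMLParser


class MusicGridParser(HTMLParser):
    """Collects one dict per <li data-item-id=...> in the fed html."""

    def __init__(self):
        super().__init__()
        self.releases = []
        self._current = None  # release dict currently being filled in
        self._field = None    # "title" or "artist" while inside those tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "data-item-id" in attrs:
            # data-item-id looks like "album-123" or "track-456"
            release_type, _, release_id = attrs["data-item-id"].partition("-")
            self._current = {
                "type": release_type,
                "id": int(release_id),
                "band_id": int(attrs["data-band-id"]),
                "title": "",
                "artist": "",
            }
        elif self._current is None:
            return
        elif tag == "a" and "href" in attrs:
            self._current["page_url"] = attrs["href"]
        elif tag == "img":
            self._current["cover_url"] = attrs.get("src")
        elif tag == "p" and "title" in classes:
            self._field = "title"
        elif tag == "span" and "artist-override" in classes:
            self._field = "artist"

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag in ("span", "p"):
            self._field = None
        elif tag == "li":
            self._current["title"] = self._current["title"].strip()
            self._current["artist"] = self._current["artist"].strip()
            self.releases.append(self._current)
            self._current = None


def parse_music_grid_items(page_html):
    """Hypothetical helper: returns the releases found as <li> elements."""
    parser = MusicGridParser()
    parser.feed(page_html)
    return parser.releases
```

this only covers releases rendered as html; anything stored in data-client-items has to be read separately.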
for the data-client-items attribute, you first need to unescape the html entities. the resulting JSON is structured as follows:
[
{
"art_id": (int, cover art id)
"artist": (str, as-printed)
"band_id": (int)
"id": (int)
"page_url": (str)
"title": (str)
"type": ("album" | "track")
"filtered": (bool)
}
]
filtered only exists when a filter is applied (e.g. this). page_url will be a relative URL of the form /album/album-link if the release is on the same band page, or a full URL if not.
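a minimal sketch of decoding the attribute and normalising page_url, assuming you pulled the raw (still escaped) attribute value out of the page source. parse_client_items and band_url are hypothetical names, not anything bandcamp defines:

```python
import html
import json
from urllib.parse import urljoin


def parse_client_items(raw_attr, band_url):
    """Decode a raw data-client-items value into a list of release dicts.

    raw_attr is the attribute text exactly as it appears in the page source.
    band_url is the base URL of the band page the attribute came from.
    """
    items = json.loads(html.unescape(raw_attr))
    for item in items:
        # relative links like /album/foo are resolved against the band page,
        # full URLs are returned unchanged by urljoin
        item["page_url"] = urljoin(band_url, item["page_url"])
    return items
```

if you read the attribute through an html parser instead, the value is most likely unescaped already, and the html.unescape step should be dropped.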
release page
release pages have two separate places where metadata lies: the inner text of head > script[type="application/ld+json"] and the data-tralbum attribute of head > script[data-tralbum] (i.e. the only script where data-tralbum exists). I’ll refer to the former as “head data” and the latter as “tralbum data”.
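here’s one way both blobs could be pulled out of a release page with the standard library alone. this is a sketch, not how free-bandcamp-downloader necessarily does it; the class and function names are invented, and it doesn’t check that the scripts actually sit inside <head>:

```python
import json
from html.parser import HTMLParser


class ReleasePageParser(HTMLParser):
    """Grabs the ld+json script body and the data-tralbum attribute."""

    def __init__(self):
        super().__init__()
        self.head_data = None
        self.tralbum_data = None
        self._in_ld_json = False
        self._ld_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attrs = dict(attrs)
        if attrs.get("type") == "application/ld+json":
            self._in_ld_json = True
        if attrs.get("data-tralbum") and self.tralbum_data is None:
            # HTMLParser has already unescaped the attribute value here
            self.tralbum_data = json.loads(attrs["data-tralbum"])

    def handle_data(self, data):
        if self._in_ld_json:
            self._ld_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self._in_ld_json = False
            if self.head_data is None:
                self.head_data = json.loads("".join(self._ld_chunks))
            self._ld_chunks = []


def extract_release_data(page_html):
    """Hypothetical helper: returns (head_data, tralbum_data)."""
    parser = ReleasePageParser()
    parser.feed(page_html)
    return parser.head_data, parser.tralbum_data
```

either value comes back as None if the corresponding script isn’t found on the page.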
head data
head data follows the MusicAlbum schema for /album/ links and the MusicRecording schema for /track/ links.
there are a few additionalProperty types as well:
| Name | Type | Description |
|---|---|---|
| track_id | int | ID of the track |
| art_id | int | ID of the cover art |
| license_name | string | bandcamp-specific license type (todo: what are all the options?) |
some miscellaneous notes on this:
- bandcamp seems to prefer the artist’s custom domain if they have one as opposed to using the “canonical” bandcamp url.
- inAlbum gets filled out even for standalone /track/ releases, and it’s where the offers for the track lie
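to make the additionalProperty entries easier to use, they can be flattened into a plain dict. this sketch assumes each entry is a schema.org-style PropertyValue object with name and value keys (the value key is my assumption, it isn’t confirmed above), and the function name is hypothetical:

```python
def additional_properties(node):
    """Flatten node["additionalProperty"] into a {name: value} dict.

    node is any object from the head data (the release itself, a track, ...).
    Entries without a "name" key are skipped.
    """
    props = node.get("additionalProperty") or []
    if isinstance(props, dict):
        # JSON-LD allows a single object where a list is expected
        props = [props]
    return {p["name"]: p.get("value") for p in props if "name" in p}
```

for example, additional_properties(some_node).get("art_id") gives the cover art ID when that property is present, and None otherwise.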
tralbum data
tralbum data doesn’t have any pre-defined schema, so I’ll format it like one:
Root
| Property | Type | Description |
|---|---|---|
| for the curious | string | Links to “I heard you can steal music on Bandcamp. What are you doing about this?” and “Bandcamp Terms of Use” |
| current | Current | Other information (todo: what determines this?) |
| preorder_count | ? | ? (Seems to always be null, even for preorder releases) |
| hasAudio | bool | Whether the release page has any tracks at all |
| art_id | int | ID of the cover art |
| packages | ?Package | Miscellaneous packages on the page that aren’t the main offer (physicals, etc.) |
| defaultPrice | float | Default price of the release as shown on the page. Seems to be 9.0 if it’s NYP for some reason |
| freeDownloadPage | ?URL | Link to the free download page of the release. null if release is paid or requires e-mail to download |
| FREE | int | ? (Seems to always be 1?) |
| PAID | int | ? (Seems to always be 2?) |
| artist | string | Artist as written |
| item_type | ItemType | Type of release |
| id | int | ID of release |
| last_subscription_item | ? | ? |
| has_discounts | bool | Whether you can apply a discount code to the release. |
| is_bonus | ?bool | ? |
| play_cap_data | TODO | How many times songs can be streamed before… something. |
| is_purchased | ?bool | If you purchased the release. null when not logged in. |
| items_purchased | TODO | TODO |
| is_private_stream | ?bool | ? |
| is_band_member | ?bool | ? |
| licensed_version_ids | ? | ? |
| package_associated_license_id | ? | ? |
| has_video | ?bool | Whether the page has a video or not. |
| tralbum_subscriber_only | bool | Whether the release is a subscriber-only release |
| ?featured_track_id | ?int | Featured track of the release. null when there’s no tracks in the album. Not present for track releases. |
| ?initial_track_num | ? | ? Not present for track releases. |
| ?is_preorder | bool | Whether the album is up for pre-order. Not present for track releases. |
| album_is_preorder | bool | Whether the album of the release is up for preorder. |
| album_release_date | string | Release date of the album. |
| trackinfo | ?[TrackInfo] | Array of track information |
| playing_from | string | “album page” or “track page” |
| url | URL | URL of the release. Will be the artist’s custom domain if they have one. |
| use_expando_lyrics | bool | ? |
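as a usage sketch, a few of the root properties can be boiled down into a small summary. summarize_tralbum is a hypothetical helper, and it only relies on the behaviour described in the table (nullable freeDownloadPage and trackinfo, is_preorder missing on track releases):

```python
def summarize_tralbum(tralbum):
    """Pick out the commonly useful bits of decoded tralbum data."""
    return {
        "id": tralbum["id"],
        "item_type": tralbum["item_type"],
        "artist": tralbum["artist"],
        "url": tralbum["url"],
        # None when the release is paid or needs an e-mail to download
        "free_download_page": tralbum.get("freeDownloadPage"),
        # is_preorder is absent on track releases, so fall back to the album flag
        "is_preorder": bool(
            tralbum.get("is_preorder") or tralbum.get("album_is_preorder")
        ),
        # trackinfo can be null, so treat that as zero tracks
        "track_count": len(tralbum.get("trackinfo") or []),
    }
```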
Current
TrackInfo
Package
ItemType
"album" | "track"






















