~yosh@unix.dog

bandcamp scraping info

note: as of writing, this is very much unfinished. it’s meant to be a personal note more than anything.

bandcamp provides a lot of data on the basic html pages, outside of its official api, and it can usually get you a long way without needing the api. this data is used in free-bandcamp-downloader, a python script that I helped work on.

artist/label pages

release data

ol#music-grid contains the individual releases on the page. depending on the number of releases, they may be split between direct <li> descendants as html and html-escaped json in the data-client-items attribute on the <ol>.

for <li> descendants, it’s structured as follows:

// release_type = "album" | "track"

<li data-item-id="{release_type}-{release_id}" data-band-id="{artist_id}">
  <a href="{release_link}">
    <div class="art">
      <img src="{cover_url}" alt=""/>
    </div>
    <p class="title">
      {release_title}
      <br/>
      <span class="artist-override">
        {artist_name}
      </span>
    </p>
  </a>
</li>
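
a minimal sketch of reading those <li> elements with python's stdlib html.parser. the MusicGridParser name, the output dict keys, and the sample markup are all made up for illustration, and it assumes the markup matches the structure above exactly (e.g. class="title" with no extra classes):

```python
from html.parser import HTMLParser


class MusicGridParser(HTMLParser):
    """Collects one dict per <li> release inside ol#music-grid."""

    def __init__(self):
        super().__init__()
        self.releases = []
        self._in_grid = False
        self._current = None  # release currently being built
        self._field = None    # "title" or "artist" while inside those tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ol" and attrs.get("id") == "music-grid":
            self._in_grid = True
        elif self._in_grid and tag == "li" and "data-item-id" in attrs:
            # data-item-id looks like "album-1234" or "track-1234"
            release_type, _, release_id = attrs["data-item-id"].partition("-")
            self._current = {
                "type": release_type,
                "id": int(release_id),
                "band_id": int(attrs["data-band-id"]),
                "title": "",
                "artist": "",
            }
        elif self._current is not None:
            if tag == "a":
                self._current["page_url"] = attrs.get("href")
            elif tag == "img":
                self._current["cover_url"] = attrs.get("src")
            elif tag == "p" and attrs.get("class") == "title":
                self._field = "title"
            elif tag == "span" and attrs.get("class") == "artist-override":
                self._field = "artist"

    def handle_data(self, data):
        # only text inside p.title / span.artist-override is kept
        if self._current is not None and self._field is not None:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None
        elif tag == "li" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self._current["artist"] = self._current["artist"].strip()
            self.releases.append(self._current)
            self._current = None
        elif tag == "ol":
            self._in_grid = False


# made-up sample markup following the structure above
SAMPLE_GRID = """<ol id="music-grid">
<li data-item-id="album-123" data-band-id="456">
  <a href="/album/example">
    <div class="art"><img src="https://example.com/cover.jpg" alt=""/></div>
    <p class="title">Example Album<br/>
      <span class="artist-override">Someone</span></p>
  </a>
</li>
</ol>"""

grid_parser = MusicGridParser()
grid_parser.feed(SAMPLE_GRID)
```

after feeding a page, grid_parser.releases holds the parsed releases. the href is stored as-is, so it may still be a relative link.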

for the data-client-items attribute, you first need to unescape the html entities. the resulting json is structured as follows:

[
 {
  "art_id": (int, cover art id)
  "artist": (str, as-printed)
  "band_id": (int)
  "id": (int)
  "page_url": (str)
  "title": (str)
  "type": ("album" | "track")
  "filtered": (bool)
 }
]

filtered only exists when a filter is applied (e.g. this). page_url will be a relative URL of the form /album/album-link if the release is on the same band page, or a full URL if not.
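
a rough sketch of decoding that attribute, assuming you have the raw (still escaped) attribute string and the url of the page it came from. parse_client_items and the sample values are made up for illustration. note that most html parsers (including python's html.parser) already unescape attribute values for you, in which case the html.unescape step should be skipped:

```python
import html
import json
from urllib.parse import urljoin


def parse_client_items(raw_attr: str, band_url: str) -> list[dict]:
    """Decode a raw data-client-items value into a list of release dicts.

    band_url is the URL of the page the attribute came from; it is used to
    turn relative page_url values into absolute ones.
    """
    items = json.loads(html.unescape(raw_attr))
    for item in items:
        # urljoin leaves already-absolute URLs untouched
        item["page_url"] = urljoin(band_url, item["page_url"])
    return items


# made-up sample value, escaped the way it would appear in the page source
SAMPLE_ATTR = (
    "[{&quot;id&quot;: 1, &quot;type&quot;: &quot;album&quot;, "
    "&quot;page_url&quot;: &quot;/album/foo&quot;}, "
    "{&quot;id&quot;: 2, &quot;type&quot;: &quot;track&quot;, "
    "&quot;page_url&quot;: &quot;https://other.example/track/bar&quot;}]"
)

client_items = parse_client_items(SAMPLE_ATTR, "https://artist.example/music")
```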

release page

release pages have two separate places where metadata lies: the inner text of head > script[type="application/ld+json"] and the data-tralbum attribute of head > script[data-tralbum] (i.e. the only script where data-tralbum exists). I’ll refer to the former as “head data” and the latter as “tralbum data”.
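
a minimal sketch of pulling both blobs out of a release page with the stdlib html.parser. ReleasePageParser and the sample page are made up for illustration. it doesn't check that the scripts are actually inside <head>, and if there were several ld+json scripts the last one would win:

```python
import json
from html.parser import HTMLParser


class ReleasePageParser(HTMLParser):
    """Extracts "head data" (ld+json) and "tralbum data" from a release page."""

    def __init__(self):
        super().__init__()
        self.head_data = None
        self.tralbum_data = None
        self._ld_json_chunks = None  # list while inside the ld+json script

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attrs = dict(attrs)
        if attrs.get("type") == "application/ld+json":
            self._ld_json_chunks = []
        if attrs.get("data-tralbum") is not None:
            # html.parser has already unescaped the attribute value
            self.tralbum_data = json.loads(attrs["data-tralbum"])

    def handle_data(self, data):
        if self._ld_json_chunks is not None:
            self._ld_json_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._ld_json_chunks is not None:
            self.head_data = json.loads("".join(self._ld_json_chunks))
            self._ld_json_chunks = None


# made-up sample page with just the two scripts of interest
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">{"@type": "MusicAlbum", "name": "Example"}</script>
<script src="x.js" data-tralbum="{&quot;id&quot;: 1, &quot;item_type&quot;: &quot;album&quot;}"></script>
</head><body></body></html>"""

page_parser = ReleasePageParser()
page_parser.feed(SAMPLE_PAGE)
```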

head data

head data follows the MusicAlbum schema for /album/ links and the MusicRecording schema for /track/ links.

there are a few additionalProperty types as well:

Name          Type    Description
track_id      int     ID of the track
art_id        int     ID of the cover art
license_name  string  bandcamp-specific license type (todo: what are all the options?)
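
a small sketch for reading those additionalProperty values. it assumes they're stored the way schema.org describes additionalProperty, i.e. as a list of PropertyValue objects with "name" and "value" keys; I haven't confirmed that here, so treat the shape (and the additional_properties name and sample data) as assumptions:

```python
def additional_properties(node: dict) -> dict:
    """Flatten a node's additionalProperty list into a {name: value} dict.

    Assumes each entry is a schema.org PropertyValue with "name" and "value".
    Returns an empty dict when the node has no additionalProperty at all.
    """
    return {
        prop["name"]: prop["value"]
        for prop in node.get("additionalProperty", [])
    }


# made-up sample node in the assumed shape
SAMPLE_NODE = {
    "@type": "MusicAlbum",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "art_id", "value": 123},
        {"@type": "PropertyValue", "name": "license_name", "value": "example"},
    ],
}

sample_props = additional_properties(SAMPLE_NODE)
```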

some miscellaneous notes on this:

  • bandcamp seems to prefer the artist’s custom domain if they have one as opposed to using the “canonical” bandcamp url.
  • inAlbum gets filled out even for standalone /track/ releases, and it’s where the offers for the track lie

tralbum data

tralbum data doesn’t have any pre-defined schema, so I’ll format it like one:

Root

Property                       Type          Description
"for the curious"              string        Links to “I heard you can steal music on Bandcamp. What are you doing about this?” and “Bandcamp Terms of Use”
current                        Current       Other information (todo: what determines this?)
preorder_count                 ?             ? (Seems to always be null, even for preorder releases)
hasAudio                       bool          Whether the release page has any tracks at all
art_id                         int           ID of the cover art
packages                       ?Package      Miscellaneous packages on the page that aren’t the main offer (physicals, etc.)
defaultPrice                   float         Default price of the release as shown on the page. Seems to be 9.0 if it’s NYP for some reason
freeDownloadPage               ?URL          Link to the free download page of the release. null if release is paid or requires e-mail to download
FREE                           int           ? (Seems to always be 1?)
PAID                           int           ? (Seems to always be 2?)
artist                         string        Artist as written
item_type                      ItemType      Type of release
id                             int           ID of release
last_subscription_item         ?             ?
has_discounts                  bool          Whether you can apply a discount code to the release.
is_bonus                       ?bool         ?
play_cap_data                  TODO          How many times songs can be streamed before… something.
is_purchased                   ?bool         If you purchased the release. null when not logged in.
items_purchased                TODO          TODO
is_private_stream              ?bool         ?
is_band_member                 ?bool         ?
licensed_version_ids           ?             ?
package_associated_license_id  ?             ?
has_video                      ?bool         Whether the page has a video or not.
tralbum_subscriber_only        bool          Whether the release is a subscriber-only release
?featured_track_id             ?int          Featured track of the release. null when there are no tracks in the album. Not present for track releases.
?initial_track_num             ?             ? Not present for track releases.
?is_preorder                   bool          Whether the album is up for pre-order. Not present for track releases.
album_is_preorder              bool          Whether the album of the release is up for preorder.
album_release_date             string        Release date of the album.
trackinfo                      ?[TrackInfo]  Array of track information
playing_from                   string        “album page” or “track page”
url                            URL           URL of the release. Will be the artist’s custom domain if they have one.
use_expando_lyrics             bool          ?
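
a rough sketch of boiling the root object down to the handful of fields a downloader would likely care about, based only on the table above. summarize_tralbum, its output keys, and the sample dict are made up for illustration; falling back to album_is_preorder when is_preorder is missing (track releases) is my own guess at a sensible default:

```python
def summarize_tralbum(tralbum: dict) -> dict:
    """Pick out a few root-level tralbum fields into a flat summary."""
    return {
        "id": tralbum["id"],
        "type": tralbum["item_type"],  # "album" or "track"
        "artist": tralbum["artist"],
        "url": tralbum["url"],
        # trackinfo may be null, so treat that as "no tracks"
        "track_count": len(tralbum.get("trackinfo") or []),
        # is_preorder is not present for track releases
        "is_preorder": tralbum.get(
            "is_preorder", tralbum.get("album_is_preorder", False)
        ),
        # null when the release is paid or requires an e-mail to download
        "free_download_page": tralbum.get("freeDownloadPage"),
    }


# made-up sample of a track release
SAMPLE_TRALBUM = {
    "id": 7,
    "item_type": "track",
    "artist": "X",
    "url": "https://x.example/track/y",
    "trackinfo": [{}],
    "album_is_preorder": False,
    "freeDownloadPage": None,
}

tralbum_summary = summarize_tralbum(SAMPLE_TRALBUM)
```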

Current

TrackInfo

Package

ItemType

"album" | "track"
