lastfm_archive v0.7.2 LastfmArchive

lastfm_archive is a tool for creating a local Last.fm scrobble file archive and a Solr archive, and for running analytics on them.

The software is currently experimental and in preliminary development. It should eventually provide capability to perform ETL and analytic tasks on Lastfm scrobble data.

Current usage:

Summary

Functions

Download all scrobbled tracks and create an archive on local filesystem for the default user

Download all scrobbled tracks and create an archive on local filesystem for a Lastfm user

Download scrobbled tracks within a date range and create an archive on local filesystem for a Lastfm user

Load all TSV data from the archive into Solr for a Lastfm user

Sync scrobbled tracks for the default user

Sync scrobbled tracks for a Lastfm user

Transform downloaded raw JSON data and create a TSV file archive for a Lastfm user

Types

date_range()
date_range() ::
  :all | :today | :yesterday | integer() | Date.t() | Date.Range.t()
solr_url()
solr_url() :: atom() | Hui.URL.t()

Functions

archive()
archive() :: :ok | {:error, :file.posix()}

Download all scrobbled tracks and create an archive on local filesystem for the default user.

Example

  LastfmArchive.archive

The archive belongs to a default user specified in configuration, for example user_a (in config/config.exs):

  config :lastfm_archive,
    user: "user_a",
    ... # other archiving options

See archive/2 for further details on archive format, file location and archiving options.
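
As a minimal sketch of how such configuration values can be read at runtime, the snippet below uses the standard Application.get_env/3 call. The module name ConfigSketch is hypothetical, and whether the library reads its settings in exactly this way is an assumption; the ./lastfm_data/ fallback is the default directory named in this documentation.

```elixir
defmodule ConfigSketch do
  # Hypothetical helper, not part of lastfm_archive.
  # Returns the configured default user, or nil when none is set.
  def default_user do
    Application.get_env(:lastfm_archive, :user)
  end

  # Returns the configured data directory, falling back to the documented default.
  def data_dir do
    Application.get_env(:lastfm_archive, :data_dir, "./lastfm_data/")
  end
end

# Simulate the config/config.exs entry shown above, for illustration only.
Application.put_env(:lastfm_archive, :user, "user_a")
IO.puts("default user: #{ConfigSketch.default_user()}")
```

In a real project the values would come from config/config.exs rather than from Application.put_env/3.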

archive(user, options)
archive(binary(), keyword()) :: :ok | {:error, :file.posix()}

Download all scrobbled tracks and create an archive on local filesystem for a Lastfm user.

Example

  LastfmArchive.archive("a_lastfm_user")

  # with archiving option
  LastfmArchive.archive("a_lastfm_user", interval: 300) # 300ms interval between Lastfm API requests
  LastfmArchive.archive("a_lastfm_user", overwrite: true) # re-fetch / overwrite downloaded data

Older scrobbles are archived on a yearly basis, whereas the latest (current year) scrobbles are extracted on a daily basis to ensure data immutability and updatability.

The data is currently in raw Lastfm recenttracks JSON format, chunked into 200-track (max) gzip compressed pages and stored within directories corresponding to the years and days when tracks were scrobbled.
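
To illustrate the chunking arithmetic only, the sketch below computes how many pages a given number of scrobbles occupies at 200 tracks per page. PageSketch and its functions are hypothetical names; the library's own paging code may differ.

```elixir
defmodule PageSketch do
  # Hypothetical helper, not part of lastfm_archive.
  # Number of pages needed for `count` scrobbles (ceiling division).
  def page_count(count, per_page \\ 200) when count >= 0 and per_page > 0 do
    div(count + per_page - 1, per_page)
  end

  # Size of each page; the last page may be smaller than `per_page`.
  def page_sizes(count, per_page \\ 200) when count >= 0 and per_page > 0 do
    :track
    |> List.duplicate(count)
    |> Enum.chunk_every(per_page)
    |> Enum.map(&length/1)
  end
end

IO.inspect(PageSketch.page_sizes(450))
```

For example, a day with 450 scrobbles would need three pages: two full pages and one with 50 tracks.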

Options:

  • :interval - default 500 (ms), the duration between successive Lastfm API requests. This controls the request rate. The default interval gives a safe rate of about 2 requests per second, within the limit in Lastfm’s terms of service: no more than 5 requests per second

  • :overwrite - default false, if set to true the system will (re)fetch and overwrite any previously downloaded data. Use this option to refresh the file archive. Otherwise (false), the system makes no calls to Lastfm to check or re-fetch data when existing data chunks / pages are found. This speeds up archive updating

  • :per_page - default 200, number of scrobbles per page in archive. The default is the max number of tracks per request permissible by Lastfm

  • :daily - default false, an option for archiving at daily granularity, entailing smaller and immutable archive files suitable for latest scrobbles data update
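
The sketch below shows one way the documented defaults could be combined with caller-supplied options, and how an interval translates into a maximum request rate. OptionsSketch is a hypothetical module; that the library merges options with Keyword.merge/2 is an assumption.

```elixir
defmodule OptionsSketch do
  # Hypothetical helper, not part of lastfm_archive.
  # Defaults as listed in the options above.
  @defaults [interval: 500, overwrite: false, per_page: 200, daily: false]

  # Caller-supplied options take precedence over the defaults.
  def with_defaults(options \\ []) do
    Keyword.merge(@defaults, options)
  end

  # Upper bound on the request rate implied by an interval in milliseconds.
  def max_requests_per_second(interval_ms) when interval_ms > 0 do
    1000 / interval_ms
  end
end

opts = OptionsSketch.with_defaults(interval: 300)
IO.inspect(opts[:interval])
IO.inspect(OptionsSketch.max_requests_per_second(opts[:interval]))
```

With the default 500 ms interval the bound is 2 requests per second; an interval of 200 ms would sit exactly at the limit of 5 requests per second.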

The data is written to a main directory, e.g. ./lastfm_data/a_lastfm_user/ as configured in config/config.exs:

  config :lastfm_archive,
    ...
    data_dir: "./lastfm_data/"

See archive/3 for archiving data within a date range.

Reruns and refresh archive

Lastfm API calls can occasionally time out. When this happens, the function continues archiving and moves on to the next data chunk (page). It logs the missing page event(s) in an error directory.

Rerun the function to download any missing data chunks. The function skips all existing archived pages by default so that it will not make repeated calls to Lastfm. Use the overwrite: true option to re-fetch existing data.

To create a fresh archive or refresh part of an existing one, delete all or some files in the archive and re-run the function, or use the overwrite: true option.
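
The skip-or-refetch rule described above can be sketched as a small predicate. RerunSketch.fetch?/2 is a hypothetical function, and the file name used in the example is made up for illustration; the library's actual check is not shown in this documentation.

```elixir
defmodule RerunSketch do
  # Hypothetical helper, not part of lastfm_archive.
  # A page is fetched when overwrite is requested or when no archived file exists yet.
  def fetch?(path, options \\ []) do
    Keyword.get(options, :overwrite, false) or not File.exists?(path)
  end
end

# Illustration with a temporary file standing in for an archived page.
path = Path.join(System.tmp_dir!(), "rerun_sketch_page.gz")
File.rm(path)
IO.inspect(RerunSketch.fetch?(path))
File.write!(path, "")
IO.inspect(RerunSketch.fetch?(path))
IO.inspect(RerunSketch.fetch?(path, overwrite: true))
File.rm!(path)
```

Deleting a file from the archive has the same effect as overwrite: true for that page, since the existence check then fails.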

archive(user, date_range \\ :all, options \\ [])
archive(binary(), date_range(), keyword()) :: :ok | {:error, :file.posix()}

Download scrobbled tracks within a date range and create an archive on local filesystem for a Lastfm user.

Example

  LastfmArchive.archive("a_lastfm_user", :past_month)

  # data from year 2016
  LastfmArchive.archive("a_lastfm_user", 2016)

  # with Date struct
  LastfmArchive.archive("a_lastfm_user", ~D[2018-10-31])

  # with Date.Range struct
  d1 = ~D[2018-01-01]
  d2 = d1 |> Date.add(7)
  LastfmArchive.archive("a_lastfm_user", Date.range(d1, d2), daily: true, overwrite: true)

Supported date range:

  • :all: archive all scrobble data between Lastfm registration date and now
  • :today, :yesterday, :past_week, :past_month - other convenience date ranges
  • yyyy (integer): data for a single year
  • Date: data for a specific date - single day
  • Date.Range: data for a specific date range

See archive/2 for more details on archiving options.
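
As an illustration of how some of these values map onto concrete dates, the sketch below normalises a few of them to a Date.Range. DateRangeSketch is a hypothetical module and not the library's implementation; :all, :past_week and :past_month are left out because their exact boundaries are not stated here.

```elixir
defmodule DateRangeSketch do
  # Hypothetical helper, not part of lastfm_archive.
  # `today` is passed in explicitly so the results are reproducible.
  def to_range(:today, today), do: Date.range(today, today)

  def to_range(:yesterday, today) do
    yesterday = Date.add(today, -1)
    Date.range(yesterday, yesterday)
  end

  # An integer is treated as a year: 1 January to 31 December.
  def to_range(year, _today) when is_integer(year) do
    Date.range(Date.new!(year, 1, 1), Date.new!(year, 12, 31))
  end

  # A single date becomes a one-day range; a range is returned unchanged.
  def to_range(%Date{} = date, _today), do: Date.range(date, date)
  def to_range(%Date.Range{} = range, _today), do: range
end

IO.inspect(DateRangeSketch.to_range(2016, ~D[2018-10-31]))
```

Normalising every input to one representation means the downloading code only has to handle a single case.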

load_archive(user, url)
load_archive(binary(), solr_url()) :: :ok | {:error, Hui.Error.t()}

Load all TSV data from the archive into Solr for a Lastfm user.

The function finds TSV files in the archive and sends them to Solr for ingestion one at a time. It uses the Hui client to interact with Solr and the Hui.URL.t/0 struct to specify the Solr endpoint.

Example

  # define a Solr endpoint with %Hui.URL{} struct
  headers = [{"Content-type", "application/json"}]
  url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}

  LastfmArchive.load_archive("a_lastfm_user", url)

TSV files must be pre-created before the loading - see transform_archive/2.

sync()
sync() :: :ok | {:error, :file.posix()}

Sync scrobbled tracks for the default user.

Example

  LastfmArchive.sync

The first sync downloads all scrobbles and creates an archive on local filesystem. Subsequent sync calls download the latest scrobbles starting from the previous date of sync.

See archive/0 for further details on how to configure a default user.

sync(user)
sync(binary()) :: :ok | {:error, :file.posix()}

Sync scrobbled tracks for a Lastfm user.

Example

  LastfmArchive.sync("a_lastfm_user")

The first sync downloads all scrobbles and creates an archive on local filesystem. Subsequent sync calls download only the latest scrobbles starting from the previous date of sync. The date of sync is logged in a .lastfm_archive file in the user archive data directory.
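
The first-sync versus later-sync behaviour can be sketched as a choice of start date. SyncSketch is a hypothetical module; it takes the previous sync date as an argument and does not read the .lastfm_archive file, whose format is not described here. Using the registration date as the start of a first sync is an assumption, based on the :all date range of archive/3.

```elixir
defmodule SyncSketch do
  # Hypothetical helper, not part of lastfm_archive.
  # No previous sync (nil): cover everything from the registration date.
  def sync_range(nil, registered_on, today) do
    Date.range(registered_on, today)
  end

  # Previous sync known: start again from that date, so a partially
  # archived day is picked up once more.
  def sync_range(%Date{} = last_sync, _registered_on, today) do
    Date.range(last_sync, today)
  end
end

IO.inspect(SyncSketch.sync_range(~D[2018-10-29], ~D[2012-05-01], ~D[2018-10-31]))
```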

transform_archive(user, mode \\ :tsv)
transform_archive(binary(), :tsv) :: :ok

Transform downloaded raw JSON data and create a TSV file archive for a Lastfm user.

Example

  LastfmArchive.transform_archive("a_lastfm_user")

The function only transforms archive data that has already been downloaded to the local filesystem. It does not fetch data from Lastfm; use archive/2 or archive/3 for that.

The TSV files are created on a yearly basis and stored in gzip compressed format. They are stored in a tsv directory within either the default ./lastfm_data/ or the directory specified in config/config.exs (:lastfm_archive, :data_dir).
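
A rough sketch of the JSON-to-TSV step follows, working on already-decoded maps so that no JSON library is needed. TsvSketch is a hypothetical module, and the choice and order of columns is an assumption; the field names ("name", "artist", "album", "date" with "#text" and "uts" keys) are assumed to follow the Lastfm recenttracks JSON layout.

```elixir
defmodule TsvSketch do
  # Hypothetical helper, not part of lastfm_archive.
  # Turns one decoded track map into a tab-separated row.
  # Missing fields become empty strings.
  def to_row(track) do
    [
      get_in(track, ["date", "uts"]),
      get_in(track, ["artist", "#text"]),
      get_in(track, ["album", "#text"]),
      track["name"]
    ]
    |> Enum.map(&to_string/1)
    |> Enum.join("\t")
  end

  # Joins all rows with newlines and gzip-compresses the result.
  def to_gzipped_tsv(tracks) do
    tracks
    |> Enum.map(&to_row/1)
    |> Enum.join("\n")
    |> :zlib.gzip()
  end
end

track = %{
  "name" => "Song",
  "artist" => %{"#text" => "Artist"},
  "album" => %{"#text" => "Album"},
  "date" => %{"uts" => "1540944000"}
}

IO.puts(TsvSketch.to_row(track))
```

The compressed binary returned by to_gzipped_tsv/1 could then be written with File.write!/2; the file naming scheme used by the library is not assumed here.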