lastfm_archive v0.7.0 LastfmArchive
lastfm_archive is a tool for creating a local Last.fm scrobble file archive and a Solr archive, and for running analytics on them.
The software is currently experimental and in preliminary development. It should eventually provide the capability to perform ETL and analytic tasks on Lastfm scrobble data.
Current usage:
archive/0, archive/2: download all raw Lastfm scrobble data to local filesystem
archive/3: download a data subset within a date range
sync/0, sync/1: sync Lastfm scrobble data to local filesystem
transform_archive/2: transform downloaded raw data and create a TSV file archive
load_archive/2: load all (TSV) data from the archive into Solr
Summary
Functions
archive/0: Download all scrobbled tracks and create an archive on local filesystem for the default user
archive/2: Download all scrobbled tracks and create an archive on local filesystem for a Lastfm user
archive/3: Download scrobbled tracks within a date range and create an archive on local filesystem for a Lastfm user
load_archive/2: Load all TSV data from the archive into Solr for a Lastfm user
sync/0: Sync scrobbled tracks for the default user
sync/1: Sync scrobbled tracks for a Lastfm user
transform_archive/2: Transform downloaded raw JSON data and create a TSV file archive for a Lastfm user
Types
date_range() :: :all | :today | :yesterday | integer() | Date.t() | Date.Range.t()
Functions
Download all scrobbled tracks and create an archive on local filesystem for the default user.
Example
LastfmArchive.archive
The archive belongs to a default user specified in configuration, for example user_a (in config/config.exs):
config :lastfm_archive,
  user: "user_a",
  ... # other archiving options
See archive/2 for further details on archive format, file location and archiving options.
archive(binary(), keyword()) :: :ok | {:error, :file.posix()}
Download all scrobbled tracks and create an archive on local filesystem for a Lastfm user.
Example
LastfmArchive.archive("a_lastfm_user")
# with archiving option
LastfmArchive.archive("a_lastfm_user", interval: 300) # 300ms interval between Lastfm API requests
LastfmArchive.archive("a_lastfm_user", overwrite: true) # re-fetch / overwrite downloaded data
Older scrobbles are archived on a yearly basis, whereas the latest (current year) scrobbles are extracted on a daily basis to ensure data immutability and updatability.
The data is currently in raw Lastfm recenttracks JSON format, chunked into 200-track (max) gzip compressed pages and stored within directories corresponding to the years and days when tracks were scrobbled.
Options:
:interval - default 500 (ms), the duration between successive Lastfm API requests. This provides a control for the request rate. The default interval ensures a safe rate that is within Lastfm's terms of service: no more than 5 requests per second
:overwrite - default false, if set to true the system will (re)fetch and overwrite any previously downloaded data. Use this option to refresh the file archive. Otherwise (false), the system will not make calls to Lastfm to check and re-fetch data if existing data chunks / pages are found. This speeds up archive updating
:per_page - default 200, number of scrobbles per page in the archive. The default is the maximum number of tracks per request permitted by Lastfm
:daily - default false, an option for archiving at daily granularity, entailing smaller and immutable archive files suitable for updating the latest scrobble data
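The relationship between the :interval option and the request rate can be sketched as below. This is a minimal illustration and not the library's implementation: the ArchiveOptionsSketch module and its functions are hypothetical, and only the default values and the 5 requests per second limit come from the option descriptions above.

```elixir
defmodule ArchiveOptionsSketch do
  # Defaults as documented above; this module is illustrative only
  # and is not part of lastfm_archive.
  @defaults [interval: 500, overwrite: false, per_page: 200, daily: false]

  # Caller-supplied options override the documented defaults.
  def resolve(opts \\ []), do: Keyword.merge(@defaults, opts)

  # Requests per second implied by an interval given in milliseconds.
  def requests_per_second(interval_ms) when interval_ms > 0, do: 1000 / interval_ms

  # True when the interval keeps the rate at or below 5 requests per second.
  def within_limit?(interval_ms), do: requests_per_second(interval_ms) <= 5
end

opts = ArchiveOptionsSketch.resolve(interval: 300)
IO.inspect(opts[:interval])
IO.inspect(ArchiveOptionsSketch.within_limit?(opts[:interval]))
```

Under this arithmetic the default 500 ms interval corresponds to 2 requests per second, and any interval of 200 ms or more stays within the stated limit.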
The data is written to a main directory, e.g. ./lastfm_data/a_lastfm_user/, as configured in config/config.exs:
config :lastfm_archive,
  ...
  data_dir: "./lastfm_data/"
See archive/3 for archiving data within a date range.
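As a rough sketch of how the per-user directory could be derived from the data_dir setting, assuming the user name is simply appended to the configured directory: the ArchivePathSketch module and user_dir/1 below are hypothetical helpers, not functions provided by lastfm_archive.

```elixir
defmodule ArchivePathSketch do
  # Hypothetical helper: reads :data_dir from the :lastfm_archive config,
  # falling back to "./lastfm_data/" when nothing is configured.
  def user_dir(user) do
    data_dir = Application.get_env(:lastfm_archive, :data_dir, "./lastfm_data/")
    Path.join(data_dir, user)
  end
end

IO.puts(ArchivePathSketch.user_dir("a_lastfm_user"))
```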
Reruns and refresh archive
Lastfm API calls can occasionally time out. When this happens the function continues archiving and moves on to the next data chunk (page). It logs the missing page event(s) in an error directory.
Rerun the function to download any missing data chunks. The function skips all existing archived pages by default so that it does not make repeated calls to Lastfm. Use the overwrite: true option to re-fetch existing data.
To create a fresh archive or refresh part of it: delete all or some files in the archive and re-run the function, or use the overwrite: true option.
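The skip-or-refetch behaviour described above can be pictured with the following sketch. It only mirrors the documented rule (fetch a page when it is missing, or when overwrite: true is given); the RerunSketch module, the fetch?/2 function and the page file name are assumptions made for illustration and say nothing about the library's internals.

```elixir
defmodule RerunSketch do
  # Decide whether a page needs to be fetched from Lastfm.
  # `page_path` and this function are illustrative, not the library's internals.
  def fetch?(page_path, opts \\ []) do
    overwrite = Keyword.get(opts, :overwrite, false)
    # Fetch when forced by the option, or when the page file is absent.
    overwrite or not File.exists?(page_path)
  end
end

# Create a throwaway page file to stand in for an archived page.
path = Path.join(System.tmp_dir!(), "rerun_sketch_page.gz")
File.write!(path, "")
IO.inspect(RerunSketch.fetch?(path))
IO.inspect(RerunSketch.fetch?(path, overwrite: true))

# Deleting the file makes the page count as missing again.
File.rm!(path)
IO.inspect(RerunSketch.fetch?(path))
```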
archive(binary(), date_range(), keyword()) :: :ok | {:error, :file.posix()}
Download scrobbled tracks within a date range and create an archive on local filesystem for a Lastfm user.
Example
LastfmArchive.archive("a_lastfm_user", :past_month)
# data from year 2016
LastfmArchive.archive("a_lastfm_user", 2016)
# with Date struct
LastfmArchive.archive("a_lastfm_user", ~D[2018-10-31])
# with Date.Range struct
d1 = ~D[2018-01-01]
d2 = d1 |> Date.add(7)
LastfmArchive.archive("a_lastfm_user", Date.range(d1, d2), daily: true, overwrite: true)
Supported date ranges:
:all: archive all scrobble data between Lastfm registration date and now
:today, :yesterday, :past_week, :past_month - other convenience date ranges
yyyy (integer): data for a single year
Date: data for a specific date - single day
Date.Range: data for a specific date range
See archive/2 for more details on archiving options.
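One way to picture how these values map onto concrete dates is to normalise each of them to a Date.Range. The sketch below is a hypothetical helper (DateRangeSketch.to_range/2 is not part of lastfm_archive) and covers only the cases whose meaning is unambiguous from the list above; :all, :past_week and :past_month are left out because their exact boundaries are not stated here.

```elixir
defmodule DateRangeSketch do
  # Hypothetical normaliser: turns a documented date range value into a Date.Range.
  # `today` is passed in so that results are reproducible.
  def to_range(value, today \\ Date.utc_today())

  def to_range(:today, today), do: Date.range(today, today)

  def to_range(:yesterday, today) do
    yesterday = Date.add(today, -1)
    Date.range(yesterday, yesterday)
  end

  # An integer is treated as a calendar year, 1 January to 31 December.
  def to_range(year, _today) when is_integer(year) do
    Date.range(Date.new!(year, 1, 1), Date.new!(year, 12, 31))
  end

  # A single date becomes a one-day range; an existing range is kept as is.
  def to_range(%Date{} = date, _today), do: Date.range(date, date)
  def to_range(%Date.Range{} = range, _today), do: range
end

IO.inspect(DateRangeSketch.to_range(2016))
IO.inspect(DateRangeSketch.to_range(~D[2018-10-31]))
```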
load_archive(binary(), solr_url()) :: :ok | {:error, Hui.Error.t()}
Load all TSV data from the archive into Solr for a Lastfm user.
The function finds TSV files from the archive and sends them to Solr for ingestion one at a time. It uses the Hui client to interact with Solr and the Hui.URL.t/0 struct for Solr endpoint specification.
Example
# define a Solr endpoint with %Hui.URL{} struct
headers = [{"Content-type", "application/json"}]
url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}
LastfmArchive.load_archive("a_lastfm_user", url)
TSV files must be created before loading; see transform_archive/2.
Sync scrobbled tracks for the default user.
Example
LastfmArchive.sync
The first sync downloads all scrobbles and creates an archive on local filesystem. Subsequent sync calls download the latest scrobbles starting from the previous date of sync.
See archive/0 for further details on how to configure a default user.
Sync scrobbled tracks for a Lastfm user.
Example
LastfmArchive.sync("a_lastfm_user")
The first sync downloads all scrobbles and creates an archive on local filesystem. Subsequent sync calls download only the latest scrobbles starting from the previous date of sync. The date of sync is logged in a .lastfm_archive file in the user archive data directory.
transform_archive(binary(), :tsv) :: :ok
Transform downloaded raw JSON data and create a TSV file archive for a Lastfm user.
Example
LastfmArchive.transform_archive("a_lastfm_user")
The function only transforms downloaded archive data on local filesystem. It does not fetch data from Lastfm, which can be done via archive/2 or archive/3.
The TSV files are created on a yearly basis and stored in gzip compressed format. They are stored in a tsv directory within either the default ./lastfm_data/ or the directory specified in config/config.exs (:lastfm_archive, :data_dir).
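For inspecting such a file outside of Solr, a gzip compressed TSV can be read with Erlang's :zlib module. The sketch below is an illustration only: the TsvSketch module, the sample file name and the two columns are invented for the example, and the actual column layout of the archive is not described here.

```elixir
defmodule TsvSketch do
  # Read a gzip-compressed TSV file and split it into rows of columns.
  def read(path) do
    path
    |> File.read!()
    |> :zlib.gunzip()
    |> String.split("\n", trim: true)
    |> Enum.map(&String.split(&1, "\t"))
  end
end

# Write a tiny sample file; the file name and columns are invented for illustration.
path = Path.join(System.tmp_dir!(), "tsv_sketch_sample.tsv.gz")
File.write!(path, :zlib.gzip("artist\ttrack\nArtist A\tTrack 1\n"))
IO.inspect(TsvSketch.read(path))
File.rm!(path)
```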