Lastfm Archive

A tool for creating local Last.fm scrobble file archive, Solr archive and analytics.
The software is currently experimental and in preliminary development. It should eventually provide capability to perform ETL and analytic tasks on Lastfm scrobble data.
Current Usage
Download and create a file archive of Lastfm scrobble tracks via Elixir
applications or interactive Elixir
(invoking iex -S mix
command line action while in software home directory).
# archive all data of a default user specified in configuration
LastfmArchive.archive
LastfmArchive.sync # subsequent calls download only latest scrobbles
# archive all data of any Lastfm user
# the data is stored in directory named after the user
LastfmArchive.archive("a_lastfm_user")
LastfmArchive.sync("a_lastfm_user") # subsequent calls download only latest scrobbles
# archive a data subset
LastfmArchive.archive("a_lastfm_user", :past_month)
# data from year 2016
LastfmArchive.archive("a_lastfm_user", 2016)
# with Date struct
LastfmArchive.archive("a_lastfm_user", ~D[2018-10-31])
# with Date.Range struct and archiving options
d1 = ~D[2018-01-01]
d2 = d1 |> Date.add(7)
LastfmArchive.archive("a_lastfm_user", Date.range(d1, d2), daily: true, overwrite: true)
Older scrobbles are archived on a yearly basis, whereas the latest (current year) scrobbles are extracted on a daily basis to ensure data immutability and updatability.
The data is currently in raw Lastfm recenttracks
JSON format,
chunked into 200-track (max) gzip
compressed pages and stored within directories
corresponding to the years or days when tracks were scrobbled.
See archive/2
,
archive/3
for more details
and archiving options.
The data is written to a main directory specified in configuration - see below.
To generate a TSV file archive from downloaded data:
# transform all data of a user into to TSV files
LastfmArchive.transform_archive("a_lastfm_user")
See transform_archive/2
.
Loading data into Solr
To load all transformed TSV data from the archive into Solr:
# define a Solr endpoint with %Hui.URL{} struct
headers = [{"Content-type", "application/json"}]
url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}
LastfmArchive.load_archive("a_lastfm_user", url)
# use Solr endpoint from config setting - Configuration below
LastfmArchive.load_archive("a_lastfm_user", :lastfm_archive)
The function finds TSV files from the archive and send them to
Solr for ingestion one at a time. It uses Hui
client to interact
with Solr and the Hui.URL.t/0
struct for Solr endpoint specification.
Requirement
This tool requires Elixir and Erlang, see installation details for various operating systems.
Installation
lastfm_archive
is available in Hex,
the package can be installed by adding lastfm_archive
to your list of dependencies in mix.exs
:
def deps do
[
{:lastfm_archive, "~> 0.6.0"}
]
end
Documentation can be found at https://hexdocs.pm/lastfm_archive.
Configuration
Add the following entries in your config - config/config.exs
. For example,
the following specifies a default_user
and a main file location for
multiple user archives, ./lastfm_data/
relative to the software home directory.
config :lastfm_archive,
user: "default_user", # the default user
data_dir: "./lastfm_data/", # main directory for multiple archives
per_page: 200, # 200 is max no. of tracks per call permitted by Lastfm API
interval: 500 # milliseconds between requests cf. Lastfm's max 5 reqs/s rate limit
config :elixirfm,
lastfm_ws: "http://ws.audioscrobbler.com/",
api_key: "", # mandatory
secret_key: ""
# optional: Solr endpoint for Lastfm data loading
config :hui, :lastfm_archive,
url: "http://localhost:8983/solr/lastfm_archive",
handler: "update",
headers: [{"Content-type", "application/json"}]
See archive/2
for other configurable archiving options, e.g. interval
, per_page
.
See Hui
for more details on Solr configuration.
An api_key
must be configured to enable Lastfm API requests,
see https://www.last.fm/api (“Get an API account”).