LastfmArchive (lastfm_archive v0.11.0)
lastfm_archive is a tool for creating a local file archive and analytics of Last.fm music listening data.
Current usage:
sync/0, sync/1: sync Lastfm scrobble data to the local filesystem
transform/0, transform/2: transform downloaded raw data into an archive in other formats, e.g. CSV, Apache Parquet, Arrow
read/2: daily and monthly data frame of the file archive, or yearly data frame from various archive formats
load_archive/2: load all CSV data from the archive into Solr
Types
@type metadata() :: LastfmArchive.Archive.Metadata.t()
@type options() :: LastfmArchive.Behaviour.Archive.options()
Functions
Returns the default configured Lastfm user.
Returns the total playcount and the registered time, i.e. the earliest scrobble time, for a user.
@spec load_archive(binary(), solr_url()) :: :ok | {:error, Hui.Error.t()}
Load all CSV data from the archive into Solr for a Lastfm user.
The function finds CSV files from the archive and sends them to Solr for ingestion one at a time. It uses the Hui client to interact with Solr and the Hui.URL.t/0 struct for Solr endpoint specification.
Example
# define a Solr endpoint with %Hui.URL{} struct
headers = [{"Content-type", "application/json"}]
url = %Hui.URL{url: "http://localhost:8983/solr/lastfm_archive", handler: "update", headers: headers}
LastfmArchive.load_archive("a_lastfm_user", url)
CSV files must be created before loading; see transform/2.
Read from an archive of a Lastfm user.
This returns scrobbles for a single day or month period in a lazy Explorer.DataFrame for further data manipulation and visualisation.
Example
# read a single day of scrobbles from the configured
# archive (FileArchive) and default user
LastfmArchive.read(day: ~D[2022-12-31])
# read a single month of scrobbles for a user
LastfmArchive.read("a_lastfm_user", month: ~D[2022-12-31])
Options:
:day - read scrobbles for this particular date (Date.t())
:month - read scrobbles for this particular month (Date.t())
This function can also return a lazy data frame from a derived archive, i.e. CSV or Parquet archives created via transform/2.
Example
# read a single year of scrobbles for a user from Parquet archive
LastfmArchive.read("a_lastfm_user", format: :parquet, year: 2023)
# read everything for a user from Parquet archive
LastfmArchive.read("a_lastfm_user", format: :parquet)
Options:
:format (required) - derived archive format: :csv, :parquet, :ipc, :ipc_stream
:year - only read scrobbles for this particular year
:columns - an atom list for retrieving only a subset of columns; available columns: :album, :album_mbid, :artist, :artist_mbid, :artist_url, :datetime, :datetime_unix, :id, :mbid, :name, :url, :year
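The derived-archive options above can be assembled and checked before calling read/2. The following is a minimal sketch only: ReadOptions is a hypothetical helper module (it is not part of lastfm_archive) that validates the requested format and columns against the lists documented above and returns a keyword list that could then be passed to read/2.

```elixir
defmodule ReadOptions do
  # Hypothetical helper for illustration; not part of lastfm_archive.
  # Formats and columns are copied from the options documented above.
  @formats [:csv, :parquet, :ipc, :ipc_stream]
  @available_columns ~w(album album_mbid artist artist_mbid artist_url
                        datetime datetime_unix id mbid name url year)a

  # Builds a keyword list of read options for a derived archive.
  # Raises FunctionClauseError if the format is not one of @formats.
  def build(format, opts \\ []) when format in @formats do
    columns = Keyword.get(opts, :columns, [])

    # Any requested column that is not in the documented list is rejected.
    case columns -- @available_columns do
      [] -> {:ok, Keyword.put(opts, :format, format)}
      unknown -> {:error, {:unknown_columns, unknown}}
    end
  end
end

{:ok, opts} = ReadOptions.build(:parquet, year: 2023, columns: [:artist, :name])

# With lastfm_archive installed and a Parquet archive already created,
# the options could be used as follows:
# LastfmArchive.read("a_lastfm_user", opts)
```

Restricting :columns to the fields actually needed keeps the lazy data frame small when only a few attributes, such as artist and track name, are analysed.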
@spec sync(binary(), keyword()) :: {:ok, metadata()} | {:error, :file.posix()}
Sync scrobbles for a Lastfm user.
Example
LastfmArchive.sync("a_lastfm_user")
You can also specify a default user in the configuration, for example user_a in config/config.exs:
config :lastfm_archive,
user: "user_a",
... # other archiving options
And run:
LastfmArchive.sync()
The first sync downloads all daily scrobbles in 200-track (gzip compressed) chunks that are written into a local file archive. Subsequent syncs extract further scrobbles starting from the date of latest downloaded scrobbles.
The data is currently in raw Lastfm recenttracks JSON format, chunked into 200-track (max) gzip compressed pages and stored within directories corresponding to the days when tracks were scrobbled.
Options:
:interval - default 1000 (ms), the duration between successive Lastfm API requests. This provides control over the request rate. The default interval ensures a safe rate that is within Lastfm's terms of service: no more than 5 requests per second
:overwrite - default false (not available currently). If set to true, the system will (re)fetch and overwrite any previously downloaded data. Use this option to refresh the file archive. Otherwise (false), the system will not call Lastfm to check and re-fetch data if existing data chunks / pages are found. This speeds up archive updating
:per_page - default 200, number of scrobbles per page in the archive. The default is the maximum number of tracks per request permitted by Lastfm
:data_dir - default lastfm_data. The file archive is created within a main data directory, e.g. ./lastfm_data/a_lastfm_user/
These options can be configured in config/config.exs:
config :lastfm_archive,
...
data_dir: "./lastfm_data/"
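The default :interval of 1000 ms corresponds to one request per second, which sits well inside the limit of 5 requests per second mentioned above. A minimal sketch of that arithmetic follows; IntervalCheck is a hypothetical helper (not part of lastfm_archive) that could be used to sanity-check a custom interval before configuring it.

```elixir
defmodule IntervalCheck do
  # Hypothetical helper for illustration; not part of lastfm_archive.
  # The limit is taken from the :interval option description above.
  @max_requests_per_second 5

  # Requests per second implied by an interval given in milliseconds.
  def requests_per_second(interval_ms)
      when is_integer(interval_ms) and interval_ms > 0 do
    1000 / interval_ms
  end

  # True when the interval keeps the request rate at or below the limit.
  def within_limit?(interval_ms) do
    requests_per_second(interval_ms) <= @max_requests_per_second
  end
end

IntervalCheck.within_limit?(1000) # true: 1 request per second
IntervalCheck.within_limit?(100)  # false: 10 requests per second
```

Under this reading of the limit, 200 ms is the shortest interval that still passes the check.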
Transform downloaded file archive into CSV or Apache Parquet formats for a Lastfm user.
Example
LastfmArchive.transform("a_lastfm_user", format: :csv)
# transform archive of the default user into CSV files
LastfmArchive.transform()
The function only transforms downloaded archive data on the local filesystem. It does not fetch data from Lastfm, which can be done via sync/2.
The transformed files are created on a yearly basis and stored in gzip compressed format. They are stored in a csv or parquet directory within either the default ./lastfm_data/ or the directory specified in config/config.exs (:lastfm_archive, :data_dir).
Options:
:format - format into which the file archive is transformed: :csv, :parquet, :ipc, :ipc_stream
:overwrite - overwrite existing data, default: false
:year - transform data for this particular year