deduplication strategies added, version updated to 0.0.2
unrec committed Feb 25, 2023
1 parent 8e3068f commit 7783260
Showing 10 changed files with 244 additions and 55 deletions.
41 changes: 31 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,21 +1,42 @@
# lastfm-tracks-dumper

This command-line application obtains listened (aka *scrobbled*) tracks for a specific [Last.fm](https://www.last.fm/home) user
and saves them to a .csv file.
![lastfm_csv](docs/lastfm_header_new_400x92.png)

### Usage
The application obtains listened (aka *scrobbled*) tracks for a specific [Last.fm](https://www.last.fm/home) user
and saves them to a **.csv** file. It can download all scrobbled tracks or only the duplicated ones.

Run the .jar and pass two parameters as the arguments:
### Usage

1. Required Last.fm username
2. Last.fm API token, see [here](https://www.last.fm/api#getting-started).
Run the application **.jar** file and provide the following parameters:
- required username (`--user`)
- API token, see [here](https://www.last.fm/api#getting-started) (`--token`)
- download strategy (`--strategy`, optional):
* `default` - get all scrobbled tracks
* `only-duplicates` - get duplicated tracks without the 1st one (can be used for deduplication of the library)
* `without-duplicates` - get all tracks with adjacent duplicates collapsed (each duplicated track of the scrobbling history will be shown once)

```shell

java -jar lastfm-tracks-dumper-0.0.1-standalone.jar 'username' 'api_token'
java -jar lastfm-tracks-dumper.jar --user %user% --token %token% --strategy default
```

Currently only `date`, `artist`, `track` and `album` fields are saved to .csv.
### Duplicates

Due to scrobbling issues, duplicated tracks can appear in the library 2 or more times. The application determines
duplicates using two rules:
1. Duplicated tracks go in sequential order.
2. Difference in the scrobbled time is less than 5 sec.

![duplicates](docs/duplicates_720x330.png)

The output depends on the strategy:
* `only-duplicates` - track **Human** will appear 2 times, track **Be Mine** 1 time.
* `without-duplicates` - each duplicated track will appear just once.

Besides there is some [issue](
### Exported .csv data

Currently only `date`, `artist`, `track` and `album` values are saved to .csv.

Besides there is an [issue](
https://support.last.fm/t/invalid-mbids-in-responses-to-user-gettoptracks-and-user-getrecenttracks/2011) with *track/artist/album* ids and that's why this id data is not valuable right now.

For the `only-duplicates` strategy, two more fields are added: `page` and `pageLink`, for easy navigation in the library.
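
The two duplicate rules described in the README can be sketched in a few lines. This is a minimal illustration only: `Scrobble`, `isDuplicate` and `countDuplicates` are hypothetical names that do not exist in the project, the artist names are placeholders, and the 5-second threshold is taken from the README text (treated here as inclusive, which is an assumption).

```kotlin
import kotlin.math.abs

// Hypothetical, simplified stand-in for the project's Track model.
data class Scrobble(val artist: String, val track: String, val uts: Long)

// Rule 2: same artist and track, scrobbled at most `thresholdSec` seconds apart.
fun isDuplicate(a: Scrobble, b: Scrobble, thresholdSec: Long = 5): Boolean =
    a.artist == b.artist && a.track == b.track && abs(a.uts - b.uts) <= thresholdSec

// Rule 1: only neighbouring scrobbles are compared.
fun countDuplicates(history: List<Scrobble>): Int =
    history.zipWithNext().count { (prev, next) -> isDuplicate(prev, next) }

fun main() {
    val history = listOf(
        Scrobble("Artist A", "Human", 1_000),
        Scrobble("Artist A", "Human", 1_002),
        Scrobble("Artist A", "Human", 1_003),
        Scrobble("Artist B", "Be Mine", 1_200),
        Scrobble("Artist B", "Be Mine", 1_201),
        Scrobble("Artist A", "Human", 5_000) // same track, but neither adjacent to a "Human" nor close in time
    )
    println(countDuplicates(history)) // 3
}
```

With this sample history, "Human" contributes two duplicate pairs and "Be Mine" one, which matches the counts the README gives for `only-duplicates`.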
4 changes: 3 additions & 1 deletion build.gradle.kts
@@ -1,6 +1,6 @@
import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

version = "0.0.1"
version = "0.0.2"
group = "com.unrec"
description = "lastfm-tracks-dumper"
java.sourceCompatibility = JavaVersion.VERSION_11
@@ -30,6 +30,8 @@ dependencies {
implementation("com.squareup.okhttp3:okhttp:4.9.0")
implementation("me.tongfei","progressbar","0.9.4")

implementation("org.seleniumhq.selenium:selenium-java:4.8.1")

implementation("com.fasterxml.jackson.module", "jackson-module-kotlin", Versions.JACKSON)
implementation("com.fasterxml.jackson.dataformat", "jackson-dataformat-csv", Versions.JACKSON)
implementation("com.fasterxml.jackson.datatype", "jackson-datatype-jsr310", Versions.JACKSON)
Binary file added docs/duplicates_720x330.png
Binary file added docs/lastfm_header_new_400x92.png
106 changes: 64 additions & 42 deletions src/main/kotlin/com/unrec/lastfm/tracks/dumper/App.kt
@@ -4,20 +4,31 @@ import com.fasterxml.jackson.core.JsonGenerator
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.csv.CsvGenerator
import com.fasterxml.jackson.dataformat.csv.CsvMapper
import com.fasterxml.jackson.dataformat.csv.CsvSchema
import com.fasterxml.jackson.module.kotlin.registerKotlinModule
import com.unrec.lastfm.tracks.dumper.Constants.baseUrl
import com.unrec.lastfm.tracks.dumper.Constants.defaultPageSize
import com.unrec.lastfm.tracks.dumper.Constants.fetchPageSize
import com.unrec.lastfm.tracks.dumper.Constants.strategyKey
import com.unrec.lastfm.tracks.dumper.Constants.tokenKey
import com.unrec.lastfm.tracks.dumper.Constants.userKey
import com.unrec.lastfm.tracks.dumper.CsvSchemas.schemaMap
import com.unrec.lastfm.tracks.dumper.model.Track
import com.unrec.lastfm.tracks.dumper.model.UserInfo
import com.unrec.lastfm.tracks.dumper.utils.asConfig
import com.unrec.lastfm.tracks.dumper.utils.countPages
import com.unrec.lastfm.tracks.dumper.utils.extractTracks
import com.unrec.lastfm.tracks.dumper.utils.extractUser
import com.unrec.lastfm.tracks.dumper.utils.recentTracksGetRequest
import com.unrec.lastfm.tracks.dumper.utils.toSitePage
import com.unrec.lastfm.tracks.dumper.utils.userInfoGetRequest
import com.unrec.lastfm.tracks.dumper.utils.userPageUrl
import kotlinx.coroutines.runBlocking
import me.tongfei.progressbar.ProgressBar
import okhttp3.ConnectionPool
import okhttp3.OkHttpClient
import ru.gildor.coroutines.okhttp.await
import java.io.File
import java.net.SocketTimeoutException
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.concurrent.ConcurrentHashMap
@@ -31,81 +42,92 @@ fun main(args: Array<String>) {

val measureTimeMillis = measureTimeMillis {

// get user info for total pages value
val userInfoRequest = userInfoGetRequest(baseUrl, args[0], args[1])
// define the settings
val settings = args.asConfig()
val user = settings[userKey]!!
val token = settings[tokenKey]!!
val filterStrategy = when (val strategy = settings[strategyKey]) {
null -> defaultStrategy
else -> strategiesMap[strategy] ?: throw IllegalArgumentException("Incorrect strategy is provided")
}

// check if the user exists
val userInfoRequest = userInfoGetRequest(baseUrl, user, token)
val userResponse = client.newCall(userInfoRequest).execute()
if (userResponse.code == 404) {
println("Failed to get data for the '${args[0]}' user")
println("Failed to get data for the '$user' user")
exitProcess(1)
}

// get the user info for a total pages amount
val userInfoResponse = client.newCall(userInfoRequest).execute().body?.string()
val userInfo: UserInfo = mapper.extractUser(userInfoResponse!!)
val totalPages = countPages(userInfo.playCount, pageSize)

println("Starting to load Last.fm data for '${args[0]}' user. Total pages to fetch: $totalPages")
val totalScrobbles = userInfo.playCount
println("Total scrobbles: $totalScrobbles, last.fm pages: ${countPages(totalScrobbles, defaultPageSize)} ")
val pagesToFetch = countPages(totalScrobbles, fetchPageSize)

// starting to consume tracks
val map = ConcurrentHashMap<Int, List<Track>>()
val progressBar = ProgressBar("Pages processed:", totalPages.toLong())
val progressBar = ProgressBar("Pages processed:", pagesToFetch.toLong())

println("Starting to load Last.fm data for '$user' user. \nTotal pages to fetch: $pagesToFetch")

runBlocking {
for (page in totalPages downTo 1) {
val request = recentTracksGetRequest(baseUrl, args[0], args[1], page, pageSize)
val response = client.newCall(request).await()
val tracks = mapper.extractTracks(response.body?.string()!!)
map[page] = tracks
progressBar.step()
for (page in pagesToFetch downTo 1) {
runCatching {
val request = recentTracksGetRequest(baseUrl, user, token, page, fetchPageSize)
val response = client.newCall(request).await()
val tracks = mapper.extractTracks(response.body?.string()!!)
val refinedTracks = tracks.let(filterStrategy)

for ((index, track) in refinedTracks.withIndex()) {
val sitePage = index.toSitePage()
track.page = sitePage
track.pageUrl = userPageUrl(user, sitePage)
}
map[page] = refinedTracks
progressBar.step()
}.onFailure {
when (it) {
is SocketTimeoutException -> {
println("Failed to fetch data from Last.fm due to ${it.javaClass}: ${it.message}")
exitProcess(1)
}

else -> throw it
}
}
}
}
progressBar.close()

val tracks = mutableListOf<Track>()
for (page in 1..totalPages) {
for (page in 1..pagesToFetch) {
tracks.addAll(map[page]!!)
}

println("Tracks loaded = ${tracks.size}")
println("Tracks found = ${tracks.size}")

// save tracks to .csv file
val csvMapper = CsvMapper()
csvMapper.configure(JsonGenerator.Feature.IGNORE_UNKNOWN, true)
csvMapper.configure(CsvGenerator.Feature.ALWAYS_QUOTE_STRINGS, false)

val schema = schemaMap[filterStrategy]
val formatter = DateTimeFormatter.ofPattern("yyyy_MM_dd")
val outputFile = File("${args[0]}_${LocalDate.now().format(formatter)}.csv")
val outputFile = File("${user}_${LocalDate.now().format(formatter)}.csv")
val objectWriter = csvMapper.writerFor(Track::class.java).with(schema)
objectWriter.writeValues(outputFile.bufferedWriter()).writeAll(tracks)
}
println("Total dump time = ${measureTimeMillis.toDuration(DurationUnit.MILLISECONDS)}")
exitProcess(0)
}

private const val baseUrl = "http://ws.audioscrobbler.com/2.0/"
private const val pageSize = 200
val mapper = ObjectMapper().registerKotlinModule()

private val mapper = ObjectMapper().registerKotlinModule()
private val csvMapper: ObjectMapper = CsvMapper()
.configure(CsvGenerator.Feature.ALWAYS_QUOTE_STRINGS, false)
.configure(JsonGenerator.Feature.IGNORE_UNKNOWN, true)

private val client = OkHttpClient.Builder()
.connectionPool(ConnectionPool(20, 5, TimeUnit.MINUTES))
.readTimeout(60, TimeUnit.SECONDS)
.connectTimeout(30, TimeUnit.SECONDS)
.writeTimeout(30, TimeUnit.SECONDS)
.retryOnConnectionFailure(true)
.build()

private val schema: CsvSchema = CsvSchema.builder()
.setColumnSeparator(';')
.disableQuoteChar()
.setUseHeader(true)
.addColumn("date")
.addColumn("artist")
.addColumn("track")
.addColumn("album")
.build()

private fun countPages(total: Int, pageSize: Int) = kotlin.math.ceil(total.toDouble() / pageSize).toInt()





12 changes: 12 additions & 0 deletions src/main/kotlin/com/unrec/lastfm/tracks/dumper/Constants.kt
@@ -0,0 +1,12 @@
package com.unrec.lastfm.tracks.dumper

object Constants {

const val userKey = "--user"
const val tokenKey = "--token"
const val strategyKey = "--strategy"

const val baseUrl = "http://ws.audioscrobbler.com/2.0/"
const val fetchPageSize = 200
const val defaultPageSize = 50
}
35 changes: 35 additions & 0 deletions src/main/kotlin/com/unrec/lastfm/tracks/dumper/CsvSchemas.kt
@@ -0,0 +1,35 @@
package com.unrec.lastfm.tracks.dumper

import com.fasterxml.jackson.dataformat.csv.CsvSchema

object CsvSchemas {

private val defaultSchema: CsvSchema = CsvSchema.builder()
.setColumnSeparator(';')
.disableQuoteChar()
.setUseHeader(true)
.addColumn("date")
.addColumn("artist")
.addColumn("track")
.addColumn("album")
.build()

private val schemaWithPages: CsvSchema = CsvSchema.builder()
.setColumnSeparator(';')
.disableQuoteChar()
.setUseHeader(true)
.addColumn("date")
.addColumn("artist")
.addColumn("track")
.addColumn("album")
.addColumn("page")
.addColumn("pageLink")
.build()

val schemaMap = mapOf(
defaultStrategy to defaultSchema,
withoutDuplicatesStrategy to defaultSchema,
duplicatesOnlyStrategy to schemaWithPages
)
}

33 changes: 33 additions & 0 deletions src/main/kotlin/com/unrec/lastfm/tracks/dumper/Strategies.kt
@@ -0,0 +1,33 @@
package com.unrec.lastfm.tracks.dumper

import com.unrec.lastfm.tracks.dumper.model.Track

val defaultStrategy = { list: List<Track> -> list }

val withoutDuplicatesStrategy = { list: List<Track> -> list.removeAdjacentDuplicates() }

val duplicatesOnlyStrategy = { list: List<Track> -> list.onlyDuplicates() }

fun <T : Any> Iterable<T>.removeAdjacentDuplicates(): List<T> {
var last: T? = null
return mapNotNull {
if (it == last) {
null
} else {
last = it
it
}
}
}

fun <T : Any> Iterable<T>.onlyDuplicates(): List<T> {
return this.zipWithNext()
.filter { it.first == it.second }
.map { it.second }
}

val strategiesMap = mapOf(
"default" to defaultStrategy,
"without-duplicates" to withoutDuplicatesStrategy,
"only-duplicates" to duplicatesOnlyStrategy
)
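
To make the difference between the two filtering helpers concrete, the sketch below copies `removeAdjacentDuplicates` and `onlyDuplicates` from this file and applies them to a plain list of strings. The sample list is an assumption chosen to mirror the README example; real input would be `Track` objects compared through their custom `equals`.

```kotlin
// Copied from Strategies.kt: keeps one element from each run of equal neighbours.
fun <T : Any> Iterable<T>.removeAdjacentDuplicates(): List<T> {
    var last: T? = null
    return mapNotNull {
        if (it == last) {
            null
        } else {
            last = it
            it
        }
    }
}

// Copied from Strategies.kt: keeps every element that equals its predecessor.
fun <T : Any> Iterable<T>.onlyDuplicates(): List<T> {
    return this.zipWithNext()
        .filter { it.first == it.second }
        .map { it.second }
}

fun main() {
    // Illustrative data only: "Human" scrobbled three times in a row, "Be Mine" twice.
    val plays = listOf("Human", "Human", "Human", "Be Mine", "Be Mine", "Other")

    println(plays.removeAdjacentDuplicates()) // [Human, Be Mine, Other]
    println(plays.onlyDuplicates())           // [Human, Human, Be Mine]
}
```

Both helpers only look at neighbouring elements. Since `App.kt` applies the strategy to each fetched page separately, a duplicate pair that straddles a page boundary would presumably go undetected.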
29 changes: 27 additions & 2 deletions src/main/kotlin/com/unrec/lastfm/tracks/dumper/model/Track.kt
@@ -32,6 +32,31 @@ data class Track(
val utsDate: Long,

@field:JsonProperty("date")
val textDate: String
val textDate: String,

)
var page: Int = 0,

var pageUrl: String = ""

) {

override fun equals(other: Any?): Boolean {
if (this === other) return true
if (javaClass != other?.javaClass) return false

other as Track

if (trackName != other.trackName) return false
if (artistName != other.artistName) return false
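// compare the absolute difference so the result does not depend on which track is older
if (kotlin.math.abs(utsDate - other.utsDate) > 5) return false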

return true
}

override fun hashCode(): Int {
// textDate is left out: equals() tolerates a 5-second difference, so equal tracks may have different dates
var result = trackName.hashCode()
result = 31 * result + artistName.hashCode()
return result
}
}
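
A tolerance-based `equals` like this one is easiest to reason about when the comparison is symmetric and `hashCode` is built only from fields that are compared exactly. The sketch below shows that idea on a hypothetical, trimmed-down `Play` class; it is not the project's `Track`, and the inclusive 5-second window is an assumption.

```kotlin
import kotlin.math.abs

// Hypothetical class with only the fields that take part in the comparison.
class Play(val trackName: String, val artistName: String, val utsDate: Long) {

    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is Play) return false
        return trackName == other.trackName &&
            artistName == other.artistName &&
            abs(utsDate - other.utsDate) <= 5 // symmetric: argument order does not matter
    }

    // Only exactly compared fields are hashed, so equal plays always share a hash code.
    override fun hashCode(): Int = 31 * trackName.hashCode() + artistName.hashCode()
}

fun main() {
    val first = Play("Human", "Artist A", 1_000)
    val second = Play("Human", "Artist A", 1_003)

    println(first == second)                       // true
    println(second == first)                       // true
    println(first.hashCode() == second.hashCode()) // true
}
```

Even in this form the relation is not transitive (plays at 0 s, 4 s and 8 s are pairwise equal only to their neighbours). That seems acceptable for adjacent-pair checks, but it makes such objects a poor fit for hash-based collections.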
39 changes: 39 additions & 0 deletions src/main/kotlin/com/unrec/lastfm/tracks/dumper/utils/AppUtils.kt
@@ -0,0 +1,39 @@
package com.unrec.lastfm.tracks.dumper.utils

import com.unrec.lastfm.tracks.dumper.Constants.defaultPageSize
import com.unrec.lastfm.tracks.dumper.Constants.strategyKey
import com.unrec.lastfm.tracks.dumper.Constants.tokenKey
import com.unrec.lastfm.tracks.dumper.Constants.userKey
import kotlin.system.exitProcess

fun Array<String>.asConfig(): Map<String, String> {

if (this.size % 2 != 0) {
println("Incorrect parameters are provided")
exitProcess(1)
}

val map = this.toList().chunked(2).associate { it[0] to it[1] }

if (!map.keys.contains(userKey)) {
println("User is not specified")
exitProcess(1)
}

if (!map.keys.contains(tokenKey)) {
println("API token is not provided")
exitProcess(1)
}

if (map[strategyKey] == null) {
println("Strategy is not specified, tracks will not be filtered.")
}

return map
}

fun countPages(total: Int, pageSize: Int) = kotlin.math.ceil(total.toDouble() / pageSize).toInt()

fun Int.toSitePage() = (this / defaultPageSize + 1) * defaultPageSize

fun userPageUrl(user: String, page: Int) = "https://www.last.fm/user/$user/library?page=$page"
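
The argument pairing and page counting above can be tried in isolation. In the sketch below, `parseArgs` is a hypothetical variant of `asConfig` that returns `null` instead of calling `exitProcess`, so it is easy to exercise. `countPages` is copied from this file, and the sample user name, token and scrobble count are made up.

```kotlin
// Hypothetical variant of asConfig(): pairs "--key value" arguments, returns null on an odd count.
fun parseArgs(args: Array<String>): Map<String, String>? {
    if (args.size % 2 != 0) return null
    return args.toList().chunked(2).associate { (key, value) -> key to value }
}

// Copied from AppUtils.kt: number of pages needed to hold `total` items.
fun countPages(total: Int, pageSize: Int) = kotlin.math.ceil(total.toDouble() / pageSize).toInt()

fun main() {
    val config = parseArgs(arrayOf("--user", "alice", "--token", "abc", "--strategy", "only-duplicates"))
    println(config?.get("--strategy"))          // only-duplicates
    println(parseArgs(arrayOf("--user")))       // null

    // 1001 scrobbles: 6 API pages of 200 items, 21 site pages of 50 items.
    println(countPages(1001, 200))              // 6
    println(countPages(1001, 50))               // 21
}
```

Returning `null` (or throwing) rather than exiting inside the helper keeps the parsing logic testable; the caller can still print a message and exit.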
