Skip to content

Commit

Permalink
docs(blog): wow analysis blog post (#8441)
Browse files Browse the repository at this point in the history
## Description of changes

Adding a new (hopefully fun) blog post analyzing World of Warcraft data
to determine who was playing the most, who leveled the fastest, and what
activities they did during leveling.

Original draft on Medium, but hoping we can do this here!
  • Loading branch information
IndexSeek authored Feb 29, 2024
1 parent 7b88a2d commit 6bd6c26
Show file tree
Hide file tree
Showing 2 changed files with 215 additions and 0 deletions.
215 changes: 215 additions & 0 deletions docs/posts/wow-analysis/index.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,215 @@
---
title: "Analysis of World of Warcraft Data"
author: "Tyler White"
error: false
date: "2024-02-29"
image: thumbnail.png
categories:
- blog
- data engineering
- duckdb
---

## Introduction

I grew up playing games, and with the recent re-release of World of Warcraft Classic,
it seems like a perfect time to analyze some in-game data!

This dataset is the product of a Horde player's diligent recording throughout 2008,
capturing the transitional phase between the Burning Crusade and Wrath of the Lich King
expansions. Notably, starting November 13, 2008, the data showcases numerous characters
venturing into new territories and advancing beyond the former level cap of 70.

## Analysis

We'll determine who logged in the most, who leveled from 70 to 80 the fastest, and
what activities these players engaged with based on zones. Let's get to work.

### Getting Started

Ibis ships with an `examples` module, which includes this specific data. We'll use DuckDB
here, but this is possible with other backends, and we encourage you to experiment.
DuckDB is the default Ibis backend, so it'll be easy to use with this example.

You can execute `pip install ibis-framework[duckdb,examples]` to work with Ibis and the
example data.

```{python}
from ibis.interactive import *
wowah_data = ex.wowah_data_raw.fetch()
wowah_data
```

### Getting Table Info

Let's learn more about these fields. Are there any nulls we should consider? We can
use the [`info`](../../reference/expression-tables.qmd#ibis.expr.types.relations.Table.info)
method on our Ibis expression.

```{python}
wowah_data.info()
```

We can also use [`value_counts`](../../reference/expression-generic.qmd#ibis.expr.types.generic.Column.value_counts)
on specific columns if we want to learn more.

```{python}
wowah_data.race.value_counts()
```

We don't have any missing values, and the data `value_counts` results match what I would
expect.

How about duplicates? We can check the count of unique rows against the total count.

```{python}
print(wowah_data.count())
print(wowah_data.nunique())
print(wowah_data.count() == wowah_data.nunique())
```

So we have some duplicates. What could the duplicate rows be?

We can find them like this.

```{python}
wowah_duplicates = wowah_data.mutate(
row_num=ibis.row_number().over(
ibis.window(group_by=wowah_data.columns, order_by=_.timestamp)
)
).filter(_.row_num > 0)
wowah_duplicates
```

I suspect this data was captured by a single player spamming “/who” in the game, most
likely using an AddOn, about every ten minutes. Some players could
have been captured twice, depending on how the command was being filtered.

We can go ahead and remove these duplicates.

```{python}
wowah_data = wowah_data.distinct()
```

### Which player logged in the most?

We mentioned that there was a single player likely capturing these results.
Let's find out who that is.

```{python}
(
wowah_data
.group_by([_.char, _.race, _.charclass])
.agg(sessions=_.count())
.order_by(_.sessions.desc())
)
```

That Troll Hunter that never exceeded level 1 is likely our person with 42,770 sessions.

### Who leveled the fastest from 70–80?

At the end of the year, there were 884 level 80s. Who leveled the fastest?

Finding this answer will involve filtering, grouping, and aggregating to compute
each character's time taken to level from 70 to 80.

Let's start by creating an expression to filter to only the level 80 characters, then
join it to filter and identify only where they were level 70 or 80. We're only concerned
with three columns so that we will select only those.

```{python}
max_level_chars = wowah_data.filter(_.level == 80).select(_.char).distinct()
wowah_data_filtered = (
wowah_data
.join(max_level_chars, "char", how="inner")
.filter(_.level.isin([70, 80]))
.select(_.char, _.level, _.timestamp)
)
wowah_data_filtered
```

Let's use the `where` option to help with the aggregation.

```{python}
level_calc = (
wowah_data_filtered.group_by(["char"])
.mutate(
ts_70=_.timestamp.max(where=_.level == 70),
ts_80=_.timestamp.min(where=_.level == 80),
)
.drop(["level", "timestamp"])
.distinct()
.mutate(days_from_70_to_80=(_.ts_80.delta(_.ts_70, "day")))
.order_by(_.days_from_70_to_80)
)
```

The data is filtered and grouped by character, and two new columns are created to
represent timestamps for levels 70 and 80. Then we drop what we no longer need, get the
distinct values, and calculate the time taken to level from 70 to 80. Then we sort it!

```{python}
level_calc
```

This isn't perfect, as I found a case where there was a player who seemed to have quit
in March and then returned for the new expansion. They hit 71 before it looks like their
login at 70 was captured later. If you're curious, take a look at **char=21951** for yourself.

### How did they level?

Let's grab all the details from the previous result and join it back to get the
timestamp and zone data.

```{python}
leveler_zones = (
level_calc.join(wowah_data, "char", how="inner")
.filter(_.timestamp.between(_.ts_70, _.ts_80))
.group_by([_.char, _.zone])
.agg(zone_count=_.zone.count())
)
leveler_zones
```

This code summarizes how often those characters appear in different zones while leveling
up from level 70 to 80. It combines two sets of data based on character names, selects
records within the leveling timeframe, groups data by character and zone, and counts the
number of times each character was found in each zone.

There is another example table we can join to figure out the Zone information. I'm only
interested in two columns, so I'll filter this further and rename the columns.

```{python}
zones = ex.wowah_zones_raw.fetch()
zones = zones.select(zone=_.Zone, zone_type=_.Type)
zones
```

Making use of `pivot_wider` and joining back to our `leveler_zones` expression will make
this a breeze!

```{python}
zones_pivot = (
leveler_zones.join(zones, "zone")
.group_by([_.char, _.zone_type])
.agg(zone_type_count=_.zone.count())
.pivot_wider(names_from="zone_type", values_from="zone_type_count")
)
zones_pivot
```

If they have a high value in the "Zone" column, they were likely questing. Other
players opted to venture into dungeons.

## Next Steps

It's pretty easy to do complex analysis with Ibis. We churned through over 10 million rows
in no time.

Get in touch with us on [GitHub](https://github.com/ibis-project) or
[Zulip](https://ibis-project.zulipchat.com/), we'd love to see more analyses of this
data set.
Binary file added docs/posts/wow-analysis/thumbnail.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 6bd6c26

Please sign in to comment.