🛑 Stop using MyPurdue pages that require auth (#57)
Recent changes to Purdue's authentication processes have made it infeasible
to scrape MyPurdue pages that require authorization.

This change updates the scraping process to avoid these pages, either by
resorting to workarounds (such as manually defined data mapping tables) or
by omitting data entirely (such as enrollment information).

See issues #54, #55, #56 for more information on the changes to
available data.
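
As a rough illustration of the "manually defined data mapping tables" workaround mentioned above, here is a minimal C# sketch. Everything in it is an assumption made for illustration: the table contents, the `buildingCodes` name, and the `LookupCode` helper are hypothetical and are not taken from this commit. The idea is only that a value the scraper can still read from an unauthenticated page is translated through a hand-maintained table, with an explicit fallback when no entry exists.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, hand-maintained table. The names and codes below are
// placeholders, not the data this commit actually ships.
var buildingCodes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Example Hall of Science"] = "EXHS",
    ["Sample Engineering Building"] = "SEB",
};

// Returns the mapped code, or null when the table has no entry, so the caller
// can decide whether to omit the record or log it for manual follow-up.
string? LookupCode(string scrapedName) =>
    buildingCodes.TryGetValue(scrapedName.Trim(), out var code) ? code : null;

Console.WriteLine(LookupCode("example hall of science") ?? "(unmapped)"); // prints "EXHS"
Console.WriteLine(LookupCode("Unknown Annex") ?? "(unmapped)");           // prints "(unmapped)"
```

Returning `null` for unknown names rather than throwing keeps a long sync run going when a new value appears upstream. The trade-off is that the table has to be updated by hand whenever that happens.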
haydenmc committed Nov 12, 2023
1 parent 28a77d1 commit 11dcdb7
Showing 6 changed files with 354 additions and 412 deletions.
13 changes: 5 additions & 8 deletions README.md
@@ -54,7 +54,7 @@ there through the query tester at [http://api.purdue.io/](api.purdue.io/).

## Tools

-Purdue.io is written in C# on .NET 5. It will run natively on most major
+Purdue.io is written in C# on .NET 8. It will run natively on most major
architectures and operating systems (Windows, Linux, Mac OS).

Entity Framework is used to communicate with an underlying database provider. Currently,
@@ -71,10 +71,7 @@ To start developing locally, install the .NET SDK.
CatalogSync is the process used to pull course data from MyPurdue and synchronize it to a
relational database store.

-In order to access detailed course section information, CatalogSync requires a valid
-MyPurdue username and password.
-
-CatalogSync also accepts options to configure which database provider and connection it uses.
+CatalogSync accepts options to configure which database provider and connection it uses.

Additional flags are available to configure CatalogSync behavior.
Use the `--help` flag for more information.
@@ -83,10 +80,10 @@ Use the `--help` flag for more information.
cd src/CatalogSync

# To sync to default SQLite file purdueio.sqlite
-dotnet run -- -u USERNAME -p PASSWORD
+dotnet run

# To sync to a specific SQLite file
-dotnet run -- -u USERNAME -p PASSWORD -d Sqlite -c "Data Source=path/to/file.sqlite"
+dotnet run -- -d Sqlite -c "Data Source=path/to/file.sqlite"
```

CatalogSync will begin synchronizing course catalog data to `purdueio.sqlite`.
@@ -96,7 +93,7 @@ and connection string:

```sh
# To sync to a local PostgreSQL instance:
-dotnet run -- -u USERNAME -p PASSWORD -d Npgsql -c "Host=localhost;Database=purdueio;Username=purdueio;Password=purdueio"
+dotnet run -- -d Npgsql -c "Host=localhost;Database=purdueio;Username=purdueio;Password=purdueio"
```

## API
21 changes: 1 addition & 20 deletions src/CatalogSync/Program.cs
@@ -21,12 +21,6 @@ public enum DataProvider

public class Options
{
-[Option(shortName: 'u', longName: "user", HelpText = "MyPurdue User Name")]
-public string MyPurdueUser { get; set; }
-
-[Option(shortName: 'p', longName: "pass", HelpText = "MyPurdue Password")]
-public string MyPurduePass { get; set; }
-
[Option(shortName: 'd', longName: "data-provider", Default = DataProvider.Sqlite,
HelpText = "The database provider to use")]
public DataProvider DataProvider { get; set; }
@@ -58,19 +52,6 @@ static async Task Main(string[] args)

static async Task RunASync(Options options)
{
-string username = options.MyPurdueUser ??
-Environment.GetEnvironmentVariable("MY_PURDUE_USERNAME");
-string password = options.MyPurduePass ??
-Environment.GetEnvironmentVariable("MY_PURDUE_PASSWORD");
-
-if ((username == null) || (password == null))
-{
-Console.Error.WriteLine("You must provide a MyPurdue username and password " +
-"to sync course data. Use command line options or environment variables " +
-"MY_PURDUE_USERNAME and MY_PURDUE_PASSWORD.");
-return;
-}
-
var loggerFactory = LoggerFactory.Create(b =>
b.AddSimpleConsole(c => c.TimestampFormat = "hh:mm:ss.fff "));

@@ -85,7 +66,7 @@ static async Task RunASync(Options options)

var behavior = options.SyncAllTerms ?
TermSyncBehavior.SyncAllTerms : TermSyncBehavior.SyncNewAndCurrentTerms;
-var connection = await MyPurdueConnection.CreateAndConnectAsync(username, password,
+var connection = new MyPurdueConnection(
loggerFactory.CreateLogger<MyPurdueConnection>());
var scraper = new MyPurdueScraper(connection,
loggerFactory.CreateLogger<MyPurdueScraper>());
4 changes: 0 additions & 4 deletions src/Scraper/Connections/IMyPurdueConnection.cs
@@ -14,9 +14,5 @@ public interface IMyPurdueConnection
// Retrieves the contents of bwckschd.p_get_crse_unsec from MyPurdue for the given term
// and subject
Task<string> GetSectionListPageAsync(string termCode, string subjectCode);
-
-// Retrieves the contents of bwskfcls.P_GetCrse_Advanced from MyPurdue for the given term
-// and subject
-Task<string> GetSectionDetailsPageAsync(string termCode, string subjectCode);
}
}