Skip to content

Camoen/CattleLog-File-Processor-Public

Repository files navigation

CattleLog File Processor CattleLog - Main Screen


This repository holds the complete code for the CattleLog File Processor, as well as four sample input files. Note that all credentials and private cattle records have been redacted. The primary goal of CattleLog File Processor (CFP) is to provide the VanWagner Family Farm's record-keeper with an efficient, user-friendly method of updating and maintaining CattleLog's backend database. CFP achieves this goal by accepting four input files, automatically performing any necessary data pre-processing, creating or updating an SQLite database, and uploading the resulting database to DropBox, where it can be accessed by CattleLog (the mobile application).

Table of Contents

CattleLog File Processor Video Demonstration

A video demonstration of the CattleLog File Processor is available on YouTube: https://youtu.be/zOrB2FUL0ys

CattleLog (Mobile Application)

Beyond the development of this Python executable, an Android app, named CattleLog, was created to display data from the resulting database. The primary goals of CattleLog are to (1) provide employees of the VanWagner Family Farm with an efficient method of cattle record lookup via an intuitive user interface and (2) display requested cattle records in an aesthetically pleasing, organized manner. Its source code, readme, and a video demonstration of its functionality can be found at https://github.com/Camoen/CattleLog.

CattleLog Project Report

A report detailing the major features and functionality of the entire CattleLog project (both the CattleLog File Processor and CattleLog mobile application) can be found at https://github.com/Camoen/CattleLog-File-Processor-Public/files/3903171/CattleLog.Project.Report.pdf.

CattleLog File Processor Features

Input File Validation

If the user selects the same input file twice, fails to select an input file, or enters an invalid filepath, the program ceases execution and prompts the user to try again. If CFP discovers that one of the input files is in an invalid format during data preprocessing, the program stops execution before any attempt is made to update the CattleLog database. It then prompts the user to try running the program again (with valid files).

Backup Functionality

CattleLog File Processor accepts four input files, three of which are automatically generated .csv files from a herd management software called PCDART. The last file, Treatments.xls file, is manually generated by the VanWagner family farm's record-keeper. Because Treatments.xls is manually maintained, CFP creates a Backup directory (if one doesn't exist already) and stores a copy of the file during execution.

CFP also stores a backup of the CattleLog Database every time an update is applied. The real (non-toy) database file is approximately 1 MB in size, and an update is attempted once per week, on average, so space is not a concern.

Data Preprocessing

Treatments File Birthdate Association

The manually updated "Treatments.xls" file references cattle by tag number only, not by the combination of both tag number and birth date, as mandated by the CattleLog Database Schema. To prevent creating additional work for the record-keeper, birthdates are associated with these tag numbers automatically. First, tag number and birth date combinations are recorded from the one of the PCDART input files. These combinations are stored in a Python dictionary, with each tag number serving as a key for its associated birth date value. Then, as data is stripped from the "Treatments.xls" file, the dictionary is used to link each tag number in the workbook with its associated birth date—this linkage is necessary to ensure that the stripped data can, eventually, be inserted into the treatment table of the CattleLog database.

Error Checking

Of course, since the "Treatments.xls" file is manually updated, implementing the appropriate logic to check for errors was important. If a duplicate tag number (a tag number that was already associated with a birthdate during CFP’s current execution) or invalid tag number (a tag number that has no associated birth date in the generated dictionary) is found, CFP stops executing, outputs a meaningful error message, and prompts the user to retry once the error is fixed. For example, in the case of a duplicate tag number, the displayed error message reads, "TagNumber is a duplicate number. Add birthdate manually."

Creation of DataFrames

A majority of the data preprocessing work consists of stripping data from the four input files and merging them into pandas DataFrames that closely align with the CattleLog Database Schema. When all four dataframes are created, they are utilized to create or update the CattleLog Database.

Note that all dates are stored in the "datetime" format, but, when the SQLite CattleLog Database is updated, the dates must be converted to strings.

Cattle DataFrame

The Cattle DataFrame includes all data from two of the .csv files generated by PCDART (Report 1.csv and Report 2.csv), as well as any info in the "sold sheet" of Treatments.xls (this sheet contains manually maintained data about animals that have left the herd). After data preprocessing is complete, this DataFrame contains all 44 attributes required by the cattle table of the CattleLog Database.

Health DataFrame

The Health DataFrame contains data from one of the .csv files generated by PCDART (Report 3.csv). Due to the unique (read: strange) way in which this file is formatted, every time a new TagNumber and BirthDate combination is discovered as the file is processed, these two attributes need to be copied into each subsequent row (until the next combination is discovered).

Treatments DataFrame

The Treatments DataFrame is fairly straightforward to generate, since all tag numbers have already been associated with birthdates as discussed in the Treatments File Birthdate Association section. Data is stripped from the first sheet of the manually generated Treatments.xls file, and a unique RecordNumber is associated with row. This RecordNumber attribute becomes part of the table's key, as shown in the CattleLog Database Schema.

User-Defined Field DataFrame

PCDART allows users to output 10 user-defined fields, which show up as "UserDef1" through "UserDef10" in the CattleLog Database's cattle table. Since these fields can change in purpose, users of the CattleLog mobile application need to be provided with a label and description for each of these fields. To allow the record-keeper to update these labels and descriptions, a third sheet was added to the Treatments.xls file. The third sheet simply contains 10 labels and 10 associated descriptions, which are used to create the userdef DataFrame. This DataFrame is used to update the userFields table in the CattleLog Database, which can then be referenced in the CattleLog mobile app to provide labels and descriptions for user-defined fields. The goal here was flexibility, since these ten fields can change in purpose over time. Screenshots of the user-defined fields, labels, and descriptions can be found in the Overview Tab section of the CattleLog mobile app README.

Date Modifications

Out of all the details provided by PCDART’s output .csv files, the single most useful attribute in day-to-day work on the farm is probably "NextExpHeat", which is short for "Next Expected Heat Date". A cow is most fertile while in heat, so it’s in the farm’s best interests to breed an animal during this period of increased fertility. A cow exhibits many tell-tale signs when in heat, and having an animal’s next expected heat date readily available allows a farmhand to verify whether or not an animal that’s exhibiting signs is actually in heat.

Unfortunately, PCDART’s output files only include the month and day, but not the year, of the next expected heat date. Furthermore, all of the NextExpHeat dates output by PCDART occur either on the present day or on a future date (never on a date prior to the day on which the output file was generated). For ease of querying within the CattleLog mobile app, CFP appends the current year to each of these dates during preprocessing. If the resulting date is prior to the current day, but within the past 4 weeks, the NextExpHeat date is adjusted by the addition of one or two heat cycles (21 days or 42 days, whichever is required). If, instead, the resulting date is prior to the current day, but not within the past 4 weeks, the NextExpHeat date is adjusted by the addition of one year. This logic assumes that CFP will be utilized within 4 weeks of generating the input files on PCDART (standard procedure is to generate these files on a weekly basis—three additional weeks were added as a user-friendly buffer).

The implementation logic is better illustrated by an example, as shown in the table below. Further implications of the addition of years to the "NextExpHeat" attribute are discussed in section 2.2.2 of the CattleLog Project Report, or in the Heats List section of CattleLog mobile application README.

NextExpHeat date adjustment logic.  Final result must be current date or a future date.
NextExpHeat date adjustment logic. Final result must be current date or a future date.

CattleLog Database Modification

CattleLog Database Schema

A full description of CattleLog's database design is provided in section 2.1.1 of the CattleLog Project Report, but the E-R diagram is included here.

E-R Diagram
CattleLog Database E-R Diagram

42 additional attributes in Cattle Entity
Attributes in the Cattle Entity (from Appx. A)

CattleLog Database Creation

If no local cattlelog_database.db file already exists, a new database file is created (if such a file already exists, a backup of the file is stored, and the program skips to the update functionality). The complete code for the database's initial creation is located at https://github.com/Camoen/CattleLog-File-Processor-Public/blob/master/create_database.py.

As shown in the E-R diagram above, the health (HealthRecords) and treatment (Treatments) table are reliant on TagNumber and BirthDate from the cattle table as foreign keys. The ON DELETE CASCADE clause is included to ensure that health records and treatments are deleted if their associated cattle entity is deleted.

CattleLog Database Updates

Once a database file has been located or created, CFP uses the generated DataFrames to update the database file. The complete code for the database update functionality is located at https://github.com/Camoen/CattleLog-File-Processor-Public/blob/master/update_database.py.

A majority of the update logic consists of INSERT OR REPLACE queries for the cattle, health, treatment, and userFields tables.

Mistagged Animal Functionality

Since the database is completely abstracted away from the record-keeper, it became necessary to add a way of "resetting" (completely removing) erroneous records for a given animal. In the "Treatments.xls" input file, the second spreadsheet includes data about animals that have left the farm. There are four fields: Index # (TagNumber), Brthdate (BirthDate), Date (DateLeft), and Reason. The latter of these two fields are used to indicate the date an animal left the herd and the reason for leaving. If the "Reason" field is set to the string "Mistagged", all records for the associated animal (as indicated by TagNumber and BirthDate) are completely removed from the database. This functionality was first added when a newborn calf was given an incorrect TagNumber, hence the use of "Mistagged" as the keyword.

Removal of Unnecessary Records

Some of the cattle that are raised on the VanWagner Family Farm belong to another farm. Each of these animals has a BarnName attribute that starts with the letter R. These records (and only these records) can be removed from the database when the specific animals have left the farm. Other animal records should always be maintained, even if the animal has left the farm.

When any animal leaves the farm, the record-keeper sets their status to "Left Herd" in PCDART, so the output .csv files no longer include data about these animals. Furthermore, the record-keeper removes this animal from the "Treatments.xls" file. Therefore, no data about an animal that has left the farm will appear in the generated dataframes, which makes it relatively simple to programmatically identify "missing" animals (animals that have left). Remember, however, that most animal records must be maintained in perpetuity; only animals with a BarnName that begins with an R should be purged from the database.

To implement this functionality, during each database update, the program creates a table called keep_matches to record all animals that still exist in the four input files. Cattle records are purged from the database if (1) the animal does not appear in the keep_matches table AND (2) the animal has a BarnName starting with R. This leaves all other cattle records intact, regardless of whether or not they have left the farm. The following bit of SQL implements this logic:

DELETE FROM cattle 
WHERE NOT EXISTS
     (SELECT * FROM keep_matches km
     WHERE (cattle.TagNumber =  km.TagNumber AND cattle.BirthDate =  km.BirthDate))
AND cattle.BarnName LIKE 'R%'

CattleLog Database Upload

After the cattlelog_database.db file has been updated, it's automatically uploaded to DropBox, where the CattleLog mobile app can access it via a static URL. Upload functionality is contained in this file: https://github.com/Camoen/CattleLog-File-Processor-Public/blob/master/upload_database.py

Releases

No releases published

Packages

No packages published

Languages