US NTSB Crash NL embeddings #4432

Open
wants to merge 10 commits into
base: master

Conversation

hareesh-ms

@@ -2370,6 +2370,10 @@ Count_Person_InLaborForce_ResidesInCollegeOrUniversityStudentHousing,Number of L
Count_Person_InLaborForce_ResidesInGroupQuarters,Number of Labor Force Participants reside in Group Quarters
Count_Person_InLaborForce_ResidesInNoninstitutionalizedGroupQuarters,Number of Labor Force Participants reside in Noninstitutionalized Group Quarters
Count_Person_IsInternetUser_PerCapita,percentage of internet users
Count_Person_InvolvedInCrash_Motorists,Number of motorists involved in crash
Count_Person_InvolvedInCrash_MotorVehicleOccupant,Number of motor vehicle occupants involved in crash
Count_Person_InvolvedInCrash_NonMotorists,Number of non-motorists involved in crash
Contributor

Right now we don't add "non xxx" and "other xxx" variables to the index, since they are vague when matching a query (and can even match the opposite meaning).
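
The rule above could be checked mechanically before adding rows to the index. The following is a minimal sketch, not the project's actual tooling: the helper names `is_vague` and `split_rows` are hypothetical, and the assumption that a vague variable is one whose dcid has an underscore-separated segment starting with `Non` or `Other` is only a heuristic.

```python
import csv
import io
import re

# Assumption (heuristic, not the repo's rule): a dcid is "vague" when one of
# its underscore-separated segments starts with "Non" or "Other", followed by
# an uppercase letter or the end of the segment.
VAGUE_SEGMENT = re.compile(r"^(Non|Other)([A-Z]|$)")


def is_vague(dcid):
    """Return True if any segment of the dcid looks like 'NonXxx' or 'OtherXxx'."""
    return any(VAGUE_SEGMENT.match(segment) for segment in dcid.split("_"))


def split_rows(csv_text):
    """Split the dcids of a dcid,sentence CSV into (kept, skipped) lists."""
    kept, skipped = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        (skipped if is_vague(row["dcid"]) else kept).append(row["dcid"])
    return kept, skipped


SAMPLE = """dcid,sentence
Count_Person_InvolvedInCrash_Motorists,Number of motorists involved in crash
Count_Person_InvolvedInCrash_NonMotorists,Number of non-motorists involved in crash
"""

kept, skipped = split_rows(SAMPLE)
print("kept:", kept)
print("skipped:", skipped)
```

The check only looks at the start of each segment, so a segment such as `ResidesInNoninstitutionalizedGroupQuarters` is not flagged; whether that is the desired behavior is a judgment call for the reviewers.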

@@ -2760,6 +2764,10 @@ Count_ThunderstormWindEvent,Number of thunderstorm wind events
Count_TornadoEvent,Number of tornado events
Count_TropicalStormEvent,Number of tropical storm events
Count_UnemploymentInsuranceClaim_StateUnemploymentInsurance,Number of state unemployment insurance claims
Count_Vehicle_InvolvedInCrash_InTransport,Number of vehicles crashed in transport
Count_VehicleCrashIncident_NationalHighway,Number of crash incidents occurred on the national highway
Contributor

"number of vehicle crash incidents..." same as below

Author

Since it's counting the vehicles in transport, shall we word the sentence as "Number of vehicles in-transport involved in crash incidents"?

Contributor

Oh, I did not mean to include line 2767; that line looks good.

Count_Vehicle_InvolvedInCrash_InTransport,Number of vehicles crashed in transport
Count_VehicleCrashIncident_NationalHighway,Number of crash incidents occurred on the national highway
Count_VehicleCrashIncident_StateHighway,Number of crash incidents on state highway
Count_VehicleCrashIncident_USHighway,Number of crash incidents on US highway
Contributor

"US highway" is a bit vague in meaning and may be very hard to match against actual queries. Looking at these 3 stat vars, I wonder if there is an aggregate stat var?

For example, the most likely query is "how many highway car crashes happened in 2020". Which stat vars do we want to match here?

Author

The raw dataset we processed is at crash granularity, which we aggregated. Looking at the data, a USHighway or StateHighway may or may not be part of the National Highway Network. The US highway system seems to be an older one, per https://en.wikipedia.org/wiki/Numbered_highways_in_the_United_States. Shall we remove USHighway from the index and retain both NationalHighway and StateHighway? Is it desirable for the question "how many highway car crashes happened in 2020" to return both the NationalHighway and StateHighway numbers?

@shifucun shifucun requested a review from pradh July 2, 2024 15:05
Contributor

pradh commented Jul 2, 2024

How about we add these to a new CSV file? Say, one named by date, like 2024_q3.csv?

@hareesh-ms
Author

Added the NL sentences to a new CSV file and revised their wording.
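
A new sentences file like this can be sanity-checked before the embeddings are rebuilt. The sketch below is illustrative only: `validate_sentences` is a hypothetical helper, not part of the repository, and it assumes the two-column `dcid,sentence` layout shown in the diff snippets of this PR.

```python
import csv
import io


def validate_sentences(csv_text):
    """Return a list of human-readable problems found in a dcid,sentence CSV."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != ["dcid", "sentence"]:
        # Without the expected header the remaining checks are not meaningful.
        return [f"unexpected header: {header}"]
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        if len(row) != 2 or not row[0].strip() or not row[1].strip():
            problems.append(f"line {line_no}: expected a non-empty dcid and sentence")
            continue
        if row[0] in seen:
            problems.append(f"line {line_no}: duplicate dcid {row[0]}")
        seen.add(row[0])
    return problems


GOOD = """dcid,sentence
Count_Person_InvolvedInCrash_Motorists,Number of motorists involved in crash
Count_Vehicle_InvolvedInCrash_InTransport,Number of vehicles in motion involved in crash
"""

# Same dcid listed twice, which the check should report.
BAD = GOOD + "Count_Person_InvolvedInCrash_Motorists,Number of drivers involved in crash\n"

print(validate_sentences(GOOD))
print(validate_sentences(BAD))
```

Running the check on the file contents (rather than a path) keeps the sketch independent of where the new CSV ends up living in the repository.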

@@ -4,37 +4,6 @@
"categories": [
{
"blocks": [
{
Contributor

There should not be any deletions here.

  1. If you sync the branch, make sure to rebuild the embeddings.
  2. You can re-run run_test.sh -g to see if this gets fixed.

Count_Person_InvolvedInCrash_Motorists,Number of motorists involved in crash
Count_Person_InvolvedInCrash_MotorVehicleOccupant,Number of motor vehicle occupants involved in crash
Count_Vehicle_InvolvedInCrash_InTransport,Number of vehicles in motion involved in crash
Count_VehicleCrashIncident_CollisionCrash_HeadOnCollision,Number of headon vehicle collisions
Contributor

head-on

dcid,sentence
Count_Person_InvolvedInCrash_Motorists,Number of motorists involved in crash
Count_Person_InvolvedInCrash_MotorVehicleOccupant,Number of motor vehicle occupants involved in crash
Count_Vehicle_InvolvedInCrash_InTransport,Number of vehicles in motion involved in crash
Contributor

I see, so "in transport" does not mean being transported from place A to B.

How about just say "moving vehicles"?

Count_Person_InvolvedInCrash_Motorists,Number of motorists involved in crash
Count_Person_InvolvedInCrash_MotorVehicleOccupant,Number of motor vehicle occupants involved in crash
Count_Vehicle_InvolvedInCrash_InTransport,Number of vehicles in motion involved in crash
Count_VehicleCrashIncident_CollisionCrash_HeadOnCollision,Number of headon vehicle collisions
Contributor

A side question: is there a distinction between "collision" and "crash" in the schema? And should we unify or distinguish them in the descriptions?

Author

By definition, a crash seems to be more severe, involving damage or fatalities, so I will stick with "crash" for all of them.
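
Keeping the wording consistent across sentences is easy to lint for. The sketch below is a hedged illustration: the `PREFERRED` mapping and the `terminology_issues` helper are hypothetical, and they encode only the two preferences raised in this review ("head-on" instead of "headon", and "crash" instead of "collision").

```python
import re

# Assumed preferences taken from this review thread; extend as needed.
PREFERRED = {
    r"\bheadon\b": "head-on",
    r"\bcollisions?\b": "crash",
}


def terminology_issues(sentence):
    """Return (found_term, preferred_term) pairs for discouraged wording."""
    issues = []
    for pattern, preferred in PREFERRED.items():
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            issues.append((match.group(0), preferred))
    return issues


for text in (
    "Number of headon vehicle collisions",
    "Number of motorists involved in crash",
):
    print(text, "->", terminology_issues(text))
```

The helper only reports suspect terms; it does not rewrite sentences, since the right replacement (for example "head-on crashes") depends on the surrounding wording.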

Contributor

@pradh pradh left a comment

@hareesh-ms - sorry for the delay. Is it ready for a review? I had some of the same comments as Bo.

3 participants