HostID Truncation can cause collisions #2015

Open
mathewc opened this issue Oct 11, 2017 · 54 comments

@mathewc
Member

mathewc commented Oct 11, 2017

Currently when generating a default host ID we use the host name (slot host name) and we truncate to 32 characters max (code here). This ensures that the generated ID conforms to the core SDK length restrictions (code here).

This truncation can of course open the possibility for naming collisions, particularly in the case of slots. In slot scenarios, if the site name is over 32 characters long and a slot is created that starts with the same 32 characters and is only disambiguated in later characters, both the production and the slot site will be using the same host ID. This can lead to issues. For example, TimerTrigger uses the host ID as a component of the blob lease path. In this case the timer function will only be able to run in one of the sites because they're competing for the same lock. Similarly, customers often have apps deployed to different regions using the same long naming path, varying only in later name components (e.g. region/environment), and can run into this.
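The collision can be sketched in a few lines of Python. This is only an illustration: it assumes the default ID is simply the (slot) host name cut to 32 characters, and the real host may normalize the name further. The site names and the lock path layout are hypothetical; the only point is that the host ID is one component of the path.

```python
MAX_HOST_ID_LENGTH = 32  # length limit stated in this issue


def default_host_id(host_name: str) -> str:
    """Assumed behavior: keep only the first 32 characters of the host name."""
    return host_name[:MAX_HOST_ID_LENGTH]


def timer_lock_path(host_id: str, function_name: str) -> str:
    """Hypothetical lease path layout that includes the host ID."""
    return f"locks/{host_id}/{function_name}"


# Hypothetical production site and slot host names.
# They only differ after the 32nd character.
production = "contoso-orders-processing-functions-prod"
staging = "contoso-orders-processing-functions-prod-staging"

prod_id = default_host_id(production)
slot_id = default_host_id(staging)

# Both sites end up with the same ID, so they compete for the same lock.
print(prod_id == slot_id)
print(timer_lock_path(prod_id, "NightlyCleanup"))
print(timer_lock_path(slot_id, "NightlyCleanup"))
```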

More information can be found in Host ID Collisions.

@paulbatum paulbatum added this to the Triaged milestone Oct 25, 2017
@paulbatum
Member

This is a dangerous change to make for v1 but we should fix it in v2. This would require some updates to the scale controller as I believe it has this logic duplicated.

@paulbatum paulbatum modified the milestones: Triaged, Next Nov 1, 2017
@paulbatum
Member

I've confirmed the scale controller will need an update, but it should be straightforward once we know what the new logic for generating the host ID is.

@paulbatum
Member

We had to punt this due to higher priority issues and the need to coordinate the change across multiple components. Revisit again in V3.

@mathewc
Member Author

mathewc commented Feb 2, 2019

As a workaround for customers running into this issue: in Functions v2, you can set an explicit host ID in app settings, using a different ID for each environment. The app setting name to use is AzureFunctionsWebHost:hostId. For Functions v1, you can specify the ID in host.json via the hostId property. The host ID value should be unique across all apps/slots you're running. The important thing is that each ID is no longer than 32 characters. The restrictions that a host ID value must satisfy are here. One way to generate an ID is to take a GUID, remove the dashes, and make it lower case, e.g. 1835D7B5-5C98-4790-815D-072CC94C6F71 => 1835d7b55c984790815d072cc94c6f71
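The GUID conversion above can be sketched with Python's standard `uuid` module. The helper names are hypothetical, and the sketch only covers the dash removal, lower-casing, and 32-character length mentioned in this comment, not the full set of linked restrictions.

```python
import uuid

MAX_HOST_ID_LENGTH = 32  # length limit mentioned in this comment


def host_id_from_guid(guid: str) -> str:
    """Drop the dashes and lower-case an existing GUID."""
    return uuid.UUID(guid).hex


def new_host_id() -> str:
    """Generate a fresh random GUID in the same dash-free, lower-case form."""
    return uuid.uuid4().hex


example = host_id_from_guid("1835D7B5-5C98-4790-815D-072CC94C6F71")
print(example)  # 1835d7b55c984790815d072cc94c6f71
print(len(example) <= MAX_HOST_ID_LENGTH)
```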

@ankitkumarr
Contributor

Had a customer issue last week, where this truncation caused collision between multiple function apps sharing the same storage account.

@fabiocav, is this already planned for V3?

@fabiocav
Member

Deferring this work as it would have scale controller dependencies and does not align with the timing.

@fabiocav fabiocav removed the 3.x label Nov 21, 2019
@mikeurnun

@paulbatum @fabiocav Per #1904, which concerns the same detail in the Architecture Center doc: we currently state that the Function App name length can be 1-60 characters. Could you advise what length we should update it to?

@paulbatum
Member

@mike-urnun-msft good catch. Staying in the range of 1-32 will avoid this bug.

@dgard1981

dgard1981 commented Jun 26, 2020

Though there is no specific mention in this issue, I assume that the solution of explicitly setting the host ID via the 'AzureFunctionsWebHost:hostId' App Setting is still valid for v3 Function Apps?

If so, does the host ID have to remain static, or can it change after every deployment? I ask because our CD pipeline updates App Settings through an ARM template, so if the host ID has to remain static we'd first need to query the App Settings to get the current value of 'AzureFunctionsWebHost:hostId' for the slot we are deploying to so that it can be set back with the same value.
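Whether the ID must stay static is not answered in this thread. One way a pipeline could sidestep the question is to derive the value deterministically from the app and slot names, so every deployment computes the same ID without first querying the current setting. The sketch below is an assumption-laden illustration, not guidance from the Functions team: the function name, the `app/slot` key format, and the choice of SHA-256 are all hypothetical.

```python
import hashlib


def stable_host_id(app_name: str, slot_name: str = "production") -> str:
    """Derive a repeatable 32-character lower-case hex ID from app and slot names."""
    key = f"{app_name.lower()}/{slot_name.lower()}"
    # A SHA-256 hex digest is 64 lower-case hex characters; keep the first 32.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


# Same inputs give the same ID on every deployment.
print(stable_host_id("my-really-really-long-functionapp-uksouth"))
print(stable_host_id("my-really-really-long-functionapp-uksouth", "staging"))
```

Because the full names are hashed before shortening, apps or slots whose names only differ after the 32nd character still get different IDs.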

Also, does this only apply to a Function App with a slot, or does the value of 'AzureFunctionsWebHost:hostId' have to be unique across Function Apps? For example, if you have these two Function Apps with no slots...

  • my-really-really-long-functionapp-uksouth
  • my-really-really-long-functionapp-ukwest

...could both have the same truncated value of 'my-really-really-long-functionap'?
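To check a set of app names for this, the names can be grouped by their first 32 characters. This is a sketch that assumes plain truncation with no other normalization; the function name is hypothetical, and a shared prefix only matters when the apps also share a storage account.

```python
from collections import defaultdict

MAX_HOST_ID_LENGTH = 32  # length limit stated in this issue


def find_truncation_collisions(app_names):
    """Group names by their first 32 characters and keep groups with 2+ names."""
    groups = defaultdict(list)
    for name in app_names:
        groups[name[:MAX_HOST_ID_LENGTH]].append(name)
    return {host_id: names for host_id, names in groups.items() if len(names) > 1}


collisions = find_truncation_collisions([
    "my-really-really-long-functionapp-uksouth",
    "my-really-really-long-functionapp-ukwest",
    "short-app",
])
print(collisions)
```

For the two names above, the only group reported is the one keyed by 'my-really-really-long-functionap'.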

@tyler555g

Why is this not mentioned in the docs, or warned about upon creation? This has been known since 2017, but there is no obvious mention of it anywhere that I can find. (I very well could be blind, so if so please correct me.)

@davidmrdavid
Contributor

+1 I just recently got hit with a case that appears to be, at least partially, caused by this limitation.

@thibautbrard

Running a v4 function, we're still facing the Function App Name Collision Found error when running the function configuration diagnostic, even though hostid is set up at slot level (Linux consumption plan) using a random lowercase GUID without dashes. Should we just ignore this error message? Should I set FUNCTIONS_HOSTID_CHECK_LEVEL to Warning?
It has also already been asked whether we must keep a static hostId across deployments or can generate a new one each time, but I didn't see any answer to that question.
Thanks in advance for the clarifications.

@evandcombs

@AlphaWong is correct - the restriction still exists, however in v4 we added detection and prevent the host from starting up in this state as described here.

This error message appeared for me, even though there was no conflict. It seems this gets triggered if there is any truncation. It does not seem to be preventing my apps from running, though. I think it may be more appropriate for this to appear as a warning when there is truncation, but no conflict detected.

On a related note, why is the limit only 32 characters? That seems rather short when basing things on names assigned by humans. I guess this is really all a result of the fatal flaw in Azure where the name of a resource doubles as the ID of the resource. This decision has created a lot of inconveniences within Azure.

@SerlokPK

Running a v4 function, we're still facing the Function App Name Collision Found error when running function configuration diagnostic even though hostid is setup

Same problem here. Is that expected behavior? Should we ignore the message, or is some other action required?

@progmars

Should it be hostId, or is hostid also accepted?

@thibautbrard

Should it be hostId, or is hostid also accepted?

In the App settings reference for Azure Functions documentation, hostid is written in lower case. I highly recommend following that casing to prevent any ambiguity.

I still face the Function App Name Collision Found error message, but my functions are running well with only the following simple warning at startup: Host id explicitly set in configuration. This is not a recommended configuration and may lead to unexpected behavior. (Host.Startup category)

@progmars

@thibautbrard

This is so confusing. The behavior suggests it should be hostId to make the warning go away.

Here's what I did:

  • open Configuration of my Azure function
  • click Advanced Edit
  • make sure the name of the setting is AzureFunctionsWebHost__hostid (change it if it has upper-case I)
  • save the configuration
  • I have Function App Diagnostics opened in another browser tab, so I refresh it
  • the error about 32 character name is present!
  • switch back to the Configuration tab in the browser and rename the setting to AzureFunctionsWebHost__hostId
  • refresh the Diagnostics page
  • the 32-char error is gone!

I repeated it a few times, and it always works this way, AzureFunctionsWebHost__hostId is recognized by Diagnostics, but AzureFunctionsWebHost__hostid is not.

I'm not sure whether it's only Diagnostics that uses hostId with a capital I, or Azure itself.
It would be even more confusing if the Azure infrastructure uses hostid (as written in the article) but Diagnostics checks for hostId.

Who could tell what's actually going on and what should be the name of the setting to make it both work properly and also be recognized by Diagnostics?
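While the casing question is open, it can help to see exactly which spelling an app currently has configured. The sketch below scans a dictionary of app settings (for example, one loaded from an exported settings file) for any name that matches the host ID key when case is ignored, and reports the exact spelling found. The helper name and the sample settings are hypothetical, and the sketch says nothing about which casing the runtime or Diagnostics actually honors.

```python
HOST_ID_SETTING = "AzureFunctionsWebHost__hostid"  # spelling discussed in this thread


def find_host_id_settings(app_settings):
    """Return every setting whose name equals the host ID key, ignoring case."""
    wanted = HOST_ID_SETTING.lower()
    return {
        name: value
        for name, value in app_settings.items()
        if name.lower() == wanted
    }


# Hypothetical app settings for illustration only.
settings = {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureFunctionsWebHost__hostId": "1835d7b55c984790815d072cc94c6f71",
}

for name, value in find_host_id_settings(settings).items():
    print(f"{name} = {value} ({len(value)} characters)")
```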

@thibautbrard

@progmars that's really interesting information!
From my observation, hostid (lower-case) is interpreted correctly, as it triggers the warning message at startup, but we still get the error in the Diagnose and solve problems component. We assumed it was only doing a static check of the function name length and nothing else. I wish we could get rid of this message...

I've just done a test with hostId and it actually removes the error message in diagnostics, as you said. I still get the warning Host id explicitly set in configuration at function startup, but that does not mean that it works (and unfortunately I can't test it right now).
All of this is really confusing...

@mathewc can you confirm that both hostid and hostId work?
Would it be possible to clarify the casing in the documentation here and here, as only hostId prevents the error message from being displayed in diagnostics?

Many thanks

@jassent

jassent commented Sep 18, 2022

Having a conflicting hostId due to the name being truncated also caused this error for me:
Azure/azure-functions-dotnet-worker#747

2022-09-18T10:24:19.474 [Information] Host initialized (208ms)
2022-09-18T10:24:19.481 [Information] Host started (219ms)
2022-09-18T10:24:19.481 [Information] Job host started
2022-09-18T10:24:19.594 [Information] HttpOptions{"DynamicThrottlesEnabled": false,"EnableChunkedRequestBinding": false,"MaxConcurrentRequests": -1,"MaxOutstandingRequests": -1,"RoutePrefix": "api"}
2022-09-18T10:24:19.810 [Information] Stopping JobHost
2022-09-18T10:24:19.812 [Information] Job host stopped
2022-09-18T10:24:19.844 [Error] Failed to start a new language worker for runtime: dotnet-isolated.System.Threading.Tasks.TaskCanceledException : A task was canceled.
at async Microsoft.Azure.WebJobs.Script.Grpc.GrpcWorkerChannel.StartWorkerProcessAsync(CancellationToken cancellationToken)
at /_/src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs : 159
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??) 
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 154
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??) 
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 146
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??) 
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 137
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.<>c__DisplayClass56_0.<StartWorkerProcesses>b__0(??) 
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 229

Setting a unique (and shorter than 32 characters) hostid allowed the application to start.

This is frustrating because, as noted above, the default app names generated by Visual Studio mean that apps can't be successfully deployed to any slot without making this change.

@scp-mb

scp-mb commented Nov 28, 2022

Pretty much the same as above. Seeing the job host started then immediately stopped again, leading to TaskCanceledException in our functions.

Finding the root cause was an absolute pain, because an error about host ID collisions is supposed to be logged, but nothing was ever logged about it. The host just shuts down and starts up again immediately in a loop, with short-lived functions managing to complete a small amount of work before the host is killed again.

@scp-mb

scp-mb commented Nov 29, 2022

I'll also point out that if the function host is going to be shut down it should be done before functions start to run, instead of cancelling them mid-run

@moneygit

I also need clarification on the case for host ID.
AzureFunctionsWebHost__hostid
or
AzureFunctionsWebHost__hostId

The documentation says all lower case, but then the App Service diagnostics flag it as a risk; changing it to hostId clears the alert.

@asalvo

asalvo commented Dec 29, 2022

We are deploying a Python application using a custom Docker container to Azure Functions. I can confirm with absolute certainty that AzureFunctionsWebHost__hostId (uppercase I) caused our function to fail with logs similar to this comment (except mentioning python).

As soon as we changed to AzureFunctionsWebHost__hostid (lower case i) our function app started working.

With either hostId or hostid, we get the log message "Host id explicitly set in configuration. This is not a recommended configuration and may lead to unexpected behavior." as expected. With hostId, though, we never got an error logged that there was a hostId collision. A few log lines later, however, it says Starting Host (HostId=company1234-trm-prod01-un1-fa-ft (rest of log line omitted), which is a truncated name.

With hostId we do NOT get the diagnostics error. With hostid we do get the diagnostics error. This leads me to believe that the logic for the diagnostics error is incorrect (case insensitive when it needs to be case sensitive).

@graemefoster

Same issue with not being able to find the error. As a customer, all I see in my logs is the function host restarting. Can we surface this error in the standard host trace output?

@miqm

miqm commented Feb 20, 2023

Same here, no error in logs, our readiness probe pinging admin/host/ping just stopped responding after ~10 minutes and that caused AKS to restart the pod. An error, on a critical level, would be nice.

@graemefoster

I'll also point out that if the function host is going to be shut down it should be done before functions start to run, instead of cancelling them mid-run

100% agree - I seem to be able to get the function runtime into a real panic because of this coupled with a timer trigger. I understand that fixing the host ID is the right thing to do, but can we make this check happen before functions kick in?

@paulyuk
Member

paulyuk commented Apr 18, 2023

@fabiocav could we please bump this one up the list? @eamonoreilly fyi

@slampunk

Spent an entire week trying to track down the root cause of this problem. Very disappointed that nothing pops up in the logs about collisions. This is a known 6-year-old issue; I expect more, even if it's simply a log entry listing potential causes.

@ddaniels-andmore

This is a PITA and it's been 6 years now since the issue was identified. Any update on the ETA / Roadmap?

@akirayamamoto

We just got hit by this issue. Do you have any updates?

@madmahii24

madmahii24 commented Jan 16, 2024

I suddenly started facing this issue yesterday. We have v4 function apps deployed on a Windows Elastic plan.

I have function apps whose names are 43 characters long and differ only in the trailing part of the name,

like abc-defg-abcdefghijklmnopqrstuv-v1-eus & abc-defg-abcdefghijklmnopqrstuv-v2-eus

I already had the following app setting in the Function App configuration:
"AzureFunctionsWebHost__hostId" with a GUID-generated host ID.
The same configuration had been working for the last 6 months, and since yesterday it has started failing (the function apps stop and start intermittently).

I have tried the following changes:

  1. AzureFunctionsWebHost__hostId -> AzureFunctionsWebHost__hostid
  2. Removed hyphens (-) from GUID
  3. Added AzureFunctionsWebHost:hostId (upper I)
  4. Added AzureFunctionsWebHost:hostid (lower i)
  5. Added FUNCTIONS_HOSTID_CHECK_LEVEL with value 'Warning'

The host ID collision error is gone from the Diagnose and solve problems screen, but the issue isn't resolved. The function apps still stop and start intermittently.

The only solution that worked for me was to deploy a different function app with a shortened name (as we cannot rename the function app).
However, this is not the solution I was expecting, because I have 26 function apps and now I need to remove, rename, and redeploy them all. I raised a ticket with Microsoft support but have had no response since yesterday.

@jassent

jassent commented Jan 16, 2024

Any chance it is a scaling/instance issue? I.e. you actually have multiple of the same hostid in use?

@marcelaction

@madmahii24 any response from Microsoft support? Running into the same issue.

@mkeeney-robinson

This is still an issue in 2024.

@madmahii24

@marcelaction MS support was not able to help. The only suggestion I got was to rename the function apps so that their names are shorter than 32 characters.

@madmahii24

@jassent I don't think it is a scaling issue.
If scaling is enabled on the function apps and host IDs are being generated on the back end, then managing those host IDs is a separate issue.

@charlie-swing

Also just ran into this issue. It's taken 4 months for my team to even discover this issue. If this is not going to be fixed anytime soon, could at least an alert be displayed in the portal for this? This doesn't seem like it would be too hard to implement.
