Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build is dependent on a committed ENV file, and doesn't obey system ENV vars for GQL url #718

Open
mikemurray opened this issue Jul 31, 2020 · 11 comments
Labels
bug For issues that describe a defect or regression in the released software priority high severity high

Comments

@mikemurray
Copy link
Member

Type: critical

This is a blocker for releases

Describe the bug

The BUILD_GRAPHQL_URL and/or EXTERNAL_GRAPHQL_URL env var is being baked into the JS bundle, likely due to the Next.js / webpack / static build. This bakes unchangeable configuration into the app, which leads to the built docker image on docker hub, being useless if you aren't using the default variables.

Part of the solution for enabling the Next.js static build was maintaining a .env.prod to supply those variables at build time for the production build. This env file is an anti-pattern for the following reasons.

  1. Promotes a dangerous practice of committing environment vars to a repo in a plain text file.
  2. Ties configuration to dev/prod when the build should have not care which system it's building for. The built app should run the same if you point it to your dev or production API.

ENV vars should always come from the system, and never be in a file that is committed to source control. The .env and .env.example files are a convenience local development only.

Because of the .env.prod file and the Next.js static build process, we're not able to use our generic Docker image published to Docker hub to run in our test environments, and generic users cannot use it as is without rebuilding.

Expected behavior

  • Do not require a static build in Next.js. This might be related to Optionally allow blocking fallback rendering #716
  • Remove .env.prod and make the build not dependent on that file. It should get its vars from the environment, whether you put them there in a prefix to the command, or cat them into the env, or set them in CI.
  • Fix the issue where the GraphQL urls are being baked into the built JS bundle.
    • Maybe proxy GraphQL requests through a Next.js API route so we can take advantage of system environment vars
    • Somehow make sure config doesn't get bundled
@mikemurray mikemurray added bug For issues that describe a defect or regression in the released software priority high severity high labels Jul 31, 2020
@janus-reith
Copy link
Collaborator

janus-reith commented Aug 3, 2020

I think this should go into two separate issues, as the fact pages are statically generated at build time won't compromise security of environment vars. We don't necessarily need to provide env vars from file with static generation, although those used inside the client bundle as defined in next.config.js would be inlined at build time and not change until maybe the page is revalidated.

Regarding how .env vars could be handled, I brought that up here: #691 (comment) and there: #667 (comment).
The most simple solution would maybe really be to remove the xargs -0 logic in the Dockerfile and pass them one by one to to the build

@janus-reith
Copy link
Collaborator

Regarding the static pages:

As discussed in the slack channel, skipping the render of static pages during build time currently seems to be no possibility.
What could work, is to remove the build-time generated pages at the end of the docker build, while it sounds a bit ridiculous, that would allow the CI run to the test each page during build by building it once.
I know that at least for dynamic pages that should be no issue as it checks for their presence on the filesystem and else fallback-renders them, so keeps no general track of them. Therefore I assume simply deleting them should be safe.
As basically all our pages are under a [lang] path, all of them should be dynamic, so this might work out. Not sure though which role _app.js plays there, I know it recieves props passed to each individual page, therefore I guess as it’s down the three it is reevaluated aswell.

The fact that it still renders each page on build time with the stub data could be seen as an additional test ensuring each page actually renders, although sadly none that can be cleanly skipped like usual test steps and isn’t really separated.
If we want to skip that render altogether, what we could do is, use some env during build indicating that it is build time, use that in static props, pass some bool then instead of relevant shop data, and on the highest level on the client, directly in _app.js, if that bool was passed, return null instead of the actual component.

We could use two env vars that default to true, one to remove build time pages at the end of the build, and one to skip the build time test rendering. (or just couple them into one var) What I would like about this approach, although still a bit long-winded, is that it would still allow vercel deployments, and also still allow to go the build time pages way with these env vars set to false, a way other typical nextjs user might actually expect their deployment step to behave.

We could get a bit more developer friendly this way, e.g. stating something like:
You enabled build-time page generation but missed x/y/z environment variable, please provide it or switch to runtime-mode

@dcrdev
Copy link

dcrdev commented Oct 16, 2020

Has there been any resolution to this, just encountered this when upgrading the images used by our helm chart ?

Have had to rollback for now - whilst most people would opt to build their own images, I feel that being able to use a set of stock images gives people the opportunity to experiment in a quick start sort of fashion.

For me I want to be able to store these values as secrets and be able to regularly rotate them without having to tie deployment phase credential management into ci; anything other than being able to load these values at runtime, breaks this for me.

@janus-reith
Copy link
Collaborator

@dcrdev There's multiple ways how the env vars could be handled outside of the repo, this was an initial way to have something to be able to test production builds, as there was some confusion back then on how this could be handled.
I'm happy to figure out a solution that would work best with the default internal CI flow of the team and could also serve as a default, but can't do this on my own without further feedback.

In the meantime, I'd suggest everyone struggling with this to have a look at the nextjs docs regarding handling of environment variables and static generation, and figure out a way that works best for them.
Feel free to reach out if there are open questions regarding this.

@janus-reith
Copy link
Collaborator

There is a new notFound prop that can be passed in getStaticProps, which will return a 404 page.
During runtime this can be helpful e.g. if you have a dynamic url, and some product for a certain slug does not exist.

I just realized, that this notFound also seems to work during build time, and actually results in that 404 page not being served at all, instead it will revalidate the page right away upon first request, which is great for our usecase.
(With other workarounds, we would still have the initial render coming from build time, and only have the runtime page served on subsequent request).

I made quick codesandbox example to demo this:
https://codesandbox.io/s/fervent-goodall-s2hfu
In order to get the full picture, you might need to fork it on order to restart the server and see the build in action.

@ghost
Copy link

ghost commented Mar 1, 2021

For me I want to be able to store these values as secrets and be able to regularly rotate them without having to tie deployment phase credential management into ci; anything other than being able to load these values at runtime, breaks this for me.

Per Next Docs you're going to want to do something like:

NEXT_PUBLIC_SEGMENT_ANALYTICS_WRITE_KEY=$SEGMENT_ANALYTICS_WRITE_KEY

Where ConfigMap items (not Secrets) are included in the bundle because of the prefix NEXT_PUBLIC_ prefix and the =$FOO part pulls from the current printenv. For your Secrets, keep the names of the vars the same as they were but do the =$FOO thing to pull from env and Next will do its part to ensure they aren't exposed on the client-side.

@ghost
Copy link

ghost commented Mar 1, 2021

@janus-reith could you help us understand your logic behind including items in the env key in next.config.js? Looking at it it feels as though you were trying to hack your way past using the NEXT_PUBLIC_ while at the same time continuing to utilize the validation provided by envalid. Is that correct?

@janus-reith
Copy link
Collaborator

janus-reith commented Mar 1, 2021

@balibebas No, not at all a hack or something, and not really that relevant for the earlier issue outlined above.

The NEXT_PUBLIC_ part does not really make a difference here, we were always able to make them available in the client bundle by adding them here just like we did:


We could start to use the more recent NEXT_PUBLIC_ prefix instead and not explicitly list them in next.config.js, but it doesnt really make a difference. It's no hack or workaround, but simply how you provided env vars to the browser code.
(The nextjs doc also mention that at the top, referencing to the previous docs )
The only thing NEXT_PUBLIC_ adds here is that it is easier to comprehend which vars might land on the client due to their name and that you save some lines of code in your next.config.js.
You can easily verify that by looking at the nextjs codebase, this is the only relevant place where it is used: https://github.com/vercel/next.js/blob/04f37d0978e5fc9939012c1d771ef4e6535e7787/packages/next/build/webpack-config.ts#L974_L988
Same outcome as listing them in config.env, but that being said I'd be open to consider using NEXT_PUBLIC_ and rename some public env variables.

Regarding the actual underlying issues, the logic regarding which env is used where also remains the same, unrelated to that: API functions running on the server will always use the env they currently have and not have any baked in, so that would be that from runtime usually (They shouldnt really be called during build).
Pages not using getServerSideProps (or the deprecated getInitialProps) will be created at build time, so if the about to be static page uses env variables on the client (e.g. for segment) it will be baked in for that page. if it uses getStaticProps, it will also use these env vars provided on build time on the server side but doesnt necessarily need to make it public to the client, like e.g. the graphql for data fetching (Client doesnt know BUILD_GRAPHQL_URL but receives the output of fetching data with it on the server).

getStaticProps however which ran at build time, might also execute during runtime later, either because it has a dynamic path segment and getStaticPaths with fallback: true and creates a new page that's about to be static after it was created, and/or it uses revalidate: AMOUNTINSECONDS.
Still, while the resulting page would be static, that getStaticProps function would not have static env variables baked in, and if it ran on runtime later, it would use whatever is available then.
In the case of revalidate, it would also happen that you had the page generated with somevalue: 123 but provide somevalue: 456 for runtime.
You might also have had a page that uses a static variable from process.env in the client bundle, like that aforementioned segment key which would be staticically written into that file then If that page however might also have getStaticProps and revalidate and would be regenerated on runtime, it would also use whatever segment key is provided then, as the whole pag e is rerendered.

@ghost
Copy link

ghost commented Mar 1, 2021

Perfect, thank you. That's helpful. Considering the major release seems like a good time to make a breaking change with the ENVs. Fancy a pull or are you still waiting for opinions?

@janus-reith
Copy link
Collaborator

janus-reith commented Mar 1, 2021

Perfect, thank you. That's helpful. Considering the major release seems like a good time to make a breaking change with the ENVs. Fancy a pull or are you still waiting for opinions?

I guess I'm not making the decisions, maybe wait for some core team member to chime in.
Maybe it could be done in one go when updating nextjs to v10.
Either way, personally I wouldn't consider it a big priority right now TBH.

@rondlite
Copy link

I am using a dirty fix that still lets me end up with .env.prod on the container so I do not have to change the code,

#!/bin/bash
var=( NEXT_PUBLIC_SEGMENT_ANALYTICS_WRITE_KEY NEXT_PUBLIC_SEGMENT_ANALYTICS_SKIP_MINIMIZE CANONICAL_URL ENABLE_SPA_ROUTING BUILD_GRAPHQL_URL EXTERNAL_GRAPHQL_URL INTERNAL_GRAPHQL_URL OAUTH2_ADMIN_PORT OAUTH2_ADMIN_URL OAUTH2_AUTH_URL OAUTH2_CLIENT_ID OAUTH2_CLIENT_SECRET OAUTH2_PUBLIC_LOGOUT_URL OAUTH2_HOST OAUTH2_IDP_PUBLIC_CHANGE_PASSWORD_URL OAUTH2_IDP_HOST_URL OAUTH2_TOKEN_URL PORT SEGMENT_ANALYTICS_SKIP_MINIMIZE SEGMENT_ANALYTICS_WRITE_KEY SESSION_MAX_AGE_MS SESSION_SECRET STRIPE_PUBLIC_API_KEY)

for v in ${var[@]} ; do
    echo "$v=${!v}" >> ../.env.prod
done

Called this env-populate and placed it in /bin. This file I call at a build step, while I insert the vars from a k8s secret.
Mind you, we are using a private docker hub, don't do this on containers you are publishing as public.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug For issues that describe a defect or regression in the released software priority high severity high
Projects
None yet
Development

No branches or pull requests

6 participants