UTF-8 locale #483
I also met the same problem and a temporary workaround is to manually download result files via the files panel on the left. |
I don't seem to see a results file, and when I try to download the prediction that it says is the best one, it doesn't actually download to my computer. |
Same here, it won't save to mine either. Have you found a solution yet? |
This error seems to appear only when conda is installed. Conda seems to be interfering with Google Colab's |
Just checking in as well, I'm having the same issue. I do have conda installed, I don't want to uninstall just for this. I can't download, and I also can't seem to mount my google drive as a workaround. Has anyone figured out how to grab their PDB? |
Hi all, sorry I haven't been replying. Life has gotten VERY busy at our lab now that it's summer time. The workaround I found: after a run is complete and the error code is shown, connect to a hosted runtime (top right corner); once it connects, open "Runtime" in the top menu and press "Restart and run all". It should finish relatively quickly and bring up the prediction in ChimeraX. Let me know if it works for any of you! |
Thanks for your response, I really appreciate it! Yes this worked, but only the second time. I ran again and got the same error, then I tried this method. When I ran again, it just started over and took another hour and threw the same error. Then I tried the same thing one more time, and then it ran for 1 minute and dropped the prediction into ChimeraX. Thanks for the fix! |
Thanks for the suggestion but it's still not running for me. I get this: |
Could you try adding this code just before the problematic line is called?

```python
import os
del os.environ['LC_ALL']
```

Also, is this issue happening in the AlphaFold Colab, or is it in ColabFold? |
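For readers trying this suggestion, a slightly more defensive variant (a sketch, not the notebook's actual code) removes the variable only if it is present and then checks which encoding Python reports:

```python
import locale
import os

# os.environ.pop with a default avoids a KeyError when LC_ALL
# is not set, unlike a bare `del os.environ['LC_ALL']`.
os.environ.pop('LC_ALL', None)

# The Colab shell magic fails when this reports ASCII
# (ANSI_X3.4-1968) rather than UTF-8.
print(locale.getpreferredencoding())
```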
It's the AlphaFold Colab. |
Dear Augustin,
Thank you for your email. In the "Run AlphaFold and download prediction" code below, where should I add the extra code (import os; del os.environ['LC_ALL'])? Could you insert it in the code and I'll try re-running the job?
Best and thanks,
Eric
Dr Eric Belfield,
Department of Biology,
University of Oxford,
South Parks Road,
Oxford,
OX1 3RB,
UK
01865 275000
```python
#@title 5. Run AlphaFold and download prediction

#@markdown Once this cell has been executed, a zip-archive with
#@markdown the obtained prediction will be automatically downloaded
#@markdown to your computer.

#@markdown In case you are having issues with the relaxation stage, you can disable it below.
#@markdown Warning: This means that the prediction might have distracting
#@markdown small stereochemical violations.

run_relax = True  #@param {type:"boolean"}

# --- Run the model ---
if model_type_to_use == notebook_utils.ModelType.MONOMER:
  model_names = config.MODEL_PRESETS['monomer'] + ('model_2_ptm',)
elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
  model_names = config.MODEL_PRESETS['multimer']

output_dir = 'prediction'
os.makedirs(output_dir, exist_ok=True)

plddts = {}
ranking_confidences = {}
pae_outputs = {}
unrelaxed_proteins = {}

with tqdm.notebook.tqdm(total=len(model_names) + 1, bar_format=TQDM_BAR_FORMAT) as pbar:
  for model_name in model_names:
    pbar.set_description(f'Running {model_name}')

    cfg = config.model_config(model_name)
    if model_type_to_use == notebook_utils.ModelType.MONOMER:
      cfg.data.eval.num_ensemble = 1
    elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
      cfg.model.num_ensemble_eval = 1
    params = data.get_model_haiku_params(model_name, './alphafold/data')
    model_runner = model.RunModel(cfg, params)
    processed_feature_dict = model_runner.process_features(np_example, random_seed=0)
    prediction = model_runner.predict(processed_feature_dict, random_seed=random.randrange(sys.maxsize))

    mean_plddt = prediction['plddt'].mean()

    if model_type_to_use == notebook_utils.ModelType.MONOMER:
      if 'predicted_aligned_error' in prediction:
        pae_outputs[model_name] = (prediction['predicted_aligned_error'],
                                   prediction['max_predicted_aligned_error'])
      else:
        # Monomer models are sorted by mean pLDDT. Do not put monomer pTM models here as they
        # should never get selected.
        ranking_confidences[model_name] = prediction['ranking_confidence']
        plddts[model_name] = prediction['plddt']
    elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
      # Multimer models are sorted by pTM+ipTM.
      ranking_confidences[model_name] = prediction['ranking_confidence']
      plddts[model_name] = prediction['plddt']
      pae_outputs[model_name] = (prediction['predicted_aligned_error'],
                                 prediction['max_predicted_aligned_error'])

    # Set the b-factors to the per-residue plddt.
    final_atom_mask = prediction['structure_module']['final_atom_mask']
    b_factors = prediction['plddt'][:, None] * final_atom_mask
    unrelaxed_protein = protein.from_prediction(
        processed_feature_dict,
        prediction,
        b_factors=b_factors,
        remove_leading_feature_dimension=(
            model_type_to_use == notebook_utils.ModelType.MONOMER))
    unrelaxed_proteins[model_name] = unrelaxed_protein

    # Delete unused outputs to save memory.
    del model_runner
    del params
    del prediction
    pbar.update(n=1)

  # --- AMBER relax the best model ---
  # Find the best model according to the mean pLDDT.
  best_model_name = max(ranking_confidences.keys(), key=lambda x: ranking_confidences[x])

  if run_relax:
    pbar.set_description(f'AMBER relaxation')
    amber_relaxer = relax.AmberRelaxation(
        max_iterations=0,
        tolerance=2.39,
        stiffness=10.0,
        exclude_residues=[],
        max_outer_iterations=3,
        use_gpu=True)
    relaxed_pdb, _, _ = amber_relaxer.process(prot=unrelaxed_proteins[best_model_name])
  else:
    print('Warning: Running without the relaxation stage.')
    relaxed_pdb = protein.to_pdb(unrelaxed_proteins[best_model_name])
  pbar.update(n=1)  # Finished AMBER relax.

# Construct multiclass b-factors to indicate confidence bands:
# 0=very low, 1=low, 2=confident, 3=very high.
banded_b_factors = []
for plddt in plddts[best_model_name]:
  for idx, (min_val, max_val, _) in enumerate(PLDDT_BANDS):
    if plddt >= min_val and plddt <= max_val:
      banded_b_factors.append(idx)
      break
banded_b_factors = np.array(banded_b_factors)[:, None] * final_atom_mask
to_visualize_pdb = utils.overwrite_b_factors(relaxed_pdb, banded_b_factors)

# Write out the prediction.
pred_output_path = os.path.join(output_dir, 'selected_prediction.pdb')
with open(pred_output_path, 'w') as f:
  f.write(relaxed_pdb)

# --- Visualise the prediction & confidence ---
show_sidechains = True

def plot_plddt_legend():
  """Plots the legend for pLDDT."""
  thresh = ['Very low (pLDDT < 50)',
            'Low (70 > pLDDT > 50)',
            'Confident (90 > pLDDT > 70)',
            'Very high (pLDDT > 90)']
  colors = [x[2] for x in PLDDT_BANDS]
  plt.figure(figsize=(2, 2))
  for c in colors:
    plt.bar(0, 0, color=c)
  plt.legend(thresh, frameon=False, loc='center', fontsize=20)
  plt.xticks([])
  plt.yticks([])
  ax = plt.gca()
  ax.spines['right'].set_visible(False)
  ax.spines['top'].set_visible(False)
  ax.spines['left'].set_visible(False)
  ax.spines['bottom'].set_visible(False)
  plt.title('Model Confidence', fontsize=20, pad=20)
  return plt

# Show the structure coloured by chain if the multimer model has been used.
if model_type_to_use == notebook_utils.ModelType.MULTIMER:
  multichain_view = py3Dmol.view(width=800, height=600)
  multichain_view.addModelsAsFrames(to_visualize_pdb)
  multichain_style = {'cartoon': {'colorscheme': 'chain'}}
  multichain_view.setStyle({'model': -1}, multichain_style)
  multichain_view.zoomTo()
  multichain_view.show()

# Color the structure by per-residue pLDDT.
color_map = {i: bands[2] for i, bands in enumerate(PLDDT_BANDS)}
view = py3Dmol.view(width=800, height=600)
view.addModelsAsFrames(to_visualize_pdb)
style = {'cartoon': {'colorscheme': {'prop': 'b', 'map': color_map}}}
if show_sidechains:
  style['stick'] = {}
view.setStyle({'model': -1}, style)
view.zoomTo()

grid = GridspecLayout(1, 2)
out = Output()
with out:
  view.show()
grid[0, 0] = out

out = Output()
with out:
  plot_plddt_legend().show()
grid[0, 1] = out

display.display(grid)

# Display pLDDT and predicted aligned error (if output by the model).
if pae_outputs:
  num_plots = 2
else:
  num_plots = 1

plt.figure(figsize=[8 * num_plots, 6])
plt.subplot(1, num_plots, 1)
plt.plot(plddts[best_model_name])
plt.title('Predicted LDDT')
plt.xlabel('Residue')
plt.ylabel('pLDDT')

if num_plots == 2:
  plt.subplot(1, 2, 2)
  pae, max_pae = list(pae_outputs.values())[0]
  plt.imshow(pae, vmin=0., vmax=max_pae, cmap='Greens_r')
  plt.colorbar(fraction=0.046, pad=0.04)

  # Display lines at chain boundaries.
  best_unrelaxed_prot = unrelaxed_proteins[best_model_name]
  total_num_res = best_unrelaxed_prot.residue_index.shape[-1]
  chain_ids = best_unrelaxed_prot.chain_index
  for chain_boundary in np.nonzero(chain_ids[:-1] - chain_ids[1:]):
    if chain_boundary.size:
      plt.plot([0, total_num_res], [chain_boundary, chain_boundary], color='red')
      plt.plot([chain_boundary, chain_boundary], [0, total_num_res], color='red')

  plt.title('Predicted Aligned Error')
  plt.xlabel('Scored residue')
  plt.ylabel('Aligned residue')

# Save the predicted aligned error (if it exists).
pae_output_path = os.path.join(output_dir, 'predicted_aligned_error.json')
if pae_outputs:
  # Save predicted aligned error in the same format as the AF EMBL DB.
  pae_data = notebook_utils.get_pae_json(pae=pae, max_pae=max_pae.item())
  with open(pae_output_path, 'w') as f:
    f.write(pae_data)

# --- Download the predictions ---
!zip -q -r {output_dir}.zip {output_dir}
files.download(f'{output_dir}.zip')
```
|
This is more of a Google Colab issue that occurs when creating the output_dir zip file from the output_dir folder. I solved it by using shutil to create the zip file and added this fix in a pull request here: #672 |
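A shutil-based replacement for the `!zip` shell magic might look like this (a sketch of the approach, not necessarily the exact code in the PR). shutil.make_archive runs in pure Python, so it bypasses the locale-sensitive shell entirely:

```python
import os
import shutil

output_dir = 'prediction'
os.makedirs(output_dir, exist_ok=True)  # the notebook creates this earlier

# Equivalent to `!zip -q -r prediction.zip prediction`: base_dir keeps
# the top-level `prediction/` folder inside the archive, like `zip -r`.
archive_path = shutil.make_archive(output_dir, 'zip',
                                   root_dir='.', base_dir=output_dir)
```

The notebook's files.download call can then be pointed at archive_path instead of the f-string.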
This problem is caused by Python somehow being switched from the default UTF-8 text encoding to ANSI_X3.4-1968 (the technical name for the ASCII text encoding). Google Colab shell magic (a leading "!" to run shell commands from Python) gives the reported error if the encoding is not UTF-8. The switch from UTF-8 to ASCII happens when OpenMM energy minimization is run by AlphaFold; I am not sure how OpenMM causes that switch. Usually the encoding is controlled by environment variables such as LANG or LC_ALL, and the settings of those are not changed when the error happens.

This bug has been reported for ColabFold run on Google Colab and also for ChimeraX AlphaFold predictions run on Google Colab (https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/8313). I debugged the ChimeraX case but was not able to find the underlying cause. The Python method locale.getpreferredencoding() returns ANSI_X3.4-1968 when the error occurs but UTF-8 when there is no error. In Python 3.8 that routine uses _locale.nl_langinfo(_locale.CODESET), a call into C code that uses the nl_langinfo(CODESET) C library call. I did hours of testing and could not figure out why the C library call does not report UTF-8; details are in the above ChimeraX ticket. Ultimately I put in a very ugly workaround, monkey patching _locale.nl_langinfo(CODESET) to report UTF-8 in the ChimeraX AlphaFold code, a horrible solution.

The suggested fix in the Dec 30, 2022 comment by gmihaila of replacing the !zip shell magic works the first time AlphaFold is run, but other uses of shell magic then break if you do another run in the same Google Colab session. Another run will also create output files with the default ASCII encoding, which causes failures (e.g. in ColabFold when it tries to write out citations with non-ASCII characters).

A real fix will need to figure out how the text encoding is being changed, or how to reset it to UTF-8 after the OpenMM minimization changes it. |
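For completeness, the monkey patch described above can be sketched as follows (a reconstruction of the idea, not the actual ChimeraX code). It lies about CODESET at the `_locale` level and also rebinds the name inside the already-imported locale module, which binds nl_langinfo at import time:

```python
import _locale
import locale

# Keep a reference to the real C implementation.
_original_nl_langinfo = _locale.nl_langinfo

def _patched_nl_langinfo(item):
    # Report UTF-8 for CODESET regardless of what the C library
    # says; pass every other query through unchanged.
    if item == _locale.CODESET:
        return 'UTF-8'
    return _original_nl_langinfo(item)

# Patch both modules: locale imports nl_langinfo from _locale at
# import time, so patching _locale alone is not enough afterwards.
_locale.nl_langinfo = _patched_nl_langinfo
locale.nl_langinfo = _patched_nl_langinfo
```

As the commenter notes, this papers over the symptom rather than fixing the cause, and newer Python versions may consult other sources when computing the preferred encoding.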
Fixed in 0d9a24b. Thanks for reporting! |
Hello,
I am running an AlphaFold prediction on a flavoenzyme, photolyase. I have had a few successful runs of the program, but suddenly it stopped working and keeps giving me this error whenever I try to run it:
If anyone can provide context or an explanation of how to fix this (I am very much NOT knowledgeable about coding and the nitty-gritty details of AlphaFold), that would be much appreciated :)
Thank you for your time in advance guys!!
Best wishes,
Jared