UTF-8 locale #483
I also met the same problem and a temporary workaround is to manually download result files via the files panel on the left. |
I don't seem to see a results file, and when I try to download the prediction that it says is the best one, it doesn't actually download to my computer. |
Same here, it won't save to mine either. Have you found a solution yet? |
This error seems to appear only when conda is installed. Conda seems to be interfering with Google Colab's |
Just checking in as well, I'm having the same issue. I do have conda installed, I don't want to uninstall just for this. I can't download, and I also can't seem to mount my google drive as a workaround. Has anyone figured out how to grab their PDB? |
Hi all, sorry I haven't been replying. Life has gotten VERY busy at our lab now that it's summer time. The workaround I found: after a run is complete and the error code is shown, connect to a hosted runtime (top right corner); once it connects, open "Runtime" in the top menu and press "Restart and run all". It should finish relatively quickly and bring up the prediction in ChimeraX. Let me know if it works for any of you! |
Thanks for your response, I really appreciate it! Yes this worked, but only the second time. I ran again and got the same error, then I tried this method. When I ran again, it just started over and took another hour and threw the same error. Then I tried the same thing one more time, and then it ran for 1 minute and dropped the prediction into ChimeraX. Thanks for the fix! |
Thanks for the suggestion but it's still not running for me. I get this: |
Could you try adding this code just before the problematic line is called?

```python
import os
del os.environ['LC_ALL']
```

Also, is this issue happening in the AlphaFold Colab, or is it in ColabFold? |
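For readers trying this suggestion, a slightly more defensive variant (a sketch, not the notebook's actual code) removes the variable only if it is present and then checks which encoding Python reports:

```python
import locale
import os

# os.environ.pop with a default avoids a KeyError when LC_ALL
# is not set, unlike a bare `del os.environ['LC_ALL']`.
os.environ.pop('LC_ALL', None)

# The Colab shell magic fails when this reports ASCII
# (ANSI_X3.4-1968) rather than UTF-8.
print(locale.getpreferredencoding())
```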
It's the AlphaFold Colab. |
Dear Augustin,
Thank you for your email. In the "Run AlphaFold and download prediction" code below, where should I add the extra code (import os; del os.environ['LC_ALL'])? Could you insert it in the code and I'll try re-running the job?
Best and thanks,
Eric
Dr Eric Belfield,
Department of Biology,
University of Oxford,
South Parks Road,
Oxford,
OX1 3RB,
UK
01865 275000
```python
#@title 5. Run AlphaFold and download prediction

#@markdown Once this cell has been executed, a zip-archive with
#@markdown the obtained prediction will be automatically downloaded
#@markdown to your computer.

#@markdown In case you are having issues with the relaxation stage, you can disable it below.
#@markdown Warning: This means that the prediction might have distracting
#@markdown small stereochemical violations.

run_relax = True  #@param {type:"boolean"}

# --- Run the model ---
if model_type_to_use == notebook_utils.ModelType.MONOMER:
  model_names = config.MODEL_PRESETS['monomer'] + ('model_2_ptm',)
elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
  model_names = config.MODEL_PRESETS['multimer']

output_dir = 'prediction'
os.makedirs(output_dir, exist_ok=True)

plddts = {}
ranking_confidences = {}
pae_outputs = {}
unrelaxed_proteins = {}

with tqdm.notebook.tqdm(total=len(model_names) + 1, bar_format=TQDM_BAR_FORMAT) as pbar:
  for model_name in model_names:
    pbar.set_description(f'Running {model_name}')

    cfg = config.model_config(model_name)
    if model_type_to_use == notebook_utils.ModelType.MONOMER:
      cfg.data.eval.num_ensemble = 1
    elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
      cfg.model.num_ensemble_eval = 1
    params = data.get_model_haiku_params(model_name, './alphafold/data')
    model_runner = model.RunModel(cfg, params)
    processed_feature_dict = model_runner.process_features(np_example, random_seed=0)
    prediction = model_runner.predict(processed_feature_dict, random_seed=random.randrange(sys.maxsize))

    mean_plddt = prediction['plddt'].mean()

    if model_type_to_use == notebook_utils.ModelType.MONOMER:
      if 'predicted_aligned_error' in prediction:
        pae_outputs[model_name] = (prediction['predicted_aligned_error'],
                                   prediction['max_predicted_aligned_error'])
      else:
        # Monomer models are sorted by mean pLDDT. Do not put monomer pTM models here as they
        # should never get selected.
        ranking_confidences[model_name] = prediction['ranking_confidence']
        plddts[model_name] = prediction['plddt']
    elif model_type_to_use == notebook_utils.ModelType.MULTIMER:
      # Multimer models are sorted by pTM+ipTM.
      ranking_confidences[model_name] = prediction['ranking_confidence']
      plddts[model_name] = prediction['plddt']
      pae_outputs[model_name] = (prediction['predicted_aligned_error'],
                                 prediction['max_predicted_aligned_error'])

    # Set the b-factors to the per-residue plddt.
    final_atom_mask = prediction['structure_module']['final_atom_mask']
    b_factors = prediction['plddt'][:, None] * final_atom_mask
    unrelaxed_protein = protein.from_prediction(
        processed_feature_dict,
        prediction,
        b_factors=b_factors,
        remove_leading_feature_dimension=(
            model_type_to_use == notebook_utils.ModelType.MONOMER))
    unrelaxed_proteins[model_name] = unrelaxed_protein

    # Delete unused outputs to save memory.
    del model_runner
    del params
    del prediction
    pbar.update(n=1)

  # --- AMBER relax the best model ---
  # Find the best model according to the mean pLDDT.
  best_model_name = max(ranking_confidences.keys(), key=lambda x: ranking_confidences[x])

  if run_relax:
    pbar.set_description(f'AMBER relaxation')
    amber_relaxer = relax.AmberRelaxation(
        max_iterations=0,
        tolerance=2.39,
        stiffness=10.0,
        exclude_residues=[],
        max_outer_iterations=3,
        use_gpu=True)
    relaxed_pdb, _, _ = amber_relaxer.process(prot=unrelaxed_proteins[best_model_name])
  else:
    print('Warning: Running without the relaxation stage.')
    relaxed_pdb = protein.to_pdb(unrelaxed_proteins[best_model_name])
  pbar.update(n=1)  # Finished AMBER relax.

# Construct multiclass b-factors to indicate confidence bands:
# 0=very low, 1=low, 2=confident, 3=very high.
banded_b_factors = []
for plddt in plddts[best_model_name]:
  for idx, (min_val, max_val, _) in enumerate(PLDDT_BANDS):
    if plddt >= min_val and plddt <= max_val:
      banded_b_factors.append(idx)
      break
banded_b_factors = np.array(banded_b_factors)[:, None] * final_atom_mask
to_visualize_pdb = utils.overwrite_b_factors(relaxed_pdb, banded_b_factors)

# Write out the prediction.
pred_output_path = os.path.join(output_dir, 'selected_prediction.pdb')
with open(pred_output_path, 'w') as f:
  f.write(relaxed_pdb)

# --- Visualise the prediction & confidence ---
show_sidechains = True

def plot_plddt_legend():
  """Plots the legend for pLDDT."""
  thresh = ['Very low (pLDDT < 50)',
            'Low (70 > pLDDT > 50)',
            'Confident (90 > pLDDT > 70)',
            'Very high (pLDDT > 90)']
  colors = [x[2] for x in PLDDT_BANDS]
  plt.figure(figsize=(2, 2))
  for c in colors:
    plt.bar(0, 0, color=c)
  plt.legend(thresh, frameon=False, loc='center', fontsize=20)
  plt.xticks([])
  plt.yticks([])
  ax = plt.gca()
  ax.spines['right'].set_visible(False)
  ax.spines['top'].set_visible(False)
  ax.spines['left'].set_visible(False)
  ax.spines['bottom'].set_visible(False)
  plt.title('Model Confidence', fontsize=20, pad=20)
  return plt

# Show the structure coloured by chain if the multimer model has been used.
if model_type_to_use == notebook_utils.ModelType.MULTIMER:
  multichain_view = py3Dmol.view(width=800, height=600)
  multichain_view.addModelsAsFrames(to_visualize_pdb)
  multichain_style = {'cartoon': {'colorscheme': 'chain'}}
  multichain_view.setStyle({'model': -1}, multichain_style)
  multichain_view.zoomTo()
  multichain_view.show()

# Color the structure by per-residue pLDDT.
color_map = {i: bands[2] for i, bands in enumerate(PLDDT_BANDS)}
view = py3Dmol.view(width=800, height=600)
view.addModelsAsFrames(to_visualize_pdb)
style = {'cartoon': {'colorscheme': {'prop': 'b', 'map': color_map}}}
if show_sidechains:
  style['stick'] = {}
view.setStyle({'model': -1}, style)
view.zoomTo()

grid = GridspecLayout(1, 2)
out = Output()
with out:
  view.show()
grid[0, 0] = out

out = Output()
with out:
  plot_plddt_legend().show()
grid[0, 1] = out

display.display(grid)

# Display pLDDT and predicted aligned error (if output by the model).
if pae_outputs:
  num_plots = 2
else:
  num_plots = 1

plt.figure(figsize=[8 * num_plots, 6])
plt.subplot(1, num_plots, 1)
plt.plot(plddts[best_model_name])
plt.title('Predicted LDDT')
plt.xlabel('Residue')
plt.ylabel('pLDDT')

if num_plots == 2:
  plt.subplot(1, 2, 2)
  pae, max_pae = list(pae_outputs.values())[0]
  plt.imshow(pae, vmin=0., vmax=max_pae, cmap='Greens_r')
  plt.colorbar(fraction=0.046, pad=0.04)

  # Display lines at chain boundaries.
  best_unrelaxed_prot = unrelaxed_proteins[best_model_name]
  total_num_res = best_unrelaxed_prot.residue_index.shape[-1]
  chain_ids = best_unrelaxed_prot.chain_index
  for chain_boundary in np.nonzero(chain_ids[:-1] - chain_ids[1:]):
    if chain_boundary.size:
      plt.plot([0, total_num_res], [chain_boundary, chain_boundary], color='red')
      plt.plot([chain_boundary, chain_boundary], [0, total_num_res], color='red')

  plt.title('Predicted Aligned Error')
  plt.xlabel('Scored residue')
  plt.ylabel('Aligned residue')

# Save the predicted aligned error (if it exists).
pae_output_path = os.path.join(output_dir, 'predicted_aligned_error.json')
if pae_outputs:
  # Save predicted aligned error in the same format as the AF EMBL DB.
  pae_data = notebook_utils.get_pae_json(pae=pae, max_pae=max_pae.item())
  with open(pae_output_path, 'w') as f:
    f.write(pae_data)

# --- Download the predictions ---
!zip -q -r {output_dir}.zip {output_dir}
files.download(f'{output_dir}.zip')
```
|
This is more of a Google Colab issue that occurs when creating the output_dir zip file from the output_dir folder. I solved it by using shutil to create the zip file and added this fix in a pull request here: #672 |
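A shutil-based replacement for the `!zip` shell magic might look like this (a sketch of the approach, not necessarily the exact code in the PR). shutil.make_archive runs in pure Python, so it bypasses the locale-sensitive shell entirely:

```python
import os
import shutil

output_dir = 'prediction'
os.makedirs(output_dir, exist_ok=True)  # the notebook creates this earlier

# Equivalent to `!zip -q -r prediction.zip prediction`: base_dir keeps
# the top-level `prediction/` folder inside the archive, like `zip -r`.
archive_path = shutil.make_archive(output_dir, 'zip',
                                   root_dir='.', base_dir=output_dir)
```

The notebook's files.download call can then be pointed at archive_path instead of the f-string.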
This problem is caused by Python somehow being switched from the default UTF-8 text encoding to ANSI_X3.4-1968 (the technical name for the ASCII text encoding). Google Colab shell magic (a leading "!" to run shell commands from Python) gives the reported error if the encoding is not UTF-8. The switch from UTF-8 to ASCII happens when OpenMM energy minimization is run by AlphaFold; I am not sure how OpenMM causes that switch. Usually the encoding is controlled by environment variables such as LANG or LC_ALL, and the settings of those are not changed when the error happens.

This bug has been reported for ColabFold run on Google Colab and also for ChimeraX AlphaFold predictions run on Google Colab (https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/8313). I debugged the ChimeraX case but was not able to find the underlying cause. The Python method locale.getpreferredencoding() returns ANSI_X3.4-1968 when the error occurs but UTF-8 when there is no error. In Python 3.8 that routine uses _locale.nl_langinfo(_locale.CODESET), a call into C code that uses the nl_langinfo(CODESET) C library call. I did hours of testing and could not figure out why the C library call does not report UTF-8; details are in the above ChimeraX ticket. Ultimately I put in a very ugly workaround, monkey patching _locale.nl_langinfo(CODESET) to report UTF-8 in the ChimeraX AlphaFold code, a horrible solution.

The suggested fix in the Dec 30, 2022 comment by gmihaila of replacing the !zip shell magic works the first time AlphaFold is run, but other uses of shell magic then break if you do another run in the same Google Colab session. Another run will also create output files with the default ASCII encoding, which causes failures (e.g. in ColabFold when it tries to write out citations with non-ASCII characters).

A real fix will need to figure out how the text encoding is being changed, or how to reset it to UTF-8 after the OpenMM minimization changes it. |
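For completeness, the monkey patch described above can be sketched as follows (a reconstruction of the idea, not the actual ChimeraX code). It lies about CODESET at the `_locale` level and also rebinds the name inside the already-imported locale module, which binds nl_langinfo at import time:

```python
import _locale
import locale

# Keep a reference to the real C implementation.
_original_nl_langinfo = _locale.nl_langinfo

def _patched_nl_langinfo(item):
    # Report UTF-8 for CODESET regardless of what the C library
    # says; pass every other query through unchanged.
    if item == _locale.CODESET:
        return 'UTF-8'
    return _original_nl_langinfo(item)

# Patch both modules: locale imports nl_langinfo from _locale at
# import time, so patching _locale alone is not enough afterwards.
_locale.nl_langinfo = _patched_nl_langinfo
locale.nl_langinfo = _patched_nl_langinfo
```

As the commenter notes, this papers over the symptom rather than fixing the cause, and newer Python versions may consult other sources when computing the preferred encoding.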
Fixed in 0d9a24b. Thanks for reporting! |
Hello,
I am running an AlphaFold prediction on a flavoenzyme, photolyase. I have had a few successful runs of the program, but suddenly it stopped working and keeps giving me this error whenever I try to run it:
If anyone can provide context or an explanation of how to fix this (I am very much NOT knowledgeable about coding and the nitty-gritty details of AlphaFold), that would be much appreciated :)
Thank you for your time in advance guys!!
Best wishes,
Jared