Hello,
I am getting a strange error while running demix on my wastewater samples. First I ran freyja variants with a BAM file where the primers were supposed to have been trimmed using Dagon at Illumina BaseSpace, and then demix, without any problems. Looking at the FASTQ QC results, which showed a very similar noise pattern in base composition in the first 20-22 bp of several samples, I realized that the primers had perhaps not been trimmed. So I trimmed them, and all went well until demix, when I started getting the KeyError below. Freyja was installed through conda with python3.10 on a CentOS server. The files that worked (EGP7*, without ivar primer trimming) and those that caused the error (None*) are attached at the end.
(freyja-env) [ricardo@sgi ~/SARS-COV-2/EG-P7]$ freyja demix None_var.txt None.txt
building mix/depth matrices
Traceback (most recent call last):
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 1320
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/bin/freyja", line 10, in <module>
sys.exit(cli())
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/freyja/_cli.py", line 58, in demix
covcut)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/freyja/sample_deconv.py", line 52, in build_mix_and_depth_arrays
.astype(float) for kI in muts}, name=fn)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/freyja/sample_deconv.py", line 52, in <dictcomp>
.astype(float) for kI in muts}, name=fn)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/pandas/core/indexing.py", line 925, in __getitem__
return self._getitem_tuple(key)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/pandas/core/indexing.py", line 1100, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/pandas/core/indexing.py", line 838, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/pandas/core/indexing.py", line 1164, in _getitem_axis
return self._get_label(key, axis=axis)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/pandas/core/indexing.py", line 1113, in _get_label
return self.obj.xs(label, axis=axis)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/pandas/core/generic.py", line 3776, in xs
loc = index.get_loc(key)
File "/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 1320
I checked the sample_deconv.py script and ran its commands manually with the data from the None* files. Executing line 51 produced the same KeyError: 1320.
import pandas as pd
import re
import numpy as np
df = pd.read_csv("EGP7_trimmed_vars.tsv", sep='\t')
df_depth = pd.read_csv("EGP7_trimmed_dep.txt", sep='\t', header=None, index_col=1)
df_barcodes = pd.read_csv("/STORAGE/ricardo/anaconda3/envs/freyja-env/lib/python3.10/site-packages/freyja/data/usher_barcodes.csv", index_col=0)
df['mutName'] = df['REF'] + df['POS'].astype(str) + df['ALT']
df = df.drop_duplicates(subset='mutName')
df.set_index('mutName', inplace=True)
keptInds = set(muts) & set(df.index)
mix = df.loc[list(keptInds), 'ALT_FREQ'].astype(float)
mix.name = "EGP7_trimmed"
depths = pd.Series({kI: df_depth.loc[int(re.findall(r'\d+', kI)[0]), 3]
.astype(float) for kI in muts}, name=fn)
Traceback (most recent call last):
File "/STORAGE/ricardo/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 1320
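The KeyError suggests that position 1320 is simply absent from the index of the depth table. Below is a minimal sketch to check which positions are missing, using a toy depth table in place of the real file. The helper name `missing_depth_positions` and the toy values are illustrative only and are not part of Freyja.

```python
import re
import pandas as pd


def missing_depth_positions(muts, depth_index):
    """Return the sorted genome positions named in muts that are absent from depth_index."""
    # Extract the first run of digits from each mutation name, e.g. 'C1320T' -> 1320
    wanted = {int(re.findall(r'\d+', m)[0]) for m in muts}
    return sorted(wanted - set(depth_index))


# Toy depth table with the same layout as the one read above:
# no header, column 1 (position) used as the index, column 3 holding the depth.
df_depth = pd.DataFrame({
    0: ['ref', 'ref', 'ref'],
    1: [1318, 1319, 1321],   # position 1320 is deliberately left out
    2: ['A', 'C', 'G'],
    3: [10, 12, 9],
}).set_index(1)

# Hypothetical mutation names standing in for the barcode columns
muts = ['A1319G', 'C1320T', 'G1321A']

missing = missing_depth_positions(muts, df_depth.index)
print(missing)
```

With the real data, `df_depth.index` and the actual list of barcode mutations would be passed in instead of the toy values.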
Changing the line from:
depths = pd.Series({kI: df_depth.loc[int(re.findall(r'\d+', kI)[0]), 3]
.astype(float) for kI in muts}, name=fn)
to the following (as the mutations are already defined in mix):
depths = pd.Series({kI: df_depth.loc[int(re.findall(r'\d+', kI)[0]), 3]
.astype(float) for kI in mix.index}, name=fn)
made it work for the None* files. However, when I ran freyja demix with this modification on the EGP7* files, the results, although similar, were not exactly the same, so I am not sure whether the results after the modification can be trusted.
These are the results without the modification:
EGP7_f.tsv
summarized [('Omicron', 0.4644832999962632), ('Gamma', 0.3888890000000889), ('BA.5* [Omicron (BA.5.X)]', 0.08097169999519067), ('Other', 0.06465799598341625)]
lineages P.1 BA.2.23 XAH BA.5 BA.5.2.26 BF.6 BA.5.2.1 B.1.401 B.1.67 B.1.395 B.1.609 B.1.1 B.1.399 B.1 B.1.1.189 B.1.1.148 B.1.1.210 B.1.416.1 B.1.1.154 B.1.1.125
abundances 0.38888900 0.35424400 0.11023930 0.03866400 0.02592590 0.00819090 0.00819090 0.00519302 0.00519302 0.00519302 0.00519302 0.00519302 0.00519302 0.00519302 0.00519302 0.00519302 0.00519302 0.00519302 0.00409836 0.00343643
resid 15.014286271912404
coverage 73.62137578169414
And these are the results after the modification:
EGP7_f.tsv
summarized [('Omicron', 0.5301393001361678), ('Gamma', 0.3888889998310014), ('BA.5* [Omicron (BA.5.X)]', 0.07309767998754463), ('BQ.1* [Omicron (BQ.1.X)]', 0.0026041700001017457)]
lineages P.1.7 P.1.7.1 XV XJ XAC XAE XBE BF.11.5 BF.11.3 BF.11.4 BF.11 BF.11.1 BF.11.2 BF.24 BQ.1.4
abundances 0.19444450 0.19444450 0.17830900 0.17830900 0.08676065 0.08676065 0.02592590 0.00705128 0.00705128 0.00705128 0.00705128 0.00705128 0.00705128 0.00486408 0.00260417
resid 12.304529675192697
coverage 73.62137578169414
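One alternative that keeps every barcode mutation, instead of only those in mix.index, would be to look the positions up with `reindex` and fill absent ones with a default depth. This is only a sketch under stated assumptions: the helper `depths_for_mutations` is hypothetical and not part of Freyja, treating a missing position as zero depth is an assumption that depends on how the depth file was produced, and positions in the depth file are assumed to be unique.

```python
import re
import pandas as pd


def depths_for_mutations(muts, df_depth, depth_col=3, fill=0.0):
    """Look up the depth for each mutation's position; absent positions get `fill`."""
    # Extract the first run of digits from each mutation name, e.g. 'C1320T' -> 1320
    positions = [int(re.findall(r'\d+', m)[0]) for m in muts]
    col = df_depth[depth_col].astype(float)
    # reindex yields NaN (rather than raising KeyError) for positions not in the index
    values = col.reindex(positions).fillna(fill).to_numpy()
    return pd.Series(values, index=list(muts))


# Toy depth table: no header, position in column 1 (the index), depth in column 3
df_depth = pd.DataFrame({
    0: ['ref', 'ref', 'ref'],
    1: [1318, 1319, 1321],   # position 1320 is deliberately left out
    2: ['A', 'C', 'G'],
    3: [10, 12, 9],
}).set_index(1)

# Hypothetical mutation names standing in for the barcode columns
muts = ['A1319G', 'C1320T', 'G1321A']

depths = depths_for_mutations(muts, df_depth)
print(depths)
```

I have not verified that this reproduces the unmodified results on the EGP7* files, so it is a starting point for comparison rather than a fix.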
Thank you very much for your time,
Ricardo
EGP7_dp.txt
EGP7_var.txt
None_var.txt
None.txt