DROID / IHCC-cohorts-data-harmonization / harm-EB
No remote found
Workflow
The following workflow defines all tasks necessary to upload, preprocess, share, and map a new data dictionary.
- Upload cohort data
- Open Google Sheet
- Run automated mapping for new data dictionary
- Share Google Sheet with submitter
- Prepare data dictionary for build
- Run automated validation
- Build data dictionary
- View results
- Add data dictionary to Version Control
- Prepare git commit (click on Commit in Version menu)
- Push changes to GitHub (click on Push in Version menu), then make a pull request.
- Delete Google Sheet (caution: this cannot be undone)
IHCC Data Admin Tasks
Console
Action automated_mapping started at 2022-12-01T13:05:49.961Z
ERROR: Exit code 2
$ make -f Makefile automated_mapping
make cogs_pull
make[1]: Entering directory '/workspace'
cogs fetch
cogs pull
make[1]: Leaving directory '/workspace'
cp build/terminology.tsv templates/cogs.tsv
make build/suggestions_cogs.tsv
make[1]: Entering directory '/workspace'
python3 src/mapping-suggest/id-generator-templates.py -t templates/cogs.tsv -m build/metadata.tsv
Generating IDs for data dictionary: EB
python3 src/mapping-suggest/mapping-suggest-zooma.py -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -o build/intermediate/cogs_mapping_suggestions_zooma.tsv
Zooma config: {'zooma_annotate': 'http://192.168.0.199:8009/zooma/v2/api/services/annotate?propertyValue=', 'oxo_mapping': 'https://mapping.ihccglobal.app/api/mappings?fromId=', 'ols_term': 'https://registry.ihccglobal.app/api/terms?iri=', 'ols_oboid': 'https://registry.ihccglobal.app/api/terms?obo_id=', 'min_match_probability': 0.1, 'rescale_nlp_matches': {'low': 0, 'high': 0.9}, 'zooma_confidence_mappings': {'LOW': 0.51, 'MEDIUM': 0.76, 'GOOD': 0.98, 'HIGH': 1}}
Zooma matching successful. First two results:
term match confidence
0 Person GECKO:0000066 0.76
1 Person GECKO:0000055 0.76
python3 src/mapping-suggest/mapping-suggest-nlp.py -z data/ihcc-mapping-suggestions-zooma.tsv -c src/mapping-suggest/mapping-suggest-config.yml -t templates/cogs.tsv -g build/intermediate/gecko-xrefs-sparql.csv -o build/intermediate/cogs_mapping_suggestions_nlp.tsv
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/_stochastic_gradient.py:173: FutureWarning: The loss 'log' was deprecated in v1.1 and will be removed in version 1.3. Use `loss='log_loss'` which is equivalent.
warnings.warn(
NLP matching successful. First two results:
term match confidence
0 ObjectiveInformation.weight CMO:0000012 0.838037
1 ObjectiveInformation.bmi CMO:0000021 0.693854
python3 src/mapping-suggest/merge-mapping-suggestions.py -t templates/cogs.tsv -s build/intermediate/cogs_mapping_suggestions_zooma.tsv -s build/intermediate/cogs_mapping_suggestions_nlp.tsv -o build/suggestions_cogs.tsv
['build/intermediate/cogs_mapping_suggestions_zooma.tsv', 'build/intermediate/cogs_mapping_suggestions_nlp.tsv']
Traceback (most recent call last):
File "src/mapping-suggest/merge-mapping-suggestions.py", line 59, in <module>
dfs["Suggested Categories"] = dfs[["confidence", "match", "match_label"]].agg(" ".join, axis=1)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py", line 7586, in aggregate
return self.apply(func, axis=axis, args=args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py", line 7768, in apply
return op.get_result()
File "/usr/local/lib/python3.8/dist-packages/pandas/core/apply.py", line 185, in get_result
return self.apply_standard()
File "/usr/local/lib/python3.8/dist-packages/pandas/core/apply.py", line 276, in apply_standard
results, res_index = self.apply_series_generator()
File "/usr/local/lib/python3.8/dist-packages/pandas/core/apply.py", line 290, in apply_series_generator
results[i] = self.f(v)
TypeError: sequence item 2: expected str instance, float found
make[1]: *** [Makefile:385: build/suggestions_cogs.tsv] Error 1
make[1]: Leaving directory '/workspace'
make: *** [Makefile:423: automated_mapping] Error 2
ERROR: Exit code 2
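
The TypeError above says that item 2 of the joined row was a float, not a string. Item 2 corresponds to the third selected column, match_label, so one plausible cause (an assumption, not confirmed by the log) is a missing label that was read in as NaN, which is a float. The sketch below reproduces the same failure in plain Python and shows one way to join fields defensively. The helper name join_fields and the sample row are hypothetical and are not taken from merge-mapping-suggestions.py.

```python
import math


def join_fields(values, sep=" "):
    """Join row values into one string, skipping missing ones (None or NaN).

    Hypothetical helper for illustration; not part of the IHCC scripts.
    """
    parts = []
    for value in values:
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        # Cast everything else to str so numeric values do not break the join.
        parts.append(str(value))
    return sep.join(parts)


# Hypothetical row in the order confidence, match, match_label,
# with the label missing (NaN), as assumed above.
row = ["0.76", "GECKO:0000066", float("nan")]

# A plain str.join fails on the float, as in the traceback.
try:
    " ".join(row)
except TypeError as err:
    error_message = str(err)

# The defensive join drops the missing label instead of failing.
safe = join_fields(row)
print(error_message)
print(safe)
```

In the pandas script itself, a comparable change might be to fill missing values and cast the three columns to strings before aggregating, for example with fillna("") followed by astype(str). That suggestion is untested against the actual data and is offered only as a starting point for debugging.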