Changes

#89 (Apr 16, 2024 2:23:45 PM)

  1. [BETA] added the deployment instruction in the jenkins file for beta related to the creation of the action set to extend the results with the open apc transformative agreement file information — Miriam Baglioni / detail
  2. [BETA] added the deployment instruction in the jenkins file for beta related to the creation of the action set to include the results tagged with FoS without a doi — Miriam Baglioni / detail
  3. [DUMP] added Jenkins file for the deployment of the dumps — Miriam Baglioni / detail

#89 (Apr 16, 2024 2:23:45 PM)

  1. Generate tables with parquet-files, instead of csv, in "dhp-stats-update/.../contexts.sh" script. — Lampros Smyrnaios / detail
  2. added missing EOS — Antonis Lempesis / detail
  3. fixed typo in indicator query — Antonis Lempesis / detail
  4. fixed the result_country definition — Antonis Lempesis / detail
  5. added new orgs in monitor — antleb / detail
  6. Update the code which acquires the "IMPALA_HDFS_NODE" to test the "tmp"-dir instead of the base-dir, and introduce retries to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski". — Lampros Smyrnaios / detail
  7. Extended mapping of funder from crossref (#9169, #9277) and changed the correspondence files for the Irish funders (#9635). Extended the datacite map to include the association between metadata and the EBRAINS datasource (SciLake) — Miriam Baglioni / detail
  8. Upgrade the copying operation to Impala Cluster: — Lampros Smyrnaios / detail
  9. Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates". — Lampros Smyrnaios / detail
  10. Miscellaneous updates to the copying operation to Impala Cluster: — Lampros Smyrnaios / detail
  11. Minor updates to the copying operation to Impala Cluster: — Lampros Smyrnaios / detail
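Item 6 above mentions probing the "tmp"-dir of the candidate HDFS nodes and retrying on file-system failures. The real logic lives in a shell script whose commands are not shown in this log, so the following is only a minimal sketch of that retry pattern. Every name in it (find_active_node, probe, retries, delay_seconds) is hypothetical, and probe stands in for whatever check the actual script performs.

```python
import time
from typing import Callable, Iterable, Optional


def find_active_node(
    candidates: Iterable[str],
    probe: Callable[[str], bool],
    retries: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[str]:
    """Return the first candidate for which probe() succeeds, or None.

    Hypothetical helper: `probe` represents a check against a node's
    "tmp" dir; it is not the command used by the real script.
    """
    nodes = list(candidates)
    for attempt in range(retries):
        for node in nodes:
            try:
                if probe(node):
                    return node
            except OSError:
                # Treat a file-system error like a failed probe and try the next node.
                continue
        if attempt < retries - 1:
            # Wait before the next full round over all candidates.
            time.sleep(delay_seconds)
    return None
```

Returning None instead of raising leaves it to the caller to decide whether a missing active node should abort the copy.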

#88 (Mar 26, 2024 2:01:05 PM)

  1. Updated deployments of orcid collection from api — Sandro La Bruzzo / detail
  2. Change DNET_HADOOP_REPO_BRANCH default value to "beta" for BETA pipelines — Giambattista Bloisi / detail
  3. added deployment specs for dhp-stats-monitor-update, dhp-stats-monitor-irish, dhp-stats-hist-snaps — Claudio Atzori / detail
  4. removed deployment spec for dhp-stats-monitor-update — Claudio Atzori / detail
  5. added deployment specs for dhp-stats-monitor-irish, dhp-stats-hist-snaps — Claudio Atzori / detail

#88 (Mar 26, 2024 2:01:05 PM)

  1. Changes to indicators and funders definition — dpierrakos / detail
  2. Monitor Irish Stats WF — dpierrakos / detail
  3. Historical Snapshots Workflow — dpierrakos / detail
  4. Update buildIrishMonitorDB.sql — dpierrakos / detail
  5. fixed the result_country definition — Antonis Lempesis / detail
  6. Changes to beta db names — dpierrakos / detail
  7. Changes to indicators — dpierrakos / detail
  8. creating result_instances even when no pids exist for the instance — Antonis Lempesis / detail
  9. max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%. — Antonis Lempesis / detail
  10. Changed step16-createIndicatorsTables to use a spark oozie action instead of hive — antleb / detail
  11. Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf — Giambattista Bloisi / detail
  12. changed orcid ids to all capital — antleb / detail
  13. added 2 new institutions in monitor — antleb / detail
  14. mapping of project PIDs — Michele Artini / detail
  15. [BulkTagging] added check to verify if field is present in the pathMap — Miriam Baglioni / detail
  16. Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf — Claudio Atzori / detail
  17. using distinct apcs per publication to avoid huge sums — Antonis Lempesis / detail
  18. fixed the irish result subset — antleb / detail
  19. selecting distinct peer_reviewed — antleb / detail
  20. new plugin to collect from a dump of BASE — Michele Artini / detail
  21. comments — Michele Artini / detail
  22. updated sql query for filtering BASE records — Michele Artini / detail
  23. filter by base types — Michele Artini / detail
  24. Commit monitor-updates-wf — dpierrakos / detail
  25. code cleanup — Antonis Lempesis / detail
  26. code cleanup — antleb / detail
  27. code cleanup — antleb / detail
  28. openorgs wf updated — miconis / detail
  29. default parameters for openorgs updated — miconis / detail
  30. Use the ACTIVE HDFS NODE for Impala cluster, in "copyDataToImpalaCluster.sh" script. — Lampros Smyrnaios / detail
  31. Automatically select the ACTIVE HDFS NODE for Impala cluster, in all "copyDataToImpalaCluster.sh" scripts. — Lampros Smyrnaios / detail
  32. resolving conflicts on step16-createIndicatorsTables.sql — Claudio Atzori / detail
  33. adjusted pom files — Claudio Atzori / detail

#86 (Feb 21, 2024 2:29:49 PM)

  1. Fixed problem on missing author in crossref Mapping — Sandro La Bruzzo / detail
  2. [orcid-enrichment] change the value of parameters. — Miriam Baglioni / detail
  3. code formatting — Claudio Atzori / detail
  4. [orcid enrichment] fixed directory cleanup before distcp — Claudio Atzori / detail
  5. [graph cleaning] rule out datasources without an officialname — Claudio Atzori / detail
  6. [actionsets] introduced support for the PromoteAction strategy — Claudio Atzori / detail
  7. [actionsets] fixed join type — Claudio Atzori / detail
  8. Dedup aliases, created when a dedup in a previous build has been merged in a new dedup, need to be marked as "deletedbyinference", since they are "merged" in the new dedup — Giambattista Bloisi / detail
  9. [graph raw] fixed mapping of the original resource type from the Datacite format — Claudio Atzori / detail
  10. fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite) — Claudio Atzori / detail

#85 (Jan 26, 2024 4:19:42 PM)

  1. added deployment procedures for biodb_aggregation, ebi_links_aggregation, pubmed_aggregation — Claudio Atzori / detail
  2. enrichment with subworkflows — Claudio Atzori / detail
  3. added workflow for updating the dedup pivot history database — Claudio Atzori / detail
  4. updated deployment spec for PROD — Claudio Atzori / detail

#85 (Jan 26, 2024 4:19:42 PM)

  1. first version of the workflow single step — Miriam Baglioni / detail
  2. adjusting workflow definition — Miriam Baglioni / detail
  3. removed not needed parameter — Miriam Baglioni / detail
  4. - — Miriam Baglioni / detail
  5. [doiboost - preprocess] remove transition to orcid preparation from sequence of steps at the beginning of the workflow — Miriam Baglioni / detail
  6. - — Miriam Baglioni / detail
  7. updated the transformation Baseline workflow to include mdstore rollback/commit action — Sandro La Bruzzo / detail
  8. updated the transformation Baseline workflow to include mdstore rollback/commit action — Sandro La Bruzzo / detail
  9. uploaded input parameters on CreateBaseline WF — Sandro La Bruzzo / detail
  10. uploaded input parameters on CreateBaseline WF — Sandro La Bruzzo / detail
  11. updated workflow for generation of Scholix Datasource's to use mdstore transactions — Sandro La Bruzzo / detail
  12. added needed parameter — Miriam Baglioni / detail
  13. - — Miriam Baglioni / detail
  14. refactoring after compilation — Miriam Baglioni / detail
  15. added metaresourcetype to the result hive DB view — Claudio Atzori / detail
  16. added metaresourcetype to the result hive DB view — Claudio Atzori / detail
  17. adjustments for country propagation — Miriam Baglioni / detail
  18. adding the bulkTag parameter file in the folder for the oozie workflow for bulkTagging. Changes the path in the class — Miriam Baglioni / detail
  19. changed the path to the parameter file in the class for entitytoorganization propagation — Miriam Baglioni / detail
  20. added properties file in the folder for the workflow of orcid propagation. Changes the path in the classes implementing the propagation; changed the path to the parameter file in the class for entitytoorganization propagation — Miriam Baglioni / detail
  21. changed in the classes the path for the property files for the propagation of community from project — Miriam Baglioni / detail
  22. added properties file in the folder for the workflow of project to result propagation. Changes the path in the classes implementing the propagation — Miriam Baglioni / detail
  23. added properties file in the folder for the workflow of result to community from organization propagation. Changes the path in the classes implementing the propagation — Miriam Baglioni / detail
  24. added properties file in the folder for the workflow of result to community from semrel propagation. Changes the path in the classes implementing the propagation — Miriam Baglioni / detail
  25. added properties file in the folder for the workflow of result to organization from inst repo propagation. Changes the path in the classes implementing the propagation — Miriam Baglioni / detail
  26. SparkCreateSimRels: — Giambattista Bloisi / detail
  27. No longer use dedupId information from pivotHistory Database — Giambattista Bloisi / detail
  28. Generate "merged" dedup id relations also for records that are filtered out by the cut parameters — Giambattista Bloisi / detail
  29. Use dedup_wf_002 in place of dedup_wf_001 to make explicit that a different algorithm has been used to generate those kinds of ids — Giambattista Bloisi / detail
  30. Create dedup record for "merged" pivots — Giambattista Bloisi / detail
  31. refined mapping for the extraction of the original resource type — Claudio Atzori / detail
  32. fix issue on FoS integration. Removing the null values from FoS — Miriam Baglioni / detail
  33. Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions — Giambattista Bloisi / detail
  34. [enrichment single step] refactoring to fix issue in disappeared result type — Miriam Baglioni / detail
  35. [enrichment single step] refactoring to fix issues in disappeared result type — Miriam Baglioni / detail
  36. [enrichment single step] remove parameter from execution — Miriam Baglioni / detail
  37. - — Miriam Baglioni / detail
  38. [enrichment single step] moving parameter file in correct location — Miriam Baglioni / detail
  39. [enrichment single step] adding <end> element in wf definition — Miriam Baglioni / detail
  40. increased shuffle partitions for publications in the country propagation workflow — Claudio Atzori / detail
  41. [orcid enrichment] drop paths before copying the non-modified contents — Claudio Atzori / detail
  42. [graph provision] obtain context info from the context API instead of from the ISLookUp service — Claudio Atzori / detail
  43. code formatting — Claudio Atzori / detail
  44. [graph provision] updated param specification for the XML converter job — Claudio Atzori / detail
  45. [collection] increased logging from the oai-pmh metadata collection process — Claudio Atzori / detail
  46. [graph provision] retrieve all the context information by adding all=true to the requests issued to the API — Claudio Atzori / detail
  47. added code of conduct and contributing files — Claudio Atzori / detail
  48. minor — Claudio Atzori / detail
  49. Update 'CONTRIBUTING.md' — Claudio Atzori / detail
  50. max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%. — Claudio Atzori / detail
  51. [collection] increased logging from the oai-pmh metadata collection process — Claudio Atzori / detail
  52. Fixed problem on missing author in crossref Mapping — Sandro La Bruzzo / detail

#84 (Dec 15, 2023 11:47:10 AM)

  1. added step resulttocommunityfromproject to the BETA deployment — Claudio Atzori / detail
  2. added deploy specs for stats_actionset, download_orcid_dump, horizontal orcid enrichment — Claudio Atzori / detail
  3. switched stage names — Claudio Atzori / detail
  4. added deployment specs for download_orcid_dump, update_actionset_statsdb, orcidEnrichment — Claudio Atzori / detail

#84 (Dec 15, 2023 11:47:10 AM)

  1. changes to use the API instead of the IS to get the information for the communities to be used during bulktagging and context propagation — Miriam Baglioni / detail
  2. refactoring — Miriam Baglioni / detail
  3. [raw graph] adopting the new COAR based vocabularies for the resource typing — Claudio Atzori / detail
  4. used the API instead of the IS for bulktagging and propagation for community through organization. Added a new propagation step for communities through projects. Still using the API and not the IS — Miriam Baglioni / detail
  5. [raw graph] WIP: mapping original resource types — Claudio Atzori / detail
  6. testing and fix some issues — Miriam Baglioni / detail
  7. new spark parameter updated — Sandro La Bruzzo / detail
  8. [raw graph] mapping original resource types — Claudio Atzori / detail
  9. more NPE checks — Claudio Atzori / detail
  10. [graph raw] URL Validator to accept double slashes — Claudio Atzori / detail
  11. Add actionset creation for pubmed affiliations — Serafeim Chatzopoulos / detail
  12. fixing issue on propagation organization. added --config to workflow definition. added oozie_app to community project — Miriam Baglioni / detail
  13. Change the description of the workflow — Serafeim Chatzopoulos / detail
  14. StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded — dpierrakos / detail
  15. Renaming input param for crossref input path — Serafeim Chatzopoulos / detail
  16. Adjust tests to new WF input params — Serafeim Chatzopoulos / detail
  17. [graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898 — Claudio Atzori / detail
  18. [graph cleaning] cleanup — Claudio Atzori / detail
  19. test for project propagation — Miriam Baglioni / detail
  20. removed not needed test class — Miriam Baglioni / detail
  21. - — Miriam Baglioni / detail
  22. refactoring and test — Miriam Baglioni / detail
  23. changing test for new implementation — Miriam Baglioni / detail
  24. refactoring — Miriam Baglioni / detail
  25. - — Miriam Baglioni / detail
  26. Clear working dir in bipranker workflow — Serafeim Chatzopoulos / detail
  27. Changes to actionsets — dpierrakos / detail
  28. Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables — Sandro La Bruzzo / detail
  29. - — Miriam Baglioni / detail
  30. Changes for tables and creation of the new indicator indi_is_result_accessible — dpierrakos / detail
  31. [graph cleaning] applying coar based vocabularies in bulk — Claudio Atzori / detail
  32. Update StatsAtomicActionsJob.java — dpierrakos / detail
  33. [graph cleaning] added cleaning for result.publisher and result.instance.license — Claudio Atzori / detail
  34. code formatting — Claudio Atzori / detail
  35. Implemented ORCID Enrichment — Sandro La Bruzzo / detail
  36. changed the parameter from production to baseURL. Fixed issue in tagging configuration — Miriam Baglioni / detail
  37. refactoring — Miriam Baglioni / detail
  38. Implemented Author Merger for ORCID that takes into account the case when name and surname are swapped — Sandro La Bruzzo / detail
  39. added comment — Sandro La Bruzzo / detail
  40. Changed implementation of check similarity to verify exact match of name instead of the first char — Sandro La Bruzzo / detail
  41. added test — Sandro La Bruzzo / detail
  42. added instanceTypeMapping original field in the mapping of — Sandro La Bruzzo / detail
  43. added vocabulary in instanceTypeMapping for — Sandro La Bruzzo / detail
  44. removed Orcid intersection on DOIBoost — Sandro La Bruzzo / detail
  45. Added copy of the untouched entities of the graph — Sandro La Bruzzo / detail
  46. code formatting — Sandro La Bruzzo / detail
  47. Update StatsAtomicActionsJob.java — dpierrakos / detail
  48. Removed unused function — Sandro La Bruzzo / detail
  49. Changes to indicators — dpierrakos / detail
  50. Add new indicator — dpierrakos / detail
  51. New institutions added — dpierrakos / detail
  52. using objectSubType as originalType in Crossref2Oaf, code formatting — Claudio Atzori / detail
  53. code formatting — Claudio Atzori / detail
  54. fixed doiboost process workflow, removed references to the ProcessORCID step — Claudio Atzori / detail
  55. Extracted the correct original type to pass to instanceTypeMapping in Crossref Mapping — Sandro La Bruzzo / detail
  56. code formatting — Claudio Atzori / detail
  57. [graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark action — Claudio Atzori / detail
  58. avoid NPEs in Vocabulary.getTermBySynonym — Claudio Atzori / detail
  59. avoid NPEs — Claudio Atzori / detail
  60. avoid NPEs — Claudio Atzori / detail
  61. [bulktagging] fixed workflow parameters — Claudio Atzori / detail
  62. [community_organization propagation] fixed workflow parameters — Claudio Atzori / detail
  63. added serialization for the new fields imported for the Irish tender — Claudio Atzori / detail
  64. [dedup] added isLookupUrl to the graph consistency workflow definition, required now by the entity grouping phase — Claudio Atzori / detail
  65. [orcid enrichment] fixed workflow definition — Claudio Atzori / detail
  66. [bulktagging] setting first step of bulktagging as the copy of the entities and relations not involved in the tagging — Miriam Baglioni / detail
  67. [community_result_propagation] adjusting starting point of workflow — Miriam Baglioni / detail
  68. [enrichment] passing the community API base URL — Claudio Atzori / detail
  69. logging typo — Claudio Atzori / detail
  70. [graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum — Claudio Atzori / detail
  71. code formatting — Claudio Atzori / detail
  72. [graph provision] added tests for the new model fields — Claudio Atzori / detail
  73. [cleaning] allow enriched orcids to pass the cleaning, rule out non-orcid author pids — Claudio Atzori / detail
  74. code formatting — Claudio Atzori / detail
  75. [graph provision] added tests for new peerreviewed field — Claudio Atzori / detail
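Items 38 and 40 in the list above describe an author match for ORCID that requires an exact name match (rather than comparing only the first character) and that tolerates swapped name and surname. The project's own implementation is not shown in this log. The following is only a minimal sketch of that idea under assumed normalization rules (lowercasing and whitespace collapsing); the function names are hypothetical.

```python
def _normalize(value: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(value.lower().split())


def same_author(name_a: str, surname_a: str, name_b: str, surname_b: str) -> bool:
    """Return True on an exact (normalized) match, in either field order.

    Hypothetical helper: accepts the pair as-is or with name and surname
    swapped, and does not fall back to first-character comparison.
    """
    a = (_normalize(name_a), _normalize(surname_a))
    b = (_normalize(name_b), _normalize(surname_b))
    return a == b or a == (b[1], b[0])
```

For example, same_author("Ada", "Lovelace", "Lovelace", "Ada") is True, while same_author("Ada", "Lovelace", "Alan", "Lovelace") is False even though both given names start with the same letter.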