Data Curation Workbench

June 21, 2023 

The ETC is seeking information from companies interested in developing a system that addresses current challenges in managing data, including but not limited to: a) accessing internal and third-party data; b) leveraging “big data ontologies” to relate disparate data sets; c) building reports, dashboards, or other “artifacts” that advance the scientific process; d) building data integrity into “artifacts” to ease the burden of emergency data integrity reviews; and e) notifying artifact creators and other pertinent contributors of data changes.

Download the Request for Proposal and submit your response.

RFP ISSUED: August 21, 2023

QUESTIONS on RFP DUE to ETC: September 6, 2023

RFP RESPONSES DUE to ETC: September 27, 2023

EXTENSION for RFP Responses: December 1, 2023

Q&A (Note: see the FAQ document for answers to common questions.)

Updated Aug 31, 2023

  • Potential sponsors are listed, but are these companies for which the solution may be provided as part of an already approved project?

    • The project is approved in the sense that there is a critical mass of ETC members interested in the project to warrant the release of the RFP, but funding has not been committed yet. ETC works as follows: our Working Groups identify project ideas, and when there is a critical mass of companies interested in a particular project, an RFI/RFP is released to ultimately select a collaborator and gather information on what will be required to deliver the project. Once the collaborator is selected, the project will be fully scoped and a Statement of Work (SOW) prepared outlining the cost to ETC, timeline, and deliverables. The SOW is then presented to the ETC Board of Directors, which approves the project by securing individual company resources (e.g., funding, SMEs). ETC has fall and spring funding cycles in which SOWs are presented to the Board.

  • Is this RFP for a solution set that will be developed for ETC as a view into the ‘art of the possible’? 

    • The ultimate goal is for the solution to be developed into a commercial product offering that can be purchased from the collaborator. An RFP respondent could take the approach of demonstrating the ‘art of the possible’ as a proof of concept, a first step towards a commercial solution. ETC will work with the selected collaborator to finalize the project scope.

  • Data Lake Access: Should we assume that we will have access to ETC's existing data lakes for the purpose of this project? Alternatively, would you prefer that we set up our own data lake that simulates those commonly used by our pharmaceutical partners? 

    • The ETC doesn't have a data lake. It wouldn't be feasible to access member company data lakes, so for the purposes of this pilot we will need vendors to supply or simulate a data backend to which we can provide sample data.

  • Tenant Configuration: Could you please specify your preferences regarding tenant configurations? Would ETC prefer a single tenant per partner setup, or are you open to a multi-tenant configuration? 

    • This is a great question. We will have to decide as a working group which path to take. Single tenant might be preferable, but we would need to verify that it wouldn't break any confidentiality rules.

  • How frequently is data captured from the different data sources, such as process equipment and scientific equipment? For what duration does this data need to be historized (5 years / 10 years / 15 years)?

    • This would vary in frequency; we would expect some use cases to sweep data in near real time. For the purposes of this evaluation, data would only need to be retained for the pilot period. Data should be destroyed after the pilot ends.

  • Are there specific security measures or compliance standards that must be adhered to in handling and managing data? 

    • Initially, basic access control and privacy are expected. Some long-term clinical or regulatory applications would need to comply with 21 CFR Part 11 and cGMPs.

  • How do you currently address data privacy concerns within your data management processes? 

    • Data privacy is required and is implemented by all member companies.  

  • Can a bidder build a customized product during the engagement, or is it mandatory to have a product in place first?

    • Yes, we welcome configuration and customization options during the pilot period.  

  • We are considering building and delivering the entire solution on the AWS tech stack while accommodating custom requirements from the sponsor. Is this allowed/acceptable?

    • Yes, this is acceptable.  However, integration with O365 cloud products is required.

  • The Database structure figure shows a “Big Data & ML/AI Platform” component. Should the proposal also include development of such models and numerical methods, or just hosting of the models that will be provided by ETC partners in the Workbench? Also, would ETC provide resources (e.g., Azure, AWS) for hosting numerical solvers (e.g., Python, MATLAB)?

    • ETC would not provide cloud resources.  The vendor would be expected to provide an environment where we can handle large amounts of data (likely cloud) and have the potential to develop ML/AI applications.  Mature, applicable AI/ML components are not necessarily expected.  We are looking for an expandable platform that can grow as our business needs evolve.

  • The Deliverables section asks for “The system must meet all regulatory requirements for security and audit trails.” Does “regulatory requirements” mean compliance with cGMP data integrity protocols? Will the Data Curation Workbench be used for a cGMP QMS?

    • Security, audit trails, and data integrity are extremely important to member companies. While it may not be necessary to show full maturity in this space during the pilot, it will be necessary to provide all of these for production solutions, with the exception of cGMPs, as member companies would not need cGMP compliance in Research and Early Development.

  • What are the data sources in scope? Could you share an inventory?

    • Data sources could be categorized as: 1) MS Office internal content (typically Excel); 2) third-party content (PDF or O365); 3) data lake(s) or other company-specific legacy solutions.

  • How many ELN systems would need to be connected to this workbench?

    • This is dependent on member companies and their existing ecosystem.  

  • What are some of the wearable devices?

    • The group has discussed several devices, such as RealWear, Vuzix, and HoloLens.

  • In section 2.3.1 #19, could you share which wearable devices are in scope? Would the WG share sample data with the ZS team for reference during the engagement?

    • Answered above. Yes, we can share examples should we proceed.

  • Is there a preference for a tool for dashboards?

    •  No

  • Which ERP modules should we expect? (Creating a unified tool for different ERPs will be a challenge; we might need an intermediary to collect and contextualize the data, so we need to understand this requirement.)

    • I don't believe we have a requirement to integrate with any ERPs. We are focused on lab data.

  • In section 2.3.1 #20, could you please elaborate on the functionalities needed for note taking? Is this for adding notes to the experimental data stored in the data lakes?

    • Robust audit trails capable of establishing a chain of custody for data, electronic signatures, and multi-user notebooks (i.e., two or more scientists collaborating) are just a few examples.

ETC TEAM GENERAL NOTE: Overall, please don't feel the need to check every box. We are looking for ideas on how to solve these problems and are purposely not being prescriptive. Thanks!
