Knowledge Graphs

Research Metadata

Research metadata is highly valuable to our facility. It tracks activities and data provenance. It records details of failed experiments, which we want to show users to save them time and resources. It grounds generative AI and makes it easier to assemble the most relevant training sets for the machine learning tools we would like to translate and deploy from clinical settings. Together, these uses have the potential to significantly improve the efficiency and rigor of research conducted here.

Challenge and Approach

Scientific research is dynamic, and that poses major challenges for data management. How do you capture and curate datasets in a consistent way when every experiment is different? We strive to do this in a way that makes the stored data FAIR (Findable, Accessible, Interoperable, and Reusable). We address these complex requirements with a novel solution: Knowledge Graphs.

Knowledge graphs interface well with traditional data storage approaches, such as relational databases and spreadsheets, while opening the door to completely new ways of storing and interacting with data.

Building Knowledge Graphs

We use Neo4j and the openCypher query language to manage graph databases. Neo4j comes with a community of users and relevant open source tools. We construct data pipelines in Python and share them on GitHub for transparency. Python gives us a common language for data science and lets us work with data in essentially any format, whether that is JSON, CSV, XLSX, XML, or the header metadata associated with DICOM, TIFF, and other imaging formats.
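As a rough illustration of what such a pipeline step might look like, the sketch below reads rows from a CSV export and pairs each one with a parameterized openCypher statement. The column names, node labels (`Experiment`, `Instrument`, `Person`), and relationship types are hypothetical and chosen only for the example; they are not our actual schema.

```python
import csv
import io

# Hypothetical CSV export; the columns and values are illustrative only.
SAMPLE_CSV = """experiment_id,instrument,operator,status
EXP-001,Confocal-A,jdoe,completed
EXP-002,Confocal-A,asmith,failed
"""

# Parameterized openCypher. MERGE creates a node or relationship only if it
# does not already exist, so re-running the pipeline does not duplicate data.
# Labels and relationship types here are assumptions made for this sketch.
MERGE_EXPERIMENT = (
    "MERGE (e:Experiment {id: $experiment_id}) "
    "MERGE (i:Instrument {name: $instrument}) "
    "MERGE (p:Person {username: $operator}) "
    "MERGE (e)-[:USED]->(i) "
    "MERGE (p)-[:PERFORMED]->(e) "
    "SET e.status = $status"
)


def rows_to_statements(csv_text):
    """Turn each CSV row into a (query, parameters) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(MERGE_EXPERIMENT, dict(row)) for row in reader]


statements = rows_to_statements(SAMPLE_CSV)
for query, params in statements:
    print(params)
```

With the official `neo4j` Python driver, each pair could then be passed to `session.run(query, params)` inside an open session. That step is omitted here so the sketch runs without a database.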

Combining these tools, we can either adopt the data source’s schema as is or design our own and restructure the metadata to fit our own language and relationships. Either way, we can populate a knowledge graph with the result and then explore the data and add connections to it in semantically meaningful ways.
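The two options can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: the source keys are standard DICOM attribute keywords, while the record values and the target names in `FIELD_MAP` are hypothetical.

```python
# Hypothetical mapping from source header keywords to our own vocabulary.
FIELD_MAP = {
    "PatientID": "subject_id",
    "Modality": "imaging_modality",
    "StudyDate": "acquired_on",
}


def adopt_source_schema(record):
    """Option 1: keep the source's field names unchanged."""
    return dict(record)


def restructure(record, field_map):
    """Option 2: keep only mapped fields and rename them to our own terms."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


# Illustrative header values, not real data.
source_record = {
    "PatientID": "S-17",
    "Modality": "MR",
    "StudyDate": "20240102",
    "Rows": 512,
}

as_is = adopt_source_schema(source_record)
restructured = restructure(source_record, FIELD_MAP)
print(restructured)
```

Either dictionary could then be used as the property set for a node in the graph. The first preserves everything the source provides, while the second trades completeness for a consistent vocabulary across sources.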

Get in touch and get involved

We would love to share more with you about knowledge graphs. Whether you think they might be useful in your work or are simply curious, please reach out. Here are a couple more ways to get involved.

Learning to See

Our knowledge graph work is being done in conjunction with our Learning to See program, with the aim of building smart imaging tools fine-tuned for images generated in our facility.

Focus Groups

As we roll out new uses for knowledge graphs, we will organize focus groups to introduce features and seek feedback from potential users. This technology has the potential to support a wide range of data needs at the KI, so if you would like to participate, let us know!