What is DVC?
Data Version Control (DVC) is an open-source tool for data science and machine learning teams to manage datasets, ML models, and experiments in Git. Key parts include:
DVC was created in 2017 to address gaps in ML tools, and has evolved into a successful open source project with 150+ contributors and thousands of users.
Some interesting highlights from the community:
Recently, version 1.0 was released. DVC 1.0 is inspired by discussions and contributions from our community of data scientists, ML engineers, developers and software engineers. Read up on new features, like data visualization and data transfer optimizations, in our release blog post.
Today, the project remains under active development. Now that the data management layer has reached a stable form, the DVC team is focusing on the data scientist’s experience. Our goal is to become Git for ML: a holistic tool that captures the ML experiment lifecycle while following a Git-like philosophy. That means no complicated infrastructure, no databases, and no dependencies on third-party external APIs.
We welcome contributors from different backgrounds and levels of experience! We’ll be happy to guide and help you with a contribution, whether to the core project, documentation, tutorials, or blog. We’ve participated in programs like Google Season of Docs (similar to Google Summer of Code) and have substantial experience guiding and mentoring people making their first contributions. We also have an established community of experienced contributors.
DVC is written in Python. It’s a command line tool that deals with large files and Git internals (among other things), but it is built with ML engineers and data scientists in mind as end users. You may be a great match for us if you want to see and learn software engineering best practices in a mature project, and if you’re curious how ML teams operationalize their workflows.
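To give a flavor of what "dealing with large files" alongside Git can involve, here is a minimal sketch, assuming a content-addressed cache plus a small pointer file that Git could track instead of the data itself. It is an illustration only, not DVC's actual code: the names `file_sha256` and `track_file`, the `.ptr` suffix, the choice of SHA-256, and the cache layout are all hypothetical.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large file never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track_file(path: Path, cache_dir: Path) -> Path:
    """Copy `path` into a content-addressed cache and write a pointer file.

    Hypothetical layout: <cache_dir>/<first 2 hex chars>/<remaining chars>.
    Returns the path of the small pointer file written next to the data.
    """
    checksum = file_sha256(path)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copyfile(path, cached)

    # The pointer is tiny and text-based, so it is cheap to version in Git.
    pointer = path.with_name(path.name + ".ptr")
    pointer.write_text(
        json.dumps({"sha256": checksum, "path": path.name}, indent=2) + "\n"
    )
    return pointer
```

In this sketch, the large file itself would be kept out of Git (for example via an ignore rule), while the pointer file is committed. Checking out an older commit then identifies which cached content belongs to that version.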
The best way to start is to check the project’s issue tracker (or the separate one for the website and docs). There is usually plenty of low-hanging fruit, tagged with the “good first issue” and “hacktoberfest” labels. We recommend getting in touch with us about an issue (to confirm the details, get help, etc.) in one of the support channels or in the #tool_dvc channel in ODS.