With years tucked under my belt in cloud data, I've seen and evaluated many platforms. This review of Databricks cloud data software aims to provide you with a clear picture, presenting its ins and outs. By the end, you'll have the knowledge to determine if it's the right fit for your needs.

Review of Databricks showing TensorBoard UI in AWS
The screenshot shows the TensorBoard UI started against a populated log directory when the Databricks platform is integrated with AWS.

Databricks Product Overview

Databricks provides a unified analytics platform that accelerates innovation by unifying data science, engineering, and business. Targeted at data professionals, it streamlines workflow processes and reduces the time needed to extract insights.

This cloud data platform addresses the disjointed nature of many data analytics pipelines, ensuring smoother transitions from raw data to actionable insights. Among its standout features are collaborative notebooks, a vast array of integration capabilities, and advanced MLflow support.

Pros

  • Collaborative Notebooks: These allow multiple users to work on data simultaneously, fostering real-time collaboration and efficient data analysis.
  • Integration Capabilities: Databricks integrates smoothly with popular data storage and processing tools, reducing the friction often encountered in data processing and software development.
  • Advanced MLflow Support: Databricks offers enhanced machine learning tracking and model management, which bolsters the entire ML lifecycle.

Cons

  • Learning Curve: New users might find the platform a tad overwhelming due to its vast feature set.
  • Resource Intensity: Some tasks, especially when not optimized, can consume significant resources, affecting overall system performance.
  • Limited Customization: While Databricks offers a wealth of optimization features, it sometimes lacks deep customization options that specialized platforms provide.

Expert Opinion

Having sifted through numerous cloud data software options, I opine that Databricks holds a prominent position in the industry. It offers a blend of features that cater to both beginners and advanced users, although its pricing can be a tad steep for smaller outfits. The interface, while comprehensive, can be daunting for newcomers. However, its integration capabilities and onboarding process make up for it.

While it excels in areas like collaborative work and machine learning support, it could benefit from more customization options. In judging its capabilities, I'd recommend Databricks for larger teams and organizations where collaboration across departments is crucial.

Databricks: The Bottom Line

Databricks distinguishes itself from the myriad of data platforms through its emphasis on collaborative analytics. It bridges the gap between data professionals across various domains, ensuring that insights and analyses are truly holistic. Its integration capabilities are top-notch, ensuring that users rarely, if ever, find themselves locked out of their preferred tools. Furthermore, the advanced MLflow support stands as a testament to its commitment to staying at the forefront of machine learning advancements.

Databricks Deep Dive

Product Specifications

  1. Unified analytics platform - Yes
  2. Machine learning integration - Yes
  3. Real-time data processing - Yes
  4. Batch processing - Yes
  5. Streamlined workflow - Yes
  6. Data visualization - Yes
  7. Collaborative notebooks - Yes
  8. Scalable clusters - Yes
  9. Data versioning - Yes
  10. Managed MLflow - Yes
  11. Delta Lake support - Yes
  12. Automation of job scheduling - Yes
  13. Role-based access control - Yes
  14. Data pipeline builder - Yes
  15. Advanced search and filter - Yes
  16. Interactive dashboards - Yes
  17. Data warehousing - Yes
  18. Data science workspace - Yes
  19. Multi-language support - Yes
  20. Third-party integrations - Yes
  21. API Access - Yes
  22. Data import/export - Yes
  23. Audit logs - Yes
  24. Security protocols - Yes
  25. Customizable alerts - Yes

Feature Overview

  1. Unified Analytics Platform: Databricks combines data engineering and data science functionalities, ensuring an interconnected data ecosystem.
  2. Machine Learning Integration: It supports the full machine learning lifecycle, streamlining model creation, training, and deployment.
  3. Collaborative Notebooks: Team members can work together in the same notebook in real time, enhancing collaboration and data analysis efficiency.
  4. Delta Lake Support: This ensures reliability and performance on big data with ACID transactions on your data lakehouse architecture.
  5. Managed MLflow: Offers a centralized repository to manage the machine learning lifecycle.
  6. Data Visualization: Built-in visualization tools allow for immediate insights, reducing reliance on third-party applications.
  7. Automated Job Scheduling: This ensures efficient resource management and timely execution of tasks.
  8. Data Pipeline Builder: Users can craft, test, and deploy data pipelines seamlessly.
  9. Role-Based Access Control: Enhanced security with granular permissions ensuring data integrity and protection.
  10. Scalable Clusters: Databricks can easily scale up or down based on workload, ensuring resource efficiency.

Standout Functionality

  1. Collaborative Notebooks: While other platforms support collaboration, Databricks’ real-time joint workspace enhances team synergy.
  2. Delta Lake Support: The integration of Delta Lake, with ACID transaction capabilities, is not as prevalent in similar software.
  3. Managed MLflow: Centralizing and enhancing the entire machine learning process sets Databricks apart from many competitors.

Integrations

Databricks runs on the major cloud platforms (AWS, Azure, and Google Cloud) and offers out-of-the-box integrations with popular data sources and tools. Native integrations like Delta Lake, MLflow, and Redash enhance its data analytics and machine learning capabilities.

Databricks provides a robust API, allowing for custom integrations and greater flexibility in building applications. Additionally, there are numerous add-ons available to expand the platform’s capabilities.
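As a rough illustration of where a custom integration might start, the sketch below builds (but does not send) an authenticated request to the clusters-list endpoint of the Databricks REST API. The workspace URL and token are placeholders, and the helper name `build_clusters_list_request` is hypothetical, not part of any Databricks SDK.

```python
import urllib.request


def build_clusters_list_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the clusters-list endpoint (nothing is sent here)."""
    # Assumption: workspace_url is the base URL of your workspace and token is
    # a personal access token; the values used below are placeholders.
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    return request


request = build_clusters_list_request(
    "https://example.cloud.databricks.com/", "dapi-PLACEHOLDER"
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; the response format and required permissions should be checked against the official API reference before relying on them.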

Pricing

Databricks pricing is structured to suit various user needs.

  • Standard Tier: At $20/user/month, this offers core functionalities for teams just starting out.
  • Professional Tier: Priced at $50/user/month, it provides advanced integrations and functionalities for larger teams.
  • Enterprise Tier: “Pricing upon request”, tailored for extensive enterprise requirements, offering the complete suite of features and enhanced support.
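For rough budgeting, the per-user figures quoted above can be turned into an annual estimate. This is only a sketch: the function name and the ten-user team are illustrative assumptions, and real bills may differ from a simple per-seat calculation, so confirm current pricing with the vendor.

```python
# Per-user monthly prices as quoted in this review (USD). The Enterprise tier
# is priced on request, so it is left out of the table.
QUOTED_MONTHLY_PRICE = {"standard": 20, "professional": 50}


def estimate_annual_cost(tier: str, users: int) -> int:
    """Return monthly price x number of users x 12 months for a quoted tier."""
    if users < 0:
        raise ValueError("users must be non-negative")
    return QUOTED_MONTHLY_PRICE[tier] * users * 12


# Hypothetical team of ten users on each quoted tier.
for tier in QUOTED_MONTHLY_PRICE:
    print(tier, estimate_annual_cost(tier, users=10))
```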

Ease of Use

Databricks offers a user-friendly interface, but given its comprehensive suite of tools, there's an inherent learning curve. The onboarding process is detailed, ensuring users familiarize themselves with the platform. However, certain features, like setting up clusters, may pose challenges for novices.

Customer Support

Databricks provides prompt customer support across various channels like email, phone, and live chat. They have an extensive library of documentation, webinars, and tutorials. Occasionally, users have mentioned longer wait times during peak hours, but overall, the quality of support remains commendable.

Databricks Use Case

Who would be a good fit for Databricks?

Databricks is a good fit for large-scale enterprises and mid-sized businesses in industries like finance, healthcare, and e-commerce. Its loyal customers are data scientists and DevOps engineers who appreciate the platform’s scalability, machine-learning capabilities, and seamless app integration potential.

Who would be a bad fit for Databricks?

Startups or small businesses with limited data warehousing needs might find the Databricks lakehouse overwhelming and resource-intensive. Companies that require a simple, straightforward end-to-end data analytics tool without the complexities of machine learning and big data processing might find it excessive.

Databricks FAQs

Does Databricks support real-time data processing?

Yes, Databricks supports both batch and real-time data processing.

Can multiple users collaborate on a single project?

Yes, Databricks offers collaborative notebooks for real-time teamwork.

Is there support for machine learning?

Yes, Databricks integrates tools to support the entire machine-learning lifecycle.

Is there a free tier available?

Databricks offers a time-limited free trial and a limited free edition intended for learning, but production use requires one of its paid plans.

How does Databricks handle security?

Databricks has robust security protocols, including role-based access control, audit logs, and compliance certifications.

Can I integrate third-party tools with Databricks?

Yes, Databricks offers a range of third-party integrations and also provides an API for custom integrations.

What is Delta Lake?

Delta Lake is a storage layer that brings ACID transactions to data lakes. Databricks integrates with Delta Lake, enhancing dataset reliability and performance.
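To make the idea of a transactional layer over plain files more concrete, here is a toy model of a commit log in Python. It is not Delta Lake's actual implementation or API: the `_log` directory name and the `commit` and `visible_files` helpers are invented for illustration. The point it tries to show is that readers trust only numbered log entries, so data files that were written but never committed stay invisible.

```python
import json
import tempfile
from pathlib import Path


def commit(table: Path, version: int, data_files: list[str]) -> None:
    """Publish data files by creating one numbered log entry."""
    log_dir = table / "_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    entry = log_dir / f"{version:020d}.json"
    # Mode "x" raises FileExistsError if the entry already exists, so two
    # writers cannot both claim the same version number.
    with open(entry, "x") as f:
        json.dump({"add": data_files}, f)


def visible_files(table: Path) -> list[str]:
    """Return only the files referenced by committed log entries, in order."""
    files = []
    for entry in sorted((table / "_log").glob("*.json")):
        files.extend(json.loads(entry.read_text())["add"])
    return files


table = Path(tempfile.mkdtemp())
(table / "part-0.csv").write_text("id\n1\n")
(table / "part-1.csv").write_text("id\n2\n")  # written but never committed
commit(table, 0, ["part-0.csv"])
print(visible_files(table))
```

A second writer that tries to reuse version 0 gets a `FileExistsError` and has to retry with the next version, which is the kind of conflict detection a transaction log makes possible.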

Alternatives to Databricks

  • Snowflake: Best for businesses looking for a data platform focused primarily on data ingestion and SQL capabilities.
  • Google BigQuery: Suitable for those who are deeply invested in Google Cloud services and prefer seamless integration.
  • Azure Data Lake: Ideal for businesses that have their infrastructure on Microsoft's Azure platform.

Databricks Company Overview & History

Databricks offers a unified analytics platform built on open-source technologies such as Apache Spark, Delta Lake, and MLflow, used by organizations like Comcast, Shell, and Regeneron. This private company, headquartered in San Francisco, California, has a mission to solve the world's toughest problems using data and AI.

Founded in 2013 by the original creators of Apache Spark, Databricks has since achieved multiple milestones, like the integration of Delta Lake and reaching unicorn status in funding rounds.

Summary

After diving deep into Databricks, it’s evident that its capabilities stand out, especially for businesses that prioritize collaboration and advanced data science features. Its pricing may be on the higher side for smaller entities, but the suite of features offered can justify the cost for many. If you’ve had experience with Databricks, feel free to share your insights below.

By Paulo Gardini Miguel

Paulo is the Director of Technology at the rapidly growing media tech company BWZ. Prior to that, he worked as a Software Engineering Manager and then Head Of Technology at Navegg, Latin America’s largest data marketplace, and as Full Stack Engineer at MapLink, which provides geolocation APIs as a service. Paulo draws insight from years of experience serving as an infrastructure architect, team leader, and product developer in rapidly scaling web environments. He’s driven to share his expertise with other technology leaders to help them build great teams, improve performance, optimize resources, and create foundations for scalability.