Trifacta

Trifacta helps users discover and prepare data efficiently by applying machine learning (ML) algorithms within an innovative user experience and architecture. It draws on research in human-computer interaction, scalable data management, and ML to make data preparation faster and more intuitive. More than half a million users at over 7,000 companies worldwide, including leading brands such as Deutsche Boerse, Google, Kaiser Permanente, New York Life, and PepsiCo, are unlocking the potential of their data with Trifacta’s market-leading data wrangling solutions [1].

Trifacta developed its data wrangling software for data assessment and evaluation as well as automated data preparation. The company has focused on building software that helps organizations and individual users unlock the potential of their data by offering an innovative approach to data exploration and preparation.

Trifacta Working Architecture

Instead of offering a mapping-based ETL paradigm, Trifacta lets users interactively assess data quality and uses intelligent suggestions to speed up data cleaning and transformation. Its cloud-based architecture provides integrated security, elastic scalability, and built-in collaboration [2].

  • The Trifacta platform collects data from a wide variety of sources, such as data warehouses, data lakes, applications, and files.
  • It then explores and assesses the data so that flawed data can be easily identified and fixed.
  • Next, it cleans and transforms the data with a powerful transformation tool.
  • The results are then published to data warehouses and data lakes, or straight to applications, at any volume.
  • It operationalizes data transformation workflows, making self-service data automation simple.
  • Finally, the data is shared in a cloud environment so that it can be accessed through the browser.
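
The explore, clean, and publish steps above can be illustrated with a minimal Python sketch. It is not Trifacta's implementation or API: the sample rows, the field names, and the functions `profile`, `clean`, and `publish` are all hypothetical, and a plain list stands in for a data warehouse.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
RAW_ROWS = [
    {"customer": "Acme", "amount": "120.50"},
    {"customer": "", "amount": "75"},
    {"customer": "Globex", "amount": "n/a"},
    {"customer": "  Initech ", "amount": "310.00"},
]


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile(rows):
    """Explore/assess step: count flawed values per column."""
    issues = Counter()
    for row in rows:
        if not row["customer"].strip():
            issues["customer_missing"] += 1
        if not is_number(row["amount"]):
            issues["amount_not_numeric"] += 1
    return dict(issues)


def clean(rows):
    """Clean/transform step: drop flawed rows and normalize the rest."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip()
        if name and is_number(row["amount"]):
            cleaned.append({"customer": name, "amount": float(row["amount"])})
    return cleaned


def publish(rows, target):
    """Publish step: append results to a target (a list stands in for a warehouse)."""
    target.extend(rows)
    return len(rows)
```

Calling `profile(RAW_ROWS)` before `clean(RAW_ROWS)` mirrors the assess-then-fix order described above: the profile shows which columns contain flawed values before any rows are dropped.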

Figure 1. Trifacta’s Working Architecture for AWS Database [3]

Trifacta Funding

Trifacta has raised a total of $224.3M in funding over 8 rounds; the most recent, a Series E round, was raised on September 12, 2019. It has 27 investors, of which BMW i Ventures and Cathay Innovation are the most recent. Both are listed as making the highest investment in Trifacta [4].

Why Trifacta Raises More Money

Trifacta is raising more funds because [5]:

  • Its users are transforming their approaches and techniques, using human-computer interaction and ML to manage data pipelines autonomously.
  • It provides fast, scalable, and diverse data integration in the cloud.
  • It applies cloud computing and cloud-based AI algorithms in its data-driven technology.
  • It keeps evolving its technology to innovate, deliver real market value, and retain its customers.

Given these business strategies, Trifacta’s data wrangling tool has gained more than 10,000 new customers and organizations within the past year.

Trifacta Data Ingestion

Key features of Trifacta data ingestion are:

  • Data collection from source
  • Data filtration and sanitization
  • Routing the data to one or more stores

Paxata

Paxata is an automated, adaptive data preparation platform that helps business analysts collect, explore, transform, and combine data freely. It turns raw data into informative data sets so that the organization can make more intelligent decisions. Paxata prepares data sets for ad hoc analytics without manual intervention or traditional handling. The platform is built on a data management layer that keeps data in the Hadoop Distributed File System (HDFS) and a real-time data preparation engine in the pipeline [6].

Paxata Working Architecture

Paxata runs as an application layer with an HTML5 UI, built on a platform consisting of three layers [7]:

  • A Java web service layer
  • A data preparation engine that wraps Apache Spark and adds functionality to optimize Spark’s performance and accessibility
  • A data management layer that keeps the data in HDFS

Figure 2. Working Architecture of Paxata and Application Layers

Automated Data Preparation for Everyone

Paxata, a pioneer of automated data preparation, offers easy-to-use solutions driven by smart algorithms that let analysts prepare, standardize, combine, clean, and operationalize their data [7].

AI Assistance

Building on its data ingestion capabilities, Paxata also offers data profiling, preparation, and intelligent algorithms for operationalizing and automating workflows. It enables DataOps by removing the need for a highly technical team to manually draw boxes and lines on a screen. With these intelligent algorithms, a Paxata user can produce a functional pipeline independently, without relying on scarce IT resources. The result is a full-service smart system that takes data in, profiles it, and transforms it into a clean and understandable information asset. It then automatically operationalizes the workflow on a single platform [8].

Enterprise-Grade Data Preparation

Together with DataRobot, Paxata offers an AI platform that accelerates the path from raw data to ROI. Paxata Data Preparation is tightly integrated into the DataRobot AI platform and delivers self-service data preparation not only to expert data scientists and engineers but also to economists, business analysts, and citizen data scientists [6].

Paxata Funding

Paxata has raised a total of $61M in funding over 5 rounds; the most recent, a corporate round, was raised on November 13, 2017. It has 14 investors, of which Accenture and DTCP are the most recent. Both are listed as making the highest investment in Paxata [9].

Data Retention

Data retention with this technology is possible only in exceptional circumstances. Deleted Paxata datasets can be restored from backup, but backups are kept only for a very short time: the minimum retention (backup) period is 1 to 2 days [10].

Paxata Data Analytics

Paxata provides data analytics for up to 80% of the data using machine learning algorithms: the ML techniques collect the raw data, then prepare, explore, clean, and combine it into a regular shape, making the data more useful and reliable [11].

Paxata Self-Driven Data Preparation Chart

Figure 3. Paxata Data Preparation Chart with key differentiators [12]

Ascend.io

Ascend.io is a data engineering company that enables data teams to build self-driving, automated data pipelines 7x faster with no code required. It reduces overall infrastructure costs by 50% or more through high data processing efficiency [13]. The Ascend unified data engineering platform automates the operation, maintenance, and orchestration of data pipelines, freeing scarce data engineering talent to meet the changing needs of accelerated business transformation.

Ascend.io is a platform for building big data pipelines quickly, without writing or maintaining big data code.

Ascend.io Working Architecture

Ascend.io is a SaaS platform that runs natively in cloud environments such as AWS, Azure, and GCP, as an isolated, private deployment. It has a microservices architecture built on Kubernetes and Apache Spark, consisting of numerous cloud-hosted Kubernetes clusters intended for on-demand processing. The Ascend.io architecture is not multi-tenant, and data is not shared among tenant environments [14].

Ascend.io works as an autonomous pipeline engine composed of three layers that address three key areas of building data pipelines [15]:

  • User-defined blueprints of pipelines
  • Translating the blueprints into specific tasks and infrastructure
  • Maintaining bidirectional feedback so that the requirements for creating the data pipeline continue to be met

Ascend.io’s DataAware intelligence monitors and records all data processing, code changes, and user activity. This allows data pipelines to run at optimal efficiency, with integrated lineage, auditability, tracking, and governance.
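
The idea of turning a declarative blueprint into concrete tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ascend.io's engine: the `BLUEPRINT` dictionary, the step names, and the `plan_tasks` and `run` functions are hypothetical, and the feedback loop is reduced to skipping steps that are already marked as done.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical blueprint: each step lists the steps it depends on.
BLUEPRINT = {
    "read_orders": [],
    "read_customers": [],
    "join": ["read_orders", "read_customers"],
    "aggregate": ["join"],
    "write_report": ["aggregate"],
}


def plan_tasks(blueprint):
    """Translate a declarative blueprint into a dependency-ordered task list."""
    return list(TopologicalSorter(blueprint).static_order())


def run(blueprint, done=None):
    """Simplified feedback loop: execute only the tasks not yet marked done."""
    done = set() if done is None else set(done)
    executed = []
    for task in plan_tasks(blueprint):
        if task not in done:
            executed.append(task)  # a real engine would launch work here
            done.add(task)
    return executed
```

For example, `run(BLUEPRINT, done={"read_orders", "read_customers", "join"})` returns only the remaining downstream steps, which is the kind of incremental behavior a declarative engine can derive from the blueprint alone.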

Figure 4. Working Architecture of Ascend.io Autonomous Pipeline Engine [15]

Ingestion and Quick Data Pipeline Creation

The Ascend data engineering platform can connect to any data lake, database, API, warehouse, or data stream without requiring code. With no code, or with up to 95% less code, it can generate data pipelines automatically from specifications written in Python, YAML, or SQL interchangeably [16].

Automatic Data Execution and Integration

The Ascend data engineering platform optimizes data pipelines and infrastructure through dynamic resourcing, intelligent persistence, and advanced deduplication. It integrates the data pipelines with the business intelligence tools, big data systems, and notebooks that users choose.

Autonomous Governance

The Ascend data pipeline system tracks code, data, and users, with granular monitoring and reporting that takes the security of the system and its data into account.

Ascend.io Funding

Ascend.io has raised a total of $19M in funding over 2 rounds; the most recent, a Series A round, was raised on May 1, 2017. It has 8 investors, of which Accel and 8VC are the most recent. Both are listed as making the highest investment in Ascend.io [17].

Data Retention

Ascend.io retains data for 1 day (24 hours). After the customer relationship ends, Ascend.io deletes the data within 7 days. On request, however, data can be retained for 90 days [14]. Ascend.io can also provide proof of deletion of the cloud account in which the data was stored.

Ascend.io Self-Driven Data Engineering Flow

Figure 5. Ascend.io Data Engineering flow with key differentiators [15]

Acceldata

Acceldata is a data management tool that measures, observes, and models data as it moves constantly through complex pipelines spanning numerous technologies, algorithms, and cloud providers. The platform offers a single dashboard that combines signals across several layers (workloads, infrastructure, data, and usage) to detect, predict, and fix data and data quality issues that significantly affect business continuity and outcomes [18]. It provides real-time monitoring for modern data-driven applications.

Acceldata Working Architecture

Acceldata is a data observability platform for analytics and AI. It consists of a set of services that are deployed using Docker. Acceldata hosts its production images in a Docker repository on Elastic Container Registry. It supports every kind of distribution by deploying a custom-built abstraction layer and connecting it to the underlying cluster services [19].

Acceldata Modules

Acceldata services consist of several modules:

  • Base software that includes basic connectors, Acceldata database, along with the time series database.
  • Authentication facilities
  • The alert engine
  • Log collector and the indexing facilities
  • Kafka collectors

Figure 6. Acceldata Cluster Architecture [19]

Acceldata Alert Engine Architecture

Within the Acceldata architecture, the alert engine is made up of several components:

  • A scheduler that triggers alert evaluation.
  • An evaluator that runs queries against the time series database and passes the results to the meter.
  • A meter that checks the evaluated results and sends them to the notification router when a threshold is crossed.
  • A notification router that routes the message to other notification systems based on the alert classification.

Figure 7. Acceldata Alert Engine Architecture [19]
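
The four alert engine components can be modeled as a short Python sketch. It only illustrates the flow described above and is not Acceldata's code: the `AlertRule` fields, the in-memory `TIME_SERIES_DB`, the `ROUTES` mapping, and all function names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    metric: str
    threshold: float
    severity: str  # alert classification used by the router


# Stand-in for the time series database: metric name -> recent samples.
TIME_SERIES_DB = {
    "cpu_percent": [62.0, 71.0, 93.5],
    "queue_lag_seconds": [4.0, 5.0, 3.0],
}

# Hypothetical mapping from alert classification to notification system.
ROUTES = {"critical": "pager", "warning": "email"}


def evaluate(rule, db):
    """Evaluator: query the store and return the latest sample."""
    return db[rule.metric][-1]


def meter(rule, value):
    """Meter: report whether the value crosses the rule's threshold."""
    return value > rule.threshold


def route(rule, value):
    """Notification router: pick a channel based on alert classification."""
    channel = ROUTES.get(rule.severity, "email")
    return (channel, f"{rule.name}: {rule.metric}={value}")


def run_once(rules, db):
    """One scheduler tick: evaluate every rule and collect notifications."""
    notifications = []
    for rule in rules:
        value = evaluate(rule, db)
        if meter(rule, value):
            notifications.append(route(rule, value))
    return notifications
```

A real scheduler would call `run_once` on a timer; keeping the tick as a plain function makes each stage easy to test in isolation.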

Acceldata Funding

Acceldata has raised a total of $10.5M in funding over 2 rounds; the most recent, a Series A round, was raised on October 15, 2020. It has 3 investors, of which Lightspeed India Partners and Sorenson Ventures are the most recent. Both are listed as making the highest investment in Acceldata [20].

Acceldata Data Ingestion

Acceldata provides reliable data ingestion. It tracks each component of the system comprehensively and monitors the data flow, allowing users to observe, identify, and fix streaming data problems. It also integrates log analysis and incident management [21].

Data Retention

Acceldata can retain data for the following periods [22]:

| Retention Type | Retention Period |
|---|---|
| Low | 1 week |
| Medium | Up to 3 months |
| High | Up to 1 year |

Scalyr

Scalyr is a provider of server log monitoring tools and one of the pioneers that built the first Event Data Cloud. It is used for log and data analytics, security, incident management, forensics, compliance, observability, and monitoring. Scalyr ingests and stores huge volumes of structured and unstructured machine data. Its architecture is tuned for high cardinality, fast search, and high dimensionality, and for storing data at low cost while keeping the service fast. Its Event Data Cloud delivers a fully featured, managed SaaS solution for log analytics. The data cloud can also be used through APIs as an under-the-hood replacement for Elasticsearch, powering other SaaS and statistical analytics services [23]. It collects logs, server metrics, and other useful information and sends them to Scalyr’s servers over SSL.

Scalyr Working Architecture

Scalyr’s architecture is optimized to thrive on messy, raw data and chaos at scale. Its aim is to deliver the best ratio of performance, scale, and cost.

Key features of Scalyr architecture are the following [24]:

  • Source-agnostic, flexible ingestion pipeline
  • Storage and compute layers that scale independently
  • Columnar database
  • Multi-tenant compute with horizontal scheduling
  • Native support for time-series summaries
  • Scalyr UI for event data analysis
  • Programmatic APIs that let cloud services and custom applications use Scalyr’s analytics engine

In short, Scalyr’s architecture can be divided into three layers:

  • Data Ingestion
  • Data analytics and storage
  • User Interface

Together, these layers form the Event Data Cloud, which is offered as a service, via APIs, to power custom applications and SaaS analytics.

Figure 8. Scalyr’s Architecture [25]

Scalyr Funding

Scalyr has raised a total of $27.6M in funding over 5 rounds; the most recent, a “Venture – Series Unknown” round, was raised on May 10, 2018. It has 9 investors, of which Bloomberg Beta and Susa Ventures are the most recent. Both are listed as making the highest investment in Scalyr [26].

Scalyr Data Ingestion

Scalyr can ingest over 200 TB/day per customer and can scale up to petabytes. The ingestion pipeline makes data available for access within a second or less [25].

  • Scalyr accepts all kinds of structured, semi-structured, and unstructured data from any type of digital system or service.
  • Parsers extract structured fields from event logs in real time.
  • Scalyr supports any event data, including data from cloud infrastructure, containerized applications, traditional servers, and IoT endpoints.
  • It streams data from a wide range of shippers, queues, agents, distributed stream processors, and APIs.
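
As a rough illustration of how a parser extracts structured fields from an event log, the following Python sketch applies a regular expression to a web access log line. The log layout and the `LINE_PATTERN` expression are assumptions chosen for the example; this is not Scalyr's parser syntax.

```python
import re

# Assumed access-log layout; illustrative only, not Scalyr's parser format.
LINE_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)


def parse_line(line):
    """Extract structured fields from one log line.

    Lines that do not match are kept as unstructured text, so that
    no event is lost during ingestion.
    """
    match = LINE_PATTERN.match(line)
    if match is None:
        return {"message": line}
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["bytes"] = int(fields["bytes"])
    return fields
```

Falling back to a raw `message` field for unmatched lines reflects the point above that structured and unstructured data are both accepted.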

Scalyr Log Retention

Scalyr retains data for one month (30 days), after which it automatically deletes the log data [27].

Scalyr Event Data Cloud Chart

Figure 9. Scalyr’s Event Data Cloud Chart [28]

Devo

Devo is a cloud-native logging and security analytics company that helps operational and security teams get the most value from their data. According to the company, it is the only one offering the combination of real-time visibility, high-performance analytics, multitenancy, scalability, and low TCO that is essential for monitoring and securing business operations as enterprises accelerate their shift to the cloud [29].

Devo Working Architecture

Devo Data Ingestion and Storage

Devo ingests data sent from any kind of data source. Sources can be configured to send data directly to Devo by applying a Devo tag to the events and establishing a secure channel; otherwise, the data is sent to a Devo relay installed in the user’s network. The relay applies a set of rules to attach Devo tags to the incoming events, then compresses the events and sends them to Devo over an encrypted channel [30].

These events are received by Devo’s event load balancer, where the data is decrypted and distributed across the available data nodes. Ingestion is not delayed by indexing or parsing: indexing happens at regular fixed time intervals, and because data is stored in the same format in which it was received, parsing happens only at query time. Each event is saved to a file identified by Devo domain name and tag, and the file is kept open to accept subsequent events. All event data on the data nodes is compressed at a ratio of 10:1.

Figure 10. Devo Event log Architecture [30]
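
The store-raw, parse-at-query-time approach described above can be sketched in Python as follows. This is a simplified model rather than Devo's implementation: the `RawEventStore` class, the `parse_kv` parser, and the key=value event format are hypothetical, and in-memory lists stand in for the per-domain, per-tag files.

```python
from collections import defaultdict


class RawEventStore:
    """Stores events unparsed, grouped by (domain, tag)."""

    def __init__(self):
        # Each list stands in for a file that stays open for appends.
        self._files = defaultdict(list)

    def ingest(self, domain, tag, raw_event):
        """Append the event exactly as received; no parsing happens here."""
        self._files[(domain, tag)].append(raw_event)

    def query(self, domain, tag, parser, predicate=lambda record: True):
        """Parse events at query time, then keep those matching the predicate."""
        results = []
        for raw in self._files.get((domain, tag), []):
            record = parser(raw)
            if predicate(record):
                results.append(record)
        return results


def parse_kv(raw):
    """Hypothetical parser for 'key=value key=value' events."""
    return dict(pair.split("=", 1) for pair in raw.split())
```

Because `ingest` only appends, its cost does not depend on the event format; the parsing cost is paid by `query`, and only for the domain and tag being searched.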

Devo Funding

Devo has raised a total of $131M in funding over 5 rounds; the most recent, a Series D round, was raised on September 15, 2020. It has 7 investors, of which Insight Partners and Georgian are the most recent. Both are listed as making the highest investment in Devo [31].

Why Devo Raises More Money

Devo has raised more money in recent years because users are getting what they actually want from Devo solutions:

  • With Devo, companies running security analytics and log management platforms can speed up data ingestion, log collection, and data interpretation.
  • Companies no longer have to struggle with scalability restrictions.
  • Devo solutions reduce cost constraints.
  • The Devo platform is affordable, cloud-agnostic, fast, and above all cloud-native.

Devo Data Retention

Devo retains data for about three months (90 days). The retention period was initially 400 days but was later reduced because of licensing costs [32].

Cribl

Cribl is a real-time data pipeline launched to simplify big data and log analytics at scale. The company is run by an experienced team of former Splunk employees and offers users a high level of observability, intelligence, and control over their data. Cribl helps users reduce low-value data, enrich data with more useful context, optimize data routing, and secure data to meet compliance and privacy mandates. In this way, Cribl gives administrators complete control over data in motion [33].

Cribl Architecture

The Cribl architecture includes:

  • Data routing
  • Parsing and structuring the data
  • Eliminating unused data and reducing the stream to useful data
  • Extracting only the signal by processing the log data
  • Simplifying data ingestion by collecting all the data in one platform or tool
  • Easy deployment through an intuitive management interface with a rich UI for configuration and monitoring

Figure 11. Cribl Architecture [34]
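
A minimal Python sketch can show how parsing, reduction, and routing fit together in a pipeline of this kind. It does not use Cribl's configuration or API: the `level|source|message` line format, the reduction rule, the routing rule, and the destination names are all assumptions made for illustration.

```python
def parse(raw):
    """Parse a 'level|source|message' line into a dict (assumed format)."""
    level, source, message = raw.split("|", 2)
    return {"level": level, "source": source, "message": message}


def is_low_value(event):
    """Reduction rule (assumption): debug events carry no signal here."""
    return event["level"] == "DEBUG"


def choose_destination(event):
    """Routing rule (assumption): errors go to analytics, the rest to archive."""
    return "analytics" if event["level"] == "ERROR" else "archive"


def process(raw_lines):
    """Parse, reduce, and route a batch of raw log lines."""
    routed = {"analytics": [], "archive": []}
    dropped = 0
    for raw in raw_lines:
        event = parse(raw)
        if is_low_value(event):
            dropped += 1  # removed before it reaches any destination
            continue
        routed[choose_destination(event)].append(event)
    return routed, dropped
```

Dropping events before routing is what keeps low-value data out of the more expensive destination in this sketch.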

Cribl Funding

Cribl has raised a total of $42.4M in funding over 2 rounds; the most recent, a Series B round, was raised on October 15, 2020. It has 2 investors, Sequoia Capital and CRV. Both are listed as making the highest investment in Cribl [35].

Cribl Data Retention

Cribl allows a longer data retention period by removing 25 to 30% of the data before it is licensed or ingested [34].
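
The link between data reduction and retention can be made concrete with a small calculation. The sketch below assumes a fixed storage or license budget, which the source does not state; the function name `retention_days` and the sample figures are hypothetical.

```python
def retention_days(storage_gb, daily_ingest_gb, reduction=0.0):
    """Days of data that fit in a fixed budget after dropping a fraction of the volume.

    Assumes the budget (storage_gb) stays constant and that `reduction`
    is the fraction of incoming data removed before ingestion.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return storage_gb / (daily_ingest_gb * (1.0 - reduction))
```

Under this assumption, removing a fraction r of the data extends retention by a factor of 1 / (1 - r), so a 25% reduction stretches the same budget by one third and a 30% reduction by roughly 43%.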

Summarized Technology Comparison

This section presents some significant parameters of the competing companies described above, so that they can be compared in summarized form:

| Competitor | Raised Money ($) | Data Pipeline Automation | Data Collection | Data Ingestion | Retention | ETL | Data Analytics |
|---|---|---|---|---|---|---|---|
| Paxata | 61M | Yes | Yes | Yes | 1 day | No | Yes |
| Ascend.io | 19M | Yes | No | Yes | 1 day* | | Yes |
| Acceldata | 10.5M | Yes | Yes | Yes | Up to 1 year | Yes | Yes |
| Scalyr | 27.6M | Yes | Yes | Yes | 30 days | No | Yes |
| Devo | 131M | No | Yes | Yes | 90 days | Yes | Yes |
| Cribl | 42.4M | Yes | Yes | Yes | Lengthy Retention | No | Yes |
| Trifacta | 224.3M | Yes | Yes | | NA | No | Yes |

*1 day (low retention), 1 week (medium retention), 90 days (high retention)

References

[1] “From Messy Files to Automated Analytics,” Trifacta. https://www.trifacta.com/ (accessed Dec. 08, 2020).

[2] “How It Works,” Trifacta. https://www.trifacta.com/products/how-it-works/ (accessed Dec. 08, 2020).

[3] M. E. M. K. Logistics, “Driving Analytics with AWS Database,” Trifacta. https://www.trifacta.com/solutions/aws/ (accessed Dec. 08, 2020).

[4] “Trifacta – Crunchbase Company Profile & Funding,” Crunchbase. https://www.crunchbase.com/organization/trifacta (accessed Dec. 08, 2020).

[5] Trifacta, “Trifacta Raises $100 Million to Support Explosive Growth of Data Wrangling for AI and the Cloud,” GlobeNewswire News Room, Sep. 12, 2019. http://www.globenewswire.com/news-release/2019/09/12/1914712/0/en/Trifacta-Raises-100-Million-to-Support-Explosive-Growth-of-Data-Wrangling-for-AI-and-the-Cloud.html (accessed Dec. 08, 2020).

[6] “Data Preparation Tool,” DataRobot. https://www.datarobot.com/platform/paxata-dataprep/ (accessed Dec. 07, 2020).

[7] “Technology | Paxata,” Jun. 23, 2015. https://www.paxata.com/technology/ (accessed Dec. 07, 2020).

[8] “Paxata Accelerates Enterprise Data Prep with New Intelligent Automation of Data Projects,” Mar. 18, 2019. https://www.businesswire.com/news/home/20190318005067/en/Paxata-Accelerates-Enterprise-Data-Prep-with-New-Intelligent-Automation-of-Data-Projects (accessed Dec. 07, 2020).

[9] “Paxata – Funding, Financials, Valuation & Investors,” Crunchbase. https://www.crunchbase.com/organization/paxata/company_financials (accessed Dec. 07, 2020).

[10] “[PDF] Paxata Security Overview – Free Download PDF.” https://silo.tips/download/paxata-security-overview (accessed Dec. 07, 2020).

[11] “About Paxata | Self-Service Data Preparation | Paxata.” https://www.paxata.com/company/ (accessed Dec. 07, 2020).

[12] “Self-Service Data Prep Application | Paxata | Paxata | Data science, Data, Data visualization,” Pinterest. https://www.pinterest.com/pin/60657926214018688/ (accessed Dec. 07, 2020).

[13] Ascend.io, “Ascend.io and Qubole Partnership Enables Data Pipelines with 95% Less Code.” https://www.prnewswire.com/news-releases/ascendio-and-qubole-partnership-enables-data-pipelines-with-95-less-code-301151251.html (accessed Dec. 07, 2020).

[14] “Security,” Ascend Developer Hub. https://developer.ascend.io/docs/security (accessed Dec. 07, 2020).

[15] “Self-Optimizing Pipelines: Ascend Autonomous Pipeline Engine,” Ascend.io. https://www.ascend.io/architecture (accessed Dec. 07, 2020).

[16] “Unified Data Engineering | Ascend Data Pipeline Platform,” Ascend.io. https://www.ascend.io/ (accessed Dec. 07, 2020).

[17] “Ascend.io – Funding, Financials, Valuation & Investors,” Crunchbase. https://www.crunchbase.com/organization/ascend-io/company_financials (accessed Dec. 07, 2020).

[18] “Company,” acceldata. https://www.acceldata.io/about-us/ (accessed Dec. 07, 2020).

[19] “Architecture | Acceldata Platform documentation.” https://docs.acceldata.dev/docs/gettingstarted/architecture (accessed Dec. 07, 2020).

[20] “acceldata – Funding, Financials, Valuation & Investors,” Crunchbase. https://www.crunchbase.com/organization/acceldata/company_financials (accessed Dec. 07, 2020).

[21] “Streaming Performance,” acceldata. https://www.acceldata.io/data-ingestion/ (accessed Dec. 07, 2020).

[22] “Hardware Sizing Guide | Acceldata Platform documentation.” https://docs.acceldata.dev/docs/installation/sizing (accessed Dec. 07, 2020).

[23] “Who is Scalyr?,” Scalyr. https://www.scalyr.com/company (accessed Dec. 07, 2020).

[24] “Scalyr: Column-Oriented Log Management with Steve Newman,” Software Engineering Daily, Oct. 05, 2018. https://softwareengineeringdaily.com/2018/10/05/scalyr-column-oriented-log-management-with-steve-newman/ (accessed Dec. 07, 2020).

[25] “Architecture,” Scalyr. https://www.scalyr.com/architecture (accessed Dec. 07, 2020).

[26] “Scalyr – Funding, Financials, Valuation & Investors,” Crunchbase. https://www.crunchbase.com/organization/scalyr/company_financials (accessed Dec. 07, 2020).

[27] N. Gohring, “For Scalyr, it’s about speed, scale and simplicity in log management,” p. 4.

[28] “Blazing-Fast Log Management and Observability for Modern Apps,” Scalyr. https://www.scalyr.com (accessed Dec. 08, 2020).

[29] “About Devo: The Cloud-Native Data Analytics and Security Company,” Devo.com. https://www.devo.com/about/ (accessed Dec. 08, 2020).

[30] “How Devo works.” https://docs.devo.com/confluence/ndt/the-devo-data-analytics-platform/how-devo-works (accessed Dec. 08, 2020).

[31] “Devo – Funding, Financials, Valuation & Investors,” Crunchbase. https://www.crunchbase.com/organization/logtrust-s-l/company_financials (accessed Dec. 08, 2020).

[32] T. Palmer, “ESG Technical Review: Devo Data Operations Platform.” https://www.esg-global.com/validation/esg-technical-review-devo-data-operations-platform (accessed Dec. 08, 2020).

[33] “About Us,” Cribl. https://cribl.io/about-us/ (accessed Dec. 08, 2020).

[34] “Cribl LogStream – Xiaa Solutions.” https://xiaa.co.uk/page-cribl.php (accessed Dec. 08, 2020).

[35] “Cribl – Funding, Financials, Valuation & Investors,” Crunchbase. https://www.crunchbase.com/organization/cribl/company_financials (accessed Dec. 08, 2020).