Carsales drives real-time data


Carsales has rolled out a project designed to deliver sales leads in near real time by consolidating its incoming data into a single pipeline.



Carsales’ Adam Carbone.

Data engineering lead Adam Carbone told the Snowflake Data World Tour in Melbourne that the company had followed a familiar evolution, ending up with multiple tech teams siloed in different parts of the business.

“These teams were building their own pipelines independently working off different data platforms, using different tools, different practices”, Carbone explained, which “led to this fairly messy looking architecture”.

The challenge Carbone had was to get leads like vehicle sourcing from dealers into Carsales’ sales platform as quickly as possible, something increasingly hampered by the complex systems the company had evolved.

Data from internal databases had to be processed with tools such as Python, Talend or Oracle’s data integration tooling to feed an Oracle data warehouse and an AWS S3 bucket, while internal and external feeds, along with data in various Google services, ran through Spark and Airflow to reach the same destinations.

Data from the Oracle warehouse and S3 bucket was then processed by Amazon Redshift to feed Salesforce Marketing Cloud and a Tableau reporting system.

This left the data heavily fragmented, he said, with no “single source of truth”. Moreover, “we had to spend far too much of our time maintaining all of this, bug fixing and that kind of thing.

“We’d really rather be spending our time doing the fun stuff, which is building cool new things.”

After evaluating the existing architecture and doing some proof-of-concept work, Carbone said, Carsales chose Snowflake as its data platform. The sources (databases, RabbitMQ and web analytics) now land in an S3 bucket that feeds Snowflake, which in turn feeds Salesforce Marketing Cloud, Tableau, and a new self-service business intelligence and analytics capability built with ThoughtSpot.

Carsales’ partners provided another impetus for the transformation.

A team wanted to embed a ThoughtSpot dashboard (“what they called a live board”) into Autogate, Carsales’ dealer inventory management system.

The old architecture couldn’t keep pace with the partner’s real-time data requirements, since the batch pipeline only updated every four to six hours. As Carbone’s colleague, data platforms lead Naresh Sajnani, told the conference, that has now been cut to a median data latency of six minutes.
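Data latency here means the gap between when a record is produced at the source and when it lands in the warehouse. The article doesn't say how Carsales measures it, so the following is only a minimal sketch, assuming each record carries a hypothetical source event timestamp and load timestamp, of how a median latency figure like the six minutes quoted could be computed.

```python
from datetime import datetime, timedelta
from statistics import median


def median_latency_minutes(records):
    """Median of (loaded_at - event_at) across records, in minutes.

    `records` is a list of (event_at, loaded_at) datetime pairs. Both
    fields are hypothetical; they are not taken from Carsales' pipeline.
    """
    if not records:
        raise ValueError("no records to measure")
    lags = [(loaded - event).total_seconds() / 60 for event, loaded in records]
    return median(lags)


base = datetime(2023, 1, 1, 9, 0)
sample = [
    (base, base + timedelta(minutes=4)),
    (base, base + timedelta(minutes=6)),
    (base, base + timedelta(minutes=11)),
]
print(median_latency_minutes(sample))  # 6.0
```

Using the median rather than the mean keeps a few unusually slow records from dominating the headline figure.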

Rapid fault detection

Sajnani said the other key capability enabled by the near real-time pipeline is rapid fault detection, because faults show up as increased pipeline latency.

“There are a lot of moving parts, in terms of streams and tasks,” Sajnani said; “we need some sort of mechanism to track this pipeline for any failures.”

The previous Tableau dashboards also ran on legacy platforms, he explained.

“We wanted … an automated approach to finding out if one of our systems [is] down,” Carbone said.

To meet that need, Sajnani’s team built an anomaly detection system it dubbed Anoma Lee.

With the transformation in place, he said, the efficient integration Snowflake provides “allows us to … create an interface between Snowflake objects and an external messaging system”.

Anoma Lee provides both task monitoring and stream monitoring, and if a threshold is hit, it sends a notification by SMS and pager.
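The article doesn't describe Anoma Lee's internals. As an illustration only, a threshold check over task and stream latencies might look like the sketch below. It assumes per-object lag figures have already been collected; the `Reading` type, object names, thresholds and notifier callbacks are all hypothetical stand-ins, not Carsales' code or a Snowflake API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Reading:
    kind: str           # "task" or "stream"
    name: str           # hypothetical Snowflake object name
    lag_minutes: float  # how far behind this object currently is


def check_thresholds(
    readings: Sequence[Reading],
    thresholds: Dict[str, float],
    notifiers: Sequence[Callable[[str], None]],
) -> List[str]:
    """Alert on every reading whose lag has reached its kind's threshold.

    Each alert is sent to every notifier (stand-ins for SMS and pager)
    and also returned so the caller can log or inspect it.
    """
    alerts = []
    for r in readings:
        limit = thresholds[r.kind]
        if r.lag_minutes >= limit:
            msg = f"{r.kind} {r.name}: lag {r.lag_minutes:g} min >= {limit:g} min"
            for notify in notifiers:
                notify(msg)
            alerts.append(msg)
    return alerts


sms, pager = [], []
alerts = check_thresholds(
    [Reading("task", "load_leads", 7), Reading("stream", "inventory_changes", 45)],
    thresholds={"task": 30, "stream": 30},
    notifiers=[sms.append, pager.append],
)
print(alerts)  # ['stream inventory_changes: lag 45 min >= 30 min']
```

Treating rising lag as the failure signal follows the point above that faults show up as increased pipeline latency. A real deployment would likely also want de-duplication so the same breach doesn't page someone on every polling cycle.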