
Moving from the city & working remotely @ Scrapinghub

6 months of working remotely at scrapinghub.com

A big thanks to Ian Duffy for allowing us to republish this post here so that all case studies can be in one place and we can raise awareness of this form of employment within our local communities. You can follow him on Twitter here.

ScrapingHub is a distributed company that builds tooling and a platform to extract data from the web. The company was incorporated in Ireland in 2010. From day 1 it was a distributed team, with many staff in Uruguay and other parts of the world. Since then the company has grown to a team of 180 people located all around the world, and last year the company’s revenue was 12 million. I joined in March 2019, and it was my first time working remotely. Below I reflect on my past 6 months, the remote experience and some of the things I’ve been working on.

What is Scrapinghub?

The mission is “to provide our customers the data they need to innovate and grow their businesses.” The founders of the company are the creators of Scrapy, a very popular open source and collaborative framework for extracting data from the web. ScrapingHub started as a platform for running Scrapy spiders. It has since grown to include three more products:

  • Crawlera: A specially designed proxy for web scraping to ensure you can crawl quickly and reliably
  • Splash: A headless browser to enable customers to extract data from JavaScript websites
  • AutoExtract: delivers next-generation web scraping capabilities backed by an AI-enabled data extraction engine. This enables customers to crawl many websites without needing to write custom CSS and XPath selectors for each one.

What is your role?

I am a DevOps engineer on the AutoExtract product. I work closely with two backend developers to deliver an API that enables customers to access our AI-enabled data extraction engine. I’m dedicated to providing infrastructure services for the team. This includes compute, deployment mechanisms, monitoring, alerting, logging, and so on.

What is your background and what brought you to ScrapingHub?

I graduated from Computer Applications at Dublin City University 5 years ago. During my time in college, I had a great interest in infrastructure and automation. Upon graduating, I worked as a Cloud Engineer in a bank for a year before joining an ecommerce company under the title of Software Engineer.

At this company, I worked on a team that acquired data from the web. However, we never neatly solved the problem, and our solution would never scale. This was due to the initial work needed to create XPaths and CSS selectors for each site, along with the ongoing maintenance and monitoring needed to check that the sites hadn’t changed and broken those selectors. During my time in this role, I had heard of ScrapingHub, as they are one of the leaders in the web scraping space. It was always a company I was curious to know more about, and it was unusual to hear of an Irish company that is distributed.
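To illustrate why hand-written selectors are costly to maintain, here is a minimal sketch using only Python’s standard library. The page markup, the selector, and the function name are all made up for this example; they are not taken from any real site or from the system described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical product page markup before and after a site redesign.
PAGE_V1 = "<html><body><div class='price'><span>19.99</span></div></body></html>"
PAGE_V2 = "<html><body><div class='product-cost'><b>19.99</b></div></body></html>"

# A hand-written selector tied to the original markup.
PRICE_XPATH = ".//div[@class='price']/span"


def extract_price(page):
    """Return the price text, or None when the selector no longer matches."""
    node = ET.fromstring(page).find(PRICE_XPATH)
    return node.text if node is not None else None


print(extract_price(PAGE_V1))  # the selector matches the original markup
print(extract_price(PAGE_V2))  # the same selector finds nothing after the redesign
```

The price is still on the second page, but the selector silently stops matching. Multiply that by every field on every site and the maintenance burden grows quickly.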

After this, I was employee number one at an Irish start-up, again with the title of Software Engineer. As you can imagine, in a start-up environment there is lots of work to do, and the role was a diverse one that included many different aspects.

My journey to ScrapingHub began with a call from a recruiter in Stelfox who told me about the role. He had my attention at the word ScrapingHub. I knew immediately this was a chance to experience remote working for the first time. During the conversation, my interest only grew. The recruiter described the AutoExtract product to me and I was curious to know more – they’ve figured out a way to do web scraping without the need to write XPaths and CSS selectors, a problem I previously faced.

What was the interview and acceptance process like?

As ScrapingHub is a distributed company, all interviews are done remotely. You do not meet a Shubber (someone who works for ScrapingHub) in person at any stage. For me, the interview process consisted of three interviews, each lasting no more than 30 minutes.

My first interview was with a ScrapingHub recruiter. I got the impression that the interview focused on whether or not you and ScrapingHub are a good fit for each other. It’s a time for them to ask you questions to see if you align with their company values and for you to ask questions to see if it’s the right role for you.

My second interview was the hardest of the three. It was with a member of the infrastructure team; it was very technically focused and an enjoyable experience. For me, the interview was a stream of questions covering different key areas of the role. The questions started easy and then increased in difficulty. For example, the containers section started with “What flag do you pass to docker to expose a port?” and ended on “Describe to the best of your knowledge how containers work at a kernel level?”. When I failed to have a good answer for a question, the interviewer was nice about it and discussed it in more detail. I liked this, and I walked away from the interview having learned something.

The final interview was with the head of engineering. I believe this interview was a mix of both culture fit and technical ability. It took the form of a friendly conversation based on resume items and previous work along with a few technical questions thrown in along the way.

A couple of days later I heard back from the Stelfox recruiter, who informed me that my interview had been successful. From that point on, it was standard procedure – receive a contract, provide personal details, etc.

Setting myself up for success

As I’m based in Ireland, ScrapingHub provided me with a laptop that was shipped to my address before I was due to start. I was excited about the job and the remote aspect of it. I believed having a good working area was going to be key to my success.

I spent a couple of days trawling second-hand sites and researching office furniture and equipment. At the time I was living in a small 1 bedroom apartment in Dublin, Ireland and space was limited. I focused on what I felt were the key elements to making a reasonable space where I could work contentedly. I ended up with the following:

I’ve since made use of the remote flexibility and moved away from Dublin, Ireland to a bigger property in Cork, Ireland. In doing so I’ve improved my workspace by having a dedicated office with a standing desk and many, many cable management aids.

The first week

Three days before I was due to start I received an email containing my staff credentials. This introduced me to some of the tools used for management and communication:

  • JIRA – For managing work and tracking time.
  • Confluence – An internal knowledge base for the team to collaboratively maintain.
  • GSuite – Email, calendar, and meetings.
  • Github – Git repositories
  • BambooHR – HR System

Before officially starting I signed into Slack and received a warm welcome from all my future co-workers. I also signed into my GSuite account where my calendar revealed a hint of what the first few days were going to be like.

One of my major concerns early on was keeping to a routine. To help with this I decided to continue waking up at the same time as I did in previous jobs. I wanted to use the time I saved by having no commute to complete any tasks that involved leaving the house. For example, on day 1 I did the grocery shopping for the week.

With the groceries unpacked and put away, it was time to get to work. My first morning started at 9.30 am with HR introducing me to my manager. He told me a little bit about the team and followed up by inviting me to team-specific Slack channels and scheduling introductions with the team.

After this, at 10 am, was a general company HR onboarding session. New hires are brought on in batches so I wasn’t alone on these calls; there were 3 other new hires on the call with me. Immediately after this, we each had a 1:1 call with HR to discuss our benefits (pension, shares, holidays, health insurance, etc.).

Before finishing up for lunch we had an IT onboarding session. This meeting was to ensure we could access all the necessary systems and that we knew different company policies – How do I change my password? What do I do if I forgot my password? How do I encrypt my disk? What do I do in the event a device is stolen? And so on.

ScrapingHub is very focused on wellbeing. For March they ran an event which focused on this. As you can see from my calendar, the next meeting was “Stay sane while working remote”, with another later in the week called “Mindful Meditation”. Both of these events were March specials. As a newbie it was interesting to experience these; they gave me a glimpse of how ScrapingHub encourages camaraderie and enables people to get to know each other outside of work topics. Outside of March this space for general chit-chat is maintained: each week there’s a standing watercooler meeting for the different time zones where people get together and chat about absolutely anything. Additionally, each month we have “Shub talks” where people show something they’ve been working on or share some piece of knowledge – I’m pleased to say that within my 6 months at the company I have had the opportunity to speak at one.

Open source has been a big part of ScrapingHub since day 1. This is demonstrated by Scrapy being launched as an open source project by the company back in 2009. My final meeting of the day showed how open source is still a very big part of ScrapingHub. Shubbers are encouraged to contribute to open source as much as possible, and we are given 5 hours per week to engage in anything open source related. Today ScrapingHub has a handful of open source projects. Check out some of the more popular ones: Splash, ELI5, deep-deep, and SpiderMon. Additionally, ScrapingHub supports the open source ecosystem by taking part in Google Summer of Code.

Day two continued with more meetings. The morning section was introductions to the team and a system overview. Following these, the newbie group had an introduction to the founders of the company. By the end of the day, I had been assigned a ticket in JIRA and was working away on it.

What have you been working on?

Up until now, infrastructure at ScrapingHub ran on dedicated servers rented from Hetzner. Teams package their applications up as Docker containers and use a ChatOps deployment mechanism to run them on Mesos. Additional services such as a database or Kafka require input from the infrastructure team.

Before I joined the company, a decision was made to move AutoExtract onto Google Cloud Platform with their managed Kubernetes offering. The idea behind this was to let us quickly increase compute to meet customer demand.

My main task was to make this happen. Thankfully, because the company had already adopted Docker and Mesos, I had a bit of a head start on this challenge.

AutoExtract is built up of multiple components that communicate with each other via Kafka. It is exposed to customers via an HTTP API. Breaking this down, I established what was required to get the system running. This went as follows:

  • A mechanism to deploy and destroy a single component – Helm fits this use case nicely. I created a Helm chart for each of the components. The chart is stored, versioned and released with the component’s source code. I extended the pre-existing ChatOps deployment mechanism to support communicating with Helm.
  • HTTP/HTTPS connectivity to pods – Ingress Nginx solved this problem nicely. I paired it with cert-manager to automatically provision SSL certificates from Let’s Encrypt and ExternalDNS to automatically create DNS records.
  • Monitoring of the application and cluster – Deploying Prometheus provided a straightforward way to achieve this.
  • Authentication for internal applications – OAuth2 Proxy integrated nicely with Ingress Nginx to provide authentication against the company’s GSuite accounts.
  • Kafka – I made use of Confluent Cloud’s managed Kafka offering for this. It allowed us to get Kafka on a consumption pricing model and avoid an ongoing maintenance overhead.
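As a rough illustration of the first item, the glue between a chat command and Helm could look something like the sketch below. This is not the actual ChatOps integration: the `deploy` and `destroy` command syntax, the chart location, the namespace name, and the `image.tag` chart value are all assumptions made up for this example. The Helm subcommands and flags shown are from the Helm 3 CLI.

```python
import shlex

# Hypothetical chat commands (the real ChatOps syntax is not described here):
#   deploy <component> <image-tag>
#   destroy <component>

CHART_DIR = "./chart"      # assumed location of the chart in each component's repo
NAMESPACE = "autoextract"  # assumed namespace name


def helm_args(message):
    """Translate a chat message into a Helm 3 command line (as an argv list)."""
    parts = shlex.split(message)
    if len(parts) == 3 and parts[0] == "deploy":
        _, component, tag = parts
        # "upgrade --install" creates the release if it does not exist yet.
        # "image.tag" is an assumed value key; it depends on how the chart is written.
        return ["helm", "upgrade", "--install", component, CHART_DIR,
                "--namespace", NAMESPACE, "--set", f"image.tag={tag}"]
    if len(parts) == 2 and parts[0] == "destroy":
        return ["helm", "uninstall", parts[1], "--namespace", NAMESPACE]
    raise ValueError(f"unrecognised command: {message!r}")


print(helm_args("deploy api 1.4.2"))
print(helm_args("destroy api"))
```

A bot could hand the returned list to `subprocess.run(args, check=True)`. Building an argv list instead of a shell string keeps chat input from being interpreted by a shell.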

Today I’m happy to say that the AutoExtract product runs entirely on Google Cloud Platform.

What is your average week like?

Something that amazed me about working remotely is how much more productive I am. ScrapingHub gives me a large amount of autonomy over my work as well as my time. I am in control of how I structure my day, which allows for lots of flexibility – nothing is stopping me from taking an hour here and there during general working hours (9 am to 5 pm) and replacing it with hours later in the evening.

My average week consists of a total of 3 mandatory meetings. Every Monday my manager does a 1:1 meeting with each member of the team; these last on average 15 minutes. On Tuesday we have a backend team sync up and on Thursday we have a full team sync up. Each of these meetings lasts 1 hour and takes a similar format to a daily standup, with an allowance for more discussion where necessary.

For the rest of the week, I’m free to work away on whatever task has been assigned to me. Everyone is always available via Slack and when more in-depth communication is necessary ad-hoc video calls are started.

Frequently asked questions

Do you ever find remote work isolating or miss human contact?

No, the team I’m on is very cohesive. Everyone is very supportive and has a clear vested interest in the success of the product and thus the task you may be working on.

For times when I wish to get out of the house, ScrapingHub has hotdesk spaces in WeWork in Dublin and Republic of Work in Cork. The Bank of Ireland Workbench spaces or public libraries can be useful spots to work from. For the most part, I only make use of them when I have plans to meet a friend for lunch.

In addition to this, it’s important to note that despite only being in the company for 6 months I was given the opportunity to meet all of my teammates in person along with the founders of the company and many members of the leadership team.

Do you find a cost saving in working from home?

Yes, I no longer spend any money on commuting or purchasing lunch. On average I’ve worked this out at about an €80 saving per week. In addition to this, I’ve recently moved away from Dublin and reduced my rent cost significantly.

Have you kept to your goal of wanting to keep a routine and doing something with the saved commuting time?

I would like to say yes, but sometimes extra sleep wins. I’m aiming to bring this back on track soon.

What has been your biggest personal challenge with switching to remote work?

Definitely communication, specifically video calls. I found seeing my face on screen for the first few days very odd. These days I’ve become comfortable with that and just struggle with knowing the correct time to speak or interrupt to add to something while someone else is talking.

What do you most enjoy about working remote?

I really like the hassle-free element of it. I’m no longer setting alarm clocks, checking public transport times, wondering if I need to bring an umbrella today, and so on. Starting and finishing work is as simple as entering and exiting a room in my house.

How does knowledge sharing within your team occur?

For me, this is mainly done through Slack. The team are very supportive and willing to help as much as they can. I try to return this as much as I can, and I believe I have mostly been successful given the team’s ability to adopt Kubernetes.

If you’ve got any other questions feel free to reach out.

Interested in going remote with ScrapingHub?

I’ve enjoyed my time at ScrapingHub so far and I’m sure you might too. If you enjoyed my experience above and would like to try it out yourself take a look at the jobs page or reach out to a ScrapingHub Recruiter.
