Studio Ghibli CLI Data Gem

This was the first major project I ever completed solo. I have to admit, going into this the anxiety was high. But like all things in software development, the key is to just start — pick the simplest issue you can address and go from there.

The Project Idea stage was a tough one. I had a couple of great ideas that I had been wanting to implement for personal use for a long time. The first was scraping dates and showtimes off local independent movie theater pages and aggregating them in a central location; the other was doing something similar with all of the labels that I follow on Bandcamp, aggregating them each week to find out what had been released. I reached out to the theaters as well as Bandcamp, but no dice on APIs that I could easily access. The flip side of that coin was dense HTML that was going to be really difficult to scrape, with minimal classes and lots of JS. So, with a heavy heart, I abandoned both ideas and went searching through the interwebz for some sort of API.

The requirements for this project were fairly simple: consume an external data source, build a CLI program that went at least 1 level deep, and bundle it into a Gem. The objective was to demonstrate your knowledge of collaborating objects. I quickly found an API that had some really great data available and would make for a fun project: a Studio Ghibli API on Heroku. It looked like a labor of love by a fan, as some of the later pieces had a few “TODO”s here and there, but all of the data was really clean and organized.

Day 1: I woke early, amped up on a bunch of coffee. I had decided that I would demonstrate my knowledge of both scraping and consuming an API by building a 2 part app: one part that dealt with the films themselves via the API, and another that scraped information off Wikipedia for a history of the company. Straight out of the gate I knew I could bang out the ‘About’ section. It would only end up being two objects: a Scraper and a Bio class to expose the scraped text. 5 hours in, I had my basic gem set up and my repo configured, and I had banged out the Bio class and built out the scraper. All I had to do was fire up Nokogiri and pull down my info from Wikipedia…and then…failure. Turns out Wikipedia is a nightmare to scrape. If you can find a table on a certain page, you can pretty reliably scrape the data off of that, but because anyone can edit a page, you can’t reliably scrape all of the data out of a section without having to go back and check on it every so often. So instead, I decided to focus on the MVP (minimum viable product) for the API: returning films to the user, and then showing a more elaborate detail view for each film.

The MVP came together really quickly. I was able to get it up and running by the time I had my initial project planning meeting the next day. But since I hadn’t been able to reliably scrape the Wikipedia page, I was feeling like I needed to dive a little deeper to add some pizazz to this project. Turns out that adding features adds complexity to the architecture of your app. Even though I had correctly structured objects out of the gate, my CLI controller class became increasingly long and complex. I found a way to reuse several blocks of code for user input sanitation and for common menus shared across views, but I definitely had to take a step back and think it all through. As a very visual person, having a whiteboard to draw out the relationships for this project would have been really helpful. Another interesting issue that I ran into was the pluralization of species: I was worried that breaking convention with naming might introduce odd behavior into my app, but was pleasantly surprised that Ruby didn’t really seem to care.

Once the app was up and more or less functional, it was time to test, test, test. TODO: maybe v 1.2 will have some tests. Running through every possible scenario with each of the menus by hand was exhausting and extremely time consuming. I just kept a list of ‘known issues’ and ticked them off one by one. After all of the bugs had been squashed, I set out to refactor some of the long hash parsing methods I had in my API class, the methods for creating each set of objects once the API call was finished. They all shared a similar pattern, but had long lists of assorted data that they were parsing out of the API response. I remembered solving a few labs with mixed data by passing in a hash, and so I refactored these methods to do the same. One interesting design choice I made here was using:

instance_variable_set("@#{k}", v) unless v.nil? vs self.send("#{k}=", v)

in my initialize method because I only need to read data from my objects and didn’t want to expose additional data being sent from the API.
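To make that trade-off concrete, here is a minimal sketch of the pattern. The class name, attribute names, and sample data are assumptions for illustration, not code from the gem. The send variant would need a writer method for every key (and raises NoMethodError for keys that don't have one), whereas instance_variable_set stores whatever arrives and only the attributes listed in attr_reader are visible from outside.

```ruby
# Hypothetical class and attribute names, for illustration only.
class Film
  # Readers only: nothing outside the object can reassign these values.
  attr_reader :title, :director

  def initialize(attributes)
    attributes.each do |k, v|
      # Sets @title, @director, and so on directly, skipping nil values.
      # Keys without a matching attr_reader are stored but never exposed.
      instance_variable_set("@#{k}", v) unless v.nil?
    end
  end
end

# Made-up sample hash standing in for one parsed API record.
film = Film.new("title" => "Spirited Away", "director" => "Hayao Miyazaki", "rt_score" => "97")

film.title                   # readable through attr_reader
film.respond_to?(:title=)    # no writer was defined
film.respond_to?(:rt_score)  # stored internally, but not exposed
```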

I also noticed I was making a lot of API calls, one every time a sub menu was selected, despite having already created the objects. Passing in the data via a hash allowed me to abstract 4 methods into 1, which I then moved out of the API class and into each object class. So, I killed two birds with one stone: I created a create_or_find method inside each object class that called the API for a response only if no objects for that class currently existed, and restructured my program to make the initial call to these methods right at the start. This not only saved on my API calls, but it also DRYed up my API class to a short and sweet 9 lines of code and sped up the user experience. In my object classes, I was able to further refactor the code to be identical across all classes, which I knew I could later abstract out into a Module.
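Roughly, the caching idea looks like the sketch below. FakeApi is a stand-in I've made up so the example runs without a network connection, and the class and attribute names are assumptions rather than the gem's actual code.

```ruby
# Made-up stand-in for the real API class. It returns canned data and
# counts its calls so the caching behaviour is visible without a network.
class FakeApi
  class << self
    attr_reader :calls
  end
  @calls = 0

  def self.fetch(_resource)
    @calls += 1
    [{ "id" => "1", "title" => "Castle in the Sky" },
     { "id" => "2", "title" => "My Neighbor Totoro" }]
  end
end

# Hypothetical object class.
class Film
  attr_reader :id, :title

  @all = []

  def self.all
    @all
  end

  # Only hits the API when no Film objects exist yet; afterwards it
  # returns the objects that were already built.
  def self.create_or_find
    FakeApi.fetch("films").each { |attrs| all << new(attrs) } if all.empty?
    all
  end

  def initialize(attributes)
    attributes.each { |k, v| instance_variable_set("@#{k}", v) unless v.nil? }
  end
end

Film.create_or_find  # first call fetches and builds the objects
Film.create_or_find  # second call reuses them
FakeApi.calls        # the stand-in API was only called once
```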

But before I got to the Modules I had one final ‘feature’ on my list to build. It certainly wasn’t a requirement for this project, but by now I knew that I had structured my code in such a way that I thought I could pull it off. The detail views for each object had a list of URLs I was displaying that linked to other related objects. I noticed as I was building this out that the last part of each URL was actually the object’s ID. So I set out to build a find_by_id method that would allow me to search for the related item(s). This took a few tries. First I struggled with the regex to strip out just the ID, and then with some of the data being in an array and some being just a single string. I managed to figure it out, however, and was able to abstract it down to be the same across all classes.
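Here is one way the lookup could be sketched. The regex, the example.com URLs, and the class below are my own assumptions for illustration; the version in the gem may differ. Wrapping the argument in Array() is what lets a single string and an array of strings go through the same code path.

```ruby
# Hypothetical class; the URLs used below are placeholders, not real endpoints.
class Person
  attr_reader :id, :name

  @all = []

  def self.all
    @all
  end

  def initialize(id, name)
    @id = id
    @name = name
    self.class.all << self
  end

  # Accepts a single URL or an array of URLs and always returns an array
  # of matching objects.
  def self.find_by_id(urls)
    Array(urls).map do |url|
      # Capture the last path segment, ignoring an optional trailing slash.
      id = url[%r{([^/]+)/?\z}, 1]
      all.find { |obj| obj.id == id }
    end.compact
  end
end

Person.new("abc-123", "Pazu")
Person.new("def-456", "Sheeta")

single = Person.find_by_id("https://example.com/people/abc-123")
several = Person.find_by_id(["https://example.com/people/abc-123",
                             "https://example.com/people/def-456/"])
```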

Finally it was time for the last bit of refactoring: the Module piece. I quickly blew through all of my testing again to make sure my app was fully functional. I named the Module that I built findable, because it consisted of four identical class methods that had to do with finding objects. My initial thought was to build a concerns folder for this Module, but I had a really hard time with the namespacing, so much so that I reached out during a CLI office hours session to try to find a workaround. Eventually, I just settled on leaving the Module at the same level as the rest of the classes and just ‘extending’ my already namespaced class with that Module. Fingers crossed that will be an acceptable solution.
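For anyone unfamiliar with extend, here is a small sketch of the idea. The StudioGhibli namespace, the two methods shown, and the class names are assumptions for illustration; the real Module had four methods. Because extend adds the Module's methods as class methods, each class that extends it gets its own separate collection.

```ruby
# Hypothetical namespace and method names, for illustration only.
module StudioGhibli
  module Findable
    # After `extend`, these become class methods. @all lives on whichever
    # class calls the method, so each class keeps its own list.
    def all
      @all ||= []
    end

    def find_by_name(name)
      all.find { |obj| obj.name == name }
    end
  end

  class Film
    extend Findable
    attr_reader :name

    def initialize(name)
      @name = name
      self.class.all << self
    end
  end

  class Species
    extend Findable
    attr_reader :name

    def initialize(name)
      @name = name
      self.class.all << self
    end
  end
end

StudioGhibli::Film.new("Ponyo")
StudioGhibli::Film.find_by_name("Ponyo")  # finds the Film created above
StudioGhibli::Species.all                 # still empty, a separate collection
```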

All in all this was a really fun project. Lots of anxiety after my initial failure, but I was able to get right back on my feet and still build a pretty fun little app. If you’d like to check out my repo you can find it here

Addendum: I went back later, after brushing up on RSpec, and decided to add in some tests. At first I set out to test each Model individually. I wrote out some tests, but after an afternoon of putzing around I realized that the functionality that needed to be tested was actually just two things: Findable and the CLI. Findable was responsible for calling the API and then building out the objects for the tool, and if I was being honest with myself, the CLI was just user input that was actually dependent on Findable. So I broke out the module testing into each Model. I was testing 3 key things: that the Model data was not persistent prior to creation, that a call to find_or_create returned an array of hash objects, and lastly that the keys of the hash were getting set to the expected class methods so that the program could read the data and display it. Quickly I realized that the tests for this were a cut and paste for each and every Model, with the exception of a few small details. I weighed and researched building out something really abstract that you could pass a few simple values into and have it perform the same on each Model, but decided that since this was testing, for readability I would just stick to extracting some shared methods into the spec helper.


