Here is my feedback after preparing and passing the AWS Database Specialty certification. There are tips about the exam but also some thoughts that came to my mind during the preparation when I had to mind-shift from a multi-purpose database system to purpose-built database services.
This exam was in beta last December and January, and was then planned for production starting April 6, 2020. I initially planned to take the exam on that first day but, because of the COVID situation and then for family reasons, I had to reschedule twice. PearsonVUE is really good for that: free rescheduling, and the ability to take the exam at home. This last point was a bit stressful for me and here is my little feedback about taking the exam from home:
- Wi-Fi: an ethernet cable is always more reliable. When working remotely, it happens that I have to re-connect or re-start the router (or ask kids to do it as they work from home as well and they know how internet access is important) but you can’t leave the room or talk during the exam.
- Room: it must be closed, and nobody enters for 3 hours. A Post-It on the door is a nice reminder for kids. I also asked them to be quiet (and this without playing Fortnite because I want full bandwidth). When working, I can put on a headset to concentrate, but that’s not allowed for the exam.
- Clean desk: no paper, no second monitor,… that is not a problem. The problem is: I work in a room that is messy. For online conferences, the webcam is framed correctly to hide this. But for the exam, you have to take pictures of the room. But no worry, they are not there to judge your home and the stack of laundry to iron that is just behind 😉
- I sometimes look around or put my hand on my mouth when concentrating 🤔. That’s forbidden: they watch you by webcam and open a chat conversation to tell you to avoid that.
The best is that, even if the full score is sent a few days later, the PASS status is immediately visible. I find that extremely good during this lockdown period: I hope that my enthusiasm when passing the exam will give inspiration to kids. Achievement is a good reward and motivation to go further in life.
So, I passed the exam at the beginning of its availability and I did that on purpose. When you wait, you find a lot of “brain dumps” (aka illegal leaks) of questions all over the internet when you are looking for information, and I hate that. This lowers the value of the certification. I really don’t get the point. If someone just wants the diploma, they can just design one with Photoshop. That is illegal, but so is using dumps. The quality, price, and recognition of certifications suffer when people cheat.
So, very few resources available. Nothing yet on https://linuxacademy.com/ which has a good reputation for AWS Certification preparations.
Note that I’ve also followed the Exam Readiness: AWS Certified Database – Specialty and it was for me a complete waste of time.
The best I’ve found is reading the FAQs referenced at https://aws.amazon.com/certification/certification-prep/ > AWS Specialty Certifications > Database – Specialty > Read AWS whitepapers and FAQs > STUDY TIP: Focus on the following FAQs
I’ve read Vladimir Mukhin’s feedback: https://medium.com/@vlad_13843/aws-certified-database-specialty-unofficial-exam-guide-4e38951481f5, a good list of topics but too many links to documents and videos for me. I think that if I need more than 2 or 3 days to prepare for this kind of exam, then the exam is not for me. I don’t want to get a certification based only on recently learned knowledge.
For this Database specialty, you need:
- A strong background in databases (you have to include NoSQL in the database catalog) from past experience.
- A good understanding of IT concepts: network, security, encryption, high-availability, disaster recovery (they have funny names in this cloud but the concepts are the same).
- And of course, know the AWS services, that’s the part that you can learn for the certification (I already had a good idea about Aurora and DynamoDB internal architecture).
But the best skill for these certification exams in general (those with questions and no hands-on) is logic. I took the same approach as with my Oracle certification exams before: read the question and all answers, think about what they want you to answer, think about what you would answer. Good if that matches. Then re-read everything to find the words in the question which make one answer possible or not possible. For example, many questions start with “An online gaming company…” and this is the usual example for DynamoDB. When you see Disaster Recovery then you can eliminate the Multi-AZ and focus on multi-Region. There are many questions where multiple answers are possible, but they ask you for the best one, and the question should mention on which criteria: cost, availability,… And remember that the people who write the questions are proud of their product. If a new Auto-Scaling option has been recently added to a service, there is a good chance that they want to be sure that you know about it. So in addition to the mentioned criteria, I implicitly add the marketing one.
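The elimination step described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a study tool: the rule table, the function name `eliminate`, and the sample question and answers are all hypothetical, and the single rule encoded is the one from the text (Disaster Recovery rules out Multi-AZ).

```python
# Hypothetical rule table: a keyword found in the question rules out
# answers containing certain terms. Only the rule from the text is encoded.
ELIMINATION_RULES = {
    "disaster recovery": ["multi-az"],
}


def eliminate(question, answers):
    """Return the answers that survive the keyword-based elimination."""
    q = question.lower()
    # Collect every term ruled out by a keyword present in the question.
    banned = [
        term
        for keyword, terms in ELIMINATION_RULES.items()
        if keyword in q
        for term in terms
    ]
    # Keep only the answers that mention none of the ruled-out terms.
    return [a for a in answers if not any(term in a.lower() for term in banned)]


# Made-up question and answers, for illustration only.
question = "A company needs a Disaster Recovery plan for its database."
answers = [
    "Enable Multi-AZ on the instance",
    "Create a cross-Region read replica",
]
remaining = eliminate(question, answers)
print(remaining)
```

What remains after elimination still has to be weighed against the criterion named in the question (cost, availability,…).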
Let me give another trick I’m used to from Oracle exams, which seems to work here as well. When you have answers with a shape like:
- Enable A and run X
- Enable A and run Y
- Enable A and run Z
- Enable B and run Y
There is a good chance that “Enable A and run Y” is the answer they expect. Of course, this, like the previous tips, is not an absolute truth. I use them only when, after thinking with logic, I hesitate between two answers. But stay calm: they don’t put answers to trick you but just to validate your skills.
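The answer-shape trick above amounts to picking the answer whose parts are shared with the most other answers. Here is a minimal Python sketch of that idea, assuming every answer has exactly the form “Enable … and run …”; the function name `most_shared_answer` is made up for this illustration.

```python
from collections import Counter


def most_shared_answer(answers):
    """Return the answer whose two parts are the most common overall."""
    # Split "Enable A and run X" into ("Enable A", "run X").
    # Assumption: each answer has exactly two parts joined by " and ".
    split = [tuple(part.strip() for part in a.split(" and ")) for a in answers]
    # Count how often each part appears in each position.
    counts = [Counter(parts[i] for parts in split) for i in range(2)]
    # Score an answer by summing the frequencies of its two parts.
    scores = [sum(counts[i][parts[i]] for i in range(2)) for parts in split]
    return answers[scores.index(max(scores))]


answers = [
    "Enable A and run X",
    "Enable A and run Y",
    "Enable A and run Z",
    "Enable B and run Y",
]
print(most_shared_answer(answers))  # prints "Enable A and run Y"
```

“Enable A” appears three times and “run Y” twice, so the combination of both scores highest. As said above, this is only a tie-breaker when logic leaves two candidates.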
Actually, the best I’ve read before the exam was:
On those exams, you can mark questions for review. I do not mark those where I’m sure about the answer. I don’t want to review them even when I have plenty of time. When you go back on those, you are in the mood of finding mistakes, and there’s a risk that you have doubts and change a good answer to the wrong one. Trust yourself: when you know the answer from the beginning, then it is the right answer.
For the questions where I have a doubt, I don’t waste time and mark them for review. Sometimes, another question later rings a bell and helps you with another one. I usually have 30% of the questions marked. I mark too many at the beginning (like not trusting myself, or being stressed by the time), and maybe at the end (when I see I have enough time to review a lot later). But I never leave a question without an answer. At the end, I review and unmark when done.
Important: being at full concentration level for 3 hours is hard. This means that there is a higher chance that your first answer is better than during the review. Change your answer only when you have new elements (like you thought about it when at another question) or because you know you didn’t spend enough time on the first pass.
That’s probably the most important if, like me, you started on databases with Oracle or other RDBMS. You can be scared when looking at the multiplication of services in AWS. They have a reason and if you understand it then it will be easier to remember what they do and how they do it. And all is in the name: Amazon Web Services.
AW[S] as Service: The right tool for the right job
The ‘S’ in AWS means ‘Service’ and it is actually the idea of MicroServices. You think of a database as the integration of CPU, Memory and Storage and you know how the database vendors try to avoid any latency between those components: memory is a shared segment attached to your session process, and when I/O is required, preference goes to direct I/O to bypass the filesystem services for performance and durability reasons. With AWS all layers are different services. Whether you run your database yourself on EC2 or it is managed with RDS, it will involve many different services: EBS mount, Aurora clustered storage, backups on S3,… In the case of Aurora, even the shared memory buffers are running separately and the database writer is behind the network layer. And in addition to those many layers, you can add some ElastiCache in front, replicate with DMS, monitor with CloudWatch, audit with CloudTrail,… The philosophy is the opposite of a multi-purpose database: you build your system from many blocks.
As an illustration, look at this architecture best practice for something as simple as WordPress:
"You forgot another Auto Scaling Group, plus a Lambda function" says @bitnami.
— Corey Quinn (@QuinnyPig) April 16, 2020
We are in a completely different world than what made MySQL popular: the LAMP full-stack bundle for Web Services to keep infrastructure simple. Paradoxically, MySQL (and the MySQL upper layer in Aurora) is the database engine that is the most used in RDS.
A[W]S as Web: scale, scale, scale
The ‘W’ in AWS stands for Web. The services are accessible through the internet, which means: worldwide network with unreliable latency. When you start watching presentations about AWS and especially the NoSQL services like DynamoDB you hear things like: “SQL does not scale”, “Joins do not scale”,… They even illustrate this by mentioning that they moved out of Oracle because they had performance issues with it, and they accepted the lack of consistency because they had availability issues with it… But think about this: how many customers and how many transactions did they have before they decided on this move? It seems that the ‘old monolith’ database scaled very well with their growing business for years. When I read this kind of message, I stay away and just remember what is needed to get the right answer at the exam. If you are a gaming company (that’s not a bank…) and want to store user scores (this fits key-value) accessible worldwide with millisecond latency (this is physically incompatible with two-phase-commit) and have a very simple use case (always access top score per user) but growing business with impossible capacity planning (this fits auto-scale) then DynamoDB is the solution expected.
I think that the “scale” message from cloud vendors addressing startup companies is not really about the number of transactions or storage size. It is more about scaling an organization where a small team of young people grows into a multinational. That’s the problem Amazon faced with their Oracle databases: hundreds or thousands of databases for many (micro-)services, incompatible with very short dev release schedules and growing ops infrastructure. This is what didn’t scale: the organization, not the transactions per second. But that’s not for the certification exam. For the exam you need to know about reserved, provisioned, on-demand, auto-scaling, and serverless to fit what you know about the capacity required over time.
[A]WS as Amazon: a marketplace, from books to IT resources
The ‘A’ is for Amazon, an online marketplace for books, and then for pretty much everything in a large variety, and an incredible quantity sold on Black Friday. Today, “cloud” is the selling message for startup companies, and Amazon takes a lot of examples from their own business. But don’t forget that, at that time, their startup company grew on on-premises infrastructure. That’s the big difference between you and them: for Amazon the AWS public cloud services are running in their own datacenters. They sell some of their network/compute/storage capacity to amortize their CapEx. You go there for the opposite reason: close your datacenter and run your IT on OpEx only.
Those considerations go beyond the exam scope but thinking about this helped me in two ways:
- Every training or certification provided by a vendor contains some marketing messages (maybe not even on purpose but they are written by people within the company culture) and it is important to keep this in mind to focus on the expected answers.
- Think about use cases like an online retail company would: “Suggesting similar products” -> Graph -> Amazon Neptune. “Store the buyer’s cart” -> key/value -> DynamoDB. “Access the catalog worldwide” -> low latency eventually consistent reads -> Regions, Global, Read Replica.
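The retail examples in the last bullet follow a requirement -> data model -> service pattern, which can be written down as a small lookup table. The sketch below is only a memory aid in Python: the three entries come from the examples above, while the keyword matching and the names `HINTS` and `suggest` are assumptions made for this illustration.

```python
# Keyword in the requirement -> (data model or access pattern, AWS option).
# The three entries mirror the retail examples; the matching is hypothetical.
HINTS = {
    "similar products": ("graph", "Neptune"),
    "cart": ("key/value", "DynamoDB"),
    "worldwide": (
        "low latency eventually consistent reads",
        "Regions, Global, Read Replica",
    ),
}


def suggest(requirement):
    """Return every (data model, option) pair whose keyword appears."""
    text = requirement.lower()
    return [hint for keyword, hint in HINTS.items() if keyword in text]


for requirement in (
    "Suggesting similar products",
    "Store the buyer's cart",
    "Access the catalog worldwide",
):
    print(requirement, "->", suggest(requirement))
```

A real exam question combines several such hints with a criterion (cost, availability,…), so the table only narrows down the candidates.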
If you plan to also take the Cloud Practitioner exam, I recommend passing it before, because for the Database Specialty you need to know about VPC, EC2, CloudFormation,… I’m saying that but I did the opposite with the idea that starting with the most difficult releases all the stress for the second exam 😉
When looking more deeply at NoSQL for this exam, I had very interesting discussions and you may be interested in these Twitter conversations:
When was your Oracle experience?
It has always (at least since Oracle7, 24 years ago) been immediate to add a nullable column. Same with not null columns (with default) since 12c (7 years ago): pic.twitter.com/pmoPFX9Q7o
— Franck Pachot (@FranckPachot) April 23, 2020
Ok, maybe talking about different things.
I was saying: you don't save space by normalizing one DB.
You were saying: RDBMS gives the possibility to run one multi-purpose DB to replace many purpose-build hierarchical representations and this saved disk space.
Is that correct?
— Franck Pachot (@FranckPachot) April 23, 2020
Furthermore, in 1974 Codd cites that of those 6 goals of normalization, the two most important ones are (1) separation of logical + physical layers and (2) reducing integrity issues. https://t.co/kfabNZPh7z pic.twitter.com/43HOkFgjrr
— Andy Pavlo (@andy_pavlo) April 23, 2020