from google.cloud import storage
import os
# Connect to GCS
client = storage.Client.from_service_account_json("path_to_service_account.json")
# Access bucket
bucket_name = "<your_bucket>"
bucket = client.get_bucket(bucket_name)
print(f"Connected to bucket: {bucket_name}")
# List blobs under the raw-data/ prefix
blobs = bucket.list_blobs(prefix="raw-data/")
for blob in blobs:
    print(blob.name)
# Read a file's contents
file_path = "raw-data/sample.csv"
blob = bucket.blob(file_path)
if blob.exists():
    content = blob.download_as_text()
    print(content)
else:
    print(f"File not found: {file_path}")
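# Upload a local file (skipped if the local file is missing)
local_path = "local_file.csv"
if os.path.isfile(local_path):
    upload_blob = bucket.blob("raw-data/new_file.csv")
    upload_blob.upload_from_filename(local_path)
    print("File uploaded successfully")
else:
    print(f"Local file not found: {local_path}")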
Modern cloud architectures rely heavily on object storage platforms that provide durability, elasticity, and global accessibility. Azure Data Lake Storage (ADLS) Gen2 adds a hierarchical namespace on top of Azure Blob Storage, enabling directory- and file-level access control and better performance for analytics workloads within the Microsoft ecosystem. Data is typically organized into containers and folders that represent raw, refined, and curated layers, so teams can separate content by processing stage.
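The layered layout described above can be sketched as a small path helper. This is only an illustration: the account name `examplelake`, the container name `analytics`, and the choice to model each layer as a top-level folder are assumptions, not details from the text. The `abfss://<container>@<account>.dfs.core.windows.net/<path>` form is the URI scheme used by the ABFS driver for ADLS Gen2.

```python
# Hypothetical names: the storage account, container, and layer folders
# used here are placeholders chosen for illustration only.
LAYERS = ("raw", "refined", "curated")


def adls_path(account: str, container: str, layer: str, relative_path: str) -> str:
    """Build an abfss:// URI for a file inside one of the processing layers."""
    if layer not in LAYERS:
        raise ValueError(f"Unknown layer {layer!r}; expected one of {LAYERS}")
    # Strip a leading slash so the joined path has no empty segment.
    clean = relative_path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{layer}/{clean}"


if __name__ == "__main__":
    # Print the same dataset path in each layer.
    for layer in LAYERS:
        print(adls_path("examplelake", "analytics", layer, "sales/2024/orders.csv"))
```

Keeping the layer name in one validated place makes it harder for a pipeline to write refined output into the raw area by accident.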
Amazon S3, offered by Amazon Web Services, delivers highly durable object storage with virtually unlimited capacity. Data is stored inside buckets, and each object is identified by a key that is unique within its bucket. Each bucket is created in a specific AWS Region, so organizations can choose geographic locations that meet compliance or latency requirements, while lifecycle rules move objects to cheaper storage classes or expire them to control archival and cost.
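As a rough sketch of what a lifecycle rule might look like, the helper below builds a configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The prefix, the day counts, the rule ID format, and the choice of the `GLACIER` storage class are illustrative assumptions, and the `apply_lifecycle` function needs boto3 plus valid AWS credentials to run.

```python
def build_lifecycle_rule(prefix: str, archive_after_days: int, expire_after_days: int) -> dict:
    """Return a lifecycle configuration dict with one archive-then-expire rule."""
    if expire_after_days <= archive_after_days:
        raise ValueError("Objects must expire after they are archived")
    return {
        "Rules": [
            {
                # Hypothetical naming scheme for the rule ID.
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_lifecycle(bucket_name: str, config: dict, region: str) -> None:
    """Apply the configuration to a bucket (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Archive objects under raw-data/ after 30 days, delete them after a year.
    print(build_lifecycle_rule("raw-data/", 30, 365))
```

Separating the pure builder from the API call lets the rule be reviewed or unit-tested before anything touches a real bucket.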
Google Cloud Storage (GCS) follows a similar object-based model: data lives in buckets whose location is a single region, a dual-region, or a multi-region, chosen according to availability needs. It integrates closely with the analytics and machine learning services in Google Cloud and handles structured and unstructured workloads alike.
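The location choice is made when a bucket is created. The sketch below assumes the same `google-cloud-storage` client used earlier and passes the location to `Client.create_bucket`. The `classify_location` helper and its code sets are hypothetical conveniences for illustration: the dual-region set is deliberately non-exhaustive, and treating every other code as a single region is an assumption.

```python
# Multi-region codes accepted by GCS.
MULTI_REGIONS = {"US", "EU", "ASIA"}
# Non-exhaustive sample of predefined dual-region codes (assumption: check
# the current GCS documentation for the full list).
DUAL_REGIONS = {"NAM4", "EUR4", "ASIA1"}


def classify_location(location: str) -> str:
    """Roughly classify a location code; anything unrecognized is treated as a region."""
    code = location.upper()
    if code in MULTI_REGIONS:
        return "multi-region"
    if code in DUAL_REGIONS:
        return "dual-region"
    return "region"


def create_bucket_in(bucket_name: str, location: str, key_path: str):
    """Create a bucket in the given location (requires google-cloud-storage)."""
    from google.cloud import storage  # lazy import: only needed for the API call

    client = storage.Client.from_service_account_json(key_path)
    bucket = client.create_bucket(bucket_name, location=location)
    print(f"Created {bucket.name} in {bucket.location} ({classify_location(location)})")
    return bucket


if __name__ == "__main__":
    for code in ("us-central1", "EUR4", "US"):
        print(code, "->", classify_location(code))
```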
Across these platforms, the choice of storage location depends on regulatory requirements, proximity to compute resources, disaster recovery strategy, and cost. Although the terminology varies (containers, buckets, or hierarchical directories), the underlying concept remains consistent: scalable object storage that separates compute from persistence, allowing enterprises to design flexible and resilient data ecosystems.
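To illustrate how the three naming schemes map onto the same idea, the following sketch parses common URI forms (`s3://`, `gs://`, and `abfss://`) into a shared "container plus key" shape using only the standard library. The `ObjectLocation` type and `parse_object_uri` function are hypothetical names introduced here, and the example URIs are placeholders.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class ObjectLocation(NamedTuple):
    platform: str   # human-readable platform name
    container: str  # S3/GCS bucket or ADLS container
    key: str        # object key or path inside the container


def parse_object_uri(uri: str) -> ObjectLocation:
    """Map a platform-specific storage URI onto a common container/key shape."""
    parts = urlsplit(uri)
    key = parts.path.lstrip("/")
    if parts.scheme == "s3":
        return ObjectLocation("Amazon S3", parts.netloc, key)
    if parts.scheme == "gs":
        return ObjectLocation("Google Cloud Storage", parts.netloc, key)
    if parts.scheme in ("abfs", "abfss"):
        # ABFS URIs look like container@account.dfs.core.windows.net/path
        container, _, _account_host = parts.netloc.partition("@")
        return ObjectLocation("Azure Data Lake Storage", container, key)
    raise ValueError(f"Unsupported storage URI scheme: {parts.scheme!r}")


if __name__ == "__main__":
    for uri in (
        "s3://example-bucket/raw-data/sample.csv",
        "gs://example-bucket/raw-data/sample.csv",
        "abfss://analytics@examplelake.dfs.core.windows.net/raw/sample.csv",
    ):
        print(parse_object_uri(uri))
```

Code written against a shape like this can stay largely independent of the platform, which mirrors the separation of compute from persistence described above.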